Here is an interesting anecdote. I had regionservers running on each node
of an 8-node Hadoop cluster. Yesterday morning, I ran a series of MR jobs
where the last MR job does batched inserts into a production MySQL server.
All other MR jobs have 3 mappers and 3 reducers running on a node. The db job
* Each node is a 4-CPU machine with a maximum of 3 mappers and 1
regionserver. No reducers run when importing data into HBase.
* Each region server is allocated 4G of memory. The full options are:
-Xmx4096m -server -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -server -XX:+H
So your Java process paused for 250116 ms (about 250 seconds, a little over
4 minutes). That's how long the process wasn't responding (a
"stop-the-world" pause).
You should:
- Make sure HBase isn't CPU starved (how many MR tasks run on those
machines? Did you leave some room for HBase?)
- Make sure there's no swap. Also set swappiness to 0
- Give mo
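
A minimal sketch for the swap check in the list above, assuming a Linux node where /proc/meminfo is readable. The helper names (parse_meminfo, swap_used_kb) are made up for illustration and are not part of HBase or Hadoop.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # Values are in kB for most fields; the unit suffix is ignored here.
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(meminfo_text):
    """Return swap currently in use, in kB (SwapTotal minus SwapFree)."""
    info = parse_meminfo(meminfo_text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

On a node you could call swap_used_kb(open("/proc/meminfo").read()) and read the current swappiness value from /proc/sys/vm/swappiness. Anything above 0 kB of swap in use on a regionserver machine is worth a closer look.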
J-D,
Below is the closest I found in the regionserver log. There was no 'slept'
in either the master or zookeeper logs.
2010-07-21 14:36:18,664 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x129f24e134a002c to sun.nio.ch.selectionkeyi...@356f144c
java.io.IOException: TIMED OUT
ZooKeeper is only a canary here: it is telling the region server that it
was somehow partitioned from the cluster for longer than the default
timeout, usually because of GC pauses. You should see messages like
"slept for x, longer than y" before what you pasted.
J-D
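
A rough sketch of how one might scan a regionserver log for those pause messages. The exact wording of the log line is an assumption here: the regex simply looks for "slept" followed by a millisecond count. The function name and the threshold are made up for illustration.

```python
import re

# Assumption: pause messages contain the word "slept" followed by a number
# of milliseconds, e.g. "slept 250116ms". Adjust to match your actual logs.
SLEPT_RE = re.compile(r"slept\D*?(\d+)\s*ms", re.IGNORECASE)


def find_long_pauses(lines, threshold_ms=10000):
    """Return (pause_ms, line) pairs for pauses of at least threshold_ms."""
    hits = []
    for line in lines:
        match = SLEPT_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits
```

Pass it an open log file (for example, find_long_pauses(open(path)) with your own regionserver log path) and compare the timestamps of any hits against the ZooKeeper session expiry.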
On Wed, Jul 21, 2010 at 2:49 PM, Steve
It's shaping up to be a ZooKeeper problem. The UI showed only 4 RSs running,
but when I logged on to one of the nodes, I saw that one of the missing RSs
was still running. This RS eventually hit the following exception and
proceeded to shut down.
I will search on all zookeeper related threa
I started an HBase cluster of 8 nodes, and two regionservers died before I
even started any Map job writing data into it. There are several
interesting exceptions, and I would really appreciate any help identifying
the culprit and ways to fix it. BTW, I restarted these regionservers
manually and the