* Each node is a 4-CPU machine running at most 3 mappers and 1 region server. No reducers run when importing data into HBase.
* Each region server is allocated 4G of memory. The full options are:
  -Xmx4096m -server -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

On Wed, Jul 21, 2010 at 5:21 PM, Jean-Daniel Cryans <[email protected]> wrote:

> So your java process paused for 250116ms, that's how long the process
> wasn't responding (aka "stop-the-world" pause).
>
> You should:
>
> - Make sure HBase isn't CPU starved (how many MR tasks on those
>   machines? Left some room for HBase?)
> - Make sure there's no swap. Also set swappiness to 0
> - Give more RAM to HBase
>
> Last resort is giving a higher timeout, but that would mean that you
> are overcommitting your machines.
>
> J-D
>
> On Wed, Jul 21, 2010 at 5:15 PM, Steve Kuo <[email protected]> wrote:
> > J-D,
> >
> > Below is the closest I found in the regionserver log. There was no
> > 'slept' in either the master or zookeeper logs.
> >
> > 2010-07-21 14:36:18,664 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x129f24e134a002c to sun.nio.ch.selectionkeyi...@356f144c
> > java.io.IOException: TIMED OUT
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> > 2010-07-21 14:36:18,665 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 250116ms, ten times longer than scheduled: 10000
> > 2010-07-21 14:36:18,667 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 250229 milliseconds - retrying
> > 2010-07-21 14:36:18,760 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
> > 2010-07-21 14:36:18,761 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> >
> > From http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9, I think my
> > best bet is to stop client swapping, which in my case would be a Map job,
> > and increase the zookeeper timeout settings. I will give them a trial
> > after finishing the queued data load.
> >
> > Other suggestions are most welcome.
> >
> > On Wed, Jul 21, 2010 at 2:55 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> ZooKeeper is only a canary, telling the region server that it was
> >> partitioned from the cluster for longer than the default timeout
> >> somehow, usually because of GC pauses. You should see lines like
> >> "slept for x, longer than y" messages before what you pasted.
> >>
> >> J-D
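
In case it helps anyone chasing the same symptom, here is a rough sketch of how one might scan a region server log for the Sleeper warnings quoted above and flag the pauses that outlast the ZooKeeper session timeout. The function name find_long_pauses and the 60000 ms threshold are my own placeholders, not anything from HBase; substitute whatever session timeout your cluster is actually configured with. It also assumes each log record sits on a single line, as it does in the log file itself rather than in a wrapped email.

```python
import re

# Pattern for the Sleeper warning as it appears in the region server log, e.g.
# "2010-07-21 14:36:18,665 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
#  250116ms, ten times longer than scheduled: 10000" (one line in the real log).
SLEEPER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"org\.apache\.hadoop\.hbase\.util\.Sleeper: We slept (?P<slept>\d+)ms, "
    r"ten times longer than scheduled: (?P<scheduled>\d+)"
)


def find_long_pauses(lines, session_timeout_ms):
    """Return (timestamp, slept_ms, scheduled_ms) for each Sleeper warning
    whose pause exceeded session_timeout_ms.

    find_long_pauses is a hypothetical helper written for this thread.
    """
    pauses = []
    for line in lines:
        match = SLEEPER_RE.match(line.strip())
        if match is None:
            continue
        slept_ms = int(match.group("slept"))
        if slept_ms > session_timeout_ms:
            pauses.append(
                (match.group("ts"), slept_ms, int(match.group("scheduled")))
            )
    return pauses


# Sample input built from the log lines pasted in this thread.
sample_log = [
    "2010-07-21 14:36:18,665 WARN org.apache.hadoop.hbase.util.Sleeper: "
    "We slept 250116ms, ten times longer than scheduled: 10000",
    "2010-07-21 14:36:18,761 INFO org.apache.hadoop.ipc.HBaseServer: "
    "Stopping server on 60020",
]

# 60000 ms is an assumed placeholder timeout, not a value taken from this thread.
for ts, slept_ms, scheduled_ms in find_long_pauses(sample_log, 60000):
    print(f"{ts}: paused {slept_ms} ms (scheduled sleep {scheduled_ms} ms)")
```

To run it against a real log, pass the open file object as the first argument, since the function only iterates over lines. A pause that shows up here alongside a session expiry points at the JVM (GC or swapping) rather than at ZooKeeper itself, which is the point J-D makes above.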
