J-D,

Below is the closest I found in the regionserver log.  There was no 'slept'
in either the master or zookeeper logs.

2010-07-21 14:36:18,664 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x129f24e134a002c to sun.nio.ch.selectionkeyi...@356f144c
java.io.IOException: TIMED OUT
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2010-07-21 14:36:18,665 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
250116ms, ten times longer than scheduled: 10000
2010-07-21 14:36:18,667 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
master for 250229 milliseconds - retrying
2010-07-21 14:36:18,760 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-07-21 14:36:18,761 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020

From http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9, I think my best
bet is to stop the client, which in my case is a Map job, from swapping and
to increase the ZooKeeper timeout settings.  I will try both once the queued
data load has finished.
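
For the timeout part, here is a minimal sketch of the kind of change I have in
mind for hbase-site.xml on the region servers.  zookeeper.session.timeout is
the HBase property for the ZooKeeper session timeout; the 300000 ms value is
only an illustrative assumption, picked to exceed the roughly 250 s pause in
the log above, and is not a recommendation.

```xml
<!-- hbase-site.xml fragment: illustrative sketch only -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- Assumed value in milliseconds; tune to the pauses actually observed. -->
  <value>300000</value>
</property>
```

As far as I understand, the ZooKeeper server caps the session timeout it will
negotiate (by default at 20 times its tickTime), so the server side may need
adjusting too before a larger value takes effect.  A longer timeout also means
a genuinely dead region server takes longer to be noticed, so removing the
pause itself (swapping or GC) remains the real fix.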

Other suggestions are most welcome.

On Wed, Jul 21, 2010 at 2:55 PM, Jean-Daniel Cryans <[email protected]> wrote:

> ZooKeeper is only a canary, telling the region server that it was
> somehow partitioned from the cluster for longer than the default
> timeout, usually because of GC pauses. You should see messages like
> "slept for x, longer than y" before what you pasted.
>
> J-D
>
>
