Re: Regionserver died due to problem connecting to HMaster?

Jean-Daniel Cryans Wed, 21 Jul 2010 17:22:01 -0700

So your Java process paused for 250116ms (a bit over four minutes); that's how
long the process wasn't responding (aka a "stop-the-world" pause).


You should:

 - Make sure HBase isn't CPU starved (how many MR tasks run on those
machines? Did you leave some room for HBase?)
 - Make sure the machines aren't swapping, and set swappiness to 0.
 - Give more RAM to HBase.
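
To check the swap advice on a Linux box, the values to look at are
/proc/sys/vm/swappiness (the vm.swappiness sysctl) and the SwapTotal/SwapFree
lines in /proc/meminfo. `sysctl vm.swappiness` and `free` from the shell tell
you the same thing. A rough sketch that reads both from Java; SwapCheck is a
made-up name, and it assumes Linux's /proc layout:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical pre-flight check (Linux only): reports vm.swappiness and the
// amount of swap currently in use, either of which can let the region
// server's heap get paged out and cause long pauses.
final class SwapCheck {
    // Parses the single integer stored in /proc/sys/vm/swappiness.
    static int parseSwappiness(String contents) {
        return Integer.parseInt(contents.trim());
    }

    // Returns swap in use (kB) from /proc/meminfo lines: SwapTotal - SwapFree.
    static long swapUsedKb(List<String> meminfoLines) {
        long total = -1, free = -1;
        for (String line : meminfoLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue;
            if (parts[0].equals("SwapTotal:")) total = Long.parseLong(parts[1]);
            else if (parts[0].equals("SwapFree:")) free = Long.parseLong(parts[1]);
        }
        if (total < 0 || free < 0) {
            throw new IllegalArgumentException("SwapTotal/SwapFree not found");
        }
        return total - free;
    }

    public static void main(String[] args) throws IOException {
        int swappiness = parseSwappiness(
                Files.readAllLines(Paths.get("/proc/sys/vm/swappiness")).get(0));
        long usedKb = swapUsedKb(Files.readAllLines(Paths.get("/proc/meminfo")));
        if (swappiness != 0) System.out.println("WARN vm.swappiness=" + swappiness + " (advice: 0)");
        if (usedKb > 0) System.out.println("WARN " + usedKb + " kB of swap in use");
    }
}
```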

The last resort is a higher timeout, but needing one would mean that you
are overcommitting your machines.

J-D

On Wed, Jul 21, 2010 at 5:15 PM, Steve Kuo <kuosen...@gmail.com> wrote:
> J-D,
>
> Below is the closest I found in the regionserver log. There was no 'slept'
> in either the master or ZooKeeper logs.
>
> 2010-07-21 14:36:18,664 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x129f24e134a002c to sun.nio.ch.selectionkeyi...@356f144c
> java.io.IOException: TIMED OUT
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> 2010-07-21 14:36:18,665 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 250116ms, ten times longer than scheduled: 10000
> 2010-07-21 14:36:18,667 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 250229 milliseconds - retrying
> 2010-07-21 14:36:18,760 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
> 2010-07-21 14:36:18,761 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>
> From http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9, I think my best
> bet is to stop the client from swapping (in my case the client is a Map job)
> and to increase the ZooKeeper timeout settings. I will try both after the
> queued data load finishes.
>
> Other suggestions are most welcome.
>
> On Wed, Jul 21, 2010 at 2:55 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
>> ZooKeeper is only a canary: it tells the region server that it was
>> somehow partitioned from the cluster for longer than the default timeout,
>> usually because of GC pauses. You should see "slept for x, longer than y"
>> messages before what you pasted.
>>
>> J-D
>>
>>
>
