Re: YouAreDeadException with hbase

Ryan Rawson Fri, 16 Jul 2010 12:58:03 -0700

Sometimes the GC can chain multiple medium pauses into one large
pause.  I've seen this before, where two long pauses came back to
back and the result was a 50+ second pause.
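
To make the chaining effect concrete, here is a minimal Java sketch that merges pauses separated by only a brief stretch of running time and reports the longest resulting stall. It is not HBase or JVM code: the class name, the method, the 1-second merge threshold, and the pause values are all made up for illustration. A real check would take its start times and durations from your own GC log.

```java
public class PauseChains {
    /**
     * Returns the longest effective stall in milliseconds, treating pauses
     * separated by at most maxGapMs of running time as a single stall.
     * Each row of pauses is {startMs, durationMs}, sorted by startMs.
     */
    static long longestStallMs(long[][] pauses, long maxGapMs) {
        long longest = 0L;
        long stallStart = 0L;
        long stallEnd = 0L;
        boolean open = false; // true once at least one pause has been seen
        for (long[] p : pauses) {
            long start = p[0];
            long end = p[0] + p[1];
            if (open && start - stallEnd <= maxGapMs) {
                // Close enough to the previous pause: extend the current stall.
                stallEnd = Math.max(stallEnd, end);
            } else {
                // Far enough apart: begin a new stall.
                stallStart = start;
                stallEnd = end;
                open = true;
            }
            longest = Math.max(longest, stallEnd - stallStart);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: two 25 s pauses with 200 ms between them,
        // followed much later by an isolated 9.6 s pause.
        long[][] pauses = { {0L, 25_000L}, {25_200L, 25_000L}, {120_000L, 9_600L} };
        System.out.println("Longest stall: " + longestStallMs(pauses, 1_000L) + " ms");
    }
}
```

The point of the sketch is that the longest single entry in a GC log can understate the stall the process actually experienced, so it is worth comparing the merged figure against the ZooKeeper session timeout.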


This article covers GC performance and tuning in depth; check it out:
http://wiki.apache.org/hadoop/PerformanceTuning
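
If the GC log does not account for the whole gap, one generic way to measure stalls from inside the process is a small watchdog that sleeps for a fixed interval and checks how late it wakes up. The Java sketch below is an assumption-laden illustration, not part of HBase: the class name, the 500 ms interval, and the 1 s warning threshold are hypothetical. It reports any stall of the JVM (GC, swapping, a frozen VM), not only GC.

```java
public class PauseDetector {
    /**
     * Sleeps for intervalMs and returns how many milliseconds longer than
     * that the thread actually took to resume (0 if it woke on time).
     */
    static long overshootMs(long intervalMs) {
        long before = System.nanoTime();
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        long elapsedMs = (System.nanoTime() - before) / 1_000_000L;
        return Math.max(0L, elapsedMs - intervalMs);
    }

    public static void main(String[] args) {
        long intervalMs = 500L; // hypothetical sampling interval
        long warnMs = 1_000L;   // hypothetical threshold for reporting a stall
        // Bounded loop so the example terminates; a real monitor would run
        // in a daemon thread for the life of the process.
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < rounds; i++) {
            long extra = overshootMs(intervalMs);
            if (extra > warnMs) {
                System.err.println("Stalled about " + extra + " ms beyond the sleep interval");
            }
        }
    }
}
```

A stall that shows up here but not in the GC log points at something outside the collector, such as the host swapping.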

-ryan

On Fri, Jul 16, 2010 at 11:55 AM, Jinsong Hu <[email protected]> wrote:
> Yes, the root cause seems to be the gap of 4 minutes between
> 2010-07-16 05:49:26,805 and 2010-07-16 05:53:23,476, but I checked the GC
> log (gc-hbase.log) and don't see a 4 minute gap in GC. I just wonder what
> could cause this large gap.
>
> I also wonder if there is a configuration setting I can use to avoid this
> long pause, or to get around the problem caused by it.
>
> Jimmy
>
> --------------------------------------------------
> From: "Stack" <[email protected]>
> Sent: Friday, July 16, 2010 11:44 AM
> To: <[email protected]>
> Subject: Re: YouAreDeadException with hbase
>
>> You'll see this if the server reports to the master after the master
>> has ruled it 'dead'.
>>
>> Here is the code that produces the exception:
>>
>>   if (!isDead(serverName)) return;
>>   String message = "Server " + what + " rejected; currently processing " +
>>     serverName + " as dead server";
>>   LOG.debug(message);
>>   throw new YouAreDeadException(message);
>>
>> Servers are on the 'dead' list if zk reports their session has
>> expired.  The master then moves to clean up after the dead server and
>> process its logs.  If the server reports in during this cleanup, the
>> master will return the YouAreDeadException.
>>
>> Usually the RS has lost its zk session but has yet to realize it.
>>
>> St.Ack
>>
>> On Thu, Jul 15, 2010 at 11:52 PM, Jinsong Hu <[email protected]> wrote:
>>>
>>> Hi, There:
>>>  I got some YouAreDeadException with hbase. What can cause it? I do
>>> notice that between 5:49 and 5:53, for 4 minutes, there is no log. This
>>> doesn't look like a GC issue: I checked the GC log, and the longest GC
>>> is only 9.6 seconds.
>>>
>>> Jimmy.
>>>
>>>
>>> 2010-07-16 05:49:26,805 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178914, Hit=1034, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04745483165606856%, Miss Ratio=99.95254278182983%, Evicted/Run=NaN
>>> 2010-07-16 05:53:23,476 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.355194MB (3518176), Free=405.4198MB (425113472), Max=408.775MB (428631648), Counts: Blocks=1, Access=2178915, Hit=1035, Miss=2177880, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.04750070511363447%, Miss Ratio=99.95250105857849%, Evicted/Run=NaN
>>>
>>> ....
>>> 2010-07-16 05:53:26,171 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240540ms for sessionid 0x329c88039b0006c, closing socket connection and attempting reconnect
>>> 2010-07-16 05:53:27,333 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t-zookeeper2.cloud.ppops.net/10.110.24.57:2181
>>> 2010-07-16 05:53:27,334 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t-zookeeper2.cloud.ppops.net/10.110.24.57:2181, initiating session
>>> 2010-07-16 05:53:27,335 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x329c88039b0006c has expired, closing socket connection
>>> 2010-07-16 05:53:27,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 240520ms for sessionid 0x129c87a7f98007a, closing socket connection and attempting reconnect
>>>
>>>
>>> 2010-07-16 05:53:39,090 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=m0002028.ppops.net,60020,1279237223465, load=(requests=952, regions=21, usedHeap=575, maxHeap=2043): Unhandled exception
>>> org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing m0002028.ppops.net,60020,1279237223465 as dead server
>>>      at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
>>>      at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
>>>      at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:684)
>>>      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>
>>>      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>
>>
>
