Re: region server dead and datanode block movement error

Jean-Marc Spaggiari Thu, 27 Feb 2014 08:18:20 -0800

Hi Rohit,

Usually YouAreDeadException means your RegionServer is too slow. It gets
kicked out by Master+ZK, but then tries to join back and is informed that it
has been kicked out.


Reasons:
- Long Garbage Collection;
- Swapping;
- Network issues (get disconnected, then re-connected);
- etc.

What do you have in the logs before 2014-02-21 13:41:00,308?
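
A long GC pause or heavy swapping usually shows up as a stretch where the
RegionServer log goes silent just before the exception. Below is a minimal
sketch for spotting such gaps, assuming log lines start with a timestamp in
the same layout as the one quoted above. The function name find_gaps and the
30 second threshold are illustrative assumptions, not values taken from this
cluster's ZooKeeper session timeout.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the log line quoted above: 2014-02-21 13:41:00,308
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, min_gap_seconds=30.0):
    """Return (previous_ts, current_ts, gap_seconds) for each silence
    between consecutive timestamped lines of at least min_gap_seconds.

    min_gap_seconds is a hypothetical threshold; pick something close to
    the session timeout actually configured on the cluster.
    """
    gaps = []
    prev = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            # Continuation lines (e.g. stack traces) carry no timestamp.
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps
```

Passing an open log file to find_gaps returns the silent stretches. A gap
ending right at the YouAreDeadException would point to a pause (GC, swap)
rather than to the block error itself, but it is only a hint, not proof.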


2014-02-27 11:13 GMT-05:00 Rohit Kelkar <[email protected]>:

> Hi, has anybody been facing similar issues?
>
> - R
>
>
> On Wed, Feb 26, 2014 at 12:55 PM, Rohit Kelkar <[email protected]> wrote:
>
> > We are running hbase 0.94.2 on hadoop 0.20 append version in production
> > (yes we have plans to upgrade hadoop). It's a 5 node cluster and a 6th node
> > running just the name node and hmaster.
> > I am seeing frequent RS YouAreDeadExceptions. Logs here
> > http://pastebin.com/44aFyYZV
> > The RS log shows a DFSOutputStream ResponseProcessor exception  for block
> > blk_-6695300470410774365_837638 java.io.EOFException at 13:41:00 followed
> > by YouAreDeadException at the same time.
> > I grep'ed this block in the Datanode (see log here
> > http://pastebin.com/2jfwCfcK). At 13:41:00 I see an Exception in
> > receiveBlock for block blk_-6695300470410774365_837638
> > java.nio.channels.ClosedByInterruptException.
> > I have also attached the namenode logs around the block here
> > http://pastebin.com/9NE9J8s1
> >
> > Across several RS failure instances I see the following pattern - the
> > region server YouAreDeadException is always preceded by the EOFException
> > and datanode ClosedByInterruptException
> >
> > Is the error in the movement of the block causing the region server to
> > report a YouAreDeadException? And of course, how do I solve this?
> >
> > - R
> >
>
