Hi Rohit,

Usually a YouAreDeadException means your RegionServer is too slow. It gets kicked out by Master+ZK, but then tries to join back and is informed that it has been kicked out.
Reasons:
- Long garbage collection;
- Swapping;
- Network issues (get disconnected, then re-connected);
- etc.

What do you have before 2014-02-21 13:41:00,308 in the logs?

2014-02-27 11:13 GMT-05:00 Rohit Kelkar <[email protected]>:

> Hi, has anybody been facing similar issues?
>
> - R
>
>
> On Wed, Feb 26, 2014 at 12:55 PM, Rohit Kelkar <[email protected]> wrote:
>
> > We are running hbase 0.94.2 on hadoop 0.20 append version in production
> > (yes we have plans to upgrade hadoop). Its a 5 node cluster and a 6th node
> > running just the name node and hmaster.
> > I am seeing frequent RS YouAreDeadExceptions. Logs here
> > http://pastebin.com/44aFyYZV
> > The RS log shows a DFSOutputStream ResponseProcessor exception for block
> > blk_-6695300470410774365_837638 java.io.EOFException at 13:41:00 followed
> > by YouAreDeadException at the same time.
> > I grep'ed this block in the Datanode (see log here
> > http://pastebin.com/2jfwCfcK). At 13:41:00 I see an Exception in
> > receiveBlock for block blk_-6695300470410774365_837638
> > java.nio.channels.ClosedByInterruptException.
> > I have also attached the namenode logs around the block here
> > http://pastebin.com/9NE9J8s1
> >
> > Across several RS failure instances I see the following pattern - the
> > region server YouAreDeadException is always preceeded by the EOFException
> > and datanode ClosedByInterruptException
> >
> > Is the error in the movement of the block causing the region server to
> > report a YouAreDeadException? And of course, how do I solve this?
> >
> > - R
> >
>
