Hey Guys,
We are seeing an issue where the Master dies with something like the log below.
Any idea why the Master dies? Ideally, if a RegionServer (RS) isn't behaving
well, shouldn't that RS be blacklisted and ignored, or something of that sort?
This is on a cluster with Hadoop 0.20.205 and HBase 0.90.3.
Thanks
Amit
2011-11-07 02:38:00,252 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: items,023b3bba-5282-3edc-a984-dbed11d1cc51,1320395576309.bf3cd2b2cc06f8708050ce725cf1fa7d. state=PENDING_CLOSE, ts=1320631670889
2011-11-07 02:38:00,252 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=items,023b3bba-5282-3edc-a984-dbed11d1cc51,1320395576309.bf3cd2b2cc06f8708050ce725cf1fa7d.
2011-11-07 02:38:51,501 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
java.io.IOException: Call to /216.109.127.135:60020 failed on local exception: java.io.IOException: Connection reset by peer
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy7.closeRegion(Unknown Source)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:601)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1126)
    at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1788)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
2011-11-07 02:38:51,502 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-11-07 02:38:51,502 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: nng2.coke.ac4.yahoo.com:60000.timeoutMonitor exiting