Fixed in https://issues.apache.org/jira/browse/HBASE-3617, upgrade to 0.90.4
J-D

On Tue, Nov 8, 2011 at 6:08 PM, Amit Phadke <[email protected]> wrote:
> Adding right address.
>
> On Nov 7, 2011, at 2:45 PM, Amit Phadke wrote:
>
> Hey Guys,
>
> We are seeing an issue where the Master dies with something like the following.
> Any idea why the master dies? Ideally, if a RS isn't behaving well, shouldn't
> that RS be blacklisted and ignored or something of that sort?
>
> This is on a cluster with Hadoop 205 and HBase 0.90.3
>
> Thanks
> Amit
>
> 2011-11-07 02:38:00,252 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: items,023b3bba-5282-3edc-a984-dbed11d1cc51,1320395576309.bf3cd2b2cc06f8708050ce725cf1fa7d. state=PENDING_CLOSE, ts=1320631670889
> 2011-11-07 02:38:00,252 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=items,023b3bba-5282-3edc-a984-dbed11d1cc51,1320395576309.bf3cd2b2cc06f8708050ce725cf1fa7d.
> 2011-11-07 02:38:51,501 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /216.109.127.135:60020 failed on local exception: java.io.IOException: Connection reset by peer
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy7.closeRegion(Unknown Source)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:601)
>     at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1126)
>     at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1788)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at java.io.DataInputStream.readInt(DataInputStream.java:370)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-11-07 02:38:51,502 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-11-07 02:38:51,502 nng2.coke.ac4.yahoo.com:60000.timeoutMonitor INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: nng2.coke.ac4.yahoo.com:60000.timeoutMonitor exiting
