Vidhya: So it's failing to send a close to a specific server -- see the IP in the log below -- and the server on the other end is closing the connection prematurely, so we get the EOFException (EOFE). Can you see anything in the logs on that machine?

Regarding the EOFE crashing the Master, you might want to pick up a TRUNK change. See http://hbase.apache.org/xref/org/apache/hadoop/hbase/master/AssignmentManager.html#1261 (this is how TRUNK looks). Notice that it's more generic than what you currently have -- or just add a catch for the EOFE. The patch is actually kinda small and targeted explicitly at fixing the likes of what you are seeing:

+ HBASE-3617 NoRouteToHostException during balancing will cause Master abort
+ (Ted Yu via Stack)

Let me know if it works for you. If so, I'll backport it to the branch.

St.Ack

On Wed, May 11, 2011 at 2:32 PM, Vidhyashankar Venkataraman <[email protected]> wrote:
> The master of my Hbase instance (0.90.x) crashes each time it is restarted,
> with the exceptions shown below. Can you let me know what this is usually due
> to? (I also saw these exceptions in a JIRA but they were about uncaught EOF
> exception). Only the master dies while the region servers wait for a master
> to wake back up.
>
> Thank you
> Vidhya
>
> The master log:
>
> 2011-05-11 21:19:04,259 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy7.closeRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>         at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>         at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de., src=b3110318.yst.yahoo.net,44420,1305073517470, dest=b3110175.yst.yahoo.net,44420,1305073507459
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de. (offlining)
> 2011-05-11 21:19:04,260 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy7.closeRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>         at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>         at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
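
For what it's worth, here is a rough sketch of the "more generic catch" idea mentioned in the reply above. This is not the HBASE-3617 patch and not the actual AssignmentManager code: the class CloseRegionSketch, the interface RegionCloser, and the method tryUnassign are all made-up names for illustration. The only facts it leans on are that java.io.EOFException and java.net.NoRouteToHostException are both subclasses of java.io.IOException, so a single catch of IOException handles either one (and the wrapped form seen in the log) without letting the failure propagate up and abort the caller.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.NoRouteToHostException;

public class CloseRegionSketch {

    /** Hypothetical stand-in for the RPC that asks a region server to close a region. */
    public interface RegionCloser {
        void sendRegionClose(String regionName) throws IOException;
    }

    /**
     * Tries to send the close request. Returns true if the call completed,
     * false if it failed with any IOException (EOFException,
     * NoRouteToHostException, or an IOException wrapping one of them).
     * The failure is logged instead of being rethrown.
     */
    public static boolean tryUnassign(RegionCloser closer, String regionName) {
        try {
            closer.sendRegionClose(regionName);
            return true;
        } catch (IOException e) {
            // Generic catch: one handler for every I/O failure of the RPC.
            System.err.println("Close of " + regionName + " failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // A closer that fails the way the log above shows: an IOException caused by an EOFException.
        RegionCloser eof = name -> {
            throw new IOException("Call failed on local exception", new EOFException());
        };
        // A closer that fails with the exception named in HBASE-3617.
        RegionCloser noRoute = name -> {
            throw new NoRouteToHostException("No route to host");
        };
        // A closer that succeeds.
        RegionCloser ok = name -> { };

        System.out.println("eof delivered=" + tryUnassign(eof, "region-a"));
        System.out.println("noRoute delivered=" + tryUnassign(noRoute, "region-b"));
        System.out.println("ok delivered=" + tryUnassign(ok, "region-c"));
    }
}
```

What the caller does on a false return (retry later, reassign, mark the server as suspect) is a separate decision and is deliberately left out of the sketch.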
