Vidhya:

So it's failing to send a close to one specific server (see the IP
below), and that server is closing the connection prematurely, so we
get the EOFE.  Can you see anything in the logs on that machine?
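To illustrate where the EOFE comes from: the "Caused by" frames show
HBaseClient$Connection.receiveResponse failing inside
DataInputStream.readInt, which throws EOFException when the stream ends
before four bytes have arrived.  Here is a minimal standalone sketch, with
a byte array standing in for the socket.  The class and method names are
made up for illustration; this is not HBase's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedResponseDemo {
    // Reads a 4-byte big-endian int, the way an RPC client might read the
    // first field of a response. The byte array stands in for the socket.
    public static int readHeader(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        return in.readInt();
    }

    // True if the peer "closed" (stream ended) before a full int arrived.
    public static boolean failsWithEof(byte[] wire) {
        try {
            readHeader(wire);
            return false;
        } catch (EOFException e) {
            return true;   // stream ended early: what the master saw
        } catch (IOException e) {
            return false;  // some other I/O failure
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readHeader(new byte[] {0, 0, 0, 42})); // 42
        System.out.println(failsWithEof(new byte[] {}));          // true: peer sent nothing
        System.out.println(failsWithEof(new byte[] {0, 0}));      // true: peer hung up mid-int
    }
}
```

So the EOFE itself only says "the other end hung up"; the reason it hung
up should be in that region server's log.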

Regarding the EOFE crashing the Master, you might want to pick up a
TRUNK change.  See
http://hbase.apache.org/xref/org/apache/hadoop/hbase/master/AssignmentManager.html#1261
(this is how TRUNK looks).  Notice that it's more generic than what you
currently have.  Alternatively, add a catch for the EOFE.

The patch is actually fairly small and targeted explicitly at fixing the
kind of failure you are seeing:

+   HBASE-3617  NoRouteToHostException during balancing will cause Master abort
+               (Ted Yu via Stack)
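Roughly, either fix amounts to not letting a failed close RPC propagate
out of the balancer chore and abort the Master.  A minimal sketch under
stated assumptions: RegionServerRpc and tryClose are hypothetical names,
not HBase's AssignmentManager API, and java.util.logging stands in for
HBase's own logging.  Catching IOException covers both the EOFException
in your log and the NoRouteToHostException from HBASE-3617, since both
are IOException subclasses.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.util.logging.Logger;

public class UnassignSketch {
    private static final Logger LOG = Logger.getLogger(UnassignSketch.class.getName());

    // Hypothetical stand-in for the RPC proxy the master uses to ask a
    // region server to close a region.
    interface RegionServerRpc {
        boolean closeRegion(String regionName) throws IOException;
    }

    // Hypothetical unassign step: a failed close RPC is logged and reported
    // as false instead of being rethrown, so one unreachable or misbehaving
    // region server cannot abort the caller.
    static boolean tryClose(RegionServerRpc server, String regionName) {
        try {
            return server.closeRegion(regionName);
        } catch (IOException e) {
            // Generic catch: EOFException, NoRouteToHostException,
            // ConnectException, etc. are all IOException subclasses.
            LOG.info("Close of " + regionName + " failed (" + e
                + "); leaving it for retry or server expiry");
            return false;
        }
    }

    public static void main(String[] args) {
        RegionServerRpc eof = name -> { throw new EOFException(); };
        RegionServerRpc noRoute = name -> { throw new NoRouteToHostException("No route to host"); };
        RegionServerRpc ok = name -> true;
        System.out.println(tryClose(eof, "regionA"));     // false
        System.out.println(tryClose(noRoute, "regionA")); // false
        System.out.println(tryClose(ok, "regionA"));      // true
    }
}
```

Catching only EOFException would be the narrower option; the generic
IOException catch also covers the next variant of "remote server went
away".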

Let me know if it works for you.  If so, I'll backport it to the branch.

St.Ack



On Wed, May 11, 2011 at 2:32 PM, Vidhyashankar Venkataraman
<[email protected]> wrote:
> The master of my Hbase instance (0.90.x) crashes each time it is restarted, 
> with the exceptions shown below. Can you let me know what this is usually due 
> to? (I also saw these exceptions in a JIRA but they were about uncaught EOF 
> exception). Only the master dies while the region servers wait for a master 
> to wake back up.
>
> Thank you
> Vidhya
>
> The master log:
>
> 2011-05-11 21:19:04,259 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy7.closeRegion(Unknown Source)
>        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de., src=b3110318.yst.yahoo.net,44420,1305073517470, dest=b3110175.yst.yahoo.net,44420,1305073507459
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region WCC.davesch2,r:at#start#www!/Gateway2000!http,1302916227366.b7d206f663282e2a37adb24ba7e4c0de. (offlining)
> 2011-05-11 21:19:04,260 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /67.195.47.230:44420 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:788)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy7.closeRegion(Unknown Source)
>        at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1092)
>        at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1039)
>        at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1808)
>        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:691)
>        at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:582)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:521)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:459)
> 2011-05-11 21:19:04,260 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2011-05-11 21:19:04,260 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>
