This happens under similar conditions to HBASE-3617 but is a distinct problem.
When the region server hosting ROOT isn't available during restart, the
NoRouteToHostException propagates all the way up the call stack and causes the
master to abort.  It looks like this could be addressed by catching
NoRouteToHostException at some point in that call path and treating the
node/region server as offline.
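
To make the suggestion concrete, here is a rough sketch of the kind of handling I have in mind. It is not taken from the HBase code base: RootLocationCheck, ServerProbe and isServerReachable are hypothetical names, and the probe just stands in for whatever RPC opens the connection to the server hosting ROOT. The idea is only that NoRouteToHostException gets reported as "server offline" instead of escaping to the master's run loop.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;

// Hypothetical sketch; none of these names exist in HBase.
final class RootLocationCheck {

    /** Stand-in for the RPC that connects to the server hosting ROOT. */
    interface ServerProbe {
        void connect() throws IOException;
    }

    /**
     * Returns true if the probe connected, false if the host has no route.
     * Any other IOException still propagates to the caller unchanged.
     */
    static boolean isServerReachable(ServerProbe probe) throws IOException {
        try {
            probe.connect();
            return true;
        } catch (NoRouteToHostException e) {
            // Treat the node as offline rather than aborting the caller.
            return false;
        }
    }
}
```

With something along these lines, the caller could go ahead and reassign ROOT when the check returns false. Whether other connection failures should be handled the same way is a separate question I haven't looked into.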

I applied the patch from HBASE-3617 and it didn't fix the problem I'm seeing,
which I expected given the stack trace below.  Assuming this reasoning is
correct, does this merit a separate JIRA?  It does seem critical in that the
failure of a single node is preventing us from bringing up our cluster.

2011-04-01 10:15:19,472 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=2, stopped=false, count of regions out on cluster=0
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop03.northamerica.cerner.net,60020,1301665635981 belongs to an existing region server
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop05.northamerica.cerner.net,60020,1301665659785 belongs to an existing region server
2011-04-01 10:15:22,508 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.NoRouteToHostException: No route to host
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
     at $Proxy6.getProtocolVersion(Unknown Source)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:385)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:211)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:458)
     at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:425)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:383)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
2011-04-01 10:15:22,510 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-04-01 10:15:22,510 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

