This happens under conditions similar to HBASE-3617 but is a distinct problem.
When the region server hosting ROOT isn't available during restart, the
NoRouteToHostException propagates all the way up the call stack and causes the
master to abort. It looks like this can be addressed by catching
NoRouteToHostException at some point along that path and treating the
node/region server as offline.
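To make the idea concrete, here is a rough sketch of the kind of handling I have in mind. It is not HBase code: RootLocationCheck, ServerProbe and isServerReachable are hypothetical names, with ServerProbe standing in for the RPC that contacts the server hosting ROOT. Where the catch actually belongs in the master is an open question.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class RootLocationCheck {

    /** Stand-in for the RPC that contacts the server hosting ROOT. */
    @FunctionalInterface
    interface ServerProbe {
        void connect() throws IOException;
    }

    /**
     * Returns true if the server answered and false if the host is
     * unreachable. Any other IOException still propagates to the caller.
     */
    static boolean isServerReachable(ServerProbe probe) throws IOException {
        try {
            probe.connect();
            return true;
        } catch (NoRouteToHostException e) {
            // Assumption: an unreachable host is treated as an offline
            // server rather than as a fatal error for the caller.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the failure from the stack trace.
        boolean up = isServerReachable(() -> {
            throw new NoRouteToHostException("No route to host");
        });
        System.out.println(up ? "ROOT location verified"
                              : "ROOT server offline; ROOT needs reassignment");
    }
}
```

With something along these lines, the caller could fall through to reassigning ROOT instead of aborting, while other I/O failures would still surface as they do today.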
I applied the patch from HBASE-3617 and it didn't fix the problem I'm seeing,
which I expected given the stack trace below. Assuming this reasoning is
correct, does this merit a separate JIRA? It does seem critical in that the
failure of a single node is preventing us from bringing up our cluster.
2011-04-01 10:15:19,472 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=2, stopped=false, count of regions out on cluster=0
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop03.northamerica.cerner.net,60020,1301665635981 belongs to an existing region server
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop05.northamerica.cerner.net,60020,1301665659785 belongs to an existing region server
2011-04-01 10:15:22,508 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy6.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:385)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:211)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:458)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:425)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:383)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
2011-04-01 10:15:22,510 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-04-01 10:15:22,510 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads