I think this is HBASE-4168. I just put a patched 0.90.4 onto our staging cluster.

On Tue, Aug 9, 2011 at 10:03 AM, Brent Miller <[email protected]> wrote:

> Thanks for the reply.
>
> I think the exception that you're asking about is this one:
>
> 2011-08-05 10:57:34,529 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of hadoop-3.ionamerica.priv,60020,1312306642172, retrying META read
> 2011-08-05 10:57:37,538 WARN org.apache.hadoop.hbase.zookeeper.MetaNodeTracker: Tried to reset META server location after seeing the completion of a new META assignment but got an IOE
> java.net.NoRouteToHostException: No route to host
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy6.getRegionInfo(Unknown Source)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:272)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
>     at org.apache.hadoop.hbase.zookeeper.MetaNodeTracker.nodeDeleted(MetaNodeTracker.java:64)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:276)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>
> If it helps at all, I put a copy of the master log up at
> https://s3-us-west-1.amazonaws.com/brent-be-public/hbase-hbase-master-hadoop-master.log.2011-08-05-partial
> which contains the time frame from when the master first noticed the region
> server was dead until it started spitting out "Received exception accessing
> META during server shutdown..." over and over again.
>
> Thanks,
> Brent
>
> On Mon, Aug 8, 2011 at 4:14 PM, Stack <[email protected]> wrote:
>
> > On Fri, Aug 5, 2011 at 2:13 PM, Brent Miller <[email protected]> wrote:
> > > I was under the assumption that if a regionserver failed, the clients
> > > would automatically switch over to a good regionserver. Also, if I pull
> > > up the master's web UI, it no longer shows the failed regionserver in
> > > the "Region Servers" section. Is this a bug or does the client have to
> > > somehow check if a regionserver is valid?
> > >
> > > We're using Cloudera's HBase 0.90.3-cdh3u1 on Ubuntu 10.04
> >
> > What usually happens is that when a regionserver dies, the master will
> > notice its absence and then it will deploy the regions the dead server
> > was carrying elsewhere. The process that does this is named
> > ServerShutdownHandler. In your case above, it seems that this handler
> > is having an issue processing the dead server -- so the regions did
> > not get reassigned. What is the exception that is being thrown when
> > we try to contact the .META. region?
> >
> > St.Ack
