Hi
I've updated our dev environment from HBase 0.90.0 (ASF+CDH3b3), which
was very stable, to HBase 0.90.1 (CDH3B4), and since then the HMaster
dies regularly. The issue seems to be related to the connection to
ZooKeeper. Even when I use a standby HMaster, it also dies from the same
cause:
2011-03-04 15:05:54,699 FATAL org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Unexpected exception handling nodeDeleted event
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:232)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:165)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-03-04 15:05:54,704 INFO org.apache.zookeeper.ZooKeeper: Session: 0x22e80dcc2350001 closed
2011-03-04 15:05:54,704 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-03-04 15:05:54,718 INFO org.apache.zookeeper.ZooKeeper: Session: 0x22e80dcc2350000 closed
2011-03-04 15:05:54,718 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-03-04 15:05:54,718 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
Just before this one there is another exception:
2011-03-04 15:07:00,611 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=desktop,60020,1299242075991, load=(requests=0, regions=0, usedHeap=34, maxHeap=996)
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /172.28.124.148:60020 after attempts=1
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
    at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:560)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
    at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy6.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
    ... 8 more
2011-03-04 15:07:00,615 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
Any hints as to what could be wrong here?
Thanks
Daniel