Thanks for Jean-Daniel Cryans's reply. I have looked at HBASE-3065, and it is indeed the same problem. Liyin Tang has proposed a fix for this issue: when a ConnectionLossException happens, retry a few times to reconnect to the ZK server. The reconnect will probably succeed in most cases, but not always. In my scenario:

1. The ConnectionLossException happened.
2. The HMaster process aborted because its session expired.
3. When I restarted the HMaster process, the ConnectionLossException happened again, so the initialization failed and the HMaster aborted again.
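
To make the retry idea concrete, here is a minimal sketch of a bounded retry around a ZooKeeper read. It is only an illustration under my own assumptions, not the patch attached to HBASE-3065: the class `RetrySketch`, the method `retryOnConnectionLoss`, and the nested `ConnectionLossException` (a stand-in for `org.apache.zookeeper.KeeperException.ConnectionLossException`, so that the example compiles without the ZooKeeper jar) are all hypothetical names.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical stand-in for KeeperException.ConnectionLossException. */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String msg) { super(msg); }
    }

    /**
     * Runs op and retries it when it throws ConnectionLossException.
     * After maxRetries failed retries the exception is rethrown, which
     * matches the "not always" case: retrying cannot help if the
     * connection never comes back.
     */
    static <T> T retryOnConnectionLoss(Callable<T> op, int maxRetries, long backoffMs)
            throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: let the caller decide (e.g. abort)
                }
                attempt++;
                Thread.sleep(backoffMs * attempt); // simple linear backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that loses the connection twice, then succeeds.
        String data = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("ConnectionLoss");
            }
            return "znode-data";
        }, 5, 10L);
        System.out.println(data + " after " + calls[0] + " attempts");
    }
}
```

In real code the lambda would wrap the actual read (for example the `ZooKeeper.getData` call seen in the stack trace below), and only the connection-loss case would be retried; an expired session cannot be recovered this way.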

My question is: under what conditions does the ConnectionLossException happen? I know that network problems can cause it. Are there any other possible causes?

Thanks!

Jieshan Bean

===================================================================================================================

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: April 15, 2011 2:27
To: [email protected]
Subject: Re: Does it necessarily to handle the "Zookeeper.ConnectionLossException" in ZKUtil.getDataAndWatch?

I guess we should, there's https://issues.apache.org/jira/browse/HBASE-3065 that's open, but in your case like I mentioned in your other email there seems to be something weird in your environment.

J-D

On Thu, Apr 14, 2011 at 12:51 AM, bijieshan <[email protected]> wrote:
> Hi,
> The "KeeperException$ConnectionLossException" exception occurred while the
> cluster is running, as we know, it's a Zookeeper "recoverable" exception(And
> this exception has been handled in the method of
> ZooKeeperWatcher.ZooKeeperWatcher),and the suggestion is that we should retry
> a while. Does it necessarily?
>
> Here is the exception logs:
>
> 2011-03-21 13:26:53,135 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e8e6ee15f0046 Unable to get data of znode /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>     at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> 2011-03-21 13:26:53,137 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e8e6ee15f0046 Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>     at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>
> Expecting for the reply!
> Thank you.
>
> Regards,
> Jeason Bean
