Take a look at the ZooKeeper server log; it should give you a clue. If it says there are too many connections, then you're hitting a well-known problem with HBase 0.90. Look for the other threads on this mailing list about that.

J-D

On Sat, Apr 16, 2011 at 3:01 AM, bijieshan <[email protected]> wrote:
> Thanks for Jean-Daniel Cryans's reply.
> I have looked at HBASE-3065, and it is indeed the same problem.
> Liyin Tang has proposed a fix for this issue: when the
> ConnectionLossException happens, retry a few times to reconnect to the ZK
> server.
> That will probably reconnect successfully most of the time, but not always.
> In my scenario:
> 1. The ConnectionLossException happened.
> 2. The HMaster process aborted because its session expired.
> 3. When I restarted the HMaster process, the ConnectionLossException
> happened again, so the initialization failed and the HMaster aborted again.
>
> My question is: under what conditions does the ConnectionLossException
> happen? I know network problems can cause it. Are there any other
> possible causes?
> Thanks!
>
> Jieshan Bean
>
> ===================================================================================================================
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: April 15, 2011 2:27
> To: [email protected]
> Subject: Re: Does it necessarily to handle the "Zookeeper.ConnectionLossException"
> in ZKUtil.getDataAndWatch?
>
> I guess we should, there's
> https://issues.apache.org/jira/browse/HBASE-3065 that's open, but in
> your case like I mentioned in your other email there seems to be
> something weird in your environment.
>
> J-D
>
> On Thu, Apr 14, 2011 at 12:51 AM, bijieshan <[email protected]> wrote:
>> Hi,
>> The "KeeperException$ConnectionLossException" exception occurred while the
>> cluster was running. As we know, it is a ZooKeeper "recoverable" exception (and
>> this exception is handled in the method
>> ZooKeeperWatcher.ZooKeeperWatcher), and the suggestion is that we should
>> retry for a while. Is that necessary?
>>
>> Here are the exception logs:
>>
>> 2011-03-21 13:26:53,135 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
>> master:60000-0x22e8e6ee15f0046 Unable to get data of znode
>> /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for
>> /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>>     at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>> 2011-03-21 13:26:53,137 ERROR
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>> master:60000-0x22e8e6ee15f0046 Received unexpected KeeperException,
>> re-throwing exception
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for
>> /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>>     at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>>
>> Looking forward to your reply!
>> Thank you.
>>
>> Regards,
>> Jeason Bean
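
For anyone reading this thread later: the "retry a few times on ConnectionLoss" idea discussed above might look roughly like the sketch below. This is only an illustration, not the HBASE-3065 patch and not HBase's actual code. The class name, the helper retryOnConnectionLoss, its parameters, and the ConnectionLossStandIn exception are all made up here so the example compiles without ZooKeeper on the classpath; in real code you would catch org.apache.zookeeper.KeeperException.ConnectionLossException around the ZooKeeper call instead.

```java
import java.util.concurrent.Callable;

public class ZkRetrySketch {

    // Hypothetical stand-in for
    // org.apache.zookeeper.KeeperException.ConnectionLossException,
    // used only so this sketch has no external dependencies.
    static class ConnectionLossStandIn extends Exception {
        ConnectionLossStandIn(String msg) { super(msg); }
    }

    // Runs op, retrying only when it fails with the connection-loss
    // exception. Gives up and rethrows after maxAttempts attempts.
    // The wait between attempts doubles each time (simple backoff).
    static <T> T retryOnConnectionLoss(Callable<T> op, int maxAttempts,
                                       long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossStandIn e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated read that loses the connection twice, then succeeds.
        String data = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossStandIn("ConnectionLoss");
            }
            return "znode-data";
        }, 5, 10L);
        System.out.println(data + " after " + calls[0] + " attempts");
    }
}
```

Note that a loop like this only helps while the session is still valid. Once the session has expired (step 2 in the scenario above), retrying on the same handle will not recover it; the client needs a new ZooKeeper session. And if the server is refusing connections because of a connection limit, retries will keep failing until that is addressed.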
