Take a look at the ZooKeeper server log; it should give you a clue. If
it says there are too many connections, then you're hitting a well-known
problem with HBase 0.90. Just look for the other threads on this
mailing list about that.

J-D

On Sat, Apr 16, 2011 at 3:01 AM, bijieshan <[email protected]> wrote:
> Thanks for the reply, Jean-Daniel.
> I have looked at HBASE-3065, and it is indeed the same problem.
> Liyin Tang has proposed a fix for this issue: when the
> ConnectionLossException happens, retry a few times to reconnect to the ZK
> server.
> The reconnect will probably succeed most of the time, but not always.
> In my scenario:
> 1. The ConnectionLossException happened.
> 2. The HMaster process aborted because its session expired.
> 3. When I restarted the HMaster process, the ConnectionLossException
> happened again, so the initialization failed and the HMaster aborted again.
>
> My question is: under what conditions does the ConnectionLossException
> happen? I know network problems can cause it. Are there any other
> possible causes?
> Thanks!
>
> Jieshan Bean
>
> ===================================================================================================================
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: April 15, 2011 2:27
> To: [email protected]
> Subject: Re: Does it necessarily to handle the "Zookeeper.ConnectionLossException" 
> in ZKUtil.getDataAndWatch?
>
> I guess we should, there's
> https://issues.apache.org/jira/browse/HBASE-3065 that's open, but in
> your case like I mentioned in your other email there seems to be
> something weird in your environment.
>
> J-D
>
> On Thu, Apr 14, 2011 at 12:51 AM, bijieshan <[email protected]> wrote:
>> Hi,
>> A "KeeperException$ConnectionLossException" occurred while the
>> cluster was running. As we know, it is a "recoverable" ZooKeeper exception
>> (this exception is already handled in
>> ZooKeeperWatcher.ZooKeeperWatcher), and the suggestion is that we should
>> retry for a while. Is that necessary?
>>
>> Here are the exception logs:
>>
>> 2011-03-21 13:26:53,135 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e8e6ee15f0046 Unable to get data of znode /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>>         at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>>         at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>> 2011-03-21 13:26:53,137 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e8e6ee15f0046 Received unexpected KeeperException, re-throwing exception
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
>>         at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
>>         at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
>>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
>>
>> Looking forward to your reply!
>> Thank you.
>>
>> Regards,
>> Jeason Bean
>>
>>
>
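
The retry idea discussed in this thread (retrying a ZooKeeper read a few times when a ConnectionLossException occurs, as proposed in HBASE-3065) could look roughly like the sketch below. This is only an illustration and not the HBase patch: the class name ZkRetrySketch, the helper retryOnConnectionLoss, and the nested ConnectionLossException are hypothetical stand-ins so the example runs without the ZooKeeper library. In real code the exception would be org.apache.zookeeper.KeeperException.ConnectionLossException and the operation would be something like a getData call.

```java
import java.util.concurrent.Callable;

public class ZkRetrySketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.ConnectionLossException. */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String path) {
            super("KeeperErrorCode = ConnectionLoss for " + path);
        }
    }

    /**
     * Runs op, retrying only when it fails with ConnectionLossException.
     * The wait between attempts doubles each time (simple exponential backoff).
     * Any other exception, or the last ConnectionLossException once
     * maxAttempts is reached, is propagated to the caller.
     */
    static <T> T retryOnConnectionLoss(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller decides whether to abort
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that loses the connection twice before succeeding.
        String data = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("/hbase/unassigned/example");
            }
            return "region-data";
        }, 5, 10);
        System.out.println(data + " after " + calls[0] + " attempts");
    }
}
```

As noted in the thread, retrying like this only helps with a transient disconnect. It does not help once the session has expired, and it will not succeed if the ZooKeeper server keeps refusing the client, for example because of the connection limit mentioned in the reply at the top.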
