I'm cleaning this up in this jira
https://issues.apache.org/jira/browse/HBASE-3755

But it's a failure case I haven't seen before, really interesting.
There's an HTable that's created in the guts of HCM that will throw a
ZooKeeperConnectionException, but it will bubble up as an IOE. I'll try
to address this too in 3755.
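
Until that's in, one possible client-side stopgap is to put a deadline
around the call, so the job fails instead of spinning. This is only a
rough sketch: DeadlineCall and the timeout value are made-up names for
illustration, not part of the HBase API, and it assumes a Java 8+
client. It has not been tried against 0.89.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking call and gives up after a deadline. */
public final class DeadlineCall {
    private DeadlineCall() {}

    public static <T> T call(Callable<T> op, long timeoutMillis) throws IOException {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-call");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(op);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: interrupt the worker, then report the failure to the caller.
            future.cancel(true);
            throw new IOException("call did not finish within " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            // Surface the original IOException if there was one, else wrap the cause.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("call failed", cause);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for call", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With your call site that would be something like
DeadlineCall.call(() -> timestampsTable.get(get), 30000L). Note that the
interrupt is best effort; the ZK client thread may well keep trying to
reconnect in the background, so the caller should treat the IOException
as fatal and exit.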

J-D

On Mon, Apr 11, 2011 at 11:03 AM, Sandy Pratt <[email protected]> wrote:
> Hi all,
>
> I had an issue recently where a scan job I frequently run caught 
> ConnectionLoss and subsequently failed to recover.
>
> The stack trace looks like this:
>
> 11/04/08 12:20:04 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d8 closed
> 11/04/08 12:20:04 WARN client.HConnectionManager$ClientZKWatcher: No longer 
> connected to ZooKeeper, current state: Disconnected
> 11/04/08 12:20:05 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server localhost/127.0.0.1:21811
> 11/04/08 12:20:05 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d9 closed
> 11/04/08 12:20:06 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
> 11/04/08 12:20:06 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=localhost:21811 sessionTimeout=60000 
> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@51127a
> 11/04/08 12:20:06 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server localhost/127.0.0.1:21811
> 11/04/08 12:20:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
> unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11/04/08 12:20:06 WARN zookeeper.ZooKeeperWrapper: Problem getting stats for 
> /hbase/rs
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /hbase/rs
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
>        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>        at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
>        at 
> org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
>        at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:102)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:732)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:677)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:650)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:470)
>        at 
> org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1145)
>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
>        at 
> com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.getHBaseTimestamp(EtsAfsBuilder.java:215)
>        at 
> com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.syncHour(EtsAfsBuilder.java:310)
>        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.go(EtsAfsBuilder.java:130)
>        at BuildAfs.main(BuildAfs.java:43)
> 11/04/08 12:20:07 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server localhost/127.0.0.1:21811
> 11/04/08 12:20:07 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
> unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11/04/08 12:20:09 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server localhost/127.0.0.1:21811
> 11/04/08 12:20:09 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
> unexpected error, closing socket connection and attempting reconnect
>
> It then goes on to retry endlessly.  Killing the spinning job and running it 
> again worked fine, so crashing would be preferable to me over retrying 
> endlessly.
>
> I'm not especially concerned about what went wrong to cause ConnectionLoss in 
> the first place, but I am interested in being able to set some behavior for 
> handling the ZK exceptions elegantly.  For example, the call site in my code 
> leading to the exception is this:
>
> Get get = new Get(Bytes.toBytes(level.rowKeyDateFormat.format(dts)));
> Result result = timestampsTable.get(get);
>
> I suppose this means that if I want to catch ConnectionLoss in my code, I 
> have to wrap all my gets and puts with that catch block.  Or maybe just the 
> first one?  It seems like HTable and friends might be able to catch this 
> exception in a more central location, maybe somewhere in here:
>
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
>
> I'm running HBase 0.89.20100924+28.  Will this issue go away if I upgrade to 
> a newer version?
>
> Thanks,
> Sandy
>
