Hi all,

I recently had an issue where a scan job I run frequently hit a ZooKeeper 
ConnectionLoss and never recovered.

The log output, including the stack traces, looks like this:

11/04/08 12:20:04 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d8 closed
11/04/08 12:20:04 WARN client.HConnectionManager$ClientZKWatcher: No longer 
connected to ZooKeeper, current state: Disconnected
11/04/08 12:20:05 INFO zookeeper.ClientCnxn: Opening socket connection to 
server localhost/127.0.0.1:21811
11/04/08 12:20:05 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d9 closed
11/04/08 12:20:06 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
11/04/08 12:20:06 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:21811 sessionTimeout=60000 
watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@51127a
11/04/08 12:20:06 INFO zookeeper.ClientCnxn: Opening socket connection to 
server localhost/127.0.0.1:21811
11/04/08 12:20:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/04/08 12:20:06 WARN zookeeper.ZooKeeperWrapper: Problem getting stats for 
/hbase/rs
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
        at 
org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:102)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:732)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:677)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:650)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:470)
        at 
org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
        at 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1145)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
        at 
com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.getHBaseTimestamp(EtsAfsBuilder.java:215)
        at 
com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.syncHour(EtsAfsBuilder.java:310)
        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.go(EtsAfsBuilder.java:130)
        at BuildAfs.main(BuildAfs.java:43)
11/04/08 12:20:07 INFO zookeeper.ClientCnxn: Opening socket connection to 
server localhost/127.0.0.1:21811
11/04/08 12:20:07 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/04/08 12:20:09 INFO zookeeper.ClientCnxn: Opening socket connection to 
server localhost/127.0.0.1:21811
11/04/08 12:20:09 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect

The client then keeps retrying indefinitely.  Killing the spinning job and 
running it again worked fine, so I would rather have the job crash than retry 
forever.

I'm not especially concerned about what went wrong to cause ConnectionLoss in 
the first place, but I am interested in being able to set some behavior for 
handling the ZK exceptions elegantly.  For example, the call site in my code 
leading to the exception is this:

Get get = new Get(Bytes.toBytes(level.rowKeyDateFormat.format(dts)));
Result result = timestampsTable.get(get);

I suppose this means that if I want to catch ConnectionLoss in my code, I have 
to wrap all my gets and puts in a catch block.  Or maybe just the first one?  
It seems like HTable and friends might be able to handle this exception in a 
more central location, maybe somewhere in here:

at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
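
For what it's worth, the WARN line above ("Problem getting stats for 
/hbase/rs") suggests that ZooKeeperWrapper catches and logs the 
ConnectionLossException itself, so a catch block at my call site might never 
see it.  A workaround I could apply on my side is to put a time limit on the 
call and fail the job when the limit is exceeded.  Below is a minimal sketch 
of that idea using only java.util.concurrent.  FailFast and callWithTimeout 
are hypothetical names of my own, not part of HBase, and the lambda syntax 
assumes Java 8 or later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an HBase class): bounds how long a client call may block. */
final class FailFast {
    private FailFast() {}

    /**
     * Runs the call on a worker thread and waits at most the given timeout.
     * Throws TimeoutException if the call has not finished by then.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws Exception {
        // Daemon thread, so a call stuck in a retry loop cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fail-fast-call");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Ask the worker to stop; it may ignore the interrupt, hence the daemon thread.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the original failure from the call where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With this, the call site above would become something like 
FailFast.callWithTimeout(() -> timestampsTable.get(get), 60, TimeUnit.SECONDS), 
and the job could exit with a non-zero status when it catches 
TimeoutException.  The 60-second limit is an arbitrary value for illustration.  
This only bounds the wait on my side; it does not change how the HBase client 
retries internally.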

I'm running HBase 0.89.20100924+28.  Will this issue go away if I upgrade to a 
newer version?

Thanks,
Sandy
