I'm cleaning this up in this jira https://issues.apache.org/jira/browse/HBASE-3755
But it's a failure case I haven't seen before, really interesting.
There's an HTable that's created in the guts of HCM that will throw a
ZooKeeperConnectionException but it will bubble up as an IOE. I'll try
to address this too in 3755.

J-D

On Mon, Apr 11, 2011 at 11:03 AM, Sandy Pratt <[email protected]> wrote:
> Hi all,
>
> I had an issue recently where a scan job I frequently run caught
> ConnectionLoss and subsequently failed to recover.
>
> The stack trace looks like this:
>
> 11/04/08 12:20:04 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d8 closed
> 11/04/08 12:20:04 WARN client.HConnectionManager$ClientZKWatcher: No longer connected to ZooKeeper, current state: Disconnected
> 11/04/08 12:20:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> 11/04/08 12:20:05 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d9 closed
> 11/04/08 12:20:06 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
> 11/04/08 12:20:06 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:21811 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@51127a
> 11/04/08 12:20:06 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> 11/04/08 12:20:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11/04/08 12:20:06 WARN zookeeper.ZooKeeperWrapper: Problem getting stats for /hbase/rs
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
>     at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:102)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:732)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:677)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:650)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:470)
>     at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1145)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
>     at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.getHBaseTimestamp(EtsAfsBuilder.java:215)
>     at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.syncHour(EtsAfsBuilder.java:310)
>     at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.go(EtsAfsBuilder.java:130)
>     at BuildAfs.main(BuildAfs.java:43)
> 11/04/08 12:20:07 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> 11/04/08 12:20:07 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 11/04/08 12:20:09 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> 11/04/08 12:20:09 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>
> It then goes on to retry endlessly. Killing the spinning job and running it
> again worked fine, so crashing would be preferable to me over retrying
> endlessly.
>
> I'm not especially concerned about what went wrong to cause ConnectionLoss in
> the first place, but I am interested in being able to set some behavior for
> handling the ZK exceptions elegantly. For example, the call site in my code
> leading to the exception is this:
>
> Get get = new Get(Bytes.toBytes(level.rowKeyDateFormat.format(dts)));
> Result result = timestampsTable.get(get);
>
> I suppose this means that if I want to catch ConnectionLoss in my code, I
> have to wrap all my gets and puts with that catch block. Or maybe just the
> first one? It seems like HTable and friends might be able to catch this
> exception in a more central location, maybe somewhere in here:
>
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
>
> I'm running HBase 0.89.20100924+28. Will this issue go away if I upgrade to
> a newer version?
>
> Thanks,
> Sandy
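
On the question of wrapping every get and put: until this is handled centrally, one option is a single helper that bounds each client call with a deadline, so the job fails instead of spinning. The following is only a minimal sketch. FailFast and callWithDeadline are hypothetical names, not part of HBase or ZooKeeper, and the sketch assumes that giving up after a fixed wait and surfacing an IOException is acceptable for the job. It does not stop the ZooKeeper client's own reconnect loop; it only lets the caller stop waiting on it.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not an HBase API): run one blocking call with a deadline.
final class FailFast {
    private FailFast() {}

    static <T> T callWithDeadline(Callable<T> op, long timeoutMs) throws IOException {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fail-fast-call");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<T> pending = pool.submit(op);
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The call is still running; give up and let the caller crash.
            throw new IOException("operation did not finish within " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            // The call itself failed; pass IOExceptions through unchanged.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for operation", e);
        } finally {
            // Best effort: interrupts the worker if it is still blocked.
            pool.shutdownNow();
        }
    }
}
```

With this, the call site above could become something like Result result = FailFast.callWithDeadline(() -> timestampsTable.get(get), 30000); where the 30 second limit is an arbitrary value chosen for illustration. Creating an executor per call keeps the sketch short; a shared executor would be cheaper for a job issuing many gets.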
