I implemented a retry around every ZK method call I make, and the exceptions stopped. I recognize that we are currently using an older ZK version (3.3.4) -- but is the current version better in this respect?

I got the email today about the new release of Curator. I wonder why this is not part of the ZK offering if it makes ZK that much easier to use. (I don't know that this is a true statement, but the Curator doc claims so.)
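For anyone following the thread, here is a minimal sketch of the kind of retry wrapper described above. It is not the code from my application. The names ZkRetry, withRetry and TransientConnectionException are made up for illustration, and TransientConnectionException is only a stand-in for KeeperException.ConnectionLossException so that the sketch compiles without the ZooKeeper jar.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for KeeperException.ConnectionLossException. */
class TransientConnectionException extends Exception {
    TransientConnectionException(String message) {
        super(message);
    }
}

/** Hypothetical helper: retries an operation that may fail with a transient connection error. */
final class ZkRetry {
    private ZkRetry() {
    }

    /**
     * Runs op and returns its result. If op throws TransientConnectionException,
     * waits (linear backoff) and tries again, up to maxAttempts attempts in total.
     * Any other exception is propagated immediately.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientConnectionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientConnectionException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Give the client a moment to reconnect before the next attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // Every attempt failed with the transient error; rethrow the last one.
        throw last;
    }
}
```

With the real client, the lambda would wrap a call such as zk.exists(path, false) and the catch clause would name ConnectionLossException instead. One caveat: a blind retry is only safe for operations that can be repeated harmlessly, because a request that fails with ConnectionLoss may still have been applied on the server.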
Chris

IBM Tivoli Systems
Research Triangle Park, NC
(919) 224-2240
Internet: [email protected]

From: Chris Barlock/Raleigh/IBM@IBMUS
To: [email protected]
Date: 01/15/2015 09:37 PM
Subject: Re: ConnectionLossException

My ensemble is a single ZK node running on the same computer as the rest of my application. I think it is in a "good state" because my configuration data does get loaded into ZK. Does ZK throw the ConnectionLossException if the session timed out and I tried to use it after this happened? If so, it would be related to the timeout. If not, what causes ConnectionLossException?

Chris

From: "[email protected]" <[email protected]>
To: "[email protected]" <[email protected]>
Date: 01/15/2015 09:18 PM
Subject: Re: ConnectionLossException

Connection loss is not related to the session timeout. If it occurs frequently, it indicates that the ZooKeeper ensemble is not in a good state.

[email protected]

From: Chris Barlock
Date: 2015-01-16 10:04
To: user
Subject: ConnectionLossException

We are currently using ZK 3.3.4, which is included in the version of Kafka we are using.
I'm seeing a number of exceptions like:

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /com
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
    at com.ibm.tivoli.ccm.config.rest.ConfigClient.setValueAtNode(ConfigClient.java:630)

My method setValueAtNode includes a call to this method before I make any zk (ZooKeeper) calls:

    private void connectZooKeeper() {
        final String methodName = "connectZooKeeper";
        trace.entry(CLASS_NAME, methodName);

        if (zk == null || zk.getState() != States.CONNECTED) {
            if (zk != null) {
                close();
            }
            try {
                zk = new ZooKeeper(connectString, sessionTimeout, this);
                int connectAttempts = 0;
                while (zk.getState() != States.CONNECTED
                        && connectAttempts < MAX_ZK_CONNECT_ATTEMPTS) {
                    try {
                        Thread.sleep(ZK_CONNECT_WAIT);
                    } catch (InterruptedException e) {
                        // Ignore
                    }
                    connectAttempts++;
                }
            } catch (IOException e) {
                trace.exception(CLASS_NAME, methodName, e);
            }
        }

        trace.exit(CLASS_NAME, methodName);
    }

I'm totally guessing that the connection is timing out between the time this method is called and when I make the following zk method calls. Is there a best practise for ensuring one is connected to ZooKeeper? My session timeout is 3000 ms.

Chris
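One common alternative to polling zk.getState() in a sleep loop is to let the watcher callback signal when the session is connected, and have callers block on that signal with a timeout. The sketch below shows only the signalling part, under stated assumptions: ConnectionGate and its method names are hypothetical, and it uses no ZooKeeper classes so that it compiles on its own. In a real client the watcher's process(WatchedEvent) method would call markConnected() when event.getState() is KeeperState.SyncConnected, and markDisconnected() on KeeperState.Disconnected or KeeperState.Expired.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: lets callers wait until a watcher reports that the session is connected. */
final class ConnectionGate {
    private boolean connected; // guarded by the monitor on "this"

    /** To be called by the watcher when the client reports a connected state. */
    synchronized void markConnected() {
        connected = true;
        notifyAll(); // wake every thread blocked in awaitConnected
    }

    /** To be called by the watcher when the client reports a disconnect or an expired session. */
    synchronized void markDisconnected() {
        connected = false;
    }

    /**
     * Blocks until the gate is marked connected or the timeout elapses.
     * Returns true if connected, false if the timeout ran out first.
     */
    synchronized boolean awaitConnected(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!connected) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Releases the monitor while waiting; the loop re-checks after each wake-up.
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }
}
```

Unlike the loop in connectZooKeeper(), the caller learns whether the wait actually succeeded and can decide what to do when it did not. A true result still only describes the moment of the check: the connection can drop again before the next zk call, so ConnectionLossException still has to be handled at the call site.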
