Hi, We are seeing ZooKeeper disconnects on the controller, after which the controller gets into a state from which it cannot reconnect. We see messages like the ones below over and over again. It keeps trying to re-establish the connection with the same session ID and keeps failing. The participants, on the other hand, see a single hiccup in their ZooKeeper connection but reconnect gracefully. What would cause the controller to keep retrying and failing to connect even after ZooKeeper is back in a healthy state?
2015-05-01 20:47:02,865 [main-SendThread(terrapinzk001a:2181)] (ClientCnxn.java:1061) INFO Opening socket connection to server terrapinzk001a/10.115.59.31:2181
2015-05-01 20:47:02,866 [main-SendThread(terrapinzk001a:2181)] (ClientCnxn.java:950) INFO Socket connection established to terrapinzk001a/10.115.59.31:2181, initiating session
2015-05-01 20:47:02,880 [main-SendThread(terrapinzk001a:2181)] (ClientCnxn.java:739) INFO Session establishment complete on server terrapinzk001a/10.115.59.31:2181, sessionid = 0x14d111892390023, negotiated timeout = 30000
2015-05-01 20:47:02,884 [main-EventThread] (ZkClient.java:449) INFO zookeeper state changed (SyncConnected)
2015-05-01 20:47:02,884 [main-SendThread(terrapinzk001a:2181)] (ClientCnxn.java:1186) INFO Unable to read additional data from server sessionid 0x14d111892390023, likely server has closed socket, closing socket connection and attempting reconnect
2015-05-01 20:47:02,988 [main-EventThread] (ZkClient.java:449) INFO zookeeper state changed (Disconnected)
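
To check whether every cycle in the controller log really reuses the same session ID, something like the following may help. It is only a minimal sketch: it assumes the log messages have exactly the wording quoted above, and the function name count_session_flaps is hypothetical, not part of ZooKeeper or Helix.

```python
import re
from collections import Counter

# Patterns assumed from the ZooKeeper client log messages quoted above.
ESTABLISHED = re.compile(
    r"Session establishment complete on server [^,]+, sessionid = (0x[0-9a-fA-F]+)"
)
DROPPED = re.compile(
    r"Unable to read additional data from server sessionid (0x[0-9a-fA-F]+)"
)


def count_session_flaps(log_text):
    """Map each session ID to (times established, times dropped by the server).

    Hypothetical helper: it scans the whole text, so it works whether the
    log entries are on separate lines or run together.
    """
    established = Counter(ESTABLISHED.findall(log_text))
    dropped = Counter(DROPPED.findall(log_text))
    session_ids = set(established) | set(dropped)
    return {sid: (established[sid], dropped[sid]) for sid in session_ids}


if __name__ == "__main__":
    import sys

    # Usage: python count_flaps.py controller.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for sid, (up, down) in sorted(count_session_flaps(handle.read()).items()):
            print(f"{sid}: established {up} time(s), dropped {down} time(s)")
```

A single session ID with many established/dropped pairs would match the loop described above, while several different session IDs would suggest the client is actually creating new sessions.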
