You may also check the ZooKeeper server logs to see whether there are any error or exception messages.
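In case it helps, the per-server `stat` check suggested in the quoted message below can be scripted roughly as follows. This is only a sketch: the function names (`parse_connect_string`, `four_letter_word`, `server_mode`, `check_quorum`) are made up for illustration, it assumes the connect string has no chroot suffix, and it assumes the servers answer the `stat` four-letter command on their client port. Stripping whitespace around each entry is a defensive choice in this script, not a statement about how the ZooKeeper client parses the string.

```python
import socket


def parse_connect_string(connect_string, default_port=2181):
    """Split a comma-separated connect string into (host, port) pairs.

    Assumes plain host[:port] entries with no chroot suffix.
    """
    servers = []
    for entry in connect_string.split(","):
        entry = entry.strip()  # tolerate stray spaces after commas
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            servers.append((host, int(port)))
        else:
            servers.append((entry, default_port))
    return servers


def four_letter_word(host, port, command="stat", timeout=5.0):
    """Send a four-letter command (like `echo stat | nc host port`) and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(stat_output):
    """Return the value of the 'Mode:' line in a stat reply, or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def check_quorum(connect_string):
    """Query every server in the connect string and return {(host, port): mode or error text}."""
    results = {}
    for host, port in parse_connect_string(connect_string):
        try:
            results[(host, port)] = server_mode(four_letter_word(host, port))
        except OSError as exc:
            results[(host, port)] = f"unreachable: {exc}"
    return results
```

Calling `check_quorum("zk001:2181, zk002:2181,zk003:2181")` from a machine that can reach the ensemble should return one entry per server. A healthy ensemble would normally show exactly one leader, with the rest as followers or observers.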

On Sat, May 2, 2015 at 1:08 PM, kishore g <[email protected]> wrote:

> Is the zookeeper quorum working fine? Can you run echo stat | nc zkhost zkPort
> for each zk server and paste the output?
>
> On May 2, 2015 1:02 PM, "Varun Sharma" <[email protected]> wrote:
>
>> We are also seeing that all our machines (participants and controller)
>> are connecting to the same zookeeper machine which is rather weird - it
>> also makes it hard to scale up traffic via observers. Is the following the
>> right way to pass the zookeeper string (with comma separation):
>>
>> zk001:2181, zk002:2181,zk003:2181
>>
>> Thanks
>> Varun
>>
>> On Fri, May 1, 2015 at 3:32 PM, Varun Sharma <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are seeing zookeeper disconnects on the controller and the controller
>>> gets into a state from which it cannot reconnect back. We see messages like
>>> the ones below over and over again. It keeps trying to re-establish
>>> connections against the same session ID and keeps failing. On the other
>>> hand, the participants see one hiccup in their zookeeper connection
>>> but gracefully reconnect back. What would cause the controller to keep
>>> retrying but failing to connect even after the zookeeper comes back to a
>>> healthy state?
>>>
>>> 2015-05-01 20:47:02,865 [main-SendThread(terrapinzk001a:2181)]
>>> (ClientCnxn.java:1061) INFO Opening socket connection to server
>>> terrapinzk001a/10.115.59.31:2181
>>>
>>> 2015-05-01 20:47:02,866 [main-SendThread(terrapinzk001a:2181)]
>>> (ClientCnxn.java:950) INFO Socket connection established to terrapinzk001a/
>>> 10.115.59.31:2181, initiating session
>>>
>>> 2015-05-01 20:47:02,880 [main-SendThread(terrapinzk001a:2181)]
>>> (ClientCnxn.java:739) INFO Session establishment complete on server
>>> terrapinzk001a/10.115.59.31:2181, sessionid = 0x14d111892390023,
>>> negotiated timeout = 30000
>>>
>>> 2015-05-01 20:47:02,884 [main-EventThread] (ZkClient.java:449) INFO
>>> zookeeper state changed (SyncConnected)
>>>
>>> 2015-05-01 20:47:02,884 [main-SendThread(terrapinzk001a:2181)]
>>> (ClientCnxn.java:1186) INFO Unable to read additional data from server
>>> sessionid 0x14d111892390023, likely server has closed socket, closing
>>> socket connection and attempting reconnect
>>>
>>> 2015-05-01 20:47:02,988 [main-EventThread] (ZkClient.java:449) INFO
>>> zookeeper state changed (Disconnected)
>>>
>>
>>
