We are also seeing that all our machines (participants and controller) are
connecting to the same zookeeper machine, which is rather weird. It also
makes it hard to scale up traffic via observers. Is the following the right
way to pass the zookeeper connect string (comma separated)?

zk001:2181, zk002:2181,zk003:2181
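
Note that the string above has a space after the first comma. Whether the
ZooKeeper client in use trims that space is not confirmed here, so the
following is only a precaution, not a diagnosis: a minimal sketch that strips
whitespace around each host:port entry before the string is handed to the
client. The class and method names (ConnectString, normalize) are hypothetical
and are not part of ZooKeeper or Helix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper or Helix.
final class ConnectString {

    // Trims whitespace around each comma-separated host:port entry
    // and drops empty entries, then rejoins with plain commas.
    static String normalize(String raw) {
        List<String> servers = new ArrayList<>();
        for (String entry : raw.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        // prints "zk001:2181,zk002:2181,zk003:2181"
        System.out.println(normalize("zk001:2181, zk002:2181,zk003:2181"));
    }
}
```

The normalized result can then be passed wherever the raw string was passed
before.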

Thanks
Varun

On Fri, May 1, 2015 at 3:32 PM, Varun Sharma <[email protected]> wrote:

> Hi,
>
> We are seeing zookeeper disconnects on the controller and the controller
> gets into a state from which it cannot reconnect back. We see messages like
> the ones below over and over again. It keeps trying to re-establish
> connections against the same session ID and keeps failing. On the other
> hand, the participants see one hiccup in their zookeeper connection
> but gracefully reconnect. What would cause the controller to keep
> retrying but failing to connect even after zookeeper comes back to a
> healthy state?
>
> 2015-05-01 20:47:02,865 [main-SendThread(terrapinzk001a:2181)]
> (ClientCnxn.java:1061) INFO  Opening socket connection to server
> terrapinzk001a/10.115.59.31:2181
>
> 2015-05-01 20:47:02,866 [main-SendThread(terrapinzk001a:2181)]
> (ClientCnxn.java:950) INFO  Socket connection established to terrapinzk001a/
> 10.115.59.31:2181, initiating session
>
> 2015-05-01 20:47:02,880 [main-SendThread(terrapinzk001a:2181)]
> (ClientCnxn.java:739) INFO  Session establishment complete on server
> terrapinzk001a/10.115.59.31:2181, sessionid = 0x14d111892390023,
> negotiated timeout = 30000
>
> 2015-05-01 20:47:02,884 [main-EventThread] (ZkClient.java:449) INFO
> zookeeper state changed (SyncConnected)
>
> 2015-05-01 20:47:02,884 [main-SendThread(terrapinzk001a:2181)]
> (ClientCnxn.java:1186) INFO  Unable to read additional data from server
> sessionid 0x14d111892390023, likely server has closed socket, closing
> socket connection and attempting reconnect
>
> 2015-05-01 20:47:02,988 [main-EventThread] (ZkClient.java:449) INFO
> zookeeper state changed (Disconnected)
>
