> It seems like it should be more than that since connectionTimeoutMs will be 
> the time it takes to check only 1 node.

Why do you think it will only check 1 node? I haven’t looked at that code in a 
long time but a quick check suggests to me that 
internalBlockUntilConnectedOrTimedOut() waits for a successful connection 
irrespective of which node it is.

> On Jul 30, 2016, at 12:06 AM, Benjamin Jaton <[email protected]> wrote:
> 
> The internalBlockUntilConnectedOrTimedOut() method will timeout after 
> connectionTimeoutMs.
> 
> It seems like it should be more than that since connectionTimeoutMs will be 
> the time it takes to check only 1 node.
> 
> Thoughts?
> 
> On Fri, Jul 29, 2016 at 2:55 PM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> Adding the end of the log:
> 
> 570438 [main] DEBUG org.apache.curator.RetryLoop  - Retry policy not allowing 
> retry
> 570438 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> SUSPENDED
> Exception in thread "main" 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at 
> testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
> 
> On Fri, Jul 29, 2016 at 2:49 PM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> I was making a mistake in the ZK test, the ZK client works properly ( I was 
> not waiting for the watcher to give me a ConnectedReadOnly event ).
> 
> So I retried with zk 3.5.2 and curator 2.11.0, I have more logs (see below)
> 
> 10.11.9.70:2181 <http://10.11.9.70:2181/> is the functioning node,
> 10.11.157.4:2181 <http://10.11.157.4:2181/> and 10.11.157.210:2181 
> <http://10.11.157.210:2181/> don't exist.
> 
> I stopped capturing at 300sec.
> 
> This line:
> 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - 
> blockUntilConnectedOrTimedOut() end. isConnected: false
> is not right. Why did it stop blocking?
> Is there a way to get more info?
> 
> 0 [main] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl  - 
> Starting
> 4 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - Starting
> 4 [main] DEBUG org.apache.curator.ConnectionState  - Starting
> 4 [main] DEBUG org.apache.curator.ConnectionState  - reset
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:zookeeper.version=3.5.2-alpha-1753710, built on 07/21/2016 16:24 
> GMT
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:host.name 
> <http://host.name/>=w-rli09-ben
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.version=1.8.0_11
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.vendor=Oracle Corporation
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.class.path=/home/benji/workspace-trunk/testzkclient/bin:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/commons-cli-1.2.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-core-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-mapper-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/javacc.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-util-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jline-2.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/log4j-1.2.17.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/netty-3.10.5.Final.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/servlet-api-2.5-20081211.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-api-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-log4j12-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/zookeeper-3.5.2-alpha.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/guava-16.0.1.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-client-2.11.0.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-framework-2.11.0.jar
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.io.tmpdir=/tmp
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:java.compiler=<NA>
> 12 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:os.name 
> <http://os.name/>=Linux
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:os.arch=amd64
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:os.version=3.8.0-44-generic
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client environment:user.name 
> <http://user.name/>=benji
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:user.home=/home/benji
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:user.dir=/home/benji/workspace-trunk/testzkclient
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:os.memory.free=378MB
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:os.memory.max=5358MB
> 13 [main] INFO org.apache.zookeeper.ZooKeeper  - Client 
> environment:os.memory.total=381MB
> 16 [main] INFO org.apache.zookeeper.ZooKeeper  - Initiating client 
> connection, connectString=10.11.9.70:2181 
> <http://10.11.9.70:2181/>,10.11.157.4:2181 
> <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/> 
> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@1ff8b8f
> 27 [main] DEBUG org.apache.zookeeper.ClientCnxnSocket  - jute.maxbuffer is 
> 4194304
> 39 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - 
> blockUntilConnectedOrTimedOut() start
> 39 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. 
> Will not attempt to authenticate using SASL (unknown error)
> 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient  - 
> blockUntilConnectedOrTimedOut() end. isConnected: false
> Successfully established the connection with ZooKeeper
> 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] WARN 
> org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard 
> from server in 15023ms for sessionid 0x0
> 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Client session timed out, have not heard 
> from server in 15023ms for sessionid 0x0, closing socket connection and 
> attempting reconnect
> 15059 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown input
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>     ... 3 more
> 15062 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown output
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>     ... 3 more
> 15169 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not 
> attempt to authenticate using SASL (unknown error)
> 15170 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating 
> session, client: /10.11.12.4:44742 <http://10.11.12.4:44742/>, server: 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 15173 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG 
> org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 15180 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 
> 0x10003fa439a0060, negotiated timeout = 40000 (READ-ONLY mode)
> 15182 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Checking server 
> 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being 
> r/w. Timeout 200
> 15187 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> READ_ONLY
> 142434 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server 
> sessionid 0x10003fa439a0060, likely server has closed socket, closing socket 
> connection and attempting reconnect
> 142538 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> SUSPENDED
> 142545 [main] DEBUG org.apache.curator.RetryLoop  - Retry-able exception 
> received
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at 
> testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
> 143211 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. 
> Will not attempt to authenticate using SASL (unknown error)
> 147546 [main] DEBUG org.apache.curator.RetryLoop  - Retrying operation
> 156556 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> WARN org.apache.zookeeper.ClientCnxn  - Client session timed out, have not 
> heard from server in 14019ms for sessionid 0x10003fa439a0060
> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not 
> heard from server in 14019ms for sessionid 0x10003fa439a0060, closing socket 
> connection and attempting reconnect
> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown input
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>     ... 3 more
> 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown output
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>     ... 3 more
> 157544 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState  - 
> Connection timed out for connection string (10.11.9.70:2181 
> <http://10.11.9.70:2181/>,10.11.157.4:2181 
> <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) 
> and timeout (15000) / elapsed (15006)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss
>     at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
>     at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>     at 
> org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 158163 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not 
> attempt to authenticate using SASL (unknown error)
> 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating 
> session, client: /10.11.12.4:45070 <http://10.11.12.4:45070/>, server: 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG 
> org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 
> 0x10003fa439a0061, negotiated timeout = 40000 (READ-ONLY mode)
> 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Checking server 
> 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being 
> r/w. Timeout 400
> 158167 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> READ_ONLY
> 285409 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server 
> sessionid 0x10003fa439a0061, likely server has closed socket, closing socket 
> connection and attempting reconnect
> 285510 [main] DEBUG org.apache.curator.RetryLoop  - Retry-able exception 
> received
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at 
> testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25)
> 285516 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> SUSPENDED
> 286312 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> INFO org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. 
> Will not attempt to authenticate using SASL (unknown error)
> 290511 [main] DEBUG org.apache.curator.RetryLoop  - Retrying operation
> 295516 [Curator-Framework-0] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> LOST
> 295517 [Curator-Framework-0] ERROR 
> org.apache.curator.framework.imps.CuratorFrameworkImpl  - Background 
> operation retry gave up
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 295518 [Curator-Framework-0] ERROR 
> org.apache.curator.framework.imps.CuratorFrameworkImpl  - Background retry 
> gave up
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> WARN org.apache.zookeeper.ClientCnxn  - Client session timed out, have not 
> heard from server in 14149ms for sessionid 0x10003fa439a0061
> 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> INFO org.apache.zookeeper.ClientCnxn  - Client session timed out, have not 
> heard from server in 14149ms for sessionid 0x10003fa439a0061, closing socket 
> connection and attempting reconnect
> 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown input
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777)
>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402)
>     ... 3 more
> 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] 
> DEBUG org.apache.zookeeper.ClientCnxnSocketNIO  - Ignoring exception during 
> shutdown output
> java.net.SocketException: Socket is not connected
>     at sun.nio.ch.Net.translateToSocketException(Net.java:143)
>     at sun.nio.ch.Net.translateException(Net.java:177)
>     at sun.nio.ch.Net.translateException(Net.java:183)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245)
> Caused by: java.nio.channels.NotYetConnectedException
>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794)
>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410)
>     ... 3 more
> 300520 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState  - 
> Connection timed out for connection string (10.11.9.70:2181 
> <http://10.11.9.70:2181/>,10.11.157.4:2181 
> <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) 
> and timeout (15000) / elapsed (15004)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss
>     at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
>     at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>     at 
> org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>     at 
> org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 301077 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not 
> attempt to authenticate using SASL (unknown error)
> 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Socket connection established, initiating 
> session, client: /10.11.12.4:45385 <http://10.11.12.4:45385/>, server: 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG 
> org.apache.zookeeper.ClientCnxn  - Session establishment request sent on 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>
> 301081 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 
> 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = 
> 0x10003fa439a0062, negotiated timeout = 40000 (READ-ONLY mode)
> 301082 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO 
> org.apache.zookeeper.ClientCnxn  - Checking server 
> 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being 
> r/w. Timeout 800
> 301082 [main-EventThread] INFO 
> org.apache.curator.framework.state.ConnectionStateManager  - State change: 
> READ_ONLY
> 
> 
> On Tue, Jul 12, 2016 at 4:11 PM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> Cameron, you seem to be right, the simple zookeeper client also fails 
> sometimes.
> I will bring it up to the ZK team, thank you!
> 
> On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie <[email protected] 
> <mailto:[email protected]>> wrote:
> I'm at a bit of a loss then. Looking at the stack trace, it's in the ZK code 
> (not the Curator code) that it appears to be blocking. Maybe try and create a 
> test that removes Curator and just uses the ZK client directly and see if you 
> still have the same problem?
> 
> On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> Yes, I mean the ping goes through properly
> 
> On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie <[email protected] 
> <mailto:[email protected]>> wrote:
> Are the 'dead' hosts reachable? I seem to recall having problems with the ZK 
> client blocking for a long time if you had a host configured that was 
> resolvable but not reachable.
> 
> On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> More to the story: 
> 
> While it consistently works on the same machine than the sole running ZK node 
> when I specify only 2 of the 3 servers in the connection string,
> it doesn't work when I specify 3 hosts:
> 
> 
> 2016-07-12 11:23:46 INFO  o.a.z.ZooKeeper:716 - Initiating client connection, 
> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 
> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3
> 2016-07-12 11:23:46 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, 
> have not heard from server in 15018ms for sessionid 0x0, closing socket 
> connection and attempting reconnect
> Successfully established the connection with ZooKeeper
> 2016-07-12 11:24:01 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, 
> have not heard from server in 15000ms for sessionid 0x0, closing socket 
> connection and attempting reconnect
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:980 - Socket connection 
> established, initiating session, client: /10.11.8.236:50423 
> <http://10.11.8.236:50423/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1400 - Session establishment 
> complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, 
> sessionid = 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:24:16 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 
> 200
> 2016-07-12 11:24:16 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: READ_ONLY
> 2016-07-12 11:24:37 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 
> 400
> 2016-07-12 11:24:58 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, 
> have not heard from server in 42023ms for sessionid 0x10003c8af64001b, 
> closing socket connection and attempting reconnect
> 2016-07-12 11:24:58 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: SUSPENDED
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:980 - Socket connection 
> established, initiating session, client: /10.11.8.236:50428 
> <http://10.11.8.236:50428/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1400 - Session establishment 
> complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, 
> sessionid = 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:25:00 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 
> 800
> 2016-07-12 11:25:00 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: READ_ONLY
> 2016-07-12 11:25:21 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 
> 1600
> 2016-07-12 11:25:42 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, 
> have not heard from server in 42011ms for sessionid 0x10003c8af64001c, 
> closing socket connection and attempting reconnect
> 2016-07-12 11:25:42 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: SUSPENDED
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:980 - Socket connection 
> established, initiating session, client: /10.11.8.236:50435 
> <http://10.11.8.236:50435/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1400 - Session establishment 
> complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, 
> sessionid = 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 
> 3200
> 2016-07-12 11:25:44 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: READ_ONLY
> 2016-07-12 11:26:05 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 
> 6400
> 2016-07-12 11:26:26 INFO  o.a.z.ClientCnxn:1251 - Client session timed out, 
> have not heard from server in 42003ms for sessionid 0x10003c8af64001d, 
> closing socket connection and attempting reconnect
> 2016-07-12 11:26:26 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: SUSPENDED
> Exception in thread "main" 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at test.Test.main(Test.java:25)
> 
> 
> That seems to suggest that adding 1 host to check somehow messes up something 
> in the timeouts.
> I am suspicious of those:
> 2016-07-12 11:25:44 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout 
> 3200
> They seem to be "blocking" the existing connection we have to the healthy ZK 
> server, which goes stale. Is that possible?
> 
> 
> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> I did not use the zkCli tool but the ZK java API directly. And other tools 
> like ZooInspector (contrib). They were run from a remote machine.
> I ran it on the same machine and the one read only server is running 
> properly, I can browse it using those methods.
> 
> Trying to make sense of those client-server interactions is difficult but 
> here is what I got:
> 
> Client:
> 2016-07-12 10:55:01,508 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-12 10:55:01,512 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:60586 
> <http://10.11.12.4:60586/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-12 10:55:01,521 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x10003c8af640007, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-12 10:55:01,523 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 200
> 2016-07-12 10:55:01,528 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-12 10:57:08,800 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x10003c8af640007, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-12 10:57:08,903 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 
> 
> Server:
> 2016-07-12T10:55:56,468 [myid:] - INFO  [ProcessThread(sid:1 
> cport:-1)::ZooKeeperServer@678] - Established session 0x10003c8af640007 with 
> negotiated timeout 45000 for client /10.11.12.4:60586 
> <http://10.11.12.4:60586/>
> (...)
> 2016-07-12T10:56:41,756 [myid:] - INFO  [SessionTracker:ZooKeeperServer@384] 
> - Expiring session 0x10003c8af640007, timeout of 45000ms exceeded
> 
> 
> Note: Don't worry about the time being about 1 min off between the machines.
> 
> What I don't understand is why we are losing the read-only connection we 
> successfully established. ZK ro server says it timed out?
> 
> Now interestingly, when I run it on the same machine it works and completes 
> pretty quickly:
> 
> 
> 2016-07-12 10:43:49 INFO  o.a.z.ZooKeeper:716 - Initiating client connection, 
> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 
> watcher=org.apache.curator.ConnectionState@e4487af
> 2016-07-12 10:43:49 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> Successfully established the connection with ZooKeeper
> 2016-07-12 10:44:10 WARN  o.a.z.ClientCnxn:1257 - Session 0x0 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection timed out: no further information
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
> ~[?:1.8.0_91]
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
> ~[?:1.8.0_91]
>       at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>  ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) 
> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518]
> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1138 - Opening socket connection 
> to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not 
> attempt to authenticate using SASL (unknown error)
> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:980 - Socket connection 
> established, initiating session, client: /10.11.8.236:50147 
> <http://10.11.8.236:50147/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1400 - Session establishment 
> complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, 
> sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode)
> 2016-07-12 10:44:11 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout 
> 200
> 2016-07-12 10:44:11 INFO  o.a.c.f.s.ConnectionStateManager:228 - State 
> change: READ_ONLY
> 2016-07-12 10:44:32 INFO  o.a.z.ClientCnxn:1297 - Checking server 
> QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/> for being r/w. Timeout 
> 400
> Done.
> 
> 
> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <[email protected] 
> <mailto:[email protected]>> wrote:
> It looks like a connection issue based on the logs. When you confirmed that 
> the connection was ok via the zkCli, was that running on the same host as the 
> read only server? Can you run it from the same host that you're running the 
> Java client from?
> 
> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> Well I was trying to let it run and it eventually fails, with a stack:
> 
> Successfully established the connection with ZooKeeper
> 2016-07-11 16:28:21,943 INFO  org.apache.zookeeper.ClientCnxn:1251 - Client 
> session timed out, have not heard from server in 22519ms for sessionid 0x0, 
> closing socket connection and attempting reconnect
> 2016-07-11 16:28:22,046 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 16:28:22,050 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:54702 
> <http://10.11.12.4:54702/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 16:28:22,058 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f09360080, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 16:28:22,061 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 200
> 2016-07-11 16:28:22,066 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 16:30:29,312 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f09360080, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-11 16:30:29,416 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 2016-07-11 16:30:31,031 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 16:30:31,032 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:55156 
> <http://10.11.12.4:55156/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f09360083, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 16:30:31,034 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 400
> 2016-07-11 16:30:31,035 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 16:32:38,335 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f09360083, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-11 16:32:38,441 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 2016-07-11 16:32:40,425 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 16:32:40,426 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:55637 
> <http://10.11.12.4:55637/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f09360087, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 16:32:40,428 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 800
> 2016-07-11 16:32:40,428 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 16:34:47,615 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f09360087, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-11 16:34:47,716 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 2016-07-11 16:34:49,557 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 16:34:49,558 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:56094 
> <http://10.11.12.4:56094/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f09360089, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 16:34:49,560 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 1600
> 2016-07-11 16:34:49,560 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 16:36:56,895 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f09360089, likely 
> server has closed socket, closing socket connection and attempting reconnect
> Exception in thread "main" 2016-07-11 16:36:56,996 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
>     at 
> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
>     at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
> 
> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <[email protected] 
> <mailto:[email protected]>> wrote:
> So, it appears that the initial connection occurs ok, but then something goes 
> wrong
> 
> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f0936005b, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 
> Then a reconnection occurs ok, and then the cycle starts again.
> 
> Can you see where in the stack the getData() call is hanging?
> 
> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> The blockUntilConnectedOrTimedOut() does return, we see the line:
> "Successfully established the connection with ZooKeeper"
> which happens right after.
> 
> The getData() is where it hangs.
> 
> We see this first:
> 2016-07-11 15:47:08,420 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 
> and then we see it again 2 minutes later:
> 2016-07-11 15:49:17,316 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 
> and then again 2 minutes after that:
> 2016-07-11 15:51:26,470 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 
> Then I stopped it.
> 
> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <[email protected] 
> <mailto:[email protected]>> wrote:
> hey Ben,
> From the logs it looks like the connection is established at the end?
> 
> Are you saying that the blockUntilConnectedOrTimedOut() does not return when 
> this occurs? 
> cheers
> 
> 
> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> ZK is 3.5.1-alpha
> Curator is 2.11.0
> 
> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <[email protected] 
> <mailto:[email protected]>> wrote:
> I have a 3 nodes ZK cluster , readonly mode is enabled.
> 2 nodes are down, so the third one is in read-only (verified by using the ZK 
> API manually).
> 
> I'm using this piece of code:
> 
>               Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>                               
> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>                               
> .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>                               .retryPolicy(new RetryNTimes(3, 
> 5000)).canBeReadOnly(true);
> 
>               CuratorFramework client = curatorClientBuilder.build();
>               client.start();
>               client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>               System.out.println("Successfully established the connection 
> with ZooKeeper");
>               
>               client.getData().forPath("/");
>               System.out.println("Done.");
> 
> When curator pick the host that is UP first, it goes through very quickly. 
> When it picks the host that is down first (QA-E8WIN12), it never goes through.
> 
> Am I missing something here?
> 
> Below is the log:
> 
> 2016-07-11 15:46:45,733 INFO  
> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting
> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:zookeeper.version=3.5.1-alpha-1748895, built on 06/17/2016 22:24 
> GMT
> 2016-07-11 15:46:45,743 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:host.name <http://host.name/>=w-rli09-ben
> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.version=1.8.0_11
> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.vendor=Oracle Corporation
> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
> 2016-07-11 15:46:45,744 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.class.path=(..)
> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.io.tmpdir=/tmp
> 2016-07-11 15:46:45,746 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:java.compiler=<NA>
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.name <http://os.name/>=Linux
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.arch=amd64
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.version=3.8.0-44-generic
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:user.name <http://user.name/>=benji
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:user.home=(..)
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:user.dir=(..)
> 2016-07-11 15:46:45,747 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.memory.free=385MB
> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.memory.max=5358MB
> 2016-07-11 15:46:45,748 INFO  org.apache.zookeeper.ZooKeeper:109 - Client 
> environment:os.memory.total=397MB
> 2016-07-11 15:46:45,750 INFO  org.apache.zookeeper.ZooKeeper:716 - Initiating 
> client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 
> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf
> 2016-07-11 15:46:45,769 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN12/10.11.8.232:2181 
> <http://10.11.8.232:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> Successfully established the connection with ZooKeeper
> 2016-07-11 15:47:08,300 INFO  org.apache.zookeeper.ClientCnxn:1251 - Client 
> session timed out, have not heard from server in 22534ms for sessionid 0x0, 
> closing socket connection and attempting reconnect
> 2016-07-11 15:47:08,403 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 15:47:08,406 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:47004 
> <http://10.11.12.4:47004/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 15:47:08,413 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005b, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 15:47:08,415 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 200
> 2016-07-11 15:47:08,420 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 15:49:15,712 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f0936005b, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-11 15:49:15,815 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 2016-07-11 15:49:17,313 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 15:49:17,314 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:47295 
> <http://10.11.12.4:47295/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005f, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 15:49:17,316 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 400
> 2016-07-11 15:49:17,316 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> 2016-07-11 15:51:24,607 INFO  org.apache.zookeeper.ClientCnxn:1253 - Unable 
> to read additional data from server sessionid 0x100024f0936005f, likely 
> server has closed socket, closing socket connection and attempting reconnect
> 2016-07-11 15:51:24,713 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> SUSPENDED
> 2016-07-11 15:51:26,467 INFO  org.apache.zookeeper.ClientCnxn:1138 - Opening 
> socket connection to server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL 
> (unknown error)
> 2016-07-11 15:51:26,468 INFO  org.apache.zookeeper.ClientCnxn:980 - Socket 
> connection established, initiating session, client: /10.11.12.4:47576 
> <http://10.11.12.4:47576/>, server: QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>
> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1400 - Session 
> establishment complete on server QA-E8WIN11/10.11.8.236:2181 
> <http://10.11.8.236:2181/>, sessionid = 0x100024f09360061, negotiated timeout 
> = 45000 (READ-ONLY mode)
> 2016-07-11 15:51:26,470 INFO  org.apache.zookeeper.ClientCnxn:1297 - Checking 
> server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. 
> Timeout 800
> 2016-07-11 15:51:26,470 INFO  
> org.apache.curator.framework.state.ConnectionStateManager:228 - State change: 
> READ_ONLY
> (...)
> 
> Thanks
> Ben
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

Reply via email to