> It seems like it should be more than that since connectionTimeoutMs will be > the time it takes to check only 1 node.
Why do you think it will only check 1 node? I haven’t looked at that code in a long time but a quick check suggests to me that internalBlockUntilConnectedOrTimedOut() waits for a successful connection irrespective of which node it is. > On Jul 30, 2016, at 12:06 AM, Benjamin Jaton <[email protected]> wrote: > > The internalBlockUntilConnectedOrTimedOut() method will timeout after > connectionTimeoutMs. > > It seems like it should be more than that since connectionTimeoutMs will be > the time it takes to check only 1 node. > > Thoughts? > > On Fri, Jul 29, 2016 at 2:55 PM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > Adding the end of the log: > > 570438 [main] DEBUG org.apache.curator.RetryLoop - Retry policy not allowing > retry > 570438 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > SUSPENDED > Exception in thread "main" > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at > testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25) > > On Fri, Jul 29, 2016 at 2:49 PM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > I was making a mistake in the ZK test, the ZK client works properly ( I was > not waiting for the watcher to give me a ConnectedReadOnly event ). > > So I retried with zk 3.5.2 and curator 2.11.0, I have more logs (see below) > > 10.11.9.70:2181 <http://10.11.9.70:2181/> is the functioning node, > 10.11.157.4:2181 <http://10.11.157.4:2181/> and 10.11.157.210:2181 > <http://10.11.157.210:2181/> don't exist. > > I stopped capturing at 300sec. > > This line: > 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient - > blockUntilConnectedOrTimedOut() end. isConnected: false > is not right. Why did it stop blocking? > Is there a way to get more info? > > 0 [main] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - > Starting > 4 [main] DEBUG org.apache.curator.CuratorZookeeperClient - Starting > 4 [main] DEBUG org.apache.curator.ConnectionState - Starting > 4 [main] DEBUG org.apache.curator.ConnectionState - reset > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:zookeeper.version=3.5.2-alpha-1753710, built on 07/21/2016 16:24 > GMT > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name > <http://host.name/>=w-rli09-ben > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.version=1.8.0_11 > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.vendor=Oracle Corporation > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.home=/usr/local/apps/jdk1.8.0_11/jre > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.class.path=/home/benji/workspace-trunk/testzkclient/bin:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/commons-cli-1.2.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-core-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jackson-mapper-asl-1.9.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/javacc.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty-util-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jline-2.11.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/log4j-1.2.17.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/netty-3.10.5.Final.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/servlet-api-2.5-20081211.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-api-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-log4j12-1.7.5.jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/zookeeper-3.5.2-alpha.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/guava-16.0.1.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-client-2.11.0.jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator-framework-2.11.0.jar > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.io.tmpdir=/tmp > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:java.compiler=<NA> > 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name > <http://os.name/>=Linux > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:os.arch=amd64 > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:os.version=3.8.0-44-generic > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name > <http://user.name/>=benji > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:user.home=/home/benji > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:user.dir=/home/benji/workspace-trunk/testzkclient > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:os.memory.free=378MB > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:os.memory.max=5358MB > 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client > environment:os.memory.total=381MB > 16 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client > connection, connectString=10.11.9.70:2181 > <http://10.11.9.70:2181/>,10.11.157.4:2181 > <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/> > sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@1ff8b8f > 27 [main] DEBUG org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer is > 4194304 > 39 [main] DEBUG org.apache.curator.CuratorZookeeperClient - > blockUntilConnectedOrTimedOut() start > 39 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. > Will not attempt to authenticate using SASL (unknown error) > 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient - > blockUntilConnectedOrTimedOut() end. isConnected: false > Successfully established the connection with ZooKeeper > 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] WARN > org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard > from server in 15023ms for sessionid 0x0 > 15058 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard > from server in 15023ms for sessionid 0x0, closing socket connection and > attempting reconnect > 15059 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown input > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) > ... 3 more > 15062 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown output > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) > ... 3 more > 15169 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not > attempt to authenticate using SASL (unknown error) > 15170 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Socket connection established, initiating > session, client: /10.11.12.4:44742 <http://10.11.12.4:44742/>, server: > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 15173 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG > org.apache.zookeeper.ClientCnxn - Session establishment request sent on > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 15180 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Session establishment complete on server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = > 0x10003fa439a0060, negotiated timeout = 40000 (READ-ONLY mode) > 15182 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Checking server > 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being > r/w. Timeout 200 > 15187 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > READ_ONLY > 142434 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Unable to read additional data from server > sessionid 0x10003fa439a0060, likely server has closed socket, closing socket > connection and attempting reconnect > 142538 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > SUSPENDED > 142545 [main] DEBUG org.apache.curator.RetryLoop - Retry-able exception > received > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at > testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25) > 143211 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. > Will not attempt to authenticate using SASL (unknown error) > 147546 [main] DEBUG org.apache.curator.RetryLoop - Retrying operation > 156556 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not > heard from server in 14019ms for sessionid 0x10003fa439a0060 > 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not > heard from server in 14019ms for sessionid 0x10003fa439a0060, closing socket > connection and attempting reconnect > 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown input > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) > ... 3 more > 156557 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown output > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) > ... 3 more > 157544 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - > Connection timed out for connection string (10.11.9.70:2181 > <http://10.11.9.70:2181/>,10.11.157.4:2181 > <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) > and timeout (15000) / elapsed (15006) > org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = > ConnectionLoss > at > org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) > at > org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) > at > org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 158163 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not > attempt to authenticate using SASL (unknown error) > 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Socket connection established, initiating > session, client: /10.11.12.4:45070 <http://10.11.12.4:45070/>, server: > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 158164 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG > org.apache.zookeeper.ClientCnxn - Session establishment request sent on > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Session establishment complete on server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = > 0x10003fa439a0061, negotiated timeout = 40000 (READ-ONLY mode) > 158166 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Checking server > 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being > r/w. Timeout 400 > 158167 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > READ_ONLY > 285409 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Unable to read additional data from server > sessionid 0x10003fa439a0061, likely server has closed socket, closing socket > connection and attempting reconnect > 285510 [main] DEBUG org.apache.curator.RetryLoop - Retry-able exception > received > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at > testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadOnlyClient.java:25) > 285516 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > SUSPENDED > 286312 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.157.210/10.11.157.210:2181 <http://10.11.157.210/10.11.157.210:2181>. > Will not attempt to authenticate using SASL (unknown error) > 290511 [main] DEBUG org.apache.curator.RetryLoop - Retrying operation > 295516 [Curator-Framework-0] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > LOST > 295517 [Curator-Framework-0] ERROR > org.apache.curator.framework.imps.CuratorFrameworkImpl - Background > operation retry gave up > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 295518 [Curator-Framework-0] ERROR > org.apache.curator.framework.imps.CuratorFrameworkImpl - Background retry > gave up > org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = > ConnectionLoss > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not > heard from server in 14149ms for sessionid 0x10003fa439a0061 > 299659 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not > heard from server in 14149ms for sessionid 0x10003fa439a0061, closing socket > connection and attempting reconnect > 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown input > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:777) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) > ... 3 more > 299660 [main-SendThread(10.11.157.210:2181 <http://10.11.157.210:2181/>)] > DEBUG org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during > shutdown output > java.net.SocketException: Socket is not connected > at sun.nio.ch.Net.translateToSocketException(Net.java:143) > at sun.nio.ch.Net.translateException(Net.java:177) > at sun.nio.ch.Net.translateException(Net.java:183) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) > at > org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:205) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1325) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1245) > Caused by: java.nio.channels.NotYetConnectedException > at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:794) > at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) > ... 3 more > 300520 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - > Connection timed out for connection string (10.11.9.70:2181 > <http://10.11.9.70:2181/>,10.11.157.4:2181 > <http://10.11.157.4:2181/>,10.11.157.210:2181 <http://10.11.157.210:2181/>) > and timeout (15000) / elapsed (15004) > org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = > ConnectionLoss > at > org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) > at > org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) > at > org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) > at > org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 301077 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Opening socket connection to server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>. Will not > attempt to authenticate using SASL (unknown error) > 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Socket connection established, initiating > session, client: /10.11.12.4:45385 <http://10.11.12.4:45385/>, server: > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 301078 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] DEBUG > org.apache.zookeeper.ClientCnxn - Session establishment request sent on > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181> > 301081 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Session establishment complete on server > 10.11.9.70/10.11.9.70:2181 <http://10.11.9.70/10.11.9.70:2181>, sessionid = > 0x10003fa439a0062, negotiated timeout = 40000 (READ-ONLY mode) > 301082 [main-SendThread(10.11.9.70:2181 <http://10.11.9.70:2181/>)] INFO > org.apache.zookeeper.ClientCnxn - Checking server > 10.11.157.4/10.11.157.4:2181 <http://10.11.157.4/10.11.157.4:2181> for being > r/w. Timeout 800 > 301082 [main-EventThread] INFO > org.apache.curator.framework.state.ConnectionStateManager - State change: > READ_ONLY > > > On Tue, Jul 12, 2016 at 4:11 PM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > Cameron, you seem to be right, the simple zookeeper client also fails > sometimes. > I will bring it up to the ZK team, thank you! > > On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie <[email protected] > <mailto:[email protected]>> wrote: > I'm at a bit of a loss then. Looking at the stack trace, it's in the ZK code > (not the Curator code) that it appears to be blocking. Maybe try and create a > test that removes Curator and just uses the ZK client directly and see if you > still have the same problem? > > On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > Yes, I mean the ping goes through properly > > On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie <[email protected] > <mailto:[email protected]>> wrote: > Are the 'dead' hosts reachable? I seem to recall having problems with the ZK > client blocking for a long time if you had a host configured that was > resolvable but not reachable. > > On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > More to the story: > > While it consistently works on the same machine than the sole running ZK node > when I specify only 2 of the 3 servers in the connection string, > it doesn't work when I specify 3 hosts: > > > 2016-07-12 11:23:46 INFO o.a.z.ZooKeeper:716 - Initiating client connection, > connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 > sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3 > 2016-07-12 11:23:46 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1251 - Client session timed out, > have not heard from server in 15018ms for sessionid 0x0, closing socket > connection and attempting reconnect > Successfully established the connection with ZooKeeper > 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1251 - Client session timed out, > have not heard from server in 15000ms for sessionid 0x0, closing socket > connection and attempting reconnect > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50423 > <http://10.11.8.236:50423/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, > sessionid = 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout > 200 > 2016-07-12 11:24:16 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:24:37 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout > 400 > 2016-07-12 11:24:58 INFO o.a.z.ClientCnxn:1251 - Client session timed out, > have not heard from server in 42023ms for sessionid 0x10003c8af64001b, > closing socket connection and attempting reconnect > 2016-07-12 11:24:58 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50428 > <http://10.11.8.236:50428/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, > sessionid = 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout > 800 > 2016-07-12 11:25:00 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:25:21 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout > 1600 > 2016-07-12 11:25:42 INFO o.a.z.ClientCnxn:1251 - Client session timed out, > have not heard from server in 42011ms for sessionid 0x10003c8af64001c, > closing socket connection and attempting reconnect > 2016-07-12 11:25:42 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50435 > <http://10.11.8.236:50435/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, > sessionid = 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout > 3200 > 2016-07-12 11:25:44 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:26:05 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout > 6400 > 2016-07-12 11:26:26 INFO o.a.z.ClientCnxn:1251 - Client session timed out, > have not heard from server in 42003ms for sessionid 0x10003c8af64001d, > closing socket connection and attempting reconnect > 2016-07-12 11:26:26 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > Exception in thread "main" > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at test.Test.main(Test.java:25) > > > That seems to suggest that adding 1 host to check somehow messes up something > in the timeouts. > I am suspicious of those: > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 <http://10.11.8.230:2181/> for being r/w. Timeout > 3200 > They seem to be "blocking" the existing connection we have to the healthy ZK > server, which goes stale. Is that possible? > > > On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > I did not use the zkCli tool but the ZK java API directly. And other tools > like ZooInspector (contrib). They were run from a remote machine. > I ran it on the same machine and the one read only server is running > properly, I can browse it using those methods. > > Trying to make sense of those client-server interactions is difficult but > here is what I got: > > Client: > 2016-07-12 10:55:01,508 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-12 10:55:01,512 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:60586 > <http://10.11.12.4:60586/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-12 10:55:01,521 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x10003c8af640007, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-12 10:55:01,523 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 200 > 2016-07-12 10:55:01,528 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-12 10:57:08,800 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x10003c8af640007, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-12 10:57:08,903 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > > > Server: > 2016-07-12T10:55:56,468 [myid:] - INFO [ProcessThread(sid:1 > cport:-1)::ZooKeeperServer@678] - Established session 0x10003c8af640007 with > negotiated timeout 45000 for client /10.11.12.4:60586 > <http://10.11.12.4:60586/> > (...) > 2016-07-12T10:56:41,756 [myid:] - INFO [SessionTracker:ZooKeeperServer@384] > - Expiring session 0x10003c8af640007, timeout of 45000ms exceeded > > > Note: Don't worry about the time being about 1 min off between the machines. > > What I don't understand is why we are losing the read-only connection we > successfully established. ZK ro server says it timed out? > > Now interestingly, when I run it on the same machine it works and completes > pretty quickly: > > > 2016-07-12 10:43:49 INFO o.a.z.ZooKeeper:716 - Initiating client connection, > connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 > watcher=org.apache.curator.ConnectionState@e4487af > 2016-07-12 10:43:49 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/>. Will not > attempt to authenticate using SASL (unknown error) > Successfully established the connection with ZooKeeper > 2016-07-12 10:44:10 WARN o.a.z.ClientCnxn:1257 - Session 0x0 for server > null, unexpected error, closing socket connection and attempting reconnect > java.net.ConnectException: Connection timed out: no further information > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > ~[?:1.8.0_91] > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > ~[?:1.8.0_91] > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) > ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) > [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] > 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1138 - Opening socket connection > to server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50147 > <http://10.11.8.236:50147/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/>, > sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. Timeout > 200 > 2016-07-12 10:44:11 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 10:44:32 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN11/10.11.8.236:2181 <http://10.11.8.236:2181/> for being r/w. Timeout > 400 > Done. > > > On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <[email protected] > <mailto:[email protected]>> wrote: > It looks like a connection issue based on the logs. When you confirmed that > the connection was ok via the zkCli, was that running on the same host as the > read only server? Can you run it from the same host that you're running the > Java client from? > > On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > Well I was trying to let it run and it eventually fails, with a stack: > > Successfully established the connection with ZooKeeper > 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - Client > session timed out, have not heard from server in 22519ms for sessionid 0x0, > closing socket connection and attempting reconnect > 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:54702 > <http://10.11.12.4:54702/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f09360080, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 200 > 2016-07-11 16:28:22,066 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f09360080, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-11 16:30:29,416 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55156 > <http://10.11.12.4:55156/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f09360083, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 400 > 2016-07-11 16:30:31,035 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f09360083, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-11 16:32:38,441 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55637 > <http://10.11.12.4:55637/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f09360087, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 800 > 2016-07-11 16:32:40,428 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f09360087, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-11 16:34:47,716 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:56094 > <http://10.11.12.4:56094/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f09360089, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 1600 > 2016-07-11 16:34:49,560 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f09360089, likely > server has closed socket, closing socket connection and attempting reconnect > Exception in thread "main" 2016-07-11 16:36:56,996 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) > > On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <[email protected] > <mailto:[email protected]>> wrote: > So, it appears that the initial connection occurs ok, but then something goes > wrong > > 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f0936005b, likely > server has closed socket, closing socket connection and attempting reconnect > > Then a reconnection occurs ok, and then the cycle starts again. > > Can you see where in the stack the getData() call is hanging? > > On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > The blockUntilConnectedOrTimedOut() does return, we see the line: > "Successfully established the connection with ZooKeeper" > which happens right after. > > The getData() is where it hangs. > > We see this first: > 2016-07-11 15:47:08,420 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > > and then we see it again 2 minutes later: > 2016-07-11 15:49:17,316 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > > and then again 2 minutes after that: > 2016-07-11 15:51:26,470 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > > Then I stopped it. > > On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <[email protected] > <mailto:[email protected]>> wrote: > hey Ben, > From the logs it looks like the connection is established at the end? > > Are you saying that the blockUntilConnectedOrTimedOut() does not return when > this occurs? > cheers > > > On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > ZK is 3.5.1-alpha > Curator is 2.11.0 > > On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <[email protected] > <mailto:[email protected]>> wrote: > I have a 3 nodes ZK cluster , readonly mode is enabled. > 2 nodes are down, so the third one is in read-only (verified by using the ZK > API manually). > > I'm using this piece of code: > > Builder curatorClientBuilder = CuratorFrameworkFactory.builder() > > .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") > > .sessionTimeoutMs(45000).connectionTimeoutMs(15000) > .retryPolicy(new RetryNTimes(3, > 5000)).canBeReadOnly(true); > > CuratorFramework client = curatorClientBuilder.build(); > client.start(); > client.getZookeeperClient().blockUntilConnectedOrTimedOut(); > System.out.println("Successfully established the connection > with ZooKeeper"); > > client.getData().forPath("/"); > System.out.println("Done."); > > When curator pick the host that is UP first, it goes through very quickly. > When it picks the host that is down first (QA-E8WIN12), it never goes through. > > Am I missing something here? > > Below is the log: > > 2016-07-11 15:46:45,733 INFO > org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting > 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:zookeeper.version=3.5.1-alpha-1748895, built on 06/17/2016 22:24 > GMT > 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:host.name <http://host.name/>=w-rli09-ben > 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.version=1.8.0_11 > 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.vendor=Oracle Corporation > 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.home=/usr/local/apps/jdk1.8.0_11/jre > 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.class.path=(..) > 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.io.tmpdir=/tmp > 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:java.compiler=<NA> > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.name <http://os.name/>=Linux > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.arch=amd64 > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.version=3.8.0-44-generic > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:user.name <http://user.name/>=benji > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:user.home=(..) > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:user.dir=(..) > 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.memory.free=385MB > 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.memory.max=5358MB > 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - Client > environment:os.memory.total=397MB > 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - Initiating > client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 > sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf > 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN12/10.11.8.232:2181 > <http://10.11.8.232:2181/>. Will not attempt to authenticate using SASL > (unknown error) > Successfully established the connection with ZooKeeper > 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 - Client > session timed out, have not heard from server in 22534ms for sessionid 0x0, > closing socket connection and attempting reconnect > 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:47004 > <http://10.11.12.4:47004/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005b, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 200 > 2016-07-11 15:47:08,420 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f0936005b, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-11 15:49:15,815 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:47295 > <http://10.11.12.4:47295/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f0936005f, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 400 > 2016-07-11 15:49:17,316 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable > to read additional data from server sessionid 0x100024f0936005f, likely > server has closed socket, closing socket connection and attempting reconnect > 2016-07-11 15:51:24,713 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > SUSPENDED > 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 - Opening > socket connection to server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>. Will not attempt to authenticate using SASL > (unknown error) > 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:47576 > <http://10.11.12.4:47576/>, server: QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/> > 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 - Session > establishment complete on server QA-E8WIN11/10.11.8.236:2181 > <http://10.11.8.236:2181/>, sessionid = 0x100024f09360061, negotiated timeout > = 45000 (READ-ONLY mode) > 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 - Checking > server QA-E8WIN12/10.11.8.232:2181 <http://10.11.8.232:2181/> for being r/w. > Timeout 800 > 2016-07-11 15:51:26,470 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State change: > READ_ONLY > (...) > > Thanks > Ben > > > > > > > > > > > > > > > >
