Yes the code is in the 1st email of this thread, it's just a connection to a read only ensemble.
On Thu, Oct 6, 2016 at 2:12 PM, Jordan Zimmerman <[email protected] > wrote: > I don’t see it in the code. The connectionTimeoutMs is total time for > connection to any node. > > Did you post your test somewhere? Maybe I can try to debug it. > > -JZ > > On Oct 6, 2016, at 11:11 PM, Benjamin Jaton <[email protected]> > wrote: > > I was thinking that connectionTimeoutMs is the time it would take for 1 > node to timeout. > But then once you've checked 1 node, waitTimeMs becomes < 0 and the while > loop exits without checking the other nodes. > > But that's just a guess, maybe the problem is somewhere else. > > On Thu, Oct 6, 2016 at 2:04 PM, Jordan Zimmerman < > [email protected]> wrote: > >> > It seems like it should be more than that since connectionTimeoutMs >> will be the time it takes to check only 1 node. >> >> Why do you think it will only check 1 node? I haven’t looked at that code >> in a long time but a quick check suggests to me >> that internalBlockUntilConnectedOrTimedOut() waits for a successful >> connection irrespective of which node it is. >> >> On Jul 30, 2016, at 12:06 AM, Benjamin Jaton <[email protected]> >> wrote: >> >> The internalBlockUntilConnectedOrTimedOut() method will timeout after >> connectionTimeoutMs. >> >> It seems like it should be more than that since connectionTimeoutMs will >> be the time it takes to check only 1 node. >> >> Thoughts? >> >> On Fri, Jul 29, 2016 at 2:55 PM, Benjamin Jaton <[email protected] >> > wrote: >> >>> Adding the end of the log: >>> >>> 570438 [main] DEBUG org.apache.curator.RetryLoop - Retry policy not >>> allowing retry >>> 570438 [main-EventThread] INFO >>> org.apache.curator.framework.state.ConnectionStateManager >>> - State change: SUSPENDED >>> Exception in thread "main" org.apache.zookeeper.KeeperExc >>> eption$ConnectionLossException: KeeperErrorCode = ConnectionLoss for / >>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>> java:99) >>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>> java:51) >>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) >>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>> GetDataBuilderImpl.java:310) >>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>> GetDataBuilderImpl.java:299) >>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>> at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInF >>> oreground(GetDataBuilderImpl.java:295) >>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>> (GetDataBuilderImpl.java:287) >>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>> (GetDataBuilderImpl.java:34) >>> at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadO >>> nlyClient.java:25) >>> >>> On Fri, Jul 29, 2016 at 2:49 PM, Benjamin Jaton < >>> [email protected]> wrote: >>> >>>> I was making a mistake in the ZK test, the ZK client works properly ( I >>>> was not waiting for the watcher to give me a ConnectedReadOnly event ). >>>> >>>> So I retried with zk 3.5.2 and curator 2.11.0, I have more logs (see >>>> below) >>>> >>>> 10.11.9.70:2181 is the functioning node, >>>> 10.11.157.4:2181 and 10.11.157.210:2181 don't exist. >>>> >>>> I stopped capturing at 300sec. >>>> >>>> This line: >>>> 15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient - >>>> blockUntilConnectedOrTimedOut() end. isConnected: false >>>> is not right. Why did it stop blocking? >>>> Is there a way to get more info? >>>> >>>> 0 [main] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl >>>> - Starting >>>> 4 [main] DEBUG org.apache.curator.CuratorZookeeperClient - Starting >>>> 4 [main] DEBUG org.apache.curator.ConnectionState - Starting >>>> 4 [main] DEBUG org.apache.curator.ConnectionState - reset >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:zookeeper.version=3.5.2-alpha-1753710, built on 07/21/2016 >>>> 16:24 GMT >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment: >>>> host.name=w-rli09-ben >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.version=1.8.0_11 >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.vendor=Oracle Corporation >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.class.path=/home/benji/workspace-trunk/test >>>> zkclient/bin:/home/benji/workspace-trunk/zookeeper-3.5.2/ >>>> build/lib/commons-cli-1.2.jar:/home/benji/workspace-trunk/ >>>> zookeeper-3.5.2/build/lib/jackson-core-asl-1.9.11.jar:/ >>>> home/benji/workspace-trunk/zookeeper-3.5.2/build/lib/ >>>> jackson-mapper-asl-1.9.11.jar:/home/benji/workspace-trunk/ >>>> zookeeper-3.5.2/build/lib/javacc.jar:/home/benji/workspa >>>> ce-trunk/zookeeper-3.5.2/build/lib/jetty-6.1.26.jar:/ho >>>> me/benji/workspace-trunk/zookeeper-3.5.2/build/lib/jetty- >>>> util-6.1.26.jar:/home/benji/workspace-trunk/zookeeper-3.5. >>>> 2/build/lib/jline-2.11.jar:/home/benji/workspace-trunk/ >>>> zookeeper-3.5.2/build/lib/log4j-1.2.17.jar:/home/benji/ >>>> workspace-trunk/zookeeper-3.5.2/build/lib/netty-3.10.5. >>>> Final.jar:/home/benji/workspace-trunk/zookeeper-3.5. >>>> 2/build/lib/servlet-api-2.5-20081211.jar:/home/benji/ >>>> workspace-trunk/zookeeper-3.5.2/build/lib/slf4j-api-1.7.5. >>>> jar:/home/benji/workspace-trunk/zookeeper-3.5.2/build/ >>>> lib/slf4j-log4j12-1.7.5.jar:/home/benji/workspace-trunk/zoo >>>> keeper-3.5.2/build/zookeeper-3.5.2-alpha.jar:/home/benji/ >>>> workspace-trunk/apache-curator-3.2.0/guava-16.0.1. >>>> jar:/home/benji/workspace-trunk/apache-curator-3.2.0/curator >>>> -client-2.11.0.jar:/home/benji/workspace-trunk/apache- >>>> curator-3.2.0/curator-framework-2.11.0.jar >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.library.path=/usr/java/packages/lib/amd64:/ >>>> usr/lib64:/lib64:/lib:/usr/lib >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.io.tmpdir=/tmp >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:java.compiler=<NA> >>>> 12 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment: >>>> os.name=Linux >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:os.arch=amd64 >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:os.version=3.8.0-44-generic >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment: >>>> user.name=benji >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:user.home=/home/benji >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:user.dir=/home/benji/workspace-trunk/testzkclient >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:os.memory.free=378MB >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:os.memory.max=5358MB >>>> 13 [main] INFO org.apache.zookeeper.ZooKeeper - Client >>>> environment:os.memory.total=381MB >>>> 16 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client >>>> connection, connectString=10.11.9.70:2181,10.11.157.4:2181, >>>> 10.11.157.210:2181 sessionTimeout=45000 watcher=org.apache.curator.Con >>>> nectionState@1ff8b8f >>>> 27 [main] DEBUG org.apache.zookeeper.ClientCnxnSocket - >>>> jute.maxbuffer is 4194304 >>>> 39 [main] DEBUG org.apache.curator.CuratorZookeeperClient - >>>> blockUntilConnectedOrTimedOut() start >>>> 39 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.157.210/10.11.157.210:2181. Will not attempt to authenticate >>>> using SASL (unknown error) >>>> *15043 [main] DEBUG org.apache.curator.CuratorZookeeperClient - >>>> blockUntilConnectedOrTimedOut() end. isConnected: false* >>>> Successfully established the connection with ZooKeeper >>>> 15058 [main-SendThread(10.11.157.210:2181)] WARN >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 15023ms for sessionid 0x0 >>>> 15058 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 15023ms for sessionid 0x0, closing socket connection >>>> and attempting reconnect >>>> 15059 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown input >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:198) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl >>>> .java:777) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) >>>> ... 3 more >>>> 15062 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown output >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:205) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImp >>>> l.java:794) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) >>>> ... 3 more >>>> 15169 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.9.70/10.11.9.70:2181. Will not attempt to authenticate using >>>> SASL (unknown error) >>>> 15170 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Socket connection established, >>>> initiating session, client: /10.11.12.4:44742, server: >>>> 10.11.9.70/10.11.9.70:2181 >>>> 15173 [main-SendThread(10.11.9.70:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxn - Session establishment request sent >>>> on 10.11.9.70/10.11.9.70:2181 >>>> 15180 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Session establishment complete on >>>> server 10.11.9.70/10.11.9.70:2181, sessionid = 0x10003fa439a0060, >>>> negotiated timeout = 40000 (READ-ONLY mode) >>>> 15182 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Checking server >>>> 10.11.157.4/10.11.157.4:2181 for being r/w. Timeout 200 >>>> 15187 [main-EventThread] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: READ_ONLY >>>> 142434 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from >>>> server sessionid 0x10003fa439a0060, likely server has closed socket, >>>> closing socket connection and attempting reconnect >>>> 142538 [main-EventThread] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: SUSPENDED >>>> 142545 [main] DEBUG org.apache.curator.RetryLoop - Retry-able >>>> exception received >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>> KeeperErrorCode = ConnectionLoss for / >>>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>>> java:99) >>>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>>> java:51) >>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>>> GetDataBuilderImpl.java:310) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>>> GetDataBuilderImpl.java:299) >>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInF >>>> oreground(GetDataBuilderImpl.java:295) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>>> (GetDataBuilderImpl.java:287) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>>> (GetDataBuilderImpl.java:34) >>>> at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadO >>>> nlyClient.java:25) >>>> 143211 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.157.210/10.11.157.210:2181. Will not attempt to authenticate >>>> using SASL (unknown error) >>>> 147546 [main] DEBUG org.apache.curator.RetryLoop - Retrying operation >>>> 156556 [main-SendThread(10.11.157.210:2181)] WARN >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 14019ms for sessionid 0x10003fa439a0060 >>>> 156557 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 14019ms for sessionid 0x10003fa439a0060, closing >>>> socket connection and attempting reconnect >>>> 156557 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown input >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:198) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl >>>> .java:777) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) >>>> ... 3 more >>>> 156557 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown output >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:205) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImp >>>> l.java:794) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) >>>> ... 3 more >>>> 157544 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState >>>> - Connection timed out for connection string (10.11.9.70:2181, >>>> 10.11.157.4:2181,10.11.157.210:2181) and timeout (15000) / elapsed >>>> (15006) >>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = >>>> ConnectionLoss >>>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS >>>> tate.java:197) >>>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt >>>> ate.java:88) >>>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat >>>> orZookeeperClient.java:116) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo >>>> rmBackgroundOperation(CuratorFrameworkImpl.java:835) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg >>>> roundOperationsLoop(CuratorFrameworkImpl.java:809) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces >>>> s$300(CuratorFrameworkImpl.java:64) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal >>>> l(CuratorFrameworkImpl.java:267) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.access$201(ScheduledThreadPoolExecutor.java:180) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.run(ScheduledThreadPoolExecutor.java:293) >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >>>> Executor.java:1142) >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >>>> lExecutor.java:617) >>>> at java.lang.Thread.run(Thread.java:745) >>>> 158163 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.9.70/10.11.9.70:2181. Will not attempt to authenticate using >>>> SASL (unknown error) >>>> 158164 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Socket connection established, >>>> initiating session, client: /10.11.12.4:45070, server: >>>> 10.11.9.70/10.11.9.70:2181 >>>> 158164 [main-SendThread(10.11.9.70:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxn - Session establishment request sent >>>> on 10.11.9.70/10.11.9.70:2181 >>>> 158166 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Session establishment complete on >>>> server 10.11.9.70/10.11.9.70:2181, sessionid = 0x10003fa439a0061, >>>> negotiated timeout = 40000 (READ-ONLY mode) >>>> 158166 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Checking server >>>> 10.11.157.4/10.11.157.4:2181 for being r/w. Timeout 400 >>>> 158167 [main-EventThread] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: READ_ONLY >>>> 285409 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from >>>> server sessionid 0x10003fa439a0061, likely server has closed socket, >>>> closing socket connection and attempting reconnect >>>> 285510 [main] DEBUG org.apache.curator.RetryLoop - Retry-able >>>> exception received >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>> KeeperErrorCode = ConnectionLoss for / >>>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>>> java:99) >>>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>>> java:51) >>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1956) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>>> GetDataBuilderImpl.java:310) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call( >>>> GetDataBuilderImpl.java:299) >>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInF >>>> oreground(GetDataBuilderImpl.java:295) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>>> (GetDataBuilderImpl.java:287) >>>> at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath >>>> (GetDataBuilderImpl.java:34) >>>> at testzkclient.TestCuratorReadOnlyClient.main(TestCuratorReadO >>>> nlyClient.java:25) >>>> 285516 [main-EventThread] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: SUSPENDED >>>> 286312 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.157.210/10.11.157.210:2181. Will not attempt to authenticate >>>> using SASL (unknown error) >>>> 290511 [main] DEBUG org.apache.curator.RetryLoop - Retrying operation >>>> 295516 [Curator-Framework-0] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: LOST >>>> 295517 [Curator-Framework-0] ERROR >>>> org.apache.curator.framework.imps.CuratorFrameworkImpl >>>> - Background operation retry gave up >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>> KeeperErrorCode = ConnectionLoss >>>> at org.apache.zookeeper.KeeperException.create(KeeperException. >>>> java:99) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.check >>>> BackgroundRetry(CuratorFrameworkImpl.java:728) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo >>>> rmBackgroundOperation(CuratorFrameworkImpl.java:857) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg >>>> roundOperationsLoop(CuratorFrameworkImpl.java:809) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces >>>> s$300(CuratorFrameworkImpl.java:64) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal >>>> l(CuratorFrameworkImpl.java:267) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.access$201(ScheduledThreadPoolExecutor.java:180) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.run(ScheduledThreadPoolExecutor.java:293) >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >>>> Executor.java:1142) >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >>>> lExecutor.java:617) >>>> at java.lang.Thread.run(Thread.java:745) >>>> 295518 [Curator-Framework-0] ERROR >>>> org.apache.curator.framework.imps.CuratorFrameworkImpl >>>> - Background retry gave up >>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = >>>> ConnectionLoss >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo >>>> rmBackgroundOperation(CuratorFrameworkImpl.java:838) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg >>>> roundOperationsLoop(CuratorFrameworkImpl.java:809) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces >>>> s$300(CuratorFrameworkImpl.java:64) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal >>>> l(CuratorFrameworkImpl.java:267) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.access$201(ScheduledThreadPoolExecutor.java:180) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.run(ScheduledThreadPoolExecutor.java:293) >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >>>> Executor.java:1142) >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >>>> lExecutor.java:617) >>>> at java.lang.Thread.run(Thread.java:745) >>>> 299659 [main-SendThread(10.11.157.210:2181)] WARN >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 14149ms for sessionid 0x10003fa439a0061 >>>> 299659 [main-SendThread(10.11.157.210:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not >>>> heard from server in 14149ms for sessionid 0x10003fa439a0061, closing >>>> socket connection and attempting reconnect >>>> 299660 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown input >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:404) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:198) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl >>>> .java:777) >>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:402) >>>> ... 3 more >>>> 299660 [main-SendThread(10.11.157.210:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxnSocketNIO - Ignoring exception during >>>> shutdown output >>>> java.net.SocketException: Socket is not connected >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateToSock >>>> etException(Net.java:143) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:177) >>>> at sun.nio.ch.Net <http://sun.nio.ch.net/>.translateExcept >>>> ion(Net.java:183) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:412) >>>> at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnS >>>> ocketNIO.java:205) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnx >>>> n.java:1325) >>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn. >>>> java:1245) >>>> Caused by: java.nio.channels.NotYetConnectedException >>>> at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImp >>>> l.java:794) >>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:410) >>>> ... 3 more >>>> 300520 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState >>>> - Connection timed out for connection string (10.11.9.70:2181, >>>> 10.11.157.4:2181,10.11.157.210:2181) and timeout (15000) / elapsed >>>> (15004) >>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = >>>> ConnectionLoss >>>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS >>>> tate.java:197) >>>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt >>>> ate.java:88) >>>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat >>>> orZookeeperClient.java:116) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo >>>> rmBackgroundOperation(CuratorFrameworkImpl.java:835) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg >>>> roundOperationsLoop(CuratorFrameworkImpl.java:809) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces >>>> s$300(CuratorFrameworkImpl.java:64) >>>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal >>>> l(CuratorFrameworkImpl.java:267) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.access$201(ScheduledThreadPoolExecutor.java:180) >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >>>> tureTask.run(ScheduledThreadPoolExecutor.java:293) >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >>>> Executor.java:1142) >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >>>> lExecutor.java:617) >>>> at java.lang.Thread.run(Thread.java:745) >>>> 301077 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >>>> 10.11.9.70/10.11.9.70:2181. Will not attempt to authenticate using >>>> SASL (unknown error) >>>> 301078 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Socket connection established, >>>> initiating session, client: /10.11.12.4:45385, server: >>>> 10.11.9.70/10.11.9.70:2181 >>>> 301078 [main-SendThread(10.11.9.70:2181)] DEBUG >>>> org.apache.zookeeper.ClientCnxn - Session establishment request sent >>>> on 10.11.9.70/10.11.9.70:2181 >>>> 301081 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Session establishment complete on >>>> server 10.11.9.70/10.11.9.70:2181, sessionid = 0x10003fa439a0062, >>>> negotiated timeout = 40000 (READ-ONLY mode) >>>> 301082 [main-SendThread(10.11.9.70:2181)] INFO >>>> org.apache.zookeeper.ClientCnxn - Checking server >>>> 10.11.157.4/10.11.157.4:2181 for being r/w. Timeout 800 >>>> 301082 [main-EventThread] INFO >>>> org.apache.curator.framework.state.ConnectionStateManager >>>> - State change: READ_ONLY >>>> >>>> >>>> On Tue, Jul 12, 2016 at 4:11 PM, Benjamin Jaton < >>>> [email protected]> wrote: >>>> >>>>> Cameron, you seem to be right, the simple zookeeper client also fails >>>>> sometimes. >>>>> I will bring it up to the ZK team, thank you! >>>>> >>>>> On Tue, Jul 12, 2016 at 3:09 PM, Cameron McKenzie < >>>>> [email protected]> wrote: >>>>> >>>>>> I'm at a bit of a loss then. Looking at the stack trace, it's in the >>>>>> ZK code (not the Curator code) that it appears to be blocking. Maybe try >>>>>> and create a test that removes Curator and just uses the ZK client >>>>>> directly >>>>>> and see if you still have the same problem? >>>>>> >>>>>> On Wed, Jul 13, 2016 at 7:54 AM, Benjamin Jaton < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Yes, I mean the ping goes through properly >>>>>>> >>>>>>> On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Are the 'dead' hosts reachable? I seem to recall having problems >>>>>>>> with the ZK client blocking for a long time if you had a host >>>>>>>> configured >>>>>>>> that was resolvable but not reachable. >>>>>>>> >>>>>>>> On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> More to the story: >>>>>>>>> >>>>>>>>> While it consistently works on the same machine than the sole >>>>>>>>> running ZK node when I specify only 2 of the 3 servers in the >>>>>>>>> connection >>>>>>>>> string, >>>>>>>>> it doesn't work when I specify 3 hosts: >>>>>>>>> >>>>>>>>> >>>>>>>>> 2016-07-12 11:23:46 INFO o.a.z.ZooKeeper:716 - Initiating client >>>>>>>>> connection, connectString=QA-E8WIN11:2181, >>>>>>>>> QA-E8WIN12:2181,QA-E8WIN13:2181 sessionTimeout=45000 >>>>>>>>> watcher=org.apache.curator.ConnectionState@c86b9e3 >>>>>>>>> 2016-07-12 11:23:46 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>> connection to server QA-E8WIN13/10.11.8.230:2181. Will not >>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1251 - Client session >>>>>>>>> timed out, have not heard from server in 15018ms for sessionid 0x0, >>>>>>>>> closing >>>>>>>>> socket connection and attempting reconnect >>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not >>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1251 - Client session >>>>>>>>> timed out, have not heard from server in 15000ms for sessionid 0x0, >>>>>>>>> closing >>>>>>>>> socket connection and attempting reconnect >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not >>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:980 - Socket connection >>>>>>>>> established, initiating session, client: /10.11.8.236:50423, >>>>>>>>> server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1400 - Session >>>>>>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>>>> sessionid = 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY >>>>>>>>> mode) >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200 >>>>>>>>> 2016-07-12 11:24:16 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: READ_ONLY >>>>>>>>> 2016-07-12 11:24:37 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>>>>>>> 2016-07-12 11:24:58 INFO o.a.z.ClientCnxn:1251 - Client session >>>>>>>>> timed out, have not heard from server in 42023ms for sessionid >>>>>>>>> 0x10003c8af64001b, closing socket connection and attempting reconnect >>>>>>>>> 2016-07-12 11:24:58 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: SUSPENDED >>>>>>>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not >>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:980 - Socket connection >>>>>>>>> established, initiating session, client: /10.11.8.236:50428, >>>>>>>>> server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1400 - Session >>>>>>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>>>> sessionid = 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY >>>>>>>>> mode) >>>>>>>>> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800 >>>>>>>>> 2016-07-12 11:25:00 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: READ_ONLY >>>>>>>>> 2016-07-12 11:25:21 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 >>>>>>>>> 2016-07-12 11:25:42 INFO o.a.z.ClientCnxn:1251 - Client session >>>>>>>>> timed out, have not heard from server in 42011ms for sessionid >>>>>>>>> 0x10003c8af64001c, closing socket connection and attempting reconnect >>>>>>>>> 2016-07-12 11:25:42 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: SUSPENDED >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not >>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:980 - Socket connection >>>>>>>>> established, initiating session, client: /10.11.8.236:50435, >>>>>>>>> server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1400 - Session >>>>>>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>>>> sessionid = 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY >>>>>>>>> mode) >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: READ_ONLY >>>>>>>>> 2016-07-12 11:26:05 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400 >>>>>>>>> 2016-07-12 11:26:26 INFO o.a.z.ClientCnxn:1251 - Client session >>>>>>>>> timed out, have not heard from server in 42003ms for sessionid >>>>>>>>> 0x10003c8af64001d, closing socket connection and attempting reconnect >>>>>>>>> 2016-07-12 11:26:26 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>> State change: SUSPENDED >>>>>>>>> Exception in thread "main" org.apache.zookeeper.KeeperExc >>>>>>>>> eption$ConnectionLossException: KeeperErrorCode = ConnectionLoss >>>>>>>>> for / >>>>>>>>> at org.apache.zookeeper.KeeperExc >>>>>>>>> eption.create(KeeperException.java:99) >>>>>>>>> at org.apache.zookeeper.KeeperExc >>>>>>>>> eption.create(KeeperException.java:51) >>>>>>>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >>>>>>>>> at org.apache.curator.framework.i >>>>>>>>> mps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>>>>>>> at org.apache.curator.framework.i >>>>>>>>> mps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>>>>>>> at org.apache.curator.RetryLoop.c >>>>>>>>> allWithRetry(RetryLoop.java:108) >>>>>>>>> at org.apache.curator.framework.i >>>>>>>>> mps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl. >>>>>>>>> java:295) >>>>>>>>> at org.apache.curator.framework.i >>>>>>>>> mps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>>>>>>> at org.apache.curator.framework.i >>>>>>>>> mps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>>>>>>> at test.Test.main(Test.java:25) >>>>>>>>> >>>>>>>>> >>>>>>>>> That seems to suggest that adding 1 host to check somehow messes >>>>>>>>> up something in the timeouts. >>>>>>>>> I am suspicious of those: >>>>>>>>> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >>>>>>>>> They seem to be "blocking" the existing connection we have to the >>>>>>>>> healthy ZK server, which goes stale. Is that possible? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> I did not use the zkCli tool but the ZK java API directly. And >>>>>>>>>> other tools like ZooInspector (contrib). They were run from a remote >>>>>>>>>> machine. >>>>>>>>>> I ran it on the same machine and the one read only server is >>>>>>>>>> running properly, I can browse it using those methods. >>>>>>>>>> >>>>>>>>>> Trying to make sense of those client-server interactions is >>>>>>>>>> difficult but here is what I got: >>>>>>>>>> >>>>>>>>>> Client: >>>>>>>>>> 2016-07-12 10:55:01,508 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>> - *Opening socket connection to server QA-E8WIN11*/ >>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using SASL >>>>>>>>>> (unknown error) >>>>>>>>>> 2016-07-12 10:55:01,512 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>> - *Socket connection established*, initiating session, client: / >>>>>>>>>> 10.11.12.4:60586, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>> 2016-07-12 10:55:01,521 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>> - *Session establishment complete on server QA-E8WIN11*/ >>>>>>>>>> 10.11.8.236:2181, *sessionid = 0x10003c8af640007*, negotiated >>>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>>> 2016-07-12 10:55:01,523 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>>> Timeout 200 >>>>>>>>>> 2016-07-12 10:55:01,528 INFO org.apache.curator.framework. >>>>>>>>>> state.ConnectionStateManager:228 - *State change: READ_ONLY* >>>>>>>>>> 2016-07-12 10:57:08,800 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>> - *Unable to read additional data from server sessionid >>>>>>>>>> 0x10003c8af640007, likely server has closed socket, closing socket >>>>>>>>>> connection and attempting reconnect* >>>>>>>>>> 2016-07-12 10:57:08,903 INFO org.apache.curator.framework. >>>>>>>>>> state.ConnectionStateManager:228 - *State change: SUSPENDED* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Server: >>>>>>>>>> 2016-07-12T10:55:56,468 [myid:] - INFO [ProcessThread(sid:1 >>>>>>>>>> cport:-1)::ZooKeeperServer@678] - *Established session >>>>>>>>>> 0x10003c8af640007 with negotiated timeout 45000* for client / >>>>>>>>>> 10.11.12.4:60586 >>>>>>>>>> (...) >>>>>>>>>> 2016-07-12T10:56:41,756 [myid:] - INFO >>>>>>>>>> [SessionTracker:ZooKeeperServer@384] - *Expiring session >>>>>>>>>> 0x10003c8af640007, timeout of 45000ms exceeded* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Note: Don't worry about the time being about 1 min off between >>>>>>>>>> the machines. >>>>>>>>>> >>>>>>>>>> What I don't understand is why we are losing the read-only >>>>>>>>>> connection we successfully established. ZK ro server says it timed >>>>>>>>>> out? >>>>>>>>>> >>>>>>>>>> Now interestingly, when I run it on the same machine it works and >>>>>>>>>> completes pretty quickly: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2016-07-12 10:43:49 INFO o.a.z.ZooKeeper:716 - Initiating client >>>>>>>>>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>>>>>>> sessionTimeout=45000 watcher=org.apache.curator.Con >>>>>>>>>> nectionState@e4487af >>>>>>>>>> 2016-07-12 10:43:49 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not >>>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>>> 2016-07-12 10:44:10 WARN o.a.z.ClientCnxn:1257 - Session 0x0 for >>>>>>>>>> server null, unexpected error, closing socket connection and >>>>>>>>>> attempting >>>>>>>>>> reconnect >>>>>>>>>> java.net.ConnectException: Connection timed out: no further >>>>>>>>>> information >>>>>>>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) >>>>>>>>>> ~[?:1.8.0_91] >>>>>>>>>> at >>>>>>>>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) >>>>>>>>>> ~[?:1.8.0_91] >>>>>>>>>> at >>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) >>>>>>>>>> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>>>>>>>>> at >>>>>>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) >>>>>>>>>> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>>>>>>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1138 - Opening socket >>>>>>>>>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not >>>>>>>>>> attempt to authenticate using SASL (unknown error) >>>>>>>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:980 - Socket >>>>>>>>>> connection established, initiating session, client: / >>>>>>>>>> 10.11.8.236:50147, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1400 - Session >>>>>>>>>> establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>>>>>> sessionid = 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY >>>>>>>>>> mode) >>>>>>>>>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>>>>>>>> 2016-07-12 10:44:11 INFO o.a.c.f.s.ConnectionStateManager:228 - >>>>>>>>>> State change: READ_ONLY >>>>>>>>>> 2016-07-12 10:44:32 INFO o.a.z.ClientCnxn:1297 - Checking server >>>>>>>>>> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400 >>>>>>>>>> Done. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> It looks like a connection issue based on the logs. When you >>>>>>>>>>> confirmed that the connection was ok via the zkCli, was that >>>>>>>>>>> running on the >>>>>>>>>>> same host as the read only server? Can you run it from the same >>>>>>>>>>> host that >>>>>>>>>>> you're running the Java client from? >>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Well I was trying to let it run and it eventually fails, with a >>>>>>>>>>>> stack: >>>>>>>>>>>> >>>>>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>>>>> 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 >>>>>>>>>>>> - Client session timed out, have not heard from server in 22519ms >>>>>>>>>>>> for >>>>>>>>>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>>>>>>>>> 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using SASL >>>>>>>>>>>> (unknown error) >>>>>>>>>>>> 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360080, negotiated >>>>>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>>>>> Timeout 200 >>>>>>>>>>>> 2016-07-11 16:28:22,066 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>> 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>> 0x100024f09360080, >>>>>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>>>>> attempting >>>>>>>>>>>> reconnect >>>>>>>>>>>> 2016-07-11 16:30:29,416 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: SUSPENDED >>>>>>>>>>>> 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using SASL >>>>>>>>>>>> (unknown error) >>>>>>>>>>>> 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360083, negotiated >>>>>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>>>>> Timeout 400 >>>>>>>>>>>> 2016-07-11 16:30:31,035 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>> 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>> 0x100024f09360083, >>>>>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>>>>> attempting >>>>>>>>>>>> reconnect >>>>>>>>>>>> 2016-07-11 16:32:38,441 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: SUSPENDED >>>>>>>>>>>> 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using SASL >>>>>>>>>>>> (unknown error) >>>>>>>>>>>> 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360087, negotiated >>>>>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>>>>> Timeout 800 >>>>>>>>>>>> 2016-07-11 16:32:40,428 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>> 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>> 0x100024f09360087, >>>>>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>>>>> attempting >>>>>>>>>>>> reconnect >>>>>>>>>>>> 2016-07-11 16:34:47,716 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: SUSPENDED >>>>>>>>>>>> 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using SASL >>>>>>>>>>>> (unknown error) >>>>>>>>>>>> 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360089, negotiated >>>>>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>>>>> Timeout 1600 >>>>>>>>>>>> 2016-07-11 16:34:49,560 INFO org.apache.curator.framework.s >>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>> 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>> 0x100024f09360089, >>>>>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>>>>> attempting >>>>>>>>>>>> reconnect >>>>>>>>>>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO >>>>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 >>>>>>>>>>>> - State change: SUSPENDED >>>>>>>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>>>> KeeperErrorCode = ConnectionLoss for / >>>>>>>>>>>> at org.apache.zookeeper.KeeperExc >>>>>>>>>>>> eption.create(KeeperException.java:99) >>>>>>>>>>>> at org.apache.zookeeper.KeeperExc >>>>>>>>>>>> eption.create(KeeperException.java:51) >>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper >>>>>>>>>>>> .getData(ZooKeeper.java:1753) >>>>>>>>>>>> at org.apache.curator.framework.i >>>>>>>>>>>> mps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>>>>>>>>>> at org.apache.curator.framework.i >>>>>>>>>>>> mps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>>>>>>>>>> at org.apache.curator.RetryLoop.c >>>>>>>>>>>> allWithRetry(RetryLoop.java:108) >>>>>>>>>>>> at org.apache.curator.framework.i >>>>>>>>>>>> mps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl. >>>>>>>>>>>> java:295) >>>>>>>>>>>> at org.apache.curator.framework.i >>>>>>>>>>>> mps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>>>>>>>>>> at org.apache.curator.framework.i >>>>>>>>>>>> mps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>>>>>>>>>> at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> So, it appears that the initial connection occurs ok, but then >>>>>>>>>>>>> something goes wrong >>>>>>>>>>>>> >>>>>>>>>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>>> 0x100024f0936005b, >>>>>>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>>>>>> attempting >>>>>>>>>>>>> reconnect >>>>>>>>>>>>> >>>>>>>>>>>>> Then a reconnection occurs ok, and then the cycle starts again. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you see where in the stack the getData() call is hanging? >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The blockUntilConnectedOrTimedOut() does return, we see the >>>>>>>>>>>>>> line: >>>>>>>>>>>>>> "Successfully established the connection with ZooKeeper" >>>>>>>>>>>>>> which happens right after. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The getData() is where it hangs. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We see this first: >>>>>>>>>>>>>> 2016-07-11 15:47:08,420 INFO org.apache.curator.framework.s >>>>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>> >>>>>>>>>>>>>> and then we see it again 2 minutes later: >>>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO org.apache.curator.framework.s >>>>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>> >>>>>>>>>>>>>> and then again 2 minutes after that: >>>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO org.apache.curator.framework.s >>>>>>>>>>>>>> tate.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then I stopped it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> hey Ben, >>>>>>>>>>>>>>> From the logs it looks like the connection is established at >>>>>>>>>>>>>>> the end? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Are you saying that the blockUntilConnectedOrTimedOut() >>>>>>>>>>>>>>> does not return when this occurs? >>>>>>>>>>>>>>> cheers >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ZK is 3.5.1-alpha >>>>>>>>>>>>>>>> Curator is 2.11.0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>>>>>>>>>>>>> 2 nodes are down, so the third one is in read-only >>>>>>>>>>>>>>>>> (verified by using the ZK API manually). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm using this piece of code: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Builder curatorClientBuilder = >>>>>>>>>>>>>>>>> CuratorFrameworkFactory.builder() >>>>>>>>>>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>>>>>>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>>>>>>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>>>>>>>>>>>>> client.start(); >>>>>>>>>>>>>>>>> client.getZookeeperClient().bl >>>>>>>>>>>>>>>>> ockUntilConnectedOrTimedOut(); >>>>>>>>>>>>>>>>> System.out.println("Successfully established the >>>>>>>>>>>>>>>>> connection with ZooKeeper"); >>>>>>>>>>>>>>>>> client.getData().forPath("/"); >>>>>>>>>>>>>>>>> System.out.println("Done."); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> When curator pick the host that is UP first, it goes >>>>>>>>>>>>>>>>> through very quickly. When it picks the host that is down >>>>>>>>>>>>>>>>> first >>>>>>>>>>>>>>>>> (QA-E8WIN12), it never goes through. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Am I missing something here? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Below is the log: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,733 INFO >>>>>>>>>>>>>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 >>>>>>>>>>>>>>>>> - Starting >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:zookeeper.version=3.5.1-alpha-1748895, >>>>>>>>>>>>>>>>> built on 06/17/2016 22:24 GMT >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,743 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:host.name=w-rli09-ben >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.version=1.8.0_11 >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.vendor=Oracle Corporation >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.home=/usr/loc >>>>>>>>>>>>>>>>> al/apps/jdk1.8.0_11/jre >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,744 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.class.path=(..) >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.io.tmpdir=/tmp >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,746 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:java.compiler=<NA> >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.name=Linux >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.arch=amd64 >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.version=3.8.0-44-generic >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:user.name=benji >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:user.home=(..) >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:user.dir=(..) >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,747 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.memory.free=385MB >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.memory.max=5358MB >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,748 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>>>>>>>>> - Client environment:os.memory.total=397MB >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,750 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ZooKeeper:716 >>>>>>>>>>>>>>>>> - Initiating client connection, >>>>>>>>>>>>>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>>>>>>>>>>>>>> sessionTimeout=45000 watcher=org.apache.curator.Con >>>>>>>>>>>>>>>>> nectionState@3e96bacf >>>>>>>>>>>>>>>>> 2016-07-11 15:46:45,769 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>>>>>>> - Opening socket connection to server QA-E8WIN12/ >>>>>>>>>>>>>>>>> 10.11.8.232:2181. Will not attempt to authenticate using >>>>>>>>>>>>>>>>> SASL (unknown error) >>>>>>>>>>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,300 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1251 >>>>>>>>>>>>>>>>> - Client session timed out, have not heard from server in >>>>>>>>>>>>>>>>> 22534ms for >>>>>>>>>>>>>>>>> sessionid 0x0, closing socket connection and attempting >>>>>>>>>>>>>>>>> reconnect >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,403 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using >>>>>>>>>>>>>>>>> SASL (unknown error) >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,406 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>>>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,413 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005b, >>>>>>>>>>>>>>>>> negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,415 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being >>>>>>>>>>>>>>>>> r/w. Timeout 200 >>>>>>>>>>>>>>>>> 2016-07-11 15:47:08,420 INFO org.apache.curator.framework. >>>>>>>>>>>>>>>>> state.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>>>>> 2016-07-11 15:49:15,712 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>>>>>>> 0x100024f0936005b, >>>>>>>>>>>>>>>>> likely server has closed socket, closing socket connection >>>>>>>>>>>>>>>>> and attempting >>>>>>>>>>>>>>>>> reconnect >>>>>>>>>>>>>>>>> 2016-07-11 15:49:15,815 INFO org.apache.curator.framework. >>>>>>>>>>>>>>>>> state.ConnectionStateManager:228 - State change: SUSPENDED >>>>>>>>>>>>>>>>> 2016-07-11 15:49:17,313 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using >>>>>>>>>>>>>>>>> SASL (unknown error) >>>>>>>>>>>>>>>>> 2016-07-11 15:49:17,314 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>>>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005f, >>>>>>>>>>>>>>>>> negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being >>>>>>>>>>>>>>>>> r/w. Timeout 400 >>>>>>>>>>>>>>>>> 2016-07-11 15:49:17,316 INFO org.apache.curator.framework. >>>>>>>>>>>>>>>>> state.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>>>>> 2016-07-11 15:51:24,607 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>>>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>>>>>>>>>> 0x100024f0936005f, >>>>>>>>>>>>>>>>> likely server has closed socket, closing socket connection >>>>>>>>>>>>>>>>> and attempting >>>>>>>>>>>>>>>>> reconnect >>>>>>>>>>>>>>>>> 2016-07-11 15:51:24,713 INFO org.apache.curator.framework. >>>>>>>>>>>>>>>>> state.ConnectionStateManager:228 - State change: SUSPENDED >>>>>>>>>>>>>>>>> 2016-07-11 15:51:26,467 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>>>>>>>>>> - Opening socket connection to server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181. Will not attempt to authenticate using >>>>>>>>>>>>>>>>> SASL (unknown error) >>>>>>>>>>>>>>>>> 2016-07-11 15:51:26,468 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>>>>>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>>>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>>>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360061, >>>>>>>>>>>>>>>>> negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>>>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being >>>>>>>>>>>>>>>>> r/w. Timeout 800 >>>>>>>>>>>>>>>>> 2016-07-11 15:51:26,470 INFO org.apache.curator.framework. >>>>>>>>>>>>>>>>> state.ConnectionStateManager:228 - State change: READ_ONLY >>>>>>>>>>>>>>>>> (...) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>> Ben >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >> > >
