Are the 'dead' hosts reachable? I seem to recall having problems with the ZK client blocking for a long time if you had a host configured that was resolvable but not reachable.
On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <[email protected]> wrote: > More to the story: > > While it consistently works on the same machine than the sole running ZK > node when I specify only 2 of the 3 servers in the connection string, > it doesn't work when I specify 3 hosts: > > > 2016-07-12 11:23:46 INFO o.a.z.ZooKeeper:716 - Initiating client > connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 > sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3 > 2016-07-12 11:23:46 INFO o.a.z.ClientCnxn:1138 - Opening socket > connection to server QA-E8WIN13/10.11.8.230:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1251 - Client session timed > out, have not heard from server in 15018ms for sessionid 0x0, closing > socket connection and attempting reconnect > Successfully established the connection with ZooKeeper > 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1138 - Opening socket > connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1251 - Client session timed > out, have not heard from server in 15000ms for sessionid 0x0, closing > socket connection and attempting reconnect > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1138 - Opening socket > connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50423, server: > QA-E8WIN11/10.11.8.236:2181 > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = > 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200 > 2016-07-12 11:24:16 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:24:37 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 > 2016-07-12 11:24:58 INFO o.a.z.ClientCnxn:1251 - Client session timed > out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b, > closing socket connection and attempting reconnect > 2016-07-12 11:24:58 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1138 - Opening socket > connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50428, server: > QA-E8WIN11/10.11.8.236:2181 > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = > 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800 > 2016-07-12 11:25:00 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:25:21 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 > 2016-07-12 11:25:42 INFO o.a.z.ClientCnxn:1251 - Client session timed > out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c, > closing socket connection and attempting reconnect > 2016-07-12 11:25:42 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1138 - Opening socket > connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to > authenticate using SASL (unknown error) > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:980 - Socket connection > established, initiating session, client: /10.11.8.236:50435, server: > QA-E8WIN11/10.11.8.236:2181 > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1400 - Session establishment > complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = > 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 > 2016-07-12 11:25:44 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-12 11:26:05 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400 > 2016-07-12 11:26:26 INFO o.a.z.ClientCnxn:1251 - Client session timed > out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d, > closing socket connection and attempting reconnect > 2016-07-12 11:26:26 INFO o.a.c.f.s.ConnectionStateManager:228 - State > change: SUSPENDED > Exception in thread "main" > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at test.Test.main(Test.java:25) > > > That seems to suggest that adding 1 host to check somehow messes up > something in the timeouts. > I am suspicious of those: > 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server > QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 > They seem to be "blocking" the existing connection we have to the healthy > ZK server, which goes stale. Is that possible? > > > On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <[email protected]> > wrote: > >> I did not use the zkCli tool but the ZK java API directly. And other >> tools like ZooInspector (contrib). They were run from a remote machine. >> I ran it on the same machine and the one read only server is running >> properly, I can browse it using those methods. >> >> Trying to make sense of those client-server interactions is difficult but >> here is what I got: >> >> Client: >> 2016-07-12 10:55:01,508 INFO org.apache.zookeeper.ClientCnxn:1138 - *Opening >> socket connection to server QA-E8WIN11*/10.11.8.236:2181. Will not >> attempt to authenticate using SASL (unknown error) >> 2016-07-12 10:55:01,512 INFO org.apache.zookeeper.ClientCnxn:980 - *Socket >> connection established*, initiating session, client: /10.11.12.4:60586, >> server: QA-E8WIN11/10.11.8.236:2181 >> 2016-07-12 10:55:01,521 INFO org.apache.zookeeper.ClientCnxn:1400 - *Session >> establishment complete on server QA-E8WIN11*/10.11.8.236:2181, *sessionid >> = 0x10003c8af640007*, negotiated timeout = 45000 (READ-ONLY mode) >> 2016-07-12 10:55:01,523 INFO org.apache.zookeeper.ClientCnxn:1297 - >> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >> 2016-07-12 10:55:01,528 INFO >> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >> change: READ_ONLY* >> 2016-07-12 10:57:08,800 INFO org.apache.zookeeper.ClientCnxn:1253 - *Unable >> to read additional data from server sessionid 0x10003c8af640007, likely >> server has closed socket, closing socket connection and attempting >> reconnect* >> 2016-07-12 10:57:08,903 INFO >> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >> change: SUSPENDED* >> >> >> Server: >> 2016-07-12T10:55:56,468 [myid:] - INFO [ProcessThread(sid:1 >> cport:-1)::ZooKeeperServer@678] - *Established session 0x10003c8af640007 >> with negotiated timeout 45000* for client /10.11.12.4:60586 >> (...) >> 2016-07-12T10:56:41,756 [myid:] - INFO >> [SessionTracker:ZooKeeperServer@384] - *Expiring session >> 0x10003c8af640007, timeout of 45000ms exceeded* >> >> >> Note: Don't worry about the time being about 1 min off between the >> machines. >> >> What I don't understand is why we are losing the read-only connection we >> successfully established. ZK ro server says it timed out? >> >> Now interestingly, when I run it on the same machine it works and >> completes pretty quickly: >> >> >> 2016-07-12 10:43:49 INFO o.a.z.ZooKeeper:716 - Initiating client >> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af >> 2016-07-12 10:43:49 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to >> authenticate using SASL (unknown error) >> Successfully established the connection with ZooKeeper >> 2016-07-12 10:44:10 WARN o.a.z.ClientCnxn:1257 - Session 0x0 for server >> null, unexpected error, closing socket connection and attempting reconnect >> java.net.ConnectException: Connection timed out: no further information >> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_91] >> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) >> ~[?:1.8.0_91] >> at >> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) >> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) >> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:980 - Socket connection >> established, initiating session, client: /10.11.8.236:50147, server: >> QA-E8WIN11/10.11.8.236:2181 >> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1400 - Session establishment >> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >> 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode) >> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >> 2016-07-12 10:44:11 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: READ_ONLY >> 2016-07-12 10:44:32 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400 >> Done. >> >> >> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie <[email protected] >> > wrote: >> >>> It looks like a connection issue based on the logs. When you confirmed >>> that the connection was ok via the zkCli, was that running on the same host >>> as the read only server? Can you run it from the same host that you're >>> running the Java client from? >>> >>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton < >>> [email protected]> wrote: >>> >>>> Well I was trying to let it run and it eventually fails, with a stack: >>>> >>>> Successfully established the connection with ZooKeeper >>>> 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>> Client session timed out, have not heard from server in 22519ms for >>>> sessionid 0x0, closing socket connection and attempting reconnect >>>> 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>> 2016-07-11 16:28:22,066 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f09360080, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> 2016-07-11 16:30:29,416 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>> 2016-07-11 16:30:31,035 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f09360083, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> 2016-07-11 16:32:38,441 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 >>>> 2016-07-11 16:32:40,428 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f09360087, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> 2016-07-11 16:34:47,716 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 >>>> 2016-07-11 16:34:49,560 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f09360089, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>> KeeperErrorCode = ConnectionLoss for / >>>> at >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>> at >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>> at >>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>> at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) >>>> >>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie < >>>> [email protected]> wrote: >>>> >>>>> So, it appears that the initial connection occurs ok, but then >>>>> something goes wrong >>>>> >>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> >>>>> Then a reconnection occurs ok, and then the cycle starts again. >>>>> >>>>> Can you see where in the stack the getData() call is hanging? >>>>> >>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton < >>>>> [email protected]> wrote: >>>>> >>>>>> The blockUntilConnectedOrTimedOut() does return, we see the line: >>>>>> "Successfully established the connection with ZooKeeper" >>>>>> which happens right after. >>>>>> >>>>>> The getData() is where it hangs. >>>>>> >>>>>> We see this first: >>>>>> 2016-07-11 15:47:08,420 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> >>>>>> and then we see it again 2 minutes later: >>>>>> 2016-07-11 15:49:17,316 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> >>>>>> and then again 2 minutes after that: >>>>>> 2016-07-11 15:51:26,470 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> >>>>>> Then I stopped it. >>>>>> >>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> hey Ben, >>>>>>> From the logs it looks like the connection is established at the end? >>>>>>> >>>>>>> Are you saying that the blockUntilConnectedOrTimedOut() does not >>>>>>> return when this occurs? >>>>>>> cheers >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> ZK is 3.5.1-alpha >>>>>>>> Curator is 2.11.0 >>>>>>>> >>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>>>>> 2 nodes are down, so the third one is in read-only (verified by >>>>>>>>> using the ZK API manually). >>>>>>>>> >>>>>>>>> I'm using this piece of code: >>>>>>>>> >>>>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>>>>> >>>>>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>>>>> client.start(); >>>>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>>>>>> System.out.println("Successfully established the connection with >>>>>>>>> ZooKeeper"); >>>>>>>>> client.getData().forPath("/"); >>>>>>>>> System.out.println("Done."); >>>>>>>>> >>>>>>>>> When curator pick the host that is UP first, it goes through very >>>>>>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it >>>>>>>>> never >>>>>>>>> goes through. >>>>>>>>> >>>>>>>>> Am I missing something here? >>>>>>>>> >>>>>>>>> Below is the log: >>>>>>>>> >>>>>>>>> 2016-07-11 15:46:45,733 INFO >>>>>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>>>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>>>>>>> 06/17/2016 22:24 GMT >>>>>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:host.name=w-rli09-ben >>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.version=1.8.0_11 >>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.vendor=Oracle Corporation >>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.class.path=(..) >>>>>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.io.tmpdir=/tmp >>>>>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:java.compiler=<NA> >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.name=Linux >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.arch=amd64 >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.version=3.8.0-44-generic >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:user.name=benji >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:user.home=(..) >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:user.dir=(..) >>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.memory.free=385MB >>>>>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.memory.max=5358MB >>>>>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>>>>> Client environment:os.memory.total=397MB >>>>>>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - >>>>>>>>> Initiating client connection, >>>>>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>>>>>> sessionTimeout=45000 >>>>>>>>> watcher=org.apache.curator.ConnectionState@3e96bacf >>>>>>>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>> - Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. >>>>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 >>>>>>>>> - Client session timed out, have not heard from server in 22534ms for >>>>>>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>>>>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005b, negotiated >>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>> Timeout 200 >>>>>>>>> 2016-07-11 15:47:08,420 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>> 0x100024f0936005b, >>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>> attempting >>>>>>>>> reconnect >>>>>>>>> 2016-07-11 15:49:15,815 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: SUSPENDED >>>>>>>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f0936005f, negotiated >>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>> Timeout 400 >>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 >>>>>>>>> - Unable to read additional data from server sessionid >>>>>>>>> 0x100024f0936005f, >>>>>>>>> likely server has closed socket, closing socket connection and >>>>>>>>> attempting >>>>>>>>> reconnect >>>>>>>>> 2016-07-11 15:51:24,713 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: SUSPENDED >>>>>>>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 >>>>>>>>> - Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 >>>>>>>>> - Session establishment complete on server QA-E8WIN11/ >>>>>>>>> 10.11.8.236:2181, sessionid = 0x100024f09360061, negotiated >>>>>>>>> timeout = 45000 (READ-ONLY mode) >>>>>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 >>>>>>>>> - Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. >>>>>>>>> Timeout 800 >>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>>>> change: READ_ONLY >>>>>>>>> (...) >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Ben >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
