Yes, I mean the ping goes through properly On Tue, Jul 12, 2016 at 2:50 PM, Cameron McKenzie <[email protected]> wrote:
> Are the 'dead' hosts reachable? I seem to recall having problems with the > ZK client blocking for a long time if you had a host configured that was > resolvable but not reachable. > > On Wed, Jul 13, 2016 at 4:30 AM, Benjamin Jaton <[email protected]> > wrote: > >> More to the story: >> >> While it consistently works on the same machine than the sole running ZK >> node when I specify only 2 of the 3 servers in the connection string, >> it doesn't work when I specify 3 hosts: >> >> >> 2016-07-12 11:23:46 INFO o.a.z.ZooKeeper:716 - Initiating client >> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181,QA-E8WIN13:2181 >> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@c86b9e3 >> 2016-07-12 11:23:46 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN13/10.11.8.230:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1251 - Client session timed >> out, have not heard from server in 15018ms for sessionid 0x0, closing >> socket connection and attempting reconnect >> Successfully established the connection with ZooKeeper >> 2016-07-12 11:24:01 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1251 - Client session timed >> out, have not heard from server in 15000ms for sessionid 0x0, closing >> socket connection and attempting reconnect >> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:980 - Socket connection >> established, initiating session, client: /10.11.8.236:50423, server: >> QA-E8WIN11/10.11.8.236:2181 >> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1400 - Session establishment >> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >> 0x10003c8af64001b, negotiated timeout = 45000 (READ-ONLY mode) >> 2016-07-12 11:24:16 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 200 >> 2016-07-12 11:24:16 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: READ_ONLY >> 2016-07-12 11:24:37 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >> 2016-07-12 11:24:58 INFO o.a.z.ClientCnxn:1251 - Client session timed >> out, have not heard from server in 42023ms for sessionid 0x10003c8af64001b, >> closing socket connection and attempting reconnect >> 2016-07-12 11:24:58 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: SUSPENDED >> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:980 - Socket connection >> established, initiating session, client: /10.11.8.236:50428, server: >> QA-E8WIN11/10.11.8.236:2181 >> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1400 - Session establishment >> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >> 0x10003c8af64001c, negotiated timeout = 45000 (READ-ONLY mode) >> 2016-07-12 11:25:00 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 800 >> 2016-07-12 11:25:00 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: READ_ONLY >> 2016-07-12 11:25:21 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 >> 2016-07-12 11:25:42 INFO o.a.z.ClientCnxn:1251 - Client session timed >> out, have not heard from server in 42011ms for sessionid 0x10003c8af64001c, >> closing socket connection and attempting reconnect >> 2016-07-12 11:25:42 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: SUSPENDED >> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1138 - Opening socket >> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >> authenticate using SASL (unknown error) >> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:980 - Socket connection >> established, initiating session, client: /10.11.8.236:50435, server: >> QA-E8WIN11/10.11.8.236:2181 >> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1400 - Session establishment >> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >> 0x10003c8af64001d, negotiated timeout = 45000 (READ-ONLY mode) >> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >> 2016-07-12 11:25:44 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: READ_ONLY >> 2016-07-12 11:26:05 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 6400 >> 2016-07-12 11:26:26 INFO o.a.z.ClientCnxn:1251 - Client session timed >> out, have not heard from server in 42003ms for sessionid 0x10003c8af64001d, >> closing socket connection and attempting reconnect >> 2016-07-12 11:26:26 INFO o.a.c.f.s.ConnectionStateManager:228 - State >> change: SUSPENDED >> Exception in thread "main" >> org.apache.zookeeper.KeeperException$ConnectionLossException: >> KeeperErrorCode = ConnectionLoss for / >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >> at >> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >> at >> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >> at >> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) >> at >> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >> at >> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >> at test.Test.main(Test.java:25) >> >> >> That seems to suggest that adding 1 host to check somehow messes up >> something in the timeouts. >> I am suspicious of those: >> 2016-07-12 11:25:44 INFO o.a.z.ClientCnxn:1297 - Checking server >> QA-E8WIN13/10.11.8.230:2181 for being r/w. Timeout 3200 >> They seem to be "blocking" the existing connection we have to the healthy >> ZK server, which goes stale. Is that possible? >> >> >> On Tue, Jul 12, 2016 at 11:17 AM, Benjamin Jaton <[email protected] >> > wrote: >> >>> I did not use the zkCli tool but the ZK java API directly. And other >>> tools like ZooInspector (contrib). They were run from a remote machine. >>> I ran it on the same machine and the one read only server is running >>> properly, I can browse it using those methods. >>> >>> Trying to make sense of those client-server interactions is difficult >>> but here is what I got: >>> >>> Client: >>> 2016-07-12 10:55:01,508 INFO org.apache.zookeeper.ClientCnxn:1138 - >>> *Opening >>> socket connection to server QA-E8WIN11*/10.11.8.236:2181. Will not >>> attempt to authenticate using SASL (unknown error) >>> 2016-07-12 10:55:01,512 INFO org.apache.zookeeper.ClientCnxn:980 - *Socket >>> connection established*, initiating session, client: /10.11.12.4:60586, >>> server: QA-E8WIN11/10.11.8.236:2181 >>> 2016-07-12 10:55:01,521 INFO org.apache.zookeeper.ClientCnxn:1400 - >>> *Session >>> establishment complete on server QA-E8WIN11*/10.11.8.236:2181, *sessionid >>> = 0x10003c8af640007*, negotiated timeout = 45000 (READ-ONLY mode) >>> 2016-07-12 10:55:01,523 INFO org.apache.zookeeper.ClientCnxn:1297 - >>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>> 2016-07-12 10:55:01,528 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >>> change: READ_ONLY* >>> 2016-07-12 10:57:08,800 INFO org.apache.zookeeper.ClientCnxn:1253 - *Unable >>> to read additional data from server sessionid 0x10003c8af640007, likely >>> server has closed socket, closing socket connection and attempting >>> reconnect* >>> 2016-07-12 10:57:08,903 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - *State >>> change: SUSPENDED* >>> >>> >>> Server: >>> 2016-07-12T10:55:56,468 [myid:] - INFO [ProcessThread(sid:1 >>> cport:-1)::ZooKeeperServer@678] - *Established session >>> 0x10003c8af640007 with negotiated timeout 45000* for client / >>> 10.11.12.4:60586 >>> (...) >>> 2016-07-12T10:56:41,756 [myid:] - INFO >>> [SessionTracker:ZooKeeperServer@384] - *Expiring session >>> 0x10003c8af640007, timeout of 45000ms exceeded* >>> >>> >>> Note: Don't worry about the time being about 1 min off between the >>> machines. >>> >>> What I don't understand is why we are losing the read-only connection we >>> successfully established. ZK ro server says it timed out? >>> >>> Now interestingly, when I run it on the same machine it works and >>> completes pretty quickly: >>> >>> >>> 2016-07-12 10:43:49 INFO o.a.z.ZooKeeper:716 - Initiating client >>> connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@e4487af >>> 2016-07-12 10:43:49 INFO o.a.z.ClientCnxn:1138 - Opening socket >>> connection to server QA-E8WIN12/10.11.8.232:2181. Will not attempt to >>> authenticate using SASL (unknown error) >>> Successfully established the connection with ZooKeeper >>> 2016-07-12 10:44:10 WARN o.a.z.ClientCnxn:1257 - Session 0x0 for server >>> null, unexpected error, closing socket connection and attempting reconnect >>> java.net.ConnectException: Connection timed out: no further information >>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_91] >>> at >>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) >>> ~[?:1.8.0_91] >>> at >>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355) >>> ~[zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236) >>> [zookeeper-3.5.0-alpha.jar:3.5.2-alpha-SNAPSHOT-1702518] >>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1138 - Opening socket >>> connection to server QA-E8WIN11/10.11.8.236:2181. Will not attempt to >>> authenticate using SASL (unknown error) >>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:980 - Socket connection >>> established, initiating session, client: /10.11.8.236:50147, server: >>> QA-E8WIN11/10.11.8.236:2181 >>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1400 - Session establishment >>> complete on server QA-E8WIN11/10.11.8.236:2181, sessionid = >>> 0x10003b315b30012, negotiated timeout = 45000 (READ-ONLY mode) >>> 2016-07-12 10:44:11 INFO o.a.z.ClientCnxn:1297 - Checking server >>> QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>> 2016-07-12 10:44:11 INFO o.a.c.f.s.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> 2016-07-12 10:44:32 INFO o.a.z.ClientCnxn:1297 - Checking server >>> QA-E8WIN11/10.11.8.236:2181 for being r/w. Timeout 400 >>> Done. >>> >>> >>> On Mon, Jul 11, 2016 at 4:42 PM, Cameron McKenzie < >>> [email protected]> wrote: >>> >>>> It looks like a connection issue based on the logs. When you confirmed >>>> that the connection was ok via the zkCli, was that running on the same host >>>> as the read only server? Can you run it from the same host that you're >>>> running the Java client from? >>>> >>>> On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton < >>>> [email protected]> wrote: >>>> >>>>> Well I was trying to let it run and it eventually fails, with a stack: >>>>> >>>>> Successfully established the connection with ZooKeeper >>>>> 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>>> Client session timed out, have not heard from server in 22519ms for >>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>> 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:54702, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>>> 2016-07-11 16:28:22,066 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f09360080, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> 2016-07-11 16:30:29,416 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:55156, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>>> 2016-07-11 16:30:31,035 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f09360083, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> 2016-07-11 16:32:38,441 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:55637, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 >>>>> 2016-07-11 16:32:40,428 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f09360087, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> 2016-07-11 16:34:47,716 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>>> not attempt to authenticate using SASL (unknown error) >>>>> 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>> Socket connection established, initiating session, client: / >>>>> 10.11.12.4:56094, server: QA-E8WIN11/10.11.8.236:2181 >>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>> sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) >>>>> 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>> 1600 >>>>> 2016-07-11 16:34:49,560 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: READ_ONLY >>>>> 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>> Unable to read additional data from server sessionid 0x100024f09360089, >>>>> likely server has closed socket, closing socket connection and attempting >>>>> reconnect >>>>> Exception in thread "main" 2016-07-11 16:36:56,996 INFO >>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>> change: SUSPENDED >>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>> KeeperErrorCode = ConnectionLoss for / >>>>> at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>>> at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) >>>>> at >>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) >>>>> at >>>>> org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) >>>>> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) >>>>> at >>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) >>>>> at >>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) >>>>> at >>>>> org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) >>>>> at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) >>>>> >>>>> On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie < >>>>> [email protected]> wrote: >>>>> >>>>>> So, it appears that the initial connection occurs ok, but then >>>>>> something goes wrong >>>>>> >>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>>>> likely server has closed socket, closing socket connection and attempting >>>>>> reconnect >>>>>> >>>>>> Then a reconnection occurs ok, and then the cycle starts again. >>>>>> >>>>>> Can you see where in the stack the getData() call is hanging? >>>>>> >>>>>> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> The blockUntilConnectedOrTimedOut() does return, we see the line: >>>>>>> "Successfully established the connection with ZooKeeper" >>>>>>> which happens right after. >>>>>>> >>>>>>> The getData() is where it hangs. >>>>>>> >>>>>>> We see this first: >>>>>>> 2016-07-11 15:47:08,420 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> >>>>>>> and then we see it again 2 minutes later: >>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> >>>>>>> and then again 2 minutes after that: >>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>>> change: READ_ONLY >>>>>>> >>>>>>> Then I stopped it. >>>>>>> >>>>>>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> hey Ben, >>>>>>>> From the logs it looks like the connection is established at the >>>>>>>> end? >>>>>>>> >>>>>>>> Are you saying that the blockUntilConnectedOrTimedOut() does not >>>>>>>> return when this occurs? >>>>>>>> cheers >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> ZK is 3.5.1-alpha >>>>>>>>> Curator is 2.11.0 >>>>>>>>> >>>>>>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>>>>>> 2 nodes are down, so the third one is in read-only (verified by >>>>>>>>>> using the ZK API manually). >>>>>>>>>> >>>>>>>>>> I'm using this piece of code: >>>>>>>>>> >>>>>>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>>>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>>>>>> >>>>>>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>>>>>> client.start(); >>>>>>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>>>>>>> System.out.println("Successfully established the connection with >>>>>>>>>> ZooKeeper"); >>>>>>>>>> client.getData().forPath("/"); >>>>>>>>>> System.out.println("Done."); >>>>>>>>>> >>>>>>>>>> When curator pick the host that is UP first, it goes through very >>>>>>>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it >>>>>>>>>> never >>>>>>>>>> goes through. >>>>>>>>>> >>>>>>>>>> Am I missing something here? >>>>>>>>>> >>>>>>>>>> Below is the log: >>>>>>>>>> >>>>>>>>>> 2016-07-11 15:46:45,733 INFO >>>>>>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - >>>>>>>>>> Starting >>>>>>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>>>>>>>> 06/17/2016 22:24 GMT >>>>>>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:host.name=w-rli09-ben >>>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.version=1.8.0_11 >>>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.vendor=Oracle Corporation >>>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>>>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.class.path=(..) >>>>>>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.io.tmpdir=/tmp >>>>>>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:java.compiler=<NA> >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.name=Linux >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.arch=amd64 >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.version=3.8.0-44-generic >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:user.name=benji >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:user.home=(..) >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:user.dir=(..) >>>>>>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.memory.free=385MB >>>>>>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.memory.max=5358MB >>>>>>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 >>>>>>>>>> - Client environment:os.memory.total=397MB >>>>>>>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 >>>>>>>>>> - Initiating client connection, >>>>>>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 sessionTimeout=45000 >>>>>>>>>> watcher=org.apache.curator.ConnectionState@3e96bacf >>>>>>>>>> 2016-07-11 15:46:45,769 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to >>>>>>>>>> server >>>>>>>>>> QA-E8WIN12/10.11.8.232:2181. Will not attempt to authenticate >>>>>>>>>> using SASL (unknown error) >>>>>>>>>> Successfully established the connection with ZooKeeper >>>>>>>>>> 2016-07-11 15:47:08,300 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1251 - Client session timed out, >>>>>>>>>> have not >>>>>>>>>> heard from server in 22534ms for sessionid 0x0, closing socket >>>>>>>>>> connection >>>>>>>>>> and attempting reconnect >>>>>>>>>> 2016-07-11 15:47:08,403 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to >>>>>>>>>> server >>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>> using SASL (unknown error) >>>>>>>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>> 2016-07-11 15:47:08,413 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment >>>>>>>>>> complete on >>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>> 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>> 2016-07-11 15:47:08,415 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 200 >>>>>>>>>> 2016-07-11 15:47:08,420 INFO >>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - >>>>>>>>>> State >>>>>>>>>> change: READ_ONLY >>>>>>>>>> 2016-07-11 15:49:15,712 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional >>>>>>>>>> data from >>>>>>>>>> server sessionid 0x100024f0936005b, likely server has closed socket, >>>>>>>>>> closing socket connection and attempting reconnect >>>>>>>>>> 2016-07-11 15:49:15,815 INFO >>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - >>>>>>>>>> State >>>>>>>>>> change: SUSPENDED >>>>>>>>>> 2016-07-11 15:49:17,313 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to >>>>>>>>>> server >>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>> using SASL (unknown error) >>>>>>>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment >>>>>>>>>> complete on >>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>> 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 400 >>>>>>>>>> 2016-07-11 15:49:17,316 INFO >>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - >>>>>>>>>> State >>>>>>>>>> change: READ_ONLY >>>>>>>>>> 2016-07-11 15:51:24,607 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional >>>>>>>>>> data from >>>>>>>>>> server sessionid 0x100024f0936005f, likely server has closed socket, >>>>>>>>>> closing socket connection and attempting reconnect >>>>>>>>>> 2016-07-11 15:51:24,713 INFO >>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - >>>>>>>>>> State >>>>>>>>>> change: SUSPENDED >>>>>>>>>> 2016-07-11 15:51:26,467 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1138 - Opening socket connection to >>>>>>>>>> server >>>>>>>>>> QA-E8WIN11/10.11.8.236:2181. Will not attempt to authenticate >>>>>>>>>> using SASL (unknown error) >>>>>>>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 >>>>>>>>>> - Socket connection established, initiating session, client: / >>>>>>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1400 - Session establishment >>>>>>>>>> complete on >>>>>>>>>> server QA-E8WIN11/10.11.8.236:2181, sessionid = >>>>>>>>>> 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode) >>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>> org.apache.zookeeper.ClientCnxn:1297 - Checking server QA-E8WIN12/ >>>>>>>>>> 10.11.8.232:2181 for being r/w. Timeout 800 >>>>>>>>>> 2016-07-11 15:51:26,470 INFO >>>>>>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - >>>>>>>>>> State >>>>>>>>>> change: READ_ONLY >>>>>>>>>> (...) >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Ben >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
