It looks like a connection issue based on the logs. When you confirmed that the connection was ok via the zkCli, was that running on the same host as the read only server? Can you run it from the same host that you're running the Java client from?
On Tue, Jul 12, 2016 at 9:40 AM, Benjamin Jaton <[email protected]> wrote: > Well I was trying to let it run and it eventually fails, with a stack: > > Successfully established the connection with ZooKeeper > 2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - > Client session timed out, have not heard from server in 22519ms for > sessionid 0x0, closing socket connection and attempting reconnect > 2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:54702, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 > 2016-07-11 16:28:22,066 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360080, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:30:29,416 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55156, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 > 2016-07-11 16:30:31,035 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360083, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:32:38,441 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:55637, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 > 2016-07-11 16:32:40,428 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360087, > likely server has closed socket, closing socket connection and attempting > reconnect > 2016-07-11 16:34:47,716 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > 2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 - > Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not > attempt to authenticate using SASL (unknown error) > 2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - Socket > connection established, initiating session, client: /10.11.12.4:56094, > server: QA-E8WIN11/10.11.8.236:2181 > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 - > Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, > sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode) > 2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 - > Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600 > 2016-07-11 16:34:49,560 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > 2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - > Unable to read additional data from server sessionid 0x100024f09360089, > likely server has closed socket, closing socket connection and attempting > reconnect > Exception in thread "main" 2016-07-11 16:36:56,996 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: SUSPENDED > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310) > at > org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287) > at > org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34) > at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23) > > On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <[email protected]> > wrote: > >> So, it appears that the initial connection occurs ok, but then something >> goes wrong >> >> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >> Unable to read additional data from server sessionid 0x100024f0936005b, >> likely server has closed socket, closing socket connection and attempting >> reconnect >> >> Then a reconnection occurs ok, and then the cycle starts again. >> >> Can you see where in the stack the getData() call is hanging? >> >> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <[email protected] >> > wrote: >> >>> The blockUntilConnectedOrTimedOut() does return, we see the line: >>> "Successfully established the connection with ZooKeeper" >>> which happens right after. >>> >>> The getData() is where it hangs. >>> >>> We see this first: >>> 2016-07-11 15:47:08,420 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> and then we see it again 2 minutes later: >>> 2016-07-11 15:49:17,316 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> and then again 2 minutes after that: >>> 2016-07-11 15:51:26,470 INFO >>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>> change: READ_ONLY >>> >>> Then I stopped it. >>> >>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie < >>> [email protected]> wrote: >>> >>>> hey Ben, >>>> From the logs it looks like the connection is established at the end? >>>> >>>> Are you saying that the blockUntilConnectedOrTimedOut() does not return >>>> when this occurs? >>>> cheers >>>> >>>> >>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton < >>>> [email protected]> wrote: >>>> >>>>> ZK is 3.5.1-alpha >>>>> Curator is 2.11.0 >>>>> >>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton < >>>>> [email protected]> wrote: >>>>> >>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>>>> 2 nodes are down, so the third one is in read-only (verified by using >>>>>> the ZK API manually). >>>>>> >>>>>> I'm using this piece of code: >>>>>> >>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>>>> >>>>>> CuratorFramework client = curatorClientBuilder.build(); >>>>>> client.start(); >>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>>>> System.out.println("Successfully established the connection with >>>>>> ZooKeeper"); >>>>>> client.getData().forPath("/"); >>>>>> System.out.println("Done."); >>>>>> >>>>>> When curator pick the host that is UP first, it goes through very >>>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it never >>>>>> goes through. >>>>>> >>>>>> Am I missing something here? >>>>>> >>>>>> Below is the log: >>>>>> >>>>>> 2016-07-11 15:46:45,733 INFO >>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>>>> 06/17/2016 22:24 GMT >>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:host.name=w-rli09-ben >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.version=1.8.0_11 >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.vendor=Oracle Corporation >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.class.path=(..) >>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.io.tmpdir=/tmp >>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:java.compiler=<NA> >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.name=Linux >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.arch=amd64 >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.version=3.8.0-44-generic >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.name=benji >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.home=(..) >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:user.dir=(..) >>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.free=385MB >>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.max=5358MB >>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>>>> Client environment:os.memory.total=397MB >>>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - >>>>>> Initiating client connection, >>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf >>>>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> Successfully established the connection with ZooKeeper >>>>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>>>> Client session timed out, have not heard from server in 22534ms for >>>>>> sessionid 0x0, closing socket connection and attempting reconnect >>>>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY >>>>>> mode) >>>>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 200 >>>>>> 2016-07-11 15:47:08,420 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>>>> likely server has closed socket, closing socket connection and attempting >>>>>> reconnect >>>>>> 2016-07-11 15:49:15,815 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: SUSPENDED >>>>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY >>>>>> mode) >>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 400 >>>>>> 2016-07-11 15:49:17,316 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>>>> Unable to read additional data from server sessionid 0x100024f0936005f, >>>>>> likely server has closed socket, closing socket connection and attempting >>>>>> reconnect >>>>>> 2016-07-11 15:51:24,713 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: SUSPENDED >>>>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. >>>>>> Will not attempt to authenticate using SASL (unknown error) >>>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 - >>>>>> Socket connection established, initiating session, client: / >>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY >>>>>> mode) >>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout >>>>>> 800 >>>>>> 2016-07-11 15:51:26,470 INFO >>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>>>> change: READ_ONLY >>>>>> (...) >>>>>> >>>>>> Thanks >>>>>> Ben >>>>>> >>>>>> >>>>> >>>> >>> >> >
