So, it appears that the initial connection occurs ok, but then something goes wrong
2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable to read additional data from server sessionid 0x100024f0936005b, likely server has closed socket, closing socket connection and attempting reconnect Then a reconnection occurs ok, and then the cycle starts again. Can you see where in the stack the getData() call is hanging? On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <[email protected]> wrote: > The blockUntilConnectedOrTimedOut() does return, we see the line: > "Successfully established the connection with ZooKeeper" > which happens right after. > > The getData() is where it hangs. > > We see this first: > 2016-07-11 15:47:08,420 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > > and then we see it again 2 minutes later: > 2016-07-11 15:49:17,316 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > > and then again 2 minutes after that: > 2016-07-11 15:51:26,470 INFO > org.apache.curator.framework.state.ConnectionStateManager:228 - State > change: READ_ONLY > > Then I stopped it. > > On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <[email protected]> > wrote: > >> hey Ben, >> From the logs it looks like the connection is established at the end? >> >> Are you saying that the blockUntilConnectedOrTimedOut() does not return >> when this occurs? >> cheers >> >> >> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <[email protected]> >> wrote: >> >>> ZK is 3.5.1-alpha >>> Curator is 2.11.0 >>> >>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <[email protected] >>> > wrote: >>> >>>> I have a 3 nodes ZK cluster , readonly mode is enabled. >>>> 2 nodes are down, so the third one is in read-only (verified by using >>>> the ZK API manually). >>>> >>>> I'm using this piece of code: >>>> >>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder() >>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181") >>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000) >>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true); >>>> >>>> CuratorFramework client = curatorClientBuilder.build(); >>>> client.start(); >>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut(); >>>> System.out.println("Successfully established the connection with >>>> ZooKeeper"); >>>> client.getData().forPath("/"); >>>> System.out.println("Done."); >>>> >>>> When curator pick the host that is UP first, it goes through very >>>> quickly. When it picks the host that is down first (QA-E8WIN12), it never >>>> goes through. >>>> >>>> Am I missing something here? >>>> >>>> Below is the log: >>>> >>>> 2016-07-11 15:46:45,733 INFO >>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting >>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on >>>> 06/17/2016 22:24 GMT >>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:host.name=w-rli09-ben >>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.version=1.8.0_11 >>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.vendor=Oracle Corporation >>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre >>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.class.path=(..) >>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.io.tmpdir=/tmp >>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:java.compiler=<NA> >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.name=Linux >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.arch=amd64 >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.version=3.8.0-44-generic >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:user.name=benji >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:user.home=(..) >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:user.dir=(..) >>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.memory.free=385MB >>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.memory.max=5358MB >>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 - >>>> Client environment:os.memory.total=397MB >>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 - >>>> Initiating client connection, connectString=QA-E8WIN11:2181,QA-E8WIN12:2181 >>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf >>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> Successfully established the connection with ZooKeeper >>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 - >>>> Client session timed out, have not heard from server in 22534ms for >>>> sessionid 0x0, closing socket connection and attempting reconnect >>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200 >>>> 2016-07-11 15:47:08,420 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f0936005b, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> 2016-07-11 15:49:15,815 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400 >>>> 2016-07-11 15:49:17,316 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 - >>>> Unable to read additional data from server sessionid 0x100024f0936005f, >>>> likely server has closed socket, closing socket connection and attempting >>>> reconnect >>>> 2016-07-11 15:51:24,713 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: SUSPENDED >>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 - >>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will >>>> not attempt to authenticate using SASL (unknown error) >>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 - >>>> Socket connection established, initiating session, client: / >>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181 >>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 - >>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181, >>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode) >>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 - >>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800 >>>> 2016-07-11 15:51:26,470 INFO >>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State >>>> change: READ_ONLY >>>> (...) >>>> >>>> Thanks >>>> Ben >>>> >>>> >>> >> >
