Well I was trying to let it run and it eventually fails, with a stack:
Successfully established the connection with ZooKeeper
2016-07-11 16:28:21,943 INFO org.apache.zookeeper.ClientCnxn:1251 - Client
session timed out, have not heard from server in 22519ms for sessionid 0x0,
closing socket connection and attempting reconnect
2016-07-11 16:28:22,046 INFO org.apache.zookeeper.ClientCnxn:1138 -
Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
attempt to authenticate using SASL (unknown error)
2016-07-11 16:28:22,050 INFO org.apache.zookeeper.ClientCnxn:980 - Socket
connection established, initiating session, client: /10.11.12.4:54702,
server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:28:22,058 INFO org.apache.zookeeper.ClientCnxn:1400 -
Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
sessionid = 0x100024f09360080, negotiated timeout = 45000 (READ-ONLY mode)
2016-07-11 16:28:22,061 INFO org.apache.zookeeper.ClientCnxn:1297 -
Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
2016-07-11 16:28:22,066 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: READ_ONLY
2016-07-11 16:30:29,312 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from server sessionid 0x100024f09360080, likely
server has closed socket, closing socket connection and attempting reconnect
2016-07-11 16:30:29,416 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: SUSPENDED
2016-07-11 16:30:31,031 INFO org.apache.zookeeper.ClientCnxn:1138 -
Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
attempt to authenticate using SASL (unknown error)
2016-07-11 16:30:31,032 INFO org.apache.zookeeper.ClientCnxn:980 - Socket
connection established, initiating session, client: /10.11.12.4:55156,
server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1400 -
Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
sessionid = 0x100024f09360083, negotiated timeout = 45000 (READ-ONLY mode)
2016-07-11 16:30:31,034 INFO org.apache.zookeeper.ClientCnxn:1297 -
Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
2016-07-11 16:30:31,035 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: READ_ONLY
2016-07-11 16:32:38,335 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from server sessionid 0x100024f09360083, likely
server has closed socket, closing socket connection and attempting reconnect
2016-07-11 16:32:38,441 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: SUSPENDED
2016-07-11 16:32:40,425 INFO org.apache.zookeeper.ClientCnxn:1138 -
Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
attempt to authenticate using SASL (unknown error)
2016-07-11 16:32:40,426 INFO org.apache.zookeeper.ClientCnxn:980 - Socket
connection established, initiating session, client: /10.11.12.4:55637,
server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1400 -
Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
sessionid = 0x100024f09360087, negotiated timeout = 45000 (READ-ONLY mode)
2016-07-11 16:32:40,428 INFO org.apache.zookeeper.ClientCnxn:1297 -
Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
2016-07-11 16:32:40,428 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: READ_ONLY
2016-07-11 16:34:47,615 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from server sessionid 0x100024f09360087, likely
server has closed socket, closing socket connection and attempting reconnect
2016-07-11 16:34:47,716 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: SUSPENDED
2016-07-11 16:34:49,557 INFO org.apache.zookeeper.ClientCnxn:1138 -
Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will not
attempt to authenticate using SASL (unknown error)
2016-07-11 16:34:49,558 INFO org.apache.zookeeper.ClientCnxn:980 - Socket
connection established, initiating session, client: /10.11.12.4:56094,
server: QA-E8WIN11/10.11.8.236:2181
2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1400 -
Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
sessionid = 0x100024f09360089, negotiated timeout = 45000 (READ-ONLY mode)
2016-07-11 16:34:49,560 INFO org.apache.zookeeper.ClientCnxn:1297 -
Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 1600
2016-07-11 16:34:49,560 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: READ_ONLY
2016-07-11 16:36:56,895 INFO org.apache.zookeeper.ClientCnxn:1253 - Unable
to read additional data from server sessionid 0x100024f09360089, likely
server has closed socket, closing socket connection and attempting reconnect
Exception in thread "main" 2016-07-11 16:36:56,996 INFO
org.apache.curator.framework.state.ConnectionStateManager:228 - State
change: SUSPENDED
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1753)
at
org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:310)
at
org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:299)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
at
org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:295)
at
org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:287)
at
org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:34)
at TestCuratorReadOnly.main(TestCuratorReadOnly.java:23)
On Mon, Jul 11, 2016 at 4:39 PM, Cameron McKenzie <[email protected]>
wrote:
> So, it appears that the initial connection occurs ok, but then something
> goes wrong
>
> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 -
> Unable to read additional data from server sessionid 0x100024f0936005b,
> likely server has closed socket, closing socket connection and attempting
> reconnect
>
> Then a reconnection occurs ok, and then the cycle starts again.
>
> Can you see where in the stack the getData() call is hanging?
>
> On Tue, Jul 12, 2016 at 9:35 AM, Benjamin Jaton <[email protected]>
> wrote:
>
>> The blockUntilConnectedOrTimedOut() does return, we see the line:
>> "Successfully established the connection with ZooKeeper"
>> which happens right after.
>>
>> The getData() is where it hangs.
>>
>> We see this first:
>> 2016-07-11 15:47:08,420 INFO
>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>> change: READ_ONLY
>>
>> and then we see it again 2 minutes later:
>> 2016-07-11 15:49:17,316 INFO
>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>> change: READ_ONLY
>>
>> and then again 2 minutes after that:
>> 2016-07-11 15:51:26,470 INFO
>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>> change: READ_ONLY
>>
>> Then I stopped it.
>>
>> On Mon, Jul 11, 2016 at 4:13 PM, Cameron McKenzie <[email protected]
>> > wrote:
>>
>>> hey Ben,
>>> From the logs it looks like the connection is established at the end?
>>>
>>> Are you saying that the blockUntilConnectedOrTimedOut() does not return
>>> when this occurs?
>>> cheers
>>>
>>>
>>> On Tue, Jul 12, 2016 at 9:10 AM, Benjamin Jaton <[email protected]
>>> > wrote:
>>>
>>>> ZK is 3.5.1-alpha
>>>> Curator is 2.11.0
>>>>
>>>> On Mon, Jul 11, 2016 at 4:01 PM, Benjamin Jaton <
>>>> [email protected]> wrote:
>>>>
>>>>> I have a 3 nodes ZK cluster , readonly mode is enabled.
>>>>> 2 nodes are down, so the third one is in read-only (verified by using
>>>>> the ZK API manually).
>>>>>
>>>>> I'm using this piece of code:
>>>>>
>>>>> Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>>>>> .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>>>>> .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>>>>> .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
>>>>>
>>>>> CuratorFramework client = curatorClientBuilder.build();
>>>>> client.start();
>>>>> client.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>>>> System.out.println("Successfully established the connection with
>>>>> ZooKeeper");
>>>>> client.getData().forPath("/");
>>>>> System.out.println("Done.");
>>>>>
>>>>> When curator pick the host that is UP first, it goes through very
>>>>> quickly. When it picks the host that is down first (QA-E8WIN12), it never
>>>>> goes through.
>>>>>
>>>>> Am I missing something here?
>>>>>
>>>>> Below is the log:
>>>>>
>>>>> 2016-07-11 15:46:45,733 INFO
>>>>> org.apache.curator.framework.imps.CuratorFrameworkImpl:235 - Starting
>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:zookeeper.version=3.5.1-alpha-1748895, built on
>>>>> 06/17/2016 22:24 GMT
>>>>> 2016-07-11 15:46:45,743 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:host.name=w-rli09-ben
>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.version=1.8.0_11
>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.vendor=Oracle Corporation
>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.home=/usr/local/apps/jdk1.8.0_11/jre
>>>>> 2016-07-11 15:46:45,744 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.class.path=(..)
>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.io.tmpdir=/tmp
>>>>> 2016-07-11 15:46:45,746 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:java.compiler=<NA>
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.name=Linux
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.arch=amd64
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.version=3.8.0-44-generic
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:user.name=benji
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:user.home=(..)
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:user.dir=(..)
>>>>> 2016-07-11 15:46:45,747 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.memory.free=385MB
>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.memory.max=5358MB
>>>>> 2016-07-11 15:46:45,748 INFO org.apache.zookeeper.ZooKeeper:109 -
>>>>> Client environment:os.memory.total=397MB
>>>>> 2016-07-11 15:46:45,750 INFO org.apache.zookeeper.ZooKeeper:716 -
>>>>> Initiating client connection,
>>>>> connectString=QA-E8WIN11:2181,QA-E8WIN12:2181
>>>>> sessionTimeout=45000 watcher=org.apache.curator.ConnectionState@3e96bacf
>>>>> 2016-07-11 15:46:45,769 INFO org.apache.zookeeper.ClientCnxn:1138 -
>>>>> Opening socket connection to server QA-E8WIN12/10.11.8.232:2181. Will
>>>>> not attempt to authenticate using SASL (unknown error)
>>>>> Successfully established the connection with ZooKeeper
>>>>> 2016-07-11 15:47:08,300 INFO org.apache.zookeeper.ClientCnxn:1251 -
>>>>> Client session timed out, have not heard from server in 22534ms for
>>>>> sessionid 0x0, closing socket connection and attempting reconnect
>>>>> 2016-07-11 15:47:08,403 INFO org.apache.zookeeper.ClientCnxn:1138 -
>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>>> not attempt to authenticate using SASL (unknown error)
>>>>> 2016-07-11 15:47:08,406 INFO org.apache.zookeeper.ClientCnxn:980 -
>>>>> Socket connection established, initiating session, client: /
>>>>> 10.11.12.4:47004, server: QA-E8WIN11/10.11.8.236:2181
>>>>> 2016-07-11 15:47:08,413 INFO org.apache.zookeeper.ClientCnxn:1400 -
>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>> sessionid = 0x100024f0936005b, negotiated timeout = 45000 (READ-ONLY mode)
>>>>> 2016-07-11 15:47:08,415 INFO org.apache.zookeeper.ClientCnxn:1297 -
>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 200
>>>>> 2016-07-11 15:47:08,420 INFO
>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>> change: READ_ONLY
>>>>> 2016-07-11 15:49:15,712 INFO org.apache.zookeeper.ClientCnxn:1253 -
>>>>> Unable to read additional data from server sessionid 0x100024f0936005b,
>>>>> likely server has closed socket, closing socket connection and attempting
>>>>> reconnect
>>>>> 2016-07-11 15:49:15,815 INFO
>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>> change: SUSPENDED
>>>>> 2016-07-11 15:49:17,313 INFO org.apache.zookeeper.ClientCnxn:1138 -
>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>>> not attempt to authenticate using SASL (unknown error)
>>>>> 2016-07-11 15:49:17,314 INFO org.apache.zookeeper.ClientCnxn:980 -
>>>>> Socket connection established, initiating session, client: /
>>>>> 10.11.12.4:47295, server: QA-E8WIN11/10.11.8.236:2181
>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1400 -
>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>> sessionid = 0x100024f0936005f, negotiated timeout = 45000 (READ-ONLY mode)
>>>>> 2016-07-11 15:49:17,316 INFO org.apache.zookeeper.ClientCnxn:1297 -
>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 400
>>>>> 2016-07-11 15:49:17,316 INFO
>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>> change: READ_ONLY
>>>>> 2016-07-11 15:51:24,607 INFO org.apache.zookeeper.ClientCnxn:1253 -
>>>>> Unable to read additional data from server sessionid 0x100024f0936005f,
>>>>> likely server has closed socket, closing socket connection and attempting
>>>>> reconnect
>>>>> 2016-07-11 15:51:24,713 INFO
>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>> change: SUSPENDED
>>>>> 2016-07-11 15:51:26,467 INFO org.apache.zookeeper.ClientCnxn:1138 -
>>>>> Opening socket connection to server QA-E8WIN11/10.11.8.236:2181. Will
>>>>> not attempt to authenticate using SASL (unknown error)
>>>>> 2016-07-11 15:51:26,468 INFO org.apache.zookeeper.ClientCnxn:980 -
>>>>> Socket connection established, initiating session, client: /
>>>>> 10.11.12.4:47576, server: QA-E8WIN11/10.11.8.236:2181
>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1400 -
>>>>> Session establishment complete on server QA-E8WIN11/10.11.8.236:2181,
>>>>> sessionid = 0x100024f09360061, negotiated timeout = 45000 (READ-ONLY mode)
>>>>> 2016-07-11 15:51:26,470 INFO org.apache.zookeeper.ClientCnxn:1297 -
>>>>> Checking server QA-E8WIN12/10.11.8.232:2181 for being r/w. Timeout 800
>>>>> 2016-07-11 15:51:26,470 INFO
>>>>> org.apache.curator.framework.state.ConnectionStateManager:228 - State
>>>>> change: READ_ONLY
>>>>> (...)
>>>>>
>>>>> Thanks
>>>>> Ben
>>>>>
>>>>>
>>>>
>>>
>>
>