Folks,

We are building an application that talks to ZooKeeper using the C bindings to
enable service discovery.  Our integration is fairly simple and is generally 
working
well from a functional standpoint, but we are encountering a problem with this 
new
application that we have not experienced in the past when we used the Java 
bindings.

What’s happening is that our servers are occasionally (every hour or two) being
disconnected from ZooKeeper and subsequently need to re-connect.  There doesn’t
seem to be any pattern in terms of the timing or frequency of these occurrences.
When the problem occurs, the following is logged:

Aug 27 22:47:55 bixby-i-65e767a3-us-west-2 [2015-Aug-27 22:47:55.326399] [info] 
[zkLogBridge] [0x00007f7992bff700] 2015-08-27 
22:47:55,326:7312(0x7f7990fff700):ZOO_DEBUG@zookeeper_process@2264: Got ping 
response in 74 ms
Aug 27 22:48:01 bixby-i-65e767a3-us-west-2 [2015-Aug-27 22:48:01.993356] 
[error] [zkLogBridge] [0x00007f7992bff700] 2015-08-27 
22:48:01,993:7312(0x7f7990fff700):ZOO_ERROR@handle_socket_error_msg@1643: 
Socket [10.0.19.22:2181] zk retcode=-7, errno=110(Connection timed out): 
connection to 10.0.19.22:2181 timed out (exceeded timeout by 0ms)
Aug 27 22:48:01 bixby-i-65e767a3-us-west-2 [2015-Aug-27 22:48:01.993409] [info] 
[zkClient]{0x7f799e446a40} [0x00007f7992bff700] ZK session watcher with 
type=Event:Session state=State:Connecting

The last of these is our own log indicating that we received the notification 
that
the connection had been dropped.

On the ZooKeeper side of things, the logs look like this:

Aug 27 18:36:54 bixby-i-65e767a3-us-west-2 [2015-Aug-27 18:36:54.880795] [info] 
[zkLogBridge] [0x00007f7992bff700] 2015-08-27 
18:36:54,880:7312(0x7f7990fff700):ZOO_INFO@check_events@1752: session 
establishment complete on server [52.5.153.158:2181], 
sessionId=0x34f6b4380d90006, negotiated timeout=10000
...
2015-08-27 22:47:55,283 [myid:2] - DEBUG 
[CommitProcessor:2:FinalRequestProcessor@160] - sessionid:0x34f6b4380d90006 
type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown 
reqpath:n/a
2015-08-27 22:48:03,095 [myid:2] - DEBUG [ProcessThread(sid:2 
cport:-1)::CommitProcessor@171] - Processing request:: 
sessionid:0x34f6b4380d90006 type:ping cxid:0xfffffffffffffffe 
zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2015-08-27 22:48:03,095 [myid:2] - DEBUG 
[CommitProcessor:2:FinalRequestProcessor@88] - Processing request:: 
sessionid:0x34f6b4380d90006 type:ping cxid:0xfffffffffffffffe 
zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2015-08-27 22:48:03,095 [myid:2] - DEBUG 
[CommitProcessor:2:FinalRequestProcessor@160] - sessionid:0x34f6b4380d90006 
type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown 
reqpath:n/a2015-08-27 22:48:03,095 [myid:2] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x34f6b4380d90006, likely client has closed socket
        at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745
2015-08-27 22:48:03,095 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /54.200.140.196:39721 which had sessionid 
0x34f6b4380d90006

So, it looks like the client got annoyed that it didn’t receive a response in 
time from
the server, but it also looks like the server was generally responding pretty 
quickly.

It’s likely that we’re doing something wrong here, but I’m at a loss as to what 
that
might be.  Does anyone have any insights that might explain this behavior?


Reply via email to