Hi Achim, I wonder if this is due to garbage collection stalls. Also, I'd recommend you move to the 3.4 branch.
-Flavio > On 06 Nov 2015, at 09:45, Achim Finke <[email protected]> wrote: > > Hi, > > we use a single standalone Zookeeper in Version 3.3.6 with multiple (Java) > Clients. Clients and Server are running in one zone of a Solaris SPARC. > > The Clients are successfully connecting to the server, sending frequently > their heartbeats and are able to execute operations. It seems you could run > this forever and no timeout ever happens. But when they have finished their > operations and go back to idle, just sending their heartbeats but doesn't > execute any operations, after a while (seems like it is always about an > hour) some but NOT all clients are starting to timeout. The client is > reconnecting in time so the session is not lost but for me this is kind of > strange, so my questions are: > > - Do you have an explanation what can lead to these timeouts which > apparently happen on an hourly base? The network is up and running the > whole time and as I said it just happens in the 'idle' state, seems like > 'load' state would run forever. > - Why is the IP switching between ipv4 and ipv6 with each reconnect? > > I would be glad if someone can put some light in the dark. See the attached > log file for more details. > > Regards, > Achim > > Log: > > ### Server ### > 2015-11-04 11:16:58,367 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for > client /fd00:0:0:c20:0:0:1:2:37188 which had sessionid 0x150cddeb4cd0008 > 2015-11-04 11:16:58,955 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket > connection from /10.64.122.26:60066 > 2015-11-04 11:16:58,956 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew > session 0x150cddeb4cd0008 at /10.64.122.26:60066 > 2015-11-04 11:16:58,957 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session > 0x150cddeb4cd0008 with negotiated timeout 15000 for client / > 10.64.122.26:60066 > > ### Client ### > 0x150cddeb4cd0008, closing socket connection and attempting reconnect > 2015-11-04 11:16:58,364 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1181 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Client session timed out, have not > heard from server in 11216ms for sessionid 0x150cddeb4cd0008, closing > socket connection and attempting reconnect > 2015-11-04 11:16:58,953 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1058 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Opening socket connection to server > xxx.xxx.xxx/10.64.122.26:2181 > 2015-11-04 11:16:58,955 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 947 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Socket connection established to > xxx.xxx.xxx/10.64.122.26:2181, initiating session > 2015-11-04 11:16:58,957 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 736 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Session establishment complete on > server xxx.xxx.xxx/10.64.122.26:2181, sessionid = 0x150cddeb4cd0008, > negotiated timeout = 15000 > > > ### Server ### > 2015-11-04 12:17:05,817 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for > client /10.64.122.26:60066 which had sessionid 0x150cddeb4cd0008 > 2015-11-04 12:17:06,088 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket > connection from /fd00:0:0:c20:0:0:1:2:40286 > 2015-11-04 12:17:06,089 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew > session 0x150cddeb4cd0008 at /fd00:0:0:c20:0:0:1:2:40286 > 2015-11-04 12:17:06,090 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session > 0x150cddeb4cd0008 with negotiated timeout 15000 for client > /fd00:0:0:c20:0:0:1:2:40286 > > ### Client ### > 2015-11-04 12:17:05,815 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1181 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Client session timed out, have not > heard from server in 11449ms for sessionid 0x150cddeb4cd0008, closing > socket connection and attempting reconnect > 2015-11-04 12:17:06,087 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1058 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Opening socket connection to server > xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181 > 2015-11-04 12:17:06,088 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 947 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Socket connection established to > xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181, initiating session > 2015-11-04 12:17:06,090 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 736 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Session establishment complete on > server xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181, sessionid = > 0x150cddeb4cd0008, negotiated timeout = 15000 > > > ### Server ### > 2015-11-04 13:17:13,277 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for > client /fd00:0:0:c20:0:0:1:2:40286 which had sessionid 0x150cddeb4cd0008 > 2015-11-04 13:17:13,736 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket > connection from /10.64.122.26:47735 > 2015-11-04 13:17:13,737 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew > session 0x150cddeb4cd0008 at /10.64.122.26:47735 > 2015-11-04 13:17:13,738 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session > 0x150cddeb4cd0008 with negotiated timeout 15000 for client / > 10.64.122.26:47735 > > ### Client ### > 2015-11-04 13:17:13,275 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1181 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Client session timed out, have not > heard from server in 11787ms for sessionid 0x150cddeb4cd0008, closing > socket connection and attempting reconnect > 2015-11-04 13:17:13,734 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1058 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Opening socket connection to server > xxx.xxx.xxx/10.64.122.26:2181 > 2015-11-04 13:17:13,736 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 947 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Socket connection established to > xxx.xxx.xxx/10.64.122.26:2181, initiating session > 2015-11-04 13:17:13,738 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 736 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Session establishment complete on > server xxx.xxx.xxx/10.64.122.26:2181, sessionid = 0x150cddeb4cd0008, > negotiated timeout = 15000 > > > ### Server ### > 2015-11-04 14:17:20,734 - WARN [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to > read additional data from client sessionid 0x150cddeb4cd0008, likely client > has closed socket > 2015-11-04 14:17:20,735 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for > client /10.64.122.26:47735 which had sessionid 0x150cddeb4cd0008 > 2015-11-04 14:17:21,600 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket > connection from /fd00:0:0:c20:0:0:1:2:65493 > 2015-11-04 14:17:21,600 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew > session 0x150cddeb4cd0008 at /fd00:0:0:c20:0:0:1:2:65493 > 2015-11-04 14:17:21,601 - INFO [NIOServerCxn.Factory: > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session > 0x150cddeb4cd0008 with negotiated timeout 15000 for client > /fd00:0:0:c20:0:0:1:2:65493 > > ### Client ### > 2015-11-04 14:17:20,732 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1181 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Client session timed out, have not > heard from server in 11591ms for sessionid 0x150cddeb4cd0008, closing > socket connection and attempting reconnect > 2015-11-04 14:17:21,598 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 1058 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Opening socket connection to server > xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181 > 2015-11-04 14:17:21,600 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 947 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Socket connection established to > xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181, initiating session > 2015-11-04 14:17:21,601 | INFO | xxx.xxx.xxx:2181) | > org.apache.zookeeper.ClientCnxn$SendThread 736 | 197 - > org.apache.hadoop.zookeeper - 3.3.6 | Session establishment complete on > server xxx.xxx.xxx/fd00:0:0:c20:0:0:1:2:2181, sessionid = > 0x150cddeb4cd0008, negotiated timeout = 15000
