Hello all, In our development Zookeeper cluster, about once a week we've started seeing two of the followers have an Unexpected Exception with Zookeeper 3.4.5. Looking at the metrics from the mntr four letter word, there is no particular amount of load or number of sessions that leads to this behavior, it just seems totally out of the blue. Here is a snippet of INFO level logs from that period of time (IP addresses have been scrubbed):
2013-07-25 04:19:20,066 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,066 - WARN [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@793] - Connection request from old client /xxx.xxx.xxx.xxx:xxxxx; will be dropped if server is in r-o mode 2013-07-25 04:19:20,067 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@839] - Client attempting to establish new session at /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,068 - INFO [CommitProcessor:6:ZooKeeperServer@595] - Established session 0x63ffea596ee912b with negotiated timeout 30000 for client /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,068 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@863] - got auth packet /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,068 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@897] - auth success /xxx.xxx.xxx.xxx:xxxxx (--------lots of normal activity----------) 2013-07-25 04:19:20,317 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@839] - Client attempting to establish new session at /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,318 - INFO [CommitProcessor:6:ZooKeeperServer@595] - Established session 0x63ffea596ee912c with negotiated timeout 10000 for client /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,318 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@863] - got auth packet /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,318 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@897] - auth success /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,319 - WARN [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader 2013-07-25 04:19:20,319 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called 2013-07-25 04:19:20,319 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1001] - Closed socket connection for client /xxx.xxx.xxx.xxx:xxxxx which had sessionid 0x63ffea596ee8bab (-----------One of these for each active session it seems-----------------)) 2013-07-25 04:19:20,375 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1001] - Closed socket connection for client /xxx.xxx.xxx.xxx:xxxxx which had sessionid 0x83fdf0b85832d61 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - Shutting down 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down 2013-07-25 04:19:20,376 - INFO [FollowerRequestProcessor:6:FollowerRequestProcessor@95] - FollowerRequestProcessor exited loop! 2013-07-25 04:19:20,376 - INFO [CommitProcessor:6:CommitProcessor@150] - CommitProcessor exited loop! 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - shutdown of request processor complete 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175] - Shutting down 2013-07-25 04:19:20,376 - INFO [SyncThread:6:SyncRequestProcessor@155] - SyncRequestProcessor exited! 2013-07-25 04:19:20,376 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING 2013-07-25 04:19:20,377 - INFO [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.4270965c70 2013-07-25 04:19:20,707 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,707 - WARN [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2013-07-25 04:19:20,707 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for client) 2013-07-25 04:19:20,708 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,708 - WARN [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2013-07-25 04:19:20,708 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for client) 2013-07-25 04:19:20,709 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx.xxx.xxx.xxx:xxxxx 2013-07-25 04:19:20,710 - WARN [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2013-07-25 04:19:20,710 - INFO [NIOServerCxn.Factory: 0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for client) Whenever this event happens on one of the followers, none of the other Zookeeper servers (there are 5 in the cluster) seem to notice. Does anyone have any idea what might be the issue here? Not sure where to start with this one. Thanks, Steve
