Hi Jim- I don't think moving to 3.4.11 should make too much of a difference, it just helps to know the version so I can map the log messages to code.
There are a couple things you could look at in the meantime. I would start with heap usage to make sure this is not the result of GC pauses. You may also want to reevaluate the initLimit if you are storing a large amount of data in ZK. Hopefully this helps. Abe On Thu, Nov 30, 2017, at 10:15, Jim Wong wrote: > Abe, > > Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think > going to 3.4.11 would make a difference? > > We can certainly turn on debug logs, but because the problem has only > occurred twice in the last several (probably 8-9) months and we’re unable > to reproduce it intentionally, it might take a while. > > Is there anything else you suggest we do to troubleshoot in the meantime? > > > > On Nov 29, 2017, at 4:52 PM, Abraham Fine <[email protected]> wrote: > > > > Hi Jim- > > > > What version of ZooKeeper are you using? > > > > It would be great to get a better understanding of what is causing: > > Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: > > Not sufficient followers synced, only synced with sids: [ 3,4 ] > > Nov 29 19:32:12 localhost at > > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) > > Nov 29 19:32:12 localhost at > > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) > > > > This failure appears to align with: > > Nov 29 19:32:12 localhost zookeeper: WARN > > [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when > > following the leader > > > > Is there any possibility debug logs could be enabled? > > > > Thanks, > > Abe > > > > Would you be able to turn on debug logging, I think that might give us > > much more visibility? > > On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote: > >> Hey, folks. We’re running a 5-node ZooKeeper cluster to manage server > >> configuration and discovery. This morning, we experienced an outage in > >> which the entire cluster went offline for approximately 7 minutes. > >> Restarting the ZooKeeper daemons on each node in the cluster got things > >> working again, so there doesn’t appear to have been an underlying network > >> or hardware fault that would explain the cluster’s inability to recover. > >> > >> This is actually the second time this has happened in the last several > >> months, and all of our attempts to reproduce the behavior by > >> intentionally doing bad things to the cluster or the leader have failed. > >> > >> With all that in mind, we were wondering if anyone had any thoughts as to > >> what might be going on, as we’re out of ideas. If anyone can provide any > >> hints or guidance, we’d be very grateful. > >> > >> I’ve included brief excerpts from logs from some of the nodes; the > >> problem started around 17:32:12 and wasn’t resolved until 17:39, after > >> we’d restarted ZooKeeper. This was was originally the leader: > >> > >> Nov 29 19:31:44 localhost zookeeper: INFO [ProcessThread(sid:4 > >> cport:-1)::PrepRequestProcessor@486] - Processed session termination for > >> sessionid: 0x15fdd1d1e0c0313 > >> Nov 29 19:32:11 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:11 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:11 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:12 localhost zookeeper: ERROR > >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected > >> exception causing shutdown while sock still open > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE > >> /10.1.29.76:49198 ******** > >> Nov 29 19:32:12 localhost zookeeper: ERROR > >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected > >> exception causing shutdown while sock still open > >> Nov 29 19:32:12 localhost zookeeper: ERROR > >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected > >> exception causing shutdown while sock still open > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE > >> /10.1.0.252:50932 ******** > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE > >> /10.1.48.67:44882 ******** > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my > >> state) > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my > >> state) > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called > >> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: > >> Not sufficient followers synced, only synced with sids: [ 3,4 ] > >> Nov 29 19:32:12 localhost at > >> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) > >> Nov 29 19:32:12 localhost at > >> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331] > >> - exception while shutting down acceptor: java.net.SocketException: > >> Socket closed > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed > >> socket connection for client /10.1.10.8:42938 which had sessionid > >> 0x45f3a424dd20208 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting > >> down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO [ProcessThread(sid:4 > >> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting > >> down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676] > >> - Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - > >> shutdown of request processor complete > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited! > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop! > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE > >> /10.1.37.117:52956 ******** > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring > >> unexpected exception > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New > >> election. My id = 4, proposed zxid=0x2b0260b5a8 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to > >> 10.1.64.1 port 67 (xid=0x79592ad4) > >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1 > >> (xid=0x79592ad4) > >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 -- > >> renewal in 1608 seconds. > >> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get > >> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s > >> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases > >> of eth0 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 400 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:13 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:13 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 800 > >> Nov 29 19:32:13 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:13 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:13 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:14 localhost zookeeper: INFO > >> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop! > >> Nov 29 19:32:14 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 1600 > >> Nov 29 19:32:14 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:15 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 3200 > >> Nov 29 19:32:15 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:18 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 6400 > >> Nov 29 19:32:18 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0, > >> interval 125280ms. > >> Nov 29 19:32:25 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 12800 > >> Nov 29 19:32:25 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:38 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 25600 > >> Nov 29 19:32:38 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:33:03 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 51200 > >> Nov 29 19:33:03 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:33:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:33:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:34:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:34:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:35:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:35:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:36:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:36:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:37:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:37:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:38:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:38:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: WARN > >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for > >> id 5, my id = 4, error = > >> Nov 29 19:39:45 localhost zookeeper: WARN > >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker > >> Nov 29 19:39:45 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times > >> Nov 29 19:39:45 localhost zookeeper: WARN > >> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using > >> channel: for id 5 my id = 4 error = java.net.SocketException: Socket > >> closed > >> Nov 29 19:39:45 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times > >> Nov 29 19:39:45 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 localhost zookeeper: WARN > >> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving > >> thread > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746] > >> - Received connection request /10.1.48.67:58398 > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:47 localhost zookeeper: INFO > >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service > >> 'zookeeper-server' on user request > >> > >> > >> And here are logs from one of the followers: > >> > >> Nov 29 19:32:11 localhost zookeeper: INFO > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating > >> client: 0x15fc2e9562e0187 > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when > >> following the leader > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed > >> socket connection for client /10.1.29.33:57934 which had sessionid > >> 0x35f3a424dc802a2 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting > >> down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting > >> down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] - > >> FollowerRequestProcessor exited loop! > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - > >> shutdown of request processor complete > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop! > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - > >> Shutting down > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited! > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New > >> election. My id = 5, proposed zxid=0x2b0260b5a8 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:12 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 400 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 800 > >> Nov 29 19:32:12 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:13 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 1600 > >> Nov 29 19:32:13 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:15 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 3200 > >> Nov 29 19:32:15 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:18 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 6400 > >> Nov 29 19:32:18 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:24 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 12800 > >> Nov 29 19:32:24 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:32:37 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 25600 > >> Nov 29 19:32:37 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:33:03 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 51200 > >> Nov 29 19:33:03 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:33:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:33:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:34:42 localhost zookeeper: WARN > >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > >> end of stream exception > >> Nov 29 19:34:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:34:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:35:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:35:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:36:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:36:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:37:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:37:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:38:54 localhost zookeeper: INFO > >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > >> Notification time out: 60000 > >> Nov 29 19:38:54 localhost zookeeper: INFO > >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 > >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > >> state) > >> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service > >> 'zookeeper-server' on user request > >> > >> >
