The other thing I notice is that we're first off in the n.round, and then we're off in the peer epoch, the nodes that thought the round was 6 (4 and 5), upon increasing the round to 7 increase their peer epoch from 41d to 41e. Only when all the nodes finally say both 7 and 41e do you see election claim to finish.
C On Fri, May 1, 2015 at 12:22 PM, Camille Fournier <[email protected]> wrote: > Unfortunately that looks like just zab. It would be super awesome if you > could send me one more log snippet of one of the other machines in this bad > scenario. I think that will help me figure out what's happening. Perhaps > machine 4 or 5? > > Thanks, > C > > On Fri, May 1, 2015 at 12:16 PM, Dudzinski, Karol <[email protected]> > wrote: > >> That is awesome. I was looking for a page like this that describes the >> election process. I'll have a read through it. >> >> The Goldman Sachs Group, Inc. All rights reserved. >> See http://www.gs.com/disclaimer/global_email for important risk >> disclosures, conflicts of interest and other terms and conditions relating >> to this e-mail and your reliance on information contained in it. This >> message may contain confidential or privileged information. If you are not >> the intended recipient, please advise us immediately and delete this >> message. See http://www.gs.com/disclaimer/email for further information >> on confidentiality and the risks of non-secure electronic communication. >> If you cannot access these links, please notify us by reply message and we >> will send the contents to you. >> >> >> -----Original Message----- >> From: Camille Fournier [mailto:[email protected]] >> Sent: 01 May 2015 17:08 >> To: [email protected] >> Subject: Re: Leader election duration >> >> Election epoch.... something in here I hope? >> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0 >> >> On Fri, May 1, 2015 at 12:01 PM, Dudzinski, Karol <[email protected] >> > >> wrote: >> >> > Yes, that is odd. What does the round mean? >> > >> > The Goldman Sachs Group, Inc. All rights reserved. >> > See http://www.gs.com/disclaimer/global_email for important risk >> > disclosures, conflicts of interest and other terms and conditions >> > relating to this e-mail and your reliance on information contained in >> > it. This message may contain confidential or privileged information. >> > If you are not the intended recipient, please advise us immediately >> > and delete this message. See http://www.gs.com/disclaimer/email for >> > further information on confidentiality and the risks of non-secure >> electronic communication. >> > If you cannot access these links, please notify us by reply message >> > and we will send the contents to you. >> > >> > >> > -----Original Message----- >> > From: Camille Fournier [mailto:[email protected]] >> > Sent: 01 May 2015 16:53 >> > To: [email protected] >> > Subject: Re: Leader election duration >> > >> > One thing that jumps out at me here is that a lot of these messages >> > are from different rounds. Some say they are in round 6, and some say >> > they are in round 7. >> > >> > 2015-04-25 12:02:20,585 - INFO [WorkerReceiver[myid=2]: >> > FastLeaderElection@542] - Notification: 5 (n.leader), 0x41d000c7d27 >> > (n.zxid), 0xe6 (n.round), LEADING (n.state), 5 (n.sid), 0x41d >> > (n.peerEPoch), LOOKING (my state) >> > 2015-04-25 12:02:23,723 - INFO [WorkerReceiver[myid=2]: >> > FastLeaderElection@542] - Notification: 1 (n.leader), 0x41d000c7d27 >> > (n.zxid), 0xe7 (n.round), LOOKING (n.state), 1 (n.sid), 0x41d >> > (n.peerEPoch), LOOKING (my state) >> > >> > So that's a bit odd. >> > >> > On Fri, May 1, 2015 at 10:26 AM, Dudzinski, Karol >> > <[email protected]> >> > wrote: >> > >> > > Unfortunately, I can't reproduce this other than in one production >> > > cluster so testing a patch is difficult. >> > > >> > > Below are the relevant logs that Camille requested. >> > > >> > > For the first issue, the leader selection taking a long time, this >> > > is a typical example: >> > > 2015-04-25 12:01:59,734 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :QuorumPeer@670] - LOOKING >> > > 2015-04-25 12:01:59,737 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FileSnap@83] - Reading snapshot <snapshot> >> > > 2015-04-25 12:02:05,203 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:05,327 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@740] - New election. My id = 2, proposed >> > > zxid=0x41d000c7d27 >> > > 2015-04-25 12:02:05,327 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:05,328 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:05,530 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@774] - Notification time out: 400 >> > > 2015-04-25 12:02:05,940 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@774] - Notification time out: 800 >> > > 2015-04-25 12:02:06,750 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@774] - Notification time out: 1600 >> > > 2015-04-25 12:02:08,359 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@774] - Notification time out: 3200 >> > > 2015-04-25 12:02:08,709 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:08,756 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:10,320 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:10,325 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:13,539 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FastLeaderElection@774] - Notification time out: 6400 >> > > 2015-04-25 12:02:13,707 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:13,755 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:15,329 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:15,334 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:15,582 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:18,709 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:20,339 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:20,350 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:20,585 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:23,723 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:25,349 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:25,357 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), FOLLOWING >> > > (n.state), 4 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:25,598 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe6 (n.round), LEADING >> > > (n.state), >> > > 5 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:28,729 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:30,358 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:31,500 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:33,726 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:34,476 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:35,357 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:36,506 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:38,726 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:39,473 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:40,357 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 2 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:41,507 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:43,726 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:44,472 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:46,504 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:48,725 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 2 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41d (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:49,480 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:51,504 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:53,724 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:54,488 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 5 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:56,514 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 5 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 4 (n.sid), 0x41e (n.peerEPoch), LOOKING (my state) >> > > 2015-04-25 12:02:56,734 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :QuorumPeer@738] - FOLLOWING >> > > 2015-04-25 12:02:56,734 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :ZooKeeperServer@162] - Created server with tickTime 2000 >> > > minSessionTimeout 4000 maxSessionTimeout 100000 datadir <data dir> >> > > snapdir <snap dir> >> > > 2015-04-25 12:02:56,734 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 56999 >> > > >> > > These are logs from server with id 2, they are following the reboot >> > > of the leader which was sid 3. >> > > >> > > >> > > For the second problem, where the sync'ing doesn't happen, here are >> > > the follower logs (immediately after the election shown above): >> > > 2015-04-25 12:02:56,818 - WARN [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Learner@232] - Unexpected exception, tries=0, connecting to <host >> > > with id 5>/<ip>:2182 >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > 2015-04-25 12:02:57,906 - WARN [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Learner@232] - Unexpected exception, tries=1, connecting to <host >> > > with id 5>/<ip>:2182 >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) >> > > at java.lang.Thread.run(Thread.java:662) >> > > 2015-04-25 12:02:58,725 - INFO >> > > [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 4 >> > > (n.leader), 0x41d000c7d27 (n.zxid), 0xe7 (n.round), LOOKING >> > > (n.state), >> > > 1 (n.sid), 0x41e (n.peerEPoch), FOLLOWING (my state) >> > > 2015-04-25 12:02:58,996 - WARN [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Learner@232] - Unexpected exception, tries=2, connecting to <host >> > > with id 5>/<ip>:2182 >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > 2015-04-25 12:03:00,093 - WARN [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Learner@232] - Unexpected exception, tries=3, connecting to <host >> > > with id 5>/<ip>:2182 >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > at java.lang.Thread.run(Thread.java:662) >> > > 2015-04-25 12:03:01,185 - ERROR [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Learner@229] - Unexpected exception >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:74 >> > > 0) >> > > 2015-04-25 12:03:01,186 - WARN [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Follower@89] - Exception when following the leader >> > > java.net.ConnectException: Connection refused >> > > at java.net.PlainSocketImpl.socketConnect(Native Method) >> > > at >> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) >> > > at >> > > java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) >> > > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) >> > > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) >> > > at java.net.Socket.connect(Socket.java:529) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.jav >> > a:224) >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java >> > :71) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:74 >> > > 0) >> > > 2015-04-25 12:03:01,186 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :Follower@166] - shutdown called >> > > java.lang.Exception: shutdown Follower >> > > at >> > > >> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:74 >> > > 4) >> > > 2015-04-25 12:03:01,187 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :FollowerZooKeeperServer@139] - Shutting down >> > > 2015-04-25 12:03:01,187 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :ZooKeeperServer@419] - shutting down >> > > 2015-04-25 12:03:01,187 - INFO [QuorumPeer[myid=2]/0.0.0.0:2181 >> > > :QuorumPeer@670] - LOOKING >> > > >> > > I think they get rejected because as far as sid 5 is concerned, it's >> > > not the leader yet as shown by these two lines: >> > > 2015-04-25 12:02:27,247 - INFO [QuorumPeer[myid=5]/0.0.0.0:2181 >> > > :QuorumPeer@670] - LOOKING >> > > 2015-04-25 12:03:01,838 - INFO [QuorumPeer[myid=5]/0.0.0.0:2181 >> > > :Leader@345] - LEADING - LEADER ELECTION TOOK - 34591 >> > > >> > > A short while later the leader times out: >> > > 2015-04-25 12:03:21,850 - WARN [QuorumPeer[myid=5]/0.0.0.0:2181 >> > > :QuorumPeer@756] - Unexpected exception >> > > java.lang.InterruptedException: Timeout while waiting for epoch from >> > quorum >> > > at >> > > >> > org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.jav >> > a:872) >> > > at >> > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:364) >> > > at >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:75 >> > > 3) >> > > >> > > During another election, the leader sees this: >> > > 2015-04-25 17:03:55,967 - INFO [QuorumPeer[myid=5]/0.0.0.0:2181 >> > > :Leader@345] - LEADING - LEADER ELECTION TOOK - 34114 >> > > 2015-04-25 17:04:06,034 - INFO >> > > [LearnerHandler-/<ip>:34474:LearnerHandler@263] - Follower sid: 4 : >> > > info >> > > : >> > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5c4b8582 >> > > >> > > Sometimes this includes all the expected followers, sometimes just >> > > one or two. The problem doesn't end there, even for the follower >> > > for which it prints the line above, it often never gets to the line >> > > below which I see in successful elections (this is an example from a >> > > clean cluster I just spun up for testing): >> > > 2015-04-29 11:05:55,941 - INFO >> > > [LearnerHandler-/<ip>:46737:LearnerHandler@385] - Synchronizing with >> > > Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 >> > > peerLastZxid=0x0 >> > > >> > > Karol >> > > >> > > The Goldman Sachs Group, Inc. All rights reserved. >> > > See http://www.gs.com/disclaimer/global_email for important risk >> > > disclosures, conflicts of interest and other terms and conditions >> > > relating to this e-mail and your reliance on information contained >> > > in it. This message may contain confidential or privileged >> information. >> > > If you are not the intended recipient, please advise us immediately >> > > and delete this message. See http://www.gs.com/disclaimer/email for >> > > further information on confidentiality and the risks of non-secure >> > electronic communication. >> > > If you cannot access these links, please notify us by reply message >> > > and we will send the contents to you. >> > > >> > > >> > > -----Original Message----- >> > > From: Flavio Junqueira [mailto:[email protected]] >> > > Sent: 29 April 2015 22:54 >> > > To: [email protected] >> > > Subject: Re: Leader election duration >> > > >> > > 170MB isn't that large. I'm thinking that the election might be >> > > taking some time because of the issue described in ZK-2164. In that >> > > jira, the election was failing, but I can see the blocking call >> > > causing the election to take longer as well. You may want to test >> > > that fix and see if it makes any difference to you. >> > > >> > > -Flavio >> > > >> > > > On 29 Apr 2015, at 19:05, Karol Dudzinski >> > > > <[email protected]> >> > > wrote: >> > > > >> > > > The snapshots are a bit larger than I'd like (~170MB) but the bit >> > > > that >> > > takes time is actually before that. >> > > > >> > > > On a successful election, on the new leader I see the following: >> > > > - LEADER ELECTION TOOK: 12345 >> > > > - Synchronizing with Follower... >> > > > - Sending DIFF or SNAP >> > > > >> > > > That last bit will take a long time if the snapshot is large, >> agreed. >> > > However, for me it's arriving at the LEADER ELECTION TOOK bit that >> > > takes a long time. And the bit between that and the Synchronizing >> > > often times out with the messages I described in my previous email. >> > > > >> > > > Karol >> > > > >> > > >> On 29 Apr 2015, at 18:46, Flavio Junqueira >> > > <[email protected]> wrote: >> > > >> >> > > >> Check the size of your snapshot files. If during election you >> > > >> need to >> > > transfer snapshots and they are large, then recovery is going to >> > > take longer. We try to send a diff of the txns when possible, and >> > > sending a diff is much faster compared to send a full snapshot. >> > > Sending a diff isn't always possible, depending on how far behind >> > > followers compared to the leader, so the difference might be due to >> snapshots and diffs. >> > > >> -Flavio >> > > >> >> > > >> >> > > >> On Wednesday, April 29, 2015 6:32 PM, Camille Fournier < >> > > [email protected]> wrote: >> > > >> >> > > >> >> > > >> >> > > >> Don't suppose you could share some snippets of the log messages >> > > >> you're seeing a lot? >> > > >> >> > > >> On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski >> > > >> <[email protected]> >> > > >> wrote: >> > > >> >> > > >>> Did a bit more investigation today though I didn't try the prod >> > > >>> snapshot in a test cluster. I did however find a couple of >> > > >>> other strange things which I'd initially missed. >> > > >>> >> > > >>> When the leader of this particular cluster goes for a reboot, I >> > > >>> see many notification printouts from FastLeaderElection. On >> > > >>> some occasions I've seen nearly 50. In other clusters I usually >> > > >>> see 5 or thereabouts. The number of these notifications seems >> > > >>> to correlate with how long election took. What could cause it >> > > >>> to require >> > > exchanging so many messages? >> > > >>> >> > > >>> This cluster also seems to often timeout while syncing after >> > > >>> selecting the leader. On the new leader I see the expected >> > > >>> number of >> > > "Follower sid: ..." >> > > >>> messages but it doesn't always get to the "Synchronizing with >> > Follower" >> > > >>> part. On the leader I then get a "Unexpected exception causing >> > > >>> shutdown while sock still open" message followed by a GOODBYE >> > > >>> for one of the followers. At the same time on that follower I >> > > >>> see a "Exception when following the leader" caused by a read >> > > >>> timeout in Learner.registerWithLeader. Does anything happen >> > > >>> prior to syncing that could take a long time? >> > > >>> >> > > >>> Thanks, >> > > >>> Karol >> > > >>> >> > > >>>>> On 28 Apr 2015, at 20:22, Karol Dudzinski >> > > >>>>> <[email protected]> >> > > >>>> wrote: >> > > >>>> >> > > >>>> Well these are prod clusters so my ability to experiment is >> > > >>>> rather >> > > >>> limited. I can take a copy of the snapshot and try both 3 node >> > > >>> and >> > > >>> 5 in a test cluster. >> > > >>>> >> > > >>>> One thing I forgot to mention is that in most clusters the >> > > >>>> number of >> > > >>> election notification log lines I see is typically, give or >> > > >>> take, the same as the number of participants. In this cluster >> > > >>> however, it's typically 2 or 3 times as many notifications as >> > > >>> the number of >> > > participants. >> > > >>>> >> > > >>>> My gut feeling is it's more likely to be due to load as the 5 >> > > >>>> node >> > > >>> cluster is much busier and the election time has been increasing >> > > >>> over time (as has load). I have no idea exactly what load >> > > >>> though, whether it's number of clients, frequency of >> > > >>> transactions, total data size, etc. I don't understand why >> > > >>> though but that may just be my limited knowledge of the election >> protocol. >> > > >>>> >> > > >>>> Karol >> > > >>>> >> > > >>>>> On 28 Apr 2015, at 19:54, Camille Fournier >> > > >>>>> <[email protected]> >> > > wrote: >> > > >>>>> >> > > >>>>> Just out of curiosity, if you start the 5 node cluster up with >> > > >>>>> only 3 of the nodes to begin with (like, config 5, but only >> > > >>>>> bring up 3 processes), does it speed up the leader election or >> > > >>>>> is it still >> > > slow? >> > > >>>>> >> > > >>>>> C >> > > >>>>> >> > > >>>>> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski < >> > > >>> [email protected]> >> > > >>>>> wrote: >> > > >>>>> >> > > >>>>>> Hi, >> > > >>>>>> >> > > >>>>>> We're seeing some rather strange leader election in one of >> > > >>>>>> our >> > > >>> clusters. >> > > >>>>>> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" >> > > >>>>>> log >> > > >>> line >> > > >>>>>> (and equivalent for the leader) seems to vary hugely. During >> > > >>>>>> one >> > > >>> rolling >> > > >>>>>> reboot, I saw the number reported as small as 39ms and as >> > > >>>>>> large as 57 seconds (difference in units is not a typo). The >> > > >>>>>> average is just >> > > >>> about 10 >> > > >>>>>> seconds and std dev also about 10 seconds. So the time taken >> > > >>>>>> is not >> > > >>> only >> > > >>>>>> quite large, it's also very variable. >> > > >>>>>> >> > > >>>>>> We have other clusters but the average election time in those >> > > >>>>>> is in the hundreds of millis with std dev in a similar >> > > >>>>>> ballpark. I guess one difference is the "slow" cluster is 5 >> > > >>>>>> participants while the others >> > > >>> are 3, >> > > >>>>>> which may be a factor but I wouldn't expect it to make two >> > > >>>>>> orders of magnitude difference! >> > > >>>>>> >> > > >>>>>> So my question is, what factors contribute to the election >> > > >>>>>> time >> > > >>> reported >> > > >>>>>> by these log lines? And what can we do to speed this up? >> > > >>>>>> >> > > >>>>>> As far as I understand from logs and a quick browse through >> > > >>>>>> the code >> > > >>> that >> > > >>>>>> time is the time to select a leader. Syncing up to the >> > > >>>>>> leader happens after that. The syncing part I can understand >> > > >>>>>> will vary depending on >> > > >>> load >> > > >>>>>> but I don't see why selecting the leader would. >> > > >>>>>> >> > > >>>>>> Thanks, >> > > >>>>>> Karol >> > > >> >> > > >> >> > > >> >> > > >> > > >> > >> > >
