Interestingly, the delay went away at the same time as node2 was brought back 
up.

2018-09-14 09:33:30,944 - INFO  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  3, proposed zxid=0x600001f52
2018-09-14 09:33:30,945 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] 
- Notification: 1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 
0x4e8 (n.round), LOOKING (n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my 
state)
2018-09-14 09:33:30,946 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] 
- Notification: 1 (message format version), 1 (n.leader), 0x600001ee4 (n.zxid), 
0x4e8 (n.round), LEADING (n.state), 1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my 
state)
2018-09-14 09:33:30,947 - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@595] 
- Notification: 1 (message format version), 3 (n.leader), 0x600001f52 (n.zxid), 
0x4e8 (n.round), LOOKING (n.state), 2 (n.sid), 0x6 (n.peerEpoch) LOOKING (my 
state)

-----Original Message-----
From: Andrew January [mailto:ajanu...@evertz.com]
Sent: 14 September 2018 16:58
To: user@zookeeper.apache.org
Subject: Issue electing leader, suspect delay in processing notifications

I have a 3 node cluster on 3.4.12. One of the nodes (node 2) was taken down for 
maintenance.
The remaining two nodes seem to have issues gaining quorum.

I think the issues is that node 1 is taking 5 seconds to process notifications.

>From the logs, node 1 and node 3 both agree that node 3 should be leader.
Node 1 sends out it's notification that it's following node 3, then immediately 
tries to connect to it to follow.
Because node 3 takes 5 seconds to pick up the notification that node 1 accepts 
it as leader, it isn't accepting connections to be followed.
This means node 1 times out trying to connect to node 3 before node 3 becomes 
leader.
Node 1 starts a new election. Node 3 times out waiting for acknowledgement that 
it's the leader. The process repeats.

My evidence that node 1 is taking 5 seconds to process notifications is:
1) Node 3 starts a new election, which broadcasts a notification with a new 
round to itself and node 1
2) Node 1 gets the message almost immediately
3) Node 3 doesn't get its own message until 5 seconds later


node3: 2018-09-14 09:28:50,534 - INFO  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:28:50,535 - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e0 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:28:55,539 - INFO  
[WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e0 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

node3: 2018-09-14 09:29:25,544 - INFO  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:29:25,546 - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e1 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:29:30,547 - INFO  
[WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e1 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

node3: 2018-09-14 09:30:00,772 - INFO  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  3, proposed zxid=0x600001f52
node1: 2018-09-14 09:30:00,774 - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e2 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:30:05,777 - INFO  
[WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message 
format version), 3 (n.leader), 0x600001f52 (n.zxid), 0x4e2 (n.round), LOOKING 
(n.state), 3 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)

Similarly, when node 1 tries to start an election:
1) Node 1 starts a new election, which has a new round
2) Node 1 gets the message almost immediately
3) Node 3 doesn't get the message until 5 seconds later

node1: 2018-09-14 09:31:24,853 - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  1, proposed zxid=0x600001ee4
node1: 2018-09-14 09:31:25,653 - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification: 1 (message 
format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e5 (n.round), LOOKING 
(n.state), 1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:31:30,654 - INFO  
[WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message 
format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e5 (n.round), LOOKING 
(n.state), 1 (n.sid), 0x6 (n.peerEpoch) LEADING (my state)

node1: 2018-09-14 09:32:00,072 - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New 
election. My id =  1, proposed zxid=0x600001ee4
node1: 2018-09-14 09:32:00,680 - INFO  
[WorkerReceiver[myid=1]:FastLeaderElection@595] - Notification: 1 (message 
format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e6 (n.round), LOOKING 
(n.state), 1 (n.sid), 0x6 (n.peerEpoch) LOOKING (my state)
node3: 2018-09-14 09:32:05,685 - INFO  
[WorkerReceiver[myid=3]:FastLeaderElection@595] - Notification: 1 (message 
format version), 1 (n.leader), 0x600001ee4 (n.zxid), 0x4e6 (n.round), LOOKING 
(n.state), 1 (n.sid), 0x6 (n.peerEpoch) LEADING (my state)

I've looked through the code but I can't see any reason it would have a 
consistent 5 second delay when processing notifications. Any ideas?
This e-mail and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this e-mail in error please notify the sender (as shown 
above). Kindly do not reproduce, print or forward any material received in 
error, please delete it immediately. Evertz UK Limited (Company No. 3458137) is 
incorporated in England and Wales and has its registered office at 100 
Berkshire Place, Wharfedale Road, Winnersh, Wokingham, Berkshire, United 
Kingdom, RG41 5RD. Evertz Singapore Pte Limited (Company No. 200817005N) is 
incorporated in Singapore and has its registered office at One Marina 
Boulevard, #28-00. Singapore 018989.
This e-mail and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this e-mail in error please notify the sender (as shown 
above). Kindly do not reproduce, print or forward any material received in 
error, please delete it immediately. Evertz UK Limited (Company No. 3458137) is 
incorporated in England and Wales and has its registered office at 100 
Berkshire Place, Wharfedale Road, Winnersh, Wokingham, Berkshire, United 
Kingdom, RG41 5RD. Evertz Singapore Pte Limited (Company No. 200817005N) is 
incorporated in Singapore and has its registered office at One Marina 
Boulevard, #28-00. Singapore 018989.

Reply via email to