Hi all,

I ran into a weird case where Zookeeper seems to have elected a new leader (a node that just restarted and missed several operations), and a client is able to connect to it and create a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete)

Version: Zookeeper 3.3.3

1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below): 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node. 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
4) 5:47:06: Node #1 restarts and rejoins
5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
6) 5:49:29: Node #2 restarts and rejoins
7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
8) 5:52:02: Node #3 restarts and begins the rejoining process
9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step) 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292

Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.

The leader election bit from node #3's log after restarting might be of interest:


2644 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.QuorumPeer - LOOKING 2644 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.QuorumPeer - Initializing leader election protocol... 2647 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - New election. My id = 37, Proposed zxid = 17179869831 2650 [WorkerSender Thread] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel to server 126 2653 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2653 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 37 (n.leader), 17179869831 (n.zxid), 1 (n.round), LOOKING (n.state), 37 (n.sid), LOOKING (my state) 2654 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 37, proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831 2655 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed epoch = 1 2657 [WorkerSender Thread] INFO org.apache.zookeeper.server.quorum.QuorumCnxManager - Have smaller server identifier, so dropping the connection: (126, 37) 2657 [WorkerSender Thread] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Opening channel to server 215 2661 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection request /13.0.0.12:60623 2661 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection request: 37 2661 [WorkerSender Thread] INFO org.apache.zookeeper.server.quorum.QuorumCnxManager - Have smaller server identifier, so dropping the connection: (215, 37) 2662 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Address of remote peer: 126 2668 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection request /13.0.0.11:60190 2668 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Connection request: 37 2668 [Thread-3] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - Address of remote peer: 215 2669 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2670 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 126 (n.sid), LOOKING (my state) 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 215, proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831 2670 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2670 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From = 126, Proposed leader = 215, Porposed zxid = 12884902548, Proposed epoch = 3 2670 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING (n.state), 126 (n.sid), LOOKING (my state) 2671 [WorkerSender Thread] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 126 2671 [WorkerSender Thread] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 215 2671 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2671 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 37 (n.leader), 17179869831 (n.zxid), 3 (n.round), LOOKING (n.state), 37 (n.sid), LOOKING (my state) 2671 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2671 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 215 (n.sid), LOOKING (my state) 2671 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 37, proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831 2672 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From = 37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed epoch = 3 2672 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LEADING (n.state), 215 (n.sid), LOOKING (my state) 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 215, proposed id: 37, zxid: 12884902548, proposed zxid: 17179869831 2672 [QuorumPeer:/0.0.0.0:2888] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From = 215, Proposed leader = 215, Porposed zxid = 12884902548, Proposed epoch = 3 2672 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37 2672 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING (n.state), 126 (n.sid), LEADING (my state) 2673 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.QuorumPeer - LEADING 2675 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Leader - TCP NoDelay set to: true 2693 [WorkerReceiver Thread] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 37

I don't know much about the internal workings of Zookeeper, but what I find weird here is that node #3 proposes zxid 17179869831 (0x400000287), which seems to indicate that it wants to start well into epoch 4 before it even opens channels to other nodes, finding out that their zxids are only 12884902548 (0x300000294). Is it possible that node #3 thought its zxid was not less than the other nodes' zxids, and so did not bother transferring the operations it missed?

Any other ideas on what could have gone wrong? I dont see any other obvious exceptions in node #3's log, except:

2487 [pool-1-thread-2] DEBUG org.apache.zookeeper.server.persistence.FileTxnLog - EOF excepton java.io.EOFException: Failed to read

but that seems to happen all the time, and it's at DEBUG level, so I'm guessing it's expected.

Thanks,

Jeremy


Reply via email to