[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. b. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. c. However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. b. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. c. However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. b. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. c. However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. * The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. * On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. * However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. ** The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the snapshot of the in-memory data tree on the node 2. ** On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x20001. ** However, this does not remove the setData at zxid 0x30001 from the transaction log on the node 1. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. {panel:title=Log from the node 2:} ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1:} {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} {panel} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. a. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. a. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {noformat} ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... {noformat} *no* further _formatting_ is done here {noformat} At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. {panel:title=Log from the node 1: } ... 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... Some text with a title {panel} At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {panel:title=My title} Some text with a title {panel} 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: # At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: {panel:title=My title} Some text with a title {panel} 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 a. Shutdown the node 0 and 1 b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ c. Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ c. Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 a. Shutdown the node 0 b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ c. Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 a. Shutdown the node 2 b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ c. Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 a. Shutdown the node 1 b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ c. Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 * Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 {noformat} To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) {noformat} ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 {noformat} ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001 zxid
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 *no* further _formatting_ is done here {noformat} Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001 zxid of leader is
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: # Diverge the node 2 ## Shutdown the node 0 and 1 ## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ ## Shutdown the node 2 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ ## Shutdown the node 1 # Restart the node 0 and 1 (let them finish with resync) # Diverge the node 1 ## Shutdown the node 0 ## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ ## Shutdown the node 1 # Restart the node 0 and 2 (let them finish with resync) # Diverge the node 0 ## Shutdown the node 2 ## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ ## Shutdown the node 0 # Restart the node 1 and 2 (let them finish with resync) # Diverge the node 2 ## Shutdown the node 1 ## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ ## Shutdown the node 2 # Restart the node 1 and 2 (let them finish with resync) # Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: {noformat} *no* further _formatting_ is done here {noformat} /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) # Create 5 znodes with initial values as follow (key = value) ## /testDivergenceResync0 = 0 ## /testDivergenceResync1 = 1 ## /testDivergenceResync2 = 2 ## /testDivergenceResync3 = 3 ## /testDivergenceResync4 = 4 To Reproduce: Diverge the node 2 Shutdown the node 0 and 1 Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ Shutdown the node 2 Restart the node 0 and 1 (let them finish with resync) Diverge the node 1 Shutdown the node 0 Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ Shutdown the node 1 Restart the node 0 and 1 (let them finish with resync) Diverge the node 1 Shutdown the node 0 Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ Shutdown the node 1 Restart the node 0 and 2 (let them finish with resync) Diverge the node 0 Shutdown the node 2 Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ Shutdown the node 0 Restart the node 1 and 2 (let them finish with resync) Diverge the node 2 Shutdown the node 1 Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ Shutdown the node 2 Restart the node 1 and 2 (let them finish with resync) Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001 zxid of leader is 0x5sent zxid of db as 0x20001 2017-11-16 03:02:37,701 [myid:2] -
[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Beom Heyn Kim updated ZOOKEEPER-2945: - Description: Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 and this one is for 3.4.11 and later) Initially: Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader) Create 5 znodes with initial values as follow (key = value) /testDivergenceResync0 = 0 /testDivergenceResync1 = 1 /testDivergenceResync2 = 2 /testDivergenceResync3 = 3 /testDivergenceResync4 = 4 To Reproduce: Diverge the node 2 Shutdown the node 0 and 1 Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’ Shutdown the node 2 Restart the node 0 and 1 (let them finish with resync) Diverge the node 1 Shutdown the node 0 Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’ Shutdown the node 1 Restart the node 0 and 1 (let them finish with resync) Diverge the node 1 Shutdown the node 0 Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’ Shutdown the node 1 Restart the node 0 and 2 (let them finish with resync) Diverge the node 0 Shutdown the node 2 Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’ Shutdown the node 0 Restart the node 1 and 2 (let them finish with resync) Diverge the node 2 Shutdown the node 1 Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’ Shutdown the node 2 Restart the node 1 and 2 (let them finish with resync) Restart the node 0 (let it finish with resync) Reading each key from each node directly will give us the output: /testDivergenceResync0 on the node 0 = 0 /testDivergenceResync0 on the node 1 = 0 /testDivergenceResync0 on the node 2 = 0 /testDivergenceResync1 on the node 0 = 1001 /testDivergenceResync1 on the node 1 = 1001 /testDivergenceResync1 on the node 2 = 1001 /testDivergenceResync2 on the node 0 = 2 /testDivergenceResync2 on the node 1 = 1002 /testDivergenceResync2 on the node 2 = 2 /testDivergenceResync3 on the node 0 = 3 /testDivergenceResync3 on the node 1 = 3 /testDivergenceResync3 on the node 2 = 3 /testDivergenceResync4 on the node 0 = 1004 /testDivergenceResync4 on the node 1 = 1004 /testDivergenceResync4 on the node 2 = 1004 Thus, the value of key /testDivergenceResync2 is inconsistent across nodes. What seems to happen: At the step 5, setData (at zxid 0x30001) writing the value 1002 is committed on the node 1. Log from the node 1: 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174] - Processing request:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a 2017-11-16 03:02:19,965 [myid:1] - INFO [SyncThread:1:FileTxnLog@209] - Creating new log file: log.30001 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x30001 is 1 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down 2017-11-16 03:02:21,173 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 1 ] ... At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node 2 which is the leader. Log from the node 2: ... 2017-11-16 03:02:37,470 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER ELECTION TOOK - 232 2017-11-16 03:02:37,479 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP 2017-11-16 03:02:37,626 [myid:2] - INFO [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last zxid of peer is 0x30001 zxid of leader is 0x5sent zxid of db as 0x20001 2017-11-16 03:02:37,701 [myid:2] - INFO