[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.
a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory 
data tree from the snapshot of the in-memory data tree on the node 2. 
b. On the other hand, the node 1 will clear its in-memory data tree and 
restore it with the snapshot from the node 2. Then, it takes its own snapshot 
at zxid 0x20001. 
c. However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.
a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory 
data tree from the snapshot of the in-memory data tree on the node 2. 
b. On the other hand, the node 1 will clear its in-memory data tree and 
restore it with the snapshot from the node 2. Then, it takes its own snapshot 
at zxid 0x20001. 
c. However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.
a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory 
data tree from the snapshot of the in-memory data tree on the node 2. 
b. On the other hand, the node 1 will clear its in-memory data tree and 
restore it with the snapshot from the node 2. Then, it takes its own snapshot 
at zxid 0x20001. 
c. However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.
 The node 2 sends SNAP msg so that the node 1 can restore its in-memory 
data tree from the snapshot of the in-memory data tree on the node 2. On the 
other hand, the node 1 will clear its in-memory data tree and restore it with 
the snapshot from the node 2. Then, it takes its own snapshot at zxid 
0x20001. However, this does not remove the setData at zxid 0x30001 from 
the transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.
 The node 2 sends SNAP msg so that the node 1 can restore its in-memory 
data tree from the snapshot of the in-memory data tree on the node 2. 
 On the other hand, the node 1 will clear its in-memory data tree and 
restore it with the snapshot from the node 2. Then, it takes its own snapshot 
at zxid 0x20001. 
 However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.  
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.  
*  The node 2 sends SNAP msg so that the node 1 can restore its in-memory data 
tree from the snapshot of the in-memory data tree on the node 2. 
* On the other hand, the node 1 will clear its in-memory data tree and restore 
it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 
0x20001. 
* However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader.  
**  The node 2 sends SNAP msg so that the node 1 can restore its in-memory data 
tree from the snapshot of the in-memory data tree on the node 2. 
** On the other hand, the node 1 will clear its in-memory data tree and restore 
it with the snapshot from the node 2. Then, it takes its own snapshot at zxid 
0x20001. 
** However, this does not remove the setData at zxid 0x30001 from the 
transaction log on the node 1.
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
{panel:title=Log from the node 2:}
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}
# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{panel}


# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1:}
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}
{panel}


# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.



Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
a. Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
a. Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.

Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

# At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1:
{noformat}
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
{noformat}

At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1:
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...

{noformat}
*no* further _formatting_ is done here
{noformat}

At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
{panel:title=Log from the node 1: }
...
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
Some text with a title
{panel}

At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 
{panel:title=My title}
Some text with a title
{panel}

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
# At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

{panel:title=My title}
Some text with a title
{panel}

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
a. Shutdown the node 0 and 1
b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
c. Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
c. Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
a. Shutdown the node 0
b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
c. Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
a. Shutdown the node 2
b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
c. Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
a. Shutdown the node 1
b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
c. Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
*  Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4
{noformat}

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
{noformat}
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4
{noformat}

## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  zxid 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004

*no* further _formatting_ is done here
{noformat}
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  zxid of leader is 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
# Diverge the node 2
## Shutdown the node 0 and 1
## Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
## Shutdown the node 2
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
## Shutdown the node 1
# Restart the node 0 and 1 (let them finish with resync)
# Diverge the node 1
## Shutdown the node 0
## Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
## Shutdown the node 1
# Restart the node 0 and 2 (let them finish with resync)
# Diverge the node 0
## Shutdown the node 2
## Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
## Shutdown the node 0
# Restart the node 1 and 2 (let them finish with resync)
# Diverge the node 2
## Shutdown the node 1
## Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
## Shutdown the node 2
# Restart the node 1 and 2 (let them finish with resync)
# Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
{noformat}
*no* further _formatting_ is done here
{noformat}

/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
Diverge the node 2
Shutdown the node 0 and 1
Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
Shutdown the node 2
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
Shutdown the node 1
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
Shutdown the node 1
Restart the node 0 and 2 (let them finish with resync)
Diverge the node 0
Shutdown the node 2
Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
Shutdown the node 0
Restart the node 1 and 2 (let them finish with resync)
Diverge the node 2
Shutdown the node 1
Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
Shutdown the node 2
Restart the node 1 and 2 (let them finish with resync)
Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  zxid of leader is 0x5sent zxid of db as 
0x20001
2017-11-16 03:02:37,701 [myid:2] - 

[jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader

2017-11-20 Thread Beom Heyn Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2945:
-
Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not 
properly truncating the uncommitted write when the follower receives SNAP msg 
from the leader. So, inconsistency can occur in the in-memory data tree across 
nodes. Here is one procedure to reproduce the inconsistency. (Actually, this 
seems similar to my previous report on ZOOKEEPER-2832, but it was for 3.4.10 
and this one is for 3.4.11 and later)

Initially:
Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
Create 5 znodes with initial values as follow (key = value)
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4

To Reproduce:
Diverge the node 2
Shutdown the node 0 and 1
Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
Shutdown the node 2
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
Shutdown the node 1
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
Shutdown the node 1
Restart the node 0 and 2 (let them finish with resync)
Diverge the node 0
Shutdown the node 2
Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
Shutdown the node 0
Restart the node 1 and 2 (let them finish with resync)
Diverge the node 2
Shutdown the node 1
Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
Shutdown the node 2
Restart the node 1 and 2 (let them finish with resync)
Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x30001) writing the value 1002 is 
committed on the node 1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::CommitProcessor@174] - Processing request:: 
sessionid:0x10001110808 type:setData cxid:0x4 zxid:0x30001 txntype:5 
reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 
cport:-1)::Leader@787] - Proposing:: sessionid:0x10001110808 type:setData 
cxid:0x4 zxid:0x30001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - 
Creating new log file: log.30001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for 
zxid: 0x30001 is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512] - Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, 
only synced with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed 
with the node 2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372] - LEADING - LEADER 
ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@346] - Follower sid: 1 : info : 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@401] - Synchronizing with 
Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x30001
2017-11-16 03:02:37,626 [myid:2] - DEBUG 
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@472] - proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@475] - Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  
[LearnerHandler-/127.0.0.1:46899:LearnerHandler@499] - Sending snapshot last 
zxid of peer is 0x30001  zxid of leader is 0x5sent zxid of db as 
0x20001
2017-11-16 03:02:37,701 [myid:2] - INFO