[GitHub] zookeeper pull request #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue ...
Github user lvfangmin closed the pull request at: https://github.com/apache/zookeeper/pull/647 ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/647#discussion_r221391402 --- Diff: src/java/main/org/apache/zookeeper/server/DataTree.java --- @@ -527,6 +527,24 @@ public void deleteNode(String path, long zxid) int lastSlash = path.lastIndexOf('/'); String parentName = path.substring(0, lastSlash); String childName = path.substring(lastSlash + 1); + +// The child might already be deleted during taking fuzzy snapshot, +// but we still need to update the pzxid here before throw exception +// for no such child +DataNode parent = nodes.get(parentName); +if (parent == null) { +throw new KeeperException.NoNodeException(); +} +synchronized (parent) { +parent.removeChild(childName); +// Only update pzxid when the zxid is larger than the current pzxid, +// otherwise we might override higher pzxid set by a following create +// Txn, which could cause the cversion and pzxid inconsistent +if (zxid > parent.stat.getPzxid()) { --- End diff -- This is a new bug I found recently, the previous change in ZOOKEEPER-3125 has a bug which could revert pzxid. Given it's a known issue in that Jira, I think it's reasonable to fix it in the same patch here before porting back to 3.5. ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/647#discussion_r221390760 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/FuzzySnapshotRelatedTest.java --- @@ -162,6 +167,98 @@ public void process(String path) { new String(zk[followerA].getData(node2, null, null))); } +/** + * It's possibel during SNAP sync, the parent is serialized before the + * child get deleted during sending the snapshot over. + * + * In which case, we need to make sure the pzxid get correctly updated + * when applying the txns received. + */ +@Test +public void testPZxidUpdatedDuringSnapSyncing() throws Exception { +LOG.info("Enable force snapshot sync"); +System.setProperty(LearnerHandler.FORCE_SNAP_SYNC, "true"); + +final String parent = "/testPZxidUpdatedWhenDeletingNonExistNode"; +final String child = parent + "/child"; +createEmptyNode(zk[leaderId], parent); +createEmptyNode(zk[leaderId], child); + +LOG.info("shutdown follower {}", followerA); +mt[followerA].shutdown(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTING); + +LOG.info("Set up ZKDatabase to catch the node serializing in DataTree"); +addSerializeListener(leaderId, parent, child); + +LOG.info("Restart follower A to trigger a SNAP sync with leader"); +mt[followerA].start(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTED); + +LOG.info("Check and make sure the pzxid of the parent is the same " + +"on leader and follower A"); +compareStat(parent, leaderId, followerA); +} + +/** + * It's possible during taking fuzzy snapshot, the parent is serialized + * before the child get deleted in the fuzzy range. + * + * In which case, we need to make sure the pzxid get correctly updated + * when replaying the txns. + */ +@Test +public void testPZxidUpdatedWhenLoadingSnapshot() throws Exception { + +final String parent = "/testPZxidUpdatedDuringTakingSnapshot"; +final String child = parent + "/child"; +createEmptyNode(zk[followerA], parent); +createEmptyNode(zk[followerA], child); + +LOG.info("Set up ZKDatabase to catch the node serializing in DataTree"); +addSerializeListener(followerA, parent, child); + +LOG.info("Take snapshot on follower A"); +ZooKeeperServer zkServer = mt[followerA].main.quorumPeer.getActiveServer(); +zkServer.takeSnapshot(); + +LOG.info("Restarting follower A to load snapshot"); +mt[followerA].shutdown(); +mt[followerA].start(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTED); + +LOG.info("Check and make sure the pzxid of the parent is the same " + +"on leader and follower A"); +compareStat(parent, leaderId, followerA); +} + +private void addSerializeListener(int sid, String parent, String child) { +final ZooKeeper zkClient = zk[followerA]; +CustomDataTree dt = +(CustomDataTree) mt[sid].main.quorumPeer.getZkDb().getDataTree(); +dt.addListener(parent, new NodeSerializeListener() { +@Override +public void nodeSerialized(String path) { +try { +zkClient.delete(child, -1); +LOG.info("Deleted the child node after the parent is serialized"); +} catch (Exception e) { +LOG.error("Error when deleting node {}", e); +} +} +}); +} + +private void compareStat(String path, int sid, int compareWithSid) throws Exception { +Stat stat1 = new Stat(); +zk[sid].getData(path, null, stat1); + +Stat stat2 = new Stat(); +zk[compareWithSid].getData(path, null, stat2); --- End diff -- I made a comment on that before sending this diff out, I was trying to hear the opinion before making the test change here. Seems we agreed on not using RetryRule, I'll improve this test class. ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/647#discussion_r221201278 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/FuzzySnapshotRelatedTest.java --- @@ -162,6 +167,98 @@ public void process(String path) { new String(zk[followerA].getData(node2, null, null))); } +/** + * It's possibel during SNAP sync, the parent is serialized before the + * child get deleted during sending the snapshot over. + * + * In which case, we need to make sure the pzxid get correctly updated + * when applying the txns received. + */ +@Test +public void testPZxidUpdatedDuringSnapSyncing() throws Exception { +LOG.info("Enable force snapshot sync"); +System.setProperty(LearnerHandler.FORCE_SNAP_SYNC, "true"); + +final String parent = "/testPZxidUpdatedWhenDeletingNonExistNode"; +final String child = parent + "/child"; +createEmptyNode(zk[leaderId], parent); +createEmptyNode(zk[leaderId], child); + +LOG.info("shutdown follower {}", followerA); +mt[followerA].shutdown(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTING); + +LOG.info("Set up ZKDatabase to catch the node serializing in DataTree"); +addSerializeListener(leaderId, parent, child); + +LOG.info("Restart follower A to trigger a SNAP sync with leader"); +mt[followerA].start(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTED); + +LOG.info("Check and make sure the pzxid of the parent is the same " + +"on leader and follower A"); +compareStat(parent, leaderId, followerA); +} + +/** + * It's possible during taking fuzzy snapshot, the parent is serialized + * before the child get deleted in the fuzzy range. + * + * In which case, we need to make sure the pzxid get correctly updated + * when replaying the txns. + */ +@Test +public void testPZxidUpdatedWhenLoadingSnapshot() throws Exception { + +final String parent = "/testPZxidUpdatedDuringTakingSnapshot"; +final String child = parent + "/child"; +createEmptyNode(zk[followerA], parent); +createEmptyNode(zk[followerA], child); + +LOG.info("Set up ZKDatabase to catch the node serializing in DataTree"); +addSerializeListener(followerA, parent, child); + +LOG.info("Take snapshot on follower A"); +ZooKeeperServer zkServer = mt[followerA].main.quorumPeer.getActiveServer(); +zkServer.takeSnapshot(); + +LOG.info("Restarting follower A to load snapshot"); +mt[followerA].shutdown(); +mt[followerA].start(); +QuorumPeerMainTest.waitForOne(zk[followerA], States.CONNECTED); + +LOG.info("Check and make sure the pzxid of the parent is the same " + +"on leader and follower A"); +compareStat(parent, leaderId, followerA); +} + +private void addSerializeListener(int sid, String parent, String child) { +final ZooKeeper zkClient = zk[followerA]; +CustomDataTree dt = +(CustomDataTree) mt[sid].main.quorumPeer.getZkDb().getDataTree(); +dt.addListener(parent, new NodeSerializeListener() { +@Override +public void nodeSerialized(String path) { +try { +zkClient.delete(child, -1); +LOG.info("Deleted the child node after the parent is serialized"); +} catch (Exception e) { +LOG.error("Error when deleting node {}", e); +} +} +}); +} + +private void compareStat(String path, int sid, int compareWithSid) throws Exception { +Stat stat1 = new Stat(); +zk[sid].getData(path, null, stat1); + +Stat stat2 = new Stat(); +zk[compareWithSid].getData(path, null, stat2); --- End diff -- @hanm mentioned a flaky test issue related to this part: https://github.com/apache/zookeeper/pull/605/files#r219628119 I don't see answer in the original PR, how are you dealing with the problem? ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/647#discussion_r221200308 --- Diff: src/java/main/org/apache/zookeeper/server/DataTree.java --- @@ -527,6 +527,24 @@ public void deleteNode(String path, long zxid) int lastSlash = path.lastIndexOf('/'); String parentName = path.substring(0, lastSlash); String childName = path.substring(lastSlash + 1); + +// The child might already be deleted during taking fuzzy snapshot, +// but we still need to update the pzxid here before throw exception +// for no such child +DataNode parent = nodes.get(parentName); +if (parent == null) { +throw new KeeperException.NoNodeException(); +} +synchronized (parent) { +parent.removeChild(childName); +// Only update pzxid when the zxid is larger than the current pzxid, +// otherwise we might override higher pzxid set by a following create +// Txn, which could cause the cversion and pzxid inconsistent +if (zxid > parent.stat.getPzxid()) { --- End diff -- Looks like this is the fix you're talking about and which will be fixed separately in 3.6. Is that correct? ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/647#discussion_r221200641 --- Diff: src/java/test/org/apache/zookeeper/server/DataTreeTest.java --- @@ -153,6 +153,31 @@ public void testIncrementCversion() throws Exception { (newCversion == prevCversion + 1 && newPzxid == prevPzxid + 1)); } +@Test +public void testPzxidUpdatedWhenDeletingNonExistNode() throws Exception { --- End diff -- This new test is also part of the fix. ---
[GitHub] zookeeper pull request #647: ZOOKEEPER-3125: Fixing pzxid consistent issue w...
GitHub user lvfangmin opened a pull request: https://github.com/apache/zookeeper/pull/647 ZOOKEEPER-3125: Fixing pzxid consistent issue when replaying a txn for a deleted node Port this to 3.5, this also fixed the issue where the pzxid might be overwritten by a smaller one when replaying deleteNode txn, which will be fixed in a separate PR for 3.6. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-3125-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/647.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #647 commit 90ef67cf79c0eaec4b8b3389a8711cce8ba25255 Author: Fangmin Lyu Date: 2018-09-12T11:29:13Z ZOOKEEPER-3125: Fixing pzxid consistent issue when replaying a txn for a deleted node Author: Fangmin Lyu Reviewers: br...@apache.org, an...@apache.org Closes #605 from lvfangmin/ZOOKEEPER-3125 ---