[jira] [Created] (ZOOKEEPER-4773) Ephemeral node is not deleted when all followers are blocked with leader
May created ZOOKEEPER-4773: -- Summary: Ephemeral node is not deleted when all followers are blocked with leader Key: ZOOKEEPER-4773 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4773 Project: ZooKeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.9.1, 3.8.3 Reporter: May

The test case EphemeralNodeDeletionTest verifies that when a follower loses its connection to the leader while a client writes an ephemeral node, the node is still deleted after the client session closes. However, the test fails when I make all followers lose their connections. To reproduce the bug, I modified testEphemeralNodeDeletion() as follows:

{code:java}
// 2: inject network problem in both followers
ArrayList<CustomQuorumPeer> followers = getFollowers();
for (CustomQuorumPeer follower : followers) {
    follower.setInjectError(true);
}
//CustomQuorumPeer follower = (CustomQuorumPeer) getByServerState(mt, ServerState.FOLLOWING);
//follower.setInjectError(true);

// 3: close the session so that the ephemeral node is deleted
zk.close();

// remove the error
//follower.setInjectError(false);
for (CustomQuorumPeer follower : followers) {
    follower.setInjectError(false);
    assertTrue(ClientBase.waitForServerUp("127.0.0.1:" + follower.getClientPort(), CONNECTION_TIMEOUT),
            "Faulted Follower should have joined quorum by now");
}
{code}

And here is the added method getFollowers():

{code:java}
private ArrayList<CustomQuorumPeer> getFollowers() {
    ArrayList<CustomQuorumPeer> followers = new ArrayList<>();
    for (int i = 0; i < mt.length; i++) {
        QuorumPeer quorumPeer = mt[i].getQuorumPeer();
        if (null != quorumPeer && ServerState.FOLLOWING == quorumPeer.getPeerState()) {
            followers.add((CustomQuorumPeer) quorumPeer);
        }
    }
    return followers;
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
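Why the modified test can stall follows from the quorum arithmetic: the closeSession transaction that deletes the ephemeral node must be acknowledged by a strict majority before it commits, and with every follower blocked the leader only has its own ack. A minimal sketch of that arithmetic (the helper names `majorityOf` and `canCommit` are hypothetical, for illustration only, not ZooKeeper code):

```java
public class QuorumMath {
    // A ZAB leader needs acks from a strict majority of the voting ensemble.
    static int majorityOf(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // The leader always acks its own proposal, so the ack count is
    // 1 (leader) plus the number of reachable followers.
    static boolean canCommit(int ensembleSize, int reachableFollowers) {
        return 1 + reachableFollowers >= majorityOf(ensembleSize);
    }

    public static void main(String[] args) {
        // Original test: one of two followers blocked in a 3-node
        // ensemble -> 2 acks >= majority of 2, so closeSession commits.
        System.out.println(canCommit(3, 1)); // true
        // Modified test: both followers blocked -> 1 ack < 2, so the
        // deletion stalls until at least one follower recovers.
        System.out.println(canCommit(3, 0)); // false
    }
}
```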
[jira] [Updated] (ZOOKEEPER-4772) Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4772: --- Component/s: quorum

> Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower
> Key: ZOOKEEPER-4772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4772
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.7.2, 3.8.3, 3.9.1
> Reporter: May
> Priority: Major
[jira] [Created] (ZOOKEEPER-4772) Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower
May created ZOOKEEPER-4772: -- Summary: Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower Key: ZOOKEEPER-4772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4772 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.9.1, 3.8.3, 3.7.2 Reporter: May

LearnerHandler's syncFollower currently does not handle the case where proposal (0, 0) has been committed and snapshotted: it will not fall back to a snapshot sync when minCommittedLog is 0. The bug can be reproduced by modifying testNewEpochZxid in LearnerHandlerTest:

{code:java}
public void testNewEpochZxid() throws Exception {
    long peerZxid;
    db.txnLog.add(createProposal(getZxid(0, 0))); // Added
    db.txnLog.add(createProposal(getZxid(0, 1)));
    db.txnLog.add(createProposal(getZxid(1, 1)));
    db.txnLog.add(createProposal(getZxid(1, 2)));
    // After leader election, lastProcessedZxid will point to new epoch
    db.lastProcessedZxid = getZxid(2, 0);
    db.committedLog.add(createProposal(getZxid(0, 0))); // Added
    db.committedLog.add(createProposal(getZxid(1, 1)));
    db.committedLog.add(createProposal(getZxid(1, 2)));

    // Peer has zxid of epoch 0
    peerZxid = getZxid(0, 0);
    // We should get snap, we can do better here, but the main logic is
    // that we should never send diff if we have never seen any txn older
    // than peer zxid
    assertTrue(learnerHandler.syncFollower(peerZxid, leader)); // Fails here
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
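The ambiguity behind the bug can be seen from how a zxid is encoded: the epoch goes in the high 32 bits and a per-epoch counter in the low 32 bits, so proposal (0, 0) encodes to the same value 0 that an empty committedLog reports for minCommittedLog. A minimal sketch of that encoding (the `zxid` helper below is a hypothetical stand-in; ZooKeeper's own version lives in ZxidUtils):

```java
public class ZxidSketch {
    // Pack epoch into the high 32 bits and the per-epoch counter into
    // the low 32 bits of a 64-bit zxid.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        // The very first proposal of epoch 0 encodes to 0...
        System.out.println(Long.toHexString(zxid(0, 0))); // 0
        // ...which collides with the sentinel syncFollower uses for an
        // empty committedLog, making minCommittedLog == 0 ambiguous.
        System.out.println(Long.toHexString(zxid(1, 2))); // 100000002
        System.out.println(Long.toHexString(zxid(2, 0))); // 200000000
    }
}
```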
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Description:

Here is the bug-triggering process:
# A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
# A client creates a znode "/bug" with value "bad".
# The client updates znode "/bug" to value "good".
# zk1 crashes before receiving the proposal from the leader for the request in step 3.
# "/bug" is modified to "good".
# zk1 is restarted.
# Another client connects to zk1, reads "/bug" and gets "bad".
# zk1 finishes synchronization with the current leader, and only then updates "/bug" to "good".

The problem is that zk1 should not be accessible to clients before it finishes synchronization with the current leader; otherwise a client can read stale data.

The actual testing scenario is as follows. I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6).

# 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (Client1 actually builds the connection with C1ZK1)
# 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213),
java.io.FileOutputStream.<init>(FileOutputStream.java:162),
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287),
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}
# 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
# 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
# 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
# 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
# Client1 requests to create ephemeral znode "/eph"
# 2022-03-24 22:51:45,033 crash C1ZK1 before:
{code:java}
org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:68),
org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:126),
org.apache.zookeeper.server.quorum.Learner.writePacketNow(Learner.java:194),
org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:186),
org.apache.zookeeper.server.quorum.SendAckRequestProcessor.processRequest(SendAckRequestProcessor.java:46),
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}
# 2022-03-24 22:51:49,451 restart C1ZK1 before C1ZK2 writes to file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001":
{code:java}
java.io.FileOutputStream.writeBytes(FileOutputStream.java),
java.io.FileOutputStream.write(FileOutputStream.java:326),
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82),
java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140),
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:293),
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}
# 2022-03-24 22:51:56,744 [Client2] - INFO - build connection with zookeeper
# 2022-03-24 22:51:56,876 [Client2] - INFO - cannot read ephemeral znode "/eph", got "KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /eph"
# When we connect to every alive node in the cluster and read their data respectively, we get
{code:java}
2022-03-24 22:52:14,663 [ZKChecker] - INFO - server C1ZK3:11181 and server C1ZK1:11181 have different number of znodes: [/zookeeper/quota, /zookeeper] and [/zookeeper/quota, /eph, /zookeeper]
{code}
# Then I killed all the nodes in the cluster.

The file "log.10001" in C1ZK1 is:
{code:java}
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x0 zxid 0x10001 createSession 15000 2,1371985504
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x1 zxid 0x10002 create '/bug,#68656c6c6f,v{s{31,s{'world,'anyone}}},F,1 2,6620487461
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x2 zxid 0x10003 setData '/bug,#6e696365,1 2,5588659454
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x4 zxid
{code}
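As a side note, the '#'-prefixed payloads in the transaction log above are hex-encoded bytes; decoding them recovers the values Client1 wrote, which helps correlate the log with the client trace. A minimal sketch (the `decode` helper is hypothetical, not part of ZooKeeper's log tooling):

```java
public class TxnPayloadDecode {
    // Decode a '#'-prefixed hex payload as printed in a formatted txn log,
    // interpreting each byte pair as an ASCII character.
    static String decode(String hex) {
        if (hex.startsWith("#")) {
            hex = hex.substring(1);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("#68656c6c6f")); // hello  (create '/bug)
        System.out.println(decode("#6e696365"));   // nice   (setData '/bug)
    }
}
```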
[jira] [Commented] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516203#comment-17516203 ] May commented on ZOOKEEPER-4503: Hello [~symat], I updated the bug report and uploaded the logs.

> A restarted node can be accessed before it finishing synchronization with leader
> Key: ZOOKEEPER-4503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.6.3
> Reporter: May
> Priority: Major
> Attachments: zookeeper--server-C1ZK1.out, zookeeper--server-C1ZK2.out, zookeeper--server-C1ZK3.out, zookeeper--server-C1ZK4.out, zookeeper--server-C1ZK5.out
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Attachment: zookeeper--server-C1ZK1.out zookeeper--server-C1ZK2.out zookeeper--server-C1ZK3.out zookeeper--server-C1ZK4.out zookeeper--server-C1ZK5.out
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Description: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # client create a znode "/bug" with value "bad" # client update znode "/bug" to value "good" # zk1 crashes before receiving proposal for leader for the request in step 3. # "/bug" is modified to "good" # zk1 was restarted # another client connects to zk1, reads "/bug" and gets "bad" # zk1 finish synchronization with current leader, and then modify "/bug" to "good". The problem is that zk1 should be accessed by a client when it finish synchronization with current leader in case of a client reads bad data. The actual testing scenario is as following: I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6) # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (client1 actuallly builds connection with C1ZK1) # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter): {code:java} java.io.FileOutputStream.(FileOutputStream.java:213), java.io.FileOutputStream.(FileOutputStream.java:162), org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0{code} # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello" # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice" # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice" # 2022-03-24 
22:51:40,996 [Client1] - INFO - deleted znode "/bug" # Client1 requests to create ephemeral znode "/eph" # 2022-03-24 22:51:45,033 crash C1ZK1 before: {code:java} org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:68), org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:126), org.apache.zookeeper.server.quorum.Learner.writePacketNow(Learner.java:194), org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:186), org.apache.zookeeper.server.quorum.SendAckRequestProcessor.processRequest(SendAckRequestProcessor.java:46), org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0) {code} # 2022-03-24 22:51:49,451 restart C1ZK1 before C1ZK2 write to file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001": {code:java} java.io.FileOutputStream.writeBytes(FileOutputStream.java), java.io.FileOutputStream.write(FileOutputStream.java:326), java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82), java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140), org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:293), org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0) {code} # 2022-03-24 22:51:56,744 [Client2] - INFO - build connection with zookeeper # 2022-03-24 22:51:56,876 [Client2] - INFO - cannot read ephemeral znode "/eph", got "KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /eph" # When we check the cluster, we got {code:java} 2022-03-24 22:52:14,663 [ZKChecker] - INFO - server 
C1ZK3:11181 and server C1ZK1:11181 have different number of znodes:[/zookeeper/quota, /zookeeper] [/zookeeper/quota, /eph, /zookeeper] {code} was: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # The client creates a znode "/bug" with value "bad". # The client updates znode "/bug" to value "good". # zk1 crashes before receiving the leader's proposal for the request in step 3. # "/bug" is modified to "good". # zk1 is restarted. # Another client connects to zk1, reads "/bug", and gets "bad". # zk1 finishes synchronization with the current leader, and only then is "/bug" updated to "good". The problem is that zk1 can be accessed by a client before it finishes synchronization with the current leader, so the client may read stale data.
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishes synchronization with the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Description: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # The client creates a znode "/bug" with value "bad". # The client updates znode "/bug" to value "good". # zk1 crashes before receiving the leader's proposal for the request in step 3. # "/bug" is modified to "good". # zk1 is restarted. # Another client connects to zk1, reads "/bug", and gets "bad". # zk1 finishes synchronization with the current leader, and only then is "/bug" updated to "good". The problem is that zk1 can be accessed by a client before it finishes synchronization with the current leader, so the client may read stale data. The actual testing scenario is as follows: I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6) # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (Client1 actually builds the connection with C1ZK1) # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter): {code:java} java.io.FileOutputStream.<init>(FileOutputStream.java:213), java.io.FileOutputStream.<init>(FileOutputStream.java:162), org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0) {code} # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello" # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice" # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice" # 2022-03-24 
22:51:40,996 [Client1] - INFO - deleted znode "/bug" # Client1 requests to create ephemeral znode "/eph" was: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # client create a znode "/bug" with value "bad" # client update znode "/bug" to value "good" # zk1 crashes before receiving proposal for leader for the request in step 3. # "/bug" is modified to "good" # zk1 was restarted # another client connects to zk1, reads "/bug" and gets "bad" # zk1 finish synchronization with current leader, and then modify "/bug" to "good". The problem is that zk1 should be accessed by a client when it finish synchronization with current leader in case of a client reads bad data. The actual testing scenario is as following: I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6) # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (client1 connects with C1ZK1) # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter): {code:java} java.io.FileOutputStream.(FileOutputStream.java:213), java.io.FileOutputStream.(FileOutputStream.java:162), org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0{code} # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello" # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice" # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice" # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug" # Client1 requests to create 
ephemeral znode "/eph" > A restarted node can be accessed before it finishing synchronization with > leader > > > Key: ZOOKEEPER-4503 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > Here is the bug triggering process: > > # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. > # client create a znode "/bug" with value "bad" > # client update znode "/bug" to value "good" > # zk1 crashes before
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishes synchronization with the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Description: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # client create a znode "/bug" with value "bad" # client update znode "/bug" to value "good" # zk1 crashes before receiving proposal for leader for the request in step 3. # "/bug" is modified to "good" # zk1 was restarted # another client connects to zk1, reads "/bug" and gets "bad" # zk1 finish synchronization with current leader, and then modify "/bug" to "good". The problem is that zk1 should be accessed by a client when it finish synchronization with current leader in case of a client reads bad data. The actual testing scenario is as following: I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6) # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (client1 connects with C1ZK1) # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter): {code:java} java.io.FileOutputStream.(FileOutputStream.java:213), java.io.FileOutputStream.(FileOutputStream.java:162), org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181), org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0{code} # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello" # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice" # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice" # 2022-03-24 22:51:40,996 [Client1] - 
INFO - deleted znode "/bug" # Client1 requests to create ephemeral znode "/eph" was: Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # client create a znode "/bug" with value "bad" # client update znode "/bug" to value "good" # zk1 crashes before receiving proposal for leader for the request in step 3. # "/bug" is modified to "good" # zk1 was restarted # another client connects to zk1, reads "/bug" and gets "bad" # zk1 finish synchronization with current leader, and then modify "/bug" to "good". The problem is that zk1 should be accessed by a client when it finish synchronization with current leader in case of a client reads bad data. > A restarted node can be accessed before it finishing synchronization with > leader > > > Key: ZOOKEEPER-4503 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > Here is the bug triggering process: > > # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. > # client create a znode "/bug" with value "bad" > # client update znode "/bug" to value "good" > # zk1 crashes before receiving proposal for leader for the request in step 3. > # "/bug" is modified to "good" > # zk1 was restarted > # another client connects to zk1, reads "/bug" and gets "bad" > # zk1 finish synchronization with current leader, and then modify "/bug" to > "good". > The problem is that zk1 should be accessed by a client when it finish > synchronization with current leader in case of a client reads bad data. 
> > > The actual testing scenario is as following: > I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), > C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6) > > # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with > zookeeper (client1 connects with C1ZK1) > # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file > "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think > this crash does not matter): > {code:java} > java.io.FileOutputStream.(FileOutputStream.java:213), > java.io.FileOutputStream.(FileOutputStream.java:162), > org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), > > org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582), > org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), >
[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishes synchronization with the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4503: --- Summary: A restarted node can be accessed before it finishing synchronization with leader (was: A restarted node can be accessed before it finish synchronization with leader) > A restarted node can be accessed before it finishing synchronization with > leader > > > Key: ZOOKEEPER-4503 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > Here is the bug triggering process: > > # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. > # client create a znode "/bug" with value "bad" > # client update znode "/bug" to value "good" > # zk1 crashes before receiving proposal for leader for the request in step 3. > # "/bug" is modified to "good" > # zk1 was restarted > # another client connects to zk1, reads "/bug" and gets "bad" > # zk1 finish synchronization with current leader, and then modify "/bug" to > "good". > The problem is that zk1 should be accessed by a client when it finish > synchronization with current leader in case of a client reads bad data. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ZOOKEEPER-4503) A restarted node can be accessed before it finishes synchronization with the leader
May created ZOOKEEPER-4503: -- Summary: A restarted node can be accessed before it finishes synchronization with the leader Key: ZOOKEEPER-4503 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.6.3 Reporter: May Here is the bug triggering process: # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader. # The client creates a znode "/bug" with value "bad". # The client updates znode "/bug" to value "good". # zk1 crashes before receiving the leader's proposal for the request in step 3. # "/bug" is modified to "good". # zk1 is restarted. # Another client connects to zk1, reads "/bug", and gets "bad". # zk1 finishes synchronization with the current leader, and only then is "/bug" updated to "good". The problem is that zk1 can be accessed by a client before it finishes synchronization with the current leader, so the client may read stale data. -- This message was sent by Atlassian Jira (v8.20.1#820001)
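The scenario above can be condensed into a small, self-contained sketch of the behavior the report argues for: a restarted follower should refuse client reads until it has finished syncing with the leader, since its local database may still hold pre-crash data. All names below are invented for illustration; this is not ZooKeeper's actual server code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not ZooKeeper internals): a restarted follower rejects
// client reads until it has completed synchronization with the leader, so no
// client can observe the stale pre-crash value of "/bug".
public class StaleReadSketch {
    enum State { SYNCING, SYNCED }

    static class Follower {
        State state = State.SYNCING;
        final Map<String, String> db = new HashMap<>();

        String read(String path) {
            if (state != State.SYNCED) {
                // Refusing reads here is the fix the report argues for.
                throw new IllegalStateException("not yet synced with leader");
            }
            return db.get(path);
        }

        void finishSync(Map<String, String> leaderDb) {
            db.clear();
            db.putAll(leaderDb);
            state = State.SYNCED;
        }
    }

    public static void main(String[] args) {
        Follower zk1 = new Follower();
        zk1.db.put("/bug", "bad"); // stale state left over from before the crash

        boolean rejected = false;
        try {
            zk1.read("/bug"); // would return the stale "bad" if allowed
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println("read rejected before sync: " + rejected);

        Map<String, String> leaderDb = new HashMap<>();
        leaderDb.put("/bug", "good");
        zk1.finishSync(leaderDb);
        System.out.println("read after sync: " + zk1.read("/bug"));
    }
}
```

With the gate in place the first read fails fast instead of returning "bad", and only the post-sync read succeeds with "good".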
[jira] [Commented] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513349#comment-17513349 ] May commented on ZOOKEEPER-4497: Sorry, it's not a bug. > Crash before closing session makes an ephemeral znode remain in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > # The client connects to node ZK1; > # the client creates an ephemeral znode "/eph"; > # the client closes the session; > # ZK1 crashes before sending the close-session request to the leader; > # the ephemeral znode "/eph" remains in the cluster. > Since ZK1 is down, the cluster should clean up the sessions that were connected to ZK1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
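The closing comment ("Sorry, it's not a bug.") matches ZooKeeper's session semantics: an ephemeral znode is tied to the client's session, not to the particular server the client connected through, so a crash of ZK1 alone does not delete "/eph". The leader removes it only once the session is closed or its timeout expires (the client may reconnect to another server in the meantime). A minimal, self-contained simulation of that rule, with invented class names (not ZooKeeper's internal code):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: ephemerals belong to a session with a timeout. A server
// crash does not delete them immediately; only session expiry (or an explicit
// close) does. This is why "/eph" surviving the ZK1 crash is expected.
public class SessionSketch {
    static class Session {
        final long expiresAtMs;
        final Set<String> ephemerals = new HashSet<>();
        Session(long nowMs, long timeoutMs) { expiresAtMs = nowMs + timeoutMs; }
    }

    // Leader-side cleanup: delete ephemerals only for sessions past their timeout.
    static void tick(Set<String> tree, Session s, long nowMs) {
        if (nowMs >= s.expiresAtMs) {
            tree.removeAll(s.ephemerals);
            s.ephemerals.clear();
        }
    }

    public static void main(String[] args) {
        Set<String> tree = new HashSet<>();
        Session session = new Session(0, 10_000); // 10 s session timeout
        tree.add("/eph");
        session.ephemerals.add("/eph");

        // ZK1 crashes at t = 1 s: the session has NOT expired, so "/eph" stays.
        tick(tree, session, 1_000);
        System.out.println("after crash: " + tree.contains("/eph"));

        // Once the timeout elapses, the leader expires the session and cleans up.
        tick(tree, session, 11_000);
        System.out.println("after timeout: " + tree.contains("/eph"));
    }
}
```

So the observation in the report is the session timeout still running, not a leak: after the timeout, the cluster does clean up "/eph" on its own.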
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster Since ZK1 is down, the cluster should clean up sessions that connect to ZK1. was: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster Since ZK1 is down, the cluster should clear sessions that connect to ZK1. > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client creates an ephemeral znode "/eph" > # client closes the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster > Since ZK1 is down, the cluster should clean up sessions that connect to ZK1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster Since ZK1 is down, the cluster should clear sessions that connect to ZK1. was: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster Since ZK1 is down, the cluster should clean sessions that connect to ZK1. > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client creates an ephemeral znode "/eph" > # client closes the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster > Since ZK1 is down, the cluster should clear sessions that connect to ZK1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Affects Version/s: 3.6.3 > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client creates an ephemeral znode "/eph" > # client closes the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster > Since ZK1 is down, the cluster should clean sessions that connect to ZK1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster Since ZK1 is down, the cluster should clean sessions that connect to ZK1. was: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client creates an ephemeral znode "/eph" > # client closes the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster > Since ZK1 is down, the cluster should clean sessions that connect to ZK1. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connects to node ZK1; # client creates an ephemeral znode "/eph" # client closes the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster was: # client connects to node ZK1; # client create an ephemeral znode "/eph" # client close the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client creates an ephemeral znode "/eph" > # client closes the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connects to node ZK1; # client create an ephemeral znode "/eph" # client close the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster was: # client connect to node ZK1; # client create an ephemeral znode "/eph" # client close the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Reporter: May >Priority: Major > > # client connects to node ZK1; > # client create an ephemeral znode "/eph" > # client close the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4497: --- Description: # client connect to node ZK1; # client create an ephemeral znode "/eph" # client close the session; # ZK1 crashes before sending closing session request to leader # the ephemeral znode "/eph" leaves in the cluster > Crash before closing session makes an ephemeral znode leave in ZooKeeper > > > Key: ZOOKEEPER-4497 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 > Project: ZooKeeper > Issue Type: Bug >Reporter: May >Priority: Major > > # client connect to node ZK1; > # client create an ephemeral znode "/eph" > # client close the session; > # ZK1 crashes before sending closing session request to leader > # the ephemeral znode "/eph" leaves in the cluster -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode remain in ZooKeeper
May created ZOOKEEPER-4497: -- Summary: Crash before closing session makes an ephemeral znode remain in ZooKeeper Key: ZOOKEEPER-4497 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497 Project: ZooKeeper Issue Type: Bug Reporter: May -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4416: --- Description: There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. 1. zk1 was stopped for a while; 2. restart zk1, and it starts to follow the current leader; 3. zk1 receives snapshot from leader; 4. zk1 receives UPTODATE message from leader; 5. zk1 takes the snapshot of the current data state; 6. zk1 creates the {{currentEpoch.tmp}} file; 7. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file; 8. restart zk1, and it fails due to "Unable to load database on disk" error: {code:java} java.io.IOException: Found null in /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp at java.lang.Throwable.fillInStackTrace(Throwable.java) at java.lang.Throwable.fillInStackTrace(Throwable.java:784) at java.lang.Throwable.<init>(Throwable.java:266) at java.lang.Exception.<init>(Exception.java:66) at java.io.IOException.<init>(IOException.java:58) at org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) {code} was: There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. 1. zk1 was stopped for a while; 2. restart zk1, and it starts to follow the current leader; 3. zk1 creates the {{currentEpoch.tmp}} file; 4. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file; 5. 
restart zk1, and it fails due to "Unable to load database on disk" error: {code:java} java.io.IOException: Found null in /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp at java.lang.Throwable.fillInStackTrace(Throwable.java) at java.lang.Throwable.fillInStackTrace(Throwable.java:784) at java.lang.Throwable.<init>(Throwable.java:266) at java.lang.Exception.<init>(Exception.java:66) at java.io.IOException.<init>(IOException.java:58) at org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) {code} > Null currentEpoch.tmp fails the server > -- > > Key: ZOOKEEPER-4416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. > 1. zk1 was stopped for a while; > 2. restart zk1, and it starts to follow the current leader; > 3. zk1 receives snapshot from leader; > 4. zk1 receives UPTODATE message from leader; > 5. zk1 takes the snapshot of the current data state; > 6. zk1 creates the {{currentEpoch.tmp}} file; > 7. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file; > 8. 
restart zk1, and it fails due to "Unable to load database on disk" error: > {code:java} > java.io.IOException: Found null in > /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp > at java.lang.Throwable.fillInStackTrace(Throwable.java) > at java.lang.Throwable.fillInStackTrace(Throwable.java:784) > at java.lang.Throwable.<init>(Throwable.java:266) > at java.lang.Exception.<init>(Exception.java:66) > at java.io.IOException.<init>(IOException.java:58) > at > org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116) > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118) > at > org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
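A crash between creating {{currentEpoch.tmp}} and writing the epoch value leaves an empty file, which readLongFromFile later rejects with "Found null". The usual defense against this class of failure is the write-temp, fsync, atomic-rename pattern: the final file then always holds either the complete old contents or the complete new contents. The sketch below uses hypothetical helper names and is a generic illustration of that pattern, not the project's actual fix:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Illustrative sketch of crash-safe epoch persistence: write the new value to
// a temp file, force it to disk, then atomically rename it over the target.
// A crash at any point leaves either the old epoch file or the new one, never
// an empty/partial file.
public class AtomicEpochWrite {
    static void writeEpoch(Path dir, long epoch) throws IOException {
        Path tmp = dir.resolve("currentEpoch.tmp");
        Path target = dir.resolve("currentEpoch");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(
                    Long.toString(epoch).getBytes(StandardCharsets.UTF_8)));
            ch.force(true); // fsync the bytes before the rename can land
        }
        // Atomic rename: readers see the old file or the new file, nothing between.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-epoch");
        writeEpoch(dir, 2);
        System.out.println(Files.readString(dir.resolve("currentEpoch")));
    }
}
```

With this ordering, a restart after a crash never finds a zero-length currentEpoch file: either the rename happened (new epoch readable) or it did not (old state intact, temp file can be discarded).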
[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4416: --- Description: There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. 1. zk1 was stopped for a while; 2. restart zk1, and it starts to follow the current leader; 3. zk1 creates the {{currentEpoch.tmp}} file; 4. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file; 5. restart zk1, and it fails due to "Unable to load database on disk" error: {code:java} java.io.IOException: Found null in /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp at java.lang.Throwable.fillInStackTrace(Throwable.java) at java.lang.Throwable.fillInStackTrace(Throwable.java:784) at java.lang.Throwable.<init>(Throwable.java:266) at java.lang.Exception.<init>(Exception.java:66) at java.io.IOException.<init>(IOException.java:58) at org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) {code} was: There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. 1. zk1 was stopped for a while; 2. restart zk1, and it starts to follow the current leader; 3. zk1 creates the > Null currentEpoch.tmp fails the server > -- > > Key: ZOOKEEPER-4416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. > 1. zk1 was stopped for a while; > 2. restart zk1, and it starts to follow the current leader; > 3. 
zk1 creates the {{currentEpoch.tmp}} file; > 4. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file; > 5. restart zk1, and it fails due to "Unable to load database on disk" error: > {code:java} > java.io.IOException: Found null in > /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp > at java.lang.Throwable.fillInStackTrace(Throwable.java) > at java.lang.Throwable.fillInStackTrace(Throwable.java:784) > at java.lang.Throwable.<init>(Throwable.java:266) > at java.lang.Exception.<init>(Exception.java:66) > at java.io.IOException.<init>(IOException.java:58) > at > org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116) > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118) > at > org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] May updated ZOOKEEPER-4416: --- Description: There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. 1. zk1 was stopped for a while; 2. restart zk1, and it starts to follow the current leader; 3. zk1 creates the > Null currentEpoch.tmp fails the server > -- > > Key: ZOOKEEPER-4416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.3 >Reporter: May >Priority: Major > > There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3. > 1. zk1 was stopped for a while; > 2. restart zk1, and it starts to follow the current leader; > 3. zk1 creates the -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server
May created ZOOKEEPER-4416: -- Summary: Null currentEpoch.tmp fails the server Key: ZOOKEEPER-4416 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.6.3 Reporter: May -- This message was sent by Atlassian Jira (v8.20.1#820001)