[jira] [Created] (ZOOKEEPER-4773) Ephemeral node is not deleted when all followers are blocked with leader

2023-11-27 Thread May (Jira)
May created ZOOKEEPER-4773:
--

 Summary: Ephemeral node is not deleted when all followers are 
blocked with leader
 Key: ZOOKEEPER-4773
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4773
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.9.1, 3.8.3
Reporter: May


The test case EphemeralNodeDeletionTest describes a scenario where a follower loses its 
connection with the leader while the client writes an ephemeral node; the node should 
still be deleted after the client closes its session. However, the test fails when I make 
all followers lose their connections.
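
For context, the invariant the test checks can be sketched with a toy in-memory model (the class and method names below are illustrative, not actual ZooKeeper classes): closing a session must remove every ephemeral node that session created, no matter which followers are reachable at the time.
{code:java}
import java.util.*;

// Toy model of the invariant only; this is not ZooKeeper's implementation.
public class EphemeralModel {
    private final Map<Long, Set<String>> ephemeralsBySession = new HashMap<>();
    private final Set<String> tree = new HashSet<>();

    public void createEphemeral(long sessionId, String path) {
        tree.add(path);
        ephemeralsBySession.computeIfAbsent(sessionId, s -> new HashSet<>()).add(path);
    }

    // Closing the session must delete all of its ephemeral nodes.
    public void closeSession(long sessionId) {
        Set<String> paths = ephemeralsBySession.remove(sessionId);
        if (paths != null) {
            tree.removeAll(paths);
        }
    }

    public boolean exists(String path) {
        return tree.contains(path);
    }
}
{code}
The modified test below exercises this same contract at the quorum level: after zk.close(), the ephemeral node must eventually disappear on every server.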


To reproduce the bug, I simply modified testEphemeralNodeDeletion() as 
follows:
{code:java}
// 2: inject a network problem into the two followers
ArrayList<CustomQuorumPeer> followers = getFollowers();
for (CustomQuorumPeer follower : followers) {
    follower.setInjectError(true);
}
//CustomQuorumPeer follower = (CustomQuorumPeer) getByServerState(mt, ServerState.FOLLOWING);
//follower.setInjectError(true);

// 3: close the session so that the ephemeral node is deleted
zk.close();

// remove the error
//follower.setInjectError(false);
for (CustomQuorumPeer follower : followers) {
    follower.setInjectError(false);
    assertTrue(ClientBase.waitForServerUp("127.0.0.1:" + follower.getClientPort(), CONNECTION_TIMEOUT),
            "Faulted Follower should have joined quorum by now");
}
{code}
And here is the added method getFollowers():
{code:java}
private ArrayList<CustomQuorumPeer> getFollowers() {
    ArrayList<CustomQuorumPeer> followers = new ArrayList<>();
    for (int i = 0; i < mt.length; i++) {
        QuorumPeer quorumPeer = mt[i].getQuorumPeer();
        if (null != quorumPeer && ServerState.FOLLOWING == quorumPeer.getPeerState()) {
            followers.add((CustomQuorumPeer) quorumPeer);
        }
    }
    return followers;
}
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4772) Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower

2023-11-27 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4772:
---
Component/s: quorum

> Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower
> --
>
> Key: ZOOKEEPER-4772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4772
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.7.2, 3.8.3, 3.9.1
>Reporter: May
>Priority: Major
>





[jira] [Created] (ZOOKEEPER-4772) Wrong sync logic in LearnerHandler when sync (0,0) to a new epoch follower

2023-11-27 Thread May (Jira)
May created ZOOKEEPER-4772:
--

 Summary: Wrong sync logic in LearnerHandler when sync (0,0) to a 
new epoch follower
 Key: ZOOKEEPER-4772
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4772
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.1, 3.8.3, 3.7.2
Reporter: May


The current LearnerHandler.syncFollower() does not consider the situation where the 
proposal (0, 0) has been committed and snapshotted: it will not fall back to a snapshot 
sync when minCommittedLog is 0.

The bug can be reproduced by modifying testNewEpochZxid in LearnerHandlerTest:

{code:java}
public void testNewEpochZxid() throws Exception {
    long peerZxid;
    db.txnLog.add(createProposal(getZxid(0, 0))); // Added
    db.txnLog.add(createProposal(getZxid(0, 1)));
    db.txnLog.add(createProposal(getZxid(1, 1)));
    db.txnLog.add(createProposal(getZxid(1, 2)));

    // After leader election, lastProcessedZxid will point to the new epoch
    db.lastProcessedZxid = getZxid(2, 0);
    db.committedLog.add(createProposal(getZxid(0, 0))); // Added
    db.committedLog.add(createProposal(getZxid(1, 1)));
    db.committedLog.add(createProposal(getZxid(1, 2)));

    // Peer has zxid of epoch 0
    peerZxid = getZxid(0, 0);
    // We should get snap, we can do better here, but the main logic is
    // that we should never send diff if we have never seen any txn older
    // than peer zxid
    assertTrue(learnerHandler.syncFollower(peerZxid, leader)); // Fails here
}
{code}
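
For reference, the getZxid(epoch, counter) pairs used above pack an epoch and a counter into a single 64-bit zxid, with the epoch in the high 32 bits. A minimal sketch (my own illustrative helper mirroring the test's getZxid, not the actual ZooKeeper utility) shows why the committed proposal (0, 0) is indistinguishable from an "empty" minCommittedLog of 0:
{code:java}
// Sketch of zxid packing: epoch in the high 32 bits, counter in the low 32.
// Illustrative helper, mirroring getZxid(epoch, counter) from the test above.
public class ZxidSketch {
    public static long getZxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static void main(String[] args) {
        // Proposal (0, 0) packs to the literal value 0, so minCommittedLog == 0
        // can mean either "no committed log at all" or "the oldest committed
        // proposal really is (0, 0)" -- the case syncFollower gets wrong.
        System.out.println(getZxid(0, 0)); // 0
        System.out.println(getZxid(2, 0)); // 8589934592 (0x200000000)
    }
}
{code}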






[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-06 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Description: 
Here is the bug-triggering process:
 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # A client creates a znode "/bug" with value "bad".
 # The client updates znode "/bug" to the value "good".
 # zk1 crashes before receiving the leader's proposal for the request in step 3.
 # "/bug" is modified to "good" on the rest of the cluster.
 # zk1 is restarted.
 # Another client connects to zk1, reads "/bug" and gets "bad".
 # zk1 finishes synchronization with the current leader, and only then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it has finished 
synchronization with the current leader; otherwise a client can read stale data.
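
The steps above can be condensed into a toy model (illustrative only; the class below is my own, not ZooKeeper code): a restarted replica that answers reads before it has synced with the leader serves the pre-crash value.
{code:java}
import java.util.*;

// Toy replica: after a restart it still holds only its pre-crash state
// until it syncs with the leader. Illustrative model, not ZooKeeper code.
public class StaleReadModel {
    private final Map<String, String> data = new HashMap<>();
    private boolean synced = true;

    public void apply(String path, String value) { data.put(path, value); }

    // Restart: local state survives on disk, but may be behind the leader.
    public void crashAndRestart() { synced = false; }

    public void syncWithLeader(Map<String, String> leaderData) {
        data.putAll(leaderData);
        synced = true;
    }

    // The reported problem: reads are answered even while synced == false.
    public String read(String path) { return data.get(path); }
}
{code}
With this model, apply("/bug", "bad"), then crashAndRestart(), then read("/bug") returns "bad" even though the leader has already committed "good"; only after syncWithLeader(...) does the replica return "good".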

 



The actual testing scenario is as follows:

I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)

 
 # 2022-03-24 22:51:40,246 [Client1] - INFO - build connection with zookeeper (Client1 actually builds a connection with C1ZK1)
 # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this crash does not matter):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213),
java.io.FileOutputStream.<init>(FileOutputStream.java:162),
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287),
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}

 # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
 # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
 # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
 # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
 # Client1 requests to create ephemeral znode "/eph"
 # 2022-03-24 22:51:45,033 crash C1ZK1 before:
{code:java}
org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:68),
org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:126),
org.apache.zookeeper.server.quorum.Learner.writePacketNow(Learner.java:194),
org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:186),
org.apache.zookeeper.server.quorum.SendAckRequestProcessor.processRequest(SendAckRequestProcessor.java:46),
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}

 # 2022-03-24 22:51:49,451 restart C1ZK1 before C1ZK2 writes to file "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001":
{code:java}
java.io.FileOutputStream.writeBytes(FileOutputStream.java),
java.io.FileOutputStream.write(FileOutputStream.java:326),
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82),
java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140),
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:293),
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
{code}

 # 2022-03-24 22:51:56,744 [Client2] - INFO - build connection with zookeeper
 # 2022-03-24 22:51:56,876 [Client2] - INFO - cannot read ephemeral znode "/eph", got "KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /eph"
 # When we connect to every live node in the cluster and read the data from each, we get:
{code:java}
2022-03-24 22:52:14,663 [ZKChecker] - INFO - server C1ZK3:11181 and server C1ZK1:11181 have different number of znodes:[/zookeeper/quota, /zookeeper] and [/zookeeper/quota, /eph, /zookeeper]
{code}

 # Then I killed all the nodes in the cluster

 

The file "log.10001" in C1ZK1 is:
{code:java}
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x0 zxid 0x10001 createSession 15000
 2,1371985504
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x1 zxid 0x10002 create '/bug,#68656c6c6f,v{s{31,s{'world,'anyone}}},F,1
 2,6620487461
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x2 zxid 0x10003 setData '/bug,#6e696365,1
 2,5588659454
22-3-24 22:51:40 session 0x3035dd9dccf cxid 0x4 zxid 
{code}

[jira] [Commented] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-01 Thread May (Jira)


[ https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516203#comment-17516203 ]

May commented on ZOOKEEPER-4503:


Hello [~symat], I updated the bug report and uploaded the logs.

> A restarted node can be accessed before it finishing synchronization with 
> leader
> 
>
> Key: ZOOKEEPER-4503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
> Attachments: zookeeper--server-C1ZK1.out, 
> zookeeper--server-C1ZK2.out, zookeeper--server-C1ZK3.out, 
> zookeeper--server-C1ZK4.out, zookeeper--server-C1ZK5.out
>
>

[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-01 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Attachment: zookeeper--server-C1ZK1.out
zookeeper--server-C1ZK2.out
zookeeper--server-C1ZK3.out
zookeeper--server-C1ZK4.out
zookeeper--server-C1ZK5.out


[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-01 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Description: 
Here is the bug triggering process:
 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client create a znode "/bug" with value "bad"
 # client update znode "/bug" to value "good"
 # zk1 crashes before receiving proposal for leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 was restarted
 # another client connects to zk1, reads "/bug"  and gets "bad"
 # zk1 finish synchronization with current leader, and then modify "/bug" to 
"good".

The problem is that zk1 should be accessed by a client when it finish 
synchronization with current leader in case of a client reads bad data.

 



The actual testing scenario is as following:

I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)

 
 # 2022-03-24 22:51:40,246 [Client1] - INFO -  build connection with zookeeper 
(client1 actuallly builds connection with  C1ZK1)
 # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file 
"/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this 
crash does not matter):
{code:java}
java.io.FileOutputStream.(FileOutputStream.java:213), 
java.io.FileOutputStream.(FileOutputStream.java:162), 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
 org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0{code}

 # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
 # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
 # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
 # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
 # Client1 requests to create ephemeral znode "/eph"
 # 2022-03-24 22:51:45,033 crash C1ZK1 before:
{code:java}
org.apache.zookeeper.server.quorum.QuorumPacket.serialize(QuorumPacket.java:68),
 org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:126), 
org.apache.zookeeper.server.quorum.Learner.writePacketNow(Learner.java:194), 
org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:186), 
org.apache.zookeeper.server.quorum.SendAckRequestProcessor.processRequest(SendAckRequestProcessor.java:46),
 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
 {code}

 # 2022-03-24 22:51:49,451  restart C1ZK1 before C1ZK2 writes to file 
"/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001":
{code:java}
java.io.FileOutputStream.writeBytes(FileOutputStream.java), 
java.io.FileOutputStream.write(FileOutputStream.java:326), 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82), 
java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140), 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:293), 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
 org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0)
 {code}

 # 2022-03-24 22:51:56,744 [Client2] - INFO -  build connection with zookeeper
 # 2022-03-24 22:51:56,876 [Client2] - INFO - cannot read ephemeral znode 
"/eph", got "KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /eph"
 # When we check the cluster, we get:
{code:java}
2022-03-24 22:52:14,663 [ZKChecker] - INFO - server C1ZK3:11181 and server 
C1ZK1:11181 have different number of znodes:[/zookeeper/quota, /zookeeper]  
[/zookeeper/quota, /eph, /zookeeper] {code}
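A check in the spirit of the ZKChecker output above can be sketched as follows. This is hypothetical code; the class and method names are illustrative, and only the znode lists come from the log line:

```java
// Compare the znode sets reported by two servers and show the divergence.
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public class ZnodeSetCheck {
    // Znodes present on `a` but missing on `b`, in sorted order.
    public static Set<String> onlyIn(Collection<String> a, Collection<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        Collection<String> c1zk3 = Arrays.asList("/zookeeper/quota", "/zookeeper");
        Collection<String> c1zk1 = Arrays.asList("/zookeeper/quota", "/eph", "/zookeeper");
        // The divergence reported by the checker: "/eph" exists only on C1ZK1.
        System.out.println("only on C1ZK1: " + onlyIn(c1zk1, c1zk3)); // prints [/eph]
    }
}
```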

  was:
Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.

 


[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-01 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Description: 
Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.

 



The actual testing scenario is as follows:

I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)

 
 # 2022-03-24 22:51:40,246 [Client1] - INFO -  build connection with zookeeper 
(Client1 actually builds connection with C1ZK1)
 # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file 
"/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this 
crash does not matter):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213), 
java.io.FileOutputStream.<init>(FileOutputStream.java:162), 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
 org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0){code}

 # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
 # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
 # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
 # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
 # Client1 requests to create ephemeral znode "/eph"

  was:
Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.

 



The actual testing scenario is as follows:

I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)

 
 # 2022-03-24 22:51:40,246 [Client1] - INFO -  build connection with zookeeper 
(Client1 connects to C1ZK1)
 # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file 
"/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this 
crash does not matter):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213), 
java.io.FileOutputStream.<init>(FileOutputStream.java:162), 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
 org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0){code}

 # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
 # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
 # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
 # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
 # Client1 requests to create ephemeral znode "/eph"


> A restarted node can be accessed before it finishing synchronization with 
> leader
> 
>
> Key: ZOOKEEPER-4503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> Here is the bug triggering process:
>  
>  # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
>  # client create a znode "/bug" with value "bad"
>  # client update znode "/bug" to value "good"
>  # zk1 crashes before 

[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-04-01 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Description: 
Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.

 



The actual testing scenario is as follows:

I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)

 
 # 2022-03-24 22:51:40,246 [Client1] - INFO -  build connection with zookeeper 
(Client1 connects to C1ZK1)
 # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file 
"/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think this 
crash does not matter):
{code:java}
java.io.FileOutputStream.<init>(FileOutputStream.java:213), 
java.io.FileOutputStream.<init>(FileOutputStream.java:162), 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287), 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
 org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181),
 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:0){code}

 # 2022-03-24 22:51:40,761 [Client1] - INFO - created znode "/bug" "hello"
 # 2022-03-24 22:51:40,869 [Client1] - INFO - set znode "/bug" "nice"
 # 2022-03-24 22:51:40,915 [Client1] - INFO - read znode "/bug" is "nice"
 # 2022-03-24 22:51:40,996 [Client1] - INFO - deleted znode "/bug"
 # Client1 requests to create ephemeral znode "/eph"

  was:
Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.


> A restarted node can be accessed before it finishing synchronization with 
> leader
> 
>
> Key: ZOOKEEPER-4503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> Here is the bug-triggering process:
>  
>  # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
>  # client creates a znode "/bug" with value "bad"
>  # client updates znode "/bug" to value "good"
>  # zk1 crashes before receiving the proposal from the leader for the request in step 3.
>  # "/bug" is modified to "good"
>  # zk1 is restarted
>  # another client connects to zk1, reads "/bug" and gets "bad"
>  # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".
> The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.
>  
> 
> The actual testing scenario is as follows:
> I have a cluster of 5 nodes: C1ZK1(172.30.0.2), C1ZK2(172.30.0.3), 
> C1ZK3(172.30.0.4), C1ZK4(172.30.0.5), C1ZK5(172.30.0.6)
>  
>  # 2022-03-24 22:51:40,246 [Client1] - INFO -  build connection with 
> zookeeper (Client1 connects to C1ZK1)
>  # 2022-03-24 22:51:40,479 crash C1ZK4 before creating file 
> "/home/zkuser/evaluation/zk-3.6.3/zkData/version-2/log.10001" (I think 
> this crash does not matter):
> {code:java}
> java.io.FileOutputStream.<init>(FileOutputStream.java:213), 
> java.io.FileOutputStream.<init>(FileOutputStream.java:162), 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:287),
>  
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:582),
>  org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:641), 
> 

[jira] [Updated] (ZOOKEEPER-4503) A restarted node can be accessed before it finishing synchronization with leader

2022-03-28 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4503:
---
Summary: A restarted node can be accessed before it finishing 
synchronization with leader  (was: A restarted node can be accessed before it 
finish synchronization with leader)

> A restarted node can be accessed before it finishing synchronization with 
> leader
> 
>
> Key: ZOOKEEPER-4503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> Here is the bug-triggering process:
>  
>  # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
>  # client creates a znode "/bug" with value "bad"
>  # client updates znode "/bug" to value "good"
>  # zk1 crashes before receiving the proposal from the leader for the request in step 3.
>  # "/bug" is modified to "good"
>  # zk1 is restarted
>  # another client connects to zk1, reads "/bug" and gets "bad"
>  # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".
> The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4503) A restarted node can be accessed before it finish synchronization with leader

2022-03-28 Thread May (Jira)
May created ZOOKEEPER-4503:
--

 Summary: A restarted node can be accessed before it finish 
synchronization with leader
 Key: ZOOKEEPER-4503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4503
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.6.3
Reporter: May


Here is the bug-triggering process:

 
 # A cluster with three nodes: zk1, zk2 and zk3. zk3 is the leader.
 # client creates a znode "/bug" with value "bad"
 # client updates znode "/bug" to value "good"
 # zk1 crashes before receiving the proposal from the leader for the request in step 3.
 # "/bug" is modified to "good"
 # zk1 is restarted
 # another client connects to zk1, reads "/bug" and gets "bad"
 # zk1 finishes synchronization with the current leader, and then modifies "/bug" to "good".

The problem is that zk1 should not be accessible to clients until it finishes synchronization with the current leader; otherwise a client may read stale data.





[jira] [Commented] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-28 Thread May (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513349#comment-17513349
 ] 

May commented on ZOOKEEPER-4497:


Sorry, it's not a bug.

> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster
> Since ZK1 is down, the cluster should clean up sessions that connect to ZK1.





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending the close-session request to the leader
# the ephemeral znode "/eph" remains in the cluster

Since ZK1 is down, the cluster should clean up the sessions that were connected to ZK1.

  was:
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

Since ZK1 is down, the cluster should clear sessions that connect to ZK1.


> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster
> Since ZK1 is down, the cluster should clean up sessions that connect to ZK1.





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

Since ZK1 is down, the cluster should clear sessions that connect to ZK1.

  was:
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

Since ZK1 is down, the cluster should clean sessions that connect to ZK1.


> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster
> Since ZK1 is down, the cluster should clear sessions that connect to ZK1.





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Affects Version/s: 3.6.3

> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster
> Since ZK1 is down, the cluster should clean sessions that connect to ZK1.





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

Since ZK1 is down, the cluster should clean sessions that connect to ZK1.

  was:
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster


> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster
> Since ZK1 is down, the cluster should clean sessions that connect to ZK1.





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connects to node ZK1;
# client creates an ephemeral znode "/eph"
# client closes the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

  was:
# client connects to node ZK1;
# client create an ephemeral znode "/eph"
# client close the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster


> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client creates an ephemeral znode "/eph"
> # client closes the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connects to node ZK1;
# client create an ephemeral znode "/eph"
# client close the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

  was:
# client connect to node ZK1;
# client create an ephemeral znode "/eph"
# client close the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster


> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: May
>Priority: Major
>
> # client connects to node ZK1;
> # client create an ephemeral znode "/eph"
> # client close the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster





[jira] [Updated] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4497:
---
Description: 
# client connect to node ZK1;
# client create an ephemeral znode "/eph"
# client close the session;
# ZK1 crashes before sending closing session request  to leader
# the ephemeral znode "/eph" leaves in the cluster

> Crash before closing session makes an ephemeral znode leave in ZooKeeper
> 
>
> Key: ZOOKEEPER-4497
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: May
>Priority: Major
>
> # client connect to node ZK1;
> # client create an ephemeral znode "/eph"
> # client close the session;
> # ZK1 crashes before sending closing session request  to leader
> # the ephemeral znode "/eph" leaves in the cluster





[jira] [Created] (ZOOKEEPER-4497) Crash before closing session makes an ephemeral znode leave in ZooKeeper

2022-03-17 Thread May (Jira)
May created ZOOKEEPER-4497:
--

 Summary: Crash before closing session makes an ephemeral znode 
leave in ZooKeeper
 Key: ZOOKEEPER-4497
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4497
 Project: ZooKeeper
  Issue Type: Bug
Reporter: May








[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server

2021-11-18 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4416:
---
Description: 
There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.

1. zk1 was stopped for a while;
2. restart zk1, and it starts to follow the current leader;
3. zk1 receives snapshot from leader;
4. zk1 receives UPTODATE message from leader;
5. zk1 takes the snapshot of the current data state;
6. zk1 creates the {{currentEpoch.tmp}} file;
7. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file;
8. restart zk1, and it fails due to "Unable to load database on disk" error:

{code:java}
java.io.IOException: Found null in 
/home/zk-3.6.3/zkData/version-2/currentEpoch.tmp
at java.lang.Throwable.fillInStackTrace(Throwable.java)
at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
at java.lang.Throwable.<init>(Throwable.java:266)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
{code}
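One common way to close the crash window in steps 6-7 is to finish writing and flushing the temporary file before atomically moving it into place, so a restart sees either the old epoch file or a complete new one, never a truncated one. The sketch below illustrates that pattern only; it is not the ZooKeeper implementation, and apart from the file names taken from the report, every name in it is an assumption.

```java
// Sketch of an atomic epoch-file write (illustrative, not ZooKeeper code).
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class EpochFileSketch {
    // Write the epoch to a temp file (flushed to disk via SYNC), then atomically
    // move it into place: a crash leaves either the old file or the complete new one.
    public static void writeEpochAtomically(Path dir, long epoch) throws IOException {
        Path tmp = dir.resolve("currentEpoch.tmp");
        Path real = dir.resolve("currentEpoch");
        Files.write(tmp, Long.toString(epoch).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.SYNC);
        Files.move(tmp, real, StandardCopyOption.ATOMIC_MOVE);
    }

    // An empty file (the crash left nothing to parse) mirrors the
    // "Found null in .../currentEpoch.tmp" failure in the report.
    public static long readEpoch(Path file) throws IOException {
        String s = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        if (s.isEmpty()) {
            throw new IOException("Found null in " + file);
        }
        return Long.parseLong(s);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("epoch-sketch");
        writeEpochAtomically(dir, 3L);
        System.out.println(readEpoch(dir.resolve("currentEpoch"))); // prints 3
    }
}
```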


  was:
There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.

1. zk1 was stopped for a while;
2. restart zk1, and it starts to follow the current leader;
3. zk1 creates the {{currentEpoch.tmp}} file;
4. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file;
5. restart zk1, and it fails due to "Unable to load database on disk" error:

{code:java}
java.io.IOException: Found null in 
/home/zk-3.6.3/zkData/version-2/currentEpoch.tmp
at java.lang.Throwable.fillInStackTrace(Throwable.java)
at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
at java.lang.Throwable.<init>(Throwable.java:266)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
{code}



> Null currentEpoch.tmp fails the server
> --
>
> Key: ZOOKEEPER-4416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.
> 1. zk1 was stopped for a while;
> 2. restart zk1, and it starts to follow the current leader;
> 3. zk1 receives snapshot from leader;
> 4. zk1 receives UPTODATE message from leader;
> 5. zk1 takes the snapshot of the current data state;
> 6. zk1 creates the {{currentEpoch.tmp}} file;
> 7. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file;
> 8. restart zk1, and it fails due to "Unable to load database on disk" error:
> {code:java}
> java.io.IOException: Found null in 
> /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp
> at java.lang.Throwable.fillInStackTrace(Throwable.java)
> at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
> at java.lang.Throwable.<init>(Throwable.java:266)
> at java.lang.Exception.<init>(Exception.java:66)
> at java.io.IOException.<init>(IOException.java:58)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
> {code}





[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server

2021-11-17 Thread May (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

May updated ZOOKEEPER-4416:
---
Description: 
There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.

1. zk1 was stopped for a while;
2. restart zk1, and it starts to follow the current leader;
3. zk1 creates the {{currentEpoch.tmp}} file;
4. zk1 crashes before writing current epoch to {{currentEpoch.tmp}} file;
5. restart zk1, and it fails due to "Unable to load database on disk" error:

{code:java}
java.io.IOException: Found null in 
/home/zk-3.6.3/zkData/version-2/currentEpoch.tmp
at java.lang.Throwable.fillInStackTrace(Throwable.java)
at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
at java.lang.Throwable.<init>(Throwable.java:266)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
{code}


  was:
There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.

1. zk1 was stopped for a while;
2. restart zk1, and it starts to follow the current leader;
3. zk1 creates the 



> Null currentEpoch.tmp fails the server
> --
>
> Key: ZOOKEEPER-4416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.
> 1. zk1 was stopped for a while;
> 2. zk1 is restarted and starts to follow the current leader;
> 3. zk1 creates the {{currentEpoch.tmp}} file;
> 4. zk1 crashes before writing the current epoch to the {{currentEpoch.tmp}} file;
> 5. zk1 is restarted again and fails with an "Unable to load database on disk" error:
> {code:java}
> java.io.IOException: Found null in /home/zk-3.6.3/zkData/version-2/currentEpoch.tmp
> at java.lang.Throwable.fillInStackTrace(Throwable.java)
> at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
> at java.lang.Throwable.<init>(Throwable.java:266)
> at java.lang.Exception.<init>(Exception.java:66)
> at java.io.IOException.<init>(IOException.java:58)
> at org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(QuorumPeer.java:2116)
> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1118)
> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1079)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
> {code}





[jira] [Updated] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server

2021-11-17 Thread May (Jira)


 [ https://issues.apache.org/jira/browse/ZOOKEEPER-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

May updated ZOOKEEPER-4416:
---
Description: 
There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.

1. zk1 was stopped for a while;
2. restart zk1, and it starts to follow the current leader;
3. zk1 creates the 


> Null currentEpoch.tmp fails the server
> --
>
> Key: ZOOKEEPER-4416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.3
>Reporter: May
>Priority: Major
>
> There is a ZooKeeper cluster with three nodes: zk1, zk2 and zk3.
> 1. zk1 was stopped for a while;
> 2. restart zk1, and it starts to follow the current leader;
> 3. zk1 creates the 





[jira] [Created] (ZOOKEEPER-4416) Null currentEpoch.tmp fails the server

2021-11-17 Thread May (Jira)
May created ZOOKEEPER-4416:
--

 Summary: Null currentEpoch.tmp fails the server
 Key: ZOOKEEPER-4416
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4416
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.6.3
Reporter: May





