[jira] [Updated] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4715:
--
Labels: pull-request-available  (was: )

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves and we 
> want our values to match the original values. Therefore, we added checks for 
> fileSize and filePosition in our tests. After adding the checks, we used a 
> new method to retrieve fileSize and filePosition in ZOOKEEPER-4714 and tested 
> whether the tests can still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-06-29 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4715:
---

 Summary: Verify file size and position in testGetCurrentLogSize.
 Key: ZOOKEEPER-4715
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.9.0, 3.8.2


This is pre-PR for ZOOKEEPER-4714.

In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves and we want 
our values to match the original values. Therefore, we added checks for 
fileSize and filePosition in our tests. After adding the checks, we used a new 
method to retrieve fileSize and filePosition in ZOOKEEPER-4714 and tested 
whether the tests can still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4714:
--
Labels: pull-request-available  (was: )

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-06-29 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4714:
---

 Summary: Improve syncRequestProcessor performance
 Key: ZOOKEEPER-4714
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.9.0, 3.8.2
 Attachments: 761688051587_.pic.jpg

In the SyncRequestProcessor, a write operation is performed for each write 
request. Two methods are relatively time-consuming.

1. Within SyncRequestProcessor#shouldSnapshot, the current size of the current 
file is retrieved, which involves a system call.

Call stack:
java.io.File.length(File.java)
org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)

2. Within ZKDatabase#append, the current position of the current file is 
retrieved, which also involves a system call.

Call stack:
sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)


Therefore, it is best to maintain the current size and position of the current 
file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
Risk{*}).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
Risk{*}).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
Risk{*}).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor. The QuorumPeer 
thread should wait for the exit of syncThread before back in LOOKING state:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
Risk{*}).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
Risk{*}).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Summary: Follower.shutdown() and Observer.shutdown() do not correctly 
shutdown the syncProcessor, which may lead to data inconsistency  (was: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor, which may lead to potential data inconsistency)

> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor, which may lead to data inconsistency
> -
>
> Key: ZOOKEEPER-4712
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Critical
>
> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor. It may lead to potential data inconsistency (see Potential 
> Risk).
>  
> A follower / observer will invoke syncProcessor.shutdown() in 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
> respectively.
> However, after the 
> [FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
>  of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() 
> anymore.
>  
> h2. Call stack
> h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
>  * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
> ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)
>  
> h5. For comparison, in version 3.4.X,
>  * Observer.shutdown() -> Learner.shutdown() -> 
> {*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * Follower.shutdown() -> Learner.shutdown() -> 
> {*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  
> h2. Code Details
> Take version 3.8.0 as an example.
> In Follower.shutdown() :
> {code:java}
>     public void shutdown() {
>         LOG.info("shutdown Follower");
> +       // invoke Learner.shutdown()
>         super.shutdown();   
>     } {code}
>  
> In Learner.java:
> {code:java}
>     public void shutdown() {
>         ...
>         // shutdown previous zookeeper
>         if (zk != null) {
>             // If we haven't finished SNAP sync, force fully shutdown
>             // to avoid potential inconsistency
> +           // This will invoke ZooKeeperServer.shutdown(boolean), 
> +           // which will not shutdown syncProcessor
> +           // Before the fix of ZOOLEEPER-3642, 
> +           // FollowerZooKeeperServer.shutdown() will be invoked here
>             zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); 
>          }
>     } {code}
>  
> In ZooKeeperServer.java:
> {code:java}
>     public synchronized void shutdown(boolean fullyShutDown) {
>         ...
>         if (firstProcessor != null) {
> +           // For a follower, this will not shutdown its syncProcessor.
>             firstProcessor.shutdown(); 
>         }
>         ...
>     } {code}
>  
> In expectation, Follower.shutdown() should invoke 
> LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
> {code:java}
>     public synchronized void shutdown() {
>         ...
>         try {
> +           // shutdown the syncProcessor here
>             if (syncProcessor != null) {
>                 syncProcessor.shutdown();     
>             }
>         } ...
>     } {code}
> Observer.shutdown() has the similar problem.
>  
> h2. Potential Risk
> When Follower.shutdown() is called, the follower's QuorumPeer thread may 
> invoke fastForwardDataBase() and 
> update the lastProcessedZxid for the election and recovery phase before its 
> syncThread drains the pending requests and flushes them to disk.
> In consequence, this lastProcessedZxid is not the latest zxid in its log, 
> leading to log inconsistency after the SYNC phase. (Similar to the symptoms 
> of ZOOKEEPER-2845.)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an 

[jira] [Updated] (ZOOKEEPER-4713) ObserverZooKeeperServer.shutdown() is redundant

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4713:
--
Description: 
After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant, because it is not in the 
call stack of Observer.shutdown(). (Note that FollowerZooKeeperServer does not 
have the shutdown() method.)

Related analysis can be found in ZOOKEEPER-4712

  was:
After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant.

Related analysis can be found in 
[ZOOKEEPER-4712|https://issues.apache.org/jira/browse/ZOOKEEPER-4712)ZOOKEEPER-4712]


> ObserverZooKeeperServer.shutdown() is redundant
> ---
>
> Key: ZOOKEEPER-4713
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4713
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Minor
>
> After the 
> [FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
>  of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible 
> for the shutdown logic of both the follower and observer. 
> ObserverZooKeeperServer.shutdown() seems redundant, because it is not in the 
> call stack of Observer.shutdown(). (Note that FollowerZooKeeperServer does 
> not have the shutdown() method.)
> Related analysis can be found in ZOOKEEPER-4712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4713) ObserverZooKeeperServer.shutdown() is redundant

2023-06-29 Thread Sirius (Jira)
Sirius created ZOOKEEPER-4713:
-

 Summary: ObserverZooKeeperServer.shutdown() is redundant
 Key: ZOOKEEPER-4713
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4713
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
Reporter: Sirius


After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant.

Related analysis can be found in 
[ZOOKEEPER-4712|https://issues.apache.org/jira/browse/ZOOKEEPER-4712)ZOOKEEPER-4712]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 
h3. Example trace

(TODO)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOLEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}

 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}

Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
   public void shutdown()


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) ->  LeaderZooKeeper.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
{*}ObserverZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
{*}FollowerZooKeeperServer.shutdown() -{*}> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
   public void shutdown()

{        LOG.info("shutdown Follower"); + // invoke Learner.shutdown()        
super.shutdown();   }

In Learner.java:
public void shutdown() {
      ...
       // shutdown previous zookeeper
       if (zk != null)

{            // If we haven't finished SNAP sync, force fully shutdown          
  // to avoid potential inconsistency +         // This will invoke 
ZooKeeperServer.shutdown(boolean), + // which will not shutdown syncProcessor + 
// Before the fix of ZOOLEEPER-3642, + // FollowerZooKeeperServer.shutdown() 
will be invoked here            
zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));         }

  }
In ZooKeeperServer.java:
   public synchronized void shutdown(boolean fullyShutDown) {
      ...
       if (firstProcessor != null)

{ + // For a follower, this will not shutdown its syncProcessor.            
firstProcessor.shutdown();       }

    ...
  }
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
   public synchronized void shutdown() {
       ...
       try {
+         // shutdown the syncProcessor here
           if (syncProcessor != null)

{                syncProcessor.shutdown();               }

      } ...
  }
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -- 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeper.shutdown() 
->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * 
Observer.shutdown(){-}>Learner.shutdown(){-}>{*}ObserverZooKeeperServer.shutdown(){*}{-}>ZooKeeperServer.shutdown(){-}>ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown(){-}>Learner.shutdown(){-}>{*}FollowerZooKeeperServer.shutdown(){*}>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
   public void shutdown()

{        LOG.info("shutdown Follower"); + // invoke Learner.shutdown()        
super.shutdown();   }

In Learner.java:
public void shutdown() {
      ...
       // shutdown 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() 
->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -- 
->ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeper.shutdown() 
->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * 
Observer.shutdown(){-}>Learner.shutdown(){-}>{*}ObserverZooKeeperServer.shutdown(){*}{-}>ZooKeeperServer.shutdown(){-}>ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown(){-}>Learner.shutdown(){-}>{*}FollowerZooKeeperServer.shutdown(){*}>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
   public void shutdown()

{        LOG.info("shutdown Follower"); + // invoke Learner.shutdown()        
super.shutdown();   }

In Learner.java:
public void shutdown() {
      ...
       // shutdown previous zookeeper
       if (zk != null)

{            // If we haven't finished SNAP sync, force fully shutdown          
  // to avoid potential inconsistency +         // This will invoke 
ZooKeeperServer.shutdown(boolean), + // which will not shutdown syncProcessor + 
// Before the fix of ZOOLEEPER-3642, + // FollowerZooKeeperServer.shutdown() 
will be invoked here            
zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));         }

  }
In ZooKeeperServer.java:
   public synchronized void shutdown(boolean fullyShutDown) {
      ...
       if (firstProcessor != null)

{ + // For a follower, this will not shutdown its syncProcessor.            
firstProcessor.shutdown();       }

    ...
  }
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
   public synchronized void shutdown() {
       ...
       try {
+         // shutdown the syncProcessor here
           if (syncProcessor != null)

{                syncProcessor.shutdown();               }

      } ...
  }
Observer.shutdown() has the similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* 
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* 
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * (For comparison) 
Leader.shutdown(String)->LeaderZooKeeper.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X, 
 * 
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 

h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+   // invoke Learner.shutdown()
        super.shutdown();   
    }
In Learner.java:
public void shutdown() {
        ...
        // shutdown previous zookeeper
        

[jira] [Created] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)
Sirius created ZOOKEEPER-4712:
-

 Summary: Follower.shutdown() and Observer.shutdown() do not 
correctly shutdown the syncProcessor, which may lead to potential data 
inconsistency
 Key: ZOOKEEPER-4712
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
Reporter: Sirius


Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* 
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* 
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * (For comparison) 
Leader.shutdown(String)->LeaderZooKeeper.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X, 
 * 
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 

h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+   // invoke Learner.shutdown()
        super.shutdown();   
    }
In Learner.java:
public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+   // which will not shutdown syncProcessor
+   // Before the fix of ZOOLEEPER-3642, 
+   // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));  

        }
    }
In ZooKeeperServer.java:
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+   // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
    ...
    }
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    }
Observer.shutdown() has the similar problem. 

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk. 

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4711) a data race in org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Description: 
When we run :

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

The following method in class : org.apache.zookeeper.server.watch.WatcherCleaner
{code:java}
public void addDeadWatcher(int watcherBit) {
        // Wait if there are too many watchers waiting to be closed,
        // this is will slow down the socket packet processing and
        // the adding watches in the ZK pipeline.
        while (maxInProcessingDeadWatchers > 0 && !stopped && 
totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
                long startTime = Time.currentElapsedTime();
                synchronized (processingCompletedEvent) {
                    processingCompletedEvent.wait(100);
                }
                long latency = Time.currentElapsedTime() - startTime;
                
ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
            } catch (InterruptedException e) {
                LOG.info("Got interrupted while waiting for dead watches queue 
size");
                break;
            }
        }
        synchronized (this) {
            
            if (deadWatchers.add(watcherBit)) {
                totalDeadWatchers.incrementAndGet();
                ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
                if (deadWatchers.size() >= watcherCleanThreshold) {
                    synchronized (cleanEvent) {
                        cleanEvent.notifyAll();
                    }
                }
            }

        }
    }{code}
{code:java}
@Override
    public void run() {
        while (!stopped) {
            synchronized (cleanEvent) {
                try {
                    // add some jitter to avoid cleaning dead watchers at the
                    // same time in the quorum
                    if (!stopped && deadWatchers.size() < 
watcherCleanThreshold) {
                        
                        int maxWaitMs = (watcherCleanIntervalInSeconds
                                         + 
ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
1000;
                        cleanEvent.wait(maxWaitMs);
                    }
                } catch (InterruptedException e) {
                    LOG.info("Received InterruptedException while waiting for 
cleanEvent");
                    break;
                }
            }            if (deadWatchers.isEmpty()) {
                continue;
            }            synchronized (this) {
                // Clean the dead watchers need to go through all the current
                // watches, which is pretty heavy and may take a second if
                // there are millions of watches, that's why we're doing lazily
                // batch clean up in a separate thread with a snapshot of the
                // current dead watchers.
                final Set snapshot = new HashSet<>(deadWatchers);
                deadWatchers.clear();
                int total = snapshot.size();
                LOG.info("Processing {} dead watchers", total);
                cleaners.schedule(new WorkRequest() {
                    @Override
                    public void doWork() throws Exception {
                        long startTime = Time.currentElapsedTime();
                        listener.processDeadWatchers(snapshot);
                        long latency = Time.currentElapsedTime() - startTime;
                        LOG.info("Takes {} to process {} watches", latency, 
total);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEANER_LATENCY.add(latency);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEARED.add(total);
                        totalDeadWatchers.addAndGet(-total);
                        synchronized (processingCompletedEvent) {
                            processingCompletedEvent.notifyAll();
                        }
                    }
                });
            }
        }
        LOG.info("WatcherCleaner thread exited");
    }{code}
As we can see, the two methods visist deadWatchers Object by different thread. 
*Thread in run()* is *read* operation on deadWachers and Thread in 
addDeadWatcher is *write* operation on deadWachers. This causes a data race 
without any lock.

  was:
When we run :

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

The method of addDeadWatcher

(
            System.out.println("2s::" +Thread.currentThread().getName()+ "  

[jira] [Updated] (ZOOKEEPER-4711) a data race in org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Summary: a data race in 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers   (was: 
There is a data race bettween run() and addDeadWatcher in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.)

> a data race in 
> org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> ---
>
> Key: ZOOKEEPER-4711
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
> Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
> ([https://github.com/apache/zookeeper)]
> Then run : mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
>Reporter: lujie
>Priority: Critical
>
> When we run :
> mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
> The method of addDeadWatcher
> (
>             System.out.println("2s::" +Thread.currentThread().getName()+ "  
> "+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
> this is my debug info.
> )
> {code:java}
> public void addDeadWatcher(int watcherBit) {
>         // Wait if there are too many watchers waiting to be closed,
>         // this is will slow down the socket packet processing and
>         // the adding watches in the ZK pipeline.
>         while (maxInProcessingDeadWatchers > 0 && !stopped && 
> totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
>             try {
>                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
> cleaning");
>                 long startTime = Time.currentElapsedTime();
>                 synchronized (processingCompletedEvent) {
>                     processingCompletedEvent.wait(100);
>                 }
>                 long latency = Time.currentElapsedTime() - startTime;
>                 
> ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
>             } catch (InterruptedException e) {
>                 LOG.info("Got interrupted while waiting for dead watches 
> queue size");
>                 break;
>             }
>         }
>         synchronized (this) {
>             
>             if (deadWatchers.add(watcherBit)) {
>                 totalDeadWatchers.incrementAndGet();
>                 ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
>                 if (deadWatchers.size() >= watcherCleanThreshold) {
>                     synchronized (cleanEvent) {
>                         cleanEvent.notifyAll();
>                     }
>                 }
>             }
>         }
>     }{code}
>  
> {code:java}
> @Override
>     public void run() {
>         while (!stopped) {
>             synchronized (cleanEvent) {
>                 try {
>                     // add some jitter to avoid cleaning dead watchers at the
>                     // same time in the quorum
>                     if (!stopped && deadWatchers.size() < 
> watcherCleanThreshold) {
>                         
>                         int maxWaitMs = (watcherCleanIntervalInSeconds
>                                          + 
> ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
> 1000;
>                         cleanEvent.wait(maxWaitMs);
>                     }
>                 } catch (InterruptedException e) {
>                     LOG.info("Received InterruptedException while waiting for 
> cleanEvent");
>                     break;
>                 }
>             }            if (deadWatchers.isEmpty()) {
>                 continue;
>             }            synchronized (this) {
>                 // Clean the dead watchers need to go through all the current
>                 // watches, which is pretty heavy and may take a second if
>                 // there are millions of watches, that's why we're doing 
> lazily
>                 // batch clean up in a separate thread with a snapshot of the
>                 // current dead watchers.
>                 final Set snapshot = new HashSet<>(deadWatchers);
>                 deadWatchers.clear();
>                 int total = snapshot.size();
>                 LOG.info("Processing {} dead watchers", total);
>                 cleaners.schedule(new WorkRequest() {
>                     @Override
>                     public void doWork() throws Exception {
>                 

[jira] [Updated] (ZOOKEEPER-4711) There is a data race bettween run() and addDeadWatcher in org.apache.zookeeper.server.watch.WatcherCleaner class when run org.apache.zookeeper.server.watch.WatchManag

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Summary: There is a data race bettween run() and addDeadWatcher in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test. 
 (was: There is a data race bettween run() and "public void addDeadWatcher(int 
watcherBit)" in org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.)

> There is a data race bettween run() and addDeadWatcher in 
> org.apache.zookeeper.server.watch.WatcherCleaner class when run 
> org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit 
> test.
> -
>
> Key: ZOOKEEPER-4711
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
> Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
> ([https://github.com/apache/zookeeper)]
> Then run : mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
>Reporter: lujie
>Priority: Critical
>
> When we run :
> mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
> The method of addDeadWatcher
> (
>             System.out.println("2s::" +Thread.currentThread().getName()+ "  
> "+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
> this is my debug info.
> )
> {code:java}
> public void addDeadWatcher(int watcherBit) {
>         // Wait if there are too many watchers waiting to be closed,
>         // this is will slow down the socket packet processing and
>         // the adding watches in the ZK pipeline.
>         while (maxInProcessingDeadWatchers > 0 && !stopped && 
> totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
>             try {
>                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
> cleaning");
>                 long startTime = Time.currentElapsedTime();
>                 synchronized (processingCompletedEvent) {
>                     processingCompletedEvent.wait(100);
>                 }
>                 long latency = Time.currentElapsedTime() - startTime;
>                 
> ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
>             } catch (InterruptedException e) {
>                 LOG.info("Got interrupted while waiting for dead watches 
> queue size");
>                 break;
>             }
>         }
>         synchronized (this) {
>             
>             if (deadWatchers.add(watcherBit)) {
>                 totalDeadWatchers.incrementAndGet();
>                 ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
>                 if (deadWatchers.size() >= watcherCleanThreshold) {
>                     synchronized (cleanEvent) {
>                         cleanEvent.notifyAll();
>                     }
>                 }
>             }
>         }
>     }{code}
>  
> {code:java}
> @Override
>     public void run() {
>         while (!stopped) {
>             synchronized (cleanEvent) {
>                 try {
>                     // add some jitter to avoid cleaning dead watchers at the
>                     // same time in the quorum
>                     if (!stopped && deadWatchers.size() < 
> watcherCleanThreshold) {
>                         
>                         int maxWaitMs = (watcherCleanIntervalInSeconds
>                                          + 
> ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
> 1000;
>                         cleanEvent.wait(maxWaitMs);
>                     }
>                 } catch (InterruptedException e) {
>                     LOG.info("Received InterruptedException while waiting for 
> cleanEvent");
>                     break;
>                 }
>             }            if (deadWatchers.isEmpty()) {
>                 continue;
>             }            synchronized (this) {
>                 // Clean the dead watchers need to go through all the current
>                 // watches, which is pretty heavy and may take a second if
>                 // there are millions of watches, that's why we're doing 
> lazily
>                 // batch clean up in a separate thread with a snapshot of the
>                 // current dead 

[jira] [Created] (ZOOKEEPER-4711) There is a data race bettween run() and "public void addDeadWatcher(int watcherBit)" in org.apache.zookeeper.server.watch.WatcherCleaner class when run org.apache.zoo

2023-06-29 Thread lujie (Jira)
lujie created ZOOKEEPER-4711:


 Summary: There is a data race bettween run() and "public void 
addDeadWatcher(int watcherBit)" in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.
 Key: ZOOKEEPER-4711
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.0
 Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
([https://github.com/apache/zookeeper)]

Then run : mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false
Reporter: lujie


When we run :

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

The method of addDeadWatcher

(
            System.out.println("2s::" +Thread.currentThread().getName()+ "  
"+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
this is my debug info.
)
{code:java}
public void addDeadWatcher(int watcherBit) {
        // Wait if there are too many watchers waiting to be closed,
        // this is will slow down the socket packet processing and
        // the adding watches in the ZK pipeline.
        while (maxInProcessingDeadWatchers > 0 && !stopped && 
totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
                long startTime = Time.currentElapsedTime();
                synchronized (processingCompletedEvent) {
                    processingCompletedEvent.wait(100);
                }
                long latency = Time.currentElapsedTime() - startTime;
                
ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
            } catch (InterruptedException e) {
                LOG.info("Got interrupted while waiting for dead watches queue 
size");
                break;
            }
        }
        synchronized (this) {
            
            if (deadWatchers.add(watcherBit)) {
                totalDeadWatchers.incrementAndGet();
                ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
                if (deadWatchers.size() >= watcherCleanThreshold) {
                    synchronized (cleanEvent) {
                        cleanEvent.notifyAll();
                    }
                }
            }

        }
    }{code}
 
{code:java}
@Override
    public void run() {
        while (!stopped) {
            synchronized (cleanEvent) {
                try {
                    // add some jitter to avoid cleaning dead watchers at the
                    // same time in the quorum
                    if (!stopped && deadWatchers.size() < 
watcherCleanThreshold) {
                        
                        int maxWaitMs = (watcherCleanIntervalInSeconds
                                         + 
ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
1000;
                        cleanEvent.wait(maxWaitMs);
                    }
                } catch (InterruptedException e) {
                    LOG.info("Received InterruptedException while waiting for 
cleanEvent");
                    break;
                }
            }            if (deadWatchers.isEmpty()) {
                continue;
            }            synchronized (this) {
                // Clean the dead watchers need to go through all the current
                // watches, which is pretty heavy and may take a second if
                // there are millions of watches, that's why we're doing lazily
                // batch clean up in a separate thread with a snapshot of the
                // current dead watchers.
                final Set snapshot = new HashSet<>(deadWatchers);
                deadWatchers.clear();
                int total = snapshot.size();
                LOG.info("Processing {} dead watchers", total);
                cleaners.schedule(new WorkRequest() {
                    @Override
                    public void doWork() throws Exception {
                        long startTime = Time.currentElapsedTime();
                        listener.processDeadWatchers(snapshot);
                        long latency = Time.currentElapsedTime() - startTime;
                        LOG.info("Takes {} to process {} watches", latency, 
total);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEANER_LATENCY.add(latency);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEARED.add(total);
                        totalDeadWatchers.addAndGet(-total);
                        synchronized