[jira] [Updated] (ZOOKEEPER-4721) Upgrade OWASP Dependency Check to 8.3.1

2023-07-18 Thread Andor Molnar (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-4721:

Affects Version/s: 3.8.1
   3.7.1
   3.9.0

> Upgrade OWASP Dependency Check to 8.3.1
> ---
>
> Key: ZOOKEEPER-4721
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4721
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.5.4, 3.6.0, 3.4.12, 3.7.1, 3.9.0, 3.8.1
>Reporter: Abraham Fine
>Assignee: Patrick D. Hunt
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4552) Bump bouncycastle from 1.60 to 1.70

2023-07-17 Thread ZhangJian He (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangJian He resolved ZOOKEEPER-4552.
-
Resolution: Abandoned

> Bump bouncycastle from 1.60 to 1.70
> ---
>
> Key: ZOOKEEPER-4552
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4552
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: ZhangJian He
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4719) Use bouncycastle jdk18on instead of jdk15on

2023-07-17 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen resolved ZOOKEEPER-4719.
--
Fix Version/s: 3.9.0
 Assignee: Zili Chen
   Resolution: Fixed

master via 4882f7b63490971e44a669e98428615ef7bf472f

> Use bouncycastle jdk18on instead of jdk15on
> ---
>
> Key: ZOOKEEPER-4719
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4719
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: ZhangJian He
>Assignee: Zili Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> bouncycastle jdk15on is deprecated (see 
> [https://github.com/bcgit/bc-java/issues/1139]); 
> we can switch to bouncycastle jdk18on instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-17 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743717#comment-17743717
 ] 

Zili Chen commented on ZOOKEEPER-4714:
--

[~andor] in case you haven't cut 3.9.0 yet, this patch will be included now.

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.
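
To illustrate the proposed remedy, here is a minimal, self-contained sketch 
using hypothetical class and field names (this is not the actual ZooKeeper 
patch): keep the log size and position in in-memory counters updated on every 
write, so shouldSnapshot() and padFile() no longer need the File.length() or 
FileChannel.position() system calls.

{code:java}
import java.io.BufferedOutputStream;
import java.io.IOException;

// Hypothetical names; illustrates maintaining size/position in user space.
class CountingTxnLog {
    private long filePosition; // replaces FileChannel.position() syscalls
    private long totalLogSize; // replaces java.io.File.length() syscalls

    void append(byte[] serialized, BufferedOutputStream out) throws IOException {
        out.write(serialized);
        filePosition += serialized.length; // update counters on each write
        totalLogSize += serialized.length; // instead of asking the OS again
    }

    // shouldSnapshot() can read this without any system call.
    long getCurrentLogSize() {
        return totalLogSize;
    }

    long getFilePosition() {
        return filePosition;
    }
}
{code}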



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-17 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen resolved ZOOKEEPER-4714.
--
Fix Version/s: (was: 3.8.3)
 Assignee: Zili Chen
   Resolution: Fixed

master via e2e8ec661f8d50e5341bdefa0ccd8c5116f5ce4b

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-14 Thread Andor Molnar (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-4714:

Fix Version/s: 3.9.0

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.3
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-14 Thread Andor Molnar (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-4714:

Fix Version/s: (was: 3.9.0)

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.3
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4720) Add jakarta.servlet support

2023-07-14 Thread Rajendra Rathore (Jira)
Rajendra Rathore created ZOOKEEPER-4720:
---

 Summary: Add jakarta.servlet support
 Key: ZOOKEEPER-4720
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4720
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Rajendra Rathore


In order to upgrade to Tomcat 10+ / Servlet 5+, it is required to switch to the 
Jakarta EE namespace.
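
At the source level the switch is a package rename; a hedged minimal example 
(not from the ZooKeeper codebase, and assuming the corresponding servlet-api 
jar is on the classpath):

{code:java}
// Servlet 4 and earlier (javax namespace, Tomcat 9):
import javax.servlet.http.HttpServlet;

// Servlet 5+ (Jakarta EE namespace, Tomcat 10+) would use instead:
// import jakarta.servlet.http.HttpServlet;

// The API surface is otherwise the same; only the package prefix changes.
public class PingServlet extends HttpServlet {
}
{code}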



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4720) Add jakarta.servlet support

2023-07-14 Thread Rajendra Rathore (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajendra Rathore updated ZOOKEEPER-4720:

Issue Type: Wish  (was: New Feature)

> Add jakarta.servlet support
> ---
>
> Key: ZOOKEEPER-4720
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4720
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: Rajendra Rathore
>Priority: Major
>
> In order to upgrade to Tomcat 10+ / Servlet 5+, it is required to switch to 
> the Jakarta EE namespace.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4717) Cache serialize data in the request to avoid repeat serialize.

2023-07-13 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli resolved ZOOKEEPER-4717.

Fix Version/s: (was: 3.8.2)
   Resolution: Fixed

> Cache serialize data in the request to avoid repeat serialize.
> --
>
> Key: ZOOKEEPER-4717
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4717
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Each request is serialized three times:
> 1. Leader proposal. It serializes the request, wraps the serialized data 
> in a proposal, then sends the proposal to the quorum members.
> 2. SyncRequestProcessor txn log append. It serializes the request, then 
> writes the serialized data to the txn log.
> 3. ZKDatabase addCommittedProposal. It serializes the request, wraps the 
> serialized data in a proposal, then adds the proposal to committedLog.
> Serialization is CPU-intensive, and when the CPU experiences jitter, the time 
> spent on serialization spikes accordingly. 
> Therefore, we should avoid serializing the same request multiple times.
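
A natural shape for the fix, sketched below with assumed names (illustrative, 
not necessarily the merged change): serialize once, cache the bytes on the 
request, and let all three consumers reuse the cached array.

{code:java}
// Hedged sketch with hypothetical names: the request caches its serialized
// form so the leader proposal, the txn log append, and addCommittedProposal
// all share a single serialization pass.
class CachingRequest {
    private byte[] serializedData; // filled on first use, then reused

    synchronized byte[] getSerialized() {
        if (serializedData == null) {
            serializedData = serializeOnce(); // the only CPU-heavy pass
        }
        return serializedData;
    }

    private byte[] serializeOnce() {
        // placeholder for marshalling the txn header, txn, and digest
        return new byte[0];
    }
}
{code}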



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ZOOKEEPER-4717) Cache serialize data in the request to avoid repeat serialize.

2023-07-13 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli reassigned ZOOKEEPER-4717:
--

Assignee: Enrico Olivelli

> Cache serialize data in the request to avoid repeat serialize.
> --
>
> Key: ZOOKEEPER-4717
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4717
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Each request is serialized three times:
> 1. Leader proposal. It serializes the request, wraps the serialized data 
> in a proposal, then sends the proposal to the quorum members.
> 2. SyncRequestProcessor txn log append. It serializes the request, then 
> writes the serialized data to the txn log.
> 3. ZKDatabase addCommittedProposal. It serializes the request, wraps the 
> serialized data in a proposal, then adds the proposal to committedLog.
> Serialization is CPU-intensive, and when the CPU experiences jitter, the time 
> spent on serialization spikes accordingly. 
> Therefore, we should avoid serializing the same request multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4717) Cache serialize data in the request to avoid repeat serialize.

2023-07-13 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli updated ZOOKEEPER-4717:
---
Fix Version/s: 3.9.0

> Cache serialize data in the request to avoid repeat serialize.
> --
>
> Key: ZOOKEEPER-4717
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4717
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Each request is serialized three times:
> 1. Leader proposal. It serializes the request, wraps the serialized data 
> in a proposal, then sends the proposal to the quorum members.
> 2. SyncRequestProcessor txn log append. It serializes the request, then 
> writes the serialized data to the txn log.
> 3. ZKDatabase addCommittedProposal. It serializes the request, wraps the 
> serialized data in a proposal, then adds the proposal to committedLog.
> Serialization is CPU-intensive, and when the CPU experiences jitter, the time 
> spent on serialization spikes accordingly. 
> Therefore, we should avoid serializing the same request multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4719) Use bouncycastle jdk18on instead of jdk15on

2023-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4719:
--
Labels: pull-request-available  (was: )

> Use bouncycastle jdk18on instead of jdk15on
> ---
>
> Key: ZOOKEEPER-4719
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4719
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: ZhangJian He
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> bouncycastle jdk15on is deprecated (see 
> [https://github.com/bcgit/bc-java/issues/1139]); 
> we can switch to bouncycastle jdk18on instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4719) Use bouncycastle jdk18on instead of jdk15on

2023-07-11 Thread ZhangJian He (Jira)
ZhangJian He created ZOOKEEPER-4719:
---

 Summary: Use bouncycastle jdk18on instead of jdk15on
 Key: ZOOKEEPER-4719
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4719
 Project: ZooKeeper
  Issue Type: Bug
Reporter: ZhangJian He


bouncycastle jdk15on is deprecated (see 
[https://github.com/bcgit/bc-java/issues/1139]).

We can switch to bouncycastle jdk18on instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4718) Removing unnecessary heap memory allocation in serialization can help reduce GC pressure.

2023-07-06 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen resolved ZOOKEEPER-4718.
--
Fix Version/s: 3.9.0
   (was: 3.8.2)
 Assignee: Zili Chen
   Resolution: Fixed

master via e08cc2a782982964a57651f179a468b19e2e6010

> Removing unnecessary heap memory allocation in serialization can help reduce 
> GC pressure.
> -
>
> Key: ZOOKEEPER-4718
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4718
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Each request is serialized to a byte array.
> In SerializeUtils#serializeRequest, before serializing the request, it always 
> allocates a 32-byte array. This is unnecessary; we can allocate the byte 
> array in the catch block instead.
> {code:java}
> public static byte[] serializeRequest(Request request) {
> if (request == null || request.getHdr() == null) {
> return null;
> }
> byte[] data = new byte[32];
> try {
> data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
> request.getTxnDigest());
> } catch (IOException e) {
> LOG.error("This really should be impossible", e);
> }
> return data;
> }
> {code}
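
Following the description, the adjusted method could look like the sketch 
below, reusing Request, Util, and LOG from the snippet above; the only change 
is that the 32-byte fallback is allocated on the error path alone:

{code:java}
public static byte[] serializeRequest(Request request) {
    if (request == null || request.getHdr() == null) {
        return null;
    }
    byte[] data;
    try {
        data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(),
                request.getTxnDigest());
    } catch (IOException e) {
        LOG.error("This really should be impossible", e);
        data = new byte[32]; // fallback allocated only when marshalling fails
    }
    return data;
}
{code}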



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4712:
--
Labels: pull-request-available  (was: )

> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor, which may lead to data inconsistency
> -
>
> Key: ZOOKEEPER-4712
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor. It may lead to potential data inconsistency (see {*}Potential 
> Risk{*}).
>  
> A follower / observer will invoke syncProcessor.shutdown() in 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
> respectively.
> However, after the 
> [FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
>  of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() 
> anymore.
>  
> h2. Call stack
> h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
>  * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)
>  
> h5. For comparison, in version 3.4.X,
>  * Observer.shutdown() -> Learner.shutdown() -> 
> *ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * Follower.shutdown() -> Learner.shutdown() -> 
> *FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  
> h2. Code Details
> Take version 3.8.0 as an example.
> In Follower.shutdown() :
> {code:java}
>     public void shutdown() {
>         LOG.info("shutdown Follower");
> +       // invoke Learner.shutdown()
>         super.shutdown();   
>     } {code}
>  
> In Learner.java:
> {code:java}
>     public void shutdown() {
>         ...
>         // shutdown previous zookeeper
>         if (zk != null) {
>             // If we haven't finished SNAP sync, force fully shutdown
>             // to avoid potential inconsistency
> +           // This will invoke ZooKeeperServer.shutdown(boolean), 
> +           // which will not shutdown syncProcessor
> +           // Before the fix of ZOOKEEPER-3642, 
> +           // FollowerZooKeeperServer.shutdown() will be invoked here
>             zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); 
>          }
>     } {code}
>  
> In ZooKeeperServer.java:
> {code:java}
>     public synchronized void shutdown(boolean fullyShutDown) {
>         ...
>         if (firstProcessor != null) {
> +           // For a follower, this will not shutdown its syncProcessor.
>             firstProcessor.shutdown(); 
>         }
>         ...
>     } {code}
>  
> In expectation, Follower.shutdown() should invoke 
> LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
> {code:java}
>     public synchronized void shutdown() {
>         ...
>         try {
> +           // shutdown the syncProcessor here
>             if (syncProcessor != null) {
>                 syncProcessor.shutdown();     
>             }
>         } ...
>     } {code}
> Observer.shutdown() has a similar problem.
>  
> h2. Potential Risk
> When Follower.shutdown() is called, the follower's QuorumPeer thread may 
> update the lastProcessedZxid for the election and recovery phase before its 
> syncThread drains the pending requests and flushes them to disk.
> In consequence, this lastProcessedZxid is not the latest zxid in its log, 
> leading to log inconsistency after the SYNC phase. (Similar to the symptoms 
> of ZOOKEEPER-2845.)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (ZOOKEEPER-4669) Upgrade snappy-java to 1.1.9.1 (in order to support M1 macs)

2023-07-06 Thread AvnerW (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740592#comment-17740592
 ] 

AvnerW edited comment on ZOOKEEPER-4669 at 7/6/23 12:59 PM:


[~mpolden] , [~cnauroth]  it seems like only version 1.1.10.1 contains a fix 
for CVE-2023-34453, CVE-2023-34454 and CVE-2023-34455.

Can the next ZK version include version 1.1.10.1 instead of 1.1.9.1?


was (Author: avnerw):
[~mpolden] , it seems like only version 1.1.10.1 contains a fix for 
CVE-2023-34453, CVE-2023-34454 and CVE-2023-34455.

Can the next ZK version include version 1.1.10.1 instead of 1.1.9.1?

> Upgrade snappy-java to 1.1.9.1 (in order to support M1 macs)
> 
>
> Key: ZOOKEEPER-4669
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4669
> Project: ZooKeeper
>  Issue Type: Task
>  Components: java client
>Reporter: Enrico Olivelli
>Assignee: Martin Polden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4669) Upgrade snappy-java to 1.1.9.1 (in order to support M1 macs)

2023-07-06 Thread AvnerW (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740592#comment-17740592
 ] 

AvnerW commented on ZOOKEEPER-4669:
---

[~mpolden] , it seems like only version 1.1.10.1 contains a fix for 
CVE-2023-34453, CVE-2023-34454 and CVE-2023-34455.

Can the next ZK version include version 1.1.10.1 instead of 1.1.9.1?

> Upgrade snappy-java to 1.1.9.1 (in order to support M1 macs)
> 
>
> Key: ZOOKEEPER-4669
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4669
> Project: ZooKeeper
>  Issue Type: Task
>  Components: java client
>Reporter: Enrico Olivelli
>Assignee: Martin Polden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4718) Removing unnecessary heap memory allocation in serialization can help reduce GC pressure.

2023-07-06 Thread Yan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhao updated ZOOKEEPER-4718:

Description: 
Each request is serialized to a byte array.
In SerializeUtils#serializeRequest, before serializing the request, it always 
allocates a 32-byte array. This is unnecessary; we can allocate the byte array 
in the catch block instead.


{code:java}
public static byte[] serializeRequest(Request request) {
if (request == null || request.getHdr() == null) {
return null;
}
byte[] data = new byte[32];
try {
data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
request.getTxnDigest());
} catch (IOException e) {
LOG.error("This really should be impossible", e);
}
return data;
}
{code}


  was:
In SerializeUtils#serializeRequest, before serializing the request, it always 
allocates a 32-byte array. This is unnecessary; we can allocate the byte array 
in the catch block instead.


{code:java}
public static byte[] serializeRequest(Request request) {
if (request == null || request.getHdr() == null) {
return null;
}
byte[] data = new byte[32];
try {
data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
request.getTxnDigest());
} catch (IOException e) {
LOG.error("This really should be impossible", e);
}
return data;
}
{code}



> Removing unnecessary heap memory allocation in serialization can help reduce 
> GC pressure.
> -
>
> Key: ZOOKEEPER-4718
>     URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4718
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Each request is serialized to a byte array.
> In SerializeUtils#serializeRequest, before serializing the request, it always 
> allocates a 32-byte array. This is unnecessary; we can allocate the byte 
> array in the catch block instead.
> {code:java}
> public static byte[] serializeRequest(Request request) {
> if (request == null || request.getHdr() == null) {
> return null;
> }
> byte[] data = new byte[32];
> try {
> data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
> request.getTxnDigest());
> } catch (IOException e) {
> LOG.error("This really should be impossible", e);
> }
> return data;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4718) Removing unnecessary heap memory allocation in serialization can help reduce GC pressure.

2023-07-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4718:
--
Labels: pull-request-available  (was: )

> Removing unnecessary heap memory allocation in serialization can help reduce 
> GC pressure.
> -
>
> Key: ZOOKEEPER-4718
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4718
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In SerializeUtils#serializeRequest, before serializing the request, it always 
> allocates a 32-byte array. This is unnecessary; we can allocate the byte 
> array in the catch block instead.
> {code:java}
> public static byte[] serializeRequest(Request request) {
> if (request == null || request.getHdr() == null) {
> return null;
> }
> byte[] data = new byte[32];
> try {
> data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
> request.getTxnDigest());
> } catch (IOException e) {
> LOG.error("This really should be impossible", e);
> }
> return data;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4718) Removing unnecessary heap memory allocation in serialization can help reduce GC pressure.

2023-07-06 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4718:
---

 Summary: Removing unnecessary heap memory allocation in 
serialization can help reduce GC pressure.
 Key: ZOOKEEPER-4718
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4718
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.8.2


In SerializeUtils#serializeRequest, before serializing the request, it always 
allocates a 32-byte array. This is unnecessary; we can allocate the byte array 
in the catch block instead.


{code:java}
public static byte[] serializeRequest(Request request) {
if (request == null || request.getHdr() == null) {
return null;
}
byte[] data = new byte[32];
try {
data = Util.marshallTxnEntry(request.getHdr(), request.getTxn(), 
request.getTxnDigest());
} catch (IOException e) {
LOG.error("This really should be impossible", e);
}
return data;
}
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4717) Cache serialize data in the request to avoid repeat serialize.

2023-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4717:
--
Labels: pull-request-available  (was: )

> Cache serialize data in the request to avoid repeat serialize.
> --
>
> Key: ZOOKEEPER-4717
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4717
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Each request is serialized three times:
> 1. Leader proposal. It serializes the request, wraps the serialized data 
> in a proposal, then sends the proposal to the quorum members.
> 2. SyncRequestProcessor txn log append. It serializes the request, then 
> writes the serialized data to the txn log.
> 3. ZKDatabase addCommittedProposal. It serializes the request, wraps the 
> serialized data in a proposal, then adds the proposal to committedLog.
> Serialization is CPU-intensive, and when the CPU experiences jitter, the time 
> spent on serialization spikes accordingly. 
> Therefore, we should avoid serializing the same request multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4717) Cache serialize data in the request to avoid repeat serialize.

2023-07-05 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4717:
---

 Summary: Cache serialize data in the request to avoid repeat 
serialize.
 Key: ZOOKEEPER-4717
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4717
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.8.2


Each request is serialized three times:
1. Leader proposal. It serializes the request, wraps the serialized data in 
a proposal, then sends the proposal to the quorum members.
2. SyncRequestProcessor txn log append. It serializes the request, then 
writes the serialized data to the txn log.
3. ZKDatabase addCommittedProposal. It serializes the request, wraps the 
serialized data in a proposal, then adds the proposal to committedLog.

Serialization is CPU-intensive, and when the CPU experiences jitter, the time 
spent on serialization spikes accordingly. 
Therefore, we should avoid serializing the same request multiple times.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4599) Upgrade Jetty to avoid CVE-2022-2048

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4599:

Issue Type: Task  (was: Bug)

> Upgrade Jetty to avoid CVE-2022-2048
> 
>
> Key: ZOOKEEPER-4599
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4599
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.6.3, 3.8.0, 3.7.1
>Reporter: Shivakumar
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: security
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>
> |CVE ID|Type|Severity|Packages|Package Version|CVSS|Fix Status|
> |CVE-2022-2048|java|high|org.eclipse.jetty_jetty-io|9.4.43.v20210629|7.5|fixed in 11.0.9, 10.0.9, 9.4.47|
> Our security scan detected the above vulnerability; please upgrade to a fixed 
> Jetty version to resolve it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4674) C client tests don't pass on CI

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4674:

Issue Type: Bug  (was: Test)

> C client tests don't pass on CI
> ---
>
> Key: ZOOKEEPER-4674
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4674
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, tests
>Reporter: Enrico Olivelli
>Assignee: Damien Diederen
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.6.5, 3.8.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4647) Tests don't pass on JDK20 because we try to mock InetAddress

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4647:

Issue Type: Bug  (was: Test)

> Tests don't pass on JDK20 because we try to mock InetAddress
> 
>
> Key: ZOOKEEPER-4647
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4647
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This test fails on JDK 20-EA:
> org.apache.zookeeper.test.StaticHostProviderTest.testEmptyResolution
> Mockito cannot mock this class: class java.net.InetAddress. Mockito can only 
> mock non-private & non-final classes. If you're not sure why you're getting 
> this error, please report to the mailing list.
> If I try to upgrade Mockito to 4.9.0, the error is:
> org.mockito.exceptions.base.MockitoException: 
> Cannot mock/spy class java.net.InetAddress
> Mockito cannot mock/spy because :
>  - sealed class
>  
>  at 
> org.apache.zookeeper.test.StaticHostProviderTest.testReResolvingSingle(StaticHostProviderTest.jav
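
A common way out, sketched below under assumed names (ZooKeeper's 
StaticHostProvider takes a similar resolver-injection approach, but this 
snippet is illustrative rather than the project's code): route name resolution 
through a small interface so a test can substitute a fake, avoiding any 
Mockito mock of the final/sealed java.net.InetAddress.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical names; tests supply a fake instead of mocking InetAddress.
interface HostResolver {
    InetAddress[] getAllByName(String name) throws UnknownHostException;
}

class JdkHostResolver implements HostResolver {
    @Override
    public InetAddress[] getAllByName(String name) throws UnknownHostException {
        return InetAddress.getAllByName(name); // real DNS lookup in production
    }
}

class ResolverDemo {
    public static void main(String[] args) throws UnknownHostException {
        // Test-side stand-in: simulate an empty resolution without Mockito.
        HostResolver empty = name -> new InetAddress[0];
        System.out.println(empty.getAllByName("example.invalid").length); // 0
    }
}
{code}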



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) Upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Issue Type: Task  (was: Improvement)

> Upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Our jackson is quite old; I want to upgrade it before the 3.8.2 release.
> We also have a few false positive CVEs reported by OWASP:
>  * CVE-2023-35116: according to the jackson community, this is not a security 
> issue; see 
> [https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
>  * CVE-2022-45688: this CVE is not even jackson-related; it is a 
> vulnerability in json-java, which we don't use in ZooKeeper
>  
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) Upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Summary: Upgrade jackson to 2.15.2, suppress two false positive CVE errors  
(was: upgrade jackson to 2.15.2, suppress two false positive CVE errors)

> Upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Our jackson is quite old; I want to upgrade it before the 3.8.2 release.
> We also have a few false positive CVEs reported by OWASP:
>  * CVE-2023-35116: according to the jackson community, this is not a security 
> issue; see 
> [https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
>  * CVE-2022-45688: this CVE is not even jackson-related; it is a 
> vulnerability in json-java, which we don't use in ZooKeeper
>  
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4709:

Issue Type: Task  (was: Improvement)

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4714:

Fix Version/s: 3.8.3

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.3
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-05 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4714:

Fix Version/s: (was: 3.8.2)

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-07-05 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740220#comment-17740220
 ] 

Zili Chen commented on ZOOKEEPER-4715:
--

It seems the latest tag to use is 3.9.0. If it's not included there, please 
move it to the next version.

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want our values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to our tests. After adding the checks, we switched 
> to the new way of retrieving fileSize and filePosition in ZOOKEEPER-4714 and 
> verified that the tests still pass.
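
The kind of check described could look like the following minimal sketch 
(illustrative only, with assumed names, not the actual testGetCurrentLogSize):

{code:java}
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hedged sketch: after each write, the size we track ourselves must agree
// with what the filesystem reports. All names here are hypothetical.
class TrackedSizeSketchTest {
    void writeAndVerify(File logFile, byte[] payload) throws IOException {
        long tracked = logFile.length();  // value before our write
        try (FileOutputStream out = new FileOutputStream(logFile, true)) {
            out.write(payload);
            tracked += payload.length;    // our own bookkeeping
        }
        // The tracked value must match the OS-reported size.
        assertEquals(logFile.length(), tracked);
    }
}
{code}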



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-07-05 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740218#comment-17740218
 ] 

Zili Chen commented on ZOOKEEPER-4715:
--

I'll move the fix version to the following ones. [~andor], if you do end up 
including it in 3.9.0, please update the field then.

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want our values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to our tests. After adding the checks, we switched 
> to the new way of retrieving fileSize and filePosition in ZOOKEEPER-4714 and 
> verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-07-05 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen resolved ZOOKEEPER-4715.
--
Fix Version/s: (was: 3.8.2)
 Assignee: Zili Chen
   Resolution: Fixed

master via 2edb73a943928e0716b91e8a1d06a9c226fa393c

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want our values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to our tests. After adding the checks, we switched 
> to the new way of retrieving fileSize and filePosition in ZOOKEEPER-4714 and 
> verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-05 Thread Yan Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740117#comment-17740117
 ] 

Yan Zhao commented on ZOOKEEPER-4714:
-

No. It's just an improvement.



> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-07-05 Thread Yan Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740116#comment-17740116
 ] 

Yan Zhao commented on ZOOKEEPER-4715:
-

No. It's just an improvement.

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want our values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to our tests. After adding the checks, we switched 
> to the new way of retrieving fileSize and filePosition in ZOOKEEPER-4714 and 
> verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-07-05 Thread Andor Molnar (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740109#comment-17740109
 ] 

Andor Molnar commented on ZOOKEEPER-4715:
-

Hi [~horizonzy]. Do you think this ticket is a blocker for 3.9.0?

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want our values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to our tests. After adding the checks, we switched 
> to the new way of retrieving fileSize and filePosition in ZOOKEEPER-4714 and 
> verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-07-05 Thread Andor Molnar (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740108#comment-17740108
 ] 

Andor Molnar commented on ZOOKEEPER-4714:
-

Thanks [~horizonzy]. Do you think this issue is a blocker for 3.9.0?

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4713) ObserverZooKeeperServer.shutdown() is redundant

2023-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4713:
--
Labels: pull-request-available  (was: )

> ObserverZooKeeperServer.shutdown() is redundant
> ---
>
> Key: ZOOKEEPER-4713
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4713
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After the 
> [FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
>  of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible 
> for the shutdown logic of both the follower and observer. 
> ObserverZooKeeperServer.shutdown() seems redundant because it is not in the 
> call stack of Observer.shutdown(). (Note that FollowerZooKeeperServer does 
> not have a shutdown() method.)
> Related analysis can be found in ZOOKEEPER-4712.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4707) Update snappy-java to address multiple CVEs

2023-07-04 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko resolved ZOOKEEPER-4707.
-
Fix Version/s: 3.9.0
   3.7.2
   3.8.2
   Resolution: Fixed

Thank you [~lhotari] for raising the issue and doing the fix!

I merged it to all active branches; it will soon be released with 3.9.0 and 
3.8.2.

> Update snappy-java to address multiple CVEs
> ---
>
> Key: ZOOKEEPER-4707
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4707
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Lari Hotari
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Address multiple CVEs:
> CVE-2023-34453
> CVE-2023-34454
> CVE-2023-34455
> See https://github.com/xerial/snappy-java/releases/tag/v1.1.10.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4707) Update snappy-java to address multiple CVEs

2023-07-04 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4707:

Affects Version/s: 3.8.1
   3.7.1

> Update snappy-java to address multiple CVEs
> ---
>
> Key: ZOOKEEPER-4707
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4707
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Lari Hotari
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Address multiple CVEs:
> CVE-2023-34453
> CVE-2023-34454
> CVE-2023-34455
> See https://github.com/xerial/snappy-java/releases/tag/v1.1.10.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4643) Committed txns may be improperly truncated if follower crashes right after updating currentEpoch but before persisting txns to disk

2023-07-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4643:
--
Labels: pull-request-available  (was: )

> Committed txns may be improperly truncated if follower crashes right after 
> updating currentEpoch but before persisting txns to disk
> ---
>
> Key: ZOOKEEPER-4643
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4643
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.8.1
>Reporter: Sirius
>Priority: Critical
>  Labels: pull-request-available
> Attachments: Trace-ZK-4643.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a follower is processing the NEWLEADER message in the SYNC phase, it 
> updates its {{_currentEpoch_}} file *before* writing the txns (from the 
> PROPOSALs sent by the leader in SYNC) to the log file. Such an execution 
> order may lead to improper truncation of *committed* txns on other servers 
> in later rounds.
> The critical step to trigger this problem is to make a follower node crash 
> right after it updates its {{_currentEpoch_}} file but before writing 
> the txns to the log file. The risk is that this node, with 
> incomplete committed txns, might later be elected leader thanks to its 
> larger {{_currentEpoch_}}, and then improperly use TRUNC to ask other 
> nodes to truncate their committed txns!
>  
> h2. Trace
> [^Trace-ZK-4643.pdf]
> Here is an example to trigger the bug. (Focus on {{_currentEpoch_}} and 
> {{_lastLoggedZxid_}}.)
> *Round 1 (Running nodes with their acceptedEpoch & currentEpoch set to 1):*
>  - Start the ensemble with three nodes: +S0+, +S1+ & +S2+.
>  - +S2+ is elected leader.
>  - For all of them, {{_currentEpoch_}} = 1, {{_lastLoggedZxid_}} (the last 
> zxid in the log) = <1, 3>, {{_lastProcessedZxid_}} = <1, 3>.
>  - +S0+ crashes.
>  - A new txn <1, 4> is logged and committed by +S1+ & +S2+. Then, +S1+ & 
> +S2+ have {{_lastLoggedZxid_}} = <1, 4>, {{_lastProcessedZxid_}} = <1, 4>.
>  - Verify clients can read the datatree with the latest zxid <1, 4>.
*Round 2 (Running nodes with their acceptedEpoch & currentEpoch set to 2):*
>  * +S0+ & +S2+ restart, and +S1+ crashes.
>  * Again, +S2+ is elected leader.
>  * Then, during the SYNC phase, the leader +S2+ ({{_maxCommittedLog_}} = 
> <1, 4>) uses DIFF to sync with the follower +S0+ ({{_lastLoggedZxid_}} = 
> <1, 3>), and their {{_currentEpoch_}} will be set to 2 (and written to disk).
>  * (Note that the follower +S0+ updates its currentEpoch file before writing 
> the txns to the log file when receiving the NEWLEADER message.)
>  * *Unfortunately, right after the follower +S0+ finishes updating its 
> currentEpoch file, it crashes.*
*Round 3 (Running nodes with their acceptedEpoch & currentEpoch set to 3):*
>  * +S0+ & +S1+ restart, and +S2+ crashes.
>  * Since +S0+ has {{_currentEpoch_}} = 2 and +S1+ has {{_currentEpoch_}} = 1, 
> +S0+ will be elected leader.
>  * During the SYNC phase, the leader +S0+ ({{_maxCommittedLog_}} = <1, 3>) 
> will use TRUNC to sync with +S1+ ({{_lastLoggedZxid_}} = <1, 4>). 
> Then, +S1+ removes txn <1, 4>.
>  * (However, <1, 4> was committed and visible to clients before, and is not 
> supposed to be truncated!)
>  * Verify that clients of +S0+ & +S1+ do NOT see txn <1, 4>, a 
> violation of ZAB.
>  
> Extra note: The trace can be constructed with a quorum of nodes alive at any 
> moment with careful timing of node crashes & restarts, e.g., let +S1+ 
> restart before +S0+ crashes at the end of Round 2.
>  
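
(For readers unfamiliar with the <epoch, counter> notation above: a zxid packs 
both parts into a single long, epoch in the high 32 bits and counter in the 
low 32 bits, along the lines of ZooKeeper's ZxidUtils. A self-contained 
sketch:)
{code:java}
// Sketch of the <epoch, counter> packing used in the trace.
class ZxidSketch {
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL); // epoch high, counter low
    }

    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long z = makeZxid(1, 4); // the <1, 4> txn from Round 1
        System.out.println(epochOf(z) + ", " + counterOf(z)); // prints 1, 4
    }
}
{code}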
> h2. Analysis
> *Root Cause:*
> When a follower updates its current epoch, it should guarantee that it has 
> already synced the uncommitted txns to disk (or taken a snapshot). 
> Otherwise, once the current epoch has been written to the file while the 
> follower's history (transaction log) is still stale, a crash at that point 
> leaves the epoch ahead of the log. It is dangerous for a node with an 
> updated current epoch but a stale history to be elected leader: it might 
> truncate committed txns on other nodes.
>  
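
A minimal sketch of the ordering the root-cause analysis argues for; all names 
here are hypothetical stand-ins, not the actual ZooKeeper fix. The invariant: 
the txns received during SYNC must be persisted and flushed before the 
currentEpoch file is advanced.
{code:java}
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of the safe ordering, not the real Learner code.
class EpochOrderingSketch {
    private final FileOutputStream txnLog;
    private final String epochFilePath;

    EpochOrderingSketch(FileOutputStream txnLog, String epochFilePath) {
        this.txnLog = txnLog;
        this.epochFilePath = epochFilePath;
    }

    void ackNewLeader(List<byte[]> syncTxns, long newEpoch) throws IOException {
        // 1. persist the txns received during SYNC...
        for (byte[] txn : syncTxns) {
            txnLog.write(txn);
        }
        // 2. ...and force them to disk...
        txnLog.getFD().sync();
        // 3. ...and only then advance the on-disk currentEpoch. A crash before
        // this point leaves the old epoch, so the node cannot later win the
        // election with a stale history and TRUNC committed txns elsewhere.
        try (FileOutputStream epochFile = new FileOutputStream(epochFilePath)) {
            epochFile.write(Long.toString(newEpoch).getBytes(StandardCharsets.UTF_8));
            epochFile.getFD().sync();
        }
    }
}
{code}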
> *Property Violation:*
>  * From the server side, the ensemble deletes a committed

[jira] [Commented] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-07-02 Thread Mate Szalay-Beko (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739380#comment-17739380
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-4709:
-

I also pushed it to branch-3.7.

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-07-02 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4709:

Fix Version/s: 3.7.2

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4716) upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-07-02 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko resolved ZOOKEEPER-4716.
-
Fix Version/s: 3.9.0
   3.7.2
   3.8.2
   Resolution: Done

> upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.2, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Our jackson is quite old; I want to upgrade it before the 3.8.2 release.
> We also have a few false-positive CVEs reported by OWASP:
>  * CVE-2023-35116: according to the jackson community, this is not a security 
> issue, see 
> [https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
>  * CVE-2022-45688: this CVE is not jackson-related at all, but a 
> vulnerability in json-java, which we don't use in ZooKeeper
>  
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-07-02 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko resolved ZOOKEEPER-4709.
-
Resolution: Done

[~siroibaf], thank you for the contribution! The fix got merged to branch-3.8 
and master.

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-07-02 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4709:

Fix Version/s: 3.9.0
   3.8.2

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-07-01 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shut down the syncProcessor.
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() would be invoked here.
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

The expectation is that Follower.shutdown() invokes 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
the lastProcessedZxid for the election and recovery phase before its syncThread 
drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)
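
For illustration, a minimal sketch (a plain queue and thread standing in for 
the real SyncRequestProcessor, so every name below is hypothetical) of why the 
shutdown order matters: shutdown() must drain the queue and join before the 
QuorumPeer thread may safely read lastProcessedZxid.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for SyncRequestProcessor.
class SyncProcessorSketch extends Thread {
    private static final Runnable DEATH = () -> { };
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();

    @Override
    public void run() {
        try {
            Runnable r;
            while ((r = pending.take()) != DEATH) {
                r.run(); // flush one pending request to disk
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void submit(Runnable flush) {
        pending.add(flush);
    }

    void shutdown() throws InterruptedException {
        pending.add(DEATH); // everything queued before the marker still runs
        join();             // block until the pending flushes have completed
        // only after this point is it safe to read lastProcessedZxid
    }
}
{code}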

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutd

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-07-01 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shut down the syncProcessor.
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() would be invoked here.
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

The expectation is that Follower.shutdown() invokes 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may 
update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutd

[jira] [Updated] (ZOOKEEPER-4716) upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4716:
--
Labels: pull-request-available  (was: )

> upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Our jackson is quite old; I want to upgrade it before the 3.8.2 release.
> We also have a few false-positive CVEs reported by OWASP:
>  * CVE-2023-35116: according to the jackson community, this is not a security 
> issue, see 
> [https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
>  * CVE-2022-45688: this CVE is not jackson-related at all, but a 
> vulnerability in json-java, which we don't use in ZooKeeper
>  
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-06-30 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Description: 
Our jackson is quite old; I want to upgrade it before the 3.8.2 release.

We also have a few false-positive CVEs reported by OWASP:
 * CVE-2023-35116: according to the jackson community, this is not a security 
issue, see 
[https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
 * CVE-2022-45688: this CVE is not jackson-related at all, but a 
vulnerability in json-java, which we don't use in ZooKeeper

 
{code:java}
[INFO] Finished at: 2023-06-30T13:23:38+02:00 
[INFO]  
[ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
(default-cli) on project zookeeper: 
[ERROR] 
[ERROR] One or more dependencies were identified with vulnerabilities that have 
a CVSS score greater than or equal to '0.0': 
[ERROR] 
[ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
[ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
 {code}
 

  was:
{code:java}
[INFO] Finished at: 2023-06-30T13:23:38+02:00 
[INFO]  
[ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
(default-cli) on project zookeeper: 
[ERROR] 
[ERROR] One or more dependencies were identified with vulnerabilities that have 
a CVSS score greater than or equal to '0.0': 
[ERROR] 
[ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
[ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
 {code}


> upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>
> Our jackson is quite old; I want to upgrade it before the 3.8.2 release.
> We also have a few false-positive CVEs reported by OWASP:
>  * CVE-2023-35116: according to the jackson community, this is not a security 
> issue, see 
> [https://github.com/FasterXML/jackson-databind/issues/3972#issuecomment-1596193098]
>  * CVE-2022-45688: this CVE is not jackson-related at all, but a 
> vulnerability in json-java, which we don't use in ZooKeeper
>  
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) upgrade jackson to 2.15.2, suppress two false positive CVE errors

2023-06-30 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Summary: upgrade jackson to 2.15.2, suppress two false positive CVE errors  
(was: Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116)

> upgrade jackson to 2.15.2, suppress two false positive CVE errors
> -
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116

2023-06-30 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Description: 
{code:java}
[INFO] Finished at: 2023-06-30T13:23:38+02:00 
[INFO]  
[ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
(default-cli) on project zookeeper: 
[ERROR] 
[ERROR] One or more dependencies were identified with vulnerabilities that have 
a CVSS score greater than or equal to '0.0': 
[ERROR] 
[ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
[ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
 {code}

> Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116
> 
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>
> {code:java}
> [INFO] Finished at: 2023-06-30T13:23:38+02:00 
> [INFO] 
>  
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:7.1.0:check 
> (default-cli) on project zookeeper: 
> [ERROR] 
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0': 
> [ERROR] 
> [ERROR] jackson-core-2.13.4.jar: CVE-2022-45688(7.5) 
> [ERROR] jackson-databind-2.13.4.2.jar: CVE-2023-35116(7.5)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4716) Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116

2023-06-30 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko updated ZOOKEEPER-4716:

Affects Version/s: 3.8.1

> Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116
> 
>
> Key: ZOOKEEPER-4716
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.8.1
>Reporter: Mate Szalay-Beko
>Assignee: Mate Szalay-Beko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4716) Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116

2023-06-30 Thread Mate Szalay-Beko (Jira)
Mate Szalay-Beko created ZOOKEEPER-4716:
---

 Summary: Fix jackson related CVEs: CVE-2022-45688, CVE-2023-35116
 Key: ZOOKEEPER-4716
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4716
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Mate Szalay-Beko
Assignee: Mate Szalay-Beko






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4628) CVE-2022-42003 CVE-2022-42004 HIGH: upgrade jackson-databind-2.13.3.jar to 2.13.4.1

2023-06-30 Thread Mate Szalay-Beko (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mate Szalay-Beko resolved ZOOKEEPER-4628.
-
Resolution: Duplicate

Thank you [~ivodujmovic] for reporting this issue and submitting a PR!

I see that in the meantime this was fixed by ZOOKEEPER-4661. (Of course, since 
then we have had another CVE, but I will take care of that in a separate 
ticket.)

 

> CVE-2022-42003 CVE-2022-42004 HIGH: upgrade jackson-databind-2.13.3.jar to 
> 2.13.4.1
> ---
>
> Key: ZOOKEEPER-4628
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4628
> Project: ZooKeeper
>  Issue Type: Task
>  Components: security
>Affects Versions: 3.5.10, 3.8.0, 3.7.1
>Reporter: Ivo Dujmovic
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Two High-severity issues, 
> [https://nvd.nist.gov/vuln/detail/CVE-2022-42003] and 
> [https://nvd.nist.gov/vuln/detail/CVE-2022-42004], 
> affect jackson version 2.13.3, which ZooKeeper should update to 2.13.4.1. 
> Other projects have done this, but ZooKeeper has not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4715:
--
Labels: pull-request-available  (was: )

> Verify file size and position in testGetCurrentLogSize.
> ---
>
> Key: ZOOKEEPER-4715
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a pre-PR for ZOOKEEPER-4714.
> In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we 
> want those values to match the original ones. Therefore, we added checks for 
> fileSize and filePosition to the tests. After adding the checks, we switched 
> ZOOKEEPER-4714 to the new way of retrieving fileSize and filePosition and 
> verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4715) Verify file size and position in testGetCurrentLogSize.

2023-06-29 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4715:
---

 Summary: Verify file size and position in testGetCurrentLogSize.
 Key: ZOOKEEPER-4715
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4715
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.9.0, 3.8.2


This is a pre-PR for ZOOKEEPER-4714.

In ZOOKEEPER-4714, we maintain fileSize and filePosition ourselves, and we want 
those values to match the original ones. Therefore, we added checks for 
fileSize and filePosition to the tests. After adding the checks, we switched 
ZOOKEEPER-4714 to the new way of retrieving fileSize and filePosition and 
verified that the tests still pass.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4714:
--
Labels: pull-request-available  (was: )

> Improve syncRequestProcessor performance
> 
>
> Key: ZOOKEEPER-4714
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.8.2
>
> Attachments: 761688051587_.pic.jpg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the SyncRequestProcessor, a write operation is performed for each write 
> request. Two methods are relatively time-consuming.
> 1. Within SyncRequestProcessor#shouldSnapshot, the current size of the 
> current file is retrieved, which involves a system call.
> Call stack:
> java.io.File.length(File.java)
> org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
> org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
> org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
> org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)
> 2. Within ZKDatabase#append, the current position of the current file is 
> retrieved, which also involves a system call.
> Call stack:
> sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
> sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
> org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
> org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)
> Therefore, it is best to maintain the current size and position of the 
> current file ourselves, as this can greatly improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4714) Improve syncRequestProcessor performance

2023-06-29 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4714:
---

 Summary: Improve syncRequestProcessor performance
 Key: ZOOKEEPER-4714
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4714
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao
 Fix For: 3.9.0, 3.8.2
 Attachments: 761688051587_.pic.jpg

In the SyncRequestProcessor, a write operation is performed for each write 
request. Two methods are relatively time-consuming.

1. Within SyncRequestProcessor#shouldSnapshot, the current size of the current 
file is retrieved, which involves a system call.

Call stack:
java.io.File.length(File.java)
org.apache.zookeeper.server.persistence.FileTxnLog.getCurrentLogSize(FileTxnLog.java:211)
org.apache.zookeeper.server.persistence.FileTxnLog.getTotalLogSize(FileTxnLog.java:221)
org.apache.zookeeper.server.persistence.FileTxnSnapLog.getTotalLogSize(FileTxnSnapLog.java:671)
org.apache.zookeeper.server.ZKDatabase.getTxnSize(ZKDatabase.java:790)
org.apache.zookeeper.server.SyncRequestProcessor.shouldSnapshot(SyncRequestProcessor.java:145)
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:182)

2. Within ZKDatabase#append, the current position of the current file is 
retrieved, which also involves a system call.

Call stack:
sun.nio.ch.FileDispatcherImpl.seek(FileDispatcherImpl.java)
sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:264)
org.apache.zookeeper.server.persistence.FilePadding.padFile(FilePadding.java:76)
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:298)
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592)
org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:678)
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181)


Therefore, it is best to maintain the current size and position of the current 
file ourselves, as this can greatly improve performance.
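
As a hedged illustration of the second hotspot (again with hypothetical names, 
not the actual patch): the padding logic can be driven by a position the 
writer already tracks, instead of asking the channel for it on every append.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch of padding driven by a caller-tracked position,
// avoiding the FileChannel.position() system call per append.
class PaddingSketch {
    private static final long PREALLOC = 64L * 1024 * 1024; // like preAllocSize
    private long currentSize; // maintained by the single writer thread

    long padFile(FileChannel channel, long trackedPosition) throws IOException {
        if (trackedPosition + 4096 >= currentSize) {
            // extend the file by writing one byte at the new end
            currentSize += PREALLOC;
            channel.write(ByteBuffer.wrap(new byte[1]), currentSize - 1);
        }
        return currentSize;
    }
}
{code}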



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shut down the syncProcessor.
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() would be invoked here.
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

The expectation is that Follower.shutdown() invokes 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and update the lastProcessedZxid for the election and 
recovery phase before its syncThread drains the pending requests and flushes 
them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> Z

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shut down the syncProcessor.
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() would be invoked here.
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

The expectation is that Follower.shutdown() invokes 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor. The 
QuorumPeer thread should wait for the syncThread to exit before going back to 
the LOOKING state:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and update the lastProcessedZxid for the election and 
recovery phase before its syncThread drains the pending requests and flushes 
them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolea

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shut down the 
syncProcessor. This may lead to data inconsistency (see *Potential 
Risk*).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shut down the syncProcessor.
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() would be invoked here.
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

The expectation is that Follower.shutdown() invokes 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.
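
For illustration only, here is one possible shape of a fix, sketched from the class names quoted above (a hypothetical sketch, not the committed ZooKeeper patch): since Learner.shutdown() calls the boolean variant zk.shutdown(fullyShutDown), overriding that variant on the learner side would put syncProcessor back on the shutdown path.
{code:java}
// Hypothetical sketch only -- NOT the actual ZooKeeper fix.
public class LearnerZooKeeperServer extends QuorumZooKeeperServer {
    // ...
    @Override
    public synchronized void shutdown(boolean fullyShutDown) {
        try {
            if (syncProcessor != null) {
                // Drain and flush pending txns before the base class
                // tears down the rest of the processor pipeline.
                syncProcessor.shutdown();
            }
        } finally {
            super.shutdown(fullyShutDown);
        }
    }
} {code}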

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)
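
To make the risk concrete, here is a self-contained toy model in plain Java (not ZooKeeper code; names such as StaleZxidDemo and lastFlushedZxid are invented for illustration). A shutdown path that reads the last durable zxid before the sync thread has drained its queue observes a stale value, which is exactly the ordering hazard described above:
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class StaleZxidDemo {
    static final BlockingQueue<Long> pending = new ArrayBlockingQueue<>(16);
    static volatile long lastFlushedZxid = 0; // stand-in for lastProcessedZxid

    public static void main(String[] args) throws Exception {
        for (long zxid = 1; zxid <= 5; zxid++) {
            pending.put(zxid); // requests accepted but not yet flushed
        }
        Thread syncThread = new Thread(() -> {
            try {
                Long zxid;
                while ((zxid = pending.poll(50, TimeUnit.MILLISECONDS)) != null) {
                    Thread.sleep(10);       // simulate a slow disk flush
                    lastFlushedZxid = zxid; // txn becomes durable only now
                }
            } catch (InterruptedException ignored) {
            }
        });
        syncThread.start();

        long staleRead = lastFlushedZxid; // shutdown path reading too early
        syncThread.join();                // correct order: drain first,
        long safeRead = lastFlushedZxid;  // then read the final zxid
        System.out.println("without draining: " + staleRead
                + ", after draining: " + safeRead);
    }
} {code}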

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Summary: Follower.shutdown() and Observer.shutdown() do not correctly 
shutdown the syncProcessor, which may lead to data inconsistency  (was: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor, which may lead to potential data inconsistency)

> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor, which may lead to data inconsistency
> -
>
> Key: ZOOKEEPER-4712
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Critical
>
> Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
> syncProcessor. It may lead to potential data inconsistency (see Potential 
> Risk).
>  
> A follower / observer will invoke syncProcessor.shutdown() in 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
> respectively.
> However, after the 
> [FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
>  of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
> LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() 
> anymore.
>  
> h2. Call stack
> h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
>  * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)
>  
> h5. For comparison, in version 3.4.X,
>  * Observer.shutdown() -> Learner.shutdown() -> 
> *ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  * Follower.shutdown() -> Learner.shutdown() -> 
> *FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
> ZooKeeperServer.shutdown(boolean)
>  
> h2. Code Details
> Take version 3.8.0 as an example.
> In Follower.shutdown() :
> {code:java}
>     public void shutdown() {
>         LOG.info("shutdown Follower");
> +       // invoke Learner.shutdown()
>         super.shutdown();   
>     } {code}
>  
> In Learner.java:
> {code:java}
>     public void shutdown() {
>         ...
>         // shutdown previous zookeeper
>         if (zk != null) {
>             // If we haven't finished SNAP sync, force fully shutdown
>             // to avoid potential inconsistency
> +           // This will invoke ZooKeeperServer.shutdown(boolean), 
> +           // which will not shutdown syncProcessor
> +           // Before the fix of ZOOKEEPER-3642, 
> +           // FollowerZooKeeperServer.shutdown() will be invoked here
>             zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP)); 
>          }
>     } {code}
>  
> In ZooKeeperServer.java:
> {code:java}
>     public synchronized void shutdown(boolean fullyShutDown) {
>         ...
>         if (firstProcessor != null) {
> +           // For a follower, this will not shutdown its syncProcessor.
>             firstProcessor.shutdown(); 
>         }
>         ...
>     } {code}
>  
> In expectation, Follower.shutdown() should invoke 
> LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
> {code:java}
>     public synchronized void shutdown() {
>         ...
>         try {
> +           // shutdown the syncProcessor here
>             if (syncProcessor != null) {
>                 syncProcessor.shutdown();     
>             }
>         } ...
>     } {code}
> Observer.shutdown() has a similar problem.
>  
> h2. Potential Risk
> When Follower.shutdown() is called, the follower's QuorumPeer thread may 
> invoke fastForwardDataBase() and 
> update the lastProcessedZxid for the election and recovery phase before its 
> syncThread drains the pending requests and flushes them to disk.
> In consequence, this lastProcessedZxid is not the latest zxid in its log, 
> leading to log inconsistency after the SYNC phase. (Similar to the symptoms 
> of ZOOKEEPER-2845.)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4713) ObserverZooKeeperServer.shutdown() is redundant

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4713:
--
Description: 
After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant, because it is not in the 
call stack of Observer.shutdown(). (Note that FollowerZooKeeperServer does not 
have the shutdown() method.)

Related analysis can be found in ZOOKEEPER-4712
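
For readers less familiar with the class layout, here is a simplified sketch of the hierarchy involved (reconstructed from the class names in this report, not copied from the source tree):
{code:java}
//   ZooKeeperServer
//        ^
//   QuorumZooKeeperServer
//        ^
//   LearnerZooKeeperServer           <- shutdown() covers both learner kinds
//      ^                    ^
//   FollowerZooKeeperServer   ObserverZooKeeperServer
//   (no shutdown() override)  (shutdown() override exists, but is not on
//                              the call stack of Observer.shutdown())
{code}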

  was:
After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant.

Related analysis can be found in 
[ZOOKEEPER-4712|https://issues.apache.org/jira/browse/ZOOKEEPER-4712]


> ObserverZooKeeperServer.shutdown() is redundant
> ---
>
> Key: ZOOKEEPER-4713
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4713
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.5.10, 3.6.3, 3.7.0, 3.8.0, 3.7.1, 3.6.4, 3.8.1
>Reporter: Sirius
>Priority: Minor
>
> After the 
> [FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
>  of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible 
> for the shutdown logic of both the follower and observer. 
> ObserverZooKeeperServer.shutdown() seems redundant, because it is not in the 
> call stack of Observer.shutdown(). (Note that FollowerZooKeeperServer does 
> not have the shutdown() method.)
> Related analysis can be found in ZOOKEEPER-4712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4713) ObserverZooKeeperServer.shutdown() is redundant

2023-06-29 Thread Sirius (Jira)
Sirius created ZOOKEEPER-4713:
-

 Summary: ObserverZooKeeperServer.shutdown() is redundant
 Key: ZOOKEEPER-4713
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4713
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
Reporter: Sirius


After the 
[FIX|https://github.com/apache/zookeeper/commit/66646796c2173423655c7faf2b458b658143e6b5]
 of ZOOKEEPER-1796, LearnerZooKeeperServer.shutdown() should be responsible for 
the shutdown logic of both the follower and observer. 
ObserverZooKeeperServer.shutdown() seems redundant.

Related analysis can be found in 
[ZOOKEEPER-4712|https://issues.apache.org/jira/browse/ZOOKEEPER-4712]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h2. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h2. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h2. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

 
h3. Example trace

(TODO)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may invoke 
fastForwardDataBase() and 

update the lastProcessedZxid for the election and recovery phase before its 
syncThread drains the pending requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[FIX|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of [ZOOKEEPER-3642|https://issues.apache.org/jira/browse/ZOOKEEPER-3642], 
Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example.

In Follower.shutdown() :
{code:java}
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();   
    } {code}
 

In Learner.java:
{code:java}
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642, 
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));   
       }
    } {code}
 

In ZooKeeperServer.java:
{code:java}
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
        ...
    } {code}

 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
{code:java}
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    } {code}

Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();
    }

In Learner.java:
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean),
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642,
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));
        }
    }

In ZooKeeperServer.java:
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown();
        }
        ...
    }

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();
            }
        } ...
    }
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();
    }

[jira] [Updated] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sirius updated ZOOKEEPER-4712:
--
Description: 
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* Observer.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* Follower.shutdown() -> Learner.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * (For comparison) Leader.shutdown(String) -> LeaderZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown() -> ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X,
 * Observer.shutdown() -> Learner.shutdown() -> 
*ObserverZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 * Follower.shutdown() -> Learner.shutdown() -> 
*FollowerZooKeeperServer.shutdown()* -> ZooKeeperServer.shutdown() -> 
ZooKeeperServer.shutdown(boolean)

 
h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+       // invoke Learner.shutdown()
        super.shutdown();
    }

In Learner.java:
    public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean),
+           // which will not shutdown syncProcessor
+           // Before the fix of ZOOKEEPER-3642,
+           // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));
        }
    }

In ZooKeeperServer.java:
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+           // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown();
        }
        ...
    }

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();
            }
        } ...
    }
Observer.shutdown() has a similar problem.

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk.

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)

  was:
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* 
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* 
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * (For comparison) 
Leader.shutdown(String)->LeaderZooKeeperServer.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X, 
 * 
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 

h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown() :
    public void shutdown() {
        LOG.info("shutdown Follower");
+   // invoke Learner.shutdown()
        super.shutdown();
    }

[jira] [Created] (ZOOKEEPER-4712) Follower.shutdown() and Observer.shutdown() do not correctly shutdown the syncProcessor, which may lead to potential data inconsistency

2023-06-29 Thread Sirius (Jira)
Sirius created ZOOKEEPER-4712:
-

 Summary: Follower.shutdown() and Observer.shutdown() do not 
correctly shutdown the syncProcessor, which may lead to potential data 
inconsistency
 Key: ZOOKEEPER-4712
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
Reporter: Sirius


Follower.shutdown() and Observer.shutdown() do not correctly shutdown the 
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).

 

A follower / observer will invoke syncProcessor.shutdown() in 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(), 
respectively.

However, after the 
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
 of ZOOKEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke 
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.

 
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
 * *(Buggy)* 
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * *(Buggy)* 
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)

 * (For comparison) 
Leader.shutdown(String)->LeaderZooKeeperServer.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 
h5. For comparison, in version 3.4.X, 
 * 
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 * 
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)

 

h4. Code Details

Take version 3.8.0 as an example. In Follower.shutdown():
    public void shutdown() {
        LOG.info("shutdown Follower");
+   // invoke Learner.shutdown()
        super.shutdown();   
    }
In Learner.java:
public void shutdown() {
        ...
        // shutdown previous zookeeper
        if (zk != null) {
            // If we haven't finished SNAP sync, force fully shutdown
            // to avoid potential inconsistency
+           // This will invoke ZooKeeperServer.shutdown(boolean), 
+   // which will not shutdown syncProcessor
+   // Before the fix of ZOOKEEPER-3642, 
+   // FollowerZooKeeperServer.shutdown() will be invoked here
            zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));  

        }
    }
In ZooKeeperServer.java:
    public synchronized void shutdown(boolean fullyShutDown) {
        ...
        if (firstProcessor != null) {
+   // For a follower, this will not shutdown its syncProcessor.
            firstProcessor.shutdown(); 
        }
    ...
    }
 

In expectation, Follower.shutdown() should invoke 
LearnerZooKeeperServer.shutdown() to shut down the syncProcessor:
    public synchronized void shutdown() {
        ...
        try {
+           // shutdown the syncProcessor here
            if (syncProcessor != null) {
                syncProcessor.shutdown();     
            }
        } ...
    }
Observer.shutdown() has a similar problem. 
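To make the expected dispatch concrete, here is a minimal, self-contained sketch; 
the classes below are simplified stand-ins that only mirror the names of 
ZooKeeper's real ones, not the actual implementation:
{code:java}
// Stand-in model of the shutdown chain this report expects: routing the
// learner shutdown through the LearnerZooKeeperServer subclass ensures the
// syncProcessor is drained before the base shutdown logic runs.
class SyncProcessorModel {
    void shutdown() {
        System.out.println("syncProcessor: drain pending requests and flush to disk");
    }
}

class ZooKeeperServerModel {
    synchronized void shutdown(boolean fullyShutDown) {
        // In the buggy path only this base method runs, so a follower's
        // syncProcessor is never reached.
        System.out.println("base shutdown (firstProcessor etc.), fullyShutDown=" + fullyShutDown);
    }
}

class LearnerZooKeeperServerModel extends ZooKeeperServerModel {
    private final SyncProcessorModel syncProcessor = new SyncProcessorModel();

    @Override
    synchronized void shutdown(boolean fullyShutDown) {
        syncProcessor.shutdown(); // the step Learner.shutdown() currently skips
        super.shutdown(fullyShutDown);
    }
}

public class ShutdownChainSketch {
    public static void main(String[] args) {
        ZooKeeperServerModel zk = new LearnerZooKeeperServerModel();
        zk.shutdown(false); // dynamic dispatch reaches the learner override first
    }
}
{code}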

 
h4. Potential Risk

When Follower.shutdown() is called, the follower's QuorumPeer thread may update 
its lastProcessedZxid for the election before its syncThread drains the pending 
requests and flushes them to disk. 

In consequence, this lastProcessedZxid is not the latest zxid in its log, 
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of 
ZOOKEEPER-2845.)
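
To illustrate the required ordering with a self-contained toy (illustrative names, 
not ZooKeeper's real types): the zxid advertised to the election is only safe to 
read after the sync thread has drained and flushed its queue.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Toy model: lastFlushedZxid stands in for the zxid the election should
// advertise; reading it before the sync thread has drained the queue can
// yield a stale value, which mirrors the missing shutdown step above.
public class FlushBeforeElectionSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> pending = new LinkedBlockingQueue<>();
        AtomicLong lastFlushedZxid = new AtomicLong(0);
        for (long zxid = 1; zxid <= 3; zxid++) {
            pending.add(zxid);
        }

        Thread syncThread = new Thread(() -> {
            Long zxid;
            while ((zxid = pending.poll()) != null) {
                lastFlushedZxid.set(zxid); // "flush" one pending request
            }
        });
        syncThread.start();

        // Correct order: shut down (join) the sync thread first, then read.
        syncThread.join();
        System.out.println("zxid advertised to election: " + lastFlushedZxid.get());
    }
}
{code}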



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4711) a data race in org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Description: 
When we run:

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

The following methods of class org.apache.zookeeper.server.watch.WatcherCleaner:
{code:java}
public void addDeadWatcher(int watcherBit) {
        // Wait if there are too many watchers waiting to be closed,
        // this is will slow down the socket packet processing and
        // the adding watches in the ZK pipeline.
        while (maxInProcessingDeadWatchers > 0 && !stopped && 
totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
                long startTime = Time.currentElapsedTime();
                synchronized (processingCompletedEvent) {
                    processingCompletedEvent.wait(100);
                }
                long latency = Time.currentElapsedTime() - startTime;
                
ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
            } catch (InterruptedException e) {
                LOG.info("Got interrupted while waiting for dead watches queue 
size");
                break;
            }
        }
        synchronized (this) {
            
            if (deadWatchers.add(watcherBit)) {
                totalDeadWatchers.incrementAndGet();
                ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
                if (deadWatchers.size() >= watcherCleanThreshold) {
                    synchronized (cleanEvent) {
                        cleanEvent.notifyAll();
                    }
                }
            }

        }
    }{code}
{code:java}
@Override
    public void run() {
        while (!stopped) {
            synchronized (cleanEvent) {
                try {
                    // add some jitter to avoid cleaning dead watchers at the
                    // same time in the quorum
                    if (!stopped && deadWatchers.size() < 
watcherCleanThreshold) {
                        
                        int maxWaitMs = (watcherCleanIntervalInSeconds
                                         + 
ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
1000;
                        cleanEvent.wait(maxWaitMs);
                    }
                } catch (InterruptedException e) {
                    LOG.info("Received InterruptedException while waiting for 
cleanEvent");
                    break;
                }
            }
            if (deadWatchers.isEmpty()) {
                continue;
            }
            synchronized (this) {
                // Clean the dead watchers need to go through all the current
                // watches, which is pretty heavy and may take a second if
                // there are millions of watches, that's why we're doing lazily
                // batch clean up in a separate thread with a snapshot of the
                // current dead watchers.
                final Set<Integer> snapshot = new HashSet<>(deadWatchers);
                deadWatchers.clear();
                int total = snapshot.size();
                LOG.info("Processing {} dead watchers", total);
                cleaners.schedule(new WorkRequest() {
                    @Override
                    public void doWork() throws Exception {
                        long startTime = Time.currentElapsedTime();
                        listener.processDeadWatchers(snapshot);
                        long latency = Time.currentElapsedTime() - startTime;
                        LOG.info("Takes {} to process {} watches", latency, 
total);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEANER_LATENCY.add(latency);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEARED.add(total);
                        totalDeadWatchers.addAndGet(-total);
                        synchronized (processingCompletedEvent) {
                            processingCompletedEvent.notifyAll();
                        }
                    }
                });
            }
        }
        LOG.info("WatcherCleaner thread exited");
    }{code}
As we can see, the two methods access the deadWatchers object from different 
threads. *The thread in run()* performs a *read* operation on deadWatchers, and 
the thread in addDeadWatcher performs a *write* operation on deadWatchers. This 
causes a data race without any lock.
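
One possible fix direction, sketched below as a self-contained stand-in (not 
ZooKeeper's actual WatcherCleaner), is to perform the size check under the same 
monitor that guards the mutations:
{code:java}
import java.util.HashSet;
import java.util.Set;

// Stand-in for WatcherCleaner: every access to deadWatchers, including the
// size check that run() performs, takes the same monitor, so no
// unsynchronized read can race with addDeadWatcher's write.
public class WatcherCleanerSketch {
    private final Set<Integer> deadWatchers = new HashSet<>();

    public void addDeadWatcher(int watcherBit) {
        synchronized (this) {
            deadWatchers.add(watcherBit); // write under the monitor
        }
    }

    public boolean belowCleanThreshold(int watcherCleanThreshold) {
        synchronized (this) {
            // read under the same monitor as the write above
            return deadWatchers.size() < watcherCleanThreshold;
        }
    }
}
{code}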

  was:
When we run:

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

[jira] [Updated] (ZOOKEEPER-4711) a data race in org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Summary: a data race in 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers   (was: 
There is a data race between run() and addDeadWatcher in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.)

> a data race in 
> org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> ---
>
> Key: ZOOKEEPER-4711
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
> Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
> ([https://github.com/apache/zookeeper])
> Then run : mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
>Reporter: lujie
>Priority: Critical
>
> When we run:
> mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
> The method of addDeadWatcher
> (
>             System.out.println("2s::" +Thread.currentThread().getName()+ "  
> "+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
> this is my debug info.
> )
> {code:java}
> public void addDeadWatcher(int watcherBit) {
>         // Wait if there are too many watchers waiting to be closed,
>         // this is will slow down the socket packet processing and
>         // the adding watches in the ZK pipeline.
>         while (maxInProcessingDeadWatchers > 0 && !stopped && 
> totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
>             try {
>                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
> cleaning");
>                 long startTime = Time.currentElapsedTime();
>                 synchronized (processingCompletedEvent) {
>                     processingCompletedEvent.wait(100);
>                 }
>                 long latency = Time.currentElapsedTime() - startTime;
>                 
> ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
>             } catch (InterruptedException e) {
>                 LOG.info("Got interrupted while waiting for dead watches 
> queue size");
>                 break;
>             }
>         }
>         synchronized (this) {
>             
>             if (deadWatchers.add(watcherBit)) {
>                 totalDeadWatchers.incrementAndGet();
>                 ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
>                 if (deadWatchers.size() >= watcherCleanThreshold) {
>                     synchronized (cleanEvent) {
>                         cleanEvent.notifyAll();
>                     }
>                 }
>             }
>         }
>     }{code}
>  
> {code:java}
> @Override
>     public void run() {
>         while (!stopped) {
>             synchronized (cleanEvent) {
>                 try {
>                     // add some jitter to avoid cleaning dead watchers at the
>                     // same time in the quorum
>                     if (!stopped && deadWatchers.size() < 
> watcherCleanThreshold) {
>                         
>                         int maxWaitMs = (watcherCleanIntervalInSeconds
>                                          + 
> ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
> 1000;
>                         cleanEvent.wait(maxWaitMs);
>                     }
>                 } catch (InterruptedException e) {
>                     LOG.info("Received InterruptedException while waiting for 
> cleanEvent");
>                     break;
>                 }
>             }
>             if (deadWatchers.isEmpty()) {
>                 continue;
>             }
>             synchronized (this) {
>                 // Clean the dead watchers need to go through all the current
>                 // watches, which is pretty heavy and may take a second if
>                 // there are millions of watches, that's why we're doing 
> lazily
>                 // batch clean up in a separate thread with a snapshot of the
>                 // current

[jira] [Updated] (ZOOKEEPER-4711) There is a data race between run() and addDeadWatcher in org.apache.zookeeper.server.watch.WatcherCleaner class when run org.apache.zookeeper.server.watch.WatchManag

2023-06-29 Thread lujie (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated ZOOKEEPER-4711:
-
Summary: There is a data race between run() and addDeadWatcher in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test. 
 (was: There is a data race between run() and "public void addDeadWatcher(int 
watcherBit)" in org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.)

> There is a data race between run() and addDeadWatcher in 
> org.apache.zookeeper.server.watch.WatcherCleaner class when run 
> org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit 
> test.
> -
>
> Key: ZOOKEEPER-4711
>     URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
> Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
> ([https://github.com/apache/zookeeper])
> Then run : mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
>Reporter: lujie
>Priority: Critical
>
> When we run:
> mvn test -Dmaven.test.failure.ignore=true 
> -Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
> -DfailIfNoTests=false -DredirectTestOutputToFile=false
> The method of addDeadWatcher
> (
>             System.out.println("2s::" +Thread.currentThread().getName()+ "  
> "+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
> this is my debug info.
> )
> {code:java}
> public void addDeadWatcher(int watcherBit) {
>         // Wait if there are too many watchers waiting to be closed,
>         // this is will slow down the socket packet processing and
>         // the adding watches in the ZK pipeline.
>         while (maxInProcessingDeadWatchers > 0 && !stopped && 
> totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
>             try {
>                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
> cleaning");
>                 long startTime = Time.currentElapsedTime();
>                 synchronized (processingCompletedEvent) {
>                     processingCompletedEvent.wait(100);
>                 }
>                 long latency = Time.currentElapsedTime() - startTime;
>                 
> ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
>             } catch (InterruptedException e) {
>                 LOG.info("Got interrupted while waiting for dead watches 
> queue size");
>                 break;
>             }
>         }
>         synchronized (this) {
>             
>             if (deadWatchers.add(watcherBit)) {
>                 totalDeadWatchers.incrementAndGet();
>                 ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
>                 if (deadWatchers.size() >= watcherCleanThreshold) {
>                     synchronized (cleanEvent) {
>                         cleanEvent.notifyAll();
>                     }
>                 }
>             }
>         }
>     }{code}
>  
> {code:java}
> @Override
>     public void run() {
>         while (!stopped) {
>             synchronized (cleanEvent) {
>                 try {
>                     // add some jitter to avoid cleaning dead watchers at the
>                     // same time in the quorum
>                     if (!stopped && deadWatchers.size() < 
> watcherCleanThreshold) {
>                         
>                         int maxWaitMs = (watcherCleanIntervalInSeconds
>                                          + 
> ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
> 1000;
>                         cleanEvent.wait(maxWaitMs);
>                     }
>                 } catch (InterruptedException e) {
>                     LOG.info("Received InterruptedException while waiting for 
> cleanEvent");
>                     break;
>                 }
>             }
>             if (deadWatchers.isEmpty()) {
>                 continue;
>             }       

[jira] [Created] (ZOOKEEPER-4711) There is a data race between run() and "public void addDeadWatcher(int watcherBit)" in org.apache.zookeeper.server.watch.WatcherCleaner class when run org.apache.zoo

2023-06-29 Thread lujie (Jira)
lujie created ZOOKEEPER-4711:


 Summary: There is a data race between run() and "public void 
addDeadWatcher(int watcherBit)" in 
org.apache.zookeeper.server.watch.WatcherCleaner class when run 
org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers junit test.
 Key: ZOOKEEPER-4711
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4711
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.0
 Environment: download zookeeper 3.9.0-SNAPSHOT from github repository 
([https://github.com/apache/zookeeper])

Then run : mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false
Reporter: lujie


When we run:

mvn test -Dmaven.test.failure.ignore=true 
-Dtest=org.apache.zookeeper.server.watch.WatchManagerTest#testDeadWatchers 
-DfailIfNoTests=false -DredirectTestOutputToFile=false

The method of addDeadWatcher

(
            System.out.println("2s::" +Thread.currentThread().getName()+ "  
"+System.identityHashCode(deadWatchers)+"  " + System.currentTimeMillis());
this is my debug info.
)
{code:java}
public void addDeadWatcher(int watcherBit) {
        // Wait if there are too many watchers waiting to be closed,
        // this is will slow down the socket packet processing and
        // the adding watches in the ZK pipeline.
        while (maxInProcessingDeadWatchers > 0 && !stopped && 
totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
                long startTime = Time.currentElapsedTime();
                synchronized (processingCompletedEvent) {
                    processingCompletedEvent.wait(100);
                }
                long latency = Time.currentElapsedTime() - startTime;
                
ServerMetrics.getMetrics().ADD_DEAD_WATCHER_STALL_TIME.add(latency);
            } catch (InterruptedException e) {
                LOG.info("Got interrupted while waiting for dead watches queue 
size");
                break;
            }
        }
        synchronized (this) {
            
            if (deadWatchers.add(watcherBit)) {
                totalDeadWatchers.incrementAndGet();
                ServerMetrics.getMetrics().DEAD_WATCHERS_QUEUED.add(1);
                if (deadWatchers.size() >= watcherCleanThreshold) {
                    synchronized (cleanEvent) {
                        cleanEvent.notifyAll();
                    }
                }
            }

        }
    }{code}
 
{code:java}
@Override
    public void run() {
        while (!stopped) {
            synchronized (cleanEvent) {
                try {
                    // add some jitter to avoid cleaning dead watchers at the
                    // same time in the quorum
                    if (!stopped && deadWatchers.size() < 
watcherCleanThreshold) {
                        
                        int maxWaitMs = (watcherCleanIntervalInSeconds
                                         + 
ThreadLocalRandom.current().nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 
1000;
                        cleanEvent.wait(maxWaitMs);
                    }
                } catch (InterruptedException e) {
                    LOG.info("Received InterruptedException while waiting for 
cleanEvent");
                    break;
                }
            }
            if (deadWatchers.isEmpty()) {
                continue;
            }
            synchronized (this) {
                // Clean the dead watchers need to go through all the current
                // watches, which is pretty heavy and may take a second if
                // there are millions of watches, that's why we're doing lazily
                // batch clean up in a separate thread with a snapshot of the
                // current dead watchers.
                final Set<Integer> snapshot = new HashSet<>(deadWatchers);
                deadWatchers.clear();
                int total = snapshot.size();
                LOG.info("Processing {} dead watchers", total);
                cleaners.schedule(new WorkRequest() {
                    @Override
                    public void doWork() throws Exception {
                        long startTime = Time.currentElapsedTime();
                        listener.processDeadWatchers(snapshot);
                        long latency = Time.currentElapsedTime() - startTime;
                        LOG.info("Takes {} to process {} watches", latency, 
total);
                        
ServerMetrics.getMetrics().DEAD_WATCHERS_CLEANER_LATENCY.add(latency);
                        
ServerMetrics.getMetrics()

[jira] [Commented] (ZOOKEEPER-4628) CVE-2022-42003 CVE-2022-42004 HIGH: upgrade jackson-databind-2.13.3.jar to 2.13.4.1

2023-06-27 Thread AvnerW (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737581#comment-17737581
 ] 

AvnerW commented on ZOOKEEPER-4628:
---

Are there any plans to upgrade jackson-databind, jackson-core, etc. to 2.15.x 
for the next ZK releases 3.8.2/3.9.0?
There are a few scanner reports about 2.13.x (e.g. sonatype-2022-6438).

> CVE-2022-42003 CVE-2022-42004 HIGH: upgrade jackson-databind-2.13.3.jar to 
> 2.13.4.1
> ---
>
> Key: ZOOKEEPER-4628
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4628
> Project: ZooKeeper
>  Issue Type: Task
>  Components: security
>Affects Versions: 3.5.10, 3.8.0, 3.7.1
>Reporter: Ivo Dujmovic
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Two High issues 
> [https://nvd.nist.gov/vuln/detail/CVE-2022-42003]
> [https://nvd.nist.gov/vuln/detail/CVE-2022-42004]
> affect jackson version 2.13.3 which zk should update to 2.13.4.1 
> Other projects have done this, but Zookeeper has not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-27 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737473#comment-17737473
 ] 

Paolo Patierno commented on ZOOKEEPER-4708:
---

This problem we are facing looks pretty similar to this 
[https://github.com/confluentinc/cp-helm-charts/issues/205]

AFAIU, ZooKeeper just gives up when, after some time/attempts, it's not able to 
form the quorum (maybe because of DNS resolution issues). Raising the NPE was 
helpful because it drove ZooKeeper to crash and Kubernetes to restart the 
container. The quorum then formed because the pod was still up and DNS was 
already resolved. Avoiding the NPE leads ZooKeeper to give up forming the 
quorum and get stuck.

We also see that in this situation, if you try to make a connection, it logs "ZK 
Down" ... which is the truth, because the ensemble is not actually working.

> ZooKeeper 3.6.4 quorum failing due to <unresolved> address
> --
>
> Key: ZOOKEEPER-4708
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4708
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.4, 3.8.1
>Reporter: Paolo Patierno
>Priority: Major
>
> We work on the Strimzi project, which is about deploying an Apache Kafka 
> cluster on Kubernetes together with a ZooKeeper ensemble.
> Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
> when running on minikube for development purposes.
> Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to 
> have issues during quorum formation and leader election.
> The first one was about ZooKeeper pods not being able to bind the quorum port 
> 3888 to the Cluster IP: during DNS resolution they get the loopback address 
> instead.
> Below is a possible log at ZooKeeper startup where you can see the binding 
> at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of 
> a valid non-loopback IP address).
>  
> {code:java}
> INFO 3 is accepting connections now, my election bind port: 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
> (org.apache.zookeeper.server.quorum.QuorumCnxManager) 
> [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
> This specific issue had two solutions: using quorumListenOnAllIPs=true on 
> ZooKeeper configuration or binding to 0.0.0.0 address. {code}
>  
> Anyway, it is actually not clear why this wasn't needed until 3.6.3 but is 
> needed to get 3.6.4 working. What changed from this perspective?
> That said, while binding to 0.0.0.0 seems to work fine, using 
> quorumListenOnAllIPs=true doesn't.
> Assuming a ZooKeeper ensemble with 3 nodes and getting the log of the current 
> ZooKeeper leader (ID=3), we see the following.
> (Starting with ** you can see some additional logs added to 
> {{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
> get more information.)
> {code:java}
> 2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 
> 3]]; starting up and setting last processed zxid: 0x1 
> (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,990 INFO ** 
> newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr 
> = 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
> (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
>  (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,992 INFO ** qs.addr 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888,
>  qs.electionAddr 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888,
>  qs.clientAddr/127.0.0.1:12181 
> (org.apache.zookeeper.server.quorum.QuorumPeer) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,992 DEBUG zookeeper 
> (org.apache.zookeeper.common.PathTrie) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,993 WARN Restarting Leader Election 
> (org.apache.zookeeper.server.quorum.QuorumPeer) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
> So the leader is ZooKeeper with ID=3 and it was ACKed by the ZooKeeper node 
> ID=1.
> As you can see we are in the {{Leader#startZ

[jira] [Commented] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-26 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737449#comment-17737449
 ] 

Paolo Patierno commented on ZOOKEEPER-4708:
---

I went with git bisect between the release-3.6.3 tag (good) and the release-3.6.4 
tag (bad). It ended up highlighting the following commit as the reason 3.6.4 does 
not work:
{code:java}
357e88c1438780e28d36bf54784937e18547e136 is the first bad commit
commit 357e88c1438780e28d36bf54784937e18547e136
Author: Enrico Olivelli 
Date:   Tue Jan 25 12:48:34 2022 +    ZOOKEEPER-3988: 
org.apache.zookeeper.server.NettyServerCnxn.receiveMessage throws 
NullPointerException
    
    Modifications:
    - prevent the NPE, the code that throws NPE is only to record some metrics 
for non TLS requests
    
    Related to:
    - apache/pulsar#11070
    - https://github.com/pravega/zookeeper-operator/issues/393
    
    Author: Enrico Olivelli 
    
    Reviewers: Nicolò Boschi , Andor Molnar 
, Mate Szalay-Beko 
    
    Closes #1798 from eolivelli/fix/ZOOKEEPER-3988-npe
    
    (cherry picked from commit 957f8fc0afbeca638f13f6fb739e49a921da2b9d)
    Signed-off-by: Mate Szalay-Beko  
.../zookeeper/server/NettyServerCnxnFactory.java   | 18 ++-
 .../zookeeper/server/NettyServerCnxnTest.java      | 26 +++---
 .../apache/zookeeper/server/TxnLogCountTest.java   |  2 +-
 3 files changed, 31 insertions(+), 15 deletions(-) {code}
Taking a look at the NettyServerCnxnFactory class, it's just adding a check 
around zkServer to avoid an NPE being raised by calling 
zkServer.serverStats() when it's null. I think there is nothing wrong with it, 
but when the NPE was raised before the fix, it forced the container to restart 
and the ZooKeeper nodes were able to form the quorum. Avoiding the NPE seems to 
leave ZooKeeper in a situation where it's not able to recover and form the 
quorum; it's stuck.

At this point my question would be: is it normal that zkServer is null? Is it 
a sign of a more subtle bug? The NPE wasn't happening with 3.6.3.
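
For reference, a minimal stand-in sketch of the guard pattern that commit 
introduces (simplified types, not ZooKeeper's real classes):
{code:java}
// Stand-in model of the ZOOKEEPER-3988 guard: the server reference can be
// null while the connection factory is still starting up, so the
// metrics-recording path must tolerate that instead of throwing.
class ServerStatsModel {
    void recordNonTlsConnection() { /* metrics bookkeeping only */ }
}

public class NullGuardSketch {
    private volatile ServerStatsModel stats; // null until startup completes

    void receiveMessage() {
        ServerStatsModel s = stats;
        if (s != null) { // the added null check
            s.recordNonTlsConnection();
        }
        // Before the fix this dereference could NPE during startup; the
        // resulting crash (and container restart) is what masked the issue.
    }
}
{code}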

 

> ZooKeeper 3.6.4 quorum failing due to <unresolved> address
> --
>
> Key: ZOOKEEPER-4708
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4708
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.4, 3.8.1
>Reporter: Paolo Patierno
>Priority: Major
>
> We work on the Strimzi project, which is about deploying an Apache Kafka 
> cluster on Kubernetes together with a ZooKeeper ensemble.
> Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
> when running on minikube for development purposes.
> Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to 
> have issues during quorum formation and leader election.
> The first one was about ZooKeeper pods not being able to bind the quorum port 
> 3888 to the Cluster IP: during DNS resolution they get the loopback address 
> instead.
> Below is a possible log at ZooKeeper startup where you can see the binding 
> at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of 
> a valid non-loopback IP address).
>  
> {code:java}
> INFO 3 is accepting connections now, my election bind port: 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
> (org.apache.zookeeper.server.quorum.QuorumCnxManager) 
> [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
> This specific issue had two solutions: using quorumListenOnAllIPs=true on 
> ZooKeeper configuration or binding to 0.0.0.0 address. {code}
>  
> Anyway, it is actually not clear why this wasn't needed until 3.6.3 but is 
> needed to get 3.6.4 working. What changed from this perspective?
> That said, while binding to 0.0.0.0 seems to work fine, using 
> quorumListenOnAllIPs=true doesn't.
> Assuming a ZooKeeper ensemble with 3 nodes and getting the log of the current 
> ZooKeeper leader (ID=3), we see the following.
> (Starting with ** you can see some additional logs added to 
> {{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
> get more information.)
> {code:java}
> 2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 
> 3]]; starting up and setting last processed zxid: 0x1 
> (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,990 INFO ** 
> newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr 
> = 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
> (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myi

[jira] [Updated] (ZOOKEEPER-4710) Fix ZkUtil deleteInBatch() by releasing the semaphore after setting the flag

2023-06-23 Thread Yan Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhao updated ZOOKEEPER-4710:

Summary: Fix ZkUtil deleteInBatch() by releasing the semaphore after setting the flag  
(was: Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail)

> Fix ZkUtil deleteInBatch() by releasing the semaphore after setting the flag
> 
>
> Key: ZOOKEEPER-4710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116
> We should set the flag before releasing the Semaphore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4710) Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail

2023-06-23 Thread Enrico Olivelli (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736471#comment-17736471
 ] 

Enrico Olivelli commented on ZOOKEEPER-4710:


Committed to master branch

> Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail
> 
>
> Key: ZOOKEEPER-4710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116
> We should set the flag before releasing the Semaphore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4710) Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail

2023-06-23 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli resolved ZOOKEEPER-4710.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail
> 
>
> Key: ZOOKEEPER-4710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116
> We should set the flag before releasing the Semaphore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (ZOOKEEPER-4710) Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail

2023-06-23 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli reassigned ZOOKEEPER-4710:
--

Assignee: Enrico Olivelli

> Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail
> 
>
> Key: ZOOKEEPER-4710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116
> We should set the flag before releasing the Semaphore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4710) Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail

2023-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4710:
--
Labels: pull-request-available  (was: )

> Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail
> 
>
> Key: ZOOKEEPER-4710
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.8.1
>Reporter: Yan Zhao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116
> We should set the flag before releasing the Semaphore.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4710) Flaky test of org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail

2023-06-21 Thread Yan Zhao (Jira)
Yan Zhao created ZOOKEEPER-4710:
---

 Summary: Flaky test of 
org.apache.zookeeper.ZooKeeperTest#testDeleteRecursiveFail
 Key: ZOOKEEPER-4710
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4710
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.8.1
Reporter: Yan Zhao


https://github.com/apache/zookeeper/blob/58eed9f5280be1c6a9ccacc47dd6afa65e916ae8/zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java#L111-L116

We should set the flag before releasing the Semaphore.
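
As a self-contained sketch of the requested ordering (field names are 
illustrative, not ZKUtil's actual ones): the callback must publish its result 
flag before releasing the permit, otherwise the waiter can acquire the permit 
and still observe a stale flag.
{code:java}
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the set-flag-then-release ordering this ticket asks for.
public class FlagThenReleaseSketch {
    private final Semaphore sem = new Semaphore(0);
    private final AtomicBoolean success = new AtomicBoolean(true);

    // Async callback for one delete batch (rc != 0 means failure).
    void onBatchResult(int rc) {
        if (rc != 0) {
            success.set(false); // 1. record the failure first
        }
        sem.release();          // 2. only then wake the waiter
    }

    // Caller side: once the permit is acquired, the flag is guaranteed fresh.
    boolean awaitBatch() throws InterruptedException {
        sem.acquire();
        return success.get();
    }
}
{code}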



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-4709:
--
Labels: dependency-upgrade pull-request-available  (was: dependency-upgrade)

> Upgrade Netty to 4.1.94.Final
> -
>
> Key: ZOOKEEPER-4709
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.7.1, 3.8.1
>Reporter: Fabio Buso
>Priority: Major
>  Labels: dependency-upgrade, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
> several improvements and bug fixes, including a resolution for 
> [CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
>  related to potential memory allocation vulnerabilities during a TLS 
> handshake with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4709) Upgrade Netty to 4.1.94.Final

2023-06-21 Thread Fabio Buso (Jira)
Fabio Buso created ZOOKEEPER-4709:
-

 Summary: Upgrade Netty to 4.1.94.Final
 Key: ZOOKEEPER-4709
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4709
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.8.1, 3.7.1
Reporter: Fabio Buso


[Netty 4.1.94|https://netty.io/news/2023/06/19/4-1-94-Final.html] includes 
several improvements and bug fixes, including a resolution for 
[CVE-2023-34462|https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845]
 related to potential memory allocation vulnerabilities during a TLS handshake 
with Server Name Indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-4026) CREATE2 requests embedded in a MULTI request only get a regular CREATE response

2023-06-21 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4026.

Fix Version/s: 3.7.2
   3.9.0
   3.8.2
   Resolution: Fixed

Issue resolved by pull request 1978
[https://github.com/apache/zookeeper/pull/1978]

> CREATE2 requests embedded in a MULTI request only get a regular CREATE response
> --
>
> Key: ZOOKEEPER-4026
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4026
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.8, 3.6.2
> Environment: Tested with official docker hub images of the server and 
> a python Zookeeper client (Kazoo, http://github.com/python-zk/kazoo)
>Reporter: Charles-Henri de Boysson
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.2, 3.9.0, 3.8.2
>
> Attachments: MULTI_CREATE2_bug.txt
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When making a MULTI request with a CREATE2 payload, the reply from the server 
> only contains a regular CREATE response (the path but without the stat data).
>  
> See attachment for a capture and decode of the request/reply.
>  
> How to reproduce:
>  * Connect to the ensemble
>  * Make a MULTI (OpCode 14) request with a CREATE2 operation (OpCode 15)
>  * Reply from server is success, the znode is created, but the MULTI reply 
> contains a CREATE (OpCode 1)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-21 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17735607#comment-17735607
 ] 

Paolo Patierno commented on ZOOKEEPER-4708:
---

It turned out that binding to 0.0.0.0 doesn't work properly for a 1-node 
ZooKeeper ensemble; we got the following problem:
{code:java}
2023-06-21 07:32:59,906 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.default.svc/10.244.0.54:2888 
(org.apache.zookeeper.server.quorum.Leader) [SyncThread:1]
2023-06-21 07:32:59,906 INFO ** self.getQuorumAddress() = /0.0.0.0:2888 
(org.apache.zookeeper.server.quorum.Leader) [SyncThread:1]
2023-06-21 07:32:59,907 ERROR Severe unrecoverable error, from thread : 
SyncThread:1 (org.apache.zookeeper.server.ZooKeeperCriticalThread) 
[SyncThread:1]
java.util.NoSuchElementException
    at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1599)
    at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1620)
    at 
org.apache.zookeeper.server.quorum.Leader.getDesignatedLeader(Leader.java:864)
    at org.apache.zookeeper.server.quorum.Leader.tryToCommit(Leader.java:939)
    at org.apache.zookeeper.server.quorum.Leader.processAck(Leader.java:1029)
    at 
org.apache.zookeeper.server.quorum.AckRequestProcessor.processRequest(AckRequestProcessor.java:47)
    at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
    at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)
 {code}
This is still in Leader.getDesignatedLeader, and this time the self address is 
different from the one coming from the voting members (which is just one node).

The code then moves forward to get another candidate with long curCandidate = 
candidates.iterator().next();, which obviously doesn't exist (see the sketch below).

I was wondering why ZooKeeper is not able to recover or refresh the address 
resolution if this is really a slow DNS registration problem.

And just to reinforce it again: this problem doesn't exist with ZooKeeper 3.6.3.
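
A self-contained sketch of that failure mode (a stand-in, not the real 
Leader.getDesignatedLeader):
{code:java}
import java.util.HashSet;
import java.util.Set;

// Stand-in: when the quorum-address comparison rules out every voter, the
// candidate set is empty and iterator().next() throws
// NoSuchElementException, matching the stack trace above.
public class EmptyCandidatesSketch {
    public static void main(String[] args) {
        Set<Long> candidates = new HashSet<>(); // every voter filtered out
        if (candidates.isEmpty()) {
            System.out.println("no candidate left: a fallback would be needed here");
        } else {
            long curCandidate = candidates.iterator().next();
            System.out.println("designated leader candidate: " + curCandidate);
        }
    }
}
{code}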

> ZooKeeper 3.6.4 quorum failing due to <unresolved> address
> --
>
> Key: ZOOKEEPER-4708
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4708
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.4, 3.8.1
>Reporter: Paolo Patierno
>Priority: Major
>
> We work on the Strimzi project, which is about deploying an Apache Kafka 
> cluster on Kubernetes together with a ZooKeeper ensemble.
> Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
> when running on minikube for development purposes.
> Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to 
> have issues during quorum formation and leader election.
> The first one was about ZooKeeper pods not being able to bind the quorum port 
> 3888 to the Cluster IP: during DNS resolution they get the loopback address 
> instead.
> Below is a possible log at ZooKeeper startup where you can see the binding 
> at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of 
> a valid non-loopback IP address).
>  
> {code:java}
> INFO 3 is accepting connections now, my election bind port: 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
> (org.apache.zookeeper.server.quorum.QuorumCnxManager) 
> [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
> This specific issue had two solutions: using quorumListenOnAllIPs=true on 
> ZooKeeper configuration or binding to 0.0.0.0 address. {code}
>  
> Anyway, it is actually not clear why this wasn't needed until 3.6.3 but is 
> needed to get 3.6.4 working. What changed from this perspective?
> That said, while binding to 0.0.0.0 seems to work fine, using 
> quorumListenOnAllIPs=true doesn't.
> Assuming a ZooKeeper ensemble with 3 nodes and getting the log of the current 
> ZooKeeper leader (ID=3), we see the following.
> (Starting with ** you can see some additional logs added to 
> {{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
> get more information.)
> {code:java}
> 2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 
> 3]]; starting up and setting last processed zxid: 0x1 
> (org.apache.zookeeper.server.quorum.Leader) 
> [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
> 2023-06-19 12:32:51,990 INFO ** 
> newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr 
> = 
> my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
> (org.apache.zookeeper.se

[jira] [Updated] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-20 Thread Paolo Patierno (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated ZOOKEEPER-4708:
--
Description: 
We work on the Strimzi project, which is about deploying an Apache Kafka cluster 
on Kubernetes together with a ZooKeeper ensemble.

Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have 
issues during quorum formation and leader election.

The first one was about ZooKeeper pods not being able to bind the quorum port 
3888 to the Cluster IP: during DNS resolution they get the loopback address 
instead.
Below is a possible log at ZooKeeper startup where you can see the binding at 
127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of a 
valid non-loopback IP address).

 
{code:java}
INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
This specific issue had two solutions: using quorumListenOnAllIPs=true on 
ZooKeeper configuration or binding to 0.0.0.0 address. {code}
 

Anyway, it is actually not clear why this wasn't needed until 3.6.3 but is 
needed to get 3.6.4 working. What changed from this perspective?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true doesn't.

Assuming a ZooKeeper ensemble with 3 nodes and getting the log of the current 
ZooKeeper leader (ID=3), we see the following.
(Starting with ** you can see some additional logs added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)
{code:java}
2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x1 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
So the leader is ZooKeeper with ID=3 and it was ACKed by the ZooKeeper node 
ID=1.
As you can see, we are in the {{Leader#startZkServer}} method, and because 
reconfiguration is enabled, the designatedLeader is processed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning “self” as the leader but 
another one (ID=1), because of the difference in the quorum address.
From the above log, it’s not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an <unresolved> address (even if it’s 
still the same hostname, related to the ZooKeeper-2 instance). This difference 
causes allowedToCommit=false; meanwhile, ZooKeeper-2 is still reported 
as leader but is not able to commit, so it prevents any requests and the 
ZooKeeper ensemble gets stuck.
{code:java}
2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
The overall issue could be related to DNS problems, with DNS records not 
registered yet during pod initialization (where ZooKeeper is running on 
Kubernetes). But we don’t understand why it’s not able to recover somehow.

What we can't find a reason for is why ZooKeeper 3.6.3 didn't need any 
binding-specific configuration an

[jira] [Updated] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-20 Thread Paolo Patierno (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated ZOOKEEPER-4708:
--
Description: 
We work on the Strimzi project, which is about deploying an Apache Kafka cluster 
on Kubernetes together with a ZooKeeper ensemble.

Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have 
issues during quorum formation and leader election.

The first one was about ZooKeeper pods not being able to bind the quorum port 
3888 to the Cluster IP: during DNS resolution they get the loopback address 
instead.
Below is a possible log at ZooKeeper startup where you can see the binding at 
127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of a 
valid non-loopback IP address).

INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
This specific issue had two solutions: using quorumListenOnAllIPs=true on 
ZooKeeper configuration or binding to 0.0.0.0 address.

Anyway, it is actually not clear why this wasn't needed until 3.6.3 but is 
needed to get 3.6.4 working. What changed from this perspective?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true doesn't.

Assuming a ZooKeeper ensemble with 3 nodes and getting the log of the current 
ZooKeeper leader (ID=3), we see the following.
(Starting with ** you can see some additional logs added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)
{code:java}
2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x1 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
So the leader is ZooKeeper with ID=3 and it was ACKed by the ZooKeeper node 
ID=1.
As you can see, we are in the {{Leader#startZkServer}} method, and because 
reconfiguration is enabled, the designatedLeader is processed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning “self” as the leader but 
another one (ID=1), because of the difference in the quorum address.
From the above log, it’s not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an <unresolved> address (even if it’s 
still the same hostname, related to the ZooKeeper-2 instance). This difference 
causes allowedToCommit=false; meanwhile, ZooKeeper-2 is still reported 
as leader but is not able to commit, so it prevents any requests and the 
ZooKeeper ensemble gets stuck.

2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]

The overall issue could be related to DNS problems, with DNS records not 
registered yet during pod initialization (where ZooKeeper is running on 
Kubernetes). But we don’t understand why it’s not able to recover somehow.

What we can't find a reason for is why ZooKeeper 3.6.3 didn't need any 
binding-specific configuration and was working just fine, while t

[jira] [Updated] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-20 Thread Paolo Patierno (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated ZOOKEEPER-4708:
--
Description: 
We work on the Strimzi project, which is about deploying an Apache Kafka cluster 
on Kubernetes together with a ZooKeeper ensemble.

Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

Using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have 
issues during quorum formation and leader election.

The first one was about ZooKeeper pods not being able to bind the quorum port 
3888 to the Cluster IP: during DNS resolution they get the loopback address 
instead.
Below is a possible log at ZooKeeper startup where you can see the binding at 
127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e., instead of a 
valid non-loopback IP address).

 
{code:java}
INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888] {code}
This specific issue had two solutions: setting quorumListenOnAllIPs=true in 
the ZooKeeper configuration, or binding to the 0.0.0.0 address, as sketched 
below.
 

Anyway, it is not clear why this wasn't needed until 3.6.3 but is needed to 
get 3.6.4 working. What has changed from this perspective?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true doesn't.

Assuming a ZooKeeper ensemble with 3 nodes and looking at the log of the 
current ZooKeeper leader (ID=3), we see the following.
(Lines starting with ** are additional logs we added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)
{code:java}
2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x1 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
So the leader is the ZooKeeper node with ID=3, and it was ACKed by the 
ZooKeeper node with ID=1.
As you can see, we are in the {{Leader#startZkServer}} method and, because 
reconfiguration is enabled, the designated leader is processed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning "self" as leader but 
another one (ID=1), because of a difference in the quorum address.
From the above log, it's not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an {{<unresolved>}} address (even 
though it's still the same hostname related to the ZooKeeper-2 instance). This 
difference causes allowedToCommit=false; meanwhile ZooKeeper-2 is still 
reported as leader but is not able to commit, so it prevents any requests from 
being served and the ZooKeeper ensemble gets stuck.
{code:java}
2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] {code}
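For clarity, a simplified sketch of the comparison behind the "not the 
designated leader" warning above; this is our own model, not the actual 
ZooKeeper source, and the parameter names are modeled on the ** log fields:
{code:java}
import java.net.InetSocketAddress;
import java.util.Map;

final class DesignatedLeaderCheck {
    // Models only the address check: if the voting-member entry for self
    // (resolved, .../172.17.0.6:2888) does not equal self's quorum address
    // (unresolved, .../<unresolved>:2888), another ACKing follower is
    // suggested as designated leader and allowedToCommit ends up false.
    static long designatedLeader(Map<Long, InetSocketAddress> votingMembers,
                                 long selfId,
                                 InetSocketAddress selfQuorumAddress,
                                 long suggestedLeader) {
        InetSocketAddress myAddrInNewConfig = votingMembers.get(selfId);
        if (myAddrInNewConfig != null
                && myAddrInNewConfig.equals(selfQuorumAddress)) {
            return selfId; // stays the designated leader
        }
        return suggestedLeader; // e.g. 1, as in the WARN above
    }
}
{code}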
The overall issue could be related to DNS problems, with DNS records not 
registered yet during pod initialization (where ZooKeeper is running on 
Kubernetes). But we don’t understand why it’s not able to recover somehow.

What we don't get is why ZooKeeper 3.6.3 didn't need any binding-specific 
configuration and was working just fine, while the new 3.6.4 needs it.

[jira] [Created] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-20 Thread Paolo Patierno (Jira)
Paolo Patierno created ZOOKEEPER-4708:
-

 Summary: ZooKeeper 3.6.4 quorum failing due to <unresolved> address
 Key: ZOOKEEPER-4708
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4708
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.8.1, 3.6.4
Reporter: Paolo Patierno


We work on the Strimzi project, which is about deploying an Apache Kafka 
cluster on Kubernetes together with a ZooKeeper ensemble.

Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

Since ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have 
issues during quorum formation and leader election.

The first one was about ZooKeeper pods not being able to bind the quorum port 
3888 to the Cluster IP, because during DNS resolution they get the loopback 
address instead.
The following is a sample log at ZooKeeper startup where you can see the 
binding at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e. a 
valid non-loopback IP address).
INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
This specific issue had two solutions: setting quorumListenOnAllIPs=true in 
the ZooKeeper configuration, or binding to the 0.0.0.0 address.

Anyway, it is not clear why this wasn't needed until 3.6.3 but is needed to 
get 3.6.4 working. What has changed from this perspective?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true doesn't.

Assuming a ZooKeeper ensemble with 3 nodes and looking at the log of the 
current ZooKeeper leader (ID=3), we see the following.
(Lines starting with ** are additional logs we added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)
2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x1 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
So the leader is the ZooKeeper node with ID=3, and it was ACKed by the 
ZooKeeper node with ID=1.
As you can see, we are in the {{Leader#startZkServer}} method and, because 
reconfiguration is enabled, the designated leader is processed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning "self" as leader but 
another one (ID=1), because of a difference in the quorum address.
From the above log, it's not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an {{<unresolved>}} address (even 
though it's still the same hostname related to the ZooKeeper-2 instance). This 
difference causes allowedToCommit=false; meanwhile ZooKeeper-2 is still 
reported as leader but is not able to commit, so it prevents any requests from 
being served and the ZooKeeper ensemble gets stuck.
2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
The overall issue could be related to DNS problems, with DNS records not 
registered yet during pod initialization (where ZooKeeper is running on 
Kubernetes). But we don’t understand why it’s not able to recover somehow.

[jira] [Updated] (ZOOKEEPER-4708) ZooKeeper 3.6.4 quorum failing due to <unresolved> address

2023-06-20 Thread Paolo Patierno (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated ZOOKEEPER-4708:
--
Description: 
We work on the Strimzi project, which is about deploying an Apache Kafka 
cluster on Kubernetes together with a ZooKeeper ensemble.

Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues 
when running on minikube for development purposes.

Since ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have 
issues during quorum formation and leader election.

The first one was about ZooKeeper pods not being able to bind the quorum port 
3888 to the Cluster IP, because during DNS resolution they get the loopback 
address instead.
The following is a sample log at ZooKeeper startup where you can see the 
binding at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (i.e. a 
valid non-loopback IP address).


INFO 3 is accepting connections now, my election bind port: 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 
(org.apache.zookeeper.server.quorum.QuorumCnxManager) 
[ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]
This specific issue had two solutions: setting quorumListenOnAllIPs=true in 
the ZooKeeper configuration, or binding to the 0.0.0.0 address.

Anyway, it is not clear why this wasn't needed until 3.6.3 but is needed to 
get 3.6.4 working. What has changed from this perspective?

That said, while binding to 0.0.0.0 seems to work fine, using 
quorumListenOnAllIPs=true doesn't.

Assuming a ZooKeeper ensemble with 3 nodes and looking at the log of the 
current ZooKeeper leader (ID=3), we see the following.
(Lines starting with ** are additional logs we added to 
{{org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader}} in order to 
get more information.)


2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; 
starting up and setting last processed zxid: 0x1 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** 
newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 INFO ** qs.addr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, 
qs.electionAddr 
my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, 
qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,993 WARN Restarting Leader Election 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]


So the leader is the ZooKeeper node with ID=3, and it was ACKed by the 
ZooKeeper node with ID=1.
As you can see, we are in the {{Leader#startZkServer}} method and, because 
reconfiguration is enabled, the designated leader is processed. The problem is 
that {{Leader#getDesignatedLeader}} is not returning "self" as leader but 
another one (ID=1), because of a difference in the quorum address.
From the above log, it's not an actual difference in terms of addresses: 
{{self.getQuorumAddress()}} is returning an {{<unresolved>}} address (even 
though it's still the same hostname related to the ZooKeeper-2 instance). This 
difference causes allowedToCommit=false; meanwhile ZooKeeper-2 is still 
reported as leader but is not able to commit, so it prevents any requests from 
being served and the ZooKeeper ensemble gets stuck.


2023-06-19 12:32:51,996 WARN Suggested leader: 1 
(org.apache.zookeeper.server.quorum.QuorumPeer) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will 
be initialized with allowedToCommit = false 
(org.apache.zookeeper.server.quorum.Leader) 
[QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]

The overall issue could be related to DNS problems, with DNS records not 
registered yet during pod initialization (where ZooKeeper is running on 
Kubernetes). But we don’t understand why it’s not able to recover somehow.

What we don't get is why ZooKeeper 3.6.3 didn't need any binding-specific 
configuration and was working just fine, while the new 3.6.4 needs it.

[jira] [Resolved] (ZOOKEEPER-4271) Flaky test - ReadOnlyModeTest.testConnectionEvents

2023-06-19 Thread Kezhu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kezhu Wang resolved ZOOKEEPER-4271.
---
Resolution: Duplicate

> Flaky test - ReadOnlyModeTest.testConnectionEvents
> --
>
> Key: ZOOKEEPER-4271
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4271
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.6.2
>Reporter: Amichai Rothman
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test fails sometimes. If I run this test class (with 
> -Dtest=ReadOnlyModeTest) in a loop it always hits the failure eventually 
> after a few runs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (ZOOKEEPER-3996) Flaky test: ReadOnlyModeTest.testConnectionEvents

2023-06-19 Thread Andor Molnar (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar resolved ZOOKEEPER-3996.
-
Fix Version/s: 3.9.0
 Assignee: Kezhu Wang
   Resolution: Fixed

> Flaky test: ReadOnlyModeTest.testConnectionEvents
> -
>
> Key: ZOOKEEPER-3996
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3996
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Reporter: Ling Mao
>Assignee: Kezhu Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We noticed that the unit test ReadOnlyModeTest.testConnectionEvents has 
> failed frequently when building in CI.
> The link is: 
> https://ci-hadoop.apache.org/blue/organizations/jenkins/zookeeper-precommit-github-pr/detail/PR-1527/1/pipeline/
> {code:java}
> [2020-11-06T13:21:34.527Z] [INFO] Running 
> org.apache.zookeeper.RemoveWatchesTest
> [2020-11-06T13:21:36.136Z] [INFO] Tests run: 352, Failures: 0, Errors: 0, 
> Skipped: 0, Time elapsed: 14.475 s - in 
> org.apache.zookeeper.common.X509UtilTest
> [2020-11-06T13:22:06.176Z] [INFO] Tests run: 13, Failures: 0, Errors: 0, 
> Skipped: 0, Time elapsed: 414.867 s - in 
> org.apache.zookeeper.server.quorum.QuorumSSLTest
> [2020-11-06T13:22:41.949Z] [INFO] Tests run: 46, Failures: 0, Errors: 0, 
> Skipped: 0, Time elapsed: 66.898 s - in org.apache.zookeeper.RemoveWatchesTest
> [2020-11-06T13:22:41.949Z] [INFO] 
> [2020-11-06T13:22:41.949Z] [INFO] Results:
> [2020-11-06T13:22:41.949Z] [INFO] 
> [2020-11-06T13:22:41.949Z] [ERROR] Errors: 
> [2020-11-06T13:22:41.949Z] [ERROR]   
> ReadOnlyModeTest.testConnectionEvents:205 » Timeout Failed to connect in 
> read-...
> [2020-11-06T13:22:41.949Z] [INFO] 
> [2020-11-06T13:22:41.949Z] [ERROR] Tests run: 2863, Failures: 0, Errors: 1, 
> Skipped: 4
> [2020-11-06T13:22:41.949Z] [INFO] 
> [2020-11-06T13:22:43.552Z] [INFO] 
> 
> [2020-11-06T13:22:43.552Z] [INFO] Reactor Summary for Apache ZooKeeper 
> 3.7.0-SNAPSHOT:{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

