[jira] [Created] (ZOOKEEPER-4822) Quorum TLS - Enable member authorization based on certificate CN
Damien Diederen created ZOOKEEPER-4822: -- Summary: Quorum TLS - Enable member authorization based on certificate CN Key: ZOOKEEPER-4822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4822 Project: ZooKeeper Issue Type: New Feature Components: server Reporter: Damien Diederen Assignee: Damien Diederen Quorum TLS enables mutual authentication of quorum members. Member authorization, however, cannot be configured on the basis of the presented principal CN; a round of SASL authentication has to be performed on top of the secured connection. This ticket is about enabling authorization based on trusted client certificates. -- This message was sent by Atlassian Jira (v8.20.10#820010)
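The feature described above amounts to mapping the subject CN of the peer's certificate onto an allow-list of quorum members. A minimal sketch of such a check — the class and method names are hypothetical illustrations, not ZooKeeper's actual API:

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import java.util.Set;

// Hypothetical sketch: authorize a quorum peer by the CN of its
// presented (and already TLS-verified) client certificate.
public class CnAuthorizer {
    private final Set<String> allowedCns;

    public CnAuthorizer(Set<String> allowedCns) {
        this.allowedCns = allowedCns;
    }

    // Extract the CN attribute from an X.500 subject DN string.
    static String extractCn(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            // Malformed DN: fall through and treat as unauthorized.
        }
        return null;
    }

    // A member is authorized iff its certificate CN is on the allow-list.
    boolean isAuthorized(String subjectDn) {
        String cn = extractCn(subjectDn);
        return cn != null && allowedCns.contains(cn);
    }
}
```

In a real implementation the DN would come from `X509Certificate.getSubjectX500Principal().getName()` on the session's verified peer certificate.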
[jira] [Commented] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824461#comment-17824461 ] Damien Diederen commented on ZOOKEEPER-4814: Also related: [ClickHouse ticket, "Incompatibility with Zookeeper 3.9"|https://github.com/ClickHouse/ClickHouse/issues/53749] whose custom (?) client was updated by [patch "Add support for read-only mode in ZooKeeper"|https://github.com/ClickHouse/ClickHouse/pull/57479/files]. > Protocol desynchronization after Connect for (some) old clients > --- > > Key: ZOOKEEPER-4814 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > Some old clients experience a protocol desynchronization after receiving a > {{ConnectResponse}} from the server. > This started happening with ZOOKEEPER-4492, "Merge readOnly field into > ConnectRequest and Response," which writes overlong responses to clients > which do not know about the {{readOnly}} flag. > (One example of such a client is ZooKeeper's own C client library prior to > version 3.5!) -- This message was sent by Atlassian Jira (v8.20.10#820010)
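The quoted description can be illustrated with a toy model of the framing problem: a server that appends the {{readOnly}} byte writes one more payload byte than an old-format client consumes, and that stray byte then desynchronizes every subsequent length-prefixed frame. The field layout below is a simplification for illustration, not the actual jute serialization code:

```java
import java.nio.ByteBuffer;

// Illustrative sketch (not the real wire code): why one extra trailing
// byte desynchronizes a length-prefixed protocol.
public class DesyncDemo {
    // Old-format ConnectResponse payload: protocolVersion (int),
    // timeOut (int), sessionId (long), passwd length (int; empty here).
    static final int OLD_LEN = 4 + 4 + 8 + 4;

    // Server side: newer servers append a readOnly boolean (1 byte).
    static ByteBuffer serverWrite(boolean includeReadOnly) {
        int len = OLD_LEN + (includeReadOnly ? 1 : 0);
        ByteBuffer buf = ByteBuffer.allocate(4 + len);
        buf.putInt(len);        // frame length prefix
        buf.putInt(0);          // protocolVersion
        buf.putInt(30000);      // timeOut
        buf.putLong(0x1234L);   // sessionId
        buf.putInt(0);          // passwd length (empty)
        if (includeReadOnly) {
            buf.put((byte) 1);  // readOnly flag: unknown to old clients
        }
        buf.flip();
        return buf;
    }

    // Old-client side: consumes only the old-format fields. Whatever is
    // left over will later be misread as part of the next frame's length
    // prefix, desynchronizing the stream.
    static int leftoverBytes(ByteBuffer wire) {
        wire.getInt();                             // frame length prefix
        wire.position(wire.position() + OLD_LEN);  // old-format payload
        return wire.remaining();                   // stray bytes
    }
}
```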
[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4814: --- Description: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. (One example of such a client is ZooKeeper's own C client library prior to version 3.5!) was: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. > Protocol desynchronization after Connect for (some) old clients > --- > > Key: ZOOKEEPER-4814 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > Some old clients experience a protocol desynchronization after receiving a > {{ConnectResponse}} from the server. > This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest > and Response," which writes overlong responses to clients which do not know > about the {{readOnly}} flag. > (One example of such a client is ZooKeeper's own C client library prior to > version 3.5!) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4814: --- Description: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started happening with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. (One example of such a client is ZooKeeper's own C client library prior to version 3.5!) was: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. (One example of such a client is ZooKeeper's own C client library prior to version 3.5!) > Protocol desynchronization after Connect for (some) old clients > --- > > Key: ZOOKEEPER-4814 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > Some old clients experience a protocol desynchronization after receiving a > {{ConnectResponse}} from the server. > This started happening with ZOOKEEPER-4492, "Merge readOnly field into > ConnectRequest and Response," which writes overlong responses to clients > which do not know about the {{readOnly}} flag. > (One example of such a client is ZooKeeper's own C client library prior to > version 3.5!) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4814: --- Description: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started happening with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. (One example of such a client is ZooKeeper's own C client library prior to version 3.5!) was: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started happening with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. (One example of such a client is ZooKeeper's own C client library prior to version 3.5!) > Protocol desynchronization after Connect for (some) old clients > --- > > Key: ZOOKEEPER-4814 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > Some old clients experience a protocol desynchronization after receiving a > {{ConnectResponse}} from the server. > This started happening with ZOOKEEPER-4492, "Merge readOnly field into > ConnectRequest and Response," which writes overlong responses to clients > which do not know about the {{readOnly}} flag. > (One example of such a client is ZooKeeper's own C client library prior to > version 3.5!) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4814: --- Description: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and Response," which writes overlong responses to clients which do not know about the {{readOnly}} flag. was: Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. > Protocol desynchronization after Connect for (some) old clients > --- > > Key: ZOOKEEPER-4814 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > Some old clients experience a protocol desynchronization after receiving a > {{ConnectResponse}} from the server. > This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest > and Response," which writes overlong responses to clients which do not know > about the {{readOnly}} flag. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients
Damien Diederen created ZOOKEEPER-4814: -- Summary: Protocol desynchronization after Connect for (some) old clients Key: ZOOKEEPER-4814 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.9.0 Reporter: Damien Diederen Assignee: Damien Diederen Some old clients experience a protocol desynchronization after receiving a {{ConnectResponse}} from the server. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4799: --- Fix Version/s: 3.7.3 > Refactor ACL check in addWatch command > -- > > Key: ZOOKEEPER-4799 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Fix For: 3.7.3, 3.8.4, 3.9.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4787) Failed to establish connection between zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4787: --- Fix Version/s: (was: 3.10.0) (was: 3.7.3) (was: 3.8.4) > Failed to establish connection between zookeeper > > > Key: ZOOKEEPER-4787 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4787 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.7.2, 3.8.3, 3.9.1 > Environment: z/OS >Reporter: softrock >Priority: Blocker > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > *Problem:* > When running ZooKeeper version 3.8.3 on the z/OS platform, the servers cannot establish a > connection > Error: > [2024-01-17 23:06:44,194] INFO Received connection request from > /xx.xx.xx.xx:23840 (org.apache.zookeeper.server.quorum.QuorumCnxManager) > [2024-01-17 23:06:44,197] ERROR Initial message parsing error! > (org.apache.zookeeper.server.quorum.QuorumCnxManager) > > org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: > Badly formed address: K???K???K???z > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage.parse(QuorumCnxManager.java:271) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:607) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:555) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1085) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1039) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522) > at java.util.concurrent.FutureTask.run(FutureTask.java:277) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at
java.lang.Thread.run(Thread.java:825) > *Root cause:* > The receiver cannot resolve the address from the sender requesting a > connection. This is because the sender sends the address in UTF-8 encoding, > but the receiver parses the address in IBM-1047 encoding (the default). > *Resolution:* > Both receiver and sender sides use UTF-8 encoding > -- This message was sent by Atlassian Jira (v8.20.10#820010)
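The resolution quoted above boils down to using an explicit charset on both sides instead of the platform default. A minimal sketch of that principle (an illustrative class, not the actual QuorumCnxManager patch):

```java
import java.nio.charset.StandardCharsets;

// Sketch of the fix described above: encode and decode the peer address
// with an explicit UTF-8 charset rather than the platform default
// (IBM-1047 on z/OS), which produced the "Badly formed address" garbage.
public class AddressCodec {
    // Sender: always serialize the advertised address as UTF-8 bytes.
    static byte[] encodeAddress(String addr) {
        return addr.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver: decode with the same explicit charset; a bare
    // `new String(bytes)` would silently use the platform default.
    static String decodeAddress(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With both sides pinned to UTF-8, the round trip is lossless regardless of the host's default encoding.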
[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4799: --- Fix Version/s: 3.8.4 > Refactor ACL check in addWatch command > -- > > Key: ZOOKEEPER-4799 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Fix For: 3.8.4, 3.9.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4799: --- Fix Version/s: 3.9.2 > Refactor ACL check in addWatch command > -- > > Key: ZOOKEEPER-4799 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Fix For: 3.9.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4785) Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4785: --- Fix Version/s: 3.9.2 3.10 > Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync > --- > > Key: ZOOKEEPER-4785 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4785 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.8.0, 3.7.1, 3.8.1, 3.7.2, 3.8.2, 3.9.1 >Reporter: Li Wang >Assignee: Li Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.9.2, 3.10 > > Time Spent: 50m > Remaining Estimate: 0h > > We had a txn loss incident in production recently. After investigation, we > found it was caused by a race condition: the follower writes the current > epoch and sends the ACK_LD before successfully persisting all the txns from > the DIFF sync in the Learner.syncWithLeader() method. > {code:java} > case Leader.NEWLEADER: > ... > self.setCurrentEpoch(newEpoch); > writeToTxnLog = true; > //Anything after this needs to go to the transaction log, not applied > directly in memory > isPreZAB1_0 = false; > // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit > them (ACK NEWLEADER). > sock.setSoTimeout(self.tickTime * self.syncLimit); > self.setSyncMode(QuorumPeer.SyncMode.NONE); > zk.startupWithoutServing(); > if (zk instanceof FollowerZooKeeperServer) { > FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk; > for (PacketInFlight p : packetsNotCommitted) { > fzk.logRequest(p.hdr, p.rec, p.digest); > } > packetsNotCommitted.clear(); > } > writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), > true); > break; > } > {code} > In this method, when the follower receives the NEWLEADER msg, the current epoch > is updated before the uncommitted txns are written to disk, and writing txns > is done asynchronously by the SyncThread.
If the follower crashes after setting > the current epoch and sending ACK_LD, and before all transactions are > successfully written to disk, transaction loss can happen. > This is because leader election is based on epoch first and then transaction > id. When the follower becomes a leader because it has the highest epoch, it will > ask the other followers to truncate txns even though they have been written to > disk, causing data loss. > The following is the scenario: > 1. Leader election happened > 2. A follower synced with Leader via DIFF, received committed proposals from > leader and kept them in memory > 3. The follower received the NEWLEADER message > 4. The follower updated the newEpoch > 5. The follower was bounced before writing all the uncommitted txns to disk > 6. Leader shutdown and a new election triggered > 7. Follower became the new leader because it has the largest currentEpoch > 8. New leader asked other followers to truncate their committed txns and > transactions got lost -- This message was sent by Atlassian Jira (v8.20.10#820010)
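The fix implied by the analysis above is an ordering constraint: make the pending txns durable on disk before bumping currentEpoch and ACKing NEWLEADER. A toy trace of the safe ordering (hypothetical names, not the actual Learner code):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the safe NEWLEADER handling order: persist and flush
// the uncommitted txns *before* advancing currentEpoch and ACKing, so a
// crash can no longer leave a follower with a new epoch but missing txns.
public class NewLeaderOrdering {
    // Returns the sequence of steps as a trace, purely for illustration.
    static List<String> safeSteps(long[] pendingZxids, long newEpoch) {
        List<String> trace = new ArrayList<>();
        for (long zxid : pendingZxids) {
            trace.add("logTxn:0x" + Long.toHexString(zxid)); // append to txn log
        }
        trace.add("flushTxnLog");                  // durability point first
        trace.add("setCurrentEpoch:" + newEpoch);  // now safe to bump epoch
        trace.add("sendAckToLeader");              // and only then ACK NEWLEADER
        return trace;
    }
}
```

With this ordering, a follower that crashes mid-sync restarts with its old epoch and simply re-syncs, instead of winning the next election on a spuriously high epoch and forcing peers to truncate durable txns.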
[jira] [Updated] (ZOOKEEPER-4730) Incorrect datadir and logdir size reported from admin and 4lw dirs command
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4730: --- Fix Version/s: 3.9.2 > Incorrect datadir and logdir size reported from admin and 4lw dirs command > --- > > Key: ZOOKEEPER-4730 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4730 > Project: ZooKeeper > Issue Type: Bug >Reporter: Li Wang >Assignee: Li Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.9.2, 3.10 > > Time Spent: 2h > Remaining Estimate: 0h > > Output from the dirs admin command > { > "datadir_size" : 134217760, > "logdir_size" : 933, > "command" : "dirs", > "error" : null > } > Output from dirs 4lw command: > datadir_size: 134217760 > logdir_size: 933 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ZOOKEEPER-4799) Refactor ACL check in addWatch command
Damien Diederen created ZOOKEEPER-4799: -- Summary: Refactor ACL check in addWatch command Key: ZOOKEEPER-4799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799 Project: ZooKeeper Issue Type: Improvement Reporter: Damien Diederen Assignee: Damien Diederen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4764) Tune the log of refuse session request.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4764: --- Fix Version/s: 3.10.0 (was: 3.9.2) > Tune the log of refuse session request. > --- > > Key: ZOOKEEPER-4764 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4764 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.7.2, 3.8.3, 3.9.1 >Reporter: Yan Zhao >Priority: Trivial > Labels: pull-request-available > Fix For: 3.10.0, 3.7.3, 3.8.4 > > Time Spent: 10m > Remaining Estimate: 0h > > The log: > Refusing session request for client as it has seen zxid our last zxid is 0x0 > client must try another server (org.apache.zookeeper.server.ZooKeeperServer) > We should print the sessionId in the message. > After improvement: > Refusing session(0xab) request for client as it has seen zxid our last zxid > is 0x0 client must try another server > (org.apache.zookeeper.server.ZooKeeperServer) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4760) Add support for filename to get and set cli commands
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4760: --- Fix Version/s: (was: 3.9.2) > Add support for filename to get and set cli commands > > > Key: ZOOKEEPER-4760 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4760 > Project: ZooKeeper > Issue Type: Improvement > Components: tools >Reporter: Soumitra Kumar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.10.0 > > Original Estimate: 24h > Time Spent: 10m > Remaining Estimate: 23h 50m > > CLI supports get and set commands to read and write data. Add support for: > # reading input data for the set command from a file, and > # writing output data from the get command to a file > This will help in dealing with arbitrary byte arrays and also in scripting > reads/writes to a large number of znodes using the CLI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (ZOOKEEPER-4787) Failed to establish connection between zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4787: --- Fix Version/s: 3.10.0 (was: 3.9.2) > Failed to establish connection between zookeeper > > > Key: ZOOKEEPER-4787 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4787 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.7.2, 3.8.3, 3.9.1 > Environment: z/OS >Reporter: softrock >Priority: Blocker > Labels: pull-request-available > Fix For: 3.10.0, 3.7.3, 3.8.4 > > Time Spent: 1h > Remaining Estimate: 0h > > *Problem:* > When running ZooKeeper version 3.8.3 on the z/OS platform, the servers cannot establish a > connection > Error: > [2024-01-17 23:06:44,194] INFO Received connection request from > /xx.xx.xx.xx:23840 (org.apache.zookeeper.server.quorum.QuorumCnxManager) > [2024-01-17 23:06:44,197] ERROR Initial message parsing error! > (org.apache.zookeeper.server.quorum.QuorumCnxManager) > > org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: > Badly formed address: K???K???K???z > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage.parse(QuorumCnxManager.java:271) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:607) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:555) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1085) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1039) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522) > at java.util.concurrent.FutureTask.run(FutureTask.java:277) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at
java.lang.Thread.run(Thread.java:825) > *Root cause:* > The receiver cannot resolve the address from the sender requesting a > connection. This is because the sender sends the address in UTF-8 encoding, > but the receiver parses the address in IBM-1047 encoding (the default). > *Resolution:* > Both receiver and sender sides use UTF-8 encoding > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ZOOKEEPER-4753) Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779790#comment-17779790 ] Damien Diederen commented on ZOOKEEPER-4753: Hi [~xiaotong.wang], {quote}we need verify the server host when we use SASL/Kerberos {quote} Yes. (I also have additional improvements queued regarding this topic, but the changes you mention were in fact preliminary to fixing [https://zookeeper.apache.org/security.html#CVE-2023-44981]. The other changes were not included as not strictly part of the security fix.) {quote}it's better to verify if current authentication is Kerberos or not, but now we check it with isDigestAuthn and use entry.getLoginModuleName().equals(DigestLoginModule.class.getName()) {quote} Yes; this is unfortunate. Would you know of a better method to detect the SASL mechanism in use? What we really want here is to conditionalize on {{DIGEST-MD5}} or {{GSSAPI}}. {quote}we rewrite DigestLoginModule to make sure user paasword are storage with encrypted our new DigestLoginModule required user{~}hd{~}=encode("testpwd") it will incompatible when we upgrade {quote} Indeed. (I was afraid I would hear about something like that… and there we are :) Is your custom digest module a subclass of the ZooKeeper one, or an unrelated object? {quote}Is there a better way to fix this issue {quote} As mentioned above: I would love it if we could just look up whether {{DIGEST-MD5}} or {{GSSAPI}} is in use. Ideas welcome! In any case, I will take your case into account when submitting the updated patch—worst case, you will have to explicitly disable the principal check. In the meantime, you are not affected by CVE-2023-44981 if using DIGEST-MD5.
HTH, -D > Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth > > > Key: ZOOKEEPER-4753 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4753 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.9.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Fix For: 3.7.2, 3.8.3, 3.9.1 > > > The SASL-based quorum authorizer does not explicitly distinguish between the > DIGEST-MD5 and GSSAPI mechanisms: it is simply relying on {{NameCallback}} > and {{PasswordCallback}} for authentication with the former and examining > Kerberos principals in {{AuthorizeCallback}} for the latter. > It turns out that some SASL/DIGEST-MD5 configurations cause authentication > and authorization IDs not to match the expected format, and the > DIGEST-MD5-based portions of the quorum test suite to fail with obscure > errors. (They can be traced to failures to join the quorum, but only by > looking into detailed logs.) > We can use the login module name to determine whether DIGEST-MD5 or GSSAPI is > used, and relax the authentication ID check for the former. As a cleanup, we > can keep the password-based credential map empty when Kerberos principals are > expected. Finally, we can adapt tests to ensure "weirdly-shaped" credentials > only cause authentication failures in the GSSAPI case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
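The detection discussed in this exchange infers the SASL mechanism from the configured JAAS login module name, which is exactly why a deployment that subclasses or replaces `DigestLoginModule` falls through to the GSSAPI path. A tiny sketch of that heuristic and its brittleness (the class `MechanismGuess` is illustrative, not ZooKeeper's API):

```java
// Hedged sketch: infer the SASL mechanism from the JAAS login module
// name, since the mechanism itself is not directly available at
// authorization time. This is the brittleness discussed above: any
// custom or subclassed digest module is misclassified as GSSAPI.
public class MechanismGuess {
    static final String DIGEST_MODULE =
        "org.apache.zookeeper.server.auth.DigestLoginModule";

    // Returns "DIGEST-MD5" only for an exact match on ZooKeeper's own
    // digest module name, "GSSAPI" for everything else.
    static String mechanismFor(String loginModuleName) {
        return DIGEST_MODULE.equals(loginModuleName) ? "DIGEST-MD5" : "GSSAPI";
    }
}
```

A deployment using a hypothetical `com.example.MyDigestLoginModule` would be classified as GSSAPI by this check and hit the principal verification path, matching the upgrade incompatibility reported in the comment.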
[jira] [Resolved] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4755. Fix Version/s: 3.7.2 3.9.1 3.8.3 Resolution: Fixed Issue resolved by pull request 2075 [https://github.com/apache/zookeeper/pull/2075] > Handle Netty CVE-2023-4586 > -- > > Key: ZOOKEEPER-4755 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755 > Project: ZooKeeper > Issue Type: Task >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Fix For: 3.7.2, 3.9.1, 3.8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > The {{dependency-check:check}}... check currently fails with the following: > {noformat} > [ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5) > {noformat} > According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , > CVE-2023-4586 is reserved. No fix or additional information is available as > of the creation of this ticket. > We have to: > # Temporarily suppress the check; > # Monitor CVE-2023-4586 and apply the remediation as soon as it becomes > available. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771547#comment-17771547 ] Damien Diederen commented on ZOOKEEPER-4755: Relevant discussion and pointers: [https://github.com/jeremylong/DependencyCheck/issues/5912#issuecomment-1699387994] > Handle Netty CVE-2023-4586 > -- > > Key: ZOOKEEPER-4755 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755 > Project: ZooKeeper > Issue Type: Task >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > > The {{dependency-check:check}}... check currently fails with the following: > {noformat} > [ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5) > {noformat} > According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , > CVE-2023-4586 is reserved. No fix or additional information is available as > of the creation of this ticket. > We have to: > # Temporarily suppress the check; > # Monitor CVE-2023-4586 and apply the remediation as soon as it becomes > available. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586
Damien Diederen created ZOOKEEPER-4755: -- Summary: Handle Netty CVE-2023-4586 Key: ZOOKEEPER-4755 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755 Project: ZooKeeper Issue Type: Task Reporter: Damien Diederen Assignee: Damien Diederen The {{dependency-check:check}}... check currently fails with the following: {noformat} [ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5) {noformat} According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , CVE-2023-4586 is reserved. No fix or additional information is available as of the creation of this ticket. We have to: # Temporarily suppress the check; # Monitor CVE-2023-4586 and apply the remediation as soon as it becomes available. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ZOOKEEPER-4754) Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4754. Fix Version/s: 3.7.2 3.9.1 3.8.3 Resolution: Fixed Issue resolved by pull request 2074 [https://github.com/apache/zookeeper/pull/2074] > Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900 > > > Key: ZOOKEEPER-4754 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4754 > Project: ZooKeeper > Issue Type: Task >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Fix For: 3.7.2, 3.9.1, 3.8.3 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ZOOKEEPER-4754) Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900
Damien Diederen created ZOOKEEPER-4754: -- Summary: Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900 Key: ZOOKEEPER-4754 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4754 Project: ZooKeeper Issue Type: Task Reporter: Damien Diederen Assignee: Damien Diederen -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (ZOOKEEPER-4751) Update snappy-java to 1.1.10.5 to address CVE-2023-43642
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4751. Fix Version/s: 3.7.2 3.8.3 3.9.1 Assignee: Damien Diederen Resolution: Fixed > Update snappy-java to 1.1.10.5 to address CVE-2023-43642 > > > Key: ZOOKEEPER-4751 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4751 > Project: ZooKeeper > Issue Type: Task >Reporter: Lari Hotari >Assignee: Damien Diederen >Priority: Minor > Labels: pull-request-available > Fix For: 3.7.2, 3.8.3, 3.9.1 > > Time Spent: 20m > Remaining Estimate: 0h > > snappy-java 1.1.10.1 contains CVE-2023-43642 . Upgrade the dependency to > 1.1.10.5 to get rid of the CVE. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ZOOKEEPER-4753) Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth
Damien Diederen created ZOOKEEPER-4753: -- Summary: Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth Key: ZOOKEEPER-4753 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4753 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.9.0 Reporter: Damien Diederen Assignee: Damien Diederen The SASL-based quorum authorizer does not explicitly distinguish between the DIGEST-MD5 and GSSAPI mechanisms: it is simply relying on {{NameCallback}} and {{PasswordCallback}} for authentication with the former and examining Kerberos principals in {{AuthorizeCallback}} for the latter. It turns out that some SASL/DIGEST-MD5 configurations cause authentication and authorization IDs not to match the expected format, and the DIGEST-MD5-based portions of the quorum test suite to fail with obscure errors. (They can be traced to failures to join the quorum, but only by looking into detailed logs.) We can use the login module name to determine whether DIGEST-MD5 or GSSAPI is used, and relax the authentication ID check for the former. As a cleanup, we can keep the password-based credential map empty when Kerberos principals are expected. Finally, we can adapt tests to ensure "weirdly-shaped" credentials only cause authentication failures in the GSSAPI case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ZOOKEEPER-4689) Node may not be accessible due to the inconsistent ACL reference map after SNAP sync (again)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758105#comment-17758105 ] Damien Diederen commented on ZOOKEEPER-4689: Hi [~adamyi], This is indeed a very critical issue, as the corruption can spread from member to member! I initially preferred solution 2 from the ticket description—the one which was tentatively implemented in https://github.com/apache/zookeeper/pull/1997—but given the difficulties encountered, and [~kezhuw]’s suggestion of never removing the ACL that {{aclIndex}} is pointing to, I am also reconsidering. Are we missing something? We would also like to add some kind of (optional) "fsck" pass which sanity-checks the tree before the service starts—to prevent this and other kinds of corruption from spreading—but that can be implemented in a followup ticket. > Node may not be accessible due to the inconsistent ACL reference map after SNAP > sync (again) > -- > > Key: ZOOKEEPER-4689 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4689 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.0, 3.7.0, 3.8.0 >Reporter: Adam Yi >Priority: Critical > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In ZooKeeper, we do a "fuzzy snapshot". It means that we don't make a copy of > the DataTree or grab a lock or anything when serializing a DataTree. Instead, > we note down the zxid when we start serializing the DataTree. We serialize the > DataTree while it's getting mutated and, after deserializing the DataTree, replay the > transactions issued after the snapshot started. The idea is that > those transactions should be idempotent. > ZooKeeper also implements its own interned ACLs. It keeps a [long -> ACL] map > and stores the `long` in each node, as nodes tend to share the same ACL. > When serializing the DataTree, we first serialize the ACL cache and then > serialize the nodes. 
It's possible that with the following sequence, a node > points to an invalid ACL entry: > 1. Serialize ACL > 2. Create node with new ACL > 3. Serialize node > ZOOKEEPER-3306 fixes this by making sure to insert the ACL into the cache upon > calling `DataTree.createNode` when replaying transactions and the node > already exists. However, we only insert it into the cache; we do not set the > interned ACL in the node to point to the new entry. > It's possible that the longval we get for the ACL is inconsistent, even > though we follow the same zxid ordering of events. Specifically, we keep an > [aclIndex] pointing to the max entry that currently exists and increment it > whenever we need to intern a new ACL we've never seen before. > With ZOOKEEPER-2214, we started to do reference counting in the ACL cache and > remove no-longer-used entries from the cache. > Say the current aclIndex is 10. If we create a node with an ACL unseen before > and delete that node, aclIndex will increment to 11. However, when we > deserialize the tree, we'll set aclIndex to the max existent ACL cache entry, > so it's reverted back to 10. aclIndex inconsistency on its own is fine, but it > causes problems for the ZOOKEEPER-3306 patch. > Now if we follow the same scenario mentioned in ZOOKEEPER-3306: > # Leader creates ACL entry 11 and deletes it due to node deletion > # Server A starts a SNAP sync with the leader > # After serializing the ACL map to Server A, there is a txn T1 to create a > node N1 with a new ACL_1 which did not exist in the ACL map > # On the leader, after this txn, the ACL map will be 12 -> (ACL_1, COUNT: 1), > and data tree N1 -> 12 > # On server A, it will be an ACL map with max ID 10, and N1 -> 12 in the fuzzy > snapshot > # When replaying the txn T1, it will add 11 -> (ACL_1, COUNT: 1) to the ACL > cache but the node N1 still points to 12. > N1 still points to an invalid ACL entry. 
> There are two ways to fix this: > # Make aclIndex consistent upon re-deserialization (by either serializing it > in the snapshot or paying special attention to decrement it when removing cache > entries) > # Fix the ZOOKEEPER-3306 patch so that we also override the ACL of the node with the new > key if the previous entry does not exist in the ACL table. > > I think solution 2 is nicer, as aclIndex inconsistency itself is not a > problem. With solution 1, we're still implicitly depending on aclIndex > consistency and ordering of events. It's harder to reason about and seems > more fragile than solution 2. > I'm going to send a patch for solution 2, but please let me know if you > disagree and I'm happy to go with solution 1 instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
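The dangling-key scenario and "solution 2" can be modeled with plain maps. This is a toy sketch under stated assumptions: the names (AclRepairSketch, intern, reapplyCreate) are hypothetical, and the real logic lives in ZooKeeper's DataTree and its reference-counted ACL cache.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the fuzzy-snapshot bug: aclIndex is recomputed as the highest
// deserialized key, so a replayed create can intern the "same" ACL under a
// different key than the one the serialized node points to. Solution 2:
// also re-point the node when its stored key is absent from the cache.
class AclRepairSketch {
    final Map<Long, String> cache = new HashMap<>();
    long aclIndex;

    AclRepairSketch(Map<Long, String> deserialized) {
        cache.putAll(deserialized);
        // Recomputed from the snapshot, as described in the ticket.
        aclIndex = deserialized.keySet().stream()
                .mapToLong(Long::longValue).max().orElse(0);
    }

    long intern(String acl) {
        for (Map.Entry<Long, String> e : cache.entrySet()) {
            if (e.getValue().equals(acl)) { return e.getKey(); }
        }
        cache.put(++aclIndex, acl);
        return aclIndex;
    }

    // Replay of a create for a node already present in the fuzzy snapshot.
    long reapplyCreate(long storedKey, String replayedAcl) {
        long freshKey = intern(replayedAcl);
        // The fix: if storedKey dangles, override it with the fresh key.
        return cache.containsKey(storedKey) ? storedKey : freshKey;
    }

    // Reproduces the ticket's scenario: keys 1..10 survive deserialization,
    // the node points at dangling key 12, the replayed ACL is interned at 11.
    static long demo() {
        Map<Long, String> snap = new HashMap<>();
        for (long i = 1; i <= 10; i++) { snap.put(i, "acl-" + i); }
        return new AclRepairSketch(snap).reapplyCreate(12, "ACL_1");
    }
}
```

In the demo the node ends up pointing at a key that actually exists (11), which is exactly what the ZOOKEEPER-3306 patch alone failed to guarantee.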
[jira] [Created] (ZOOKEEPER-4725) TTL node creations do not appear in audit log
Damien Diederen created ZOOKEEPER-4725: -- Summary: TTL node creations do not appear in audit log Key: ZOOKEEPER-4725 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4725 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.8.2 Reporter: Damien Diederen Assignee: Damien Diederen {{AuditHelper.addAuditLog}} ignores the {{createTTL}} opcode outside of {{multi}} transactions, resulting in missing audit log entries. -- This message was sent by Atlassian Jira (v8.20.10#820010)
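The shape of the fix can be sketched as follows. The opcode constants are declared locally and their exact values should be treated as assumptions; the predicate name is illustrative, not ZooKeeper's actual AuditHelper API.

```java
// Sketch: audit-log eligibility should treat createTTL like the other
// create variants. Opcode constants are declared locally here as stand-ins;
// treat the exact values as assumptions.
class AuditOpSketch {
    static final int CREATE = 1, CREATE2 = 15, CREATE_CONTAINER = 19, CREATE_TTL = 21;

    static boolean isAuditedCreate(int op) {
        switch (op) {
            case CREATE:
            case CREATE2:
            case CREATE_CONTAINER:
            case CREATE_TTL: // the missing case behind this ticket
                return true;
            default:
                return false;
        }
    }
}
```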
[jira] [Resolved] (ZOOKEEPER-4026) CREATE2 requests embedded in a MULTI request only get a regular CREATE response
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4026. Fix Version/s: 3.7.2 3.9.0 3.8.2 Resolution: Fixed Issue resolved by pull request 1978 [https://github.com/apache/zookeeper/pull/1978] > CREATE2 requests embedded in a MULTI request only get a regular CREATE response > -- > > Key: ZOOKEEPER-4026 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4026 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.8, 3.6.2 > Environment: Tested with official Docker Hub images of the server and > a Python ZooKeeper client (Kazoo, http://github.com/python-zk/kazoo) >Reporter: Charles-Henri de Boysson >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Fix For: 3.7.2, 3.9.0, 3.8.2 > > Attachments: MULTI_CREATE2_bug.txt > > Time Spent: 6h > Remaining Estimate: 0h > > When making a MULTI request with a CREATE2 payload, the reply from the server > only contains a regular CREATE response (the path, but without the stat data). > > See attachment for a capture and decode of the request/reply. > > How to reproduce: > * Connect to the ensemble > * Make a MULTI (OpCode 14) request with a CREATE2 operation (OpCode 15) > * Reply from the server is success and the znode is created, but the MULTI reply > contains a CREATE (OpCode 1) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
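The reproduction steps above boil down to a simple invariant, sketched here with the opcodes quoted in the report (MULTI = 14, CREATE2 = 15, CREATE = 1). The class and method names are illustrative, not part of ZooKeeper's wire-protocol code.

```java
// Sketch of the invariant the bug violates: each operation inside a MULTI
// reply should echo its request opcode, so a CREATE2 op must not come back
// as a plain CREATE (which silently drops the Stat from the result).
class MultiEchoSketch {
    static final int CREATE = 1, MULTI = 14, CREATE2 = 15;

    static boolean responseMatches(int requestOp, int responseOp) {
        return requestOp == responseOp;
    }
}
```

A CREATE2 result is a superset of a CREATE result (path plus Stat), which is why clients expecting the Stat fail to decode the downgraded reply.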
[jira] [Created] (ZOOKEEPER-4614) Network event can cause C Client to "forget" to SASL-authenticate
Damien Diederen created ZOOKEEPER-4614: -- Summary: Network event can cause C Client to "forget" to SASL-authenticate Key: ZOOKEEPER-4614 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4614 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.7.1, 3.8.0, 3.7.0 Reporter: Damien Diederen Assignee: Damien Diederen A network hiccup occurring during the very last step of a SASL authentication sequence can cause the C client to "forget" to authenticate the next connection because an internal flag is not properly reset. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498641#comment-17498641 ] Damien Diederen commented on ZOOKEEPER-4337: Hi [~danielma-2020], No idea—but 3.8.0 might get there first. See this recent discussion on the {{-dev}} mailing list: [https://lists.apache.org/thread/80kjmk6kvp51k99nwvswdzcg5w1wr1jk] (I am afraid I cannot volunteer at this point, as I am already overloaded by other obligations.) HTH, -D > CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0 > --- > > Key: ZOOKEEPER-4337 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337 > Project: ZooKeeper > Issue Type: Bug > Components: security >Affects Versions: 3.7.0, 3.8.0 >Reporter: Dominique Mongelli >Assignee: Damien Diederen >Priority: Major > Labels: cve, pull-request-available, security > Fix For: 3.5.10, 3.8.0, 3.7.1, 3.6.4 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Hi, our security tool detects the following CVE on zookeeper 3.7.0 : > [https://nvd.nist.gov/vuln/detail/CVE-2021-34429] > > > {noformat} > For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs > can be crafted using some encoded characters to access the content of the > WEB-INF directory and/or bypass some security constraints. This is a > variation of the vulnerability reported in > CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat} > > It is a vulnerability related to jetty jar in version > {{9.4.38.v20210224.jar}}. > Here is the security advisory from jetty: > https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm > The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should > be done. > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (ZOOKEEPER-4479) Tests: C client test TestOperations.cc testTimeoutCausedByWatches1 is very flaky on CI
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen reassigned ZOOKEEPER-4479: -- Assignee: Damien Diederen > Tests: C client test TestOperations.cc testTimeoutCausedByWatches1 is very > flaky on CI > -- > > Key: ZOOKEEPER-4479 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4479 > Project: ZooKeeper > Issue Type: Task > Components: c client, tests >Reporter: Enrico Olivelli >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This test is very annoying on CI. > It is not using the real Java server and it fails very often > [exec] > /home/runner/work/zookeeper/zookeeper/zookeeper-client/zookeeper-client-c/tests/TestOperations.cc:296: > Assertion: equality assertion failed [Expected: 1, Actual : 0] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemeral nodes cause cluster crash
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446060#comment-17446060 ] Damien Diederen commented on ZOOKEEPER-4306: Hi [~zyu], It's a nasty issue, and we've been using the {{closeSessionTxn.enabled = false}} workaround in the meantime. You have probably seen [my pull request|https://github.com/apache/zookeeper/pull/1716] regarding a possible "solution." I should really open a thread on the {{dev}} mailing list and discuss it there one of these days, because somebody might have a better idea. (Feel free to beat me to it :) It is still on my TODO list, but unfortunately not too close to the top. Cheers, -D > CloseSessionTxn contains too many ephemeral nodes cause cluster crash > --- > > Key: ZOOKEEPER-4306 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.2 >Reporter: Lin Changrui >Priority: Critical > Labels: pull-request-available > Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg > > Time Spent: 0.5h > Remaining Estimate: 0h > > We ran a test of how many ephemeral nodes a client can create under one > parent node with the default configuration. The test eventually crashed the cluster, with > exception stack traces like the following. > follower: > !f.jpg! > leader: > !l1.png! > !l2.jpg! > It seems that the leader sent a too-large txn packet to the followers. When a follower > tried to deserialize the txn, it found the txn length exceeded its buffer > size (default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That crashed the > followers, and then the leader, finding there were no longer sufficient synced > followers, shut down as well. When the leader shut down, it called > zkDb.fastForwardDataBase(), found the txn read from the txnlog exceeded > its buffer size, and crashed too. > After the servers crashed, they try to restart the quorum, but they cannot > succeed because the last txn is too large. 
We lost the log from that moment, > but the stack trace is the same as this one. > !r.jpg|width=1468,height=598! > > *Root Cause* > We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780) to > visualize this log and found this. !cs.jpg|width=1400,height=581! So the > closeSessionTxn contains all ephemeral nodes with their absolute paths. We know we > will get a large getChildren response if we create too many child nodes > under one parent node; that is limited by the client's jute.maxbuffer. If we > create plenty of ephemeral nodes under different parent nodes with one session, > it may not overflow the client's buffer, but when the session closes without > deleting these nodes first, it can crash the cluster. > Is this a bug or just an unspecified feature? If so, how should we judge > the upper limit on creating nodes? > -- This message was sent by Atlassian Jira (v8.20.1#820001)
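A back-of-envelope sketch shows why a single close-session txn can exceed the follower's read buffer. The 2 MB limit mirrors the defaults quoted above (jute.maxbuffer + jute.maxbuffer.extrasize); the 4-byte length prefix per string follows jute's string encoding, and the class, names, and sizes are otherwise assumptions.

```java
// Rough sketch: a CloseSessionTxn carries every ephemeral path owned by the
// session, so its serialized size grows linearly with the path count.
class CloseSessionSizeSketch {
    // Default jute.maxbuffer (1 MB) plus jute.maxbuffer.extrasize (1 MB).
    static final long LIMIT = 2L * 1024 * 1024;

    // 4-byte jute length prefix per string, plus the path bytes themselves.
    static long estimateTxnBytes(int paths, int avgPathBytes) {
        return (long) paths * (4 + avgPathBytes);
    }

    static boolean wouldCrashFollowers(int paths, int avgPathBytes) {
        return estimateTxnBytes(paths, avgPathBytes) > LIMIT;
    }
}
```

With 100-byte paths, a bit over 20,000 ephemeral nodes in one session is already enough to cross the threshold, regardless of how the nodes are spread across parents.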
[jira] [Commented] (ZOOKEEPER-4397) Zookeeper crashes: Unable to load database on disk java.io.IOException: Unreasonable length
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432217#comment-17432217 ] Damien Diederen commented on ZOOKEEPER-4397: Hi [~ryan.ren], Do you have clients which create large numbers of (i.e., are leaking) ephemeral nodes? If so, this may be a duplicate of ZOOKEEPER-4306. HTH, -D > Zookeeper crashes: Unable to load database on disk java.io.IOException: > Unreasonable length > --- > > Key: ZOOKEEPER-4397 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4397 > Project: ZooKeeper > Issue Type: Bug > Components: jute >Affects Versions: 3.6.3 > Environment: Linux > OpenJDK 1.8 >Reporter: Ryan >Priority: Major > Attachments: Error_snapshot.jpg > > > After running for a while, the entire cluster (3 ZooKeeper servers) crashed suddenly > ERROR-[main:QuorumPeer@1148]- Unable to load database on disk > java.io.IOException: Unreasonable length = 3015236 > at > org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166) > > at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4382) Update Maven Bundle Plugin in order to allow builds on JDK18
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4382. Fix Version/s: 3.6.4 3.7.1 Resolution: Fixed Issue resolved by pull request 1760 [https://github.com/apache/zookeeper/pull/1760] > Update Maven Bundle Plugin in order to allow builds on JDK18 > > > Key: ZOOKEEPER-4382 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4382 > Project: ZooKeeper > Issue Type: Improvement > Components: build >Affects Versions: 3.8.0 >Reporter: Enrico Olivelli >Assignee: Enrico Olivelli >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1, 3.6.4 > > Time Spent: 20m > Remaining Estimate: 0h > > On JDK18 the ZooKeeper build fails with a ConcurrentModificationException. > The fix is to update the plugin to the latest version. > [ERROR] Failed to execute goal > org.apache.felix:maven-bundle-plugin:4.1.0:bundle (build bundle) on project > zookeeper-jute: Execution build bundle of goal > org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed.: > ConcurrentModificationException -> [Help > 1]org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.felix:maven-bundle-plugin:4.1.0:bundle (build bundle) on > project zookeeper-jute: Execution build bundle of goal > org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed. 
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:215) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:156) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:148) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:117) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:81) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:56) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:128) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:193) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at jdk.internal.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:77) > at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:568) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:347) > Caused by: org.apache.maven.plugin.PluginExecutionException: Execution build > bundle of goal org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed. 
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:148) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:210) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:156) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4380) Avoid NPE in RateLogger#rateLimitLog
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4380. Fix Version/s: 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1758 [https://github.com/apache/zookeeper/pull/1758] > Avoid NPE in RateLogger#rateLimitLog > > > Key: ZOOKEEPER-4380 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4380 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Wenjun Ruan >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The newMsg may cause NPE > {code:java} > /** > * In addition to the message, it also takes a value. > */ > public void rateLimitLog(String newMsg, String value) { > long now = Time.currentElapsedTime(); > if (newMsg.equals(msg)) { > ++count; > this.value = value; > if (now - timestamp >= LOG_INTERVAL) { > flush(); > msg = newMsg; > timestamp = now; > this.value = value; > } > } else { > flush(); > msg = newMsg; > this.value = value; > timestamp = now; > LOG.warn("Message:{} Value:{}", msg, value); > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
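The NPE in the quoted method comes from calling {{newMsg.equals(msg)}} when {{newMsg}} is null. A null-tolerant variant can be sketched as follows; the fields and the flush() stub stand in for the real RateLogger internals, and the clock is passed in as a parameter purely for testability.

```java
import java.util.Objects;

// Sketch of a null-tolerant rateLimitLog: Objects.equals handles a null
// newMsg (or a null stored msg) without throwing. Not the actual patch,
// just an illustration of the fix shape.
class RateLoggerSketch {
    static final long LOG_INTERVAL = 100;
    String msg;
    String value;
    long count;
    long timestamp;

    void flush() {
        // The real code emits the accumulated message; the sketch just resets.
        msg = null;
        count = 0;
    }

    public void rateLimitLog(String newMsg, String value, long now) {
        if (Objects.equals(newMsg, msg)) {  // was: newMsg.equals(msg) -> NPE
            ++count;
            this.value = value;
            if (now - timestamp >= LOG_INTERVAL) {
                flush();
                msg = newMsg;
                timestamp = now;
                this.value = value;
            }
        } else {
            flush();
            msg = newMsg;
            this.value = value;
            timestamp = now;
        }
    }

    // A null first message must not throw, and a repeat within the interval
    // must only bump the counter.
    static boolean survivesNull() {
        RateLoggerSketch r = new RateLoggerSketch();
        r.rateLimitLog(null, "v", 0);
        r.rateLimitLog("m", "v", 1);
        r.rateLimitLog("m", "v", 2);
        return r.count == 1;
    }
}
```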
[jira] [Resolved] (ZOOKEEPER-4360) Avoid NPE during metrics execution if the leader is not set on a FOLLOWER node
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4360. Fix Version/s: 3.8.0 Resolution: Fixed Issue resolved by pull request 1743 [https://github.com/apache/zookeeper/pull/1743] > Avoid NPE during metrics execution if the leader is not set on a FOLLOWER > node > --- > > Key: ZOOKEEPER-4360 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4360 > Project: ZooKeeper > Issue Type: Bug > Components: metric system >Affects Versions: 3.6.2 >Reporter: Nicoló Boschi >Priority: Major > Labels: metrics, pull-request-available > Fix For: 3.8.0, 3.7.1, 3.6.4 > > Time Spent: 20m > Remaining Estimate: 0h > > On a follower node, we had this error > {code} > ago 20, 2021 1:46:28 PM org.apache.catalina.core.StandardWrapperValve invoke > GRAVE: Servlet.service() for servlet [metrics] in context with path > [/metrics] threw exception > java.lang.NullPointerException: Cannot invoke > "org.apache.zookeeper.server.quorum.Leader.getProposalStats()" because the > return value of > "org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.getLeader()" is null > at > org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.lambda$registerMetrics$5(LeaderZooKeeperServer.java:122) > at > magnews.zookeeper.ZooKeeperMetricsProviderAdapter$MetricsContextImpl.lambda$registerGauge$0(ZooKeeperMetricsProviderAdapter.java:91) > {code} > Unfortunately, I'm not able to reproduce this error deterministically > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4343) OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4343. Fix Version/s: 3.8.0 Resolution: Fixed Issue resolved by pull request 1735 [https://github.com/apache/zookeeper/pull/1735] > OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6 > > > Key: ZOOKEEPER-4343 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4343 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.8.0 >Reporter: Damien Diederen >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > {noformat} > [ERROR] One or more dependencies were identified with vulnerabilities that > have a CVSS score greater than or equal to '0,0': > [ERROR] > [ERROR] commons-io-2.6.jar: CVE-2021-29425 > [ERROR] > [ERROR] See the dependency-check report for more details. > {noformat} > The issue is fixed in release 2.7: > > - https://nvd.nist.gov/vuln/detail/CVE-2021-29425 > - https://issues.apache.org/jira/browse/IO-556 > - https://issues.apache.org/jira/browse/IO-559 > - https://commons.apache.org/proper/commons-io/changes-report.html#a2.7 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4337. Fix Version/s: 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1734 [https://github.com/apache/zookeeper/pull/1734] > CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0 > --- > > Key: ZOOKEEPER-4337 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.7.0, 3.8.0 >Reporter: Dominique Mongelli >Assignee: Damien Diederen >Priority: Major > Labels: cve, pull-request-available, security > Fix For: 3.8.0, 3.7.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Hi, our security tool detects the following CVE on zookeeper 3.7.0 : > [https://nvd.nist.gov/vuln/detail/CVE-2021-34429] > > > {noformat} > For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs > can be crafted using some encoded characters to access the content of the > WEB-INF directory and/or bypass some security constraints. This is a > variation of the vulnerability reported in > CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat} > > It is a vulnerability related to jetty jar in version > {{9.4.38.v20210224.jar}}. > Here is the security advisory from jetty: > https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm > The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should > be done. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-3807) fix the bad format when website pages build due to bash marker
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-3807. Fix Version/s: 3.7.1 3.8.0 Resolution: Duplicate Closing this (valiant) attempt in favor of ZOOKEEPER-4356 and [#1741|https://github.com/apache/zookeeper/pull/1741#pullrequestreview-742595992], as acknowledged by [~maoling]. > fix the bad format when website pages build due to bash marker > -- > > Key: ZOOKEEPER-3807 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3807 > Project: ZooKeeper > Issue Type: Improvement > Components: documentation >Affects Versions: 3.7.0, 3.6.1 >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4274) Flaky test - RequestThrottlerTest.testLargeRequestThrottling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4274. Assignee: Damien Diederen Resolution: Duplicate > Flaky test - RequestThrottlerTest.testLargeRequestThrottling > > > Key: ZOOKEEPER-4274 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4274 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.2 >Reporter: Amichai Rothman >Assignee: Damien Diederen >Priority: Minor > > This test occasionally fails. e.g. in > [https://github.com/apache/zookeeper/pull/1672/checks?check_run_id=2265118964]. > A bit hard to recreate, but it pops up again eventually. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4226) Flaky test: RequestThrottlerTest.testLargeRequestThrottling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4226. Assignee: Damien Diederen Resolution: Duplicate > Flaky test: RequestThrottlerTest.testLargeRequestThrottling > --- > > Key: ZOOKEEPER-4226 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4226 > Project: ZooKeeper > Issue Type: Bug > Components: tests >Reporter: Ling Mao >Assignee: Damien Diederen >Priority: Minor > > {code:java} > ERROR] Failures: > 943[ERROR] RequestThrottlerTest.testLargeRequestThrottling:297 expected: > <2> but was: <0> > 944[INFO] > 945[ERROR] Tests run: 2901, Failures: 1, Errors: 0, Skipped: 4 > {code} > URL: > https://github.com/apache/zookeeper/pull/1608/checks?check_run_id=1953408348 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ZOOKEEPER-4327) Flaky test: RequestThrottlerTest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen reassigned ZOOKEEPER-4327: -- Assignee: Damien Diederen > Flaky test: RequestThrottlerTest > > > Key: ZOOKEEPER-4327 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4327 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Ling Mao >Assignee: Damien Diederen >Priority: Major > > URL: > https://github.com/apache/zookeeper/pull/1702/checks?check_run_id=2848599299 > {code:java} > [ERROR] Failures: > 947[ERROR] > RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 > expected: <3> but was: <4> > 948[INFO] > 949[ERROR] Tests run: 2913, Failures: 1, Errors: 0, Skipped: 4 > {code} > === > > URL: > [https://github.com/apache/zookeeper/pull/1709/checks?check_run_id=2884777341] > {code:java} > [INFO] > 948[INFO] Results: > 949[INFO] > 950[ERROR] Failures: > 951[ERROR] > RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 > expected: <3> but was: <7> > 952[ERROR] RequestThrottlerTest.testLargeRequestThrottling:299 expected: > <5> but was: <4> > 953[INFO] > 954[ERROR] Tests run: 2913, Failures: 2, Errors: 0, Skipped: 4 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4327) Flaky test: RequestThrottlerTest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4327: --- Parent: ZOOKEEPER-3170 Issue Type: Sub-task (was: Bug) > Flaky test: RequestThrottlerTest > > > Key: ZOOKEEPER-4327 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4327 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Ling Mao >Priority: Major > > URL: > https://github.com/apache/zookeeper/pull/1702/checks?check_run_id=2848599299 > {code:java} > [ERROR] Failures: > 947[ERROR] > RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 > expected: <3> but was: <4> > 948[INFO] > 949[ERROR] Tests run: 2913, Failures: 1, Errors: 0, Skipped: 4 > {code} > === > > URL: > [https://github.com/apache/zookeeper/pull/1709/checks?check_run_id=2884777341] > {code:java} > [INFO] > 948[INFO] Results: > 949[INFO] > 950[ERROR] Failures: > 951[ERROR] > RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 > expected: <3> but was: <7> > 952[ERROR] RequestThrottlerTest.testLargeRequestThrottling:299 expected: > <5> but was: <4> > 953[INFO] > 954[ERROR] Tests run: 2913, Failures: 2, Errors: 0, Skipped: 4 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ZOOKEEPER-4343) OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6
Damien Diederen created ZOOKEEPER-4343: -- Summary: OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6 Key: ZOOKEEPER-4343 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4343 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.8.0 Reporter: Damien Diederen Assignee: Damien Diederen {noformat} [ERROR] One or more dependencies were identified with vulnerabilities that have a CVSS score greater than or equal to '0,0': [ERROR] [ERROR] commons-io-2.6.jar: CVE-2021-29425 [ERROR] [ERROR] See the dependency-check report for more details. {noformat} The issue is fixed in release 2.7: - https://nvd.nist.gov/vuln/detail/CVE-2021-29425 - https://issues.apache.org/jira/browse/IO-556 - https://issues.apache.org/jira/browse/IO-559 - https://commons.apache.org/proper/commons-io/changes-report.html#a2.7 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4337: --- Affects Version/s: 3.8.0 > CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0 > --- > > Key: ZOOKEEPER-4337 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.7.0, 3.8.0 >Reporter: Dominique Mongelli >Assignee: Damien Diederen >Priority: Major > Labels: cve, pull-request-available, security > Time Spent: 20m > Remaining Estimate: 0h > > Hi, our security tool detects the following CVE on zookeeper 3.7.0 : > [https://nvd.nist.gov/vuln/detail/CVE-2021-34429] > > > {noformat} > For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs > can be crafted using some encoded characters to access the content of the > WEB-INF directory and/or bypass some security constraints. This is a > variation of the vulnerability reported in > CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat} > > It is a vulnerability related to jetty jar in version > {{9.4.38.v20210224.jar}}. > Here is the security advisory from jetty: > https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm > The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should > be done. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen reassigned ZOOKEEPER-4337: -- Assignee: Damien Diederen > CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0 > --- > > Key: ZOOKEEPER-4337 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Dominique Mongelli >Assignee: Damien Diederen >Priority: Major > Labels: cve, security > > Hi, our security tool detects the following CVE on zookeeper 3.7.0 : > [https://nvd.nist.gov/vuln/detail/CVE-2021-34429] > > > {noformat} > For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs > can be crafted using some encoded characters to access the content of the > WEB-INF directory and/or bypass some security constraints. This is a > variation of the vulnerability reported in > CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat} > > It is a vulnerability related to jetty jar in version > {{9.4.38.v20210224.jar}}. > Here is the security advisory from jetty: > https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm > The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should > be done. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4341) Gia Lai - An Overview
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4341. Resolution: Invalid > Gia Lai - An Overview > - > > Key: ZOOKEEPER-4341 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4341 > Project: ZooKeeper > Issue Type: Test > Components: jute >Affects Versions: 3.5.3 >Reporter: Harmony Kae >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ZOOKEEPER-4342) Robustify C client against errors during SASL negotiation
Damien Diederen created ZOOKEEPER-4342: -- Summary: Robustify C client against errors during SASL negotiation Key: ZOOKEEPER-4342 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4342 Project: ZooKeeper Issue Type: Bug Components: c client Affects Versions: 3.7.0, 3.8.0 Reporter: Damien Diederen Assignee: Damien Diederen 1. The current client ignores the error field of the response header, considering only SASL-level errors when processing a SASL response. 2. Such errors cause a double-free of the input buffer, which crashes the application. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4334) SASL authentication fails when using host aliases
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389792#comment-17389792 ] Damien Diederen commented on ZOOKEEPER-4334: [~ekleszcz] wrote: bq. that won't solve the problem as the change considers only the SASL auth between the quorum members and my case regards the Java client to server auth. Ah, right; I just spotted "the quorum member's saslToken is null," saw that you were using keytabs, assumed this was about quorum auth, and thought I'd mention ZOOKEEPER-4030. bq. I have just discovered the extra flag: {{zookeeper.sasl.client.canonicalize.hostname}}. This means that by default we have to strictly use the canonical names for the principals. What I would like to achieve instead is to define the aliases in the principals. \[…\] Tested and it keeps failing \[…\] Right. As [~eolivelli] mentions, Kerberos implementations tend to be bound to "real" names, as returned by reverse DNS resolution. ZooKeeper \(client-to-server, and now server-to-server) supports referencing members using aliases, but the correct tickets still have to be provided. My understanding is that this is a Kerberos limitation, not a ZooKeeper issue. You are of course welcome to suggest a workaround if you find one, but I would otherwise suggest amending or closing this ticket. > SASL authentication fails when using host aliases > - > > Key: ZOOKEEPER-4334 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > > I faced an issue while trying to use alternative aliases with Zookeeper > quorum when SASL is enabled. 
The errors I get in zookeeper log are the > following: > ``` > 2021-07-12 21:04:46,437 [myid:3] - WARN > [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /:37368 failed to > SASL authenticate: {} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum > failed)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49) > at > org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650) > at > org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599) > at > org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379) > at > org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182) > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339) > at > org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522) > at > org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Checksum failed) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167) > ... 
11 more > Caused by: KrbException: Checksum failed > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102) > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94) > at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175) > at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281) > at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149) > at > sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829) > ... 14 more > Caused by: java.security.GeneralSecurityException: Checksum failed > at > sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451) > at > sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272) > at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76) > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100) > ... 20 more > ``` > What did I do? > 1) created host aliases for each quorum node (a,b,c): zk1, zk2, zk3 > 2) Changed in zoo.cfg: > changed from >
[jira] [Resolved] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4211. Fix Version/s: 3.8.0 Resolution: Fixed Issue resolved by pull request 1644 [https://github.com/apache/zookeeper/pull/1644] > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0 > > Time Spent: 11h > Remaining Estimate: 0h > > In 3.7, quota limits can be enforced and the quota-related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > In addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per-namespace (chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4204) Flaky test - RequestPathMetricsCollectorTest.testMultiThreadPerf
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4204. Fix Version/s: 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1598 [https://github.com/apache/zookeeper/pull/1598] > Flaky test - RequestPathMetricsCollectorTest.testMultiThreadPerf > > > Key: ZOOKEEPER-4204 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4204 > Project: ZooKeeper > Issue Type: Bug > Components: tests >Affects Versions: 3.8.0 >Reporter: Amichai Rothman >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 50m > Remaining Estimate: 0h > > This test sometimes fails on a laptop. Timed performance tests in unit tests > can be problematic in general due to the variety of hardware it might run on, > but I have a little fix that reduces the test overhead and tightens the > timing, so it's a good first step (and works for me). > > org.opentest4j.AssertionFailedError: expected: but was: > at > org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testMultiThreadPerf(RequestPathMetricsCollectorTest.java:448) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4333) QuorumSSLTest - testOCSP fails on JDK17
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4333. Resolution: Fixed Issue resolved by pull request 1724 [https://github.com/apache/zookeeper/pull/1724] > QuorumSSLTest - testOCSP fails on JDK17 > --- > > Key: ZOOKEEPER-4333 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4333 > Project: ZooKeeper > Issue Type: Test > Components: security, tests >Affects Versions: 3.6.2 >Reporter: Enrico Olivelli >Assignee: Enrico Olivelli >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > On JDK17 (early access) QuorumSSLTest#testOCSP fails because with JDK17 the > TLS client sends the OCSP request as a GET on the URI. > > Previously the OCSP request was sent inside the BODY of the HTTP request. > > In order to fix the test we have to fix our mock HTTP OCSP server (which is > part of the test suite, not ZooKeeper server code) to handle > this case. > For reference: > https://it.wikipedia.org/wiki/Online_Certificate_Status_Protocol > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4334) SASL authentication fails when using host aliases
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388732#comment-17388732 ] Damien Diederen commented on ZOOKEEPER-4334: I added (some) support for canonicalization in ZOOKEEPER-4030, but the corresponding commit is only present in 3.7.0+. Furthermore, it has to be activated via {{zookeeper.kerberos.canonicalizeHostNames}}. With that support, it is possible to reference servers via {{CNAME}} aliases, as long as the keytab contains the real names. Would that actually solve your use-case? > SASL authentication fails when using host aliases > - > > Key: ZOOKEEPER-4334 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > > I faced an issue while trying to use alternative aliases with Zookeeper > quorum when SASL is enabled. The errors I get in zookeeper log are the > following: > ``` > 2021-07-12 21:04:46,437 [myid:3] - WARN > [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /:37368 failed to > SASL authenticate: {} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum > failed)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49) > at > org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650) > at > org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599) > at > org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379) > at > org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182) > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522) > at > org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Checksum failed) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167) > ... 11 more > Caused by: KrbException: Checksum failed > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102) > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94) > at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175) > at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281) > at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149) > at > sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829) > ... 14 more > Caused by: java.security.GeneralSecurityException: Checksum failed > at > sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451) > at > sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272) > at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76) > at > sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100) > ... 20 more > ``` > What did I do? 
> 1) created host aliases for each quorum node (a,b,c): zk1, zk2, zk3 > 2) Changed in zoo.cfg: > changed from > server.1=a > server.2=b > server.3=c > to: > server.1=zk1 > server.2=zk2 > server.3=zk3 > (at this stage after restarting the ensemble all works as expected. > 3) Generate new keytab with alias-based principals and host-based principals > in zookeeper.keytab > 4) Change jaas.conf (server) definition from: > Server > { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true > keyTab="/etc/zookeeper/conf/zookeeper.keytab" storeKey=true > useTicketCache=false principal="zookeeper/a.com@COM"; } > ; > to > Server > { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true > keyTab="/etc/zookeeper/conf/zookeeper.keytab" storeKey=true > useTicketCache=false principal="zookeeper/zk1.com@COM"; } > ; > From that moment, after restarting quorum
[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374136#comment-17374136 ] Damien Diederen commented on ZOOKEEPER-4332: Hi [~ekleszcz], Great! Cheers, -D > Cannot access children of znode that owns too many znodes > - > > Key: ZOOKEEPER-4332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > Labels: zookeeper > Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot > 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png > > > We experience problems with performing any operation (deleteall, get etc.) on > a znode that has too many child nodes. In our case, it's above 200k. At the > same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't > help. This should be either solved by limiting the number of direct znodes > allowed by a parameter or by adding a hard limit by default. > I am attaching some screenshots of the commands and their results. What's > interesting the numbers from getAllChildrenNumber and stat (numChildren) > commands don't match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372764#comment-17372764 ] Damien Diederen commented on ZOOKEEPER-4332: Hi Emil, bq. However, my current problem is to clean up this state. I cannot do anything with this path as I explained. Right; understood. bq. Thus, "/rmstore/ZKRMStateRoot/RMAppRoot/application_1592399466874_18251/appattempt_159239946687417986_02" should be the longest. The length in bytes should be 10601166. Okay. bq. The extra dot is just my editorial mistake in the post, in zoo.cfg it is specified as expected: "jute.maxbuffer=4194304". Okay—just making sure. But that still tells us something: the error you are experiencing is happening on the client side \(you are using {{zkCli.sh}}, aren't you?), but {{zoo.cfg}} is the server configuration\! The easiest way to change *its* buffer size is, as far as I know, the {{CLIENT_JVMFLAGS}} environment variable. Could you try with something like this: {code:bash} CLIENT_JVMFLAGS='-Djute.maxbuffer=0x100' export CLIENT_JVMFLAGS zkCli.sh -server ${MY_CONN_STRING?} {code} HTH, \-D > Cannot access children of znode that owns too many znodes > - > > Key: ZOOKEEPER-4332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > Labels: zookeeper > Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot > 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png > > > We experience problems with performing any operation (deleteall, get etc.) on > a znode that has too many child nodes. In our case, it's above 200k. At the > same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't > help. This should be either solved by limiting the number of direct znodes > allowed by a parameter or by adding a hard limit by default. 
> I am attaching some screenshots of the commands and their results. What's > interesting the numbers from getAllChildrenNumber and stat (numChildren) > commands don't match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4314) Can not get real exception when getChildren more than 4M
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372525#comment-17372525 ] Damien Diederen commented on ZOOKEEPER-4314: Hi [~qzballack], I understand that this ticket is about the lack of diagnosis, but have linked the two other issues because they capture the root cause and potential workarounds. > Can not get real exception when getChildren more than 4M > > > Key: ZOOKEEPER-4314 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4314 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Affects Versions: 3.6.0 >Reporter: Li Jian >Priority: Major > Attachments: zookeeper problem.docx > > > When zkClient or a Curator listener reads a ZooKeeper node with more than 4M > of data, it can enter an endless loop, because both catch the error code > KeeperException.Code.ConnectionLoss (-4) and then retry the getChildren > method after a moment. The root cause is that ZooKeeper's readLength method > throws an IOException ("Packet len is out of range"), but in the subsequent > processing ZooKeeper replaces the real exception with ConnectionLoss. See the > attached file -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ZOOKEEPER-4323) support compile and build C client in the Mac OS
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372512#comment-17372512 ] Damien Diederen edited comment on ZOOKEEPER-4323 at 7/1/21, 8:16 AM: - Hi [~maoling], Which version(s) does this ticket apply to? I fixed some Catalina compilation issues while preparing the 3.7.0 release, and [~eolivelli] [confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e] that he was successful in building the result on BigSur: {quote} * This time \(for the first time!) I was able to build the C client on MacOs \(BigSur) \! {quote} Did it get broken in the meantime? \(I don't have a mac at hand.) P.-S. — Note that the tests are still badly broken, and that making them portable is a nontrivial undertaking. was (Author: ztzg): Hi [~maoling], Which version(s) does this ticket apply to? I fixed some Catalina compilation issues while preparing the 3.7.0 release, and [~eolivelli] [confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e] that he was successful in building the result on BigSur: {quote} * This time \(for the first time!) I was able to build the C client on MacOs \(BigSur) \! {quote} Did it get broken in the meantime? \(I don't have a mac at hand.) > support compile and build C client in the Mac OS > > > Key: ZOOKEEPER-4323 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4323 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4323) support compile and build C client in the Mac OS
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372512#comment-17372512 ] Damien Diederen commented on ZOOKEEPER-4323: Hi [~maoling], Which version(s) does this ticket apply to? I fixed some Catalina compilation issues while preparing the 3.7.0 release, and [~eolivelli] [confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e] that he was successful in building the result on BigSur: {quote} * This time \(for the first time!) I was able to build the C client on MacOs \(BigSur) \! {quote} Did it get broken in the meantime? \(I don't have a mac at hand.) > support compile and build C client in the Mac OS > > > Key: ZOOKEEPER-4323 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4323 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372502#comment-17372502 ] Damien Diederen commented on ZOOKEEPER-4332: Hi Emil, You also wrote: bq. At the same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't help. This is strange. How long are the names of the children of {{.../RMAppRoot/}}? Assuming 32 ASCII characters on average, \~7MiB of {{jute.maxbuffer}} should be enough for {{GetChildren}} to succeed: {code:bash} p=/rmstore/ZKRMStateRoot/RMAppRoot/0123456789abcdef0123456789abcdef p_length=${#p} echo $(((p_length + 4) * 100011)) # 6900759 {code} \(Note that there is an extra dot in the name of the property you reported above. It must be {{jute.maxbuffer}}. Could that explain the issue?) > Cannot access children of znode that owns too many znodes > - > > Key: ZOOKEEPER-4332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > Labels: zookeeper > Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot > 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png > > > We experience problems with performing any operation (deleteall, get etc.) on > a znode that has too many child nodes. In our case, it's above 200k. At the > same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't > help. This should be either solved by limiting the number of direct znodes > allowed by a parameter or by adding a hard limit by default. > I am attaching some screenshots of the commands and their results. What's > interesting the numbers from getAllChildrenNumber and stat (numChildren) > commands don't match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372485#comment-17372485 ] Damien Diederen commented on ZOOKEEPER-4332: Hi Emil, This is a long-standing issue, indeed, and one which is unfortunately still relevant. Suggestions so far include: * [Rejecting node creations|https://issues.apache.org/jira/browse/ZOOKEEPER-1162?focusedCommentId=13091100=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13091100] which would cause the {{GetChildren}} payload to overflow {{jute.maxbuffer}} \(which is somewhat problematic, as the database does not specify the minimum {{jute.maxbuffer}} needed at runtime); * [Introducing a paginated version|https://issues.apache.org/jira/browse/ZOOKEEPER-2260] of {{GetChildren}}. I had missed that existing patch before; it would be interesting to forward-port it to 3.7\+\! You also noted: bq. I am attaching some screenshots of the commands and their results. What's interesting the numbers from getAllChildrenNumber and stat \(numChildren) commands don't match. Note that {{getAllChildrenNumber}} is a *recursive* computation, whereas {{Stat.numChildren}} tracks the number of *direct* children, so I would expect the former to be larger than the latter if some of the nodes have children. > Cannot access children of znode that owns too many znodes > - > > Key: ZOOKEEPER-4332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.1 >Reporter: Emil Kleszcz >Priority: Critical > Labels: zookeeper > Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot > 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png > > > We experience problems with performing any operation (deleteall, get etc.) on > a znode that has too many child nodes. In our case, it's above 200k. At the > same time jute.max.buffer is 4194304. 
Increasing it by a few factors doesn't > help. This should be either solved by limiting the number of direct znodes > allowed by a parameter or by adding a hard limit by default. > I am attaching some screenshots of the commands and their results. What's > interesting the numbers from getAllChildrenNumber and stat (numChildren) > commands don't match. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4284) Add metrics for observer sync time
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4284: --- Fix Version/s: 3.7.1 > Add metrics for observer sync time > -- > > Key: ZOOKEEPER-4284 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4284 > Project: ZooKeeper > Issue Type: Improvement > Components: metric system >Affects Versions: 3.7.0, 3.7.1 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 2h > Remaining Estimate: 0h > > With the feature of followers hosting observers enabled, it would be nice to > have a metric to measure the observer sync time, just like what we have for > the follower sync time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4318) Only report the follower sync time metrics if sync is completed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4318: --- Fix Version/s: 3.7.1 > Only report the follower sync time metrics if sync is completed > --- > > Key: ZOOKEEPER-4318 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4318 > Project: ZooKeeper > Issue Type: Improvement > Components: metric system >Affects Versions: 3.8 >Reporter: Li Wang >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 40m > Remaining Estimate: 0h > > We should calculate the sync time only if completedSync is true. Otherwise, > we will get noisy data such as 0 sync time in cases where sync immediately > failed (due to network partition for example). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4318) Only report the follower sync time metrics if sync is completed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4318. Fix Version/s: (was: 3.8) (was: 3.7.1) 3.8.0 Resolution: Fixed Issue resolved by pull request 1712 [https://github.com/apache/zookeeper/pull/1712] > Only report the follower sync time metrics if sync is completed > --- > > Key: ZOOKEEPER-4318 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4318 > Project: ZooKeeper > Issue Type: Improvement > Components: metric system >Affects Versions: 3.8 >Reporter: Li Wang >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We should calculate the sync time only if completedSync is true. Otherwise, > we will get noisy data such as 0 sync time in cases where sync immediately > failed (due to network partition for example). -- This message was sent by Atlassian Jira (v8.3.4#803005)
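The guard described in ZOOKEEPER-4318, recording the sync duration only when `completedSync` is true so that immediately-failed syncs don't inject zero samples, can be illustrated with a toy sketch. The metric plumbing below is a stand-in for illustration, not ZooKeeper's actual ServerMetrics API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the ZOOKEEPER-4318 guard: only a completed sync
// contributes a duration sample; aborted syncs are skipped entirely so
// they cannot pollute the metric with spurious zeros.
public class SyncMetricSketch {
    final List<Long> syncDurationsMs = new ArrayList<>();

    public void onSyncFinished(long startMs, long endMs, boolean completedSync) {
        if (completedSync) { // skip syncs that failed, e.g. on network partition
            syncDurationsMs.add(endMs - startMs);
        }
    }
}
```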
[jira] [Resolved] (ZOOKEEPER-4284) Add metrics for observer sync time
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4284. Fix Version/s: 3.8.0 Resolution: Fixed Issue resolved by pull request 1691 [https://github.com/apache/zookeeper/pull/1691] > Add metrics for observer sync time > -- > > Key: ZOOKEEPER-4284 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4284 > Project: ZooKeeper > Issue Type: Improvement > Components: metric system >Affects Versions: 3.7.0, 3.7.1 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > With the feature of followers hosting observers enabled, it would be nice to > have a metric to measure the observer sync time, just like what we have for > the follower sync time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-2695) Handle unknown error for rolling upgrade old client new server scenario
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368712#comment-17368712 ] Damien Diederen commented on ZOOKEEPER-2695: Hi [~arshad.mohammad], I was looking into this, following [this question|https://github.com/apache/zookeeper/pull/1716#pullrequestreview-687896698] of [~eolivelli]'s: bq. You are also introducing a new error code, how do old clients will handle that error code? This is a recurrent question—one to which I feel we don't have a very good answer. The patch attached to this ticket still applies, provided one updates the paths and the target JUnit API imports. \(I am attaching a refreshed version as I have it at hand; feel free to turn it into a GitHub pull request.) The proposed solution definitely improves on the status quo, but I have a few questions: # {{Code.SYSTEMERROR}} is documented as follows: bq. This is never thrown by the server, it shouldn't be used other than to indicate a range. Specifically error codes greater than this value, but lesser than {{#APIERROR}}, are system errors. So I suppose I would suggest throwing {{SystemErrorException}} for codes in {{\[Code.SYSTEMERROR, Code.APIERROR)}} and {{APIErrorException}} for codes outside that range. What do you think? # Unfortunately, "known" system errors such as {{RuntimeInconsistencyException}} do not inherit from {{SystemErrorException}}. Similarly, an "API error" such as {{NoNodeException}} does not inherit from {{APIErrorException}}. The list of {{KeeperException}} subclasses being flat means that clients need intimate knowledge of error codes to distinguish between exceptions which warrant an immediate retry, exceptions which should trigger exponential back-off, or fatal ones such as {{Code.AUTHFAILED}}. While reparenting the existing classes would technically be an API/ABI break, we could potentially introduce superclasses such as e.g. 
{{BaseSystemErrorException}} between {{KeeperException}} and its children \(AFAICT, Sun & Oracle frequently do so), and use ranges to map unknown codes. Is that something we should consider? # {{KeeperException}} currently uses a {{private Code code}} member variable, which makes it impossible to propagate an unknown error code. Should we change that \(back?) to an {{int}}—without changing the public interface? Of course, methods such as {{public Code code()}} would still have to return {{null}}, but others such as {{public int getCode()}} \(currently deprecated) could return the actual value, and an informative message could still be generated in {{public String getMessage()}}. Thoughts? > Handle unknown error for rolling upgrade old client new server scenario > --- > > Key: ZOOKEEPER-2695 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2695 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Attachments: > 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch, > ZOOKEEPER-2695-01.patch > > > In a ZooKeeper rolling upgrade scenario where the server is new but the client is old, > when the server sends an error code which is not understood by the client, the client > throws NullPointerException. > KeeperException.SystemErrorException should be thrown for all unknown error > codes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
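The range-based mapping proposed in the comment above can be sketched in a few lines. The numeric values mirror KeeperException.Code (SYSTEMERROR = -1, APIERROR = -100), but the classifier itself is illustrative, not ZooKeeper's actual implementation:

```java
// Hypothetical sketch of mapping an unrecognized server error code onto
// a coarse category, per the proposal: codes in [SYSTEMERROR, APIERROR)
// are treated as system errors, everything else as API errors.
public class UnknownCodeSketch {
    static final int SYSTEMERROR = -1;  // KeeperException.Code.SYSTEMERROR
    static final int APIERROR = -100;   // KeeperException.Code.APIERROR

    /** Classifies an unknown error code as "system" or "api". */
    public static String classify(int code) {
        // Note the codes are negative: the "system" range marker is -1
        // and the "api" range marker is -100, so [SYSTEMERROR, APIERROR)
        // numerically means -100 < code <= -1.
        if (code <= SYSTEMERROR && code > APIERROR) {
            return "system";
        }
        return "api";
    }
}
```

A client-side decoder could then raise a generic `SystemErrorException` or `APIErrorException` (or, with the superclass idea above, `BaseSystemErrorException`) instead of a NullPointerException.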
[jira] [Updated] (ZOOKEEPER-2695) Handle unknown error for rolling upgrade old client new server scenario
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-2695: --- Attachment: 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch > Handle unknown error for rolling upgrade old client new server scenario > --- > > Key: ZOOKEEPER-2695 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2695 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Attachments: > 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch, > ZOOKEEPER-2695-01.patch > > > In a ZooKeeper rolling upgrade scenario where the server is new but the client is old, > when the server sends an error code which is not understood by the client, the client > throws NullPointerException. > KeeperException.SystemErrorException should be thrown for all unknown error > codes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-2154) NPE in KeeperException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-2154. Fix Version/s: (was: 3.8.0) Resolution: Duplicate Closing in favor of ZOOKEEPER-2695. (Same issue, but more comprehensive solution.) > NPE in KeeperException > -- > > Key: ZOOKEEPER-2154 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2154 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.6 >Reporter: Surendra Singh Lilhore >Priority: Major > Attachments: ZOOKEEPER-2154.patch > > > KeeperException should handle the exception if code is null... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemeral nodes, causing cluster crash
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367917#comment-17367917 ] Damien Diederen commented on ZOOKEEPER-4306: {quote}I think it‘s more easier for others to notice this limitation if add some JavaDoc of ZooKeeper.create. Would you agree? {quote} Yes, I agree :) Explicitly added to my TODO list. > CloseSessionTxn contains too many ephemeral nodes, causing cluster crash > --- > > Key: ZOOKEEPER-4306 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.2 >Reporter: Lin Changrui >Priority: Critical > Labels: pull-request-available > Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg > > Time Spent: 20m > Remaining Estimate: 0h > > We ran a test of how many ephemeral nodes a client can create under one > parent node with the default configuration. The test eventually caused a cluster > crash, with a stack trace like this. > follower: > !f.jpg! > leader: > !l1.png! > !l2.jpg! > It seems that the leader sent a too-large txn packet to the followers. When a follower > tried to deserialize the txn, it found the txn length exceeded its buffer > size (default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That caused the > followers to crash, and then the leader found there were not sufficient followers > synced, so it shut down later. When the leader shut down, it called > zkDb.fastForwardDataBase(), and found the txn read from the txnlog exceeded > its buffer size, so it crashed too. > After the servers crashed, they tried to restart the quorum. But they would not > succeed because the last txn was too large. We lost the log from that moment, > but the stack trace is the same as this one. > !r.jpg|width=1468,height=598! > > *Root Cause* > We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780) to > visualize this log and found this. !cs.jpg|width=1400,height=581! So the > closeSessionTxn contains all ephemeral nodes with absolute paths. 
We know we > will get a large getChildren response if we create too many child nodes > under one parent node; that is limited by the client's jute.maxbuffer. If we > create plenty of ephemeral nodes under different parent nodes with one session, > it may not overflow the client's buffer, but when the session closes without > deleting these nodes first, it can crash the cluster. > Is it a bug or just an unspecified feature? If so, how should we judge > the upper limit on creating nodes? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
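The root-cause arithmetic above (the CloseSessionTxn carries every ephemeral path owned by the session, so its serialized size grows with the sum of path lengths) can be checked with a back-of-the-envelope estimate. The 4-byte length-prefix framing matches jute's string encoding, but the per-record overhead here is an approximation, not the exact wire format:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Rough estimate of a CloseSessionTxn's serialized size versus the
// default receive buffer, illustrating why a session owning enough
// ephemeral nodes can produce a txn no follower can deserialize.
public class CloseSessionSizeSketch {
    static final int JUTE_MAXBUFFER = 1024 * 1024; // default 1 MB

    /** Approximate size: 4-byte count + (4-byte length + UTF-8 bytes) per path. */
    public static long estimateTxnSize(List<String> ephemeralPaths) {
        long size = 4;
        for (String p : ephemeralPaths) {
            size += 4 + p.getBytes(StandardCharsets.UTF_8).length;
        }
        return size;
    }

    public static boolean exceedsBuffer(List<String> ephemeralPaths) {
        return estimateTxnSize(ephemeralPaths) > JUTE_MAXBUFFER;
    }
}
```

With ~30-byte paths, a few tens of thousands of ephemeral nodes in one session are enough to cross the 1 MB default, which matches the crash scenario described in the ticket.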
[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemeral nodes, causing cluster crash
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365079#comment-17365079 ] Damien Diederen commented on ZOOKEEPER-4306: Hi [~Changrui Lin], A first PR is available: https://github.com/apache/zookeeper/pull/1716. Your review/comments would be welcome! Cheers, -D > CloseSessionTxn contains too many ephemeral nodes, causing cluster crash > --- > > Key: ZOOKEEPER-4306 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.2 >Reporter: Lin Changrui >Priority: Critical > Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg > > > We ran a test of how many ephemeral nodes a client can create under one > parent node with the default configuration. The test eventually caused a cluster > crash, with a stack trace like this. > follower: > !f.jpg! > leader: > !l1.png! > !l2.jpg! > It seems that the leader sent a too-large txn packet to the followers. When a follower > tried to deserialize the txn, it found the txn length exceeded its buffer > size (default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That caused the > followers to crash, and then the leader found there were not sufficient followers > synced, so it shut down later. When the leader shut down, it called > zkDb.fastForwardDataBase(), and found the txn read from the txnlog exceeded > its buffer size, so it crashed too. > After the servers crashed, they tried to restart the quorum. But they would not > succeed because the last txn was too large. We lost the log from that moment, > but the stack trace is the same as this one. > !r.jpg|width=1468,height=598! > > *Root Cause* > We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780) to > visualize this log and found this. !cs.jpg|width=1400,height=581! So the > closeSessionTxn contains all ephemeral nodes with absolute paths. 
We know we > will get a large getChildren response if we create too many child nodes > under one parent node; that is limited by the client's jute.maxbuffer. If we > create plenty of ephemeral nodes under different parent nodes with one session, > it may not overflow the client's buffer, but when the session closes without > deleting these nodes first, it can crash the cluster. > Is it a bug or just an unspecified feature? If so, how should we judge > the upper limit on creating nodes? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4312) ZooKeeperServerEmbedded: enhance server start/stop for testability
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4312. Fix Version/s: 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1710 [https://github.com/apache/zookeeper/pull/1710] > ZooKeeperServerEmbedded: enhance server start/stop for testability > -- > > Key: ZOOKEEPER-4312 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4312 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.7.0 >Reporter: Enrico Olivelli >Assignee: Enrico Olivelli >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 50m > Remaining Estimate: 0h > > ZooKeeperServerEmbedded works well for running ZooKeeper but it lacks support > for a few little features in order to use it for tests. > I saw these problems while working on the port of the Curator Testing Server to > ZooKeeperServerEmbedded. > * There is no way to wait for the server to be up-and-running > * When you "close()" the server, it does not wait for the ports to be closed > * There is no way to get the ConnectString for the server -- This message was sent by Atlassian Jira (v8.3.4#803005)
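The first gap listed in the ticket, waiting until the embedded server is up-and-running, is commonly worked around by polling the client port. The helper below is only an illustrative stand-in; the real fix lives inside ZooKeeperServerEmbedded itself:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative "wait until the server accepts connections" helper: poll
// the client port with short TCP connect attempts until one succeeds or
// the overall timeout expires.
public class PortWaitSketch {
    public static boolean waitForPortOpen(String host, int port, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 250);
                return true; // something is accepting connections
            } catch (IOException retry) {
                Thread.sleep(50); // not up yet; back off briefly
            }
        }
        return false;
    }
}
```

The symmetric gap ("close() does not wait for the ports to be closed") can be handled the same way, polling until the connect attempts start failing again.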
[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemeral nodes, causing cluster crash
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360734#comment-17360734 ] Damien Diederen commented on ZOOKEEPER-4306: Hi [~Changrui Lin], I am currently looking into fixing this. bq. Is it a bug or just an unspecified feature? If so, how should we judge the upper limit on creating nodes? I would definitely put this in the "bug" category \:) It seems that we would want ephemeral node creation to start failing when the session gets "too big" to fit in a transaction. [~eolivelli], [~lvfangmin]: would you agree? Here are a few related tickets which include ideas for minimizing {{jute.maxbuffer}} annoyances: # ZOOKEEPER-1162: Suggests \(among others) controlling node size during child creation (similar to what I am proposing above); # ZOOKEEPER-1644: Suggests compressing some of the data, which would allow for a larger {{CloseSessionTxn}} (related to your comment about "absolute paths"). Note that it is currently possible to work around this issue by setting this undocumented flag: {noformat} closeSessionTxn.enabled = false {noformat} (The flag was introduced as part of ZOOKEEPER-3145. Of course, disabling it "unfixes" the "potential watch missing issue." Still, probably better than suffering crashing ensembles.) > CloseSessionTxn contains too many ephemeral nodes, causing cluster crash > --- > > Key: ZOOKEEPER-4306 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.6.2 >Reporter: Lin Changrui >Priority: Critical > Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg > > > We ran a test of how many ephemeral nodes a client can create under one > parent node with the default configuration. The test eventually caused a cluster > crash, with a stack trace like this. > follower: > !f.jpg! > leader: > !l1.png! > !l2.jpg! > It seems that the leader sent a too-large txn packet to the followers. 
When a follower > tried to deserialize the txn, it found the txn length exceeded its buffer > size (default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That caused the > followers to crash, and then the leader found there were not sufficient followers > synced, so it shut down later. When the leader shut down, it called > zkDb.fastForwardDataBase(), and found the txn read from the txnlog exceeded > its buffer size, so it crashed too. > After the servers crashed, they tried to restart the quorum. But they would not > succeed because the last txn was too large. We lost the log from that moment, > but the stack trace is the same as this one. > !r.jpg|width=1468,height=598! > > *Root Cause* > We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780) to > visualize this log and found this. !cs.jpg|width=1400,height=581! So the > closeSessionTxn contains all ephemeral nodes with absolute paths. We know we > will get a large getChildren response if we create too many child nodes > under one parent node; that is limited by the client's jute.maxbuffer. If we > create plenty of ephemeral nodes under different parent nodes with one session, > it may not overflow the client's buffer, but when the session closes without > deleting these nodes first, it can crash the cluster. > Is it a bug or just an unspecified feature? If so, how should we judge > the upper limit on creating nodes? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
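The fix direction proposed in the comment above, having ephemeral node creation start failing once the session gets "too big" to fit in a transaction, could track a per-session byte budget. The class and constant names below are assumptions for illustration, not ZooKeeper's actual code:

```java
import java.nio.charset.StandardCharsets;

// Illustrative per-session guard: keep a running byte total of the
// session's ephemeral paths and reject a create once the eventual
// CloseSessionTxn could no longer fit in a follower's receive buffer.
public class EphemeralBudgetSketch {
    private final long maxCloseTxnBytes;
    private long ephemeralBytes = 4; // 4-byte path-count header

    public EphemeralBudgetSketch(long maxCloseTxnBytes) {
        this.maxCloseTxnBytes = maxCloseTxnBytes;
    }

    /** Returns false when the create should fail because the budget is exhausted. */
    public boolean tryAddEphemeral(String path) {
        long cost = 4 + path.getBytes(StandardCharsets.UTF_8).length;
        if (ephemeralBytes + cost > maxCloseTxnBytes) {
            return false;
        }
        ephemeralBytes += cost;
        return true;
    }
}
```

In a real implementation the budget would derive from jute.maxbuffer, and deleting an ephemeral node would return its bytes to the budget.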
[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358800#comment-17358800 ] Damien Diederen commented on ZOOKEEPER-4211: Hi [~liwang], I am planning to dedicate some time to this on Wednesday. HTH, -D > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > > In 3.7, quota limits can be enforced and the quota-related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > In addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per-namespace (chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-3970) Enable ZooKeeperServerController to expire session
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-3970. Fix Version/s: 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1505 [https://github.com/apache/zookeeper/pull/1505] > Enable ZooKeeperServerController to expire session > -- > > Key: ZOOKEEPER-3970 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3970 > Project: ZooKeeper > Issue Type: Task > Components: server, tests >Reporter: Michael Han >Assignee: Michael Han >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > This is a follow up of ZOOKEEPER-3948. Here we enable > ZooKeeperServerController to be able to expire a global or local session. > This is very useful in our experience in integration testing when we want a > controlled session expiration mechanism. This is done by having session > tracker exposing both global and local session stats, so a zookeeper server > can expire the sessions in the controller. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343403#comment-17343403 ] Damien Diederen commented on ZOOKEEPER-4211: Hi [~liwang], Yup, I have seen your update. Planning to have a look tomorrow. Cheers, -D > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > > In 3.7, quota limits can be enforced and the quota-related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > In addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per-namespace (chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340353#comment-17340353 ] Damien Diederen commented on ZOOKEEPER-4211: Hi [~liwang], Ha, funny; I just caught up with your discussion on the {{dev}} mailing list! (https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202105.mbox/browser). Excellent work, btw. I haven't forgotten about your points, but have just been really busy on other topics. (Of course, don't hesitate to ping other reviewers, particularly when I am lagging!) That being said: bq. Based on investigation result of the performance impact of Prometheus Summary quantile computation, I am working on adding the support for CounterSet for the use case that need to group counter metrics by keys (i.e. top namespace) but no need for quantiles and sum. I figure we'll have to use some sort of queue to gather all these metrics without blocking the worker threads, but +1 on not making things worse in the meantime :) > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > > In 3.7, quota limits can be enforced and the quota-related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > In addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per-namespace (chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
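The "some sort of queue to gather all these metrics without blocking the worker threads" idea from the comment above can be sketched with a lock-free queue: request-handling threads enqueue (key, delta) samples, and a reporter thread drains and aggregates them. This is a hypothetical illustration, not ZooKeeper's metrics implementation:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative non-blocking per-key counter: workers never contend on a
// shared map; only the reporter thread aggregates, at scrape time.
public class QueuedCounterSketch {
    private final ConcurrentLinkedQueue<Map.Entry<String, Long>> samples =
        new ConcurrentLinkedQueue<>();

    /** Called from worker threads; never blocks on the aggregator. */
    public void add(String key, long delta) {
        samples.add(new AbstractMap.SimpleImmutableEntry<>(key, delta));
    }

    /** Called from the reporter thread; drains and sums pending samples. */
    public Map<String, Long> drain() {
        Map<String, Long> totals = new HashMap<>();
        Map.Entry<String, Long> e;
        while ((e = samples.poll()) != null) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return totals;
    }
}
```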
[jira] [Resolved] (ZOOKEEPER-4285) High CVE-2019-25013 reported by Clair scanner for Zookeeper 3.6.1
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4285. Assignee: Damien Diederen Resolution: Invalid Hi [~priyavj], ZooKeeper releases do not bundle the GNU C library, nor native binaries, so I don't see how this report could be lifted on our side. If you have installed some kind of ZooKeeper package provided by a distributor, I would suggest raising the issue with them. (Of course, feel free to reopen if I missed something.) Best, -D > High CVE-2019-25013 reported by Clair scanner for Zookeeper 3.6.1 > - > > Key: ZOOKEEPER-4285 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4285 > Project: ZooKeeper > Issue Type: Bug >Reporter: priya Vijay >Assignee: Damien Diederen >Priority: Major > > On running clair scanner for Zookeeper 3.6.1, the following high priority > vulnerability is reported: > CVE-2019-25013 [https://nvd.nist.gov/vuln/detail/CVE-2019-25013] > details: The iconv feature in the GNU C Library (aka glibc or libc6) through > 2.32, when processing invalid multi-byte input sequences in the EUC-KR > encoding, may have a buffer over-read -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4262) Backport ZOOKEEPER-3911 to branch-3.5
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4262. Fix Version/s: 3.5.10 Resolution: Fixed Issue resolved by pull request 1657 [https://github.com/apache/zookeeper/pull/1657] > Backport ZOOKEEPER-3911 to branch-3.5 > - > > Key: ZOOKEEPER-4262 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4262 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: fanyang >Priority: Critical > Labels: pull-request-available > Fix For: 3.5.10 > > Time Spent: 40m > Remaining Estimate: 0h > > Backporting ZOOKEEPER-3911 to branch-3.5 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4282) Redesign quota feature
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326440#comment-17326440 ] Damien Diederen commented on ZOOKEEPER-4282: Hi [~arshad.mohammad], I was just looking into this. Generally agree. Even when "hard" limits are set, the current quota implementation ([ZOOKEEPER-3301|https://issues.apache.org/jira/browse/ZOOKEEPER-3301]) is trivial to work around. First, a few notes about your points, then a scary story below: # Not necessarily against adding {{setQuota}} and friends, but wouldn't creating all nodes in the {{/zookeeper/quota}} subtree with an ACL akin to {{world:anyone:r}} (by default, value configurable) be technically sufficient, and not require a change in protocol? # ACL? # I agree that the notion of an "application root node" seems to be quite common in deployments, and that the native root should be protected in such setups. Perhaps a simple configuration setting? Doing {{setAcl / world:anyone:r}} as {{super}} is not that difficult, though---and the window is probably negligible in practice; # We currently have a tristate: {{enforceQuota:false}} means that quotas are not being _processed_ at all; "soft" quotas cause overflows to be logged; "hard" quotas cause requests to fail. (Not saying we need to preserve these features; it was just to complete your description). Now for the scary story: The old quota implementation was supposed to be "advisory," but I looked a bit deeper---and just noticed that besides its obvious limitations, the lack of controls combined with the central location of the quota checks creates a serious DoS vector! (I was aware of a similar problem with [ZOOKEEPER-451|https://issues.apache.org/jira/browse/ZOOKEEPER-451], but it turns out that the issue is present in mainline 3.6 and earlier.) 
On a "properly administered" ensemble, a {{super}} user sets up a "root node" for user {{eve}}: {noformat} setAcl / world:anyone:r create /eve setAcl /eve sasl:eve:cdrwa setquota /eve -B 32 {noformat} Once logged in, {{eve}} can simply do: {noformat} set /zookeeper/quota/eve/zookeeper_limits boom create /eve/was.here {noformat} which immediately causes the server to fail and exit with this nasty exception: {noformat} 2021-04-21 12:20:25,861 [myid:] - ERROR [SyncThread:0:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:0 java.lang.IllegalArgumentException: invalid string yolo at org.apache.zookeeper.StatsTrack.(StatsTrack.java:50) at org.apache.zookeeper.server.DataTree.updateCountBytes(DataTree.java:409) at org.apache.zookeeper.server.DataTree.createNode(DataTree.java:550) {noformat} Worse, the server won't restart before the corrupted data is excised from the snapshot or transaction log. This seems to be a minimal reproducer: {noformat} create /eve create /zookeeper/quota/eve create /zookeeper/quota/eve/zookeeper_stats boom create /zookeeper/quota/eve/zookeeper_limits boom create /eve/was.here {noformat} I would suggest opening another ticket, and creating PRs preventing the server crash for 3.5 and 3.6. WDYT? Should I take care of it? Best, -D (Cc: [~eolivelli], [~maoling], [~hanm].) > Redesign quota feature > -- > > Key: ZOOKEEPER-4282 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4282 > Project: ZooKeeper > Issue Type: New Feature > Components: quota >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Fix For: 3.8.0 > > > *Quota Use Case:* > Generally in a big data solution deployment, multiple services (hdfs, yarn, > hbase etc.) use a single ZooKeeper cluster. So it is very important to ensure > fair usage by all services. Sometimes services unintentionally, mainly because > of faulty behavior, create many znodes and impact the overall reliability of > the ZooKeeper service. 
To ensure fair usage, a quota feature is required. > But this is not the only use case; there are many other use cases for the quota > feature. > *Current Problems:* > # Currently, a user can set quota by updating the znode > “/zookeeper/quota/nodepath”, or using setquota/delquota in the CLI. > This makes the quota setting ineffective. > Currently any user can set/delete quota, which is not proper; it should be an > admin operation > # Users are allowed to modify ZooKeeper system paths like /zookeeper/quota. > These are internal to ZooKeeper and should not be modifiable. > # Generally services create a single top-level znode in ZooKeeper like /hbase > and create all required znodes under it. > It is better if it is configurable who can create top-level znodes, to > control ZooKeeper usage. > # After ZOOKEEPER-231, there are two kinds of quota enforcement limits: 1. Hard limit > 2. Soft limit. > I think there should be only one limit.
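The server crash in the scary story above boils down to StatsTrack parsing an attacker-writable quota znode strictly on the SyncThread. A minimal sketch of the defensive parsing that would avoid the crash is below; the "count=<n>,bytes=<m>" format matches the legacy quota payload, but the parser itself is illustrative, not the actual StatsTrack code:

```java
import java.util.OptionalLong;

// Illustrative tolerant parser: a malformed quota payload (e.g. the
// "boom" string from the reproducer) yields "no value" instead of an
// IllegalArgumentException that kills the SyncThread.
public class QuotaParseSketch {
    /** Returns the parsed count, or empty for malformed input. */
    public static OptionalLong parseCount(String stats) {
        if (stats == null) {
            return OptionalLong.empty();
        }
        for (String part : stats.split(",")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("count")) {
                try {
                    return OptionalLong.of(Long.parseLong(kv[1].trim()));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }
}
```

A caller that gets an empty result can log and skip the quota update, so a corrupted `/zookeeper/quota` entry degrades to a warning rather than an unrecoverable error at replay time.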
[jira] [Resolved] (ZOOKEEPER-4265) Download page broken links
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4265. Fix Version/s: 3.6.3 3.7.1 3.8.0 Resolution: Fixed Issue resolved by pull request 1677 [https://github.com/apache/zookeeper/pull/1677] > Download page broken links > -- > > Key: ZOOKEEPER-4265 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265 > Project: ZooKeeper > Issue Type: Bug >Reporter: Sebb >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1, 3.6.3 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The download page [1] has broken links for the following release versions: > 3.6.1 > 3.5.9 > Please remove them from the page. > If necessary, they can be linked from the archive server, in which case the > page should make it clear that they are historic releases. > [1] https://zookeeper.apache.org/releases.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4265) Download page broken links
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316943#comment-17316943 ] Damien Diederen commented on ZOOKEEPER-4265: [~sebb]: Thank you for the report. You might want to take a look at https://github.com/apache/zookeeper/pull/1677. > Download page broken links > -- > > Key: ZOOKEEPER-4265 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265 > Project: ZooKeeper > Issue Type: Bug >Reporter: Sebb >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The download page [1] has broken links for the following release versions: > 3.6.1 > 3.5.9 > Please remove them from the page. > If necessary, they can be linked from the archive server, in which case the > page should make it clear that they are historic releases. > [1] https://zookeeper.apache.org/releases.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ZOOKEEPER-4265) Download page broken links
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen reassigned ZOOKEEPER-4265: -- Assignee: Damien Diederen > Download page broken links > -- > > Key: ZOOKEEPER-4265 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265 > Project: ZooKeeper > Issue Type: Bug >Reporter: Sebb >Assignee: Damien Diederen >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The download page [1] has broken links for the following release versions: > 3.6.1 > 3.5.9 > Please remove them from the page. > If necessary, they can be linked from the archive server, in which case the > page should make it clear that they are historic releases. > [1] https://zookeeper.apache.org/releases.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ZOOKEEPER-4266) Correct ZooKeeper version in documentation header
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314246#comment-17314246 ] Damien Diederen edited comment on ZOOKEEPER-4266 at 4/3/21, 11:45 AM: -- Issue resolved by pull requests [https://github.com/apache/zookeeper/pull/1659] and [https://github.com/apache/zookeeper/pull/1660] was (Author: ztzg): Issue resolved by pull request 1660 [https://github.com/apache/zookeeper/pull/1660] > Correct ZooKeeper version in documentation header > - > > Key: ZOOKEEPER-4266 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4266 > Project: ZooKeeper > Issue Type: Bug > Components: documentation >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Attachments: image-2021-03-28-22-25-39-949.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Both Master and branch-3.7 documentation header have ZooKeeper version as 3.6. > These should be changed to 3.8 and 3.7 for master and branch-3.7 respectively > Master documentation currently: > !image-2021-03-28-22-25-39-949.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4266) Correct ZooKeeper version in documentation header
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4266. Resolution: Fixed Issue resolved by pull request 1660 [https://github.com/apache/zookeeper/pull/1660] > Correct ZooKeeper version in documentation header > - > > Key: ZOOKEEPER-4266 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4266 > Project: ZooKeeper > Issue Type: Bug > Components: documentation >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0, 3.7.1 > > Attachments: image-2021-03-28-22-25-39-949.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Both Master and branch-3.7 documentation header have ZooKeeper version as 3.6. > These should be changed to 3.8 and 3.7 for master and branch-3.7 respectively > Master documentation currently: > !image-2021-03-28-22-25-39-949.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313402#comment-17313402 ] Damien Diederen commented on ZOOKEEPER-4211: Hi [~liwang], Not sure you have seen my two comments from earlier today: * https://github.com/apache/zookeeper/pull/1644#discussion_r605547937 * https://github.com/apache/zookeeper/pull/1644#discussion_r605548947 (I know the other points are still pending.) > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > In 3.7, Quota limit can be enforced and the quota related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > As an addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4245) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308866#comment-17308866 ] Damien Diederen commented on ZOOKEEPER-4245: No problem; website submit buttons are sometimes unresponsive/unclear. I just wanted to give you a chance to chime in after closing the extra requests—in case I missed something. > Resource leaks in > org.apache.zookeeper.server.persistence.SnapStream#getInputStream and > #getOutputStream > > > Key: ZOOKEEPER-4245 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4245 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Martin Kellogg >Priority: Major > > There are three (related) possible resource leaks in the `getInputStream` > and `getOutputStream` methods in `SnapStream.java`. I noticed the first > because of the use of the error-prone `GZIPOutputStream`, and the other two > after looking at the surrounding code. > Here is the offending code (copied from > [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]): > {noformat} > /** > * Return the CheckedInputStream based on the extension of the fileName. > * > * @param file the file the InputStream read from > * @return the specific InputStream > * @throws IOException > */ > public static CheckedInputStream getInputStream(File file) throws > IOException { > FileInputStream fis = new FileInputStream(file); > InputStream is; > switch (getStreamMode(file.getName())) { > case GZIP: > is = new GZIPInputStream(fis); > break; > case SNAPPY: > is = new SnappyInputStream(fis); > break; > case CHECKED: > default: > is = new BufferedInputStream(fis); > } > return new CheckedInputStream(is, new Adler32()); > } > /** > * Return the OutputStream based on predefined stream mode. 
> * > * @param file the file the OutputStream writes to > * @param fsync sync the file immediately after write > * @return the specific OutputStream > * @throws IOException > */ > public static CheckedOutputStream getOutputStream(File file, boolean > fsync) throws IOException { > OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new > FileOutputStream(file); > OutputStream os; > switch (streamMode) { > case GZIP: > os = new GZIPOutputStream(fos); > break; > case SNAPPY: > os = new SnappyOutputStream(fos); > break; > case CHECKED: > default: > os = new BufferedOutputStream(fos); > } > return new CheckedOutputStream(os, new Adler32()); > } > {noformat} > All three possible resource leaks are caused by the constructors of the > intermediate streams (i.e. `is` and `os`), some of which might throw > `IOException`s: > * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an > exception, because `GZIPOutputStream` writes out the header in the > constructor. If it does throw, then `fos` is never closed. That it does so > makes it hard to use correctly; someone raised this as an issue with the JDK > folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they > closed it as "won't fix" because the constructor is documented to throw > (hence the need to catch the exception here). > * in `getInputStream`, the call to `new GZIPInputStream` can throw an > `IOException` for a similar reason, causing the file handle held by `fis` to > leak. > * similarly, the call to `new SnappyInputStream` can throw an `IOException`, > because it tries to read the file header during construction, which also > causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked > [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java]. > I'll submit a PR with a (simple) fix shortly after this bug report goes up > and gets assigned an issue number, and add a link to this issue. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
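The leak pattern described in the ticket can be avoided by closing the underlying stream whenever a decorating stream's constructor throws. Below is a minimal sketch of that guard for the gzip case, assuming illustrative class and method names; it is not the actual ZooKeeper patch.

```java
import java.io.*;
import java.util.zip.*;

// Illustrative sketch (not the actual ZooKeeper patch): if the decorating
// stream's constructor throws while reading the header, close the underlying
// FileInputStream before propagating the exception, so the handle is not leaked.
public class SnapStreamFixSketch {
    public static CheckedInputStream getGzipInputStream(File file) throws IOException {
        FileInputStream fis = new FileInputStream(file);
        InputStream is;
        try {
            // GZIPInputStream reads the gzip header in its constructor and
            // may throw, which would otherwise leak the file handle in fis.
            is = new GZIPInputStream(fis);
        } catch (IOException e) {
            fis.close();
            throw e;
        }
        return new CheckedInputStream(is, new Adler32());
    }
}
```

The same guard applies to the `SnappyInputStream` case in `getInputStream` and to the `GZIPOutputStream` case in `getOutputStream`.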
[jira] [Resolved] (ZOOKEEPER-4245) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4245. Resolution: Duplicate (Hi [~kelloggm]. Thank you for the report. I believe this one and the others were accidental duplicates of ZOOKEEPER-4246, weren't they?) > Resource leaks in > org.apache.zookeeper.server.persistence.SnapStream#getInputStream and > #getOutputStream > > > Key: ZOOKEEPER-4245 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4245 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Martin Kellogg >Priority: Major > > There are three (related) possible resource leaks in the `getInputStream` > and `getOutputStream` methods in `SnapStream.java`. I noticed the first > because of the use of the error-prone `GZIPOutputStream`, and the other two > after looking at the surrounding code. > Here is the offending code (copied from > [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]): > {noformat} > /** > * Return the CheckedInputStream based on the extension of the fileName. > * > * @param file the file the InputStream read from > * @return the specific InputStream > * @throws IOException > */ > public static CheckedInputStream getInputStream(File file) throws > IOException { > FileInputStream fis = new FileInputStream(file); > InputStream is; > switch (getStreamMode(file.getName())) { > case GZIP: > is = new GZIPInputStream(fis); > break; > case SNAPPY: > is = new SnappyInputStream(fis); > break; > case CHECKED: > default: > is = new BufferedInputStream(fis); > } > return new CheckedInputStream(is, new Adler32()); > } > /** > * Return the OutputStream based on predefined stream mode. 
> * > * @param file the file the OutputStream writes to > * @param fsync sync the file immediately after write > * @return the specific OutputStream > * @throws IOException > */ > public static CheckedOutputStream getOutputStream(File file, boolean > fsync) throws IOException { > OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new > FileOutputStream(file); > OutputStream os; > switch (streamMode) { > case GZIP: > os = new GZIPOutputStream(fos); > break; > case SNAPPY: > os = new SnappyOutputStream(fos); > break; > case CHECKED: > default: > os = new BufferedOutputStream(fos); > } > return new CheckedOutputStream(os, new Adler32()); > } > {noformat} > All three possible resource leaks are caused by the constructors of the > intermediate streams (i.e. `is` and `os`), some of which might throw > `IOException`s: > * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an > exception, because `GZIPOutputStream` writes out the header in the > constructor. If it does throw, then `fos` is never closed. That it does so > makes it hard to use correctly; someone raised this as an issue with the JDK > folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they > closed it as "won't fix" because the constructor is documented to throw > (hence the need to catch the exception here). > * in `getInputStream`, the call to `new GZIPInputStream` can throw an > `IOException` for a similar reason, causing the file handle held by `fis` to > leak. > * similarly, the call to `new SnappyInputStream` can throw an `IOException`, > because it tries to read the file header during construction, which also > causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked > [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java]. > I'll submit a PR with a (simple) fix shortly after this bug report goes up > and gets assigned an issue number, and add a link to this issue. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4244) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4244. Resolution: Duplicate > Resource leaks in > org.apache.zookeeper.server.persistence.SnapStream#getInputStream and > #getOutputStream > > > Key: ZOOKEEPER-4244 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4244 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Martin Kellogg >Priority: Major > > There are three (related) possible resource leaks in the `getInputStream` > and `getOutputStream` methods in `SnapStream.java`. I noticed the first > because of the use of the error-prone `GZIPOutputStream`, and the other two > after looking at the surrounding code. > Here is the offending code (copied from > [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]): > {noformat} > /** > * Return the CheckedInputStream based on the extension of the fileName. > * > * @param file the file the InputStream read from > * @return the specific InputStream > * @throws IOException > */ > public static CheckedInputStream getInputStream(File file) throws > IOException { > FileInputStream fis = new FileInputStream(file); > InputStream is; > switch (getStreamMode(file.getName())) { > case GZIP: > is = new GZIPInputStream(fis); > break; > case SNAPPY: > is = new SnappyInputStream(fis); > break; > case CHECKED: > default: > is = new BufferedInputStream(fis); > } > return new CheckedInputStream(is, new Adler32()); > } > /** > * Return the OutputStream based on predefined stream mode. > * > * @param file the file the OutputStream writes to > * @param fsync sync the file immediately after write > * @return the specific OutputStream > * @throws IOException > */ > public static CheckedOutputStream getOutputStream(File file, boolean > fsync) throws IOException { > OutputStream fos = fsync ? 
new AtomicFileOutputStream(file) : new > FileOutputStream(file); > OutputStream os; > switch (streamMode) { > case GZIP: > os = new GZIPOutputStream(fos); > break; > case SNAPPY: > os = new SnappyOutputStream(fos); > break; > case CHECKED: > default: > os = new BufferedOutputStream(fos); > } > return new CheckedOutputStream(os, new Adler32()); > } > {noformat} > All three possible resource leaks are caused by the constructors of the > intermediate streams (i.e. `is` and `os`), some of which might throw > `IOException`s: > * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an > exception, because `GZIPOutputStream` writes out the header in the > constructor. If it does throw, then `fos` is never closed. That it does so > makes it hard to use correctly; someone raised this as an issue with the JDK > folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they > closed it as "won't fix" because the constructor is documented to throw > (hence the need to catch the exception here). > * in `getInputStream`, the call to `new GZIPInputStream` can throw an > `IOException` for a similar reason, causing the file handle held by `fis` to > leak. > * similarly, the call to `new SnappyInputStream` can throw an `IOException`, > because it tries to read the file header during construction, which also > causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked > [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java]. > I'll submit a PR with a (simple) fix shortly after this bug report goes up > and gets assigned an issue number, and add a link to this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4243) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4243. Resolution: Duplicate > Resource leaks in > org.apache.zookeeper.server.persistence.SnapStream#getInputStream and > #getOutputStream > > > Key: ZOOKEEPER-4243 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4243 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Martin Kellogg >Priority: Major > > There are three (related) possible resource leaks in the getInputStream and > getOutputStream methods in SnapStream.java. I noticed the first because of > the use of the error-prone GZIPOutputStream, and the other two after looking > at the surrounding code. > Here is the offending code (copied from > [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102):] > > {code:java} > /** > * Return the CheckedInputStream based on the extension of the fileName. > * > * @param file the file the InputStream read from > * @return the specific InputStream > * @throws IOException > */ > public static CheckedInputStream getInputStream(File file) throws IOException > { > FileInputStream fis = new FileInputStream(file); > InputStream is; > switch (getStreamMode(file.getName())) { > case GZIP: > is = new GZIPInputStream(fis); > break; > case SNAPPY: > is = new SnappyInputStream(fis); > break; > case CHECKED: > default: > is = new BufferedInputStream(fis); > } > return new CheckedInputStream(is, new Adler32()); > } > > /** > * Return the OutputStream based on predefined stream mode. > * > * @param file the file the OutputStream writes to > * @param fsync sync the file immediately after write > * @return the specific OutputStream > * @throws IOException > */ > public static CheckedOutputStream getOutputStream(File file, boolean fsync) > throws IOException { > OutputStream fos = fsync ? 
new AtomicFileOutputStream(file) : new > FileOutputStream(file); > OutputStream os; > switch (streamMode) { > case GZIP: > os = new GZIPOutputStream(fos); > break; > case SNAPPY: > os = new SnappyOutputStream(fos); > break; > case CHECKED: > default: > os = new BufferedOutputStream(fos); > } > return new CheckedOutputStream(os, new Adler32()); > }{code} > All three possible resource leaks are caused by the constructors of the > intermediate streams (i.e. is and os), some of which might throw IOExceptions: > * in getOutputStream, the call to "new GZIPOutputStream" can throw an > exception, because GZIPOutputStream writes out the header in the constructor. > If it does throw, then fos is never closed. That it does so makes it hard to > use correctly; someone raised this as an issue with the JDK folks > [here|[https://bugs.openjdk.java.net/browse/JDK-8180899]], but they closed it > as "won't fix" because the constructor is documented to throw (hence why we > need to catch the exception here). > * in getInputStream, the call to "new GZIPInputStream" can throw an > IOException for a similar reason, causing the file handle held by fis to leak. > * similarly, the call to "new SnappyInputStream" can throw an IOException, > because it tries to read the file header during construction, which also > causes fis to leak. SnappyOutputStream cannot throw; I checked > [here|[https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java]]. > I will submit a PR with a fix on Github shortly and update this description > with a link. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4251) Flaky test: org.apache.zookeeper.test.WatcherTest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4251: --- Fix Version/s: 3.8.0 > Flaky test: org.apache.zookeeper.test.WatcherTest > - > > Key: ZOOKEEPER-4251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4251 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Mohammad Arshad >Priority: Major > Labels: pull-request-available > Fix For: 3.6.3, 3.8.0 > > Attachments: image-2021-03-16-12-24-27-480.png > > Time Spent: 1h > Remaining Estimate: 0h > > Flakyness=73.3% (11 / 15) > !image-2021-03-16-12-24-27-480.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4251) Flaky test: org.apache.zookeeper.test.WatcherTest
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4251. Fix Version/s: 3.6.3 Resolution: Fixed Issue resolved by pull request 1647 [https://github.com/apache/zookeeper/pull/1647] > Flaky test: org.apache.zookeeper.test.WatcherTest > - > > Key: ZOOKEEPER-4251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4251 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Mohammad Arshad >Priority: Major > Labels: pull-request-available > Fix For: 3.6.3 > > Attachments: image-2021-03-16-12-24-27-480.png > > Time Spent: 50m > Remaining Estimate: 0h > > Flakyness=73.3% (11 / 15) > !image-2021-03-16-12-24-27-480.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-4259) Allow AdminServer to force https
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen resolved ZOOKEEPER-4259. Fix Version/s: 3.8.0 Resolution: Fixed > Allow AdminServer to force https > > > Key: ZOOKEEPER-4259 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4259 > Project: ZooKeeper > Issue Type: Improvement > Components: security >Affects Versions: 3.7.0 >Reporter: Norbert Kalmár >Assignee: Norbert Kalmár >Priority: Minor > Labels: pull-request-available > Fix For: 3.8.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Since port unification (ZOOKEEPER-3371), AdminServer supports https. But there > is no way to disable http and allow https only. It is my understanding that > to be FIPS compliant, only https is allowed. This is one reason it is good > to have such a feature. > To enable https currently, we need to set these parameters in zoo.cfg: > {code:java} > ssl.quorum.keyStore.location=/tmp/zookeeper/keystore.jks > ssl.quorum.keyStore.password=password > ssl.quorum.trustStore.location=/tmp/zookeeper/truststore.jks > ssl.quorum.trustStore.password=password > admin.portUnification=true > {code} > I generated the keystore and truststore with the following commands: > {code:java} > #create test/dev keystore/truststore (ZK runs only on localhost) > keytool -genkeypair -alias zk.dev -keyalg RSA -keysize 2048 -dname > "cn=zk.dev" -keypass password -keystore /tmp/zookeeper/keystore.jks -ext > san=dns:localhost -storepass password > keytool -exportcert -alias zk.dev -keystore /tmp/zookeeper/keystore.jks -file > /tmp/zookeeper/zk.dev.cer -rfc > keytool -keystore /tmp/zookeeper/truststore.jks -storepass password > -importcert -alias zk.dev -file /tmp/zookeeper/zk.dev.cer > #check > keytool -list -v -keystore /tmp/zookeeper/truststore.jks > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-3128) Get CLI Command displays Authentication error for Authorization error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-3128: --- Fix Version/s: (was: 3.7.0) > Get CLI Command displays Authentication error for Authorization error > - > > Key: ZOOKEEPER-3128 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3128 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.3, 3.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > CLI Get Command display > "org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /b" when user does not have read access on the znode /b. > Steps to reproduce the bug: > {noformat} > [zk: vm1:2181(CONNECTED) 1] create /b > Created /b > [zk: vm1:2181(CONNECTED) 2] getAcl /b > 'world,'anyone > : cdrwa > [zk: vm1:2181(CONNECTED) 3] setAcl /b world:anyone:wa > [zk: vm1:2181(CONNECTED) 4] getAcl /b > 'world,'anyone > : wa > [zk: vm1:2181(CONNECTED) 5] get /b > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /b > [zk: vm1:2181(CONNECTED) 6] > {noformat} > Expected output: > {noformat} > [zk: vm1:2181(CONNECTED) 0] get /b > Insufficient permission : /b > {noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-3128) Get CLI Command displays Authentication error for Authorization error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-3128: --- Fix Version/s: (was: 3.7.0.) 3.7.0 > Get CLI Command displays Authentication error for Authorization error > - > > Key: ZOOKEEPER-3128 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3128 > Project: ZooKeeper > Issue Type: Bug > Components: server >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.3, 3.7.0, 3.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > CLI Get Command display > "org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /b" when user does not have read access on the znode /b. > Steps to reproduce the bug: > {noformat} > [zk: vm1:2181(CONNECTED) 1] create /b > Created /b > [zk: vm1:2181(CONNECTED) 2] getAcl /b > 'world,'anyone > : cdrwa > [zk: vm1:2181(CONNECTED) 3] setAcl /b world:anyone:wa > [zk: vm1:2181(CONNECTED) 4] getAcl /b > 'world,'anyone > : wa > [zk: vm1:2181(CONNECTED) 5] get /b > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = > NoAuth for /b > [zk: vm1:2181(CONNECTED) 6] > {noformat} > Expected output: > {noformat} > [zk: vm1:2181(CONNECTED) 0] get /b > Insufficient permission : /b > {noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305147#comment-17305147 ] Damien Diederen commented on ZOOKEEPER-4211: Hello [~liwang], Sorry for the lag; things have been a bit frantic on other fronts. I will review your contribution ASAP. It might not get into 3.7.0, as we currently have a release candidate which is being voted on, but I will consider it for 3.7.1 and {{master}}. > Expose Quota Metrics to Prometheus > -- > > Key: ZOOKEEPER-4211 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211 > Project: ZooKeeper > Issue Type: New Feature > Components: metric system >Affects Versions: 3.7.0, 3.7 >Reporter: Li Wang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In 3.7, Quota limit can be enforced and the quota related stats are captured > in the StatsTrack. From the "listquota" CLI command, we can see the quota limit > and usage info. > As an addition to that, we would like to collect the quota metrics and expose > them to Prometheus for the following: > 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard > 2. Creating alerts based on the quota levels (e.g. 90% used) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-4257) learner.asyncSending and learner.closeSocketAsync should be configurable in zoo.cfg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303219#comment-17303219 ] Damien Diederen commented on ZOOKEEPER-4257: Removed the 3.7.0 tag as no resolution has been merged into {{branch-3.7.0}} as of now, and the ticket shouldn't appear in the release notes before then. > learner.asyncSending and learner.closeSocketAsync should be configurable in > zoo.cfg > --- > > Key: ZOOKEEPER-4257 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4257 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Mohammad Arshad >Priority: Minor > Fix For: 3.8.0 > > > The configurations learner.asyncSending and learner.closeSocketAsync introduced > in ZOOKEEPER-3575 and ZOOKEEPER-3574 are Java system properties only, which > means they cannot be configured > through the ZooKeeper configuration file zoo.cfg. > As these changes are not released yet, it is better to correct this and > make them configurable through zoo.cfg. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4257) learner.asyncSending and learner.closeSocketAsync should be configurable in zoo.cfg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4257: --- Fix Version/s: (was: 3.7.0) > learner.asyncSending and learner.closeSocketAsync should be configurable in > zoo.cfg > --- > > Key: ZOOKEEPER-4257 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4257 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Mohammad Arshad >Priority: Minor > Fix For: 3.8.0 > > > Configurations learner.asyncSending and learner.closeSocketAsync introduced > in ZOOKEEPER-3575 and ZOOKEEPER-3574 are java system property only, which > means can not be configured > through ZooKeeper configuration file zoo.cfg > As these JIRA changes are not released yet it is better to correct it and > make it configurable through zoo.cfg. -- This message was sent by Atlassian Jira (v8.3.4#803005)
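The behavior the ticket asks for, reading a key from the parsed zoo.cfg with a fallback to the Java system property of the same name, could be sketched as follows. The helper class and method are hypothetical, not the actual ZooKeeper implementation.

```java
import java.util.Properties;

// Hypothetical helper (not the actual ZooKeeper code): resolve a boolean
// flag such as learner.asyncSending from the parsed zoo.cfg first, falling
// back to the Java system property of the same name, then to a default.
public class ConfigFallbackSketch {
    public static boolean getBoolean(Properties zooCfg, String key, boolean defaultValue) {
        String value = zooCfg.getProperty(key);
        if (value == null) {
            value = System.getProperty(key); // legacy -D flag still honored
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }
}
```

This ordering keeps existing `-Dlearner.asyncSending=...` deployments working while letting zoo.cfg entries take precedence.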
[jira] [Commented] (ZOOKEEPER-4241) Change log level without restarting zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303217#comment-17303217 ] Damien Diederen commented on ZOOKEEPER-4241: [~arshad.mohammad]: Removed the 3.7.0 tag as it hasn't been merged into {{branch-3.7.0}} as of now, and shouldn't appear in the release notes before it is. > Change log level without restarting zookeeper > - > > Key: ZOOKEEPER-4241 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4241 > Project: ZooKeeper > Issue Type: Wish > Components: server >Affects Versions: 3.6.2 > Environment: Kubernetes >Reporter: Pratik Thacker >Assignee: Mohammad Arshad >Priority: Major > Labels: pull-request-available > Fix For: 3.6.3, 3.8.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > In our use case of ZooKeeper, we want to change the log level of ZooKeeper without > restarting it. > This will help us to trace issues without restarting ZooKeeper, as some of the > issues may not appear immediately after a restart with debug log level enabled, > and it may take longer to reproduce the issue after a restart. > Is such a feature/API already available in Apache ZooKeeper? > If it is not available, could you please consider this request to > implement it? > Please let us know if you need any further details from us. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-4241) Change log level without restarting zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Diederen updated ZOOKEEPER-4241: --- Fix Version/s: (was: 3.7.0) > Change log level without restarting zookeeper > - > > Key: ZOOKEEPER-4241 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4241 > Project: ZooKeeper > Issue Type: Wish > Components: server >Affects Versions: 3.6.2 > Environment: Kubernetes >Reporter: Pratik Thacker >Assignee: Mohammad Arshad >Priority: Major > Labels: pull-request-available > Fix For: 3.6.3, 3.8.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > In our use case of ZooKeeper, we want to change the log level of ZooKeeper without > restarting it. > This will help us to trace issues without restarting ZooKeeper, as some of the > issues may not appear immediately after a restart with debug log level enabled, > and it may take longer to reproduce the issue after a restart. > Is such a feature/API already available in Apache ZooKeeper? > If it is not available, could you please consider this request to > implement it? > Please let us know if you need any further details from us. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-3706) ZooKeeper.close() would leak SendThread when the network is broken
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Diederen updated ZOOKEEPER-3706:
---------------------------------------
    Fix Version/s:     (was: 3.7.0)

> ZooKeeper.close() would leak SendThread when the network is broken
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3706
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3706
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.6.0, 3.4.14, 3.5.6
>            Reporter: Pierre Yin
>            Assignee: Pierre Yin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.8.0
>
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> ZooKeeper.close() may leak the SendThread when the network is broken.
> When the network is broken, the client's SendThread falls into a continuous reconnect loop. There is an unsafe window just before startConnect() in that loop: if SendThread.close() in another thread hits this window, startConnect() sleeps for a while and then forces the state to States.CONNECTING, even though SendThread.close() has already set it to States.CLOSED. In that case the SendThread never dies, and nothing ever changes the state again.
> Normally, ZooKeeper.close() blocks until the closeSession packet completes, which means waiting until the network recovers. But if the user has set a request timeout, ZooKeeper.close() breaks out of that wait after the timeout and invokes SendThread.close() to set the state to CLOSED. That is how SendThread.close() can hit the unsafe window.
> Setting a request timeout is a very common practice.
> I will propose a patch and send it out later.
> Maybe someone can help to review it.
>
> Thanks


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
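The race described in the report is a classic check-then-act hazard: the reconnect path unconditionally writes CONNECTING after a sleep, clobbering the CLOSED state that close() set in the meantime. A minimal sketch of the kind of guard that prevents this is below; the class, enum, and method names are hypothetical illustrations, not ZooKeeper's actual code. The fix idea is to make the transition to CONNECTING conditional (compare-and-set from the reconnect-pending state), so a concurrent CLOSED always wins.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ConnectStateGuard {
    // Hypothetical states mirroring the ones described in the report.
    public enum ZkState { CONNECTING, RECONNECT_PENDING, CLOSED }

    private final AtomicReference<ZkState> state =
            new AtomicReference<>(ZkState.RECONNECT_PENDING);

    /** Called by close() from another thread. */
    public void close() {
        state.set(ZkState.CLOSED);
    }

    /**
     * Called by the send thread before reconnecting. A plain
     * state.set(CONNECTING) here reproduces the leak: if close()
     * runs between the sleep and the set, CLOSED is clobbered and
     * the thread loops forever. compareAndSet makes the transition
     * conditional, so a concurrent CLOSED is never overwritten.
     */
    public boolean tryStartConnect() {
        return state.compareAndSet(ZkState.RECONNECT_PENDING, ZkState.CONNECTING);
    }

    public ZkState state() {
        return state.get();
    }
}
```

With this guard, a close() that lands inside the unsafe window makes tryStartConnect() return false; the send thread can observe the failed transition and exit instead of reconnecting forever.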
[jira] [Resolved] (ZOOKEEPER-4231) Add document for snapshot compression config
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Diederen resolved ZOOKEEPER-4231.
----------------------------------------
    Fix Version/s: 3.7.0
       Resolution: Fixed

Issue resolved by pull request 1642
[https://github.com/apache/zookeeper/pull/1642]

> Add document for snapshot compression config
> --------------------------------------------
>
>                 Key: ZOOKEEPER-4231
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4231
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Huizhi Lu
>            Assignee: Huizhi Lu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.7.0
>
>   Original Estimate: 24h
>          Time Spent: 40m
>  Remaining Estimate: 23h 20m
>
> A snapshot compression method was added to ZooKeeper, but there is no clear documentation for the config. This ticket was created to add documentation for the config:
> *zookeeper.snapshot.compression.method*


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
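The property being documented is a JVM system property, so it is set in the server's startup flags rather than in zoo.cfg. A sketch of one way to set it is below; the exact set of accepted values should be taken from the documentation added by pull request 1642 ("gz" and "snappy" are the commonly cited ones, with the property unset meaning no compression).

```shell
# Hypothetical deployment snippet for conf/zookeeper-env.sh:
# enable gzip-compressed snapshots on the server JVM.
export SERVER_JVMFLAGS="-Dzookeeper.snapshot.compression.method=gz"
```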