[jira] [Created] (ZOOKEEPER-4822) Quorum TLS - Enable member authorization based on certificate CN

2024-03-29 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4822:
--

 Summary: Quorum TLS - Enable member authorization based on 
certificate CN
 Key: ZOOKEEPER-4822
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4822
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Reporter: Damien Diederen
Assignee: Damien Diederen


Quorum TLS enables mutual authentication of quorum members.

Member authorization, however, cannot be configured on the basis of the 
presented principal CN; a round of SASL authentication has to be performed on 
top of the secured connection.

This ticket is about enabling authorization based on trusted client 
certificates.
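As an illustration of the kind of check this would enable, the sketch below pulls the CN out of a certificate subject DN string. This is a hypothetical sketch, not the patch for this ticket; in real code the DN would come from X509Certificate.getSubjectX500Principal().getName().

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical helper: extract the CN attribute from an X.500 subject DN.
public class CnExtractor {
    static String extractCn(String subjectDn) {
        try {
            for (Rdn rdn : new LdapName(subjectDn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            // Not a parseable DN; fall through and report no CN.
        }
        return null;
    }

    public static void main(String[] args) {
        // Example DN for a quorum member certificate (made-up name).
        System.out.println(extractCn("CN=zk-1.example.com,O=Example,C=US"));
    }
}
```

An authorizer could then compare the extracted CN against the configured quorum member list instead of running SASL on top of the TLS connection.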




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824461#comment-17824461
 ] 

Damien Diederen commented on ZOOKEEPER-4814:


Also related: [ClickHouse ticket, "Incompatibility with Zookeeper 
3.9"|https://github.com/ClickHouse/ClickHouse/issues/53749] whose custom (?) 
client was updated by [patch "Add support for read-only mode in 
ZooKeeper"|https://github.com/ClickHouse/ClickHouse/pull/57479/files].


> Protocol desynchronization after Connect for (some) old clients
> ---
>
> Key: ZOOKEEPER-4814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> Some old clients experience a protocol desynchronization after receiving a 
> {{ConnectResponse}} from the server.
> This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
> ConnectRequest and Response," which writes overlong responses to clients 
> which do not know about the {{readOnly}} flag.
> (One example of such a client is ZooKeeper's own C client library prior to 
> version 3.5!)
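To make the failure mode concrete, here is a toy model of the framing problem. The field layout is a simplification, not ZooKeeper's actual jute serialization code: the server appends a readOnly byte to the ConnectResponse, and a client that only knows the older layout leaves that byte unconsumed, where it gets misread as the start of the next server message.

```java
import java.nio.ByteBuffer;

// Toy sketch of the desynchronization; simplified ConnectResponse layout:
// protocolVersion(int) + timeout(int) + sessionId(long) + len-prefixed passwd.
public class DesyncSketch {

    // "New" server: serializes the old fields plus a trailing readOnly flag.
    static byte[] serverConnectResponse(boolean readOnly) {
        byte[] passwd = new byte[16];
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 8 + 4 + passwd.length + 1);
        buf.putInt(0);             // protocolVersion
        buf.putInt(30000);         // negotiated timeout
        buf.putLong(0x1234L);      // sessionId
        buf.putInt(passwd.length); // passwd length prefix
        buf.put(passwd);
        buf.put((byte) (readOnly ? 1 : 0)); // field the old client doesn't know
        return buf.array();
    }

    // "Old" client: consumes only the fields it knows and reports how many
    // bytes it leaves behind; anything left over desynchronizes the stream.
    static int leftoverAfterOldClientRead(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        buf.getInt();              // protocolVersion
        buf.getInt();              // timeout
        buf.getLong();             // sessionId
        int passwdLen = buf.getInt();
        buf.position(buf.position() + passwdLen);
        return buf.remaining();    // bytes the old client never expected
    }

    public static void main(String[] args) {
        int leftover = leftoverAfterOldClientRead(serverConnectResponse(true));
        System.out.println("unconsumed bytes: " + leftover); // 1: the readOnly flag
    }
}
```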





[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4814:
---
Description: 
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and 
Response," which writes overlong responses to clients which do not know about 
the {{readOnly}} flag.

 (One example of such a client is ZooKeeper's own C client library prior to 
version 3.5!)

  was:
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and 
Response," which writes overlong responses to clients which do not know about 
the {{readOnly}} flag.


> Protocol desynchronization after Connect for (some) old clients
> ---
>
> Key: ZOOKEEPER-4814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> Some old clients experience a protocol desynchronization after receiving a 
> {{ConnectResponse}} from the server.
> This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest 
> and Response," which writes overlong responses to clients which do not know 
> about the {{readOnly}} flag.
>  (One example of such a client is ZooKeeper's own C client library prior to 
> version 3.5!)





[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4814:
---
Description: 
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
ConnectRequest and Response," which writes overlong responses to clients which 
do not know about the {{readOnly}} flag.

 (One example of such a client is ZooKeeper's own C client library prior to 
version 3.5!)

  was:
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and 
Response," which writes overlong responses to clients which do not know about 
the {{readOnly}} flag.

 (One example of such a client is ZooKeeper's own C client library prior to 
version 3.5!)


> Protocol desynchronization after Connect for (some) old clients
> ---
>
> Key: ZOOKEEPER-4814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> Some old clients experience a protocol desynchronization after receiving a 
> {{ConnectResponse}} from the server.
> This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
> ConnectRequest and Response," which writes overlong responses to clients 
> which do not know about the {{readOnly}} flag.
>  (One example of such a client is ZooKeeper's own C client library prior to 
> version 3.5!)





[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4814:
---
Description: 
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
ConnectRequest and Response," which writes overlong responses to clients which 
do not know about the {{readOnly}} flag.

(One example of such a client is ZooKeeper's own C client library prior to 
version 3.5!)

  was:
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
ConnectRequest and Response," which writes overlong responses to clients which 
do not know about the {{readOnly}} flag.

 (One example of such a client is ZooKeeper's own C client library prior to 
version 3.5!)


> Protocol desynchronization after Connect for (some) old clients
> ---
>
> Key: ZOOKEEPER-4814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> Some old clients experience a protocol desynchronization after receiving a 
> {{ConnectResponse}} from the server.
> This started happening with ZOOKEEPER-4492, "Merge readOnly field into 
> ConnectRequest and Response," which writes overlong responses to clients 
> which do not know about the {{readOnly}} flag.
> (One example of such a client is ZooKeeper's own C client library prior to 
> version 3.5!)





[jira] [Updated] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4814:
---
Description: 
Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.

This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest and 
Response," which writes overlong responses to clients which do not know about 
the {{readOnly}} flag.

  was:Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.


> Protocol desynchronization after Connect for (some) old clients
> ---
>
> Key: ZOOKEEPER-4814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> Some old clients experience a protocol desynchronization after receiving a 
> {{ConnectResponse}} from the server.
> This started with ZOOKEEPER-4492, "Merge readOnly field into ConnectRequest 
> and Response," which writes overlong responses to clients which do not know 
> about the {{readOnly}} flag.





[jira] [Created] (ZOOKEEPER-4814) Protocol desynchronization after Connect for (some) old clients

2024-03-07 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4814:
--

 Summary: Protocol desynchronization after Connect for (some) old 
clients
 Key: ZOOKEEPER-4814
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4814
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.0
Reporter: Damien Diederen
Assignee: Damien Diederen


Some old clients experience a protocol desynchronization after receiving a 
{{ConnectResponse}} from the server.





[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command

2024-02-19 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4799:
---
Fix Version/s: 3.7.3

> Refactor ACL check in addWatch command
> --
>
> Key: ZOOKEEPER-4799
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
> Fix For: 3.7.3, 3.8.4, 3.9.2
>
>






[jira] [Updated] (ZOOKEEPER-4787) Failed to establish connection between zookeeper

2024-02-19 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4787:
---
Fix Version/s: (was: 3.10.0)
   (was: 3.7.3)
   (was: 3.8.4)

> Failed to establish connection between zookeeper
> 
>
> Key: ZOOKEEPER-4787
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4787
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.7.2, 3.8.3, 3.9.1
> Environment: z/OS
>Reporter: softrock
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Problem:*
> When running ZooKeeper version 3.8.3 on the z/OS platform, the servers cannot 
> establish a connection.
> Error:
> [2024-01-17 23:06:44,194] INFO Received connection request from 
> /xx.xx.xx.xx:23840 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
>  [2024-01-17 23:06:44,197] ERROR Initial message parsing error! 
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
>  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
>  Badly formed address: K???K???K???z
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage.parse(QuorumCnxManager.java:271)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:607)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:555)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1085)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1039)
>      at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
>      at java.util.concurrent.FutureTask.run(FutureTask.java:277)
>      at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
>      at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>      at java.lang.Thread.run(Thread.java:825)
> *Root cause:*
> The receiver cannot resolve the address from the sender requesting a 
> connection. This is because the sender sends the address in UTF-8 encoding, 
> but the receiver parses the address in IBM-1047 encoding (the default).
> *Resolution:*
>  Use UTF-8 encoding on both the sender and receiver sides.
>  
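A minimal sketch of the mismatch (the address string is a made-up example): UTF-8 bytes decoded with the EBCDIC codepage IBM-1047 come out as garbage, while pinning both sides to UTF-8 makes the round trip lossless.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuorumAddrEncoding {

    // Decode the wire bytes with whatever charset the receiver happens to use.
    static String decode(byte[] wire, Charset receiverCharset) {
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        String addr = "zk1.example.com:3888"; // hypothetical member address
        byte[] wire = addr.getBytes(StandardCharsets.UTF_8); // sender side

        // The fix: both ends agree on UTF-8, so the round trip is lossless.
        System.out.println(decode(wire, StandardCharsets.UTF_8));

        // The bug: on z/OS the platform default is IBM-1047, so the same bytes
        // decode to garbage. (The IBM1047 charset lives in the JDK's optional
        // jdk.charsets module, so guard the lookup.)
        if (Charset.isSupported("IBM1047")) {
            System.out.println(decode(wire, Charset.forName("IBM1047")));
        }
    }
}
```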





[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command

2024-02-12 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4799:
---
Fix Version/s: 3.8.4

> Refactor ACL check in addWatch command
> --
>
> Key: ZOOKEEPER-4799
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
> Fix For: 3.8.4, 3.9.2
>
>






[jira] [Updated] (ZOOKEEPER-4799) Refactor ACL check in addWatch command

2024-02-12 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4799:
---
Fix Version/s: 3.9.2

> Refactor ACL check in addWatch command
> --
>
> Key: ZOOKEEPER-4799
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
> Fix For: 3.9.2
>
>






[jira] [Updated] (ZOOKEEPER-4785) Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync

2024-02-12 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4785:
---
Fix Version/s: 3.9.2
   3.10

> Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync
> ---
>
> Key: ZOOKEEPER-4785
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4785
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.8.0, 3.7.1, 3.8.1, 3.7.2, 3.8.2, 3.9.1
>Reporter: Li Wang
>Assignee: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.2, 3.10
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We had a txn loss incident in production recently. After investigation, we 
> found it was caused by a race condition: the follower writes the current 
> epoch and sends the ACK_LD before successfully persisting all the txns from 
> the DIFF sync in the Learner.syncWithLeader() method.
> {code:java}
> case Leader.NEWLEADER:
> ...
> self.setCurrentEpoch(newEpoch);
> writeToTxnLog = true;
> // Anything after this needs to go to the transaction log, not applied directly in memory
> isPreZAB1_0 = false;
> // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them (ACK NEWLEADER).
> sock.setSoTimeout(self.tickTime * self.syncLimit);
> self.setSyncMode(QuorumPeer.SyncMode.NONE);
> zk.startupWithoutServing();
> if (zk instanceof FollowerZooKeeperServer) {
>     FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk;
>     for (PacketInFlight p : packetsNotCommitted) {
>         fzk.logRequest(p.hdr, p.rec, p.digest);
>     }
>     packetsNotCommitted.clear();
> }
> writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
> break;
> }
> {code}
> In this method, when the follower receives the NEWLEADER message, the current 
> epoch is updated before the uncommitted txns are written to disk, and writing 
> the txns is done asynchronously by the SyncThread. If the follower crashes 
> after setting the current epoch and sending ACK_LD, but before all 
> transactions are successfully written to disk, transaction loss can happen.
> This is because leader election is based on the epoch first and then on the 
> transaction id. When the follower becomes the leader because it has the 
> highest epoch, it will ask the other followers to truncate txns even if they 
> have been written to disk, causing data loss.
> The following is the scenario:
> 1. Leader election happened
> 2. A follower synced with the leader via DIFF, received committed proposals 
> from the leader, and kept them in memory
> 3. The follower received the NEWLEADER message
> 4. The follower updated the newEpoch
> 5. The follower was bounced before writing all the uncommitted txns to disk
> 6. The leader shut down and a new election was triggered
> 7. The follower became the new leader because it had the largest currentEpoch
> 8. The new leader asked the other followers to truncate their committed txns, 
> and the transactions were lost
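The epoch-before-zxid comparison that makes step 7 possible can be sketched as follows. This is a made-up simplification of ZAB-style vote comparison, not actual ZooKeeper code: because the epoch is compared first, a server that bumped its epoch but lost txns still beats a peer whose log is more complete.

```java
// Toy model: compare two candidates by (epoch, zxid), epoch first.
public class EpochOrdering {
    static int compare(long epochA, long zxidA, long epochB, long zxidB) {
        if (epochA != epochB) {
            return Long.compare(epochA, epochB);
        }
        return Long.compare(zxidA, zxidB);
    }

    public static void main(String[] args) {
        // Crashed follower: bumped its epoch to 2 but lost txns, so its last
        // zxid is behind a healthy peer that is still on epoch 1.
        long crashedEpoch = 2, crashedZxid = 0x100;
        long healthyEpoch = 1, healthyZxid = 0x180; // has the txns on disk

        // Epoch wins: the lagging server is preferred as leader, and it will
        // ask the healthy peer to truncate, losing the extra txns.
        System.out.println(
            compare(crashedEpoch, crashedZxid, healthyEpoch, healthyZxid) > 0);
    }
}
```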





[jira] [Updated] (ZOOKEEPER-4730) Incorrect datadir and logdir size reported from admin and 4lw dirs command

2024-02-12 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4730:
---
Fix Version/s: 3.9.2

> Incorrect datadir and logdir size reported from admin and 4lw dirs command 
> ---
>
> Key: ZOOKEEPER-4730
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4730
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Li Wang
>Assignee: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.2, 3.10
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Output from the dirs admin command
> {
>   "datadir_size" : 134217760,
>   "logdir_size" : 933,
>   "command" : "dirs",
>   "error" : null
> }
> Output from dirs 4lw command:
> datadir_size: 134217760
> logdir_size: 933





[jira] [Created] (ZOOKEEPER-4799) Refactor ACL check in addWatch command

2024-02-01 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4799:
--

 Summary: Refactor ACL check in addWatch command
 Key: ZOOKEEPER-4799
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4799
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Damien Diederen
Assignee: Damien Diederen








[jira] [Updated] (ZOOKEEPER-4764) Tune the log of refuse session request.

2024-01-29 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4764:
---
Fix Version/s: 3.10.0
   (was: 3.9.2)

> Tune the log of refuse session request.
> ---
>
> Key: ZOOKEEPER-4764
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4764
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.7.2, 3.8.3, 3.9.1
>Reporter: Yan Zhao
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.10.0, 3.7.3, 3.8.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The log:
> Refusing session request for client as it has seen zxid our last zxid is 0x0 
> client must try another server (org.apache.zookeeper.server.ZooKeeperServer)
> It would be better to print the sessionId in the message.
> After improvement:
> Refusing session(0xab) request for client as it has seen zxid our last zxid 
> is 0x0 client must try another server 
> (org.apache.zookeeper.server.ZooKeeperServer)





[jira] [Updated] (ZOOKEEPER-4760) Add support for filename to get and set cli commands

2024-01-29 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4760:
---
Fix Version/s: (was: 3.9.2)

> Add support for filename to get and set cli commands
> 
>
> Key: ZOOKEEPER-4760
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4760
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tools
>Reporter: Soumitra Kumar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.10.0
>
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> The CLI supports get and set commands to read and write data. Add support for:
>  # reading the input data for the set command from a file, and
>  # writing the output data of the get command to a file
> This will help in dealing with arbitrary byte arrays, and also with scripting 
> reads/writes to a large number of znodes using the CLI.





[jira] [Updated] (ZOOKEEPER-4787) Failed to establish connection between zookeeper

2024-01-29 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4787:
---
Fix Version/s: 3.10.0
   (was: 3.9.2)

> Failed to establish connection between zookeeper
> 
>
> Key: ZOOKEEPER-4787
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4787
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.7.2, 3.8.3, 3.9.1
> Environment: z/OS
>Reporter: softrock
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.10.0, 3.7.3, 3.8.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *Problem:*
> When running ZooKeeper version 3.8.3 on the z/OS platform, the servers cannot 
> establish a connection.
> Error:
> [2024-01-17 23:06:44,194] INFO Received connection request from 
> /xx.xx.xx.xx:23840 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
>  [2024-01-17 23:06:44,197] ERROR Initial message parsing error! 
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
>  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
>  Badly formed address: K???K???K???z
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage.parse(QuorumCnxManager.java:271)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:607)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:555)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1085)
>      at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1039)
>      at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
>      at java.util.concurrent.FutureTask.run(FutureTask.java:277)
>      at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
>      at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>      at java.lang.Thread.run(Thread.java:825)
> *Root cause:*
> The receiver cannot resolve the address from the sender requesting a 
> connection. This is because the sender sends the address in UTF-8 encoding, 
> but the receiver parses the address in IBM-1047 encoding (the default).
> *Resolution:*
>  Use UTF-8 encoding on both the sender and receiver sides.
>  





[jira] [Commented] (ZOOKEEPER-4753) Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth

2023-10-26 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779790#comment-17779790
 ] 

Damien Diederen commented on ZOOKEEPER-4753:


Hi [~xiaotong.wang],
{quote}we need verify the server host when we use SASL/Kerberos
{quote}
Yes.

(I also have additional improvements queued regarding this topic, but the 
changes you mention were in fact preliminary to fixing 
[https://zookeeper.apache.org/security.html#CVE-2023-44981]. The other changes 
were not included, as they were not strictly part of the security fix.)
{quote}it's better to verify if current authentication is Kerberos or not, but 
now we check it with isDigestAuthn and use 
entry.getLoginModuleName().equals(DigestLoginModule.class.getName())
{quote}
Yes; this is unfortunate. Would you know of a better method to detect the SASL 
mechanism in use? What we really want here is to conditionalize on 
{{DIGEST-MD5}} or {{GSSAPI}}.
{quote}we rewrote DigestLoginModule to make sure the user password is stored 
encrypted; our new DigestLoginModule requires user{~}hd{~}=encode("testpwd")

it will be incompatible when we upgrade
{quote}
Indeed. (I was afraid I would hear about something like that… and there we are 
:) Is your custom digest module a subclass of the ZooKeeper one, or an 
unrelated object?
{quote}Is there a better way to fix this issue
{quote}
As mentioned above: I would love it if we could just look up whether 
{{DIGEST-MD5}} or {{GSSAPI}} is in use. Ideas welcome!

In any case, I will take your case into account when submitting the updated 
patch; worst case, you will have to explicitly disable the principal check.

In the meantime, you are not affected by CVE-2023-44981 if using DIGEST-MD5.

HTH, -D

> Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth
> 
>
> Key: ZOOKEEPER-4753
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4753
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.9.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
> Fix For: 3.7.2, 3.8.3, 3.9.1
>
>
> The SASL-based quorum authorizer does not explicitly distinguish between the 
> DIGEST-MD5 and GSSAPI mechanisms: it is simply relying on {{NameCallback}} 
> and {{PasswordCallback}} for authentication with the former and examining 
> Kerberos principals in {{AuthorizeCallback}} for the latter.
> It turns out that some SASL/DIGEST-MD5 configurations cause authentication 
> and authorization IDs not to match the expected format, and the 
> DIGEST-MD5-based portions of the quorum test suite to fail with obscure 
> errors. (They can be traced to failures to join the quorum, but only by 
> looking into detailed logs.)
> We can use the login module name to determine whether DIGEST-MD5 or GSSAPI is 
> used, and relax the authentication ID check for the former.  As a cleanup, we 
> can keep the password-based credential map empty when Kerberos principals are 
> expected.  Finally, we can adapt tests to ensure "weirdly-shaped" credentials 
> only cause authentication failures in the GSSAPI case.
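The detection approach described above can be sketched with the standard JAAS types. The DigestLoginModule class name below is ZooKeeper's real FQCN, but the surrounding code is an illustrative stand-in for the authorizer logic, not the actual patch:

```java
import javax.security.auth.login.AppConfigurationEntry;
import java.util.Collections;

// Sketch: decide DIGEST-MD5 vs GSSAPI by inspecting the JAAS login module
// name configured for the quorum peer.
public class SaslMechanismSniff {
    static final String DIGEST_LOGIN_MODULE =
        "org.apache.zookeeper.server.auth.DigestLoginModule";

    static boolean isDigestAuthn(AppConfigurationEntry entry) {
        return DIGEST_LOGIN_MODULE.equals(entry.getLoginModuleName());
    }

    public static void main(String[] args) {
        AppConfigurationEntry digest = new AppConfigurationEntry(
            DIGEST_LOGIN_MODULE,
            AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
            Collections.emptyMap());
        AppConfigurationEntry krb5 = new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
            Collections.emptyMap());
        System.out.println(isDigestAuthn(digest)); // true
        System.out.println(isDigestAuthn(krb5));   // false
    }
}
```

As the comment thread notes, this is a proxy rather than a direct lookup of the negotiated SASL mechanism, which is why a custom digest login module breaks the check.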





[jira] [Resolved] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586

2023-10-03 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4755.

Fix Version/s: 3.7.2
   3.9.1
   3.8.3
   Resolution: Fixed

Issue resolved by pull request 2075
[https://github.com/apache/zookeeper/pull/2075]

> Handle Netty CVE-2023-4586
> --
>
> Key: ZOOKEEPER-4755
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.2, 3.9.1, 3.8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{dependency-check:check}}... check currently fails with the following:
> {noformat}
> [ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5)
> {noformat}
> According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , 
> CVE-2023-4586 is reserved.  No fix or additional information is available as 
> of the creation of this ticket.
> We have to:
> # Temporarily suppress the check;
> # Monitor CVE-2023-4586 and apply the remediation as soon as it becomes 
> available.





[jira] [Commented] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586

2023-10-03 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771547#comment-17771547
 ] 

Damien Diederen commented on ZOOKEEPER-4755:


Relevant discussion and pointers:

[https://github.com/jeremylong/DependencyCheck/issues/5912#issuecomment-1699387994]
 

> Handle Netty CVE-2023-4586
> --
>
> Key: ZOOKEEPER-4755
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>
> The {{dependency-check:check}}... check currently fails with the following:
> {noformat}
> [ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5)
> {noformat}
> According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , 
> CVE-2023-4586 is reserved.  No fix or additional information is available as 
> of the creation of this ticket.
> We have to:
> # Temporarily suppress the check;
> # Monitor CVE-2023-4586 and apply the remediation as soon as it becomes 
> available.





[jira] [Created] (ZOOKEEPER-4755) Handle Netty CVE-2023-4586

2023-10-03 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4755:
--

 Summary: Handle Netty CVE-2023-4586
 Key: ZOOKEEPER-4755
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4755
 Project: ZooKeeper
  Issue Type: Task
Reporter: Damien Diederen
Assignee: Damien Diederen


The {{dependency-check:check}}... check currently fails with the following:

{noformat}
[ERROR] netty-handler-4.1.94.Final.jar: CVE-2023-4586(6.5)
{noformat}

According to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-4586 , 
CVE-2023-4586 is reserved.  No fix or additional information is available as of 
the creation of this ticket.

We have to:

# Temporarily suppress the check;
# Monitor CVE-2023-4586 and apply the remediation as soon as it becomes 
available.






[jira] [Resolved] (ZOOKEEPER-4754) Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900

2023-10-03 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4754.

Fix Version/s: 3.7.2
   3.9.1
   3.8.3
   Resolution: Fixed

Issue resolved by pull request 2074
[https://github.com/apache/zookeeper/pull/2074]

> Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900
> 
>
> Key: ZOOKEEPER-4754
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4754
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.2, 3.9.1, 3.8.3
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (ZOOKEEPER-4754) Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and CVE-2023-41900

2023-10-03 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4754:
--

 Summary: Update Jetty to avoid CVE-2023-36479, CVE-2023-40167, and 
CVE-2023-41900
 Key: ZOOKEEPER-4754
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4754
 Project: ZooKeeper
  Issue Type: Task
Reporter: Damien Diederen
Assignee: Damien Diederen








[jira] [Resolved] (ZOOKEEPER-4751) Update snappy-java to 1.1.10.5 to address CVE-2023-43642

2023-10-03 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4751.

Fix Version/s: 3.7.2
   3.8.3
   3.9.1
 Assignee: Damien Diederen
   Resolution: Fixed

> Update snappy-java to 1.1.10.5 to address CVE-2023-43642
> 
>
> Key: ZOOKEEPER-4751
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4751
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Lari Hotari
>Assignee: Damien Diederen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.7.2, 3.8.3, 3.9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> snappy-java 1.1.10.1 contains CVE-2023-43642 . Upgrade the dependency to 
> 1.1.10.5 to get rid of the CVE.
>  





[jira] [Created] (ZOOKEEPER-4753) Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth

2023-10-03 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4753:
--

 Summary: Explicit handling of DIGEST-MD5 vs GSSAPI in quorum auth
 Key: ZOOKEEPER-4753
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4753
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.9.0
Reporter: Damien Diederen
Assignee: Damien Diederen


The SASL-based quorum authorizer does not explicitly distinguish between the 
DIGEST-MD5 and GSSAPI mechanisms: it simply relies on {{NameCallback}} and 
{{PasswordCallback}} for authentication with the former and examines Kerberos 
principals in {{AuthorizeCallback}} for the latter.

It turns out that some SASL/DIGEST-MD5 configurations cause authentication and 
authorization IDs not to match the expected format, and the DIGEST-MD5-based 
portions of the quorum test suite to fail with obscure errors. (They can be 
traced to failures to join the quorum, but only by looking into detailed logs.)

We can use the login module name to determine whether DIGEST-MD5 or GSSAPI is 
used, and relax the authentication ID check for the former.  As a cleanup, we 
can keep the password-based credential map empty when Kerberos principals are 
expected.  Finally, we can adapt tests to ensure "weirdly-shaped" credentials 
only cause authentication failures in the GSSAPI case.
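
A minimal sketch of the mechanism-selection idea described above, assuming the Kerberos login module name is the discriminator (class and method names here are illustrative, not the actual ZooKeeper implementation):

```java
// Sketch: derive the quorum SASL mechanism from the configured JAAS
// login module, and relax the authentication-ID check for DIGEST-MD5.
// QuorumAuthSketch and its methods are hypothetical names.
public class QuorumAuthSketch {
    static final String KRB5_LOGIN_MODULE =
        "com.sun.security.auth.module.Krb5LoginModule";

    // GSSAPI is assumed when the Kerberos login module is configured;
    // anything else falls back to DIGEST-MD5.
    static boolean isDigestMd5(String loginModuleName) {
        return !KRB5_LOGIN_MODULE.equals(loginModuleName);
    }

    // The strict authentication-ID equality check only applies to GSSAPI,
    // where IDs are full Kerberos principals.
    static boolean authIdAcceptable(String loginModuleName,
                                    String authcId, String authzId) {
        if (isDigestMd5(loginModuleName)) {
            return true;  // relaxed: DIGEST-MD5 IDs vary by configuration
        }
        return authcId.equals(authzId);
    }

    public static void main(String[] args) {
        System.out.println(authIdAcceptable(KRB5_LOGIN_MODULE,
                "zk/host@REALM", "zk/host@REALM"));
    }
}
```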





[jira] [Commented] (ZOOKEEPER-4689) Node may not be accessible due to the inconsistent ACL reference map after SNAP sync (again)

2023-08-23 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758105#comment-17758105
 ] 

Damien Diederen commented on ZOOKEEPER-4689:


Hi [~adamyi],

This is indeed a very critical issue, as the corruption can spread from member 
to member!

I initially preferred solution 2 from the ticket description (the one 
tentatively implemented in https://github.com/apache/zookeeper/pull/1997), but 
given the difficulties encountered, and [~kezhuw]'s suggestion of never 
removing the ACL entry that {{aclIndex}} points to, I am reconsidering. Are we 
missing something?

We would also like to add some kind of (optional) "fsck" pass which 
sanity-checks the tree before the service starts, to prevent this and other 
kinds of corruption from spreading, but that can be implemented in a followup 
ticket.


> Node may not be accessible due to the inconsistent ACL reference map after 
> SNAP sync (again)
> --
>
> Key: ZOOKEEPER-4689
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4689
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.7.0, 3.8.0
>Reporter: Adam Yi
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In Zookeeper, we take a "fuzzy snapshot": we don't make a copy of the 
> DataTree or grab a lock when serializing it. Instead, we note down the zxid 
> when we start serializing the DataTree, serialize it while it's being 
> mutated, and, after deserializing it, replay the transactions issued after 
> the snapshot started. The idea is that those transactions should be 
> idempotent.
> Zookeeper also implements its own interned ACLs. It keeps a [long -> ACL] map 
> and stores the `long` in each node, as nodes tend to share the same ACL.
> When serializing DataTree, we first serialize the ACL cache and then 
> serialize the nodes. It's possible that with the following sequence, a node 
> points to an invalid ACL entry:
> 1. Serialize ACL
> 2. Create node with new ACL
> 3. Serialize node
> ZOOKEEPER-3306 fixes this by making sure to insert the ACL into the cache 
> when `DataTree.createNode` is called while replaying transactions and the 
> node already exists. However, it only inserts it into the cache; it does not 
> set the interned ACL in the node to point to the new entry.
> It's possible that the longval we get for the ACL is inconsistent, even 
> though we follow the same zxid ordering of events. Specifically, we keep an 
> aclIndex pointing to the max entry that currently exists and increment it 
> whenever we need to intern a new ACL we've never seen before.
> With ZOOKEEPER-2214, we started to do reference counting in ACL cache and 
> remove no-longer used entries from the cache. 
> Say the current aclIndex is 10. If we create a node with an ACL unseen before 
> and delete that node, aclIndex will increment to 11. However, when we 
> deserialize the tree, we'll set aclIndex to the max existing ACL cache entry, 
> so it's reverted back to 10. aclIndex inconsistency on its own is fine, but 
> it causes problems for the ZOOKEEPER-3306 patch.
> Now if we follow the same scenario mentioned in ZOOKEEPER-3306:
>  # Leader creates ACL entry 11 and delete it due to node deletion
>  # Server A starts to have snap sync with leader
>  # After serializing the ACL map to Server A, there is a txn T1 creating a 
> node N1 with a new ACL_1 which did not exist in the ACL map
>  # On the leader, after this txn, the ACL map will be 12 -> (ACL_1, COUNT: 1), 
> and the data tree will have N1 -> 12
>  # On server A, the fuzzy snapshot will have an ACL map with max ID 10, and 
> N1 -> 12
>  # When replaying the txn T1, it will add 11 -> (ACL_1, COUNT: 1) to the ACL 
> cache but the node N1 still points to 12.
> N1 still points to invalid ACL entry.
> There are two ways to fix this:
>  # Make aclIndex consistent upon deserialization (by either serializing it 
> in the snapshot or taking care to decrement it when removing cache entries)
>  # Fix the ZOOKEEPER-3306 patch so that we also override the node's ACL to 
> the new key if the previous entry does not exist in the ACL table.
>  
> I think solution 2 is nicer, as aclIndex inconsistency itself is not a 
> problem. With solution 1, we're still implicitly depending on aclIndex 
> consistency and the ordering of events; it's harder to reason about and seems 
> more fragile than solution 2.
> I'm going to send a patch for solution 2, but please let me know if you 
> disagree and I'm happy to go with solution 1 instead.
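
The interned-ACL bookkeeping and the proposed solution 2 can be sketched with a toy model (all names here are illustrative; this is not the actual DataTree or ACL-cache code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the interned-ACL cache: solution 2 means that, when a
// replayed create references an ACL, the node is pointed at whatever id
// the LOCAL cache assigns, not the id recorded on the leader.
public class AclCacheSketch {
    private final Map<Long, String> longToAcl = new HashMap<>();
    private final Map<String, Long> aclToLong = new HashMap<>();
    private long aclIndex = 0;  // max id ever handed out locally

    // Intern an ACL, reusing the existing id when the ACL is known.
    long intern(String acl) {
        Long id = aclToLong.get(acl);
        if (id == null) {
            id = ++aclIndex;
            aclToLong.put(acl, id);
            longToAcl.put(id, acl);
        }
        return id;
    }

    // Replay of a create txn: ignore the id the leader's cache produced
    // (idOnLeader) and use the local id, so the node never points at a
    // missing cache entry.
    long replayCreate(String acl, long idOnLeader) {
        return intern(acl);
    }
}
```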





[jira] [Created] (ZOOKEEPER-4725) TTL node creations do not appear in audit log

2023-07-26 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4725:
--

 Summary: TTL node creations do not appear in audit log
 Key: ZOOKEEPER-4725
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4725
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.8.2
Reporter: Damien Diederen
Assignee: Damien Diederen


{{AuditHelper.addAuditLog}} ignores the {{createTTL}} opcode outside of 
{{multi}} transactions, resulting in missing audit log entries.
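
The gap can be illustrated with a sketch of an opcode-to-audit-action mapping (the switch itself is hypothetical; the opcode values are believed to match {{ZooDefs.OpCode}} in recent releases, but verify against your source tree):

```java
// Sketch: audit-action lookup that covers create/create2; the
// createTTL case is the one the fix has to add.
public class AuditSketch {
    static final int OP_CREATE = 1;      // ZooDefs.OpCode.create
    static final int OP_CREATE2 = 15;    // ZooDefs.OpCode.create2
    static final int OP_CREATE_TTL = 21; // ZooDefs.OpCode.createTTL

    static String auditAction(int opCode) {
        switch (opCode) {
            case OP_CREATE:
            case OP_CREATE2:
            case OP_CREATE_TTL:  // previously missing: no audit entry
                return "create";
            default:
                return null;     // not an audited operation in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(auditAction(OP_CREATE_TTL));
    }
}
```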





[jira] [Resolved] (ZOOKEEPER-4026) CREATE2 requests embedded in a MULTI request only get a regular CREATE response

2023-06-21 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4026.

Fix Version/s: 3.7.2
   3.9.0
   3.8.2
   Resolution: Fixed

Issue resolved by pull request 1978
[https://github.com/apache/zookeeper/pull/1978]

> CREATE2 requests embedded in a MULTI request only get a regular CREATE response
> --
>
> Key: ZOOKEEPER-4026
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4026
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.8, 3.6.2
> Environment: Tested with official docker hub images of the server and 
> a python Zookeeper client (Kazoo, http://github.com/python-zk/kazoo)
>Reporter: Charles-Henri de Boysson
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.2, 3.9.0, 3.8.2
>
> Attachments: MULTI_CREATE2_bug.txt
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When making a MULTI request with a CREATE2 payload, the reply from the server 
> only contains a regular CREATE response (the path but without the stat data).
>  
> See attachment for a capture and decode of the request/reply.
>  
> How to reproduce:
>  * Connect to the ensemble
>  * Make a MULTI (OpCode 14) request with a CREATE2 operation (OpCode 15)
>  * The reply from the server is success and the znode is created, but the 
> MULTI reply contains a CREATE (OpCode 1)
>  





[jira] [Created] (ZOOKEEPER-4614) Network event can cause C Client to "forget" to SASL-authenticate

2022-09-22 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4614:
--

 Summary: Network event can cause C Client to "forget" to 
SASL-authenticate
 Key: ZOOKEEPER-4614
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4614
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.7.1, 3.8.0, 3.7.0
Reporter: Damien Diederen
Assignee: Damien Diederen


A network hiccup occurring during the very last step of a SASL authentication 
sequence can cause the C client to "forget" to authenticate the next connection 
because an internal flag is not properly reset.






[jira] [Commented] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0

2022-02-27 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498641#comment-17498641
 ] 

Damien Diederen commented on ZOOKEEPER-4337:


Hi [~danielma-2020],

No idea, but 3.8.0 might get there first. See this recent discussion on the 
{{-dev}} mailing list:

[https://lists.apache.org/thread/80kjmk6kvp51k99nwvswdzcg5w1wr1jk]

(I am afraid I cannot volunteer at this point, as I am already overloaded by 
other obligations.)

HTH, -D

> CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
> ---
>
> Key: ZOOKEEPER-4337
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.7.0, 3.8.0
>Reporter: Dominique Mongelli
>Assignee: Damien Diederen
>Priority: Major
>  Labels: cve, pull-request-available, security
> Fix For: 3.5.10, 3.8.0, 3.7.1, 3.6.4
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Hi, our security tool detects the following CVE on zookeeper 3.7.0 :
> [https://nvd.nist.gov/vuln/detail/CVE-2021-34429]
>  
>  
> {noformat}
> For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs 
> can be crafted using some encoded characters to access the content of the 
> WEB-INF directory and/or bypass some security constraints. This is a 
> variation of the vulnerability reported in 
> CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat}
>  
> It is a vulnerability related to jetty jar in version 
> {{9.4.38.v20210224.jar}}.
> Here is the security advisory from jetty: 
> https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm
> The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should 
> be done.
>  
>  





[jira] [Assigned] (ZOOKEEPER-4479) Tests: C client test TestOperations.cc testTimeoutCausedByWatches1 is very flaky on CI

2022-02-24 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen reassigned ZOOKEEPER-4479:
--

Assignee: Damien Diederen

> Tests: C client test TestOperations.cc testTimeoutCausedByWatches1 is very 
> flaky on CI
> --
>
> Key: ZOOKEEPER-4479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4479
> Project: ZooKeeper
>  Issue Type: Task
>  Components: c client, tests
>Reporter: Enrico Olivelli
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This test is very annoying on CI. It is not using the real Java server, and 
> it fails very often:
> [exec] 
> /home/runner/work/zookeeper/zookeeper/zookeeper-client/zookeeper-client-c/tests/TestOperations.cc:296:
>  Assertion: equality assertion failed [Expected: 1, Actual : 0]





[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn containing too many ephemeral nodes causes cluster crash

2021-11-18 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446060#comment-17446060
 ] 

Damien Diederen commented on ZOOKEEPER-4306:


Hi [~zyu],

It's a nasty issue, and we've been using the {{closeSessionTxn.enabled = 
false}} workaround in the meantime.
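
For reference, that workaround is typically applied as a JVM system property; a hedged sketch (the property name follows the comment above, and the {{SERVER_JVMFLAGS}} mechanism assumes the stock {{zkServer.sh}} scripts — check your server version's admin guide before relying on it):

```shell
# Sketch: disable the consolidated close-session transaction on each
# server; verify the property name against your server version.
export SERVER_JVMFLAGS="-Dzookeeper.closeSessionTxn.enabled=false"
bin/zkServer.sh restart
```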

You have probably seen [my pull 
request|https://github.com/apache/zookeeper/pull/1716] regarding a possible 
"solution."  I should really open a thread on the {{dev}} mailing list and 
discuss it there one of these days, because somebody might have a better idea.  
(Feel free to beat me to it :)

It is still on my TODO list, but unfortunately not too close to the top.

Cheers, -D

> CloseSessionTxn containing too many ephemeral nodes causes cluster crash
> ---
>
> Key: ZOOKEEPER-4306
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Lin Changrui
>Priority: Critical
>  Labels: pull-request-available
> Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We ran a test to see how many ephemeral nodes a client can create under one 
> parent node with the default configuration. The test eventually crashed the 
> cluster; the exception stack traces look like this.
> follower:
> !f.jpg!
> leader:
> !l1.png!
> !l2.jpg!
> It seems that the leader sent a too-large txn packet to the followers. When a 
> follower tried to deserialize the txn, it found the txn length exceeded its 
> buffer size (default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). 
> That caused the followers to crash, and the leader, finding there were not 
> enough followers in sync, shut down later. While shutting down, the leader 
> called zkDb.fastForwardDataBase(), found the txn read from the txnlog also 
> exceeded its buffer size, and crashed too.
> After the servers crashed, they tried to restart the quorum, but they could 
> not succeed because the last txn was too large. We lost the logs from that 
> moment, but the stack trace is the same as this one.
> !r.jpg|width=1468,height=598!
>  
> *Root Cause*
> We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780) 
> to visualize this log and found this. !cs.jpg|width=1400,height=581! The 
> closeSessionTxn contains all of the session's ephemeral nodes as absolute 
> paths. We know we will get a large getChildren response if we create too many 
> child nodes under one parent node; that is limited by the client's 
> jute.maxbuffer. If we create plenty of ephemeral nodes under different parent 
> nodes within one session, it may not exceed the client's buffer, but when the 
> session closes without those nodes being deleted first, it can crash the 
> cluster.
> Is this a bug or just an unspecified feature? If the latter, how should we 
> judge the upper limit on creating nodes? 
>  





[jira] [Commented] (ZOOKEEPER-4397) Zookeeper crashes: Unable to load database on disk java.io.IOException: Unreasonable length

2021-10-21 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432217#comment-17432217
 ] 

Damien Diederen commented on ZOOKEEPER-4397:


Hi [~ryan.ren],

Do you have clients which create large numbers of (i.e., are leaking) ephemeral 
nodes?  If so, this may be a duplicate of ZOOKEEPER-4306.

HTH, -D

> Zookeeper crashes: Unable to load database on disk java.io.IOException: 
> Unreasonable length
> ---
>
> Key: ZOOKEEPER-4397
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4397
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: jute
>Affects Versions: 3.6.3
> Environment: Linux  
> OpenJDK 1.8
>Reporter: Ryan
>Priority: Major
> Attachments: Error_snapshot.jpg
>
>
> After running for a while, the entire cluster (3 zookeeper) crash suddenly 
> ERROR-[main:QuorumPeer@1148]- Unable to load database on disk 
> java.io.IOException: Unreasonable length = 3015236
>   at 
> org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
>   
> at.org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)





[jira] [Resolved] (ZOOKEEPER-4382) Update Maven Bundle Plugin in order to allow builds on JDK18

2021-10-06 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4382.

Fix Version/s: 3.6.4
   3.7.1
   Resolution: Fixed

Issue resolved by pull request 1760
[https://github.com/apache/zookeeper/pull/1760]

> Update Maven Bundle Plugin in order to allow builds on JDK18
> 
>
> Key: ZOOKEEPER-4382
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4382
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.8.0
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On JDK18, the zookeeper build fails with a ConcurrentModificationException.
> The fix is to update the plugin to the latest version:
> [ERROR] Failed to execute goal 
> org.apache.felix:maven-bundle-plugin:4.1.0:bundle (build bundle) on project 
> zookeeper-jute: Execution build bundle of goal 
> org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed.: 
> ConcurrentModificationException -> [Help 
> 1]org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.felix:maven-bundle-plugin:4.1.0:bundle (build bundle) on 
> project zookeeper-jute: Execution build bundle of goal 
> org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed.
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:215)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:156)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:148)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:117)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:81)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:56)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:77)
> at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:568)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:282)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
> (Launcher.java:225)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
> (Launcher.java:406)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main 
> (Launcher.java:347)
> Caused by: org.apache.maven.plugin.PluginExecutionException: Execution build 
> bundle of goal org.apache.felix:maven-bundle-plugin:4.1.0:bundle failed.
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
> (DefaultBuildPluginManager.java:148)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:210)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:156)





[jira] [Resolved] (ZOOKEEPER-4380) Avoid NPE in RateLogger#rateLimitLog

2021-09-22 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4380.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1758
[https://github.com/apache/zookeeper/pull/1758]

> Avoid NPE in RateLogger#rateLimitLog
> 
>
> Key: ZOOKEEPER-4380
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4380
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Wenjun Ruan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A null {{newMsg}} causes an NPE in {{newMsg.equals(msg)}}:
> {code:java}
> /**
>  * In addition to the message, it also takes a value.
>  */
> public void rateLimitLog(String newMsg, String value) {
> long now = Time.currentElapsedTime();
> if (newMsg.equals(msg)) {
> ++count;
> this.value = value;
> if (now - timestamp >= LOG_INTERVAL) {
> flush();
> msg = newMsg;
> timestamp = now;
> this.value = value;
> }
> } else {
> flush();
> msg = newMsg;
> this.value = value;
> timestamp = now;
> LOG.warn("Message:{} Value:{}", msg, value);
> }
> }
> {code}
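
A null-tolerant comparison avoids the NPE; a minimal sketch of the idea (not the actual fix merged for this ticket):

```java
import java.util.Objects;

// Sketch: replace newMsg.equals(msg) with a null-safe comparison so a
// null message neither throws nor corrupts the rate-limiting state.
public class RateLoggerSketch {
    private String msg;  // last message seen; null initially

    boolean sameMessage(String newMsg) {
        // Objects.equals tolerates null on either side.
        return Objects.equals(newMsg, msg);
    }

    void remember(String newMsg) {
        msg = newMsg;
    }
}
```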





[jira] [Resolved] (ZOOKEEPER-4360) Avoid NPE during metrics execution if the leader is not set on a FOLLOWER node

2021-09-02 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4360.

Fix Version/s: 3.8.0
   Resolution: Fixed

Issue resolved by pull request 1743
[https://github.com/apache/zookeeper/pull/1743]

> Avoid NPE during metrics execution if the leader is not set on a FOLLOWER 
> node 
> ---
>
> Key: ZOOKEEPER-4360
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4360
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: metric system
>Affects Versions: 3.6.2
>Reporter: Nicoló Boschi
>Priority: Major
>  Labels: metrics, pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On a follower node, we had this error
> {code}
> ago 20, 2021 1:46:28 PM org.apache.catalina.core.StandardWrapperValve invoke
> GRAVE: Servlet.service() for servlet [metrics] in context with path 
> [/metrics] threw exception
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.zookeeper.server.quorum.Leader.getProposalStats()" because the 
> return value of 
> "org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.getLeader()" is null
> at 
> org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.lambda$registerMetrics$5(LeaderZooKeeperServer.java:122)
> at 
> magnews.zookeeper.ZooKeeperMetricsProviderAdapter$MetricsContextImpl.lambda$registerGauge$0(ZooKeeperMetricsProviderAdapter.java:91)
> {code}
> Unfortunately, I'm not able to reproduce this error deterministically
>  





[jira] [Resolved] (ZOOKEEPER-4343) OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6

2021-09-01 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4343.

Fix Version/s: 3.8.0
   Resolution: Fixed

Issue resolved by pull request 1735
[https://github.com/apache/zookeeper/pull/1735]

> OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6
> 
>
> Key: ZOOKEEPER-4343
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4343
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.8.0
>Reporter: Damien Diederen
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0,0': 
> [ERROR] 
> [ERROR] commons-io-2.6.jar: CVE-2021-29425
> [ERROR] 
> [ERROR] See the dependency-check report for more details.
> {noformat}
> The issue is fixed in release 2.7:
> 
> - https://nvd.nist.gov/vuln/detail/CVE-2021-29425
> - https://issues.apache.org/jira/browse/IO-556
> - https://issues.apache.org/jira/browse/IO-559
> - https://commons.apache.org/proper/commons-io/changes-report.html#a2.7





[jira] [Resolved] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0

2021-09-01 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4337.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1734
[https://github.com/apache/zookeeper/pull/1734]

> CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
> ---
>
> Key: ZOOKEEPER-4337
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.8.0
>Reporter: Dominique Mongelli
>Assignee: Damien Diederen
>Priority: Major
>  Labels: cve, pull-request-available, security
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Hi, our security tool detects the following CVE on zookeeper 3.7.0 :
> [https://nvd.nist.gov/vuln/detail/CVE-2021-34429]
>  
>  
> {noformat}
> For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs 
> can be crafted using some encoded characters to access the content of the 
> WEB-INF directory and/or bypass some security constraints. This is a 
> variation of the vulnerability reported in 
> CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat}
>  
> It is a vulnerability related to jetty jar in version 
> {{9.4.38.v20210224.jar}}.
> Here is the security advisory from jetty: 
> https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm
> The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should 
> be done.
>  
>  





[jira] [Resolved] (ZOOKEEPER-3807) fix the bad format when website pages build due to bash marker

2021-09-01 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-3807.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Duplicate

Closing this (valiant) attempt in favor of ZOOKEEPER-4356 and 
[#1741|https://github.com/apache/zookeeper/pull/1741#pullrequestreview-742595992],
 as acknowledged by [~maoling].



> fix the bad format when website pages build due to bash marker
> --
>
> Key: ZOOKEEPER-3807
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3807
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.7.0, 3.6.1
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ZOOKEEPER-4274) Flaky test - RequestThrottlerTest.testLargeRequestThrottling

2021-08-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4274.

  Assignee: Damien Diederen
Resolution: Duplicate

> Flaky test - RequestThrottlerTest.testLargeRequestThrottling
> 
>
> Key: ZOOKEEPER-4274
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4274
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.2
>Reporter: Amichai Rothman
>Assignee: Damien Diederen
>Priority: Minor
>
> This test occasionally fails. e.g. in 
> [https://github.com/apache/zookeeper/pull/1672/checks?check_run_id=2265118964].
>  A bit hard to recreate, but it pops up again eventually.





[jira] [Resolved] (ZOOKEEPER-4226) Flaky test: RequestThrottlerTest.testLargeRequestThrottling

2021-08-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4226.

  Assignee: Damien Diederen
Resolution: Duplicate

> Flaky test: RequestThrottlerTest.testLargeRequestThrottling
> ---
>
> Key: ZOOKEEPER-4226
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4226
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Reporter: Ling Mao
>Assignee: Damien Diederen
>Priority: Minor
>
> {code:java}
> ERROR] Failures: 
> 943[ERROR]   RequestThrottlerTest.testLargeRequestThrottling:297 expected: 
> <2> but was: <0>
> 944[INFO] 
> 945[ERROR] Tests run: 2901, Failures: 1, Errors: 0, Skipped: 4
> {code}
> URL: 
> https://github.com/apache/zookeeper/pull/1608/checks?check_run_id=1953408348





[jira] [Assigned] (ZOOKEEPER-4327) Flaky test: RequestThrottlerTest

2021-08-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen reassigned ZOOKEEPER-4327:
--

Assignee: Damien Diederen

> Flaky test: RequestThrottlerTest
> 
>
> Key: ZOOKEEPER-4327
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4327
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ling Mao
>Assignee: Damien Diederen
>Priority: Major
>
> URL: 
> https://github.com/apache/zookeeper/pull/1702/checks?check_run_id=2848599299
> {code:java}
> [ERROR] Failures: 
> 947[ERROR]   
> RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340
>  expected: <3> but was: <4>
> 948[INFO] 
> 949[ERROR] Tests run: 2913, Failures: 1, Errors: 0, Skipped: 4
> {code}
> ===
>  
>  URL: 
> [https://github.com/apache/zookeeper/pull/1709/checks?check_run_id=2884777341]
> {code:java}
> [INFO] 
> 948[INFO] Results:
> 949[INFO] 
> 950[ERROR] Failures: 
> 951[ERROR]   
> RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340
>  expected: <3> but was: <7>
> 952[ERROR]   RequestThrottlerTest.testLargeRequestThrottling:299 expected: 
> <5> but was: <4>
> 953[INFO] 
> 954[ERROR] Tests run: 2913, Failures: 2, Errors: 0, Skipped: 4
> {code}





[jira] [Updated] (ZOOKEEPER-4327) Flaky test: RequestThrottlerTest

2021-08-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4327:
---
Parent: ZOOKEEPER-3170
Issue Type: Sub-task  (was: Bug)

> Flaky test: RequestThrottlerTest
> 
>
> Key: ZOOKEEPER-4327
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4327
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ling Mao
>Priority: Major
>
> URL: 
> https://github.com/apache/zookeeper/pull/1702/checks?check_run_id=2848599299
> {code:java}
> [ERROR] Failures: 
> 947[ERROR]   
> RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340
>  expected: <3> but was: <4>
> 948[INFO] 
> 949[ERROR] Tests run: 2913, Failures: 1, Errors: 0, Skipped: 4
> {code}
> ===
>  
>  URL: 
> [https://github.com/apache/zookeeper/pull/1709/checks?check_run_id=2884777341]
> {code:java}
> [INFO] 
> 948[INFO] Results:
> 949[INFO] 
> 950[ERROR] Failures: 
> 951[ERROR]   
> RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340
>  expected: <3> but was: <7>
> 952[ERROR]   RequestThrottlerTest.testLargeRequestThrottling:299 expected: 
> <5> but was: <4>
> 953[INFO] 
> 954[ERROR] Tests run: 2913, Failures: 2, Errors: 0, Skipped: 4
> {code}





[jira] [Created] (ZOOKEEPER-4343) OWASP Dependency-Check fails with CVE-2021-29425, commons-io-2.6

2021-08-05 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4343:
--

 Summary: OWASP Dependency-Check fails with CVE-2021-29425, 
commons-io-2.6
 Key: ZOOKEEPER-4343
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4343
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.8.0
Reporter: Damien Diederen
Assignee: Damien Diederen


{noformat}
[ERROR] One or more dependencies were identified with vulnerabilities that have 
a CVSS score greater than or equal to '0,0': 
[ERROR] 
[ERROR] commons-io-2.6.jar: CVE-2021-29425
[ERROR] 
[ERROR] See the dependency-check report for more details.
{noformat}

The issue is fixed in release 2.7:

- https://nvd.nist.gov/vuln/detail/CVE-2021-29425
- https://issues.apache.org/jira/browse/IO-556
- https://issues.apache.org/jira/browse/IO-559
- https://commons.apache.org/proper/commons-io/changes-report.html#a2.7





[jira] [Updated] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0

2021-08-05 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4337:
---
Affects Version/s: 3.8.0

> CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
> ---
>
> Key: ZOOKEEPER-4337
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.8.0
>Reporter: Dominique Mongelli
>Assignee: Damien Diederen
>Priority: Major
>  Labels: cve, pull-request-available, security
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi, our security tool detects the following CVE on zookeeper 3.7.0 :
> [https://nvd.nist.gov/vuln/detail/CVE-2021-34429]
>  
>  
> {noformat}
> For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs 
> can be crafted using some encoded characters to access the content of the 
> WEB-INF directory and/or bypass some security constraints. This is a 
> variation of the vulnerability reported in 
> CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat}
>  
> It is a vulnerability related to jetty jar in version 
> {{9.4.38.v20210224.jar}}.
> Here is the security advisory from jetty: 
> https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm
> The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should 
> be done.
>  
>  





[jira] [Assigned] (ZOOKEEPER-4337) CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0

2021-08-05 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen reassigned ZOOKEEPER-4337:
--

Assignee: Damien Diederen

> CVE-2021-34429 in jetty 9.4.38.v20210224 in zookeeper 3.7.0
> ---
>
> Key: ZOOKEEPER-4337
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4337
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Dominique Mongelli
>Assignee: Damien Diederen
>Priority: Major
>  Labels: cve, security
>
> Hi, our security tool detects the following CVE on zookeeper 3.7.0 :
> [https://nvd.nist.gov/vuln/detail/CVE-2021-34429]
>  
>  
> {noformat}
> For Eclipse Jetty versions 9.4.37-9.4.42, 10.0.1-10.0.5 & 11.0.1-11.0.5, URIs 
> can be crafted using some encoded characters to access the content of the 
> WEB-INF directory and/or bypass some security constraints. This is a 
> variation of the vulnerability reported in 
> CVE-2021-28164/GHSA-v7ff-8wcx-gmc5.{noformat}
>  
> It is a vulnerability related to jetty jar in version 
> {{9.4.38.v20210224.jar}}.
> Here is the security advisory from jetty: 
> https://github.com/eclipse/jetty.project/security/advisories/GHSA-vjv5-gp2w-65vm
> The CVE has been fixed in 9.4.43, 10.0.6, 11.0.6. An upgrade to 9.4.43 should 
> be done.
>  
>  





[jira] [Resolved] (ZOOKEEPER-4341) Gia Lai - An Overview

2021-08-05 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4341.

Resolution: Invalid

> Gia Lai - An Overview
> -
>
> Key: ZOOKEEPER-4341
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4341
> Project: ZooKeeper
>  Issue Type: Test
>  Components: jute
>Affects Versions: 3.5.3
> Environment: [Gia Lai|https://digialai.com] has a tropical monsoon 
> climate with two seasons: rainy season from May to November and dry season 
> from December to April of the following year. The average yearly temperature 
> ranges from 21oC to 25oC. The average annual rainfall in the western Truong 
> Son region is 2,200-2,500mm, whereas in the eastern Truong Son region it is 
> 1,200-1,750mm.
> *Tourism Development Possibility*
> Gia Lai is the starting point for many of the coastal domain's and Cambodia's 
> river systems, such as the River Ba, the River Se San, and other streams. Gia 
> Lai area contains many lakes, rapids, passes, and primeval woods that create 
> beautiful and lyrical natural vistas, as well as the strong primal wildness 
> of the Central Highlands' mountains and forests. It is the Kon Ka Kinh and 
> Kon Cha Rang rainforest, which is home to many unique species; the Chu Prong 
> district's Wild Chong Khoeng waterfall.
> In Chu Se district, there lies the lyrical Phu Cuong waterfall. There are 
> many beautiful rivers such as Da Trang stream, Mo stream, and other 
> picturesque sites such as "Mong" wharf on the Pa river, Bien Ho (To Nung 
> lake) on the broad and calm sea - Ham Rong mountain is 1,092m high and the 
> summit is an extinct volcano.
>Reporter: Harmony Kae
>Priority: Major
> Fix For: 3.2.3
>
> Attachments: tin-gia-lai-tin-nhanh-gia-lai.jpg
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> [Gia Lai|https://digialai.com] is a hilly border province in the northwestern 
> Central Highlands, at an elevation of 600-800 meters above sea level. [Gia 
> Lai|https://digialai.com] is bordered to the north by Kon Tum province, to 
> the south by Dak Lak province, to the west by Cambodia with a national 
> boundary of 90 kilometers, and to the east by Quang Ngai, Binh Dinh, and Phu 
> Yen provinces.





[jira] [Created] (ZOOKEEPER-4342) Robustify C client against errors during SASL negotiation

2021-08-04 Thread Damien Diederen (Jira)
Damien Diederen created ZOOKEEPER-4342:
--

 Summary: Robustify C client against errors during SASL negotiation
 Key: ZOOKEEPER-4342
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4342
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.7.0, 3.8.0
Reporter: Damien Diederen
Assignee: Damien Diederen


1. The current client is ignoring the error field of the response header, and 
only considering SASL-level errors when processing a SASL response.

2. Such errors cause a double-free of the input buffer, which crashes the 
application.






[jira] [Commented] (ZOOKEEPER-4334) SASL authentication fails when using host aliases

2021-07-29 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389792#comment-17389792
 ] 

Damien Diederen commented on ZOOKEEPER-4334:


[~ekleszcz] wrote:

bq. that won't solve the problem as the change considers only the SASL auth 
between the quorum members and my case regards the Java client to server auth.

Ah, right; I just spotted "the quorum member's saslToken is null," saw that you 
were using keytabs, assumed this was about quorum auth, and thought I'd mention 
ZOOKEEPER-4030.

bq. I have just discovered the extra flag: 
{{zookeeper.sasl.client.canonicalize.hostname}}. This means that by default we 
have to strictly use the canonical names for the principals. What I would like 
to achieve instead is to define the aliases in the principals. \[…\] Tested and 
it keeps failing \[…\]

Right. As [~eolivelli] mentions, Kerberos implementations tend to be bound to 
"real" names, as returned by reverse DNS resolution.

ZooKeeper \(client-to-server, and now server-to-server) supports referencing 
members using aliases, but the correct tickets still have to be provided.

My understanding is that this is a Kerberos limitation, not a ZooKeeper issue. 
You are of course welcome to suggest a workaround if you find one, but I would 
otherwise suggest amending or closing this ticket.


> SASL authentication fails when using host aliases
> -
>
> Key: ZOOKEEPER-4334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>
> I faced an issue while trying to use alternative aliases with Zookeeper 
> quorum when SASL is enabled. The errors I get in zookeeper log are the 
> following:
>  ```
>  2021-07-12 21:04:46,437 [myid:3] - WARN 
> [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /:37368 failed to 
> SASL authenticate: {}
>  javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum 
> failed)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
>  at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
>  at 
> org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
>  at 
> org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
>  at 
> org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
>  at 
> org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
>  at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
>  at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>  at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Checksum failed)
>  at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
>  at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
>  at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
>  ... 11 more
>  Caused by: KrbException: Checksum failed
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
>  at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
>  at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
>  at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
>  at 
> sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
>  at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
>  ... 14 more
>  Caused by: java.security.GeneralSecurityException: Checksum failed
>  at 
> sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
>  at 
> sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
>  at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
>  ... 20 more
>  ```
> What did I do?
>  1) created host aliases for each quorum node (a,b,c): zk1, zk2, zk3
>  2) Changed in zoo.cfg:
>  changed from
>  

[jira] [Resolved] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-07-29 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4211.

Fix Version/s: 3.8.0
   Resolution: Fixed

Issue resolved by pull request 1644
[https://github.com/apache/zookeeper/pull/1644]

> Expose Quota Metrics to Prometheus
> --
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> In 3.7, Quota limit can be enforced and the quota related stats are captured 
> in the StatsTrack. From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> As an addition to that, we would like to collect the quota metrics and expose 
> them to the Prometheus for the following:
> 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard
> 2. Creating alert based on the quota levels (e.g. 90% used)





[jira] [Resolved] (ZOOKEEPER-4204) Flaky test - RequestPathMetricsCollectorTest.testMultiThreadPerf

2021-07-28 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4204.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1598
[https://github.com/apache/zookeeper/pull/1598]

> Flaky test - RequestPathMetricsCollectorTest.testMultiThreadPerf
> 
>
> Key: ZOOKEEPER-4204
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4204
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.8.0
>Reporter: Amichai Rothman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This test sometimes fails on a laptop. Timed performance tests in unit tests 
> can be problematic in general due to the variety of hardware it might run on, 
> but I have a little fix that reduces the test overhead and tightens the 
> timing, so it's a good first step (and works for me).
>  
> org.opentest4j.AssertionFailedError: expected:  but was: 
>  at 
> org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testMultiThreadPerf(RequestPathMetricsCollectorTest.java:448)





[jira] [Resolved] (ZOOKEEPER-4333) QuorumSSLTest - testOCSP fails on JDK17

2021-07-28 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4333.

Resolution: Fixed

Issue resolved by pull request 1724
[https://github.com/apache/zookeeper/pull/1724]

> QuorumSSLTest - testOCSP fails on JDK17
> ---
>
> Key: ZOOKEEPER-4333
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4333
> Project: ZooKeeper
>  Issue Type: Test
>  Components: security, tests
>Affects Versions: 3.6.2
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> On JDK17 (early access), QuorumSSLTest#testOCSP fails because the JDK17 
> TLS client sends the OCSP request as a GET on the URI.
>  
> Previously, the OCSP request was sent inside the BODY of the HTTP request.
>  
> In order to fix the test, we have to fix our mock HTTP OCSP server (which is 
> part of the test suite, not ZooKeeper server code) so that it handles 
> this case.
> For reference:
> https://it.wikipedia.org/wiki/Online_Certificate_Status_Protocol
>  





[jira] [Commented] (ZOOKEEPER-4334) SASL authentication fails when using host aliases

2021-07-28 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388732#comment-17388732
 ] 

Damien Diederen commented on ZOOKEEPER-4334:


I added (some) support for canonicalization in ZOOKEEPER-4030, but the 
corresponding commit is only present in 3.7.0+.  Furthermore, it has to be 
activated via {{zookeeper.kerberos.canonicalizeHostNames}}.

With that support, it is possible to reference servers via  {{CNAME}} aliases, 
as long as the keytab contains the real names.  Would that actually solve your 
use-case?
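
As a sketch of what enabling that support could look like (the property name is quoted from the comment above; the host name and port below are hypothetical placeholders, not values from this ticket):

```shell
# Pass the canonicalization flag to the client JVM before starting zkCli.sh.
# The property name comes from the comment above; the server address is a
# hypothetical example.
CLIENT_JVMFLAGS='-Dzookeeper.kerberos.canonicalizeHostNames=true'
export CLIENT_JVMFLAGS

echo "$CLIENT_JVMFLAGS"   # prints -Dzookeeper.kerberos.canonicalizeHostNames=true
# zkCli.sh -server zk1.example.com:2181   # run against a real ensemble
```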

> SASL authentication fails when using host aliases
> -
>
> Key: ZOOKEEPER-4334
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4334
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>
> I faced an issue while trying to use alternative aliases with Zookeeper 
> quorum when SASL is enabled. The errors I get in zookeeper log are the 
> following:
>  ```
>  2021-07-12 21:04:46,437 [myid:3] - WARN 
> [NIOWorkerThread-3:ZooKeeperServer@1661] - Client /:37368 failed to 
> SASL authenticate: {}
>  javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum 
> failed)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
>  at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:49)
>  at 
> org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1650)
>  at 
> org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:1599)
>  at 
> org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:379)
>  at 
> org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:182)
>  at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:339)
>  at 
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>  at 
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Checksum failed)
>  at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:856)
>  at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
>  at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:167)
>  ... 11 more
>  Caused by: KrbException: Checksum failed
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
>  at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
>  at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:281)
>  at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
>  at 
> sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
>  at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:829)
>  ... 14 more
>  Caused by: java.security.GeneralSecurityException: Checksum failed
>  at 
> sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
>  at 
> sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
>  at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
>  at 
> sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
>  ... 20 more
>  ```
> What did I do?
>  1) created host aliases for each quorum node (a,b,c): zk1, zk2, zk3
>  2) Changed in zoo.cfg:
>  changed from
>  server.1=a
>  server.2=b
>  server.3=c
> to:
>  server.1=zk1
>  server.2=zk2
>  server.3=zk3
> (at this stage, after restarting the ensemble, all works as expected.)
>  3) Generate new keytab with alias-based principals and host-based principals 
> in zookeeper.keytab
>  4) Change jaas.conf (server) definition from:
>  Server
> { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true 
> keyTab="/etc/zookeeper/conf/zookeeper.keytab" storeKey=true 
> useTicketCache=false principal="zookeeper/a.com@COM"; }
> ;
> to
>  Server
> { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true 
> keyTab="/etc/zookeeper/conf/zookeeper.keytab" storeKey=true 
> useTicketCache=false principal="zookeeper/zk1.com@COM"; }
> ;
> From that moment, after restarting quorum 

[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes

2021-07-03 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374136#comment-17374136
 ] 

Damien Diederen commented on ZOOKEEPER-4332:


Hi [~ekleszcz],

Great!

Cheers, -D

> Cannot access children of znode that owns too many znodes
> -
>
> Key: ZOOKEEPER-4332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>  Labels: zookeeper
> Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot 
> 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png
>
>
> We experience problems with performing any operation (deleteall, get etc.) on 
> a znode that has too many child nodes. In our case, it's above 200k. At the 
> same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't 
> help. This should be either solved by limiting the number of direct znodes 
> allowed by a parameter or by adding a hard limit by default.
> I am attaching some screenshots of the commands and their results. What's 
> interesting the numbers from getAllChildrenNumber and stat (numChildren) 
> commands don't match.





[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372764#comment-17372764
 ] 

Damien Diederen commented on ZOOKEEPER-4332:


Hi Emil,

bq. However, my current problem is to clean up this state. I cannot do anything 
with this path as I explained.
Right; understood.

bq. Thus, 
"/rmstore/ZKRMStateRoot/RMAppRoot/application_1592399466874_18251/appattempt_159239946687417986_02"
 should be the longest. The length in bytes should be 10601166.

Okay.

bq. The extra dot is just my editorial mistake in the post, in zoo.cfg it is 
specified as expected: "jute.maxbuffer=4194304".

Okay—just making sure. But that still tells us something: the error you are 
experiencing is happening on the client side \(you are using {{zkCli.sh}}, 
aren't you?), but {{zoo.cfg}} is the server configuration\!

The easiest way to change *its* buffer size is, as far as I know, the 
{{CLIENT_JVMFLAGS}} environment variable. Could you try with something like 
this:

{code:bash}
CLIENT_JVMFLAGS='-Djute.maxbuffer=0x100'

export CLIENT_JVMFLAGS

zkCli.sh -server ${MY_CONN_STRING?}
{code}

HTH, \-D


> Cannot access children of znode that owns too many znodes
> -
>
> Key: ZOOKEEPER-4332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>  Labels: zookeeper
> Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot 
> 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png
>
>
> We experience problems with performing any operation (deleteall, get etc.) on 
> a znode that has too many child nodes. In our case, it's above 200k. At the 
> same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't 
> help. This should be either solved by limiting the number of direct znodes 
> allowed by a parameter or by adding a hard limit by default.
> I am attaching some screenshots of the commands and their results. What's 
> interesting the numbers from getAllChildrenNumber and stat (numChildren) 
> commands don't match.





[jira] [Commented] (ZOOKEEPER-4314) Can not get real exception when getChildren more than 4M

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372525#comment-17372525
 ] 

Damien Diederen commented on ZOOKEEPER-4314:


Hi [~qzballack],

I understand that this ticket is about the lack of diagnosis, but have linked 
the two other issues because they capture the root cause and potential 
workarounds.


> Can not get real exception when getChildren more than 4M
> 
>
> Key: ZOOKEEPER-4314
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4314
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Affects Versions: 3.6.0
>Reporter: Li Jian
>Priority: Major
> Attachments: zookeeper  problem.docx
>
>
> When zkClient or Curator listens on a ZooKeeper node holding more than 4M of 
> data, an endless loop results, because they catch the error code 
> KeeperException.Code.ConnectionLoss (-4) and retry the getChildren method 
> after a moment. The root cause is that ZooKeeper's readLength method raises 
> an IOException for "Packet len is out of range", but in the subsequent 
> processing ZooKeeper changes the real exception to ConnectionLoss. See 
> the attached file.





[jira] [Comment Edited] (ZOOKEEPER-4323) support compile and build C client in the Mac OS

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372512#comment-17372512
 ] 

Damien Diederen edited comment on ZOOKEEPER-4323 at 7/1/21, 8:16 AM:
-

Hi [~maoling],

Which version(s) does this ticket apply to? I fixed some Catalina compilation 
issues while preparing the 3.7.0 release, and [~eolivelli] 
[confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e]
 that he was successful in building the result on BigSur:

{quote}
* This time \(for the first time!) I was able to build the C client on MacOs 
\(BigSur) \!
{quote}

Did it get broken in the meantime? \(I don't have a mac at hand.)

P.-S. — Note that the tests are still badly broken, and that making them 
portable is a nontrivial undertaking.


was (Author: ztzg):
Hi [~maoling],

Which version(s) does this ticket apply to? I fixed some Catalina compilation 
issues while preparing the 3.7.0 release, and [~eolivelli] 
[confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e]
 that he was successful in building the result on BigSur:

{quote}
* This time \(for the first time!) I was able to build the C client on MacOs 
\(BigSur) \!
{quote}

Did it get broken in the meantime? \(I don't have a mac at hand.)

> support compile and build C client in the Mac OS
> 
>
> Key: ZOOKEEPER-4323
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4323
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Major
>






[jira] [Commented] (ZOOKEEPER-4323) support compile and build C client in the Mac OS

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372512#comment-17372512
 ] 

Damien Diederen commented on ZOOKEEPER-4323:


Hi [~maoling],

Which version(s) does this ticket apply to? I fixed some Catalina compilation 
issues while preparing the 3.7.0 release, and [~eolivelli] 
[confirmed|https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202103.mbox/%3cCACcefgd1sNLSs2__E=ZgpYBrCuRm02PeUz=pxrvzg4baz2c...@mail.gmail.com%3e]
 that he was successful in building the result on BigSur:

{quote}
* This time \(for the first time!) I was able to build the C client on MacOs 
\(BigSur) \!
{quote}

Did it get broken in the meantime? \(I don't have a mac at hand.)

> support compile and build C client in the Mac OS
> 
>
> Key: ZOOKEEPER-4323
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4323
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Major
>






[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372502#comment-17372502
 ] 

Damien Diederen commented on ZOOKEEPER-4332:


Hi Emil,

You also wrote:

bq. At the same time jute.max.buffer is 4194304. Increasing it by a few factors 
doesn't help.

This is strange. How long are the names of the children of {{.../RMAppRoot/}}? 
Assuming 32 ASCII characters on average, ~7MiB of {{jute.maxbuffer}} should be 
enough for {{GetChildren}} to succeed:

{code:bash}
p=/rmstore/ZKRMStateRoot/RMAppRoot/0123456789abcdef0123456789abcdef
p_length=${#p}
echo $(((p_length + 4) * 100011))
# 6900759
{code}
(Note that there is an extra dot in the name of the property you reported 
above. It must be {{-Djute.maxbuffer=…}}. Could that explain the issue?)

> Cannot access children of znode that owns too many znodes
> -
>
> Key: ZOOKEEPER-4332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>  Labels: zookeeper
> Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot 
> 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png
>
>
> We experience problems with performing any operation (deleteall, get etc.) on 
> a znode that has too many child nodes. In our case, it's above 200k. At the 
> same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't 
> help. This should be either solved by limiting the number of direct znodes 
> allowed by a parameter or by adding a hard limit by default.
> I am attaching some screenshots of the commands and their results. What's 
> interesting the numbers from getAllChildrenNumber and stat (numChildren) 
> commands don't match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4332) Cannot access children of znode that owns too many znodes

2021-07-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372485#comment-17372485
 ] 

Damien Diederen commented on ZOOKEEPER-4332:


Hi Emil,

This is a long-standing issue, indeed, and one which is unfortunately still 
relevant.

Suggestions which have been made so far include:

* [Rejecting node 
creations|https://issues.apache.org/jira/browse/ZOOKEEPER-1162?focusedCommentId=13091100=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13091100]
 which would cause the {{GetChildren}} payload to overflow {{jute.maxbuffer}} 
(which is somewhat problematic, as the database does not specify the minimum 
{{jute.maxbuffer}} needed at runtime);
* [Introducing a paginated 
version|https://issues.apache.org/jira/browse/ZOOKEEPER-2260] of 
{{GetChildren}}. I had missed that existing patch before; it would be 
interesting to forward-port it to 3.7+!

You also noted:

bq. I am attaching some screenshots of the commands and their results. What's 
interesting the numbers from getAllChildrenNumber and stat (numChildren) 
commands don't match.

Note that {{getAllChildrenNumber}} is a *recursive* computation, whereas 
{{Stat.numChildren}} tracks the number of *direct* children—so I would expect 
the former to be larger than the latter if some of the nodes have children.
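For readers unfamiliar with the two counters, the difference can be sketched over a toy in-memory tree (hypothetical paths; this mimics the semantics only, not ZooKeeper's implementation):

```java
import java.util.List;
import java.util.Map;

public class ChildCounts {
    // Toy in-memory tree: parent path -> names of its direct children.
    static final Map<String, List<String>> TREE = Map.of(
            "/app", List.of("a", "b"),
            "/app/a", List.of("x"),
            "/app/a/x", List.of(),
            "/app/b", List.of());

    // Analogue of Stat.numChildren: direct children only.
    static int numChildren(String path) {
        return TREE.get(path).size();
    }

    // Analogue of getAllChildrenNumber: counts the whole subtree.
    static int allChildrenNumber(String path) {
        int total = 0;
        for (String child : TREE.get(path)) {
            total += 1 + allChildrenNumber(path + "/" + child);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(numChildren("/app"));       // 2 (a, b)
        System.out.println(allChildrenNumber("/app")); // 3 (a, b, a/x)
    }
}
```

With this tree, the direct count of {{/app}} is 2 while the recursive count is 3, because the latter also includes {{/app/a/x}} — exactly the kind of mismatch reported above.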


> Cannot access children of znode that owns too many znodes
> -
>
> Key: ZOOKEEPER-4332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4332
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.1
>Reporter: Emil Kleszcz
>Priority: Critical
>  Labels: zookeeper
> Attachments: Screen Shot 2021-06-30 at 16.52.17.png, Screen Shot 
> 2021-06-30 at 16.52.42.png, Screen Shot 2021-06-30 at 16.53.04.png
>
>
> We experience problems with performing any operation (deleteall, get etc.) on 
> a znode that has too many child nodes. In our case, it's above 200k. At the 
> same time jute.max.buffer is 4194304. Increasing it by a few factors doesn't 
> help. This should be either solved by limiting the number of direct znodes 
> allowed by a parameter or by adding a hard limit by default.
> I am attaching some screenshots of the commands and their results. What's 
> interesting the numbers from getAllChildrenNumber and stat (numChildren) 
> commands don't match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4284) Add metrics for observer sync time

2021-06-30 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4284:
---
Fix Version/s: 3.7.1

> Add metrics for observer sync time
> --
>
> Key: ZOOKEEPER-4284
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4284
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Affects Versions: 3.7.0, 3.7.1
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> With enabling the feature of followers hosting observers, it would be nice if 
> we have a metric to measure the observer sync time just like what we have for 
> the follower sync time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4318) Only report the follower sync time metrics if sync is completed

2021-06-30 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4318:
---
Fix Version/s: 3.7.1

> Only report the follower sync time metrics if sync is completed
> ---
>
> Key: ZOOKEEPER-4318
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4318
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Affects Versions: 3.8
>Reporter: Li Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should calculate the sync time only if completedSync is true. Otherwise, 
> we will get noisy data such as 0 sync time in cases where sync immediately 
> failed (due to network partition for example).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4318) Only report the follower sync time metrics if sync is completed

2021-06-30 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4318.

Fix Version/s: (was: 3.8)
   (was: 3.7.1)
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1712
[https://github.com/apache/zookeeper/pull/1712]

> Only report the follower sync time metrics if sync is completed
> ---
>
> Key: ZOOKEEPER-4318
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4318
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Affects Versions: 3.8
>Reporter: Li Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should calculate the sync time only if completedSync is true. Otherwise, 
> we will get noisy data such as 0 sync time in cases where sync immediately 
> failed (due to network partition for example).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4284) Add metrics for observer sync time

2021-06-30 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4284.

Fix Version/s: 3.8.0
   Resolution: Fixed

Issue resolved by pull request 1691
[https://github.com/apache/zookeeper/pull/1691]

> Add metrics for observer sync time
> --
>
> Key: ZOOKEEPER-4284
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4284
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Affects Versions: 3.7.0, 3.7.1
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> With enabling the feature of followers hosting observers, it would be nice if 
> we have a metric to measure the observer sync time just like what we have for 
> the follower sync time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-2695) Handle unknown error for rolling upgrade old client new server scenario

2021-06-24 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368712#comment-17368712
 ] 

Damien Diederen commented on ZOOKEEPER-2695:


Hi [~arshad.mohammad],

I was looking into this, following [this 
question|https://github.com/apache/zookeeper/pull/1716#pullrequestreview-687896698]
 of [~eolivelli]'s:

bq. You are also introducing a new error code, how do old clients will handle 
that error code?

This is a recurrent question—one to which I feel we don't have a very good 
answer.

The patch attached to this ticket still applies, provided one updates the paths 
and the target JUnit API imports. (I am attaching a refreshed version as I 
have it at hand; feel free to turn it into a GitHub pull request.)

The proposed solution definitely improves on the status quo, but I have a few 
questions:

# {{Code.SYSTEMERROR}} is documented as follows:
bq. This is never thrown by the server, it shouldn't be used other than to 
indicate a range. Specifically error codes greater than this value, but lesser 
than {{#APIERROR}}, are system errors.

So I suppose I would suggest throwing {{SystemErrorException}} for codes in 
{{[Code.SYSTEMERROR, Code.APIERROR)}} and {{APIErrorException}} for codes 
outside that range. What do you think?
# Unfortunately, "known" system errors such as 
{{RuntimeInconsistencyException}} do not inherit from {{SystemErrorException}}. 
Similarly, an "API error" such as {{NoNodeException}} does not inherit from 
{{APIErrorException}}.
The list of {{KeeperException}} subclasses being flat means that clients have 
intimate knowledge of error codes to distinguish between exceptions which 
warrant an immediate retry, exceptions which should trigger exponential 
back-off, or fatal ones such as {{Code.AUTHFAILED}}.
While reparenting the existing classes would technically be an API/ABI break, 
we could potentially introduce superclasses such as 
{{BaseSystemErrorException}} between {{KeeperException}} and its children 
(AFAICT, Sun & Oracle frequently do so), and use ranges to map unknown codes.
Is that something we should consider?
# {{KeeperException}} currently uses a {{private Code code}} member variable, 
which makes it impossible to propagate an unknown error code.
Should we change that (back?) to an {{int}}—without changing the public 
interface?
Of course, methods such as {{public Code code()}} would still have to return 
{{null}}, but others such as {{public int getCode()}} (currently deprecated) 
could return the actual value, and an informative message could still be 
generated in {{public String getMessage()}}.
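To make point 1 concrete, here is a minimal sketch of the range-based mapping being suggested, using the numeric values defined in {{KeeperException.Code}} ({{SYSTEMERROR}} = -1, {{APIERROR}} = -100); the classification itself is the proposal, not current client behavior:

```java
public class CodeRanges {
    // Numeric values as defined in ZooKeeper's KeeperException.Code.
    static final int SYSTEMERROR = -1;
    static final int APIERROR = -100;

    // Proposed fallback for error codes an old client does not know:
    // map them to an exception family by range instead of failing with NPE.
    static String classifyUnknown(int code) {
        if (code <= APIERROR) {
            return "api-error";     // -100 and below, e.g. an unknown -1xx code
        } else if (code <= SYSTEMERROR) {
            return "system-error";  // codes in (APIERROR, SYSTEMERROR]
        } else {
            return "not-an-error";  // 0 (OK) and positive values
        }
    }

    public static void main(String[] args) {
        System.out.println(classifyUnknown(-42));   // system-error
        System.out.println(classifyUnknown(-142));  // api-error
    }
}
```

An unknown code from a newer server would then at least land in the right retry/back-off bucket, even when the client's {{Code}} enum cannot name it.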

Thoughts?

> Handle unknown error for rolling upgrade old client new server scenario
> ---
>
> Key: ZOOKEEPER-2695
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2695
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: 
> 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch, 
> ZOOKEEPER-2695-01.patch
>
>
> In Zookeeper rolling upgrade scenario where server is new but client is old, 
> when sever sends error code which is not understood by the client, client 
> throws NullPointerException. 
> KeeperException.SystemErrorException should be thrown for all unknown error 
> code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-2695) Handle unknown error for rolling upgrade old client new server scenario

2021-06-24 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-2695:
---
Attachment: 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch

> Handle unknown error for rolling upgrade old client new server scenario
> ---
>
> Key: ZOOKEEPER-2695
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2695
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: 
> 0001-ZOOKEEPER-2695-Handle-unknown-error-for-rolling-upgr.patch, 
> ZOOKEEPER-2695-01.patch
>
>
> In Zookeeper rolling upgrade scenario where server is new but client is old, 
> when sever sends error code which is not understood by the client, client 
> throws NullPointerException. 
> KeeperException.SystemErrorException should be thrown for all unknown error 
> code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-2154) NPE in KeeperException

2021-06-24 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-2154.

Fix Version/s: (was: 3.8.0)
   Resolution: Duplicate

Closing in favor of ZOOKEEPER-2695.  (Same issue, but more comprehensive 
solution.)

> NPE in KeeperException
> --
>
> Key: ZOOKEEPER-2154
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2154
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6
>Reporter: Surendra Singh Lilhore
>Priority: Major
> Attachments: ZOOKEEPER-2154.patch
>
>
> KeeperException should handle exception is code is null...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemal nodes cause cluster crash

2021-06-23 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367917#comment-17367917
 ] 

Damien Diederen commented on ZOOKEEPER-4306:


{quote}I think it's more easier for others to notice this limitation if add 
some JavaDoc of ZooKeeper.create. Would you agree?
{quote}
Yes, I agree :) Explicitly added to my TODO list.

> CloseSessionTxn contains too many ephemal nodes cause cluster crash
> ---
>
> Key: ZOOKEEPER-4306
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Lin Changrui
>Priority: Critical
>  Labels: pull-request-available
> Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We took a test about how many ephemal nodes can client create under one 
> parent node with defalut configuration. The test caused cluster crash at 
> last, exception stack trace like this.
> follower:
> !f.jpg!
> leader:
> !l1.png!
> !l2.jpg!
> It seems that leader sent a too large txn packet to followers. When follower 
> try to deserialize the txn, it found the txn length out of its buffer 
> size(default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That causes 
> followers crashed, and then, leader found there was no sufficient followers 
> synced, so leader shutdown later. When leader shutdown, it called 
> zkDb.fastForwardDataBase() , and leader found the txn read from txnlog out of 
> its buffer size, so it crashed too.
> After the servers crashed, they try to restart the quorum. But they would not 
> success because the last txn is too large. We lose the log at that moment, 
> but the stack trace is same as this one.
> !r.jpg|width=1468,height=598!
>  
> *Root Cause*
> We use org.apache.zookeeper.server.LogFormatter(-Djute.maxbuffer=74827780) 
> visualize this log and found this. !cs.jpg|width=1400,height=581! So 
> closeSessionTxn contains all ephemal nodes with absolute path. We know we 
> will get a large getChildren respose if we create too many children nodes 
> under one parent node, that is limited by jute.maxbuffer of client. If we 
> create plenty of ephemal nodes under different parent nodes with one session, 
> it may not cause out of buffer of client, but when the session close without 
> delete these node first, it probably cause cluster crash.
> Is it a bug or just a unspecified feature?If it just so, how should we judge 
> the upper limit of creating nodes? 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemal nodes cause cluster crash

2021-06-17 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365079#comment-17365079
 ] 

Damien Diederen commented on ZOOKEEPER-4306:


Hi [~Changrui Lin],

A first PR is available: https://github.com/apache/zookeeper/pull/1716.

Your review/comments would be welcome!

Cheers, -D

> CloseSessionTxn contains too many ephemal nodes cause cluster crash
> ---
>
> Key: ZOOKEEPER-4306
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Lin Changrui
>Priority: Critical
> Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg
>
>
> We took a test about how many ephemal nodes can client create under one 
> parent node with defalut configuration. The test caused cluster crash at 
> last, exception stack trace like this.
> follower:
> !f.jpg!
> leader:
> !l1.png!
> !l2.jpg!
> It seems that leader sent a too large txn packet to followers. When follower 
> try to deserialize the txn, it found the txn length out of its buffer 
> size(default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That causes 
> followers crashed, and then, leader found there was no sufficient followers 
> synced, so leader shutdown later. When leader shutdown, it called 
> zkDb.fastForwardDataBase() , and leader found the txn read from txnlog out of 
> its buffer size, so it crashed too.
> After the servers crashed, they try to restart the quorum. But they would not 
> success because the last txn is too large. We lose the log at that moment, 
> but the stack trace is same as this one.
> !r.jpg|width=1468,height=598!
>  
> *Root Cause*
> We use org.apache.zookeeper.server.LogFormatter(-Djute.maxbuffer=74827780) 
> visualize this log and found this. !cs.jpg|width=1400,height=581! So 
> closeSessionTxn contains all ephemal nodes with absolute path. We know we 
> will get a large getChildren respose if we create too many children nodes 
> under one parent node, that is limited by jute.maxbuffer of client. If we 
> create plenty of ephemal nodes under different parent nodes with one session, 
> it may not cause out of buffer of client, but when the session close without 
> delete these node first, it probably cause cluster crash.
> Is it a bug or just a unspecified feature?If it just so, how should we judge 
> the upper limit of creating nodes? 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4312) ZooKeeperServerEmbedded: enhance server start/stop for testability

2021-06-13 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4312.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1710
[https://github.com/apache/zookeeper/pull/1710]

> ZooKeeperServerEmbedded: enhance server start/stop for testability
> --
>
> Key: ZOOKEEPER-4312
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4312
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.7.0
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ZooKeeperServerEmbedded works well for running ZooKeeper but it lacks support 
> for a few little features in order to use it for tests.
> I saw these problems while working on the port of Curator Testing Server to 
> ZooKeeperServerEmbedded.
>  * There is no way to wait for the server to be up-and-running
>  * When you "close()" the server, it does not wait for the ports to be closed
>  * There is no way to have the ConnectString for the server



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4306) CloseSessionTxn contains too many ephemal nodes cause cluster crash

2021-06-10 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360734#comment-17360734
 ] 

Damien Diederen commented on ZOOKEEPER-4306:


Hi [~Changrui Lin],

I am currently looking into fixing this.

bq. Is it a bug or just a unspecified feature? If it just so, how should we 
judge the upper limit of creating nodes?

I would definitely put this in the "bug" category :)

It seems that we would want ephemeral node creation to start failing when the 
session gets "too big" to fit in a transaction.  [~eolivelli], [~lvfangmin]: 
would you agree?

Here are a few related tickets which include ideas for minimizing 
{{jute.maxbuffer}} annoyances:

# ZOOKEEPER-1162: Suggests (among others) controlling node size during child 
creation (similar to what I am proposing above);
# ZOOKEEPER-1644: Suggests compressing some of the data, which would allow for 
a larger {{CloseSessionTxn}} (related to your comment about "absolute paths").

Note that it is currently possible to work around this issue by setting this 
undocumented flag:

{noformat}
closeSessionTxn.enabled = false
{noformat}

(The flag was introduced as part of ZOOKEEPER-3145. Of course, disabling it 
"unfixes" the "potential watch missing issue." Still, probably better than 
suffering crashing ensembles.)
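As a rough back-of-the-envelope check—using the same per-string arithmetic as the {{GetChildren}} sizing earlier in this thread (a 4-byte length prefix plus the path bytes; framing overhead ignored, and the example paths are hypothetical)—one can estimate how large a {{CloseSessionTxn}} carrying all of a session's ephemeral paths would get:

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

public class CloseSessionTxnSize {
    // Rough serialized-size estimate for a list of ephemeral paths:
    // a 4-byte length prefix per string plus its UTF-8 bytes.
    static long estimateBytes(List<String> ephemeralPaths) {
        long total = 0;
        for (String path : ephemeralPaths) {
            total += 4 + path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // 20k ephemerals with 34-char paths come to ~760 KB, still under
        // the default 1 MiB jute.maxbuffer; ten times as many would not be.
        List<String> paths = Collections.nCopies(
                20_000, "/locks/session/0123456789abcdef012");
        System.out.println(estimateBytes(paths)); // 760000
    }
}
```

This is only an approximation of the transaction payload, but it gives an order of magnitude for when a closing session starts to threaten the ensemble.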


> CloseSessionTxn contains too many ephemal nodes cause cluster crash
> ---
>
> Key: ZOOKEEPER-4306
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.2
>Reporter: Lin Changrui
>Priority: Critical
> Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg
>
>
> We took a test about how many ephemal nodes can client create under one 
> parent node with defalut configuration. The test caused cluster crash at 
> last, exception stack trace like this.
> follower:
> !f.jpg!
> leader:
> !l1.png!
> !l2.jpg!
> It seems that leader sent a too large txn packet to followers. When follower 
> try to deserialize the txn, it found the txn length out of its buffer 
> size(default 1MB+1MB, jute.maxbuffer + jute.maxbuffer.extrasize). That causes 
> followers crashed, and then, leader found there was no sufficient followers 
> synced, so leader shutdown later. When leader shutdown, it called 
> zkDb.fastForwardDataBase() , and leader found the txn read from txnlog out of 
> its buffer size, so it crashed too.
> After the servers crashed, they try to restart the quorum. But they would not 
> success because the last txn is too large. We lose the log at that moment, 
> but the stack trace is same as this one.
> !r.jpg|width=1468,height=598!
>  
> *Root Cause*
> We use org.apache.zookeeper.server.LogFormatter(-Djute.maxbuffer=74827780) 
> visualize this log and found this. !cs.jpg|width=1400,height=581! So 
> closeSessionTxn contains all ephemal nodes with absolute path. We know we 
> will get a large getChildren respose if we create too many children nodes 
> under one parent node, that is limited by jute.maxbuffer of client. If we 
> create plenty of ephemal nodes under different parent nodes with one session, 
> it may not cause out of buffer of client, but when the session close without 
> delete these node first, it probably cause cluster crash.
> Is it a bug or just a unspecified feature?If it just so, how should we judge 
> the upper limit of creating nodes? 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-06-07 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358800#comment-17358800
 ] 

Damien Diederen commented on ZOOKEEPER-4211:


Hi [~liwang],

I am planning to dedicate some time to this on Wednesday.

HTH, -D

> Expose Quota Metrics to Prometheus
> --
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> In 3.7, Quota limit can be enforced and the quota related stats are captured 
> in the StatsTrack.  From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> and usage info. 
> As an addition to that, we would like to collect the quota metrics and expose 
> them to the Prometheus for the following:
> 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard
> 2. Creating alert based on the quota levels (e.g. 90% used)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-3970) Enable ZooKeeperServerController to expire session

2021-05-20 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-3970.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1505
[https://github.com/apache/zookeeper/pull/1505]

> Enable ZooKeeperServerController to expire session
> --
>
> Key: ZOOKEEPER-3970
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3970
> Project: ZooKeeper
>  Issue Type: Task
>  Components: server, tests
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This is a follow up of ZOOKEEPER-3948. Here we enable 
> ZooKeeperServerController to be able to expire a global or local session. 
> This is very useful in our experience in integration testing when we want a 
> controlled session expiration mechanism. This is done by having session 
> tracker exposing both global and local session stats, so a zookeeper server 
> can expire the sessions in the controller. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-05-12 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343403#comment-17343403
 ] 

Damien Diederen commented on ZOOKEEPER-4211:


Hi [~liwang],

Yup, I have seen your update.  Planning to have a look tomorrow.

Cheers, -D

> Expose Quota Metrics to Prometheus
> --
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In 3.7, Quota limit can be enforced and the quota related stats are captured 
> in the StatsTrack.  From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> and usage info. 
> As an addition to that, we would like to collect the quota metrics and expose 
> them to the Prometheus for the following:
> 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard
> 2. Creating alert based on the quota levels (e.g. 90% used)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-05-06 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340353#comment-17340353
 ] 

Damien Diederen commented on ZOOKEEPER-4211:


Hi [~liwang],

Ha, funny; I just caught up with your discussion on the {{dev}} mailing list!  
(https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202105.mbox/browser).  
Excellent work, btw.

I haven't forgotten about your points, but have just been really busy on other 
topics.  (Of course, don't hesitate to ping other reviewers, particularly when 
I am lagging!)

That being said:

bq. Based on investigation result of the performance impact of Prometheus 
Summary quantile computation, I am working on adding the support for CounterSet 
for the use case that need to group counter metrics by keys (i.e. top 
namespace) but no need for quantiles and sum.

I figure we'll have to use some sort of queue to gather all these metrics 
without blocking the worker threads, but +1 on not making things worse in the 
meantime :)

> Expose Quota Metrics to Prometheus
> --
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> In 3.7, Quota limit can be enforced and the quota related stats are captured 
> in the StatsTrack.  From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> and usage info. 
> As an addition to that, we would like to collect the quota metrics and expose 
> them to the Prometheus for the following:
> 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard
> 2. Creating alert based on the quota levels (e.g. 90% used)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4285) High CVE-2019-25013 reported by Clair scanner for Zookeeper 3.6.1

2021-05-06 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4285.

  Assignee: Damien Diederen
Resolution: Invalid

Hi [~priyavj],

ZooKeeper releases do not bundle the GNU C library, nor native binaries, so I 
don't see how this report could be lifted on our side.  If you have installed 
some kind of ZooKeeper package provided by a distributor, I would suggest 
raising the issue with them.

(Of course, feel free to reopen if I missed something.)

Best, -D

> High CVE-2019-25013 reported by Clair scanner for Zookeeper 3.6.1
> -
>
> Key: ZOOKEEPER-4285
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4285
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: priya Vijay
>Assignee: Damien Diederen
>Priority: Major
>
> On running clair scanner for Zookeeper 3.6.1, the following high priority 
> vulnerability is reported: 
> CVE-2019-25013  [https://nvd.nist.gov/vuln/detail/CVE-2019-25013]
>  details: The iconv feature in the GNU C Library (aka glibc or libc6) through 
> 2.32, when processing invalid multi-byte input sequences in the EUC-KR 
> encoding, may have a buffer over-read



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4262) Backport ZOOKEEPER-3911 to branch-3.5

2021-04-21 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4262.

Fix Version/s: 3.5.10
   Resolution: Fixed

Issue resolved by pull request 1657
[https://github.com/apache/zookeeper/pull/1657]

> Backport ZOOKEEPER-3911 to branch-3.5
> -
>
> Key: ZOOKEEPER-4262
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4262
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: fanyang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Backporting ZOOKEEPER-3911 to branch-3.5
>  





[jira] [Commented] (ZOOKEEPER-4282) Redesign quota feature

2021-04-21 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326440#comment-17326440
 ] 

Damien Diederen commented on ZOOKEEPER-4282:


Hi [~arshad.mohammad],

I was just looking into this.

Generally agree.  Even when "hard" limits are set, the current quota 
implementation 
([ZOOKEEPER-3301|https://issues.apache.org/jira/browse/ZOOKEEPER-3301]) is 
trivial to work around.

First, a few notes about your points, then a scary story below:

# Not necessarily against adding {{setQuota}} and friends, but wouldn't 
creating all nodes in the {{/zookeeper/quota}} subtree with an ACL akin to 
{{world:anyone:r}} (by default, value configurable) be technically sufficient, 
and not require a change in protocol?
# ACL?
# I agree that the notion of an "application root node" seems to be quite 
common in deployments, and that the native root should be protected in such 
setups.  Perhaps a simple configuration setting? Doing {{setAcl / 
world:anyone:r}} as {{super}} is not that difficult, though---and the window is 
probably negligible in practice;
# We currently have a tristate: {{enforceQuota:false}} means that quotas are 
not being _processed_ at all; "soft" quotas cause overflows to be logged; 
"hard" quotas cause requests to fail. (Not saying we need to preserve these 
features; it was just to complete your description).
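For reference, the tristate described in point 4 maps onto a single server setting. A zoo.cfg sketch follows; the key name is the one introduced by the ZOOKEEPER-3301 redesign, and the default shown is an assumption:

```
# zoo.cfg sketch -- enforceQuota was introduced by ZOOKEEPER-3301.
# When false (assumed default), quotas are not processed at all;
# when true, "soft" limit overflows are logged, while "hard" limits
# cause the offending requests to fail.
enforceQuota=true
```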

Now for the scary story:

The old quota implementation was supposed to be "advisory," but I looked a bit 
deeper---and just noticed that besides its obvious limitations, the lack of 
controls combined with the central location of the quota checks creates a 
serious DoS vector!

(I was aware of a similar problem with 
[ZOOKEEPER-451|https://issues.apache.org/jira/browse/ZOOKEEPER-451], but it 
turns out that the issue is present in mainline 3.6 and earlier.)

On a "properly administered" ensemble, a {{super}} user sets up a "root node" 
for user {{eve}}:

{noformat}
setAcl / world:anyone:r

create /eve
setAcl /eve sasl:eve:cdrwa
setquota /eve -B 32
{noformat}

Once logged in, {{eve}} can simply do:

{noformat}
set /zookeeper/quota/eve/zookeeper_limits boom
create /eve/was.here
{noformat}

which immediately causes the server to fail and exit with this nasty exception:

{noformat}
2021-04-21 12:20:25,861 [myid:] - ERROR 
[SyncThread:0:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from 
thread : SyncThread:0
java.lang.IllegalArgumentException: invalid string yolo
at org.apache.zookeeper.StatsTrack.<init>(StatsTrack.java:50)
at org.apache.zookeeper.server.DataTree.updateCountBytes(DataTree.java:409)
at org.apache.zookeeper.server.DataTree.createNode(DataTree.java:550)
{noformat}

Worse, the server won't restart before the corrupted data is excised from the 
snapshot or transaction log.

This seems to be a minimal reproducer:

{noformat}
create /eve
create /zookeeper/quota/eve
create /zookeeper/quota/eve/zookeeper_stats boom
create /zookeeper/quota/eve/zookeeper_limits boom
create /eve/was.here
{noformat}
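For readers wondering why an arbitrary string is fatal: StatsTrack's constructor parses a fixed "count=<n>,bytes=<m>" format and throws IllegalArgumentException on anything else, which is what kills SyncThread. The following is a hypothetical, simplified sketch of that parsing, not the actual implementation:

```java
// Hypothetical simplification of StatsTrack's strict parsing, for
// illustration only: anything that is not "count=<n>,bytes=<m>" throws.
public class StatsTrackSketch {
    static long parseBytesLimit(String stats) {
        String[] parts = stats.split(",");
        if (parts.length != 2
                || !parts[0].startsWith("count=")
                || !parts[1].startsWith("bytes=")) {
            // mirrors the unrecoverable error seen in the server log
            throw new IllegalArgumentException("invalid string " + stats);
        }
        return Long.parseLong(parts[1].substring("bytes=".length()));
    }

    public static void main(String[] args) {
        System.out.println(parseBytesLimit("count=10,bytes=32")); // prints 32
        try {
            parseBytesLimit("boom"); // the value eve stored above
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "invalid string boom"
        }
    }
}
```

Since this parse happens inside DataTree while applying a transaction, the exception propagates out of the sync path instead of being rejected at request validation time.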

I would suggest opening another ticket, and creating PRs preventing the server 
crash for 3.5 and 3.6.  WDYT?  Should I take care of it?

Best, -D

(Cc: [~eolivelli], [~maoling], [~hanm].)

> Redesign quota feature
> --
>
> Key: ZOOKEEPER-4282
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4282
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quota
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.8.0
>
>
> *Quota Use Case:*
> Generally, in a big data solution deployment, multiple services (HDFS, YARN, 
> HBase, etc.) use a single ZooKeeper cluster, so it is very important to ensure 
> fair usage by all services. Sometimes services unintentionally, mainly because 
> of faulty behavior, create many znodes and impact the overall reliability of 
> the ZooKeeper service. To ensure fair usage, a quota feature is required. 
> But this is not the only use case; there are many other use cases for the 
> quota feature.
> *Current Problems:*
> # Currently, a user can set a quota by updating the znode 
> "/zookeeper/quota/nodepath", or by using the setquota/delquota CLI commands. 
> This makes the quota setting ineffective: any user can set/delete quotas, 
> which is not proper; it should be an admin operation.
> # Users are allowed to modify ZooKeeper system paths like /zookeeper/quota. 
> These are internal to ZooKeeper and should not be modifiable.
> # Generally, services create a single top-level znode in ZooKeeper, like 
> /hbase, and create all required znodes under it. 
> It would be better if it were configurable who can create top-level znodes, 
> to control ZooKeeper usage.
> # After ZOOKEEPER-231, there are two kinds of quota enforcement limits: 1. 
> hard limit, 2. soft limit. 
> I think there should be only one limit. 

[jira] [Resolved] (ZOOKEEPER-4265) Download page broken links

2021-04-10 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4265.

Fix Version/s: 3.6.3
   3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1677
[https://github.com/apache/zookeeper/pull/1677]

> Download page broken links
> --
>
> Key: ZOOKEEPER-4265
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.3
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The download page [1] has broken links for the following release versions:
> 3.6.1
> 3.5.9
> Please remove them from the page.
> If necessary, they can be linked from the archive server, in which case the 
> page should make it clear that they are historic releases.
> [1] https://zookeeper.apache.org/releases.html





[jira] [Commented] (ZOOKEEPER-4265) Download page broken links

2021-04-08 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316943#comment-17316943
 ] 

Damien Diederen commented on ZOOKEEPER-4265:


[~sebb]: Thank you for the report.  You might want to take a look at 
https://github.com/apache/zookeeper/pull/1677.

> Download page broken links
> --
>
> Key: ZOOKEEPER-4265
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sebb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The download page [1] has broken links for the following release versions:
> 3.6.1
> 3.5.9
> Please remove them from the page.
> If necessary, they can be linked from the archive server, in which case the 
> page should make it clear that they are historic releases.
> [1] https://zookeeper.apache.org/releases.html





[jira] [Assigned] (ZOOKEEPER-4265) Download page broken links

2021-04-08 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen reassigned ZOOKEEPER-4265:
--

Assignee: Damien Diederen

> Download page broken links
> --
>
> Key: ZOOKEEPER-4265
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The download page [1] has broken links for the following release versions:
> 3.6.1
> 3.5.9
> Please remove them from the page.
> If necessary, they can be linked from the archive server, in which case the 
> page should make it clear that they are historic releases.
> [1] https://zookeeper.apache.org/releases.html





[jira] [Comment Edited] (ZOOKEEPER-4266) Correct ZooKeeper version in documentation header

2021-04-03 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314246#comment-17314246
 ] 

Damien Diederen edited comment on ZOOKEEPER-4266 at 4/3/21, 11:45 AM:
--

Issue resolved by pull requests
[https://github.com/apache/zookeeper/pull/1659] and
[https://github.com/apache/zookeeper/pull/1660]


was (Author: ztzg):
Issue resolved by pull request 1660
[https://github.com/apache/zookeeper/pull/1660]

> Correct ZooKeeper version in documentation header
> -
>
> Key: ZOOKEEPER-4266
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4266
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
> Attachments: image-2021-03-28-22-25-39-949.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Both the master and branch-3.7 documentation headers show the ZooKeeper 
> version as 3.6.
>  These should be changed to 3.8 and 3.7 for master and branch-3.7, respectively.
> Master documentation currently:
>  !image-2021-03-28-22-25-39-949.png!





[jira] [Resolved] (ZOOKEEPER-4266) Correct ZooKeeper version in documentation header

2021-04-03 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4266.

Resolution: Fixed

Issue resolved by pull request 1660
[https://github.com/apache/zookeeper/pull/1660]

> Correct ZooKeeper version in documentation header
> -
>
> Key: ZOOKEEPER-4266
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4266
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
> Attachments: image-2021-03-28-22-25-39-949.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Both the master and branch-3.7 documentation headers show the ZooKeeper 
> version as 3.6.
>  These should be changed to 3.8 and 3.7 for master and branch-3.7, respectively.
> Master documentation currently:
>  !image-2021-03-28-22-25-39-949.png!





[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-04-01 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313402#comment-17313402
 ] 

Damien Diederen commented on ZOOKEEPER-4211:


Hi [~liwang],

Not sure you have seen my two comments from earlier today:

* https://github.com/apache/zookeeper/pull/1644#discussion_r605547937
* https://github.com/apache/zookeeper/pull/1644#discussion_r605548947

(I know the other points are still pending.)


> Expose Quota Metrics to Prometheus
> --
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> In 3.7, quota limits can be enforced, and the quota-related stats are captured 
> in the StatsTrack.  From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> As an addition to that, we would like to collect the quota metrics and expose 
> them to the Prometheus for the following:
> 1. Monitoring per namespace (Chroot) quota usage via the Grafana dashboard
> 2. Creating alert based on the quota levels (e.g. 90% used)





[jira] [Commented] (ZOOKEEPER-4245) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream

2021-03-25 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308866#comment-17308866
 ] 

Damien Diederen commented on ZOOKEEPER-4245:


No problem; website submit buttons are sometimes unresponsive/unclear.  I just 
wanted to give you a chance to chime in after closing the extra requests—in 
case I missed something.

> Resource leaks in 
> org.apache.zookeeper.server.persistence.SnapStream#getInputStream and 
> #getOutputStream
> 
>
> Key: ZOOKEEPER-4245
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4245
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Martin Kellogg
>Priority: Major
>
>  There are three (related) possible resource leaks in the `getInputStream` 
> and `getOutputStream` methods in `SnapStream.java`. I noticed the first 
> because of the use of the error-prone `GZIPOutputStream`, and the other two 
> after looking at the surrounding code.
> Here is the offending code (copied from 
> [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]):
> {noformat}
> /**
>  * Return the CheckedInputStream based on the extension of the fileName.
>  *
>  * @param file the file the InputStream read from
>  * @return the specific InputStream
>  * @throws IOException
>  */
> public static CheckedInputStream getInputStream(File file) throws 
> IOException {
> FileInputStream fis = new FileInputStream(file);
> InputStream is;
> switch (getStreamMode(file.getName())) {
> case GZIP:
> is = new GZIPInputStream(fis);
> break;
> case SNAPPY:
> is = new SnappyInputStream(fis);
> break;
> case CHECKED:
> default:
> is = new BufferedInputStream(fis);
> }
> return new CheckedInputStream(is, new Adler32());
> }
> /**
>  * Return the OutputStream based on predefined stream mode.
>  *
>  * @param file the file the OutputStream writes to
>  * @param fsync sync the file immediately after write
>  * @return the specific OutputStream
>  * @throws IOException
>  */
> public static CheckedOutputStream getOutputStream(File file, boolean 
> fsync) throws IOException {
> OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new 
> FileOutputStream(file);
> OutputStream os;
> switch (streamMode) {
> case GZIP:
> os = new GZIPOutputStream(fos);
> break;
> case SNAPPY:
> os = new SnappyOutputStream(fos);
> break;
> case CHECKED:
> default:
> os = new BufferedOutputStream(fos);
> }
> return new CheckedOutputStream(os, new Adler32());
> }
> {noformat}
> All three possible resource leaks are caused by the constructors of the 
> intermediate streams (i.e. `is` and `os`), some of which might throw 
> `IOException`s:
>  * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an 
> exception, because `GZIPOutputStream` writes out the header in the 
> constructor. If it does throw, then `fos` is never closed. That it does so 
> makes it hard to use correctly; someone raised this as an issue with the JDK 
> folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they 
> closed it as "won't fix" because the constructor is documented to throw 
> (hence the need to catch the exception here).
>  * in `getInputStream`, the call to `new GZIPInputStream` can throw an 
> `IOException` for a similar reason, causing the file handle held by `fis` to 
> leak.
>  * similarly, the call to `new SnappyInputStream` can throw an `IOException`, 
> because it tries to read the file header during construction, which also 
> causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked 
> [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java].
> I'll submit a PR with a (simple) fix shortly after this bug report goes up 
> and gets assigned an issue number, and add a link to this issue.
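The usual remedy for constructor-throws leaks like the ones described above is to close the raw stream when the wrapper fails to build. The following is a sketch of that fix shape, not necessarily the patch that landed; the method name `getGzipInputStream` is hypothetical:

```java
import java.io.*;
import java.util.zip.*;

// Sketch: close the underlying FileInputStream if the wrapping
// constructor throws, so the file handle cannot leak.
public class SafeWrap {
    public static CheckedInputStream getGzipInputStream(File file) throws IOException {
        FileInputStream fis = new FileInputStream(file);
        try {
            // GZIPInputStream reads the header here and may throw IOException
            return new CheckedInputStream(new GZIPInputStream(fis), new Adler32());
        } catch (IOException e) {
            fis.close(); // without this, fis would leak on a bad header
            throw e;
        }
    }
}
```

The same guard applies symmetrically to the output side, where GZIPOutputStream writes its header in the constructor.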





[jira] [Resolved] (ZOOKEEPER-4245) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4245.

Resolution: Duplicate

(Hi [~kelloggm].  Thank you for the report.  I believe this one and the others 
were accidental duplicates of ZOOKEEPER-4246, weren't they?)


> Resource leaks in 
> org.apache.zookeeper.server.persistence.SnapStream#getInputStream and 
> #getOutputStream
> 
>
> Key: ZOOKEEPER-4245
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4245
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Martin Kellogg
>Priority: Major
>
>  There are three (related) possible resource leaks in the `getInputStream` 
> and `getOutputStream` methods in `SnapStream.java`. I noticed the first 
> because of the use of the error-prone `GZIPOutputStream`, and the other two 
> after looking at the surrounding code.
> Here is the offending code (copied from 
> [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]):
> {noformat}
> /**
>  * Return the CheckedInputStream based on the extension of the fileName.
>  *
>  * @param file the file the InputStream read from
>  * @return the specific InputStream
>  * @throws IOException
>  */
> public static CheckedInputStream getInputStream(File file) throws 
> IOException {
> FileInputStream fis = new FileInputStream(file);
> InputStream is;
> switch (getStreamMode(file.getName())) {
> case GZIP:
> is = new GZIPInputStream(fis);
> break;
> case SNAPPY:
> is = new SnappyInputStream(fis);
> break;
> case CHECKED:
> default:
> is = new BufferedInputStream(fis);
> }
> return new CheckedInputStream(is, new Adler32());
> }
> /**
>  * Return the OutputStream based on predefined stream mode.
>  *
>  * @param file the file the OutputStream writes to
>  * @param fsync sync the file immediately after write
>  * @return the specific OutputStream
>  * @throws IOException
>  */
> public static CheckedOutputStream getOutputStream(File file, boolean 
> fsync) throws IOException {
> OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new 
> FileOutputStream(file);
> OutputStream os;
> switch (streamMode) {
> case GZIP:
> os = new GZIPOutputStream(fos);
> break;
> case SNAPPY:
> os = new SnappyOutputStream(fos);
> break;
> case CHECKED:
> default:
> os = new BufferedOutputStream(fos);
> }
> return new CheckedOutputStream(os, new Adler32());
> }
> {noformat}
> All three possible resource leaks are caused by the constructors of the 
> intermediate streams (i.e. `is` and `os`), some of which might throw 
> `IOException`s:
>  * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an 
> exception, because `GZIPOutputStream` writes out the header in the 
> constructor. If it does throw, then `fos` is never closed. That it does so 
> makes it hard to use correctly; someone raised this as an issue with the JDK 
> folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they 
> closed it as "won't fix" because the constructor is documented to throw 
> (hence the need to catch the exception here).
>  * in `getInputStream`, the call to `new GZIPInputStream` can throw an 
> `IOException` for a similar reason, causing the file handle held by `fis` to 
> leak.
>  * similarly, the call to `new SnappyInputStream` can throw an `IOException`, 
> because it tries to read the file header during construction, which also 
> causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked 
> [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java].
> I'll submit a PR with a (simple) fix shortly after this bug report goes up 
> and gets assigned an issue number, and add a link to this issue.





[jira] [Resolved] (ZOOKEEPER-4244) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4244.

Resolution: Duplicate

> Resource leaks in 
> org.apache.zookeeper.server.persistence.SnapStream#getInputStream and 
> #getOutputStream
> 
>
> Key: ZOOKEEPER-4244
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4244
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Martin Kellogg
>Priority: Major
>
>  There are three (related) possible resource leaks in the `getInputStream` 
> and `getOutputStream` methods in `SnapStream.java`. I noticed the first 
> because of the use of the error-prone `GZIPOutputStream`, and the other two 
> after looking at the surrounding code.
> Here is the offending code (copied from 
> [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]):
> {noformat}
> /**
>  * Return the CheckedInputStream based on the extension of the fileName.
>  *
>  * @param file the file the InputStream read from
>  * @return the specific InputStream
>  * @throws IOException
>  */
> public static CheckedInputStream getInputStream(File file) throws 
> IOException {
> FileInputStream fis = new FileInputStream(file);
> InputStream is;
> switch (getStreamMode(file.getName())) {
> case GZIP:
> is = new GZIPInputStream(fis);
> break;
> case SNAPPY:
> is = new SnappyInputStream(fis);
> break;
> case CHECKED:
> default:
> is = new BufferedInputStream(fis);
> }
> return new CheckedInputStream(is, new Adler32());
> }
> /**
>  * Return the OutputStream based on predefined stream mode.
>  *
>  * @param file the file the OutputStream writes to
>  * @param fsync sync the file immediately after write
>  * @return the specific OutputStream
>  * @throws IOException
>  */
> public static CheckedOutputStream getOutputStream(File file, boolean 
> fsync) throws IOException {
> OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new 
> FileOutputStream(file);
> OutputStream os;
> switch (streamMode) {
> case GZIP:
> os = new GZIPOutputStream(fos);
> break;
> case SNAPPY:
> os = new SnappyOutputStream(fos);
> break;
> case CHECKED:
> default:
> os = new BufferedOutputStream(fos);
> }
> return new CheckedOutputStream(os, new Adler32());
> }
> {noformat}
> All three possible resource leaks are caused by the constructors of the 
> intermediate streams (i.e. `is` and `os`), some of which might throw 
> `IOException`s:
>  * in `getOutputStream`, the call to `new GZIPOutputStream` can throw an 
> exception, because `GZIPOutputStream` writes out the header in the 
> constructor. If it does throw, then `fos` is never closed. That it does so 
> makes it hard to use correctly; someone raised this as an issue with the JDK 
> folks [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they 
> closed it as "won't fix" because the constructor is documented to throw 
> (hence the need to catch the exception here).
>  * in `getInputStream`, the call to `new GZIPInputStream` can throw an 
> `IOException` for a similar reason, causing the file handle held by `fis` to 
> leak.
>  * similarly, the call to `new SnappyInputStream` can throw an `IOException`, 
> because it tries to read the file header during construction, which also 
> causes `fis` to leak. `SnappyOutputStream` cannot throw; I checked 
> [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java].
> I'll submit a PR with a (simple) fix shortly after this bug report goes up 
> and gets assigned an issue number, and add a link to this issue.





[jira] [Resolved] (ZOOKEEPER-4243) Resource leaks in org.apache.zookeeper.server.persistence.SnapStream#getInputStream and #getOutputStream

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4243.

Resolution: Duplicate

> Resource leaks in 
> org.apache.zookeeper.server.persistence.SnapStream#getInputStream and 
> #getOutputStream
> 
>
> Key: ZOOKEEPER-4243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4243
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Martin Kellogg
>Priority: Major
>
> There are three (related) possible resource leaks in the getInputStream and 
> getOutputStream methods in SnapStream.java. I noticed the first because of 
> the use of the error-prone GZIPOutputStream, and the other two after looking 
> at the surrounding code.
> Here is the offending code (copied from 
> [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/SnapStream.java#L102]):
>  
> {code:java}
> /** 
>   * Return the CheckedInputStream based on the extension of the fileName.
>   * 
>   * @param file the file the InputStream read from
>   * @return the specific InputStream 
>   * @throws IOException
>   */
> public static CheckedInputStream getInputStream(File file) throws IOException 
> {  
> FileInputStream fis = new FileInputStream(file);
> InputStream is; 
> switch (getStreamMode(file.getName())) { 
> case GZIP:
> is = new GZIPInputStream(fis);
> break;
> case SNAPPY:
> is = new SnappyInputStream(fis);  
> break;
> case CHECKED:   
> default:
> is = new BufferedInputStream(fis);
> }
> return new CheckedInputStream(is, new Adler32());
> }
> 
> /** 
>   * Return the OutputStream based on predefined stream mode. 
>   * 
>   * @param file the file the OutputStream writes to 
>   * @param fsync sync the file immediately after write 
>   * @return the specific OutputStream 
>   * @throws IOException 
>   */
> public static CheckedOutputStream getOutputStream(File file, boolean fsync) 
> throws IOException {
> OutputStream fos = fsync ? new AtomicFileOutputStream(file) : new 
> FileOutputStream(file);
> OutputStream os;
> switch (streamMode) {
> case GZIP:
> os = new GZIPOutputStream(fos);  
> break;
> case SNAPPY:
> os = new SnappyOutputStream(fos);  
> break;
> case CHECKED:   
> default: 
> os = new BufferedOutputStream(fos); 
> }   
> return new CheckedOutputStream(os, new Adler32());  
> }{code}
> All three possible resource leaks are caused by the constructors of the 
> intermediate streams (i.e. is and os), some of which might throw IOExceptions:
>  * in getOutputStream, the call to "new GZIPOutputStream" can throw an 
> exception, because GZIPOutputStream writes out the header in the constructor. 
> If it does throw, then fos is never closed. That it does so makes it hard to 
> use correctly; someone raised this as an issue with the JDK folks 
> [here|https://bugs.openjdk.java.net/browse/JDK-8180899], but they closed it 
> as "won't fix" because the constructor is documented to throw (hence the need 
> to catch the exception here).
>  * in getInputStream, the call to "new GZIPInputStream" can throw an 
> IOException for a similar reason, causing the file handle held by fis to leak.
>  * similarly, the call to "new SnappyInputStream" can throw an IOException, 
> because it tries to read the file header during construction, which also 
> causes fis to leak. SnappyOutputStream cannot throw; I checked 
> [here|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyOutputStream.java].
> I will submit a PR with a fix on Github shortly and update this description 
> with a link.





[jira] [Updated] (ZOOKEEPER-4251) Flaky test: org.apache.zookeeper.test.WatcherTest

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4251:
---
Fix Version/s: 3.8.0

> Flaky test: org.apache.zookeeper.test.WatcherTest
> -
>
> Key: ZOOKEEPER-4251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4251
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0
>
> Attachments: image-2021-03-16-12-24-27-480.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Flakiness=73.3% (11 / 15) 
>  !image-2021-03-16-12-24-27-480.png! 





[jira] [Resolved] (ZOOKEEPER-4251) Flaky test: org.apache.zookeeper.test.WatcherTest

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4251.

Fix Version/s: 3.6.3
   Resolution: Fixed

Issue resolved by pull request 1647
[https://github.com/apache/zookeeper/pull/1647]

> Flaky test: org.apache.zookeeper.test.WatcherTest
> -
>
> Key: ZOOKEEPER-4251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4251
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.3
>
> Attachments: image-2021-03-16-12-24-27-480.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Flakiness=73.3% (11 / 15) 
>  !image-2021-03-16-12-24-27-480.png! 





[jira] [Resolved] (ZOOKEEPER-4259) Allow AdminServer to force https

2021-03-25 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4259.

Fix Version/s: 3.8.0
   Resolution: Fixed

> Allow AdminServer to force https
> 
>
> Key: ZOOKEEPER-4259
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4259
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.7.0
>Reporter: Norbert Kalmár
>Assignee: Norbert Kalmár
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Since port unification (ZOOKEEPER-3371), AdminServer supports HTTPS, but there 
> is no way to disable HTTP and allow HTTPS only. It is my understanding that, 
> to be FIPS compliant, only HTTPS is allowed; this is one reason such a 
> feature is good to have. 
> To enable https currently, we need to set these parameters in zoo.cfg:
> {code:java}
> ssl.quorum.keyStore.location=/tmp/zookeeper/keystore.jks
> ssl.quorum.keyStore.password=password
> ssl.quorum.trustStore.location=/tmp/zookeeper/truststore.jks
> ssl.quorum.trustStore.password=password
> admin.portUnification=true
> {code}
> I generated keystore and truststore with the following commands:
> {code:java}
> #create test/dev keystore/truststore (ZK runs only on localhost)
> keytool -genkeypair -alias zk.dev -keyalg RSA -keysize 2048 -dname 
> "cn=zk.dev" -keypass password -keystore /tmp/zookeeper/keystore.jks -ext 
> san=dns:localhost -storepass password
> keytool -exportcert -alias zk.dev -keystore /tmp/zookeeper/keystore.jks -file 
> /tmp/zookeeper/zk.dev.cer -rfc
> keytool -keystore /tmp/zookeeper/truststore.jks -storepass password 
> -importcert -alias zk.dev -file /tmp/zookeeper/zk.dev.cer
> #check
> keytool -list -v -keystore /tmp/zookeeper/truststore.jks
> {code}
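Once this feature lands (Fix Version 3.8.0), forcing https should come down to a single extra property on top of the settings above. The key name below, {{admin.forceHttps}}, is the one that appears in the 3.8.0 admin guide; treat it as an assumption and verify against the released documentation:

{code}
# In zoo.cfg, in addition to the keystore/truststore settings above:
admin.portUnification=true
admin.forceHttps=true
{code}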





[jira] [Updated] (ZOOKEEPER-3128) Get CLI Command displays Authentication error for Authorization error

2021-03-23 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-3128:
---
Fix Version/s: (was: 3.7.0)

> Get CLI Command displays Authentication error for Authorization error
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-3128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CLI get command displays 
> "org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
> NoAuth for /b" when the user does not have read access on the znode /b.
> Steps to reproduce the bug:
> {noformat}
> [zk: vm1:2181(CONNECTED) 1] create /b
> Created /b
> [zk: vm1:2181(CONNECTED) 2] getAcl /b
> 'world,'anyone
> : cdrwa
> [zk: vm1:2181(CONNECTED) 3] setAcl /b world:anyone:wa
> [zk: vm1:2181(CONNECTED) 4] getAcl /b
> 'world,'anyone
> : wa
> [zk: vm1:2181(CONNECTED) 5] get /b
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
> NoAuth for /b
> [zk: vm1:2181(CONNECTED) 6]
> {noformat}
> Expected output:
> {noformat}
> [zk: vm1:2181(CONNECTED) 0] get /b
> Insufficient permission : /b
> {noformat}
>  
>  





[jira] [Updated] (ZOOKEEPER-3128) Get CLI Command displays Authentication error for Authorization error

2021-03-23 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-3128:
---
Fix Version/s: (was: 3.7.0.)
   3.7.0

> Get CLI Command displays Authentication error for Authorization error
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-3128
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3128
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.7.0, 3.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CLI get command displays 
> "org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
> NoAuth for /b" when the user does not have read access on the znode /b.
> Steps to reproduce the bug:
> {noformat}
> [zk: vm1:2181(CONNECTED) 1] create /b
> Created /b
> [zk: vm1:2181(CONNECTED) 2] getAcl /b
> 'world,'anyone
> : cdrwa
> [zk: vm1:2181(CONNECTED) 3] setAcl /b world:anyone:wa
> [zk: vm1:2181(CONNECTED) 4] getAcl /b
> 'world,'anyone
> : wa
> [zk: vm1:2181(CONNECTED) 5] get /b
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
> NoAuth for /b
> [zk: vm1:2181(CONNECTED) 6]
> {noformat}
> Expected output:
> {noformat}
> [zk: vm1:2181(CONNECTED) 0] get /b
> Insufficient permission : /b
> {noformat}
>  
>  





[jira] [Commented] (ZOOKEEPER-4211) Expose Quota Metrics to Prometheus

2021-03-19 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305147#comment-17305147
 ] 

Damien Diederen commented on ZOOKEEPER-4211:


Hello [~liwang],

Sorry for the lag; things have been a bit frantic on other fronts.  I will 
review your contribution ASAP.  It might not get into 3.7.0, as we currently 
have a release candidate which is being voted on, but I will consider it for 
3.7.1 and {{master}}.

> Expose Quota Metrics to Prometheus
> ----------------------------------
>
> Key: ZOOKEEPER-4211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4211
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: metric system
>Affects Versions: 3.7.0, 3.7
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 3.7, quota limits can be enforced, and the quota-related stats are captured 
> in the StatsTrack.  From the "listquota" CLI command, we can see the quota 
> limit and usage info. 
> In addition to that, we would like to collect the quota metrics and expose 
> them to Prometheus for the following:
> 1. Monitoring per-namespace (chroot) quota usage via the Grafana dashboard
> 2. Creating alerts based on the quota levels (e.g. 90% used)





[jira] [Commented] (ZOOKEEPER-4257) learner.asyncSending and learner.closeSocketAsync should be configurable in zoo.cfg

2021-03-17 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303219#comment-17303219
 ] 

Damien Diederen commented on ZOOKEEPER-4257:


Removed the 3.7.0 tag as no resolution has been merged into {{branch-3.7.0}} as 
of now, and the ticket shouldn't appear in the release notes before then.

> learner.asyncSending and learner.closeSocketAsync should be configurable in 
> zoo.cfg
> ----------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4257
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4257
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Mohammad Arshad
>Priority: Minor
> Fix For: 3.8.0
>
>
> The configurations learner.asyncSending and learner.closeSocketAsync, 
> introduced in ZOOKEEPER-3575 and ZOOKEEPER-3574, are Java system properties 
> only, which means they cannot be configured through the ZooKeeper 
> configuration file zoo.cfg.
> As these changes are not released yet, it is better to correct this and make 
> them configurable through zoo.cfg.
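The requested behavior amounts to resolving each key from zoo.cfg first and only then falling back to the "zookeeper."-prefixed Java system property. A minimal sketch of that pattern (a hypothetical helper class, not ZooKeeper's actual QuorumPeerConfig code):

```java
import java.util.Properties;

public class ConfigResolver {
    /**
     * Resolve a boolean flag from zoo.cfg-style Properties, falling back to
     * the "zookeeper."-prefixed Java system property, then to a default.
     * Hypothetical helper illustrating the pattern only.
     */
    public static boolean getBoolean(Properties cfg, String key, boolean def) {
        String v = cfg.getProperty(key);
        if (v == null) {
            v = System.getProperty("zookeeper." + key);
        }
        return v == null ? def : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("learner.asyncSending", "true");
        // zoo.cfg wins when present; otherwise the system property is consulted.
        System.out.println(getBoolean(cfg, "learner.asyncSending", false));   // true
        System.out.println(getBoolean(cfg, "learner.closeSocketAsync", false)); // false
    }
}
```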





[jira] [Updated] (ZOOKEEPER-4257) learner.asyncSending and learner.closeSocketAsync should be configurable in zoo.cfg

2021-03-17 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4257:
---
Fix Version/s: (was: 3.7.0)

> learner.asyncSending and learner.closeSocketAsync should be configurable in 
> zoo.cfg
> ----------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4257
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4257
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Mohammad Arshad
>Priority: Minor
> Fix For: 3.8.0
>
>
> The configurations learner.asyncSending and learner.closeSocketAsync, 
> introduced in ZOOKEEPER-3575 and ZOOKEEPER-3574, are Java system properties 
> only, which means they cannot be configured through the ZooKeeper 
> configuration file zoo.cfg.
> As these changes are not released yet, it is better to correct this and make 
> them configurable through zoo.cfg.





[jira] [Commented] (ZOOKEEPER-4241) Change log level without restarting zookeeper

2021-03-17 Thread Damien Diederen (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303217#comment-17303217
 ] 

Damien Diederen commented on ZOOKEEPER-4241:


[~arshad.mohammad]: Removed the 3.7.0 tag as it hasn't been merged into 
{{branch-3.7.0}} as of now, and shouldn't appear in the release notes before it 
is.

> Change log level without restarting zookeeper
> ---------------------------------------------
>
> Key: ZOOKEEPER-4241
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4241
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.6.2
> Environment: Kubernetes 
>Reporter: Pratik Thacker
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In our use case of ZooKeeper, we want to change the log level without 
> restarting it.
> This will help us trace issues without restarting ZooKeeper, as some issues 
> may not appear immediately after a restart with the debug log level enabled, 
> and it may take longer to reproduce the issue after a restart.
> Is such a feature/API already available in Apache ZooKeeper?
>  If it is not available, could you please consider this request and 
> implement it?
> Please let us know if you need any further details from us.





[jira] [Updated] (ZOOKEEPER-4241) Change log level without restarting zookeeper

2021-03-17 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-4241:
---
Fix Version/s: (was: 3.7.0)

> Change log level without restarting zookeeper
> ---------------------------------------------
>
> Key: ZOOKEEPER-4241
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4241
> Project: ZooKeeper
>  Issue Type: Wish
>  Components: server
>Affects Versions: 3.6.2
> Environment: Kubernetes 
>Reporter: Pratik Thacker
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In our use case of ZooKeeper, we want to change the log level without 
> restarting it.
> This will help us trace issues without restarting ZooKeeper, as some issues 
> may not appear immediately after a restart with the debug log level enabled, 
> and it may take longer to reproduce the issue after a restart.
> Is such a feature/API already available in Apache ZooKeeper?
>  If it is not available, could you please consider this request and 
> implement it?
> Please let us know if you need any further details from us.





[jira] [Updated] (ZOOKEEPER-3706) ZooKeeper.close() would leak SendThread when the network is broken

2021-03-17 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen updated ZOOKEEPER-3706:
---
Fix Version/s: (was: 3.7.0)

> ZooKeeper.close() would leak SendThread when the network is broken
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-3706
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3706
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.6.0, 3.4.14, 3.5.6
>Reporter: Pierre Yin
>Assignee: Pierre Yin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> The close method of ZooKeeper may leak the SendThread when the network is 
> broken.
> When the network is broken, the SendThread of the ZooKeeper client falls into 
> a continuous reconnecting loop. There is an unsafe window just before 
> startConnect() during this reconnecting. If SendThread.close() in another 
> thread hits that window, startConnect() sleeps for some time and then 
> forcibly changes the state to States.CONNECTING, although SendThread.close() 
> has already set the state to States.CLOSED. In this case, the SendThread 
> would never die, and nobody would change the state again.
> In the normal case, ZooKeeper.close() blocks forever, waiting for the 
> closeSession packet to finish, until the network outage is recovered. But if 
> the user sets a request timeout, ZooKeeper.close() breaks out of that wait 
> after the timeout and invokes SendThread.close() to change the state to 
> CLOSED. That's why SendThread.close() can hit the unsafe window.
> Setting a request timeout is a very common practice. 
> I will propose a patch and send it out later.
> Maybe someone can help to review it.
>  
> Thanks
>  
>  
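The interleaving described above can be modeled with a toy state machine; the check and the act are separated deliberately to expose the racy window just before startConnect(). This is a hypothetical minimal model, not the real SendThread code:

```java
public class SendThreadRaceSketch {
    enum State { CONNECTING, CONNECTED, CLOSED }

    static class Session {
        volatile State state = State.CONNECTED;
        // Reconnect loop body split into check and act, mirroring the
        // unsafe window just before startConnect() in the report.
        boolean shouldReconnect() { return state != State.CLOSED; }
        void startConnect()       { state = State.CONNECTING; }
        void close()              { state = State.CLOSED; }
    }

    public static void main(String[] args) {
        Session s = new Session();
        boolean reconnect = s.shouldReconnect(); // SendThread: check passes
        s.close();                               // another thread: state -> CLOSED
        if (reconnect) {
            s.startConnect();                    // overwrites CLOSED with CONNECTING
        }
        // The CLOSED state is lost; the reconnect loop would spin forever.
        System.out.println(s.state);             // prints CONNECTING
    }
}
```

Making the check-and-act atomic (e.g. a synchronized compare-and-transition) closes the window, which is the shape of fix the reporter proposes.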





[jira] [Resolved] (ZOOKEEPER-4231) Add document for snapshot compression config

2021-03-17 Thread Damien Diederen (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Diederen resolved ZOOKEEPER-4231.

Fix Version/s: 3.7.0
   Resolution: Fixed

Issue resolved by pull request 1642
[https://github.com/apache/zookeeper/pull/1642]

> Add document for snapshot compression config
> --------------------------------------------
>
> Key: ZOOKEEPER-4231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4231
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Huizhi Lu
>Assignee: Huizhi Lu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.7.0
>
>   Original Estimate: 24h
>  Time Spent: 40m
>  Remaining Estimate: 23h 20m
>
> A snapshot compression method was added to ZooKeeper, but there is no clear 
> documentation about the config. This ticket is created to add documentation 
> for the config:
> *zookeeper.snapshot.compression.method* 




