[jira] [Resolved] (ZOOKEEPER-4531) Revert Netty TCNative change

2022-05-06 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4531.

Fix Version/s: 3.9.0
   3.7.1
   3.6.4
   3.8.1
   Resolution: Fixed

Issue resolved by pull request 1873
[https://github.com/apache/zookeeper/pull/1873]

> Revert Netty TCNative change
> 
>
> Key: ZOOKEEPER-4531
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4531
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.1, 3.6.4, 3.8.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> netty-tcnative is a dependency of netty 4.1.73. After upgrading netty to 
> 4.1.76 we can remove netty-tcnative, as the netty-tcnative upgrade to 2.0.48 
> did not resolve any CVEs. 
>  
> Since netty 4.1.76 does not have the netty-tcnative dependency, the CVEs will 
> also be resolved.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (ZOOKEEPER-4529) Upgrade netty to 4.1.76.Final

2022-05-05 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4529.

Fix Version/s: 3.7.1
   3.6.4
   3.9.0
   3.8.1
   Resolution: Fixed

> Upgrade netty to 4.1.76.Final
> -
>
> Key: ZOOKEEPER-4529
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4529
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To resolve the CVEs introduced by netty-tcnative-classes:jar:2.0.46.Final 
> we should upgrade the netty version.
> The following CVEs come from the dependency of 
> io.netty:netty-codec:jar:4.1.73.Final on 
> io.netty:netty-tcnative-classes:jar:2.0.46.Final.
>  
> CVE-2014-3488, CVE-2015-2156, CVE-2019-16869, CVE-2019-20444, CVE-2019-20445, 
> CVE-2021-21290, CVE-2021-21295, CVE-2021-21409, CVE-2021-37136, 
> CVE-2021-37137, CVE-2021-43797



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-05-04 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531892#comment-17531892
 ] 

Mohammad Arshad commented on ZOOKEEPER-4510:


Upgrading dependency-check-maven to the latest release, 7.1.0, solves this 
false-positive CVE issue. I will raise a PR.

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.4, 3.7.
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with the following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ZOOKEEPER-4482) Fix LICENSE FILES for commons-io and commons-cli

2022-04-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4482:
---
Fix Version/s: 3.9.0
   (was: 3.6.4)
   (was: 3.8.1)

> Fix LICENSE FILES for commons-io and commons-cli
> 
>
> Key: ZOOKEEPER-4482
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4482
> Project: ZooKeeper
>  Issue Type: Task
>  Components: license
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should rename the commons-io-2.7 LICENSE file to commons-io-2.11.0, and we 
> should also add the LICENSE file for commons-cli.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (ZOOKEEPER-4482) Fix LICENSE FILES for commons-io and commons-cli

2022-04-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4482.

Resolution: Fixed

> Fix LICENSE FILES for commons-io and commons-cli
> 
>
> Key: ZOOKEEPER-4482
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4482
> Project: ZooKeeper
>  Issue Type: Task
>  Components: license
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should rename the commons-io-2.7 LICENSE file to commons-io-2.11.0, and we 
> should also add the LICENSE file for commons-cli.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ZOOKEEPER-4287) Upgrade prometheus client library version to 0.10.0

2022-04-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4287:
---
Fix Version/s: (was: 3.7.1)

> Upgrade prometheus client library version to 0.10.0
> ---
>
> Key: ZOOKEEPER-4287
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4287
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.7.0, 3.8.0
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Upgrade the client library version to the latest to help investigate the 
> Prometheus impact issue.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ZOOKEEPER-4388) Recover from network partition, follower/observer ephemerals nodes is inconsistent with leader

2022-04-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4388:
---
Fix Version/s: (was: 3.7.1)

> Recover from network partition, follower/observer ephemerals nodes is 
> inconsistent with leader
> --
>
> Key: ZOOKEEPER-4388
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4388
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.6.3, 3.6.1, 3.6.2
> Environment: zk version 3.6.0 3.6.1 3.6.2
> the follower/observer network disconnection time exceeds session timeout
>Reporter: shixiaoxiao
>Priority: Major
>  Labels: inconsistency, partitoned, zookeeper
> Attachments: dataInconsistent.png
>
>
> The follower/observer has read-only mode enabled. When the node recovers from 
> the partition, its ephemeral nodes become inconsistent with the leader node. 
> The reason is that the requests to close the timed-out sessions are processed 
> by the read-only follower or observer while it is partitioned, and the 
> ephemeral nodes created by those sessions are deleted as well. When the leader 
> node later uses a diff to synchronize data with the follower/observer node, 
> the transactions being synchronized do not include the creation of the 
> ephemeral nodes whose sessions were closed by the follower. So the 
> follower/observer ephemeral nodes are inconsistent with the leader.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4510:
---
Fix Version/s: 3.7.
   (was: 3.7.1)

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.4, 3.7.
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with the following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-16 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523064#comment-17523064
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


Thanks [~jerryhe] for raising and submitting the patches.
Thanks [~symat], [~eolivelli] for the reviews.

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-16 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523059#comment-17523059
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


When the watcher is null, the ZooKeeper client app gets a NullPointerException 
anyway. 
Now, after this fix, the apps will start getting an IllegalArgumentException, 
which will make it easier to figure out what is wrong in the code and correct it.
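
For illustration, a minimal, self-contained sketch of that fail-fast behaviour 
(the class and helper names below are made up; this is not the actual patch):
{code:java}
// Hypothetical sketch: reject a null watcher up front with IllegalArgumentException
// so misuse is reported at registration time instead of surfacing later as an NPE.
public final class WatcherCheckSketch {

    static <T> T requireNonNullWatcher(T watcher) {
        if (watcher == null) {
            throw new IllegalArgumentException("watcher must not be null");
        }
        return watcher;
    }

    public static void main(String[] args) {
        Runnable ok = () -> System.out.println("event processed");
        requireNonNullWatcher(ok).run();   // fine
        requireNonNullWatcher(null);       // throws IllegalArgumentException
    }
}
{code}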


> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
>  Labels: pull-request-available
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-16 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523057#comment-17523057
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


I have checked the zk client code carefully; the NPE will occur only when the 
watcher is set to null, either through the ZooKeeper constructor or through the 
register method. Now I think we should do exactly what was submitted in the 
latest ZOOKEEPER-1875.patch.

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
>  Labels: pull-request-available
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-11 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520474#comment-17520474
 ] 

Mohammad Arshad commented on ZOOKEEPER-4510:


As resolution of the CVE false-positive issue is taking time, let's suppress 
those CVEs and move on. I raised a PR.

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with the following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-09 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520008#comment-17520008
 ] 

Mohammad Arshad commented on ZOOKEEPER-4515:


Thanks [~Tison], [~eolivelli] for the review.

> ZK Cli quit command always logs error
> -
>
> Key: ZOOKEEPER-4515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: image-2022-04-08-15-47-04-325.png, screenshot-1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  !image-2022-04-08-15-47-04-325.png! 
> * When the connection is in closing state, this warning log is entirely useless; 
> change it to debug.
> * When the JVM exits with code 0, log at info level instead of error



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-09 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4515.

Resolution: Fixed

Issue resolved by pull request 1856
[https://github.com/apache/zookeeper/pull/1856]

> ZK Cli quit command always logs error
> -
>
> Key: ZOOKEEPER-4515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.1, 3.6.4, 3.8.1
>
> Attachments: image-2022-04-08-15-47-04-325.png, screenshot-1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  !image-2022-04-08-15-47-04-325.png! 
> * When the connection is in closing state, this warning log is entirely useless; 
> change it to debug.
> * When the JVM exits with code 0, log at info level instead of error



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-09 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4510:
---
Priority: Blocker  (was: Critical)

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Blocker
> Fix For: 3.7.1, 3.6.4
>
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with the following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4516) checkstyle:check is failing

2022-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519711#comment-17519711
 ] 

Mohammad Arshad commented on ZOOKEEPER-4516:


Thanks [~symat] for the review.

> checkstyle:check  is failing
> 
>
> Key: ZOOKEEPER-4516
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4516
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> checkstyle:check is failing on branch-3.7 and branch-3.6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-4516) checkstyle:check is failing

2022-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4516.

Resolution: Fixed

Issue resolved by pull request 1858
[https://github.com/apache/zookeeper/pull/1858]

> checkstyle:check  is failing
> 
>
> Key: ZOOKEEPER-4516
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4516
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> checkstyle:check is failing on branch-3.7 and branch-3.6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4514) ClientCnxnSocketNetty throwing NPE

2022-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519699#comment-17519699
 ] 

Mohammad Arshad commented on ZOOKEEPER-4514:


Thanks [~symat] for the review.

> ClientCnxnSocketNetty throwing NPE
> --
>
> Key: ZOOKEEPER-4514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4514
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: image-2022-04-07-13-27-13-068.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> ClientCnxnSocketNetty is throwing an NPE. This mainly happens when one of the 
> servers is restarting and the client tries to connect.
>  !image-2022-04-07-13-27-13-068.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4514) ClientCnxnSocketNetty throwing NPE

2022-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4514:
---
Fix Version/s: 3.7.1
   3.6.4
   3.9.0
   3.8.1

> ClientCnxnSocketNetty throwing NPE
> --
>
> Key: ZOOKEEPER-4514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4514
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: image-2022-04-07-13-27-13-068.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> ClientCnxnSocketNetty is throwing an NPE. This mainly happens when one of the 
> servers is restarting and the client tries to connect.
>  !image-2022-04-07-13-27-13-068.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-4514) ClientCnxnSocketNetty throwing NPE

2022-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4514.

Resolution: Fixed

> ClientCnxnSocketNetty throwing NPE
> --
>
> Key: ZOOKEEPER-4514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4514
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: image-2022-04-07-13-27-13-068.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> ClientCnxnSocketNetty is throwing an NPE. This mainly happens when one of the 
> servers is restarting and the client tries to connect.
>  !image-2022-04-07-13-27-13-068.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4516) checkstyle:check is failing

2022-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519532#comment-17519532
 ] 

Mohammad Arshad commented on ZOOKEEPER-4516:


branch-3.6 errors:

{noformat}
[ERROR] 
src\test\java\org\apache\zookeeper\KerberosTicketRenewalTest.java:[220,7] 
(whitespace) WhitespaceAround: 'if' is not followed by whitespace.
[ERROR] 
src\test\java\org\apache\zookeeper\server\quorum\auth\MiniKdcTest.java:[26,8] 
(imports) UnusedImports: Unused import: java.util.HashMap
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-checkstyle-plugin:3.1.0:check (default-cli) on 
project zookeeper: You have 8 Checkstyle violations. -> [Help 1]

{noformat}


> checkstyle:check  is failing
> 
>
> Key: ZOOKEEPER-4516
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4516
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> checkstyle:check is failing on branch-3.7 and branch-3.6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4516) checkstyle:check is failing

2022-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519513#comment-17519513
 ] 

Mohammad Arshad commented on ZOOKEEPER-4516:


mvn clean install checkstyle:check -DskipTests on branch-3.7 fails with the 
following errors:

{noformat}
[ERROR] 
src\test\java\org\apache\zookeeper\common\CertificatesToPlayWith.java:[564,90] 
(whitespace) OperatorWrap: '+' should be on a new line.
[ERROR] 
src\test\java\org\apache\zookeeper\common\CertificatesToPlayWith.java:[565,62] 
(whitespace) OperatorWrap: '+' should be on a new line.
[ERROR] src\test\java\org\apache\zookeeper\ZKUtilTest.java:[31,1] (imports) 
ImportOrder: Extra separation in import group before 
'org.apache.zookeeper.data.Stat'

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-checkstyle-plugin:3.1.1:check (default-cli) on 
project zookeeper: You have 438 Checkstyle violations. -> [Help 1]
[ERROR]
{noformat}




> checkstyle:check  is failing
> 
>
> Key: ZOOKEEPER-4516
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4516
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.7.1, 3.6.4
>
>
> checkstyle:check is failing on branch-3.7 and branch-3.6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4515:
---
Fix Version/s: 3.7.1
   3.6.4
   3.9.0
   3.8.1

> ZK Cli quit command always logs error
> -
>
> Key: ZOOKEEPER-4515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
> Attachments: image-2022-04-08-15-47-04-325.png, screenshot-1.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  !image-2022-04-08-15-47-04-325.png! 
> * When the connection is in closing state, this warning log is entirely useless; 
> change it to debug.
> * When the JVM exits with code 0, log at info level instead of error



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519510#comment-17519510
 ] 

Mohammad Arshad commented on ZOOKEEPER-4515:


After fix:
 !screenshot-1.png! 


> ZK Cli quit command always logs error
> -
>
> Key: ZOOKEEPER-4515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: image-2022-04-08-15-47-04-325.png, screenshot-1.png
>
>
>  !image-2022-04-08-15-47-04-325.png! 
> * When the connection is in closing state, this warning log is entirely useless; 
> change it to debug.
> * When the JVM exits with code 0, log at info level instead of error



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4515:
---
Attachment: screenshot-1.png

> ZK Cli quit command always logs error
> -
>
> Key: ZOOKEEPER-4515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: image-2022-04-08-15-47-04-325.png, screenshot-1.png
>
>
>  !image-2022-04-08-15-47-04-325.png! 
> * When the connection is in closing state, this warning log is entirely useless; 
> change it to debug.
> * When the JVM exits with code 0, log at info level instead of error



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4516) checkstyle:check is failing

2022-04-08 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4516:
--

 Summary: checkstyle:check  is failing
 Key: ZOOKEEPER-4516
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4516
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 3.7.1, 3.6.4


checkstyle:check is failing on branch-3.7 and branch-3.6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4515) ZK Cli quit command always logs error

2022-04-08 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4515:
--

 Summary: ZK Cli quit command always logs error
 Key: ZOOKEEPER-4515
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4515
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Attachments: image-2022-04-08-15-47-04-325.png

 !image-2022-04-08-15-47-04-325.png! 

* When the connection is in closing state, this warning log is entirely useless; 
change it to debug.
* When the JVM exits with code 0, log at info level instead of error (see the sketch below)
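
A rough, self-contained sketch of the intended logging behaviour (illustrative 
only; the class and method names are made up, and the real change lives in the 
ZooKeeper CLI/client classes):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of the log-level handling described above, not the actual patch.
public final class QuitLoggingSketch {

    private static final Logger LOG = LoggerFactory.getLogger(QuitLoggingSketch.class);

    static void onDisconnected(boolean closing) {
        if (closing) {
            // expected during a normal quit, so keep it quiet
            LOG.debug("Connection closed while the client is closing");
        } else {
            LOG.warn("Connection closed unexpectedly");
        }
    }

    static void onExit(int exitCode) {
        if (exitCode == 0) {
            LOG.info("Exiting JVM with code {}", exitCode);
        } else {
            LOG.error("Exiting JVM with code {}", exitCode);
        }
    }

    public static void main(String[] args) {
        onDisconnected(true);
        onExit(0);
    }
}
{code}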



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-07 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518799#comment-17518799
 ] 

Mohammad Arshad edited comment on ZOOKEEPER-1875 at 4/7/22 11:15 AM:
-

I think, as proposed in the first patch, we should skip the watcher processing if 
either watcher or pair.event is null, but we should also add a warning so that in 
the future we can get a better understanding of the use case where the problem is 
coming from.
{code:java}
if (watcher != null && pair.event != null) {
watcher.process(pair.event);
} else {
LOG.warn(
"Skipping watcher processing as watcher and pair.event cannot"
+ " be null. watcher={}, pair.event={}",
watcher == null ? "null" : watcher.getClass().getName(),
pair.event == null ? "null" : pair.event);
}
{code}


was (Author: arshad.mohammad):
I think, as proposed in the first patch, we should skip the watcher processing if 
either watcher or pair.event is null, but we should also add a warning so that in 
the future we can get a better understanding of the use case where the problem is 
coming from.
{code:java}
if (watcher != null && pair.event != null) {
watcher.process(pair.event);
} else {
LOG.warn(
"Skipping watcher processing as watcher or pair.event cannot"
+ " be null. watcher={}, pair.event={}",
watcher == null ? "null" : watcher.getClass().getName(),
pair.event == null ? "null" : pair.event);
}
{code}

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-07 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518799#comment-17518799
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


I think, as proposed in the first patch, we should skip the watcher processing if 
either watcher or pair.event is null, but we should also add a warning so that in 
the future we can get a better understanding of the use case where the problem is 
coming from.
{code:java}
if (watcher != null && pair.event != null) {
watcher.process(pair.event);
} else {
LOG.warn(
"Skipping watcher processing as watcher or pair.event cannot"
+ " be null. watcher={}, pair.event={}",
watcher == null ? "null" : watcher.getClass().getName(),
pair.event == null ? "null" : pair.event);
}
{code}

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-07 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518795#comment-17518795
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


I had observed this issue in the past, but right now I don't have the details of 
the exact cause.

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-07 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-4510:
--

Assignee: Mohammad Arshad

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.7.1, 3.6.4
>
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with the following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4514) ClientCnxnSocketNetty throwing NPE

2022-04-07 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518760#comment-17518760
 ] 

Mohammad Arshad commented on ZOOKEEPER-4514:


The NPE is thrown because the channel object is null in the 
ClientCnxnSocketNetty#sendPkt(Packet p, boolean doFlush) method.
We should add a null check, the same as the null check done in the 
sendPacket(ClientCnxn.Packet p) method. 

As sendPacket delegates the call to sendPkt, it is better to move the null 
check into sendPkt so that all scenarios are handled (see the sketch below).
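
For illustration, a minimal sketch of that guard (the method names mirror the 
ones mentioned above, but the bodies are made up; this is not the actual 
ZooKeeper patch):
{code:java}
// Hypothetical sketch: because sendPacket delegates to sendPkt, putting the null
// check inside sendPkt covers both call paths instead of only one of them.
final class SendPktGuardSketch {

    private volatile Object channel; // stands in for the Netty Channel

    void sendPacket(Object packet) {
        sendPkt(packet, true); // delegates; the guard below protects this path too
    }

    void sendPkt(Object packet, boolean doFlush) {
        Object ch = channel;
        if (ch == null) {
            // the connection is still being (re)established; skip instead of throwing NPE
            return;
        }
        // write the packet to ch here, flushing when doFlush is true
    }
}
{code}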

> ClientCnxnSocketNetty throwing NPE
> --
>
> Key: ZOOKEEPER-4514
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4514
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: image-2022-04-07-13-27-13-068.png
>
>
> ClientCnxnSocketNetty is throwing an NPE. This mainly happens when one of the 
> servers is restarting and the client tries to connect.
>  !image-2022-04-07-13-27-13-068.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4514) ClientCnxnSocketNetty throwing NPE

2022-04-07 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4514:
--

 Summary: ClientCnxnSocketNetty throwing NPE
 Key: ZOOKEEPER-4514
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4514
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Attachments: image-2022-04-07-13-27-13-068.png

ClientCnxnSocketNetty is throwing an NPE. This mainly happens when one of the 
servers is restarting and the client tries to connect.

 !image-2022-04-07-13-27-13-068.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2022-04-07 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518639#comment-17518639
 ] 

Mohammad Arshad commented on ZOOKEEPER-1875:


This issue is still applicable to all versions.
Is anybody interested in raising a PR?

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
>private void processEvent(Object event) {
>   try {
>   if (event instanceof WatcherSetEventPair) {
>   // each watcher will process the event
>   WatcherSetEventPair pair = (WatcherSetEventPair) event;
>   for (Watcher watcher : pair.watchers) {
>   try {
>   watcher.process(pair.event);
>   } catch (Throwable t) {
>   LOG.error("Error while calling watcher ", t);
>   }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-04-06 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517933#comment-17517933
 ] 

Mohammad Arshad commented on ZOOKEEPER-4504:


Thanks [~eolivelli] and [~symat] for the reviews.
Merged to master, branch-3.8, branch-3.7 and branch-3.6

> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)
> ZKUtil.deleteRecursive is making async delete API call with MultiCallback as 
> it callback.
> As processWatchEvent is being processed, pathRoot or one of the child paths 
> must have set watcher for delete notification.
> When delete API is called, notification comes first to client then the actual 
> delete response.
> In this case both notification and delete response are processed through 
> callback and 

[jira] [Resolved] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-04-06 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4504.

Resolution: Fixed

Issue resolved by pull request 1843
[https://github.com/apache/zookeeper/pull/1843]

> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.1, 3.6.4, 3.8.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)
> ZKUtil.deleteRecursive makes an async delete API call with a MultiCallback as 
> its callback.
> Since processWatchEvent is being processed, pathRoot or one of the child paths 
> must have had a watcher set for the delete notification.
> When the delete API is called, the watch notification reaches the client before 
> the actual delete response.
> In this case both the notification and the delete response are processed through 
> callbacks and through the common waitingEvents queue one by 

[jira] [Updated] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-04-06 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3652:
---
Fix Version/s: 3.5.10

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10, 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  
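A minimal sketch of the locking change described above, assuming simplified, hypothetical names rather than the real ClientCnxn fields and methods; it only illustrates why a per-instance monitor such as the outgoing queue is preferable to synchronizing on an enum constant.

{code:java}
// Hypothetical stand-in for ClientCnxn; only the choice of monitor is the point.
import java.util.ArrayDeque;
import java.util.Queue;

class ConnectionSketch {
    enum State { CONNECTING, CONNECTED, CLOSED }

    private volatile State state = State.CONNECTING;
    private final Queue<String> outgoingQueue = new ArrayDeque<>();

    void queuePacket(String packet) {
        // Before the fix: synchronized (state) { ... }
        // 'state' holds an enum constant that is shared JVM-wide, and the field's
        // value changes over time, so the monitor is neither private to this
        // client nor stable.
        synchronized (outgoingQueue) { // after the fix: a per-instance, stable monitor
            if (state == State.CLOSED) {
                throw new IllegalStateException("connection closed");
            }
            outgoingQueue.add(packet);
        }
    }

    void closeAndDrain() {
        synchronized (outgoingQueue) {
            state = State.CLOSED;
            outgoingQueue.clear();
        }
    }
}
{code}

Synchronizing on the queue ties the critical section to the data it guards, so contention stays confined to one client instance.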



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-05 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517849#comment-17517849
 ] 

Mohammad Arshad commented on ZOOKEEPER-4510:


Thanks [~c...@qos.ch] for the good suggestion.
I reported the false positive issue:
https://github.com/jeremylong/DependencyCheck/issues/4316

> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Critical
> Fix For: 3.7.1, 3.6.4
>
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-05 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517660#comment-17517660
 ] 

Mohammad Arshad commented on ZOOKEEPER-4510:


You are right, I can see both the CVEs are marked as fixed:
https://github.com/qos-ch/reload4j/issues/21
https://github.com/qos-ch/reload4j/commit/64902fe18ce5a5dd40487051a2f6231d9fbbe9b0
But I don't know why these CVEs are still reported by the dependency check. 

I think we have to exclude these CVEs to pass the dependency check.


> dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, 
> CVE-2022-23307
> ---
>
> Key: ZOOKEEPER-4510
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Critical
> Fix For: 3.7.1, 3.6.4
>
>
> On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is 
> failing with following errors.
> {code:java}
> [ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
> (default-cli) on project zookeeper-assembly:
> [ERROR]
> [ERROR] One or more dependencies were identified with vulnerabilities that 
> have a CVSS score greater than or equal to '0.0':
> [ERROR]
> [ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4510) dependency-check:check failing - reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307

2022-04-05 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4510:
--

 Summary: dependency-check:check failing - reload4j-1.2.19.jar: 
CVE-2020-9493, CVE-2022-23307
 Key: ZOOKEEPER-4510
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4510
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
 Fix For: 3.7.1, 3.6.4


On branch-3.7 "mvn clean package -DskipTests dependency-check:check" is failing 
with following errors.

{code:java}
[ERROR] Failed to execute goal org.owasp:dependency-check-maven:6.5.3:check 
(default-cli) on project zookeeper-assembly:
[ERROR]
[ERROR] One or more dependencies were identified with vulnerabilities that have 
a CVSS score greater than or equal to '0.0':
[ERROR]
[ERROR] reload4j-1.2.19.jar: CVE-2020-9493, CVE-2022-23307
{code}





--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-04-05 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517305#comment-17517305
 ] 

Mohammad Arshad commented on ZOOKEEPER-4504:


Hi [~ctubbsii]
I have added a few more details in the description, please have a look.
bq. It seems like this problem is either caused by a poorly written callback 
that synchronizes in a way it shouldn't.
I observed this issue when I upgraded from zk 3.5.6 to zk 3.6.3 in my cluster; the 
HDFS code remained the same before and after the upgrade. So at a high level it is 
not an issue of a poorly written callback; it is an issue of the changed behavior 
of the ZKUtil.deleteRecursive API.

Also, please have a look at the proposed change in the PR. It may be helpful 
in understanding the context.





> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> 

[jira] [Updated] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-04-05 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4504:
---
Description: 
*Problem and Analysis:*
After integrating ZooKeeper 3.6.3 we observed a deadlock in HDFS HA functionality, 
as shown in the thread dumps below.
{code:java}
"main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
- waiting to lock <0xc17986c0> (a 
org.apache.hadoop.ha.ActiveStandbyElector)
at 
org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
{code}
{code:java}
"main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
condition [0x7f9c06404000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xc1b383c8> (a 
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
at 
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
at 
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
at 
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
- locked <0xc17986c0> (a 
org.apache.hadoop.ha.ActiveStandbyElector)
at 
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
at 
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
at 
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
at 
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
at 
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
at 
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
at 
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
{code}
org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)

ZKUtil.deleteRecursive makes an async delete API call with a MultiCallback as its 
callback.
Since processWatchEvent is being processed, pathRoot or one of its child paths 
must have had a watcher set for the delete notification.

When the delete API is called, the watch notification reaches the client before 
the actual delete response.
In this case both the notification and the delete response are processed through 
callbacks, via the common waitingEvents queue, one by one.

The notification is processed first, but it cannot complete because it cannot 
acquire the lock on the processWatchEvent() method; the lock is already held by 
the other thread, which is calling clearParentZNode().
Because the delete notification cannot be processed, the MultiCallback is never 
taken from the queue for processing. It stays in the queue forever.
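
The sketch below restates this ordering problem in compilable form. It is illustrative only; a single-threaded executor stands in for the client's EventThread, and the class and method names are hypothetical rather than the real HDFS or ZooKeeper code.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DeadlockSketch {
    // Stand-in for the single EventThread that drains waitingEvents in order.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();

    // Stand-in for ActiveStandbyElector.processWatchEvent(): runs on the event
    // thread and needs the instance monitor.
    public synchronized void processWatchEvent() {
        // blocked while clearParentZNode() holds the monitor
    }

    // Stand-in for clearParentZNode(): holds the instance monitor and blocks until
    // the "delete" callback fires. That callback is queued on the event thread
    // behind the watch notification, which itself is stuck waiting for the monitor,
    // so the latch never counts down.
    public synchronized void clearParentZNode() throws InterruptedException {
        CountDownLatch deleteDone = new CountDownLatch(1);
        eventThread.submit(this::processWatchEvent); // watch notification, queued first
        eventThread.submit(deleteDone::countDown);   // delete callback, queued second
        deleteDone.await();                          // waits forever: deadlock
    }
}
{code}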

 

*Why this problem was not happening with earlier versions (3.5.x)?*

In earlier ZK versions, ZKUtil.deleteRecursive used the sync delete API, so the 
delete response was processed directly rather than through the callback. 
As a result, clearParentZNode and processWatchEvent completed independently. 


*Proposed Fix:*
There are two approaches to fix this problem. 
1. We can fix the problem in HDFS, modify the HDFS code to avoid 

[jira] [Resolved] (ZOOKEEPER-4505) CVE-2020-36518 - Upgrade jackson databind to 2.13.2.1

2022-03-31 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4505.

Fix Version/s: 3.9.0
   3.7.1
   3.6.4
   3.8.1
   Resolution: Fixed

Issue resolved by pull request 1846
[https://github.com/apache/zookeeper/pull/1846]

> CVE-2020-36518 - Upgrade jackson databind to 2.13.2.1
> -
>
> Key: ZOOKEEPER-4505
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4505
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Edwin Hobor
>Priority: Major
>  Labels: pull-request-available, security
> Fix For: 3.9.0, 3.7.1, 3.6.4, 3.8.1
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> *CVE-2020-36518* vulnerability affects jackson-databind in Zookeeper (see 
> [https://github.com/advisories/GHSA-57j2-w4cx-62h2]).
> Upgrading to jackson-databind version *2.13.2.1* should address this issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-30 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514878#comment-17514878
 ] 

Mohammad Arshad commented on ZOOKEEPER-3652:


Merged. Thanks [~sylvain] for the contribution.

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-30 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-3652.

Resolution: Fixed

Issue resolved by pull request 1257
[https://github.com/apache/zookeeper/pull/1257]

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.9.0, 3.7.1, 3.6.4, 3.8.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4388) Recover from network partition, follower/observer ephemerals nodes is inconsistent with leader

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4388:
---
Fix Version/s: 3.7.1
   (was: 3.7)

> Recover from network partition, follower/observer ephemerals nodes is 
> inconsistent with leader
> --
>
> Key: ZOOKEEPER-4388
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4388
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.6.3, 3.6.1, 3.6.2
> Environment: zk version 3.6.0 3.6.1 3.6.2
> the follower/observer network disconnection time exceeds session timeout
>Reporter: shixiaoxiao
>Priority: Major
>  Labels: inconsistency, partitoned, zookeeper
> Fix For: 3.7.1
>
> Attachments: dataInconsistent.png
>
>
> The follower/observer has read-only mode enabled. When the node recovers from 
> the partition, its ephemeral nodes become inconsistent with the leader. The 
> reason is that the requests to close the timed-out sessions are processed by the 
> read-only follower or observer while it is partitioned, and the ephemeral nodes 
> created by those sessions are also deleted. When the leader then uses a diff to 
> synchronize data with the follower/observer, the transactions to be synchronized 
> do not include the creation of the ephemeral nodes owned by the sessions that the 
> follower closed. So the follower/observer ephemeral nodes are inconsistent with 
> the leader.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4287) Upgrade prometheus client library version to 0.10.0

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4287:
---
Fix Version/s: 3.7.1
   (was: 3.7)

> Upgrade prometheus client library version to 0.10.0
> ---
>
> Key: ZOOKEEPER-4287
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4287
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.7.0, 3.8.0
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Upgrade the client library version to the latest to help investigate the 
> Prometheus impact issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3652:
---
Fix Version/s: 3.7.1
   3.9.0
   3.8.1

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3652:
---
Fix Version/s: 3.6.4
   (was: 3.6.4,3.7.1,3.8.0,3.9.0)

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-29 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514212#comment-17514212
 ] 

Mohammad Arshad commented on ZOOKEEPER-3652:


Somehow a wrong version number "3.6.4,3.7.1,3.8.0,3.9.0" got created in the Jira 
system. I will delete this version number.

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.4,3.7.1,3.8.0,3.9.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-03-29 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514204#comment-17514204
 ] 

Mohammad Arshad commented on ZOOKEEPER-4504:


The analysis has been put into the issue description; the relevant HDFS code is 
org.apache.hadoop.ha.ActiveStandbyElector.

> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)
> ZKUtil.deleteRecursive is an async API call, and in its callback it is invoking 
> ActiveStandbyElector#processWatchEvent, which is synchronized on the 
> ActiveStandbyElector instance.
> So there is a deadlock: clearParentZNode() is waiting for processWatchEvent() to 
> complete, and processWatchEvent() is waiting for clearParentZNode() to complete.
>  
> *Why this problem was not happening with earlier versions (3.5.x)?*
> In earlier zk versions, 

[jira] [Updated] (ZOOKEEPER-4289) Reduce the performance impact of Prometheus metrics

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4289:
---
Fix Version/s: 3.8.1
   (was: 3.8.0)

> Reduce the performance impact of Prometheus metrics
> ---
>
> Key: ZOOKEEPER-4289
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4289
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: metric system
>Affects Versions: 3.6.3, 3.7.0, 3.6.2, 3.8.0, 3.7.1
>Reporter: Li Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> I have done load comparison tests with Prometheus enabled vs disabled and 
> found the performance is reduced by about 40-60% for both read and write 
> operations (i.e. getData, getChildren and createNode). 
> I looked more into this and found that the Prometheus Summary is very expensive, 
> and there are more than 20 Summary metrics added to the new CommitProcessor.
> We need a solution to reduce the impact of Prometheus metrics.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-03-29 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514194#comment-17514194
 ] 

Mohammad Arshad commented on ZOOKEEPER-4504:


There is no need to create a new sync or async API. There was already one attempt 
in ZOOKEEPER-3763 to make the deleteRecursive API compatible with older versions, 
but that patch missed changing the delete API invocation from async to sync. In 
this jira we can handle that part and make the deleteRecursive API fully compatible 
with older versions.

bq. It's not clear from the discussion above what the actual cause was.
Please refer to the analysis along with the HDFS code; hopefully that gives better 
clarity on the root cause.  
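
A rough sketch of the sync-delete behavior referred to above, assuming the older semantics simply walked the subtree and deleted leaf-first with the blocking client API; the class name is hypothetical and this is not the actual ZKUtil patch.

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public final class SyncRecursiveDeleteSketch {

    // Collect the subtree in BFS order, then delete leaves first by walking the
    // list backwards. Every delete is a blocking call on the caller's thread, so
    // completion never depends on the client's EventThread (no callback involved).
    public static void deleteRecursive(ZooKeeper zk, String pathRoot)
            throws KeeperException, InterruptedException {
        List<String> tree = new ArrayList<>();
        tree.add(pathRoot);
        for (int i = 0; i < tree.size(); i++) {
            String parent = tree.get(i);
            for (String child : zk.getChildren(parent, false)) {
                tree.add(parent.endsWith("/") ? parent + child : parent + "/" + child);
            }
        }
        for (int i = tree.size() - 1; i >= 0; i--) {
            zk.delete(tree.get(i), -1); // -1 matches any znode version
        }
    }
}
{code}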




> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)
> ZKUtil.deleteRecursive is async API call and in callback it is invoking 
> 

[jira] [Created] (ZOOKEEPER-4507) Create ZOO_DAEMON_OUT file backup when restarting the server

2022-03-29 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4507:
--

 Summary: Create ZOO_DAEMON_OUT file backup when restarting the 
server
 Key: ZOOKEEPER-4507
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4507
 Project: ZooKeeper
  Issue Type: Improvement
  Components: scripts
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Attachments: image-2022-03-29-20-33-57-181.png

The ZooKeeper server daemon out file zookeeper-$USER-server-$HOSTNAME.out is 
overwritten on every server restart. 
Like the other log files, we should create a backup of this file as well. The 
information logged into this file is often useful in issue analysis; for example, 
it records which transaction log and snapshot files were deleted. This is useful 
information that we should retain for some time. Maybe by default we can back up 
5 .out files, as shown below.

 !image-2022-03-29-20-33-57-181.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ZOOKEEPER-4506) Change Server default appender from CONSOLE to ROLLINGFILE

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4506:
---
Summary: Change Server default appender from CONSOLE to ROLLINGFILE  (was: 
Change Server default log4j appender from CONSOLE to ROLLINGFILE)

> Change Server default appender from CONSOLE to ROLLINGFILE
> --
>
> Key: ZOOKEEPER-4506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4506
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>
> The server by default logs to CONSOLE, and the contents are redirected to the 
> zookeeper-$USER-server-$HOSTNAME.out file. This file is overwritten on every 
> server restart, its size keeps growing, and it is not rolled when it gets large. 
> I think the default logging appender should be ROLLINGFILE instead of CONSOLE. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4506) Change Server default log4j appender from CONSOLE to ROLLINGFILE

2022-03-29 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4506:
--

 Summary: Change Server default log4j appender from CONSOLE to 
ROLLINGFILE
 Key: ZOOKEEPER-4506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4506
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1


The server by default logs to CONSOLE, and the contents are redirected to the 
zookeeper-$USER-server-$HOSTNAME.out file. This file is overwritten on every 
server restart, its size keeps growing, and it is not rolled when it gets large. 

I think the default logging appender should be ROLLINGFILE instead of CONSOLE. 








--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-03-29 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514033#comment-17514033
 ] 

Mohammad Arshad commented on ZOOKEEPER-4504:


Yes, I am sure the problem is not because of a low rateLimit value. If it were 
because of low concurrency, the call would have taken a bit longer but would not 
be stuck forever. Also, the call was not deleting a lot of znodes, so there was no 
need to submit multiple batches; one batch was sufficient. 
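
A sketch of why the "main" thread in the dump above parks in Semaphore.acquire() forever; illustrative only, not the actual ZKUtil.deleteInBatch code, and the class and method names are hypothetical. The assumption is that the rateLimit semaphore only gates how many batches are in flight and that each permit is returned from the async callback, so if the callback can never run, the final acquire never returns.

{code:java}
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.stream.Collectors;

import org.apache.zookeeper.AsyncCallback.MultiCallback;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public final class BatchDeleteSketch {

    public static void deleteInOneBatch(ZooKeeper zk, List<String> paths, int rateLimit)
            throws InterruptedException {
        Semaphore permits = new Semaphore(rateLimit);

        List<Op> batch = paths.stream()
                .map(p -> Op.delete(p, -1))
                .collect(Collectors.toList());

        permits.acquire(); // gates how many batches may be in flight at once
        MultiCallback onDone = (rc, path, ctx, results) -> permits.release();
        zk.multi(batch, onDone, null); // async: the permit comes back in the callback

        // Wait for completion: this succeeds only after every callback has released
        // its permit. If the EventThread can never run the callback (the deadlock
        // described in the issue), this acquire parks forever, exactly as in the
        // "main" thread stack trace.
        permits.acquire(rateLimit);
    }
}
{code}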

> ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality
> 
>
> Key: ZOOKEEPER-4504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1
>
>
> *Problem and Analysis:*
> After integrating ZooKeeper 3.6.3 we observed deadlock in HDFS HA 
> functionality as shown in below thread dumps.
> {code:java}
> "main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
> nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
>   - waiting to lock <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
> {code}
> {code:java}
> "main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
> condition [0x7f9c06404000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc1b383c8> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
>   at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
>   at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
>   - locked <0xc17986c0> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
>   at 
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
>   at 
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
>   at 
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
> synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)
> ZKUtil.deleteRecursive is an async API call, and in its callback it is invoking 
> ActiveStandbyElector#processWatchEvent, which is synchronized on the 
> ActiveStandbyElector instance.
> So there is a deadlock: clearParentZNode() is waiting for processWatchEvent() to 
> complete, and processWatchEvent() is waiting for clearParentZNode() to complete.
>  
> *Why 

[jira] [Updated] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-29 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3652:
---
Fix Version/s: 3.6.4,3.7.1,3.8.0,3.9.0

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.4,3.7.1,3.8.0,3.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-3652) Improper synchronization in ClientCnxn

2022-03-29 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513978#comment-17513978
 ] 

Mohammad Arshad commented on ZOOKEEPER-3652:


Good finding [~sylvain]. I will review and merge your PR.

> Improper synchronization in ClientCnxn
> --
>
> Key: ZOOKEEPER-3652
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3652
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.6
>Reporter: Sylvain Wallez
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-2111 introduced {{synchronized(state)}} statements in 
> {{ClientCnxn}} and {{ClientCnxn.SendThread}} to coordinate insertion in 
> {{outgoingQueue}} and draining it when the client connection isn't alive.
> There are several issues with this approach:
>  - the value of the {{state}} field is not stable, meaning we don't always 
> synchronize on the same object.
>  - the {{state}} field is an enum value, and enum values are global objects. So in an 
> application with several ZooKeeper clients connected to different servers, 
> this causes some contention between clients.
> An easy fix is to change those {{synchronized(state)}} statements to 
> {{synchronized(outgoingQueue)}}, since the queue is local to each client and is what 
> we want to coordinate.
> I'll be happy to prepare a PR with the above change if this is deemed to be 
> the correct way to fix it.
>  
> Another issue that makes contention worse is 
> {{ClientCnxnSocketNIO.cleanup()}} that is called from within the above 
> synchronized block and contains {{Thread.sleep(100)}}. Why is this sleep 
> statement needed, and can we remove it?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4504) ZKUtil#deleteRecursive causing deadlock in HDFS HA functionality

2022-03-29 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4504:
--

 Summary: ZKUtil#deleteRecursive causing deadlock in HDFS HA 
functionality
 Key: ZOOKEEPER-4504
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4504
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 3.7.1, 3.6.4, 3.9.0, 3.8.1


*Problem and Analysis:*
After integrating ZooKeeper 3.6.3 we observed a deadlock in HDFS HA functionality, 
as shown in the thread dumps below.
{code:java}
"main-EventThread" #33 daemon prio=5 os_prio=0 tid=0x7f9c017f1000 
nid=0x101b waiting for monitor entry [0x7f9bda8a6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:603)
- waiting to lock <0xc17986c0> (a 
org.apache.hadoop.ha.ActiveStandbyElector)
at 
org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.process(ActiveStandbyElector.java:1193)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:626)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:582)
{code}
{code:java}
"main" #1 prio=5 os_prio=0 tid=0x7f9c0006 nid=0xea3 waiting on 
condition [0x7f9c06404000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xc1b383c8> (a 
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:999)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:467)
at org.apache.zookeeper.ZKUtil.deleteInBatch(ZKUtil.java:122)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:64)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:76)
at 
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:386)
at 
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:383)
at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1103)
at 
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
at 
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:383)
- locked <0xc17986c0> (a 
org.apache.hadoop.ha.ActiveStandbyElector)
at 
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:290)
at 
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:227)
at 
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:66)
at 
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:186)
at 
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:182)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1741)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:498)
at 
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:182)
at 
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:220)
{code}
org.apache.hadoop.ha.ActiveStandbyElector#clearParentZNode is instance 
synchronized and calls ZKUtil.deleteRecursive(zk, pathRoot)

ZKUtil.deleteRecursive is an async API call, and in its callback it is invoking 
ActiveStandbyElector#processWatchEvent, which is synchronized on the 
ActiveStandbyElector instance.

So there is a deadlock: clearParentZNode() is waiting for processWatchEvent() to 
complete, and processWatchEvent() is waiting for clearParentZNode() to complete.

 

*Why this problem was not happening with earlier versions (3.5.x)?*

In earlier zk versions, ZKUtil.deleteRecursive used the sync zk API internally, so 
no callback (processWatchEvent) came into the picture.


*Proposed Fix:*
There are two approaches to fix this problem. 
1. We can fix the problem in HDFS by modifying the HDFS code to avoid the deadlock, 
but we may get similar bugs in other projects.
2. Fix the problem in ZK: make the API behavior the same as the old behavior (use 
the sync API to delete the znodes) and provide a new overloaded API with the new 
behavior (use the async API to delete the znodes).

I propose to fix the 

[jira] [Resolved] (ZOOKEEPER-4434) Backport ZOOKEEPER-3142 for branch-3.5

2022-02-21 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4434.

Fix Version/s: 3.5.10
   Resolution: Fixed

Issue resolved by pull request 1791
[https://github.com/apache/zookeeper/pull/1791]

> Backport ZOOKEEPER-3142 for branch-3.5
> --
>
> Key: ZOOKEEPER-4434
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4434
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.5.9
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> h5. Extend SnapshotFormatter to dump data in json format.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ZOOKEEPER-4433) Backport ZOOKEEPER-2872 for branch-3.5

2022-02-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488971#comment-17488971
 ] 

Mohammad Arshad commented on ZOOKEEPER-4433:


Thanks [~ananysin] for submitting the PR.
Added you as a ZK contributor.

> Backport ZOOKEEPER-2872 for branch-3.5
> --
>
> Key: ZOOKEEPER-4433
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4433
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.9
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (ZOOKEEPER-4433) Backport ZOOKEEPER-2872 for branch-3.5

2022-02-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-4433:
--

Assignee: Ananya Singh

> Backport ZOOKEEPER-2872 for branch-3.5
> --
>
> Key: ZOOKEEPER-4433
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4433
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.9
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-4433) Backport ZOOKEEPER-2872 for branch-3.5

2022-02-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4433.

Fix Version/s: 3.5.10
   Resolution: Fixed

Issue resolved by pull request 1790
[https://github.com/apache/zookeeper/pull/1790]

> Backport ZOOKEEPER-2872 for branch-3.5
> --
>
> Key: ZOOKEEPER-4433
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4433
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.9
>Reporter: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (ZOOKEEPER-4385) Backport ZOOKEEPER-4278 to branch-3.5 to Address CVE-2021-21409

2021-09-24 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4385.

Fix Version/s: 3.5.10
   Resolution: Fixed

Issue resolved by pull request 1762
[https://github.com/apache/zookeeper/pull/1762]

> Backport ZOOKEEPER-4278 to branch-3.5 to Address CVE-2021-21409
> ---
>
> Key: ZOOKEEPER-4385
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4385
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.10
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Backport ZOOKEEPER-4278 to branch-3.5 to address CVE-2021-21409



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-09-23 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419023#comment-17419023
 ] 

Mohammad Arshad commented on ZOOKEEPER-4278:


Thanks [~brahmareddy] for creating the new bug ZOOKEEPER-4385 to backport to 
branch-3.5. Please raise a PR as well.

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Ayush Mantri
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0, 3.7.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4282) Redesign quota feature

2021-08-11 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397435#comment-17397435
 ] 

Mohammad Arshad commented on ZOOKEEPER-4282:


[~ztzg] Thanks for sharing the scary story. It supports the need to protect 
the internal ZooKeeper data structures from outside modification.

bq.  I would suggest opening another ticket, and creating PRs preventing the 
server crash for 3.5 and 3.6. WDYT? Should I take care of it?
I think preventing the server crash after wrong data has been allowed into the quota 
znode will work, but it is better to prevent the cause itself: setting wrong data in 
quota znodes should not be allowed in the first place.
But the changes I am proposing will be big and may not go into branch-3.6 and 
branch-3.5, so I am OK with any workaround solution on those branches. If you 
are interested, please go ahead. (y)




> Redesign quota feature
> --
>
> Key: ZOOKEEPER-4282
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4282
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quota
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.8.0
>
>
> *Quota Use Case:*
> Generally, in a big data solution deployment multiple services (hdfs, yarn, 
> hbase etc.) use a single ZooKeeper cluster, so it is very important to ensure 
> fair usage by all services. Sometimes services unintentionally, mainly because 
> of faulty behavior, create many znodes and impact the overall reliability of 
> the ZooKeeper service. To ensure fair usage, the quota feature is required. 
> But this is not the only use case; there are many other use cases for the quota 
> feature.
> *Current Problems:*
> # Currently, a user can set quota by updating the znode 
> “/zookeeper/quota/nodepath”, or by using setquota/delquota CLI commands.
> This makes the quota setting ineffective.
> Currently any user can set/delete quota, which is not proper; it should be an 
> admin operation.
> # A user is allowed to modify ZooKeeper system paths like /zookeeper/quota. 
> These are internal to ZooKeeper and should not be allowed to be modified.
> # Generally services create a single top-level znode in ZooKeeper, like /hbase, 
> and create all required znodes under it. 
> It is better if it is configurable who can create top-level znodes, to 
> control ZooKeeper usage.
> # After ZOOKEEPER-231, there are two kinds of quota enforcement limits: 1. hard limit, 
> 2. soft limit. 
> I think there should be only one limit. When enforce quota is enabled that limit 
> becomes the hard limit; otherwise it is a soft limit, same as the old feature, which just 
> logs warnings.
> *Proposed Solution*
> # Add setQuota and deleteQuota admin APIs. Add a listQuota normal-user API.
> Modify the quota CLI commands to use these APIs instead of directly modifying 
> the ZooKeeper system path /zookeeper/quota/.
> # Protect ZooKeeper system paths from outside modification. The system paths should 
> only be readable from outside.
> # Expose configuration to set the ACL for the root system znode. 
> After this, at ZooKeeper service deployment time the administrator can 
> create the top-level znode for a service and set its quota. This way we can control 
> overall ZooKeeper usage.
> # Revert some of the changes in ZOOKEEPER-231 and move to a single quota limit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4345) Avoid NoSunchMethodException caused by shaded zookeeper jar

2021-08-11 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4345:
---
Summary: Avoid NoSunchMethodException caused by shaded zookeeper jar  (was: 
Avoid NoSunchMethodException caused by shaded)

> Avoid NoSunchMethodException caused by shaded zookeeper jar
> ---
>
> Key: ZOOKEEPER-4345
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4345
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Bo Cui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.4
>
> Attachments: image-2021-08-07-17-30-42-883.png, 
> image-2021-08-07-18-52-00-633.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In OS Flink, flink relocates zk to 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.*
> [https://github.com/apache/flink-shaded/blob/82f8bb3324864491dc62c4d3e27f1c1ccc49ac84/flink-shaded-zookeeper-parent/pom.xml#L68]
> The maven-shade-plugin changes all 'org.apache.zookeeper' to 
> 'org.apache.flink.shaded.zookeeper3.org.apache.zookeeper'.
> If the JVM has -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.*, then with the shaded 
> zk jar it will get NoSuchMethodException.
>   !image-2021-08-07-18-52-00-633.png!
> !image-2021-08-07-17-30-42-883.png!
> code: 
> [https://github.com/apache/zookeeper/blob/9a5da5f9a023e53bf339748b5b7b17278ae36475/zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java#L3029]
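To make the failure mode concrete, here is a purely illustrative sketch (not the actual ZooKeeper source) of the kind of reflective lookup the linked line performs; the zookeeper.clientCnxnSocket property and the default class name are real, everything else is simplified:
{code:java}
import java.lang.reflect.Constructor;

// Illustrative sketch only: a socket implementation chosen by reflectively
// instantiating the class named in the zookeeper.clientCnxnSocket property.
public class ClientCnxnSocketLookupSketch {

    static Object createCnxnSocket(Object clientConfig) throws Exception {
        String name = System.getProperty("zookeeper.clientCnxnSocket",
                "org.apache.zookeeper.ClientCnxnSocketNIO");
        // After maven-shade relocation the real classes live under
        // org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.*, so a
        // -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.* value no longer
        // names a class/constructor that matches the shaded jar, and this
        // reflective lookup fails.
        Constructor<?> ctor = Class.forName(name)
                .getDeclaredConstructor(clientConfig.getClass());
        return ctor.newInstance(clientConfig);
    }
}
{code}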



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4345) Avoid NoSunchMethodException caused by shaded

2021-08-11 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4345.

Fix Version/s: 3.6.4
   3.7.1
   3.8.0
   Resolution: Fixed

Issue resolved by pull request 1736
[https://github.com/apache/zookeeper/pull/1736]

> Avoid NoSunchMethodException caused by shaded
> -
>
> Key: ZOOKEEPER-4345
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4345
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Bo Cui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.4
>
> Attachments: image-2021-08-07-17-30-42-883.png, 
> image-2021-08-07-18-52-00-633.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In OS Flink, flink relocates zk to 
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.*
> [https://github.com/apache/flink-shaded/blob/82f8bb3324864491dc62c4d3e27f1c1ccc49ac84/flink-shaded-zookeeper-parent/pom.xml#L68]
> The maven-shade-plugin changes all 'org.apache.zookeeper' to 
> 'org.apache.flink.shaded.zookeeper3.org.apache.zookeeper'.
> If the JVM has -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.*, then with the shaded 
> zk jar it will get NoSuchMethodException.
>   !image-2021-08-07-18-52-00-633.png!
> !image-2021-08-07-17-30-42-883.png!
> code: 
> [https://github.com/apache/zookeeper/blob/9a5da5f9a023e53bf339748b5b7b17278ae36475/zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java#L3029]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ZOOKEEPER-4275) Slowness in sasl login or subject.doAs() causes zk client to falsely assume that the server did not respond, closes connection and goes to unnecessary retries

2021-06-09 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-4275:
--

Assignee: Ravi Kishore Valeti

> Slowness in sasl login or subject.doAs() causes zk client to falsely assume 
> that the server did not respond, closes connection and goes to unnecessary 
> retries
> --
>
> Key: ZOOKEEPER-4275
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4275
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.9
>Reporter: Ravi Kishore Valeti
>Assignee: Ravi Kishore Valeti
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.10, 3.8.0, 3.7.1, 3.6.4
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The ZooKeeper client does SASL auth (login and subject.doAs()) as a prerequisite step 
> before attempting a connection to the server.
>  If there is a delay in SASL auth (possibly due to slow Kerberos 
> communication), the ZK client falsely assumes that the zk server did not respond 
> and runs into multiple unnecessary retries.
> Client configuration:
>  "zookeeper.session.timeout" = "3000"
>  "zookeeper.recovery.retry" = "1"
>  "zookeeper.recovery.retry.intervalmill" = "500"
> This configuration translates to a 
> connect timeout of 1000ms and 
>  a read timeout of 2000ms (see the sketch after this description).
> Example: there was a 3-second delay in logging in the user, as seen from the 
> logs below. The connection attempt was made afterwards. However, the zk client did not 
> wait for the server response but logged a timeout (3 seconds > 1 sec connect 
> timeout), closed the connection and went into retries. Since there was a 
> consistent delay at the Kerberos master, we have seen these retries go on for as long as 
> 10 mins, causing requests to time out/fail.
> Logs:
> 3/23/21 4:15:*32.389* AM jute.maxbuffer value is x Bytes
> 3/23/21 4:15:*35.395* AM Client successfully logged in.
> 3/23/21 4:15:35.396 AM TGT refresh sleeping until: Wed Mar 24 00:34:31 GMT 
> 2021
> 3/23/21 4:15:35.396 AM TGT refresh thread started.
> 3/23/21 4:15:35.396 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:35.396 AM TGT expires:                  xxx Mar xx 04:15:35 GMT 
> 2021
> 3/23/21 4:15:35.396 AM TGT valid starting at:        xxx Mar xx 04:15:35 GMT 
> 2021
> 3/23/21 4:15:*35.397* AM *Opening socket connection* to server x:2181. 
> Will attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:*35.397* AM *Client session timed out, have not heard from 
> server in* *3008ms* for sessionid 0x0
> 3/23/21 4:15:35.397 AM Client session timed out, have not heard from server 
> in 3008ms for sessionid 0x0, closing socket connection and attempting 
> reconnect
> 3/23/21 4:15:35.498 AM TGT renewal thread has been interrupted and will exit.
> 3/23/21 4:15:38.503 AM Client successfully logged in.
> 3/23/21 4:15:38.503 AM TGT expires:                  xxx Mar xx 04:15:38 GMT 
> 2021
> 3/23/21 4:15:38.503 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:38.503 AM TGT valid starting at:        xxx Mar xx 04:15:38 GMT 
> 2021
> 3/23/21 4:15:38.503 AM TGT refresh thread started.
> 3/23/21 4:15:38.503 AM TGT refresh sleeping until: Wed Mar 24 00:10:10 GMT 
> 2021
> 3/23/21 4:15:38.506 AM Opening socket connection to server x:2181. Will 
> attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server 
> in 3009ms for sessionid 0x0, closing socket connection and attempting 
> reconnect
> 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server 
> in 3009ms for sessionid 0x0
> 3/23/21 4:15:38.606 AM TGT renewal thread has been interrupted and will exit.
> 3/23/21 4:15:41.610 AM Client successfully logged in.
> 3/23/21 4:15:41.611 AM TGT refresh sleeping until: xxx Mar xx 23:42:03 GMT 
> 2021
> 3/23/21 4:15:41.611 AM Client will use GSSAPI as SASL mechanism.
> 3/23/21 4:15:41.611 AM TGT valid starting at:        xxx Mar xx 04:15:41 GMT 
> 2021
> 3/23/21 4:15:41.611 AM TGT expires:                  xxx Mar xx 04:15:41 GMT 
> 2021
> 3/23/21 4:15:41.611 AM TGT refresh thread started.
> 3/23/21 4:15:41.612 AM Opening socket connection to server x:2181. Will 
> attempt to SASL-authenticate using Login Context section 'Client'
> 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server 
> in 3006ms for sessionid 0x0
> 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server 
> in 3006ms for sessionid 0x0, closing socket connection and attempting 
> reconnect
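A hedged sketch of the timeout arithmetic described above, assuming a 3-server ensemble; the exact split is computed inside the ZooKeeper client and may differ between versions, so treat the formula as approximate:
{code:java}
// Worked example of how a 3000 ms session timeout can end up as a ~1000 ms
// connect budget and a ~2000 ms read budget (assumed 3-node ensemble).
public class TimeoutSplitSketch {
    public static void main(String[] args) {
        int sessionTimeoutMs = 3000;   // zookeeper.session.timeout
        int ensembleSize = 3;          // assumption for this example

        int connectTimeoutMs = sessionTimeoutMs / ensembleSize; // ~1000 ms
        int readTimeoutMs = sessionTimeoutMs * 2 / 3;           // ~2000 ms

        System.out.println("connectTimeout=" + connectTimeoutMs
                + "ms, readTimeout=" + readTimeoutMs + "ms");
        // A 3+ second SASL login therefore exceeds both budgets before any
        // bytes reach the server, which is why the client logs
        // "Client session timed out ... 3008ms" and reconnects.
    }
}
{code}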



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4247) NPE while processing message from restarted quorum member

2021-04-14 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4247.

Fix Version/s: 3.6.4
   3.7.1
   3.8.0
   Resolution: Fixed

> NPE while processing message from restarted quorum member
> -
>
> Key: ZOOKEEPER-4247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4247
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.2
> Environment: K8S
>Reporter: Devarshi Shah
>Assignee: Mate Szalay-Beko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1, 3.6.4
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> *Problem:*
> While upgrading the K8S cluster, the containers running ZooKeeper (while serving their 
> clients) roll over one by one.
>  During this rollover, a +Null Pointer Exception+ was observed, as below.
>  After updating to the latest ZooKeeper 3.6.2 we still see the problem.
>  This is happening on a fresh install (and has been happening all the time).
>  
> *Stack-trace**:*
> 
> {code:java}
> 2021-02-08T12:42:08.229+ [myid:] - ERROR 
> [nioEventLoopGroup-4-1:NettyServerCnxnFactory$CnxnChannelHandler@329] - 
> Unexpected exception in receive
>  java.lang.NullPointerException: null
>      at 
> org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:518)
>  ~[zookeeper-3.6.2.jar:3.6.2]
>      at 
> org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:368)
>  ~[zookeeper-3.6.2.jar:3.6.2]
>      at 
> org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:326)
>  [zookeeper-3.6.2.jar:3.6.2]
>      at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) 
> [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>  [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) 
> [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) 
> [netty-transport-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  [netty-common-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
> [netty-common-4.1.50.Final.jar:4.1.50.Final]
>      at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-common-4.1.50.Final.jar:4.1.50.Final]
>      at java.lang.Thread.run(Thread.java:834) [?:?]
> {code}
>  
>  
> *Expectation:*
> This scenario should be handled, and ZooKeeper should not print a Null Pointer 
> Exception in the logs when a peer member goes down as part of the upgrade 
> procedure. 
> We kindly request the Apache ZooKeeper team to fix this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-4282) Redesign quota feature

2021-04-14 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4282:
--

 Summary: Redesign quota feature
 Key: ZOOKEEPER-4282
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4282
 Project: ZooKeeper
  Issue Type: New Feature
  Components: quota
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
 Fix For: 3.8.0


*Quota Use Case:*

Generally, in a big data solution deployment multiple services (hdfs, yarn, 
hbase etc.) use a single ZooKeeper cluster, so it is very important to ensure 
fair usage by all services. Sometimes services unintentionally, mainly because 
of faulty behavior, create many znodes and impact the overall reliability of 
the ZooKeeper service. To ensure fair usage, the quota feature is required. But 
this is not the only use case; there are many other use cases for the quota feature.

*Current Problems:*

# Currently, a user can set quota by updating the znode “/zookeeper/quota/nodepath”, 
or by using setquota/delquota CLI commands.
This makes the quota setting ineffective.
Currently any user can set/delete quota, which is not proper; it should be an 
admin operation.
# A user is allowed to modify ZooKeeper system paths like /zookeeper/quota. These 
are internal to ZooKeeper and should not be allowed to be modified.
# Generally services create a single top-level znode in ZooKeeper, like /hbase, and 
create all required znodes under it. 
It is better if it is configurable who can create top-level znodes, to control 
ZooKeeper usage.
# After ZOOKEEPER-231, there are two kinds of quota enforcement limits: 1. hard limit, 
2. soft limit. 
I think there should be only one limit. When enforce quota is enabled that limit 
becomes the hard limit; otherwise it is a soft limit, same as the old feature, which just 
logs warnings.

*Proposed Solution*

# Add setQuota and deleteQuota admin APIs. Add a listQuota normal-user API.
Modify the quota CLI commands to use these APIs instead of directly modifying 
the ZooKeeper system path /zookeeper/quota/ (a rough API sketch follows below).
# Protect ZooKeeper system paths from outside modification. The system paths should only 
be readable from outside.
# Expose configuration to set the ACL for the root system znode. 
After this, at ZooKeeper service deployment time the administrator can 
create the top-level znode for a service and set its quota. This way we can control 
overall ZooKeeper usage.
# Revert some of the changes in ZOOKEEPER-231 and move to a single quota limit.
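A purely illustrative sketch of what the proposed admin-side quota API shape could look like; the interface, nested class, method names and parameters here are assumptions for discussion, not the committed design:
{code:java}
// Hypothetical sketch only: names and fields are illustrative assumptions.
public interface QuotaAdminSketch {

    final class Quota {
        public final long countLimit;   // max number of znodes under the path
        public final long bytesLimit;   // max total data bytes under the path
        public Quota(long countLimit, long bytesLimit) {
            this.countLimit = countLimit;
            this.bytesLimit = bytesLimit;
        }
    }

    // Admin-only operations: the server updates /zookeeper/quota internally,
    // so clients never write the system path directly.
    void setQuota(String path, Quota quota) throws Exception;

    void deleteQuota(String path) throws Exception;

    // Readable by normal users.
    Quota listQuota(String path) throws Exception;
}
{code}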





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3841) remove useless codes in the Leader.java

2021-04-13 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3841:
---
Fix Version/s: (was: 3.7.1,3.8.0)
   3.7.1
   3.8.0

> remove useless codes in the Leader.java
> ---
>
> Key: ZOOKEEPER-3841
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3841
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Ling Mao
>Priority: Minor
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> - There is some useless code in Leader.java which was commented out.
> - Please recheck everything in this class to clean it up
> e.g:
> {code:java}
> // Everything is a go, simply start counting the ticks
> // WARNING: I couldn't find any wait statement on a synchronized
> // block that would be notified by this notifyAll() call, so
> // I commented it out
> //synchronized (this) {
> //notifyAll();
> //}
> {code}
> {code:java}
> //turnOffFollowers();
> {code}
> {code:java}
> //LOG.warn("designated leader is: " + designatedLeader);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4265) Download page broken links

2021-04-13 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4265:
---
Fix Version/s: (was: 3.6.3)

> Download page broken links
> --
>
> Key: ZOOKEEPER-4265
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4265
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Damien Diederen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The download page [1] has broken links for the following release versions:
> 3.6.1
> 3.5.9
> Please remove them from the page.
> If necessary, they can be linked from the archive server, in which case the 
> page should make it clear that they are historic releases.
> [1] https://zookeeper.apache.org/releases.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317650#comment-17317650
 ] 

Mohammad Arshad commented on ZOOKEEPER-4278:


Thanks [~ayushmantri] for raising the PR.  
Please raise PR for branch-3.5 also

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Ayush Mantri
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0, 3.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-4278.

Fix Version/s: 3.7.1
   3.8.0
   Resolution: Fixed

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Ayush Mantri
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.8.0, 3.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4278:
---
Priority: Blocker  (was: Major)

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-4278:
---
Fix Version/s: 3.6.3

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Blocker
> Fix For: 3.6.3
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317085#comment-17317085
 ] 

Mohammad Arshad commented on ZOOKEEPER-4278:


To fix the CVE we have to upgrade to at least 4.1.61 anyway. 
 4.1.62 and 4.1.63 are regression-fix releases. As per the release notes there 
is not much change from 4.1.62 to 4.1.63.

https://netty.io/news/2021/03/30/4-1-61-Final.html
https://netty.io/news/2021/03/31/4-1-62-Final.html
https://netty.io/news/2021/04/01/4-1-63-Final.html

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316947#comment-17316947
 ] 

Mohammad Arshad edited comment on ZOOKEEPER-4278 at 4/8/21, 8:25 AM:
-

Though 4.1.61.Final has fixed CVE-2021-21409, the latest netty release is 
4.1.63.Final. I think we should upgrade to the latest version.
https://netty.io/news/2021/04/01/4-1-63-Final.html


was (Author: arshad.mohammad):
Though 4.1.61.Final has fixed CVE-2021-21409, the latest netty release is 
4.1.63.Final. I think we should upgrade to this version.
https://netty.io/news/2021/04/01/4-1-63-Final.html

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-08 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316947#comment-17316947
 ] 

Mohammad Arshad commented on ZOOKEEPER-4278:


Though 4.1.61.Final has fixed CVE-2021-21409, the latest netty release is 
4.1.63.Final. I think we should upgrade to this version.
https://netty.io/news/2021/04/01/4-1-63-Final.html

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-07 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-4278:
--

Assignee: (was: Mohammad Arshad)

> dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409 
> -
>
> Key: ZOOKEEPER-4278
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-4278) dependency-check:check failing - netty-transport-4.1.60.Final CVE-2021-21409

2021-04-07 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4278:
--

 Summary: dependency-check:check failing - 
netty-transport-4.1.60.Final CVE-2021-21409 
 Key: ZOOKEEPER-4278
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4278
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3992) addWatch api should check the null watch

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3992:
---
Fix Version/s: 3.8.0
   3.6.3

> addWatch api should check the null watch
> 
>
> Key: ZOOKEEPER-3992
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3992
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Ling Mao
>Assignee: Damien Diederen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.7.0, 3.8.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> public void addWatch(String basePath, Watcher watcher, AddWatchMode mode)
> throws KeeperException, InterruptedException {
> PathUtils.validatePath(basePath);
> String serverPath = prependChroot(basePath);
> RequestHeader h = new RequestHeader();
> h.setType(ZooDefs.OpCode.addWatch);
> AddWatchRequest request = new AddWatchRequest(serverPath, mode.getMode());
> ReplyHeader r = cnxn.submitRequest(h, request, new ErrorResponse(),
> 
> {code}
> we need to _*validateWatcher(watcher)*_ to avoid the case:
> {code:java}
> zk.addWatch("/a/b", null, PERSISTENT_RECURSIVE);
> {code}
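A minimal sketch of the kind of null guard the description asks for; the helper name and exception type here are illustrative, not necessarily what was committed. The idea is that addWatch would call it before building the request, so zk.addWatch("/a/b", null, PERSISTENT_RECURSIVE) fails fast on the client instead of registering a null watch:
{code:java}
import org.apache.zookeeper.Watcher;

// Illustrative sketch of a client-side null check for watch-registering calls.
public final class WatcherValidationSketch {

    static void validateWatcher(Watcher watcher) {
        if (watcher == null) {
            throw new IllegalArgumentException("Invalid Watcher, shouldn't be null!");
        }
    }
}
{code}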



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3980) Fix Jenkinsfiles with new tool names

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3980:
---
Fix Version/s: 3.8.0
   3.6.3

> Fix Jenkinsfiles with new tool names
> 
>
> Key: ZOOKEEPER-3980
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3980
> Project: ZooKeeper
>  Issue Type: Task
>  Components: build-infrastructure
>Affects Versions: 3.7.0
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.3, 3.7.0, 3.8.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3957) Create Owasp check build on new Jenkins instance

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3957:
---
Fix Version/s: 3.8.0
   3.6.3

> Create Owasp check build on new Jenkins instance
> 
>
> Key: ZOOKEEPER-3957
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3957
> Project: ZooKeeper
>  Issue Type: Task
>  Components: build
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.6.3, 3.7.0, 3.8.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We haven't migrated the owasp build to the new instance yet.
> Need to create a new multi-branch Pipeline job here:
> https://ci-hadoop.apache.org/view/ZooKeeper/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3931) "zkServer.sh version" returns a trailing dash

2021-04-03 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314411#comment-17314411
 ] 

Mohammad Arshad commented on ZOOKEEPER-3931:


Thanks [~Suraj Naik] for your contribution. Added you as a contributor.

> "zkServer.sh version" returns a trailing dash
> -
>
> Key: ZOOKEEPER-3931
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3931
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enrico Olivelli
>Assignee: Suraj Naik
>Priority: Major
> Fix For: 3.6.3
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When you run zkServer.sh version the result includes a few spam lines and the 
> version reports a trailing dash 
> {noformat}
> bin/zkServer.sh version
> ZooKeeper JMX enabled by default
> Using config: /xxx/bin/../conf/zoo.cfg
> Apache ZooKeeper, version 3.6.2- 09/04/2020 12:44 GMT
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ZOOKEEPER-3931) "zkServer.sh version" returns a trailing dash

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-3931:
--

Assignee: Suraj Naik

> "zkServer.sh version" returns a trailing dash
> -
>
> Key: ZOOKEEPER-3931
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3931
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enrico Olivelli
>Assignee: Suraj Naik
>Priority: Major
> Fix For: 3.6.3
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When you run zkServer.sh version the result includes a few spam lines and the 
> version reports a trailing dash 
> {noformat}
> bin/zkServer.sh version
> ZooKeeper JMX enabled by default
> Using config: /xxx/bin/../conf/zoo.cfg
> Apache ZooKeeper, version 3.6.2- 09/04/2020 12:44 GMT
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ZOOKEEPER-3931) "zkServer.sh version" returns a trailing dash

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad resolved ZOOKEEPER-3931.

Fix Version/s: 3.6.3
   Resolution: Fixed

> "zkServer.sh version" returns a trailing dash
> -
>
> Key: ZOOKEEPER-3931
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3931
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enrico Olivelli
>Priority: Major
> Fix For: 3.6.3
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When you run zkServer.sh version the result includes a few spam lines and the 
> version reports a trailing dash 
> {noformat}
> bin/zkServer.sh version
> ZooKeeper JMX enabled by default
> Using config: /xxx/bin/../conf/zoo.cfg
> Apache ZooKeeper, version 3.6.2- 09/04/2020 12:44 GMT
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3931) "zkServer.sh version" returns a trailing dash

2021-04-03 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314410#comment-17314410
 ] 

Mohammad Arshad commented on ZOOKEEPER-3931:


Created https://issues.apache.org/jira/browse/ZOOKEEPER-4273 to forward-port this to 
master and branch-3.7.

> "zkServer.sh version" returns a trailing dash
> -
>
> Key: ZOOKEEPER-3931
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3931
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Enrico Olivelli
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When you run zkServer.sh version the result includes a few spam lines and the 
> version reports a trailing dash 
> {noformat}
> bin/zkServer.sh version
> ZooKeeper JMX enabled by default
> Using config: /xxx/bin/../conf/zoo.cfg
> Apache ZooKeeper, version 3.6.2- 09/04/2020 12:44 GMT
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-4273) Forward port ZOOKEEPER-3931: "zkServer.sh version" returns a trailing dash

2021-04-03 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-4273:
--

 Summary: Forward port ZOOKEEPER-3931: "zkServer.sh version" 
returns a trailing dash
 Key: ZOOKEEPER-4273
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4273
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.7.1
Reporter: Mohammad Arshad


When you run zkServer.sh version the result includes a few spam lines and the 
version reports a trailing dash 
{noformat}
bin/zkServer.sh version
ZooKeeper JMX enabled by default
Using config: /xxx/bin/../conf/zoo.cfg
Apache ZooKeeper, version 3.6.2- 09/04/2020 12:44 GMT

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3934) upgrade dependency-check to version 6.0.0

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3934:
---
Fix Version/s: (was: 3.6.3)
   (was: 3.7.0)

> upgrade dependency-check to version 6.0.0
> -
>
> Key: ZOOKEEPER-3934
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3934
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build, security
>Affects Versions: 3.7.0, 3.5.8, 3.6.2
>Reporter: Patrick D. Hunt
>Assignee: Patrick D. Hunt
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 6.0.0 is now available. I verified it with 3.5, 3.6, 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3933) owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3933:
---
Fix Version/s: (was: 3.6.3)
   (was: 3.7.0)

> owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712
> ---
>
> Key: ZOOKEEPER-3933
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3933
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.7.0, 3.5.8, 3.6.2
>Reporter: Patrick D. Hunt
>Priority: Blocker
>
> dependency-check is failing with:
> json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ZOOKEEPER-3933) owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-3933:
--

Assignee: (was: Mohammad Arshad)

> owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712
> ---
>
> Key: ZOOKEEPER-3933
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3933
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.7.0, 3.5.8, 3.6.2
>Reporter: Patrick D. Hunt
>Priority: Blocker
> Fix For: 3.6.3, 3.7.0
>
>
> dependency-check is failing with:
> json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ZOOKEEPER-3933) owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad reassigned ZOOKEEPER-3933:
--

Assignee: Mohammad Arshad

> owasp failing with json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712
> ---
>
> Key: ZOOKEEPER-3933
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3933
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.7.0, 3.5.8, 3.6.2
>Reporter: Patrick D. Hunt
>Assignee: Mohammad Arshad
>Priority: Blocker
> Fix For: 3.6.3, 3.7.0
>
>
> dependency-check is failing with:
> json-simple-1.1.1.jar: CVE-2020-10663, CVE-2020-7712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3841) remove useless codes in the Leader.java

2021-04-03 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314403#comment-17314403
 ] 

Mohammad Arshad commented on ZOOKEEPER-3841:


Updated fix version from 3.6.3 to 3.8.0, 3.7.1 as the changes are not present in 
branch-3.6

> remove useless codes in the Leader.java
> ---
>
> Key: ZOOKEEPER-3841
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3841
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Ling Mao
>Priority: Minor
> Fix For: 3.7.1,3.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> - There is some useless code in Leader.java which was commented out.
> - Please recheck everything in this class to clean it up
> e.g:
> {code:java}
> // Everything is a go, simply start counting the ticks
> // WARNING: I couldn't find any wait statement on a synchronized
> // block that would be notified by this notifyAll() call, so
> // I commented it out
> //synchronized (this) {
> //notifyAll();
> //}
> {code}
> {code:java}
> //turnOffFollowers();
> {code}
> {code:java}
> //LOG.warn("designated leader is: " + designatedLeader);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3841) remove useless codes in the Leader.java

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3841:
---
Fix Version/s: (was: 3.6.3)
   3.7.1,3.8.0

> remove useless codes in the Leader.java
> ---
>
> Key: ZOOKEEPER-3841
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3841
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Ling Mao
>Priority: Minor
> Fix For: 3.7.1,3.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> - There is some useless code in Leader.java which was commented out.
> - Please recheck everything in this class to clean it up
> e.g:
> {code:java}
> // Everything is a go, simply start counting the ticks
> // WARNING: I couldn't find any wait statement on a synchronized
> // block that would be notified by this notifyAll() call, so
> // I commented it out
> //synchronized (this) {
> //notifyAll();
> //}
> {code}
> {code:java}
> //turnOffFollowers();
> {code}
> {code:java}
> //LOG.warn("designated leader is: " + designatedLeader);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ZOOKEEPER-3798) remove the useless code in the ProposalRequestProcessor#processRequest

2021-04-03 Thread Mohammad Arshad (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated ZOOKEEPER-3798:
---
Fix Version/s: (was: 3.6.3)
   3.8.0,3.7.1

> remove the useless code in the ProposalRequestProcessor#processRequest
> --
>
> Key: ZOOKEEPER-3798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3798
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Ling Mao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0,3.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> remove the following useless code in 
> ProposalRequestProcessor#processRequest
> {code:java}
> public void processRequest(Request request) throws RequestProcessorException {
> // LOG.warn("Ack>>> cxid = " + request.cxid + " type = " +
> // request.type + " id = " + request.sessionId);
> // request.addRQRec(">prop");
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3798) remove the useless code in the ProposalRequestProcessor#processRequest

2021-04-03 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314402#comment-17314402
 ] 

Mohammad Arshad commented on ZOOKEEPER-3798:


Updated fix version from 3.6.3 to 3.8.0, 3.7.1 as the changes are not present in 
branch-3.6

> remove the useless code in the ProposalRequestProcessor#processRequest
> --
>
> Key: ZOOKEEPER-3798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3798
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Ling Mao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0,3.7.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> remove the following useless code in 
> ProposalRequestProcessor#processRequest
> {code:java}
> public void processRequest(Request request) throws RequestProcessorException {
> // LOG.warn("Ack>>> cxid = " + request.cxid + " type = " +
> // request.type + " id = " + request.sessionId);
> // request.addRQRec(">prop");
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3774) Close quorum socket asynchronously on the leader to avoid ping being blocked by long socket closing time

2021-04-03 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314401#comment-17314401
 ] 

Mohammad Arshad commented on ZOOKEEPER-3774:


Updated fix version from 3.6.3 to 3.8.0, 3.7.1 as the changes are not present in 
branch-3.6

> Close quorum socket asynchronously on the leader to avoid ping being blocked 
> by long socket closing time
> 
>
> Key: ZOOKEEPER-3774
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3774
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.8.0, 3.7.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In ZOOKEEPER-3574 we close the quorum sockets on followers asynchronously 
> when a leader is partitioned away so the shutdown process will not be stalled 
> by long socket closing time and the followers can quickly establish a new 
> quorum to serve client requests.
> We've found that the long socket closing time can cause trouble on the leader 
> too when a follower is partitioned away if the partition is detected by 
> PingLaggingDetector. When the ping thread detects partition, it tries to 
> disconnect the follower. If the socket closing time is long, the ping thread 
> will be blocked and no ping is sent to any follower--even the ones still 
> connected to the leader--since the ping thread is responsible for sending 
> pings to all followers. When followers don't receive pings, they don't send 
> ping response. When the leader don't receive ping response, the sessions 
> expire. 
> To prevent good sessions from expiring, we need to close the socket 
> asynchronously on the leader too.
>  
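A self-contained illustrative sketch of the asynchronous-close idea (the class, method name, and executor choice are assumptions, not the committed leader-side change): hand the potentially slow Socket.close() to a dedicated thread so the thread that detected the partition is never blocked on it.
{code:java}
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: close quorum sockets off the detecting thread.
public class AsyncSocketCloseSketch {

    private static final ExecutorService CLOSER =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "quorum-socket-closer");
                t.setDaemon(true);
                return t;
            });

    static void closeAsync(Socket socket) {
        CLOSER.execute(() -> {
            try {
                socket.close();   // may take a long time on a bad link
            } catch (Exception e) {
                // best effort: ignore/log and move on
            }
        });
    }
}
{code}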



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

