[jira] [Updated] (ZOOKEEPER-3771) Update zk-merge-pr script to Python3

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3771:
--
Labels: pull-request-available  (was: )

> Update zk-merge-pr script to Python3
> 
>
> Key: ZOOKEEPER-3771
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3771
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Zili Chen
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3771) Update zk-merge-pr script to Python3

2020-03-26 Thread Zili Chen (Jira)
Zili Chen created ZOOKEEPER-3771:


 Summary: Update zk-merge-pr script to Python3
 Key: ZOOKEEPER-3771
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3771
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Zili Chen
Assignee: Zili Chen








[jira] [Resolved] (ZOOKEEPER-3755) Use maven to create fatjar

2020-03-26 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli resolved ZOOKEEPER-3755.

Fix Version/s: 3.6.1
   Resolution: Fixed

Issue resolved by pull request 1284
[https://github.com/apache/zookeeper/pull/1284]

> Use maven to create fatjar
> --
>
> Key: ZOOKEEPER-3755
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3755
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: build, contrib-fatjar
>Affects Versions: 3.6.0, 3.7.0
>Reporter: Sushant Mane
>Assignee: Sushant Mane
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Replace ant with maven for building fatjar.





[jira] [Resolved] (ZOOKEEPER-3689) zkCli/ZooKeeperMain relies on system properties for TLS config

2020-03-26 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli resolved ZOOKEEPER-3689.

Resolution: Fixed

Issue resolved by pull request 1285
[https://github.com/apache/zookeeper/pull/1285]

> zkCli/ZooKeeperMain relies on system properties for TLS config
> --
>
> Key: ZOOKEEPER-3689
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3689
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: security, server
>Affects Versions: 3.6.0, 3.5.5, 3.5.6
>Reporter: Ron Dagostino
>Assignee: Sankalp Bhatia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The command line client to ZooKeeper (org.apache.zookeeper.ZooKeeperMain, 
> invoked via bin/zkCli.{bat,sh}) has no facility for accepting TLS client 
> configuration (e.g. keystore/truststore location and password) except via 
> system properties.  System properties must be passed on the command line as 
> "-D" arguments and are inherently not secure.  There should be a way to pass 
> the client TLS configuration to org.apache.zookeeper.ZooKeeperMain in a more 
> secure way (e.g. via a file).
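
As a rough illustration of the file-based approach requested above (not the change that was merged in the pull request), the following sketch loads client TLS settings from a properties file instead of "-D" arguments, so that passwords do not show up in the process command line. The class and its helper methods are hypothetical. The property keys in the example follow the client TLS property names from the ZooKeeper documentation as I understand them; treat them as assumptions and check them against your version.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.TreeSet;

public class TlsClientConfigSketch {

    /** Reads key=value pairs from a file; the file can be protected with file system permissions. */
    public static Properties load(Path file) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            Properties props = new Properties();
            props.load(reader);
            return props;
        }
    }

    /** Same parsing, but from an in-memory string (handy for trying the sketch out). */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Renders the settings in sorted key order, masking every value whose key mentions a password. */
    public static String describe(Properties props) {
        StringBuilder sb = new StringBuilder();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            boolean secret = key.toLowerCase().contains("password");
            sb.append(key).append('=').append(secret ? "****" : props.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed example keys; the paths and the password are placeholders.
        Properties props = parse(
                "zookeeper.ssl.keyStore.location=/path/to/keystore.jks\n"
              + "zookeeper.ssl.keyStore.password=secret\n");
        System.out.print(describe(props));
    }
}
```

The point of the design is only that secrets live in a file readable by the owning user, rather than in arguments that any local user can see in the process list.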





[jira] [Resolved] (ZOOKEEPER-3767) fix a large amount of maven build warnings

2020-03-26 Thread Enrico Olivelli (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Olivelli resolved ZOOKEEPER-3767.

Fix Version/s: 3.7.0
   Resolution: Fixed

Issue resolved by pull request 1291
[https://github.com/apache/zookeeper/pull/1291]

> fix a large amount of maven build warnings
> --
>
> Key: ZOOKEEPER-3767
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3767
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: maoling
>Assignee: Zili Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. I used IntelliJ IDEA to find these Maven build warnings:
> {code:java}
> /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java 
> -Dmaven.multiModuleProjectDirectory=/Users/maoling/workspaces/workspace_zookeeper/zookeeper
>  "-Dmaven.home=/Applications/IntelliJ 
> IDEA.app/Contents/plugins/maven/lib/maven3" 
> "-Dclassworlds.conf=/Applications/IntelliJ 
> IDEA.app/Contents/plugins/maven/lib/maven3/bin/m2.conf" 
> "-Dmaven.ext.class.path=/Applications/IntelliJ 
> IDEA.app/Contents/plugins/maven/lib/maven-event-listener.jar" 
> "-javaagent:/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar=58545:/Applications/IntelliJ 
> IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/plugins/maven/lib/maven3/boot/plexus-classworlds-2.6.0.jar" 
> org.codehaus.classworlds.Launcher -Didea.version2019.3.1 -DskipTests=true 
> package -P !java-build
> {code}
> {code:java}
> Javadoc Warnings
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:340:
>  ?? - ??@see: 
> ?org.apache.zookeeper.ZooKeepergetEphemerals(EphemeralsCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:340:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetEphemerals(String, 
> EphemeralsCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:271:
>  ?? - ??@link: ?org.apache.zookeeper.ZooKeepersync(String, VoidCallback, 
> Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:150:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetACL(String, Stat, 
> ACLCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:89:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetAllChildrenNumber(String, 
> AllChildrenNumberCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:202:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, boolean, 
> Children2Callback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:202:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, Watcher, 
> Children2Callback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:179:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, boolean, 
> ChildrenCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:179:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, Watcher, 
> ChildrenCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:227:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepercreate(String, byte[], List, 
> CreateMode, Create2Callback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:227:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepercreate(String, byte[], List, 
> CreateMode, Create2Callback, Object, long)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetData(String, boolean, 
> DataCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetData(String, Watcher, 
> DataCallback, Object)
> /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121:
>  ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetConfig(boolean, 
> DataCallback, Object)
> 

[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Lasaro Camargos (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lasaro Camargos updated ZOOKEEPER-3769:
---
Description: 
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/company/service/data
 dataLogDir=/company/service/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

In the logs, node3 is killed at 11:17:14

node2 is killed at 11:17:50 2 and node 1 at 11:18:02 

 

 

 

  was:
In a cluster with three nodes, node3 is the leader and the other nodes are 
followers.

If I stop node3, the other two nodes do not finish the leader election.

This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and this 
config

 

tickTime=2000
 initLimit=30
 syncLimit=3
 dataDir=/hedvig/hpod/data
 dataLogDir=/hedvig/hpod/log
 clientPort=2181
 snapCount=10
 autopurge.snapRetainCount=3
 autopurge.purgeInterval=1
 skipACL=yes
 preAllocSize=65536
 maxClientCnxns=0
 4lw.commands.whitelist=*
 admin.enableServer=false

server.1=companydemo1.snc4.companyinc.com:3000:4000
 server.2=companydemo2.snc4.companyinc.com:3000:4000
 server.3=companydemo3.snc4.companyinc.com:3000:4000

 

Could you have a look at the logs and help me figure this out? It seems like 
node 1 is not getting notifications back from node2, but I don't see anything 
wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could be 
causing it.

 

In the logs, node3 is killed at 11:17:14

node2 is killed at 11:17:50 2 and node 1 at 11:18:02 

 

 

 


> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/company/service/data
>  dataLogDir=/company/service/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02 
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Lasaro Camargos (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068032#comment-17068032
 ] 

Lasaro Camargos commented on ZOOKEEPER-3769:


I went back and looked into some older logs and could confirm that the 
WorkerReceiver died and that this is what caused the election to hang. However, 
the BufferUnderflowException was present in very few instances. Most of the 
time, a NegativeArraySizeException was caught instead, but in pretty much the 
same situation, that is, after the connection to node3 was broken. The 
following are excerpts from node1 and node3. Let me know if you would like to 
have a look at the full logs.

03/23/20 10:14:45,772 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:166) - 
Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 
4 datadir /company/service/log/version-2 snapdir 
/company/service/data/version-2

03/23/20 10:14:45,772 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Follower.java:69) - FOLLOWING - 
LEADER ELECTION TOOK - 9 MS

03/23/20 10:14:45,774 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG 
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:202) - 
Resolved address for companydemo3.snc4.companyinc.com: 
companydemo3.snc4.companyinc.com/172.22.64.148

03/23/20 10:14:45,793 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UNKNOWN17 
5 null

03/23/20 10:14:45,798 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i DIFF 
4001f null

03/23/20 10:14:45,799 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:391) - Getting a 
diff from the leader 0x4001f

03/23/20 10:14:45,801 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i NEWLEADER 
5 null

03/23/20 10:14:45,801 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:546) - Learner 
received NEWLEADER message

03/23/20 10:14:45,815 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE 
[org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UPTODATE 
 null

03/23/20 10:14:45,816 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.Learner] (Learner.java:529) - Learner 
received UPTODATE message

03/23/20 10:14:45,816 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG 
[org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:1916) - 
Reconfig feature is disabled, skip reconfig processing.

03/23/20 10:14:45,817 
[QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO 
[org.apache.zookeeper.server.quorum.CommitProcessor] (CommitProcessor.java:256) 
- Configuring CommitProcessor with 32 worker threads.

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] INFO 
[org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:924) - Received connection request 172.22.30.98:58472

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1038) - Address of remote peer: 3

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1055) - Calling finish for 3

03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] 
DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=3

03/23/20 10:14:46,065 [SendWorker:3] WARN 
[org.apache.zookeeper.server.quorum.QuorumCnxManager] 
(QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue

java.lang.InterruptedException: null

at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
 ~[?:?]

at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
 ~[?:?]

at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) 
~[?:?]

at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
 ~[zookeeper-3.5.7.jar:3.5.7]

at 

[jira] [Resolved] (ZOOKEEPER-3760) remove a useless throwing CliException

2020-03-26 Thread Norbert Kalmár (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kalmár resolved ZOOKEEPER-3760.
---
Resolution: Fixed

> remove a useless throwing CliException
> --
>
> Key: ZOOKEEPER-3760
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.7
>Reporter: Jinjiang Ling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1, 3.5.8
>
> Attachments: ZOOKEEPER-3760-1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found 
> that the function processCmd in ZooKeeperMain.java looks like this:
> {code:java}
> protected boolean processCmd(MyCommandOptions co) throws CliException, 
> IOException, InterruptedException {
>     boolean watch = false;
>     try {
>         watch = processZKCmd(co);
>         exitCode = ExitCode.EXECUTION_FINISHED.getValue();
>     } catch (CliException ex) {
>         exitCode = ex.getExitCode();
>         System.err.println(ex.getMessage());
>     }
>     return watch;
> }
> {code}
> It declares that it throws CliException, which is already caught inside the 
> function, so I think the declaration can be removed.
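
To make the point concrete, here is a minimal, self-contained sketch (not the ZooKeeper source). CliException and processZKCmd are simplified, hypothetical stand-ins, and the command is a plain String rather than MyCommandOptions. Because the checked exception can only arise inside the try block and is handled there, the method compiles without declaring it, and callers need no try/catch.

```java
public class ThrowsClauseSketch {

    /** Hypothetical stand-in for CliException: a checked exception carrying an exit code. */
    public static class CliException extends Exception {
        private final int exitCode;

        public CliException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        public int getExitCode() {
            return exitCode;
        }
    }

    public static int exitCode = 0;

    /** Hypothetical stand-in for processZKCmd: only "ls" is accepted. */
    public static boolean processZKCmd(String cmd) throws CliException {
        if (!"ls".equals(cmd)) {
            throw new CliException("unknown command: " + cmd, 127);
        }
        return false;
    }

    /** No "throws CliException" clause: the exception never escapes this method. */
    public static boolean processCmd(String cmd) {
        boolean watch = false;
        try {
            watch = processZKCmd(cmd);
            exitCode = 0;
        } catch (CliException ex) {
            exitCode = ex.getExitCode();
            System.err.println(ex.getMessage());
        }
        return watch;
    }

    public static void main(String[] args) {
        processCmd("bogus");          // the call site needs no try/catch
        System.out.println(exitCode); // prints 127
    }
}
```

In the real method, IOException and InterruptedException presumably still have to be declared if processZKCmd can throw them; only the CliException entry is redundant.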





[jira] [Updated] (ZOOKEEPER-3760) remove a useless throwing CliException

2020-03-26 Thread Norbert Kalmár (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kalmár updated ZOOKEEPER-3760:
--
Fix Version/s: 3.5.8
   3.6.1

> remove a useless throwing CliException
> --
>
> Key: ZOOKEEPER-3760
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.7
>Reporter: Jinjiang Ling
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.1, 3.5.8
>
> Attachments: ZOOKEEPER-3760-1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found 
> that the function processCmd in ZooKeeperMain.java looks like this:
> {code:java}
> protected boolean processCmd(MyCommandOptions co) throws CliException, 
> IOException, InterruptedException {
>     boolean watch = false;
>     try {
>         watch = processZKCmd(co);
>         exitCode = ExitCode.EXECUTION_FINISHED.getValue();
>     } catch (CliException ex) {
>         exitCode = ex.getExitCode();
>         System.err.println(ex.getMessage());
>     }
>     return watch;
> }
> {code}
> It declares that it throws CliException, which is already caught inside the 
> function, so I think the declaration can be removed.





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Mate Szalay-Beko (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067546#comment-17067546
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3769:
-

I created a patched 3.5.7 version, where the exception is caught and the 
malformed message is skipped. Can you maybe try out this version?
https://drive.google.com/open?id=1cTdusaEFIVvH2D5KSrj6M9VVJoqlaQwD

This should print out a warning to the log after catching the exception: 
{{Skipping the processing of a partial / malformed response message sent by 
sid=XXX}}
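
For readers who want to see the general idea, the sketch below is NOT the actual patch. It assumes a hypothetical wire format (a 4-byte length followed by that many bytes) and shows how a truncated or corrupt message can raise BufferUnderflowException or NegativeArraySizeException, and how catching both lets the receive loop skip the message instead of dying. All class and method names are made up for illustration.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipMalformedSketch {

    /** Parses [int length][length bytes]; a hypothetical format used only for this illustration. */
    public static byte[] parse(ByteBuffer buf) {
        int len = buf.getInt();          // BufferUnderflowException if fewer than 4 bytes remain
        byte[] payload = new byte[len];  // NegativeArraySizeException if len is negative
        buf.get(payload);                // BufferUnderflowException if fewer than len bytes remain
        return payload;
    }

    /** Processes every message and skips malformed ones, so one bad message cannot end the loop. */
    public static List<byte[]> receiveAll(List<ByteBuffer> messages) {
        List<byte[]> accepted = new ArrayList<>();
        for (ByteBuffer message : messages) {
            try {
                accepted.add(parse(message));
            } catch (BufferUnderflowException | NegativeArraySizeException e) {
                System.err.println("Skipping a partial / malformed message: " + e);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        ByteBuffer good = ByteBuffer.allocate(6);
        good.putInt(2).put((byte) 1).put((byte) 2);
        good.flip();

        ByteBuffer truncated = ByteBuffer.allocate(5);
        truncated.putInt(8).put((byte) 1);   // claims 8 payload bytes, carries only 1
        truncated.flip();

        ByteBuffer negative = ByteBuffer.allocate(4);
        negative.putInt(-1);                 // corrupt length field
        negative.flip();

        // Only the well-formed message is accepted.
        System.out.println(receiveAll(Arrays.asList(truncated, good, negative)).size()); // prints 1
    }
}
```

Without the try/catch, the first malformed message would terminate the thread running the loop, which matches the symptom of the receiver dying and the election never completing.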

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02 
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-2938) Server is unable to join quorum after connection broken to other peers

2020-03-26 Thread Denis Fulachier (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067535#comment-17067535
 ] 

Denis Fulachier commented on ZOOKEEPER-2938:


This issue may be related to ZOOKEEPER-2164, which is not yet fixed in an 
official release but has already been fixed on GitHub for the upcoming versions 
[3.7.0|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.7.0],
 
[3.6.1|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.6.1],
 
[3.5.8|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.5.8].

I hit the same issue, and updating to a 3.6.1-SNAPSHOT build (built today from 
the 3.6 branch) fixed it.

> Server is unable to join quorum after connection broken to other peers
> --
>
> Key: ZOOKEEPER-2938
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2938
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Abhay Bothra
>Priority: Major
>
> We see the following logs in the node with {{myid: 1}}
> {code}
> 2017-11-08 15:06:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:06:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:07:28,375 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:07:28,375 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:07:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:08:28,375 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:08:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:08:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:09:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2017-11-08 15:10:28,376 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message 
> format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state)
> 2017-11-08 15:10:28,376 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2017-11-08 15:10:28,377 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> {code}
> On the nodes with {{myid: 2}} and {{myid: 3}}, we see connection broken 
> events for {{myid: 1}}
> {code}
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, 
> my id = 2, error =
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:209)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.net.SocketInputStream.read(SocketInputStream.java:223)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
> 2017-11-07 02:54:32,135 [myid:2] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
> at 
> 

[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Mate Szalay-Beko (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067521#comment-17067521
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3769:
-

I was trying to reproduce the issue using ZooKeeper 3.5.7 and OpenJDK 12.0.2 
with:
- this compose file: 
https://github.com/symat/zookeeper-docker-test/blob/master/3_nodes_zk_jdk_12.yml
- this config (based on your config): 
https://github.com/symat/zookeeper-docker-test/blob/master/conf/ZOOKEEPER-3769_zoo.cfg

I used the OpenJDK 12.0.2 runtime in the Docker containers, and I tried 
ZooKeeper 3.5.7 compiled with both 8u242 and 12.0.2.

Unfortunately everything was working fine... I haven't seen the 
BufferUnderflowException, and the quorum was up quickly after I stopped the 
container of Server 3 (which was the leader previously).

Maybe it is an OS / networking related thing which cannot be simulated with 
Docker on a single machine. Anyway, I will create a patched version to handle 
this exception.

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02 
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Mate Szalay-Beko (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067433#comment-17067433
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3769:
-

What OS version are you using?

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02 
>  
>  
>  





[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down

2020-03-26 Thread Mate Szalay-Beko (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067385#comment-17067385
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3769:
-

Sorry, I wrote the Netty config parameter incorrectly. I guess you got it right 
when you tested it, but let me correct myself anyway. You need 
{{serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory}} in 
zoo.cfg, or the system property 
{{-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory}}.
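
As a small illustration of why both forms end up equivalent, the sketch below models the selection logic as I understand it: a zoo.cfg key the server does not handle itself is exposed as a system property with a "zookeeper." prefix, and the connection factory class is read from that property with the NIO implementation as the default. This is a simplified model under those assumptions, not ZooKeeper's code; the class and method names here are hypothetical.

```java
import java.util.Properties;

public class CnxnFactorySelectionSketch {

    public static final String PROPERTY = "zookeeper.serverCnxnFactory";
    public static final String NIO = "org.apache.zookeeper.server.NIOServerCnxnFactory";
    public static final String NETTY = "org.apache.zookeeper.server.NettyServerCnxnFactory";

    /** Assumed mapping from a zoo.cfg key to the corresponding system property name. */
    public static String toSystemPropertyName(String cfgKey) {
        return "zookeeper." + cfgKey;
    }

    /** Returns the configured factory class name, falling back to the (assumed) NIO default. */
    public static String selectedFactory(Properties props) {
        return props.getProperty(PROPERTY, NIO);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(selectedFactory(props)); // NIO default when nothing is configured

        // Equivalent of the zoo.cfg line serverCnxnFactory=...NettyServerCnxnFactory
        props.setProperty(toSystemPropertyName("serverCnxnFactory"), NETTY);
        System.out.println(selectedFactory(props)); // Netty factory class name
    }
}
```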

I will try to reproduce the issue locally with Docker using ZooKeeper 3.5.7 and 
OpenJDK 12.0.2. Although I am not sure if this can be reproduced in Docker...

I will also create a small patch to handle this BufferUnderflowException. Do 
you see the same exception every time one of the servers fails to rejoin, or 
was it only a single random error?

> fast leader election does not end if leader is taken down
> -
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.7
>Reporter: Lasaro Camargos
>Assignee: Mate Szalay-Beko
>Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are 
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7,  openjdk version "12.0.2" 2019-07-16, and 
> this config
>  
> tickTime=2000
>  initLimit=30
>  syncLimit=3
>  dataDir=/hedvig/hpod/data
>  dataLogDir=/hedvig/hpod/log
>  clientPort=2181
>  snapCount=10
>  autopurge.snapRetainCount=3
>  autopurge.purgeInterval=1
>  skipACL=yes
>  preAllocSize=65536
>  maxClientCnxns=0
>  4lw.commands.whitelist=*
>  admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
>  server.2=companydemo2.snc4.companyinc.com:3000:4000
>  server.3=companydemo3.snc4.companyinc.com:3000:4000
>  
> Could you have a look at the logs and help me figure this out? It seems like 
> node 1 is not getting notifications back from node2, but I don't see anything 
> wrong with the network so I am wondering if bugs like  ZOOKEEPER-3756 could 
> be causing it.
>  
> In the logs, node3 is killed at 11:17:14
> node2 is killed at 11:17:50 2 and node 1 at 11:18:02 
>  
>  
>  


