[jira] [Updated] (ZOOKEEPER-3771) Update zk-merge-pr script to Python3
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ZOOKEEPER-3771: -- Labels: pull-request-available (was: ) > Update zk-merge-pr script to Python3 > > > Key: ZOOKEEPER-3771 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3771 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Zili Chen >Assignee: Zili Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ZOOKEEPER-3771) Update zk-merge-pr script to Python3
Zili Chen created ZOOKEEPER-3771: Summary: Update zk-merge-pr script to Python3 Key: ZOOKEEPER-3771 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3771 Project: ZooKeeper Issue Type: Improvement Reporter: Zili Chen Assignee: Zili Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-3755) Use maven to create fatjar
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Olivelli resolved ZOOKEEPER-3755. Fix Version/s: 3.6.1 Resolution: Fixed Issue resolved by pull request 1284 [https://github.com/apache/zookeeper/pull/1284] > Use maven to create fatjar > -- > > Key: ZOOKEEPER-3755 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3755 > Project: ZooKeeper > Issue Type: Improvement > Components: build, contrib-fatjar >Affects Versions: 3.6.0, 3.7.0 >Reporter: Sushant Mane >Assignee: Sushant Mane >Priority: Major > Labels: pull-request-available > Fix For: 3.6.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Replace ant with maven for building fatjar. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ZOOKEEPER-3689) zkCli/ZooKeeperMain relies on system properties for TLS config
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Olivelli resolved ZOOKEEPER-3689. Resolution: Fixed Issue resolved by pull request 1285 [https://github.com/apache/zookeeper/pull/1285] > zkCli/ZooKeeperMain relies on system properties for TLS config > -- > > Key: ZOOKEEPER-3689 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3689 > Project: ZooKeeper > Issue Type: New Feature > Components: security, server >Affects Versions: 3.6.0, 3.5.5, 3.5.6 >Reporter: Ron Dagostino >Assignee: Sankalp Bhatia >Priority: Major > Labels: pull-request-available > Fix For: 3.6.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > The command line client to ZooKeeper (org.apache.zookeeper.ZooKeeperMain, > invoked via bin/zkCli.{bat,sh}) has no facility for accepting TLS client > configuration (e.g. keystore/truststore location and password) except via > system properties. System properties must be passed on the command line as > "-D" arguments and are inherently not secure. There should be a way to pass > the client TLS configuration to org.apache.zookeeper.ZooKeeperMain in a more > secure way (e.g. via a file). -- This message was sent by Atlassian Jira (v8.3.4#803005)
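To illustrate the request in ZOOKEEPER-3689, here is a minimal sketch of reading client TLS settings from properties-format text (for example, the contents of a file readable only by its owner) instead of passing them as "-D" arguments, which typically show up in the process list. This is not the change made in pull request 1285; the class and method names are hypothetical, and the property keys are assumed to match the system property names the ZooKeeper client reads.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of ZooKeeper.
final class ClientTlsConfigSketch {

    /** Parses TLS settings from properties-format text. */
    public static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns a copy that is safe to log: values of keys ending in "password" are masked. */
    public static Properties redacted(Properties props) {
        Properties copy = new Properties();
        for (String key : props.stringPropertyNames()) {
            boolean secret = key.toLowerCase().endsWith("password");
            copy.setProperty(key, secret ? "******" : props.getProperty(key));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Assumed key names; the paths and the password are made-up placeholders.
        String fileContents =
              "zookeeper.client.secure=true\n"
            + "zookeeper.ssl.keyStore.location=/path/to/keystore.jks\n"
            + "zookeeper.ssl.keyStore.password=changeit\n";
        Properties tls = load(new StringReader(fileContents));
        // Print only the masked copy so the password never reaches the console or a log.
        for (String key : redacted(tls).stringPropertyNames()) {
            System.out.println(key + "=" + redacted(tls).getProperty(key));
        }
    }
}
```

In a real client the Reader would wrap a file with restrictive permissions, so the secrets stay out of the command line.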
[jira] [Resolved] (ZOOKEEPER-3767) fix a large amount of maven build warnings
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Olivelli resolved ZOOKEEPER-3767. Fix Version/s: 3.7.0 Resolution: Fixed Issue resolved by pull request 1291 [https://github.com/apache/zookeeper/pull/1291] > fix a large amount of maven build warnings > -- > > Key: ZOOKEEPER-3767 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3767 > Project: ZooKeeper > Issue Type: Improvement >Reporter: maoling >Assignee: Zili Chen >Priority: Major > Labels: pull-request-available > Fix For: 3.7.0 > > Time Spent: 20m > Remaining Estimate: 0h > > 1. I use my IDEA to find these maven build warnings: > {code:java} > /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java > -Dmaven.multiModuleProjectDirectory=/Users/maoling/workspaces/workspace_zookeeper/zookeeper > "-Dmaven.home=/Applications/IntelliJ > IDEA.app/Contents/plugins/maven/lib/maven3" > "-Dclassworlds.conf=/Applications/IntelliJ > IDEA.app/Contents/plugins/maven/lib/maven3/bin/m2.conf" > "-Dmaven.ext.class.path=/Applications/IntelliJ > IDEA.app/Contents/plugins/maven/lib/maven-event-listener.jar" > "-javaagent:/Applications/IntelliJ > IDEA.app/Contents/lib/idea_rt.jar=58545:/Applications/IntelliJ > IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath > "/Applications/IntelliJ > IDEA.app/Contents/plugins/maven/lib/maven3/boot/plexus-classworlds-2.6.0.jar" > org.codehaus.classworlds.Launcher -Didea.version2019.3.1 -DskipTests=true > package -P !java-build > {code} > {code:java} > Javadoc Warnings > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:340: > ?? - ??@see: > ?org.apache.zookeeper.ZooKeepergetEphemerals(EphemeralsCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:340: > ?? 
- ??@see: ?org.apache.zookeeper.ZooKeepergetEphemerals(String, > EphemeralsCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:271: > ?? - ??@link: ?org.apache.zookeeper.ZooKeepersync(String, VoidCallback, > Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:150: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetACL(String, Stat, > ACLCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:89: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetAllChildrenNumber(String, > AllChildrenNumberCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:202: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, boolean, > Children2Callback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:202: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, Watcher, > Children2Callback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:179: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, boolean, > ChildrenCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:179: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetChildren(String, Watcher, > ChildrenCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:227: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepercreate(String, byte[], List, > CreateMode, Create2Callback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:227: > ?? 
- ??@see: ?org.apache.zookeeper.ZooKeepercreate(String, byte[], List, > CreateMode, Create2Callback, Object, long) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetData(String, boolean, > DataCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetData(String, Watcher, > DataCallback, Object) > /Users/maoling/workspaces//zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/AsyncCallback.java:121: > ?? - ??@see: ?org.apache.zookeeper.ZooKeepergetConfig(boolean, > DataCallback, Object) >
[jira] [Updated] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lasaro Camargos updated ZOOKEEPER-3769: --- Description: In a cluster with three nodes, node3 is the leader and the other nodes are followers. If I stop node3, the other two nodes do not finish the leader election. This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and this config tickTime=2000 initLimit=30 syncLimit=3 dataDir=/company/service/data dataLogDir=/company/service/log clientPort=2181 snapCount=10 autopurge.snapRetainCount=3 autopurge.purgeInterval=1 skipACL=yes preAllocSize=65536 maxClientCnxns=0 4lw.commands.whitelist=* admin.enableServer=false server.1=companydemo1.snc4.companyinc.com:3000:4000 server.2=companydemo2.snc4.companyinc.com:3000:4000 server.3=companydemo3.snc4.companyinc.com:3000:4000 Could you have a look at the logs and help me figure this out? It seems like node 1 is not getting notifications back from node2, but I don't see anything wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could be causing it. In the logs, node3 is killed at 11:17:14 node2 is killed at 11:17:50 2 and node 1 at 11:18:02 was: In a cluster with three nodes, node3 is the leader and the other nodes are followers. If I stop node3, the other two nodes do not finish the leader election. This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and this config tickTime=2000 initLimit=30 syncLimit=3 dataDir=/hedvig/hpod/data dataLogDir=/hedvig/hpod/log clientPort=2181 snapCount=10 autopurge.snapRetainCount=3 autopurge.purgeInterval=1 skipACL=yes preAllocSize=65536 maxClientCnxns=0 4lw.commands.whitelist=* admin.enableServer=false server.1=companydemo1.snc4.companyinc.com:3000:4000 server.2=companydemo2.snc4.companyinc.com:3000:4000 server.3=companydemo3.snc4.companyinc.com:3000:4000 Could you have a look at the logs and help me figure this out? 
It seems like node 1 is not getting notifications back from node2, but I don't see anything wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could be causing it. In the logs, node3 is killed at 11:17:14 node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > fast leader election does not end if leader is taken down > - > > Key: ZOOKEEPER-3769 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.7 >Reporter: Lasaro Camargos >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: node1.log, node2.log, node3.log > > > In a cluster with three nodes, node3 is the leader and the other nodes are > followers. > If I stop node3, the other two nodes do not finish the leader election. > This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and > this config > > tickTime=2000 > initLimit=30 > syncLimit=3 > dataDir=/company/service/data > dataLogDir=/company/service/log > clientPort=2181 > snapCount=10 > autopurge.snapRetainCount=3 > autopurge.purgeInterval=1 > skipACL=yes > preAllocSize=65536 > maxClientCnxns=0 > 4lw.commands.whitelist=* > admin.enableServer=false > server.1=companydemo1.snc4.companyinc.com:3000:4000 > server.2=companydemo2.snc4.companyinc.com:3000:4000 > server.3=companydemo3.snc4.companyinc.com:3000:4000 > > Could you have a look at the logs and help me figure this out? It seems like > node 1 is not getting notifications back from node2, but I don't see anything > wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could > be causing it. > > In the logs, node3 is killed at 11:17:14 > node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
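For context on why the reported hang is unexpected: with the default majority quorum, an ensemble needs strictly more than half of its voting servers, so two surviving servers out of three should still be able to elect a leader. A minimal sketch of that arithmetic (class and method names are hypothetical, not ZooKeeper code):

```java
// Hypothetical illustration of the majority rule, not ZooKeeper code.
final class QuorumMathSketch {

    /** A set of votes forms a quorum when it is strictly larger than half the ensemble. */
    public static boolean hasMajority(int votes, int ensembleSize) {
        return votes > ensembleSize / 2;
    }

    public static void main(String[] args) {
        int ensembleSize = 3; // server.1, server.2, server.3 from the reported config
        System.out.println(hasMajority(2, ensembleSize)); // true: node1 + node2 are enough
        System.out.println(hasMajority(1, ensembleSize)); // false: a single node cannot elect
    }
}
```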
[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068032#comment-17068032 ] Lasaro Camargos commented on ZOOKEEPER-3769: I went back and looked into some older logs and could confirm that the WorkerReceiver died and that's what caused the election to hang. However, the BufferUnderflowException was present in very few instances. Most of the time, it was a NegativeArraySizeException that was caught, but pretty much in the same situation, that is, after the connection being broken to node3. The following are excerpts from node1 and node 3. Let me know if you would like to have a look at the full logs. 03/23/20 10:14:45,772 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.ZooKeeperServer] (ZooKeeperServer.java:166) - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 4 datadir /company/service/log/version-2 snapdir /company/service/data/version-2 03/23/20 10:14:45,772 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.quorum.Learner] (Follower.java:69) - FOLLOWING - LEADER ELECTION TOOK - 9 MS 03/23/20 10:14:45,774 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG [org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:202) - Resolved address for companydemo3.snc4.companyinc.com: companydemo3.snc4.companyinc.com/172.22.64.148 03/23/20 10:14:45,793 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE [org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UNKNOWN17 5 null 03/23/20 10:14:45,798 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE [org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i DIFF 4001f null 03/23/20 10:14:45,799 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.quorum.Learner] (Learner.java:391) 
- Getting a diff from the leader 0x4001f 03/23/20 10:14:45,801 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE [org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i NEWLEADER 5 null 03/23/20 10:14:45,801 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.quorum.Learner] (Learner.java:546) - Learner received NEWLEADER message 03/23/20 10:14:45,815 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] TRACE [org.apache.zookeeper.server.quorum.Learner] (ZooTrace.java:71) - i UPTODATE null 03/23/20 10:14:45,816 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.quorum.Learner] (Learner.java:529) - Learner received UPTODATE message 03/23/20 10:14:45,816 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] DEBUG [org.apache.zookeeper.server.quorum.QuorumPeer] (QuorumPeer.java:1916) - Reconfig feature is disabled, skip reconfig processing. 03/23/20 10:14:45,817 [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled)] INFO [org.apache.zookeeper.server.quorum.CommitProcessor] (CommitProcessor.java:256) - Configuring CommitProcessor with 32 worker threads. 
03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] INFO [org.apache.zookeeper.server.quorum.QuorumCnxManager] (QuorumCnxManager.java:924) - Received connection request 172.22.30.98:58472 03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] (QuorumCnxManager.java:1038) - Address of remote peer: 3 03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] (QuorumCnxManager.java:1055) - Calling finish for 3 03/23/20 10:14:46,064 [companydemo1.snc4.companyinc.com/172.22.65.65:4000] DEBUG [org.apache.zookeeper.server.quorum.QuorumCnxManager] (QuorumCnxManager.java:1072) - Removing entry from senderWorkerMap sid=3 03/23/20 10:14:46,065 [SendWorker:3] WARN [org.apache.zookeeper.server.quorum.QuorumCnxManager] (QuorumCnxManager.java:1143) - Interrupted while waiting for message on queue java.lang.InterruptedException: null at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?] at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?] at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294) ~[zookeeper-3.5.7.jar:3.5.7] at
[jira] [Resolved] (ZOOKEEPER-3760) remove a useless throwing CliException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Norbert Kalmár resolved ZOOKEEPER-3760. --- Resolution: Fixed > remove a useless throwing CliException > -- > > Key: ZOOKEEPER-3760 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.7 >Reporter: Jinjiang Ling >Priority: Major > Labels: pull-request-available > Fix For: 3.6.1, 3.5.8 > > Attachments: ZOOKEEPER-3760-1.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the > function processCmd in ZooKeeperMain.java looks like the code below: > {code:java} > protected boolean processCmd(MyCommandOptions co) throws CliException, > IOException, InterruptedException { > boolean watch = false; > try { > watch = processZKCmd(co); > exitCode = ExitCode.EXECUTION_FINISHED.getValue(); > } catch (CliException ex) { > exitCode = ex.getExitCode(); > System.err.println(ex.getMessage()); > } > return watch; > } > {code} > It declares that it throws CliException, which is already caught inside the > function, so I think the declaration can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-3760) remove a useless throwing CliException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Norbert Kalmár updated ZOOKEEPER-3760: -- Fix Version/s: 3.5.8 3.6.1 > remove a useless throwing CliException > -- > > Key: ZOOKEEPER-3760 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.5.7 >Reporter: Jinjiang Ling >Priority: Major > Labels: pull-request-available > Fix For: 3.6.1, 3.5.8 > > Attachments: ZOOKEEPER-3760-1.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the > function processCmd in ZooKeeperMain.java looks like the code below: > {code:java} > protected boolean processCmd(MyCommandOptions co) throws CliException, > IOException, InterruptedException { > boolean watch = false; > try { > watch = processZKCmd(co); > exitCode = ExitCode.EXECUTION_FINISHED.getValue(); > } catch (CliException ex) { > exitCode = ex.getExitCode(); > System.err.println(ex.getMessage()); > } > return watch; > } > {code} > It declares that it throws CliException, which is already caught inside the > function, so I think the declaration can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
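The point of ZOOKEEPER-3760 can be shown with a small self-contained sketch. The CliException class and processZKCmd method below are hypothetical stand-ins for the ZooKeeper types, and the IOException and InterruptedException from the original signature are left out for brevity. Because the only statement that can raise CliException sits inside the try block that catches it, the method compiles without declaring it.

```java
// Hypothetical stand-ins for the ZooKeeper classes quoted in the issue.
final class RedundantThrowsSketch {

    /** Simplified stand-in for CliException: a checked exception carrying an exit code. */
    static class CliException extends Exception {
        private final int exitCode;

        CliException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        int getExitCode() {
            return exitCode;
        }
    }

    static int exitCode = 0;

    /** Stand-in for processZKCmd: rejects the made-up command "bad". */
    static boolean processZKCmd(String cmd) throws CliException {
        if ("bad".equals(cmd)) {
            throw new CliException("unknown command: " + cmd, 1);
        }
        return false;
    }

    /** No "throws CliException" here: the exception is fully handled in the catch block. */
    public static boolean processCmd(String cmd) {
        boolean watch = false;
        try {
            watch = processZKCmd(cmd);
            exitCode = 0;
        } catch (CliException ex) {
            exitCode = ex.getExitCode();
            System.err.println(ex.getMessage());
        }
        return watch;
    }

    public static void main(String[] args) {
        processCmd("bad");
        System.out.println("exit code after failure: " + exitCode);
        processCmd("ls");
        System.out.println("exit code after success: " + exitCode);
    }
}
```

One consequence worth keeping in mind: since CliException is a checked exception here, a caller that wraps processCmd in its own catch (CliException e) block stops compiling once the declaration is dropped, so such callers have to be adjusted in the same change.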
[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067546#comment-17067546 ] Mate Szalay-Beko commented on ZOOKEEPER-3769: - I created a patched 3.5.7 version, where the exception is caught and the malformed message is skipped. Can you maybe try out this version? https://drive.google.com/open?id=1cTdusaEFIVvH2D5KSrj6M9VVJoqlaQwD This should print out a warning to the log after catching the exception: {{Skipping the processing of a partial / malformed response message sent by sid=XXX}} > fast leader election does not end if leader is taken down > - > > Key: ZOOKEEPER-3769 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.7 >Reporter: Lasaro Camargos >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: node1.log, node2.log, node3.log > > > In a cluster with three nodes, node3 is the leader and the other nodes are > followers. > If I stop node3, the other two nodes do not finish the leader election. > This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and > this config > > tickTime=2000 > initLimit=30 > syncLimit=3 > dataDir=/hedvig/hpod/data > dataLogDir=/hedvig/hpod/log > clientPort=2181 > snapCount=10 > autopurge.snapRetainCount=3 > autopurge.purgeInterval=1 > skipACL=yes > preAllocSize=65536 > maxClientCnxns=0 > 4lw.commands.whitelist=* > admin.enableServer=false > server.1=companydemo1.snc4.companyinc.com:3000:4000 > server.2=companydemo2.snc4.companyinc.com:3000:4000 > server.3=companydemo3.snc4.companyinc.com:3000:4000 > > Could you have a look at the logs and help me figure this out? It seems like > node 1 is not getting notifications back from node2, but I don't see anything > wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could > be causing it. 
> > In the logs, node3 is killed at 11:17:14 > node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
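The approach described in the comment above (catch the exception, skip the malformed message, keep the receiver alive) can be sketched as follows. This is not the actual patch: the length-prefixed wire format, the class name, and the method names are assumptions made for illustration. It covers the two exceptions named in this thread, BufferUnderflowException and NegativeArraySizeException.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver loop, not the ZooKeeper WorkerReceiver.
final class MalformedMessageSketch {

    /** Assumed wire format: a 4-byte length followed by that many payload bytes. */
    static byte[] parse(ByteBuffer message) {
        int length = message.getInt();      // BufferUnderflowException if fewer than 4 bytes remain
        byte[] payload = new byte[length];  // NegativeArraySizeException if length is negative
        message.get(payload);               // BufferUnderflowException if the payload is truncated
        return payload;
    }

    /** Parses every message; a malformed one is logged and skipped instead of ending the loop. */
    public static List<byte[]> receiveAll(List<ByteBuffer> incoming) {
        List<byte[]> parsed = new ArrayList<>();
        for (ByteBuffer message : incoming) {
            try {
                parsed.add(parse(message));
            } catch (BufferUnderflowException | NegativeArraySizeException e) {
                System.err.println("Skipping partial / malformed message: " + e);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<ByteBuffer> incoming = new ArrayList<>();
        ByteBuffer good = ByteBuffer.allocate(6);
        good.putInt(2).put((byte) 7).put((byte) 8);
        good.flip();
        incoming.add(good);
        incoming.add(ByteBuffer.wrap(new byte[] {0, 0})); // too short to hold the length field
        System.out.println("parsed messages: " + receiveAll(incoming).size());
    }
}
```

Without the try/catch, the first malformed buffer would propagate out of the loop and end the receiving thread, which matches the symptom reported in the thread (the WorkerReceiver died and the election hung).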
[jira] [Commented] (ZOOKEEPER-2938) Server is unable to join quorum after connection broken to other peers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067535#comment-17067535 ] Denis Fulachier commented on ZOOKEEPER-2938: This issue may be related to ZOOKEEPER-2164, not yet fixed in an official version, but done in github for incoming versions [3.7.0|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.7.0], [3.6.1|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.6.1], [3.5.8|https://issues.apache.org/jira/issues/?jql=project+%3D+ZOOKEEPER+AND+fixVersion+%3D+3.5.8]. I got the same issue, and updating to a 3.6.1-SNAPSHOT version fixed it (just built today from 3.6 branch). > Server is unable to join quorum after connection broken to other peers > -- > > Key: ZOOKEEPER-2938 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2938 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.6 >Reporter: Abhay Bothra >Priority: Major > > We see the following logs in the node with {{myid: 1}} > {code} > 2017-11-08 15:06:28,375 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2017-11-08 15:06:28,375 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (3, 1) > 2017-11-08 15:07:28,375 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message > format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING > (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state) > 2017-11-08 15:07:28,375 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2017-11-08 15:07:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (3, 1) > 2017-11-08 15:08:28,375 [myid:1] - 
INFO > [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message > format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING > (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state) > 2017-11-08 15:08:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2017-11-08 15:08:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (3, 1) > 2017-11-08 15:09:28,376 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message > format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING > (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state) > 2017-11-08 15:09:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2017-11-08 15:09:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (3, 1) > 2017-11-08 15:10:28,376 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message > format version), 1 (n.leader), 0x28e000a8750 (n.zxid), 0x1 (n.round), LOOKING > (n.state), 1 (n.sid), 0x28e (n.peerEpoch) LOOKING (my state) > 2017-11-08 15:10:28,376 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2017-11-08 15:10:28,377 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, > so dropping the connection: (3, 1) > {code} > On the nodes with {{myid: 2}} and {{myid: 3}}, we see connection broken > events for {{myid: 1}} > {code} > 2017-11-07 02:54:32,135 [myid:2] - WARN > [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, > my id = 2, error = > java.net.SocketException: 
Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:209) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.net.SocketInputStream.read(SocketInputStream.java:223) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765) > 2017-11-07 02:54:32,135 [myid:2] - WARN > [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker > 2017-11-07 02:54:32,135 [myid:2] - WARN > [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting > for message on queue > java.lang.InterruptedException > at >
[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067521#comment-17067521 ] Mate Szalay-Beko commented on ZOOKEEPER-3769: - I tried to reproduce the issue using ZooKeeper 3.5.7 and OpenJDK 12.0.2 with: - this compose file: https://github.com/symat/zookeeper-docker-test/blob/master/3_nodes_zk_jdk_12.yml - this config (based on your config): https://github.com/symat/zookeeper-docker-test/blob/master/conf/ZOOKEEPER-3769_zoo.cfg I used the OpenJDK 12.0.2 runtime in the Docker containers, and I tried ZooKeeper 3.5.7 compiled with both 8u424 and 12.0.2. Unfortunately, everything worked fine: I did not see the BufferUnderflowException, and the quorum was back up quickly after I stopped the container of Server 3 (which was previously the leader). Maybe it is an OS- or networking-related problem that cannot be simulated with Docker on a single machine. Anyway, I will create a patched version to handle this exception. > fast leader election does not end if leader is taken down > - > > Key: ZOOKEEPER-3769 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.7 >Reporter: Lasaro Camargos >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: node1.log, node2.log, node3.log > > > In a cluster with three nodes, node3 is the leader and the other nodes are > followers. > If I stop node3, the other two nodes do not finish the leader election. 
> This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and > this config > > tickTime=2000 > initLimit=30 > syncLimit=3 > dataDir=/hedvig/hpod/data > dataLogDir=/hedvig/hpod/log > clientPort=2181 > snapCount=10 > autopurge.snapRetainCount=3 > autopurge.purgeInterval=1 > skipACL=yes > preAllocSize=65536 > maxClientCnxns=0 > 4lw.commands.whitelist=* > admin.enableServer=false > server.1=companydemo1.snc4.companyinc.com:3000:4000 > server.2=companydemo2.snc4.companyinc.com:3000:4000 > server.3=companydemo3.snc4.companyinc.com:3000:4000 > > Could you have a look at the logs and help me figure this out? It seems like > node 1 is not getting notifications back from node2, but I don't see anything > wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could > be causing it. > > In the logs, node3 is killed at 11:17:14 > node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067433#comment-17067433 ] Mate Szalay-Beko commented on ZOOKEEPER-3769: - What OS version are you using? > fast leader election does not end if leader is taken down > - > > Key: ZOOKEEPER-3769 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.7 >Reporter: Lasaro Camargos >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: node1.log, node2.log, node3.log > > > In a cluster with three nodes, node3 is the leader and the other nodes are > followers. > If I stop node3, the other two nodes do not finish the leader election. > This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and > this config > > tickTime=2000 > initLimit=30 > syncLimit=3 > dataDir=/hedvig/hpod/data > dataLogDir=/hedvig/hpod/log > clientPort=2181 > snapCount=10 > autopurge.snapRetainCount=3 > autopurge.purgeInterval=1 > skipACL=yes > preAllocSize=65536 > maxClientCnxns=0 > 4lw.commands.whitelist=* > admin.enableServer=false > server.1=companydemo1.snc4.companyinc.com:3000:4000 > server.2=companydemo2.snc4.companyinc.com:3000:4000 > server.3=companydemo3.snc4.companyinc.com:3000:4000 > > Could you have a look at the logs and help me figure this out? It seems like > node 1 is not getting notifications back from node2, but I don't see anything > wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could > be causing it. > > In the logs, node3 is killed at 11:17:14 > node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-3769) fast leader election does not end if leader is taken down
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067385#comment-17067385 ] Mate Szalay-Beko commented on ZOOKEEPER-3769: - Sorry, I wrote the Netty config parameter wrong. I guess you set it correctly when you tested it, but let me correct myself anyway. You need {{serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory}} in zoo.cfg, or the system property {{-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory}}. I will try to reproduce the issue locally with Docker using ZooKeeper 3.5.7 and OpenJDK 12.0.2, although I am not sure it can be reproduced in Docker. I will also create a small patch to handle this BufferUnderflowException. Do you see this same exception every time one of the servers fails to rejoin? Or was this only a single random error? > fast leader election does not end if leader is taken down > - > > Key: ZOOKEEPER-3769 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.7 >Reporter: Lasaro Camargos >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: node1.log, node2.log, node3.log > > > In a cluster with three nodes, node3 is the leader and the other nodes are > followers. > If I stop node3, the other two nodes do not finish the leader election. 
> This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and > this config > > tickTime=2000 > initLimit=30 > syncLimit=3 > dataDir=/hedvig/hpod/data > dataLogDir=/hedvig/hpod/log > clientPort=2181 > snapCount=10 > autopurge.snapRetainCount=3 > autopurge.purgeInterval=1 > skipACL=yes > preAllocSize=65536 > maxClientCnxns=0 > 4lw.commands.whitelist=* > admin.enableServer=false > server.1=companydemo1.snc4.companyinc.com:3000:4000 > server.2=companydemo2.snc4.companyinc.com:3000:4000 > server.3=companydemo3.snc4.companyinc.com:3000:4000 > > Could you have a look at the logs and help me figure this out? It seems like > node 1 is not getting notifications back from node2, but I don't see anything > wrong with the network so I am wondering if bugs like ZOOKEEPER-3756 could > be causing it. > > In the logs, node3 is killed at 11:17:14 > node2 is killed at 11:17:50 2 and node 1 at 11:18:02 > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)