[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060596#comment-17060596 ] Michael Dürr commented on ZOOKEEPER-2164: - Thank you very much [~eolivelli] and [~symat] ! > fast leader election keeps failing > -- > > Key: ZOOKEEPER-2164 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.4.5 >Reporter: Michi Mutsuzaki >Assignee: Mate Szalay-Beko >Priority: Major > Labels: pull-request-available > Fix For: 3.7.0, 3.6.1, 3.5.8 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. > When I shut down 2, 1 and 3 keep going back to leader election. Here is what > seems to be happening. > - Both 1 and 3 elect 3 as the leader. > - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a > follower. > - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't > timeout for 5 seconds: > https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346 > - By the time 3 receives votes, 1 has given up trying to connect to 3: > https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247 > I'm using 3.4.5, but it looks like this part of the code hasn't changed for a > while, so I'm guessing later versions have the same issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ZOOKEEPER-3760) remove a useless throwing CliException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ZOOKEEPER-3760:
--------------------------------------
    Labels: pull-request-available (was: )

> remove a useless throwing CliException
> --------------------------------------
>
> Key: ZOOKEEPER-3760
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.7
> Reporter: Jinjiang Ling
> Priority: Major
> Labels: pull-request-available
> Attachments: ZOOKEEPER-3760-1.patch
>
> When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the
> function processCmd in ZooKeeperMain.java looks like this:
> {code:java}
> protected boolean processCmd(MyCommandOptions co) throws CliException, IOException, InterruptedException {
>     boolean watch = false;
>     try {
>         watch = processZKCmd(co);
>         exitCode = ExitCode.EXECUTION_FINISHED.getValue();
>     } catch (CliException ex) {
>         exitCode = ex.getExitCode();
>         System.err.println(ex.getMessage());
>     }
>     return watch;
> }
> {code}
> It declares CliException in its throws clause, but that exception is already caught
> inside the function, so I think the declaration can be removed.
[jira] [Updated] (ZOOKEEPER-3760) remove a useless throwing CliException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinjiang Ling updated ZOOKEEPER-3760:
-------------------------------------
    Attachment: ZOOKEEPER-3760-1.patch

> remove a useless throwing CliException
> --------------------------------------
>
> Key: ZOOKEEPER-3760
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.7
> Reporter: Jinjiang Ling
> Priority: Major
> Attachments: ZOOKEEPER-3760-1.patch
>
> When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the
> function processCmd in ZooKeeperMain.java looks like this:
> {code:java}
> protected boolean processCmd(MyCommandOptions co) throws CliException, IOException, InterruptedException {
>     boolean watch = false;
>     try {
>         watch = processZKCmd(co);
>         exitCode = ExitCode.EXECUTION_FINISHED.getValue();
>     } catch (CliException ex) {
>         exitCode = ex.getExitCode();
>         System.err.println(ex.getMessage());
>     }
>     return watch;
> }
> {code}
> It declares CliException in its throws clause, but that exception is already caught
> inside the function, so I think the declaration can be removed.
[jira] [Created] (ZOOKEEPER-3760) remove a useless throwing CliException
Jinjiang Ling created ZOOKEEPER-3760:
-------------------------------------

Summary: remove a useless throwing CliException
Key: ZOOKEEPER-3760
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3760
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.5.7
Reporter: Jinjiang Ling

When I upgraded ZooKeeper from 3.4.13 to 3.5.7 in my application, I found that the
function processCmd in ZooKeeperMain.java looks like this:
{code:java}
protected boolean processCmd(MyCommandOptions co) throws CliException, IOException, InterruptedException {
    boolean watch = false;
    try {
        watch = processZKCmd(co);
        exitCode = ExitCode.EXECUTION_FINISHED.getValue();
    } catch (CliException ex) {
        exitCode = ex.getExitCode();
        System.err.println(ex.getMessage());
    }
    return watch;
}
{code}
It declares CliException in its throws clause, but that exception is already caught
inside the function, so I think the declaration can be removed.
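The point of the report above can be shown with a minimal, self-contained sketch. This is not the ZooKeeper source: `RedundantThrowsDemo`, `DemoCliException`, and `runCommand` are hypothetical stand-ins invented for illustration. Since the checked exception is handled inside the method, the method can drop it from its `throws` clause and callers no longer need to handle it.

```java
class RedundantThrowsDemo {
    // Hypothetical stand-in for a checked CLI exception; not ZooKeeper's CliException.
    static class DemoCliException extends Exception {
        private final int exitCode;

        DemoCliException(String message, int exitCode) {
            super(message);
            this.exitCode = exitCode;
        }

        int getExitCode() {
            return exitCode;
        }
    }

    static int exitCode = 0;

    // Hypothetical command runner that fails for the input "bad".
    static boolean runCommand(String cmd) throws DemoCliException {
        if ("bad".equals(cmd)) {
            throw new DemoCliException("unknown command: " + cmd, 1);
        }
        return true;
    }

    // No "throws DemoCliException" here: the catch block below handles
    // every instance, so the exception can never escape this method.
    static boolean processCmd(String cmd) {
        boolean watch = false;
        try {
            watch = runCommand(cmd);
            exitCode = 0;
        } catch (DemoCliException ex) {
            exitCode = ex.getExitCode();
            System.err.println(ex.getMessage());
        }
        return watch;
    }

    public static void main(String[] args) {
        // Callers need no try/catch for DemoCliException.
        System.out.println(processCmd("ok") + " exit=" + exitCode);
        System.out.println(processCmd("bad") + " exit=" + exitCode);
    }
}
```

Whether removing the declaration from a public or protected method is safe also depends on subclasses and callers that already reference it, which this sketch does not cover.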
[jira] [Commented] (ZOOKEEPER-3756) Members failing to rejoin quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060440#comment-17060440 ] Dai Shi commented on ZOOKEEPER-3756: I think you are right that kubernetes networking is one of the main issues here. Because the server IPs in the zookeeper configs are pointing to kubernetes services, opening a TCP connection to those IPs when there are no backend endpoints (which is the case when a pod is deleted) will just hang. I tried running with {{-Dzookeeper.cnxTimeout=500}} and now the cluster stays down for around 3 to 5 seconds when restarting the leader instead of more than 30 seconds. We may be able to tolerate this duration of downtime as a bandaid. I can try and build a 3.6.0 docker image and test the multiAddress feature as well. Is there anything I should pay attention to while upgrading to 3.6.0? Also is it possible to downgrade back to 3.5.7 afterwards? > Members failing to rejoin quorum > > > Key: ZOOKEEPER-3756 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Affects Versions: 3.5.6, 3.5.7 >Reporter: Dai Shi >Assignee: Mate Szalay-Beko >Priority: Major > Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, > jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml > > > Not sure if this is the place to ask, please close if it's not. 
> I am seeing some behavior that I can't explain since upgrading to 3.5: > In a 5 member quorum, when server 3 is the leader and each server has this in > their configuration: > {code:java} > server.1=100.71.255.254:2888:3888:participant;2181 > server.2=100.71.255.253:2888:3888:participant;2181 > server.3=100.71.255.252:2888:3888:participant;2181 > server.4=100.71.255.251:2888:3888:participant;2181 > server.5=100.71.255.250:2888:3888:participant;2181{code} > If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in > the logs: > {code:java} > 2020-03-11 20:23:35,720 [myid:2] - INFO > [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - > LOOKING > 2020-03-11 20:23:35,721 [myid:2] - INFO > [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] > - New election. My id = 2, proposed zxid=0x1b8005f4bba > 2020-03-11 20:23:35,733 [myid:2] - INFO > [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, > so dropping the connection: (3, 2) > 2020-03-11 20:23:35,734 [myid:2] - INFO > [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection > request 100.126.116.201:36140 > 2020-03-11 20:23:35,735 [myid:2] - INFO > [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, > so dropping the connection: (4, 2) > 2020-03-11 20:23:35,740 [myid:2] - INFO > [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, > so dropping the connection: (5, 2) > 2020-03-11 20:23:35,740 [myid:2] - INFO > [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection > request 100.126.116.201:36142 > 2020-03-11 20:23:35,740 [myid:2] - INFO > [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message > format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING > (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config > version) > 2020-03-11 20:23:35,742 [myid:2] - WARN > 
[SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting > for message on queue > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) > at > java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82) > at > org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131) > 2020-03-11 20:23:35,744 [myid:2] - WARN > [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread > id 3 my id = 2 > 2020-03-11 20:23:35,745 [myid:2] - WARN > [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting > SendWorker{code} > The only way I can seem to get them to rejoin the quorum is to restart the > leader. > However, if I remove server 4 and 5 from the configuration of server 1 or 2 > (so only servers 1, 2, and 3 remain in the configuration
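The effect of the `-Dzookeeper.cnxTimeout=500` setting tried in the comment above can be sketched roughly as follows. This is an illustration only, not ZooKeeper's implementation: the class `ConnectTimeoutDemo`, its method names, and the 5000 ms fallback are assumptions made for the example. It reads the timeout from the system property and passes it to a plain socket connect, so an attempt against an address that never answers gives up after that many milliseconds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ConnectTimeoutDemo {
    // Assumed fallback of 5000 ms, used when the property is not set.
    static final int DEFAULT_TIMEOUT_MS = 5000;

    // Reads the timeout from the zookeeper.cnxTimeout system property,
    // falling back to the default when it is unset or not a number.
    static int resolveTimeoutMillis() {
        return Integer.getInteger("zookeeper.cnxTimeout", DEFAULT_TIMEOUT_MS);
    }

    // Tries to open a TCP connection, giving up after timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // includes SocketTimeoutException
        }
    }

    public static void main(String[] args) {
        // Start the JVM with -Dzookeeper.cnxTimeout=500 to shorten the wait.
        int timeout = resolveTimeoutMillis();
        System.out.println("connect timeout = " + timeout + " ms");
        if (args.length == 2) {
            int port = Integer.parseInt(args[1]);
            System.out.println("reachable = " + canConnect(args[0], port, timeout));
        }
    }
}
```

A shorter timeout only bounds how long each hanging attempt blocks; it does not change the fact that the attempt blocks.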
[jira] [Created] (ZOOKEEPER-3759) A way to configure the jmx rmi port
Agostino Sarubbo created ZOOKEEPER-3759:
----------------------------------------

Summary: A way to configure the jmx rmi port
Key: ZOOKEEPER-3759
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3759
Project: ZooKeeper
Issue Type: Bug
Reporter: Agostino Sarubbo

The start script lacks a way to configure a java_rmi port. See also:
https://issues.apache.org/jira/browse/KAFKA-8658
[https://github.com/apache/kafka/pull/7088/commits/d02e14da8752a08bfe4f837d1cfea2c7b51e07af]
[jira] [Commented] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060265#comment-17060265 ]

Agostino Sarubbo commented on ZOOKEEPER-3758:
---------------------------------------------

Hello, here is the requested data:
{code:java}
~ # java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
{code}
{code:java}
# zoo.cfg
tickTime=2000
dataDir=/opt/loway/zookeeper/data
dataLogDir=/opt/loway/zookeeper/logs
clientPort=2181
secureClientPort=2281
initLimit=100
syncLimit=30
4lw.commands.whitelist=*
autopurge.purgeInterval=1
autopurge.snapRetainCount=5
server.1=zookeeper1.mydomain:2888:3888
server.2=zookeeper2.mydomain:2888:3888
server.3=zookeeper3.mydomain:2888:3888
server.4=zookeeper4.mydomain:2888:3888
server.5=zookeeper5.mydomain:2888:3888
{code}
We update zookeeper nodes one by one by installing the new version. We are using static configs, the job is done by ansible so there is no human error during the update.

Is there anything else I can provide to debug the issue?

> Update from 3.5.7 to 3.6.0 does not work
> ----------------------------------------
>
> Key: ZOOKEEPER-3758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3758
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Agostino Sarubbo
> Assignee: Mate Szalay-Beko
> Priority: Major
>
> Hello,
> we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.
> We got the following:
> {code:java}
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] - Peer state changed: looking
> 2020-03-16 10:40:45,514 [myid:1] - WARN [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1501] - PeerState set to LOOKING
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1371] - LOOKING
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x0
> 2020-03-16 10:40:45,515 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1b, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,518 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:4, n.state:LEADING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@857] - Peer state changed: following
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1453] - FOLLOWING
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1246] - minSessionTimeout set to 4000
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1255] - maxSessionTimeout set to 4
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@111] - zookeeper.pathStats.slotCapacity = 60
[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060204#comment-17060204 ] Mate Szalay-Beko commented on ZOOKEEPER-2164: - FYI: this ticket contained multiple errors regarding the leader election. We fixed one (with the 0.0.0.0 addresses), but the original one (slow leader election due to synchronized {{connectOne}} method call and socket timeouts) remained unfixed. Now I just faced the same original issue in ZOOKEEPER-3756, and plan to fix it. I don't think we should re-open this jira, but I will rather use the new one. > fast leader election keeps failing > -- > > Key: ZOOKEEPER-2164 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.4.5 >Reporter: Michi Mutsuzaki >Assignee: Mate Szalay-Beko >Priority: Major > Labels: pull-request-available > Fix For: 3.7.0, 3.6.1, 3.5.8 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. > When I shut down 2, 1 and 3 keep going back to leader election. Here is what > seems to be happening. > - Both 1 and 3 elect 3 as the leader. > - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a > follower. > - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't > timeout for 5 seconds: > https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346 > - By the time 3 receives votes, 1 has given up trying to connect to 3: > https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247 > I'm using 3.4.5, but it looks like this part of the code hasn't changed for a > while, so I'm guessing later versions have the same issue. 
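The blocking effect of a synchronized connect method, as described in the comment above, can be sketched with a small simulation. This is not ZooKeeper code: `SerializedConnectDemo` and its methods are hypothetical, and a `Thread.sleep` stands in for a socket connect that hangs until its timeout. While one thread is stuck "connecting" to a dead peer, a second thread that only wants to reach a healthy peer has to wait for the same lock.

```java
import java.util.concurrent.CountDownLatch;

class SerializedConnectDemo {
    // Signals that a call has entered the synchronized method.
    private final CountDownLatch callEntered = new CountDownLatch(1);

    // Hypothetical stand-in for a synchronized connect method: the sleep
    // simulates a connect() that blocks until its timeout expires.
    synchronized void connect(String peer, long blockMillis) throws InterruptedException {
        callEntered.countDown();
        Thread.sleep(blockMillis); // the monitor stays held while sleeping
    }

    // Returns how long (ms) a connect to a healthy peer takes while another
    // thread is stuck connecting to a dead peer for deadPeerTimeoutMillis.
    static long measureHealthyConnectMillis(long deadPeerTimeoutMillis) {
        SerializedConnectDemo manager = new SerializedConnectDemo();
        Thread stuck = new Thread(() -> {
            try {
                manager.connect("dead-peer", deadPeerTimeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            stuck.start();
            manager.callEntered.await(); // the dead-peer call now holds the lock
            long start = System.nanoTime();
            manager.connect("healthy-peer", 0); // must wait for the lock first
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            stuck.join();
            return elapsedMillis;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long waited = measureHealthyConnectMillis(300);
        System.out.println("healthy connect waited about " + waited + " ms");
    }
}
```

With a 5 second timeout in place of the 300 ms used here, the healthy connection would be delayed by roughly that long, which matches the delay the ticket describes.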
[jira] [Commented] (ZOOKEEPER-3756) Members failing to rejoin quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060200#comment-17060200 ]

Mate Szalay-Beko commented on ZOOKEEPER-3756:
---------------------------------------------

OK, I have a theory... Maybe this is what happens:
- After shutting down the leader, the whole leader election restarts.
- ZooKeeper tries to open socket connections to the other ZooKeeper servers using synchronized methods, so only one can run at a time (see on the master branch: https://github.com/apache/zookeeper/blob/a5a4743733b8939464af82c1ee68a593fadbe362/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L688 and https://github.com/apache/zookeeper/blob/a5a4743733b8939464af82c1ee68a593fadbe362/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L759).
- The default timeout is 5 secs (this is why there are no leader-election-related log messages in your log files for 5 sec, until we hit the timeout of the socket open to server 3).
- By the time the 5 sec timeout elapsed, the leader election protocol had also timed out (but AFAIK it always increases its internal timeout? I will need to verify this).
- After this happens a few times, either the leader election protocol timeout has increased enough to tolerate the 5 sec delay, and/or server 3 has restarted and the socket can be opened again. Then the blockage is removed and everything goes smoothly afterwards. But it took 30 seconds, which is way too long...

The question is why the socket needs to time out (wait for 5 sec) and why the connection doesn't get closed immediately with some 'host unreachable' exception, which is what we would expect if the server goes down and no IP connection can be established. Usually we don't see this problem in production, so I guess it has something to do with Kubernetes networking.

Still, this part needs to be refactored in ZooKeeper: we have to make {{connectOne}} asynchronous, which is not an easy task. Actually this is also something that was suggested in ZOOKEEPER-2164 (but in that ticket other errors were fixed in the end).

In the meantime there might be some workarounds:
# You can decrease the connection timeout to e.g. 500ms or 1000ms using the {{-Dzookeeper.cnxTimeout=500}} system property. I am not sure if it will help, but I would be glad if you could test it.
# Another, independent workaround would be using the multiAddress feature of ZooKeeper 3.6.0, enabled by {{-Dzookeeper.multiAddress.enabled=true}}. Then ZooKeeper should periodically check the availability of the currently used election addresses and kill the socket if the host is unavailable. This way we might kill the dead socket before the timeout happens. However, it might run ICMP traffic (ping) in the background, and I am not sure whether that will be reliable in Kubernetes.

Whether or not the workarounds fix the problem for you, I would suggest keeping this ticket open, and I will try to implement asynchronous connection establishment somehow.

> Members failing to rejoin quorum
> --------------------------------
>
> Key: ZOOKEEPER-3756
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Project: ZooKeeper
> Issue Type: Improvement
> Components: leaderElection
> Affects Versions: 3.5.6, 3.5.7
> Reporter: Dai Shi
> Assignee: Mate Szalay-Beko
> Priority: Major
> Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5: > In a 5 member quorum, when server 3 is the leader and each server has this in > their configuration: > {code:java} > server.1=100.71.255.254:2888:3888:participant;2181 > server.2=100.71.255.253:2888:3888:participant;2181 > server.3=100.71.255.252:2888:3888:participant;2181 > server.4=100.71.255.251:2888:3888:participant;2181 > server.5=100.71.255.250:2888:3888:participant;2181{code} > If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in > the logs: > {code:java} > 2020-03-11 20:23:35,720 [myid:2] - INFO > [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - > LOOKING > 2020-03-11 20:23:35,721 [myid:2] - INFO > [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] > - New election. My id = 2, proposed zxid=0x1b8005f4bba > 2020-03-11 20:23:35,733 [myid:2] - INFO > [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, > so dropping the connection: (3, 2) > 2020-03-11 20:23:35,734 [myid:2] - INFO > [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection >
[jira] [Assigned] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mate Szalay-Beko reassigned ZOOKEEPER-3758:
-------------------------------------------
    Assignee: Mate Szalay-Beko

> Update from 3.5.7 to 3.6.0 does not work
> ----------------------------------------
>
> Key: ZOOKEEPER-3758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3758
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Agostino Sarubbo
> Assignee: Mate Szalay-Beko
> Priority: Major
>
> Hello,
> we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.
> We got the following:
> {code:java}
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] - Peer state changed: looking
> 2020-03-16 10:40:45,514 [myid:1] - WARN [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1501] - PeerState set to LOOKING
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1371] - LOOKING
> 2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x0
> 2020-03-16 10:40:45,515 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1b, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,518 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:4, n.state:LEADING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@857] - Peer state changed: following
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1453] - FOLLOWING
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1246] - minSessionTimeout set to 4000
> 2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1255] - maxSessionTimeout set to 4
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@111] - zookeeper.pathStats.slotCapacity = 60
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@112] - zookeeper.pathStats.slotDuration = 15
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@113] - zookeeper.pathStats.maxDepth = 6
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@114] - zookeeper.pathStats.initialDelay = 5
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@115] - zookeeper.pathStats.delay = 5
> 2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@116] - zookeeper.pathStats.enabled =
[jira] [Comment Edited] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060158#comment-17060158 ]

Mate Szalay-Beko edited comment on ZOOKEEPER-3758 at 3/16/20, 12:06 PM:
------------------------------------------------------------------------

Also a small hint: I checked the code and AFAICS this exception suggests that the given ZooKeeper instance (a follower) doesn't know the quorum address / port of the newly elected ZooKeeper server. This shouldn't really happen unless you hit some bug or configuration issue. I am happy to dig deeper if you can send more info. Also, asking on the user mailing list (as Enrico suggested) is better, as more people are watching there.

was (Author: symat):
Also a small hint: I checked the code and this exception shows that the given ZooKeeper instance (a follower) doesn't know the quorum address / port of the newly elected ZooKeeper server. This shouldn't really happen unless you hit some bug or configuration issue. I am happy to dig deeper if you can send more info. Also, asking on the user mailing list (as Enrico suggested) is better, as more people are watching there.

> Update from 3.5.7 to 3.6.0 does not work
> ----------------------------------------
>
> Key: ZOOKEEPER-3758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3758
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Agostino Sarubbo
> Priority: Major
>
> Hello,
> we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.
[jira] [Comment Edited] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060154#comment-17060154 ]

Mate Szalay-Beko edited comment on ZOOKEEPER-3758 at 3/16/20, 12:05 PM:
------------------------------------------------------------------------

We tested the 3.5.7 -> 3.6.0 upgrade before the release, but of course it is always possible that we missed something... Could you also share your configs and Java version when you write the email? Also please provide some more background info, like: are you doing a rolling upgrade, or simply starting a new cluster with the old data? Did you change anything in the config compared to the old cluster? Are you using static config files, or dynamic reconfig?

was (Author: symat):
We tested the 3.5.7 -> 3.6.0 upgrade before the release, but of course it is always possible that we missed something... Could you also share your configs and Java version when you write the email? Also please provide some more background info, like: are you doing a rolling upgrade, or simply starting a new cluster with the old data? Did you change anything in the config compared to the old cluster?

> Update from 3.5.7 to 3.6.0 does not work
> ----------------------------------------
>
> Key: ZOOKEEPER-3758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3758
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Agostino Sarubbo
> Priority: Major
>
> Hello,
> we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.
> We got the following: > {code:java} > 2020-03-16 10:40:45,514 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] > - Peer state changed: looking 2020-03-16 10:40:45,514 [myid:1] - WARN > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1501] > - PeerState set to LOOKING 2020-03-16 10:40:45,514 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1371] > - LOOKING 2020-03-16 10:40:45,514 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):FastLeaderElection@931] > - New election. My id = 1, proposed zxid=0x0 2020-03-16 10:40:45,515 > [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - > Notification: my state:LOOKING; n.sid:1, n.state:LOOKING , n.leader:1, > n.round:0x1b, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, > n.config version:0x0 2020-03-16 10:40:45,517 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - > Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWI NG, n.leader:4, > n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format > version:0x2, n.config version:0x0 2020-03-16 10:40:45,517 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - > Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWI NG, n.leader:4, > n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format > version:0x2, n.config version:0x0 2020-03-16 10:40:45,517 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - > Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWI NG, n.leader:4, > n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format > version:0x2, n.config version:0x0 2020-03-16 10:40:45,518 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - > Notification: my state:LOOKING; n.sid:4, n.state:LEADING , 
n.leader:4, > n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format > version:0x2, n.config version:0x0 2020-03-16 10:40:45,518 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@857] > - Peer state changed: following 2020-03-16 10:40:45,518 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1453] > - FOLLOWING 2020-03-16 10:40:45,518 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1246] > - minSessionTimeout set to 4000 2020-03-16 10:40:45,518 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1255] > - maxSessionTimeout set to 4 2020-03-16 10:40:45,519 [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] > - Response cache size is initialized with value 400. 2020-03-16 10:40:45,519 > [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] > - Response cache size is initialized with value 400. 2020-03-16 10:40:45,519 > [myid:1] - INFO > [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@111] >
[jira] [Commented] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060158#comment-17060158 ]

Mate Szalay-Beko commented on ZOOKEEPER-3758:

Also a small hint: I checked the code, and this exception shows that the given ZooKeeper instance (a follower) doesn't know the quorum address / port of the newly elected ZooKeeper server. This shouldn't really happen unless you hit some bug or configuration issue. I am happy to dig deeper if you can send more info. Asking on the user mailing list (as Enrico suggested) is also better, as more people are watching there.
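One configuration issue that could match the hint above is a `server.N` entry for the elected leader missing from the follower's config. The following is only a rough sketch of how that could be checked offline, not ZooKeeper's own logic. The function names, the sample hostnames and the sample config are hypothetical, and the pattern assumes the usual `server.<id>=<host>:<quorumPort>:<electionPort>` layout with plain (non-IPv6) hosts.

```python
import re

# Assumed layout: server.<id>=<host>:<quorumPort>:<electionPort>[...]
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)")


def quorum_addresses(config_text):
    """Map server id -> (host, quorum_port, election_port)."""
    servers = {}
    for line in config_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum_port, election_port = match.groups()
            servers[int(sid)] = (host, int(quorum_port), int(election_port))
    return servers


def leader_missing_from_config(config_text, leader_sid):
    """True if the config has no quorum address for the given leader id."""
    return leader_sid not in quorum_addresses(config_text)


# Hypothetical config in which the entry for server 4 was lost.
SAMPLE_CONFIG = """
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.5=zk5.example.com:2888:3888
"""

print(leader_missing_from_config(SAMPLE_CONFIG, 4))
print(quorum_addresses(SAMPLE_CONFIG)[1])
```

Running the same check against each server's real config, with the leader id taken from the election log, is a quick way to rule this cause in or out before digging into the code.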
[jira] [Commented] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060154#comment-17060154 ]

Mate Szalay-Beko commented on ZOOKEEPER-3758:

We tested the 3.5.7 -> 3.6.0 upgrade before the release, but of course it is always possible that we missed something... Could you also share your configs and Java version when you write the email? Also, please provide some more background info: are you doing a rolling upgrade, or simply starting a new cluster with the old data? Did you change anything in the config compared to the old cluster?
[jira] [Comment Edited] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060154#comment-17060154 ]

Mate Szalay-Beko edited comment on ZOOKEEPER-3758 at 3/16/20, 11:56 AM:

We tested the 3.5.7 -> 3.6.0 upgrade before the release, but of course it is always possible that we missed something... Could you also share your configs and Java version when you write the email? Also, please provide some more background info: are you doing a rolling upgrade, or simply starting a new cluster with the old data? Did you change anything in the config compared to the old cluster?
[jira] [Commented] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060095#comment-17060095 ]

Enrico Olivelli commented on ZOOKEEPER-3758:

Please start a discussion on u...@zookeeper.apache.org. It will be easier to help you.
[jira] [Created] (ZOOKEEPER-3758) Update from 3.5.7 to 3.6.0 does not work
Agostino Sarubbo created ZOOKEEPER-3758:

Summary: Update from 3.5.7 to 3.6.0 does not work
Key: ZOOKEEPER-3758
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3758
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Agostino Sarubbo

Hello,
we have a cluster with 5 zookeeper servers. We tried the update from 3.5.7 to 3.6.0 but it does not work.
We got the following:
{code:java}
2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@863] - Peer state changed: looking
2020-03-16 10:40:45,514 [myid:1] - WARN [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1501] - PeerState set to LOOKING
2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1371] - LOOKING
2020-03-16 10:40:45,514 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x0
2020-03-16 10:40:45,515 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1b, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,517 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,518 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:4, n.state:LEADING, n.leader:4, n.round:0x1a, n.peerEpoch:0x5c, n.zxid:0x5b0004, message format version:0x2, n.config version:0x0
2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@857] - Peer state changed: following
2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):QuorumPeer@1453] - FOLLOWING
2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1246] - minSessionTimeout set to 4000
2020-03-16 10:40:45,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1255] - maxSessionTimeout set to 4
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ResponseCache@45] - Response cache size is initialized with value 400.
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@111] - zookeeper.pathStats.slotCapacity = 60
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@112] - zookeeper.pathStats.slotDuration = 15
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@113] - zookeeper.pathStats.maxDepth = 6
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@114] - zookeeper.pathStats.initialDelay = 5
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@115] - zookeeper.pathStats.delay = 5
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):RequestPathMetricsCollector@116] - zookeeper.pathStats.enabled = false
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1470] - The max bytes for all large requests are set to 104857600
2020-03-16 10:40:45,519 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=0.0.0.0:2281):ZooKeeperServer@1484] - The large request threshold is set to -1
2020-03-16 10:40:45,519 [myid:1] -
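The notification lines in the log above carry the interesting part of the election: which peer reports which state and which leader. As a reading aid, here is a small sketch that extracts those fields and tallies the reported leader. It is not part of ZooKeeper; the regular expression, the `summarize` helper and the shortened sample lines are assumptions made for illustration, based only on the notification text format shown in this report.

```python
import re
from collections import Counter

# Pattern for the notification text format shown in this report (an assumption,
# other ZooKeeper versions print the fields in a different order).
NOTIFICATION = re.compile(
    r"n\.sid:(?P<sid>\d+),\s*n\.state:(?P<state>[A-Z ]+?)\s*,\s*n\.leader:(?P<leader>\d+)"
)


def summarize(log_text):
    """Return {sid: (state, leader)} for every notification found."""
    votes = {}
    for match in NOTIFICATION.finditer(log_text):
        # Tolerate stray spaces inside the state name from copy/paste wrapping.
        state = match.group("state").replace(" ", "")
        votes[int(match.group("sid"))] = (state, int(match.group("leader")))
    return votes


# Shortened copies of the five notifications from the report.
SAMPLE = """
Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:0x1b
Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:4, n.round:0x1a
Notification: my state:LOOKING; n.sid:3, n.state:FOLLOWING, n.leader:4, n.round:0x1a
Notification: my state:LOOKING; n.sid:5, n.state:FOLLOWING, n.leader:4, n.round:0x1a
Notification: my state:LOOKING; n.sid:4, n.state:LEADING, n.leader:4, n.round:0x1a
"""

votes = summarize(SAMPLE)
tally = Counter(leader for _state, leader in votes.values())
for sid in sorted(votes):
    print(sid, votes[sid])
print("most reported leader:", tally.most_common(1)[0][0])
```

On the sample, servers 2, 3, 4 and 5 all report server 4 as leader, which is consistent with server 1 switching to FOLLOWING right after the notifications.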
[jira] [Commented] (ZOOKEEPER-3756) Members failing to rejoin quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060051#comment-17060051 ]

Mate Szalay-Beko commented on ZOOKEEPER-3756:

Thanks, it's great that you were able to do this test and send all the logs. I need a bit more time to dig into it. I hope I can analyze it more deeply and come back with some answers (possibly questions? :) ) today / tomorrow.

> Members failing to rejoin quorum
>
> Key: ZOOKEEPER-3756
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3756
> Project: ZooKeeper
> Issue Type: Improvement
> Components: leaderElection
> Affects Versions: 3.5.6, 3.5.7
> Reporter: Dai Shi
> Assignee: Mate Szalay-Beko
> Priority: Major
> Attachments: Dockerfile, configmap.yaml, docker-entrypoint.sh, jmx.yaml, zoo-0.log, zoo-1.log, zoo-2.log, zoo-service.yaml, zookeeper.yaml
>
> Not sure if this is the place to ask, please close if it's not.
> I am seeing some behavior that I can't explain since upgrading to 3.5:
> In a 5 member quorum, when server 3 is the leader and each server has this in their configuration:
> {code:java}
> server.1=100.71.255.254:2888:3888:participant;2181
> server.2=100.71.255.253:2888:3888:participant;2181
> server.3=100.71.255.252:2888:3888:participant;2181
> server.4=100.71.255.251:2888:3888:participant;2181
> server.5=100.71.255.250:2888:3888:participant;2181{code}
> If servers 1 or 2 are restarted, they fail to rejoin the quorum with this in the logs:
> {code:java}
> 2020-03-11 20:23:35,720 [myid:2] - INFO [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1175] - LOOKING
> 2020-03-11 20:23:35,721 [myid:2] - INFO [QuorumPeer[myid=2](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@885] - New election. My id = 2, proposed zxid=0x1b8005f4bba
> 2020-03-11 20:23:35,733 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2020-03-11 20:23:35,734 [myid:2] - INFO [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36140
> 2020-03-11 20:23:35,735 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (4, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (5, 2)
> 2020-03-11 20:23:35,740 [myid:2] - INFO [0.0.0.0/0.0.0.0:3888:QuorumCnxManager$Listener@924] - Received connection request 100.126.116.201:36142
> 2020-03-11 20:23:35,740 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x1b8005f4bba (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1b8 (n.peerEPoch), LOOKING (my state)0 (n.config version)
> 2020-03-11 20:23:35,742 [myid:2] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1143] - Interrupted while waiting for message on queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
>     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1294)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:82)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1131)
> 2020-03-11 20:23:35,744 [myid:2] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@1153] - Send worker leaving thread id 3 my id = 2
> 2020-03-11 20:23:35,745 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@1230] - Interrupting SendWorker{code}
> The only way I can seem to get them to rejoin the quorum is to restart the leader.
> However, if I remove server 4 and 5 from the configuration of server 1 or 2 (so only servers 1, 2, and 3 remain in the configuration file), then they can rejoin the quorum fine. Is this expected and am I doing something wrong? Any help or explanation would be greatly appreciated. Thank you.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
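For readers puzzled by the "Have smaller server identifier, so dropping the connection: (3, 2)" lines in the quoted log: as far as I understand the election connection handling, only one connection is kept per pair of servers, and it is the one opened by the server with the larger id. The sketch below models just that tie-break. It is a simplified illustration with a hypothetical function name, not the actual `QuorumCnxManager` code, and it says nothing about why the rejoin fails in this report.

```python
def keeps_own_connection(my_sid: int, remote_sid: int) -> bool:
    """Return True if a connection opened by my_sid toward remote_sid is kept.

    Assumed rule: the initiator keeps its connection only when its own id is
    larger; otherwise it drops the socket and waits for the peer to connect.
    """
    return my_sid > remote_sid


# Server 2 from the quoted log, dialing the other four ensemble members.
my_sid = 2
for remote_sid in (1, 3, 4, 5):
    if keeps_own_connection(my_sid, remote_sid):
        print(f"{my_sid} -> {remote_sid}: kept")
    else:
        print(f"{my_sid} -> {remote_sid}: dropped, waiting for {remote_sid} to connect back")
```

Under that assumption, a restarted server 2 depends on servers 3, 4 and 5 opening the election connection toward it, which matches the "Received connection request" lines interleaved with the drops in the log.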