Re: ZK Server does not join quorum after restart
Hi Andor,

As this is on a production server, I can’t attach the log file entirely, but I can try to get you as much information as I can.

Nearly all of the log file is filled with connection errors from ZooKeeper clients:

> WARN NIOServerCnxn – Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> INFO NIOServerCnxn – Closed socket connection for client / (no session established for client)

I grabbed all of the IP addresses in the log file and they’re all from clients; there is no mention of other ZK servers.

Looking at ‘Quorum’, I see a lot of:

> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO FastLeaderElection - Notification time out: 6
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO QuorumCnxManager - Have smaller server identifier, so dropping the connection: (2, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO QuorumCnxManager - Have smaller server identifier, so dropping the connection: (3, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO QuorumCnxManager - Have smaller server identifier, so dropping the connection: (4, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO QuorumCnxManager - Have smaller server identifier, so dropping the connection: (5, 1)

Let me know if there is anything else you think I should look for. If I find anything interesting I’ll share it here.

From: Andor Molnar
Reply-To: "user@zookeeper.apache.org"
Date: Friday, January 25, 2019 at 10:01
To: "user@zookeeper.apache.org"
Subject: Re: ZK Server does not join quorum after restart

Hi Ian,

Would you please attach logs from all participants of the ensemble or try to find an exception from when the follower is trying to join?

Regards,
Andor

On Fri, Jan 25, 2019 at 1:37 AM Ian Spence <ian.spe...@globalrelay.net> wrote:

Hi Daniel,

Thanks for the quick reply. We use static IP addresses on all of the servers so it did not change after the reboot.

Thanks,
-Ian

From: Daniel Chan <daniel.cw.c...@oracle.com>
Reply-To: "user@zookeeper.apache.org" <user@zookeeper.apache.org>
Date: Thursday, January 24, 2019 at 16:36
To: "user@zookeeper.apache.org" <user@zookeeper.apache.org>
Subject: RE: ZK Server does not join quorum after restart

If its IP address got changed, then you hit a known bug https://issues.apache.org/jira/browse/ZOOKEEPER-1506 and you need to bounce the cluster.

Thanks,
Daniel

-----Original Message-----
From: Ian Spence <ian.spe...@globalrelay.net>
Sent: Thursday, January 24, 2019 2:36 PM
To: user@zookeeper.apache.org
Subject: ZK Server does not join quorum after restart

Hello

We have a cluster of 5 ZK servers, all running ZK 3.4.6 on Java 1.8 on CentOS 6. These are physical devices, not virtual machines.

One server required hardware maintenance, and was restarted. When the zk software was restarted, it did not rejoin the quorum as a follower.

Running “stat” or “mntr” commands returns: “This ZooKeeper instance is not currently serving requests”

I googled this message and came across this bug: https://issues.apache.org/jira/browse/ZOOKEEPER-2164

Does anybody know if there is a work-around to this issue? We’ve seen this problem multiple times in the past and our current solution is to bring down the zk cluster (which is a huge outage-causing pain).

Thanks
- Ian
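For anyone following this thread who wants to see which ensemble members are actually serving, here is a minimal sketch. It assumes Python 3 and that the four-letter-word commands mentioned in the original report ("stat", "mntr") are reachable on client port 2181, as in the logs above. The helper names (four_letter_word, is_serving, server_mode, check_ensemble) and any hostnames are hypothetical and are not part of ZooKeeper.

```python
import socket

# Exact reply quoted in the original report for a server that is up
# but has not joined the quorum.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host, port=2181, cmd="stat", timeout=5.0):
    """Send a four-letter-word command and return the full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(reply):
    """True unless the reply carries the 'not currently serving' message."""
    return NOT_SERVING not in reply


def server_mode(reply):
    """Return the value of the 'Mode:' line of a stat reply, or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def check_ensemble(hosts, port=2181, probe=four_letter_word):
    """Map each host to 'leader', 'follower', 'not serving', etc.

    `probe` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for host in hosts:
        try:
            reply = probe(host, port, "stat")
        except OSError as exc:  # refused, timed out, DNS failure, ...
            results[host] = "unreachable ({})".format(exc)
            continue
        if is_serving(reply):
            results[host] = server_mode(reply) or "unknown"
        else:
            results[host] = "not serving"
    return results
```

Calling check_ensemble(["zk1.example.com", "zk2.example.com"]) with your own hostnames returns a dict of host to state. A restarted server that has not rejoined the quorum would be expected to show up as "not serving" while the others report leader or follower. This only helps identify the affected server; it is not a workaround for the issue itself.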
Re: ZK Server does not join quorum after restart
Hi Ian,

Would you please attach logs from all participants of the ensemble or try to find an exception from when the follower is trying to join?

Regards,
Andor
Re: ZK Server does not join quorum after restart
Hi Daniel,

Thanks for the quick reply. We use static IP addresses on all of the servers so it did not change after the reboot.

Thanks,
-Ian