Re: ZK Server does not join quorum after restart

2019-01-25 Thread Ian Spence
Hi Andor,

As this is a production server, I can’t attach the full log file, but I 
will share as much information as I can:

Nearly all of the log file is filled with connection errors from ZooKeeper 
clients:

> WARN NIOServerCnxn – Exception causing close of session 0x0 due to 
> java.io.IOException: ZooKeeperServer not running
> INFO NIOServerCnxn – Closed socket connection for client / (no 
> session established for client)

I extracted all of the IP addresses from the log file, and they all belong to 
clients; there is no mention of the other ZK servers.
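
A check like this one could be scripted roughly as follows. This is only a minimal sketch: the sample log line, the addresses in `ENSEMBLE`, and the helper names `extract_ips` and `split_by_role` are illustrative assumptions, not values from the actual deployment.

```python
import re

# Matches dotted-quad IPv4-looking strings; octet ranges are not validated.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(lines):
    """Return the set of IPv4-looking strings found in the given log lines."""
    found = set()
    for line in lines:
        found.update(IPV4_RE.findall(line))
    return found


def split_by_role(ips, ensemble):
    """Partition addresses into (ensemble members, everything else)."""
    servers = set(ips) & set(ensemble)
    return servers, set(ips) - servers


if __name__ == "__main__":
    # Hypothetical ensemble addresses and log line, for illustration only.
    ENSEMBLE = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"}
    sample = [
        "INFO NIOServerCnxn - Closed socket connection for client "
        "/10.0.1.17:51234 (no session established for client)",
    ]
    servers, clients = split_by_role(extract_ips(sample), ENSEMBLE)
    print("ensemble members seen:", sorted(servers))
    print("other addresses seen:", sorted(clients))
```

An empty "ensemble members" set would match the observation that only client addresses appear in the log.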

Searching the log for ‘Quorum’, I see a lot of:

> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  FastLeaderElection - 
> Notification time out: 6
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (2, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (3, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (4, 1)
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - Have 
> smaller server identifier, so dropping the connection: (5, 1)
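
To see which peers these messages refer to, the entries can be tallied per server pair. In the 3.4.x election code this message appears to be logged by the server with the smaller id after it has opened a connection to a peer with a larger id: it closes that connection and waits for the peer to connect back. The pair therefore seems to be (peer id, local id). The sketch below assumes each log entry sits on a single line in the real file; the function name `count_dropped` is made up for illustration.

```python
import re
from collections import Counter

# Pattern for the QuorumCnxManager message quoted above.
DROP_RE = re.compile(
    r"Have smaller server identifier, so dropping the connection: "
    r"\((\d+), (\d+)\)"
)


def count_dropped(lines):
    """Count 'dropping the connection' entries per (peer id, local id) pair."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Two of the entries quoted above, unwrapped onto single lines.
    sample = [
        "[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - "
        "Have smaller server identifier, so dropping the connection: (2, 1)",
        "[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] INFO  QuorumCnxManager - "
        "Have smaller server identifier, so dropping the connection: (3, 1)",
    ]
    for (peer, local), n in sorted(count_dropped(sample).items()):
        print(f"peer {peer} (local id {local}): {n} dropped connection(s)")
```

If the restarted server logs these lines for every peer but never completes an election, the logs of the larger-id peers are the next place to look for the connect-back attempt.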

Let me know if there is anything else you think I should look for. If I find 
anything interesting I’ll share it here.





Re: ZK Server does not join quorum after restart

2019-01-25 Thread Andor Molnar
Hi Ian,

Would you please attach logs from all participants of the ensemble or try
to find an exception from when the follower is trying to join?

Regards,
Andor





Re: ZK Server does not join quorum after restart

2019-01-24 Thread Ian Spence
Hi Daniel,

Thanks for the quick reply. We use static IP addresses on all of the servers so 
it did not change after the reboot.

Thanks,
-Ian

From: Daniel Chan <daniel.cw.c...@oracle.com>
Reply-To: "user@zookeeper.apache.org" 
Date: Thursday, January 24, 2019 at 16:36
To: "user@zookeeper.apache.org" 
Subject: [**SPAM**] RE: ZK Server does not join quorum after restart


If its IP address changed, then you have hit a known bug, 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506, and you need to bounce 
the cluster.

Thanks,
Daniel

-----Original Message-----
From: Ian Spence <ian.spe...@globalrelay.net>
Sent: Thursday, January 24, 2019 2:36 PM
To: user@zookeeper.apache.org
Subject: ZK Server does not join quorum after restart

Hello

We have a cluster of 5 ZK servers, all running ZK 3.4.6 on Java 1.8 on CentOS 
6. These are physical devices, not virtual machines.

One server required hardware maintenance, and was restarted. When the zk 
software was restarted, it did not rejoin the quorum as a follower.

Running “stat” or “mntr” commands returns: “This ZooKeeper instance is not 
currently serving requests”
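
For reference, the same check can be run from a script instead of by hand. The sketch below sends a four-letter command to the client port over a plain TCP socket and looks for the message quoted above; the helper names `four_letter_word` and `is_serving`, and the host and port in the usage comment, are assumptions made for illustration.

```python
import socket

# Reply text reported in this thread for a server that is not in the quorum.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command (e.g. "stat") and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server is expected to close the connection after replying.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(reply):
    """Return False if the reply contains the 'not serving' message."""
    return NOT_SERVING not in reply


# Example usage (hypothetical host and port):
#   reply = four_letter_word("localhost", 2181, "stat")
#   print("serving" if is_serving(reply) else "not serving")
```

Polling every ensemble member this way makes it easier to tell whether only the restarted server is out of the quorum.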

I googled this message and came across this bug: 
https://issues.apache.org/jira/browse/ZOOKEEPER-2164

Does anybody know of a workaround for this issue? We’ve seen this 
problem multiple times in the past, and our current solution is to bring down 
the zk cluster (which is a huge outage-causing pain).

Thanks

- Ian