Re: Leader election failing

2018-10-02 Thread Cee Tee
Quick update: Apparently the election notifications disappeared somewhere between the datacenters (firewall) when the sockets were not used for some time. We fixed this with zookeeper.tcpKeepAlive=true. Regards, Chris On Wed, Aug 8, 2018 at 5:05 PM Andor Molnar wrote: > Some kind of a network
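
For readers landing on this thread: the fix above is normally passed to each server JVM as the system property -Dzookeeper.tcpKeepAlive=true. The sketch below is not ZooKeeper's own code. It only illustrates, under that assumption, what such a flag amounts to at the socket level: reading a boolean system property and enabling SO_KEEPALIVE on a connection so that an idle link still carries periodic probes. The class name KeepAliveSketch and the helper connect are hypothetical. The probe interval itself is controlled by the operating system, not by this flag.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {
    // Property name taken from the message above; everything else is illustrative.
    static final String KEEP_ALIVE_PROPERTY = "zookeeper.tcpKeepAlive";

    // Opens a connection and enables SO_KEEPALIVE only if the property is "true".
    static Socket connect(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        socket.setKeepAlive(Boolean.getBoolean(KEEP_ALIVE_PROPERTY));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Equivalent to starting the JVM with -Dzookeeper.tcpKeepAlive=true.
        System.setProperty(KEEP_ALIVE_PROPERTY, "true");
        // A throwaway loopback listener stands in for a remote peer.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = connect(server.getInetAddress(), server.getLocalPort())) {
            System.out.println("SO_KEEPALIVE enabled: " + socket.getKeepAlive());
        }
    }
}
```

Whether keepalive probes arrive often enough to stop a firewall from expiring an idle connection depends on the OS keepalive settings and the firewall's idle timeout, so both are worth checking together.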

Re: Leader election failing

2018-09-11 Thread Chris
What should I do to get the most useful logs in this case? Set the log level to debug and run kill -3 when it's failing? On 11 September 2018 9:17:45 pm Andor Molnár wrote: Erm. Thanks for carrying out these tests Chris. Have you by any chance - as Camille suggested - collected

Re: Leader election failing

2018-09-11 Thread Andor Molnár
Erm. Thanks for carrying out these tests Chris. Have you by any chance - as Camille suggested - collected debug logs from these tests? Andor On 09/11/2018 11:08 AM, Cee Tee wrote: > Concluded a test with a 3.4.13 cluster, it shows the same behaviour. > > On Mon, Sep 3, 2018 at 4:56 PM Andor

Re: Leader election failing

2018-09-11 Thread Cee Tee
Concluded a test with a 3.4.13 cluster, it shows the same behaviour. On Mon, Sep 3, 2018 at 4:56 PM Andor Molnar wrote: > Thanks for testing Chris. > > So, if I understand you correctly, you're running the latest version from > branch-3.5. Could we say that this is a 3.5-only problem? > Have

Re: Leader election failing

2018-09-03 Thread Chris
I haven't noticed it in 3.4 back when we used it, but I can do a test to confirm it. I will let you know in approx. one week. Regards Chris On 3 September 2018 4:56:00 pm Andor Molnar wrote: Thanks for testing Chris. So, if I understand you correctly, you're running the latest version from

Re: Leader election failing

2018-09-03 Thread Andor Molnar
Thanks for testing Chris. So, if I understand you correctly, you're running the latest version from branch-3.5. Could we say that this is a 3.5-only problem? Have you ever tested the same cluster with 3.4? Regards, Andor On Tue, Aug 21, 2018 at 11:29 AM, Cee Tee wrote: > I've tested the

Re: Leader election failing

2018-08-21 Thread Cee Tee
I've tested the patch and let it run for 6 days. It did not help; the result is still the same (the remaining ZKs form islands based on the datacenter they are in). I have mitigated it by doing a daily rolling restart. Regards, Chris On Mon, Aug 13, 2018 at 2:06 PM Andor Molnar wrote: > Hi Chris, > > Would

Re: Leader election failing

2018-08-13 Thread Chris
Interesting, I will have a look at it. Thanks Chris On 13 August 2018 2:06:55 pm Andor Molnar wrote: Hi Chris, Would you mind testing the following patch on your test clusters? I'm not entirely sure, but the issue might be related. https://issues.apache.org/jira/browse/ZOOKEEPER-2930

Re: Leader election failing

2018-08-13 Thread Andor Molnar
Hi Chris, Would you mind testing the following patch on your test clusters? I'm not entirely sure, but the issue might be related. https://issues.apache.org/jira/browse/ZOOKEEPER-2930 Regards, Andor On Wed, Aug 8, 2018 at 6:51 PM, Camille Fournier wrote: > If you have the time and

Re: Leader election failing

2018-08-08 Thread Chris
Running 3.5.5, I managed to reproduce it on the acceptance and test clusters today; the election fails on shutdown of the leader. Both had been running for over a week. After restarting all ZooKeepers it runs fine no matter how many leader shutdowns I throw at it. On 8 August 2018 5:05:34 pm Andor Molnar wrote: Some

Re: Leader election failing

2018-08-08 Thread Andor Molnar
Some kind of a network split? It looks like 1-2 and 3-4 were able to communicate with each other, but the connection timed out between these two groups. When 5 came back online it started with supporters of (1,2) and later 3 and 4 also joined. There was no such issue the day after. Which version of
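
The arithmetic behind that reading can be sketched as follows, assuming a five-server ensemble (servers 1 to 5, as the numbering in the message suggests) and the default majority quorum. Neither island of two can reach a majority on its own, while server 5 joining (1,2) gives three votes, which is enough. The class QuorumSketch and its method names are hypothetical, not ZooKeeper APIs.

```java
public class QuorumSketch {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if a group of mutually reachable servers is large enough to elect a leader.
    static boolean canElect(int reachableServers, int ensembleSize) {
        return reachableServers >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        int ensembleSize = 5; // assumption: servers 1..5
        System.out.println("majority needed: " + majority(ensembleSize));
        System.out.println("island (1,2) alone can elect: " + canElect(2, ensembleSize));
        System.out.println("island (3,4) alone can elect: " + canElect(2, ensembleSize));
        System.out.println("(1,2) plus 5 can elect: " + canElect(3, ensembleSize));
    }
}
```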

Re: Leader election failing

2018-08-08 Thread Chris
Actually I have similar issues on my test and acceptance clusters, where leader election fails if the cluster has been running for a couple of days. If you stop/start the ZooKeepers once, they will work fine on further disruptions that day. Not sure yet what the threshold is. On 8 August 2018