I'm not sure what you're suggesting, David. Could you be more specific, please?

-Flavio

> On 30 Aug 2016, at 23:54, David Brower <[email protected]> wrote:
> 
> Anything you could do with iptables you can do in the process by having it 
> drop connections from things not on a whitelist, and not having a thread 
> waiting indefinitely for operations from any connection.
> 
> -dB
> 
> 
> On 8/30/2016 2:46 PM, Flavio Junqueira wrote:
>> I was trying to write down an analysis and I haven't been able to come up 
>> with anything that is foolproof. Basically, the two main issues are:
>> 
>> - A bad server is able to connect to a good server in the case that it has an 
>> outstanding message and is trying to establish a connection to the good 
>> server. This happens if the server is LOOKING or has an outstanding message 
>> from the previous round. The converse isn't true, though: a good server 
>> can't start a connection to a bad server because the bad server doesn't have 
>> a listener.
>> - If we bounce servers sequentially, there is a chance that a bad server is 
>> elected more than once during the process, which induces multiple leader 
>> election rounds.
>> 
>> Perhaps this is overkill, but I was wondering if it makes sense to filter 
>> election traffic to and from bad servers using, for example, iptables. The 
>> idea is to have a rule local to each server that prevents it from getting 
>> connections established for leader election. For each bad server, we stop 
>> it, remove the rule, and bring it back up. We also stop a minority first 
>> before stopping the bad leader.
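>> 
>> As a sketch of what I mean (3888 below just stands in for the election port, 
>> which is configurable), the rule on each bad server would be something like:
>> 
>>   # block the bad server from initiating leader election connections
>>   iptables -A OUTPUT -p tcp --dport 3888 -j DROP
>> 
>> The OUTPUT direction is the one that matters, since a bad server has no 
>> listener anyway but can still open election connections to good servers; 
>> removing the rule again is just iptables -D with the same arguments.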
>> 
>> -Flavio
>> 
>>> On 29 Aug 2016, at 09:29, Guy Laden <[email protected]> wrote:
>>> 
>>> Hi Flavio, Thanks for your reply. The situation is that indeed all the
>>> servers are in a bad state so it looks like we will have to perform a
>>> cluster restart.
>>> 
>>> We experimented with ways to minimize the downtime along the lines you
>>> suggested. In testing it we ran into the issue where a server with no
>>> Listener thread can initiate a leader election connection to a
>>> newly-restarted server that does have a Listener. The result is a quorum
>>> that may include 'bad' servers, even a 'bad' leader. So we tried restarting
>>> the higher-id servers first, because lower-id servers will drop their
>>> leader-election connections to higher-id servers.
>>> I'm told there are issues with this flow as well but have not yet
>>> investigated the details.
>>> I also worry about the leader-election retries done with exponential
>>> backoff.
>>> 
>>> I guess we will play with things a bit more, but at this point I am tending
>>> towards a simple parallel restart of all servers.
>>> 
>>> Once the clusters are healthy again we will do a rolling upgrade to 3.4.8
>>> sometime soon.
>>> 
>>> Thanks again,
>>> Guy
>>> 
>>> 
>>> On Sun, Aug 28, 2016 at 5:52 PM, Flavio Junqueira <[email protected]> wrote:
>>> 
>>>> Hi Guy,
>>>> 
>>>> We don't have a way to restart the listener thread, so you really need to
>>>> bounce the server. I don't think there is a way of doing this without
>>>> forcing a leader election, assuming all your servers are in this bad state.
>>>> To minimize downtime, one thing you can do is to avoid bouncing the current
>>>> leader until it loses quorum support. Once it loses quorum support, you
>>>> have a quorum of healthy servers and they will elect a new, healthy leader.
>>>> At that point, you can bounce all your unhealthy servers.
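>>>> 
>>>> As a rough sketch of the order, assuming a three-server ensemble and the
>>>> stock zkServer.sh (the hostnames and install path below are made up):
>>>> 
>>>>   # find the current leader: look for "Mode: leader" in the output
>>>>   for h in zk1 zk2 zk3; do ssh "$h" /opt/zookeeper/bin/zkServer.sh status; done
>>>>   # bounce the non-leaders one at a time, letting each rejoin in between
>>>>   ssh zk2 /opt/zookeeper/bin/zkServer.sh restart
>>>>   ssh zk3 /opt/zookeeper/bin/zkServer.sh restart
>>>>   # bounce the old leader last, so the already-healthy majority can elect
>>>>   # a new leader while it is down
>>>>   ssh zk1 /opt/zookeeper/bin/zkServer.sh restart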
>>>> 
>>>> You may also want to move to a later 3.4 release.
>>>> 
>>>> -Flavio
>>>> 
>>>>> On 24 Aug 2016, at 23:15, Guy Laden <[email protected]> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> It looks like due to a security scan sending "bad" traffic to the leader
>>>>> election port, we have clusters in which the leader election Listener
>>>>> thread is dead (an unchecked exception was thrown and the thread died -
>>>>> seen in the log).
>>>>> (This seems to be fixed by
>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
>>>>> 
>>>>> In this state, when a healthy server comes up and tries to connect to the
>>>>> quorum, it gets stuck on the leader election. It establishes TCP
>>>>> connections to the other servers, but any traffic it sends seems to get
>>>>> stuck in the receiver's TCP Recv-Q (seen with netstat) and is never
>>>>> read/processed by ZooKeeper.
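>>>>> 
>>>>> For reference, the netstat check was something like this (3888 here just
>>>>> standing in for the election port):
>>>>> 
>>>>>   # look at the Recv-Q column for the election-port connections
>>>>>   netstat -tan | grep 3888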
>>>>> 
>>>>> Not a good place to be :)
>>>>> 
>>>>> This is with 3.4.6
>>>>> 
>>>>> Is there a way to get such clusters back to a healthy state without loss
>>>>> of quorum / client impact?
>>>>> Some way of re-starting the listener thread? Or restarting the servers in
>>>>> a certain order?
>>>>> e.g. If I restart a minority, say the ones with lower server ids - is
>>>>> there a way to get the majority servers to re-initiate leader election
>>>>> connections with them so as to connect them to the quorum? (and to do
>>>>> this without the majority losing quorum).
>>>>> 
>>>>> Thanks,
>>>>> Guy
>>>> 
> 
