Ok, I think I get what you're saying. Perhaps you're missing that this is an issue Guy encountered in 3.4.6 and that is fixed in a later release. We are discussing here a workaround for his 3.4.6 deployment, not a permanent solution. Does that make sense?
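To make the workaround concrete, here is roughly the shape of it. This is a sketch only: it assumes the default leader-election port 3888 (the third field of the server.N=host:2888:3888 lines in zoo.cfg), and it just prints the commands rather than executing them, since the real rules need root.

```shell
#!/bin/sh
# Sketch of the iptables workaround for the 3.4.6 dead-Listener issue.
# Assumption: the ensemble uses the default election port 3888.
# The script only prints the commands; drop the 'echo' to apply them.

ELECTION_PORT=3888

# While a bad server is still up, keep it out of leader election by
# dropping election traffic both to and from it:
echo iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
echo iptables -A OUTPUT -p tcp --dport "$ELECTION_PORT" -j DROP

# Then stop the bad server, delete the rules, and bring it back up, so
# the restarted (healthy) process can take part in elections again:
echo iptables -D INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
echo iptables -D OUTPUT -p tcp --dport "$ELECTION_PORT" -j DROP
```

The point of filtering both directions is the asymmetry discussed below: a bad server has no listener, but it can still initiate outbound election connections.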
-Flavio

> On 31 Aug 2016, at 01:16, David Brower <[email protected]> wrote:
>
> You'd be programming iptables to pass/accept things from a whitelist of
> peers you're willing to talk with.
>
> If you've got such a whitelist, you don't need to program iptables to look
> at the peer address from a packet/socket and drop it; you can just do it in
> your message processing code.
>
> The second part deals with various hang situations. If you've got a
> critical thread selecting/reading messages, then it can't wait forever on a
> stuck read (or write). Every operation needs to be timed out in some
> fashion to prevent things like a hung election thread.
>
> You get into this sort of thing when miscreants or pen-testers start
> scanning your open ports and sending you malformed or fuzzed packets that
> you don't handle cleanly, or start some exchange that they don't complete.
>
> -dB
>
> Oracle RAC Database and Clusterware Architect
>
>
> On 8/30/2016 4:54 PM, Flavio Junqueira wrote:
>> I'm not sure what you're suggesting, David. Could you be more specific,
>> please?
>>
>> -Flavio
>>
>>> On 30 Aug 2016, at 23:54, David Brower <[email protected]> wrote:
>>>
>>> Anything you could do with iptables you can do in the process, by having
>>> it drop connections from things not on a whitelist and by not having a
>>> thread wait indefinitely for operations from any connection.
>>>
>>> -dB
>>>
>>>
>>> On 8/30/2016 2:46 PM, Flavio Junqueira wrote:
>>>> I was trying to write down an analysis, and I haven't been able to come
>>>> up with anything that is foolproof. Basically, the two main issues are:
>>>>
>>>> - A bad server is able to connect to a good server in the case where it
>>>> has a message outstanding and is trying to establish a connection to
>>>> the good server. This happens if the server is LOOKING or has an
>>>> outstanding message from the previous round. The converse isn't true,
>>>> though.
>>>> A good server can't start a connection to a bad server, because the bad
>>>> server doesn't have a listener.
>>>> - If we bounce servers sequentially, there is a chance that a bad
>>>> server is elected more than once during the process, which induces
>>>> multiple leader election rounds.
>>>>
>>>> Perhaps this is overkill, but I was wondering if it makes sense to
>>>> filter election traffic to and from bad servers using, for example,
>>>> iptables. The idea is to add a rule, local to each server, preventing
>>>> the server from getting connections established for leader election.
>>>> For each bad server, we stop it, remove the rule, and bring it back up.
>>>> We also stop a minority first before stopping the bad leader.
>>>>
>>>> -Flavio
>>>>
>>>>> On 29 Aug 2016, at 09:29, Guy Laden <[email protected]> wrote:
>>>>>
>>>>> Hi Flavio, thanks for your reply. The situation is that indeed all the
>>>>> servers are in a bad state, so it looks like we will have to perform a
>>>>> cluster restart.
>>>>>
>>>>> We played with attempts to minimize the downtime along the lines you
>>>>> suggested. In testing it we ran into the issue where a server with no
>>>>> Listener thread can initiate a leader election connection to a
>>>>> newly-restarted server that does have a Listener. The result is a
>>>>> quorum that may include 'bad' servers, even a 'bad' leader. So we
>>>>> tried to restart the higher-id servers first, because lower-id servers
>>>>> will drop their leader-election connections to higher-id servers.
>>>>> I'm told there are issues with this flow as well, but I have not yet
>>>>> investigated the details.
>>>>> I also worry about the leader-election retries done with exponential
>>>>> backoff.
>>>>>
>>>>> I guess we will play with things a bit more, but at this point I am
>>>>> tending towards a simple parallel restart of all servers.
>>>>>
>>>>> Once the clusters are healthy again, we will do a rolling upgrade to
>>>>> 3.4.8 sometime soon.
>>>>> Thanks again,
>>>>> Guy
>>>>>
>>>>>
>>>>> On Sun, Aug 28, 2016 at 5:52 PM, Flavio Junqueira <[email protected]> wrote:
>>>>>
>>>>>> Hi Guy,
>>>>>>
>>>>>> We don't have a way to restart the listener thread, so you really
>>>>>> need to bounce the server. I don't think there is a way of doing this
>>>>>> without forcing a leader election, assuming all your servers are in
>>>>>> this bad state. To minimize downtime, one thing you can do is to
>>>>>> avoid bouncing the current leader until it loses quorum support. Once
>>>>>> it loses quorum support, you have a quorum of healthy servers and
>>>>>> they will elect a new, healthy leader. At that point, you can bounce
>>>>>> all your unhealthy servers.
>>>>>>
>>>>>> You may also want to move to a later 3.4 release.
>>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>>> On 24 Aug 2016, at 23:15, Guy Laden <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> It looks like, due to a security scan sending "bad" traffic to the
>>>>>>> leader election port, we have clusters in which the leader election
>>>>>>> Listener thread is dead (an unchecked exception was thrown and the
>>>>>>> thread died, as seen in the log).
>>>>>>> (This seems to be fixed by
>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
>>>>>>>
>>>>>>> In this state, when a healthy server comes up and tries to connect
>>>>>>> to the quorum, it gets stuck on leader election. It establishes TCP
>>>>>>> connections to the other servers, but any traffic it sends seems to
>>>>>>> get stuck in the receiver's TCP Recv queue (seen with netstat) and
>>>>>>> is not read/processed by zk.
>>>>>>>
>>>>>>> Not a good place to be :)
>>>>>>>
>>>>>>> This is with 3.4.6.
>>>>>>>
>>>>>>> Is there a way to get such clusters back to a healthy state without
>>>>>>> loss of quorum / client impact?
>>>>>>> Some way of re-starting the listener thread? Or restarting the
>>>>>>> servers in a certain order?
>>>>>>> e.g. If I restart a minority, say the ones with lower server ids,
>>>>>>> is there a way to get the majority servers to re-initiate leader
>>>>>>> election connections with them, so as to connect them to the quorum?
>>>>>>> (And to do this without the majority losing quorum.)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Guy
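PS: to make the bounce order concrete for the archives, here is a rough sketch. The hostnames, the leader identity, and the zkServer.sh/nc invocations are hypothetical placeholders for your own tooling, and the script only prints the plan rather than executing anything.

```shell
#!/bin/sh
# Sketch of the restart order discussed above: bounce the non-leader
# (bad) servers first, one at a time, so the bad leader eventually
# loses quorum support and a healthy quorum elects a new leader; only
# then bounce the old leader. Hostnames are placeholders, and the
# script prints the plan instead of executing it.

LEADER=zk3                   # current (bad) leader - a placeholder
FOLLOWERS="zk1 zk2 zk4 zk5"  # the rest of the ensemble - placeholders

for h in $FOLLOWERS; do
  echo "bounce $h (e.g. ssh $h zkServer.sh restart)"
  echo "wait until 'echo stat | nc $h 2181' reports a Mode again"
done

# Once a majority of restarted servers is healthy, the old leader has
# lost quorum support and a new leader has been elected; only then:
echo "bounce old leader $LEADER (e.g. ssh $LEADER zkServer.sh restart)"
```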
