Hi Flavio, I think your idea of using iptables should help. I hope to have time to experiment with it. Thanks for your help. Guy
On Wed, Aug 31, 2016 at 11:15 AM, Flavio Junqueira <[email protected]> wrote:
> Ok, I think I get what you're saying. Perhaps you're missing that this is
> an issue that Guy encountered in 3.4.6 and that is fixed in a later
> release. We are discussing here a workaround for his 3.4.6 deployment, not
> a permanent solution. Does it make sense?
>
> -Flavio
>
> On 31 Aug 2016, at 01:16, David Brower <[email protected]> wrote:
> > You'd be programming iptables to pass/accept things from a whitelist of
> > peers you're willing to talk with.
> >
> > If you've got such a whitelist, you don't need to program iptables to
> > look at the peer address from a packet/socket and drop it, you can just
> > do it in your message processing code.
> >
> > The second part deals with various hang situations. If you've got a
> > critical thread selecting/reading messages, then it can't wait forever
> > for a stuck read (or write). Every operation needs to be timed out in
> > some fashion to prevent things like a hung election thread.
> >
> > You get into this sort of thing when miscreants or PEN-testers start
> > scanning your open ports and sending you malformed or fuzzed packets
> > that you don't handle cleanly, or start some exchange that they don't
> > complete.
> >
> > -dB
> >
> > Oracle RAC Database and Clusterware Architect
> >
> > On 8/30/2016 4:54 PM, Flavio Junqueira wrote:
> > > I'm not sure what you're suggesting, David. Could you be more
> > > specific, please?
> > >
> > > -Flavio
> > >
> > > On 30 Aug 2016, at 23:54, David Brower <[email protected]> wrote:
> > > > Anything you could do with iptables you can do in the process by
> > > > having it drop connections from things not on a whitelist, and not
> > > > having a thread waiting indefinitely for operations from any
> > > > connection.
> > > >
> > > > -dB
> > > >
> > > > On 8/30/2016 2:46 PM, Flavio Junqueira wrote:
> > > > > I was trying to write down an analysis and I haven't been able to
> > > > > come up with anything that is foolproof.
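David's whitelist suggestion above can be sketched with iptables roughly as follows. This is a minimal dry-run sketch, not a tested recipe: the peer IPs, the election port 3888, and the `RUN` guard (which defaults to `echo` so nothing is actually applied) are all assumptions for illustration.

```shell
# Dry-run sketch of a peer whitelist for the leader-election port.
# Peer IPs and port 3888 are assumptions; set RUN= to actually apply.
RUN="${RUN:-echo}"
ELECTION_PORT=3888

for peer in 10.0.0.1 10.0.0.2 10.0.0.3; do
  # Accept election traffic only from known ensemble members.
  $RUN iptables -A INPUT -p tcp -s "$peer" --dport "$ELECTION_PORT" -j ACCEPT
done

# Drop everything else aimed at the election port (e.g. security scanners).
$RUN iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
```

As David notes, the same effect can be had inside the process by checking the peer address in the message-handling code; the firewall variant just keeps unwanted traffic out before it reaches the Listener thread.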
> > > > > Basically, the two main issues are:
> > > > >
> > > > > - A bad server is able to connect to a good server in the case it
> > > > >   has a message outstanding and is trying to establish a
> > > > >   connection to the good server. This happens if the server is
> > > > >   LOOKING or has an outstanding message from the previous round.
> > > > >   The converse isn't true, though. A good server can't start a
> > > > >   connection to a bad server because the bad server doesn't have a
> > > > >   listener.
> > > > > - If we bounce servers sequentially, there is a chance that a bad
> > > > >   server is elected more than once along the way, which induces
> > > > >   multiple leader election rounds.
> > > > >
> > > > > Perhaps this is overkill, but I was wondering if it makes sense to
> > > > > filter election traffic to and from bad servers using, for
> > > > > example, iptables. The idea is to add a rule, local to each
> > > > > server, that prevents the server from accepting connections for
> > > > > leader election. For each bad server, we stop it, remove the rule,
> > > > > and bring it back up. We also stop a minority first before
> > > > > stopping the bad leader.
> > > > >
> > > > > -Flavio
> > > > >
> > > > > On 29 Aug 2016, at 09:29, Guy Laden <[email protected]> wrote:
> > > > > > Hi Flavio, Thanks for your reply. The situation is that indeed
> > > > > > all the servers are in a bad state, so it looks like we will
> > > > > > have to perform a cluster restart.
> > > > > >
> > > > > > We played with attempts to optimize the downtime along the lines
> > > > > > you suggested. In testing it we ran into the issue where a
> > > > > > server with no Listener thread can initiate a leader election
> > > > > > connection to a newly-restarted server that does have a
> > > > > > Listener. The result is a quorum that may include 'bad' servers,
> > > > > > even a 'bad' leader. So we tried to first restart the higher-id
> > > > > > servers, because lower-id servers will drop their
> > > > > > leader-election connections to higher-id servers. I'm told there
> > > > > > are issues with this flow as well but have not yet investigated
> > > > > > the details.
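Flavio's per-server workaround (hold election traffic off with a local rule, stop the bad server, lift the rule, restart it) could be sketched roughly as below. It is a dry-run sketch under assumptions: port 3888 for leader election, a hypothetical `zkServer.sh` path, and a `RUN` guard defaulting to `echo` so nothing is executed for real.

```shell
# Dry-run sketch of the per-server bounce workaround described above.
# Port 3888 and the ZooKeeper install path are assumptions.
RUN="${RUN:-echo}"
ELECTION_PORT=3888
ZK_BIN=/opt/zookeeper/bin   # hypothetical path; adjust for your install

# 1. Install a local rule so this server cannot accept election connections.
$RUN iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP

# 2. Stop the bad server while the rule keeps it out of elections.
$RUN "$ZK_BIN/zkServer.sh" stop

# 3. Lift the rule, then bring the server back up with a healthy Listener.
$RUN iptables -D INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
$RUN "$ZK_BIN/zkServer.sh" start
```

Per the thread, this would be repeated for each bad server, stopping a minority before stopping the bad leader, so that a healthy quorum forms as early as possible.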
> > > > > > I also worry about the leader-election retries done with
> > > > > > exponential backoff.
> > > > > >
> > > > > > I guess we will play with things a bit more, but at this point I
> > > > > > am tending towards a simple parallel restart of all servers.
> > > > > >
> > > > > > Once the clusters are healthy again we will do a rolling upgrade
> > > > > > to 3.4.8 sometime soon.
> > > > > >
> > > > > > Thanks again,
> > > > > > Guy
> > > > > >
> > > > > > On Sun, Aug 28, 2016 at 5:52 PM, Flavio Junqueira <[email protected]> wrote:
> > > > > > > Hi Guy,
> > > > > > >
> > > > > > > We don't have a way to restart the listener thread, so you
> > > > > > > really need to bounce the server. I don't think there is a way
> > > > > > > of doing this without forcing a leader election, assuming all
> > > > > > > your servers are in this bad state. To minimize downtime, one
> > > > > > > thing you can do is to avoid bouncing the current leader until
> > > > > > > it loses quorum support. Once it loses quorum support, you
> > > > > > > have a quorum of healthy servers and they will elect a new,
> > > > > > > healthy leader. At that point, you can bounce all your
> > > > > > > unhealthy servers.
> > > > > > >
> > > > > > > You may also want to move to a later 3.4 release.
> > > > > > >
> > > > > > > -Flavio
> > > > > > >
> > > > > > > On 24 Aug 2016, at 23:15, Guy Laden <[email protected]> wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > It looks like, due to a security scan sending "bad" traffic
> > > > > > > > to the leader election port, we have clusters in which the
> > > > > > > > leader election Listener thread is dead (an unchecked
> > > > > > > > exception was thrown and the thread died - seen in the log).
> > > > > > > > (This seems to be fixed in
> > > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
> > > > > > > >
> > > > > > > > In this state, when a healthy server comes up and tries to
> > > > > > > > connect to the quorum, it gets stuck on the leader election.
> > > > > > > > It establishes TCP connections to the other servers but any
> > > > > > > > traffic it sends seems to get stuck in the receiver's TCP
> > > > > > > > Recv queue (seen with netstat), and is not read/processed by
> > > > > > > > zk.
> > > > > > > >
> > > > > > > > Not a good place to be :)
> > > > > > > >
> > > > > > > > This is with 3.4.6.
> > > > > > > >
> > > > > > > > Is there a way to get such clusters back to a healthy state
> > > > > > > > without loss of quorum / client impact? Some way of
> > > > > > > > re-starting the listener thread? Or restarting the servers
> > > > > > > > in a certain order? e.g. If I restart a minority, say the
> > > > > > > > ones with lower server ids - is there a way to get the
> > > > > > > > majority servers to re-initiate leader election connections
> > > > > > > > with them so as to connect them to the quorum? (And to do
> > > > > > > > this without the majority losing quorum.)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Guy
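The symptom Guy describes (bytes piling up in the receiver's Recv-Q because the dead Listener never reads them) can be spotted from `netstat` output. A small sketch, with an illustrative sample line standing in for live output and port 3888 assumed as the election port; on a real box the `awk` filter would be fed by `netstat -tan` instead.

```shell
# Recv-Q is column 2 of netstat -tan output; a persistently nonzero value
# on the election port means the process is not reading from the socket.
# The sample line below is fabricated for illustration only.
sample='tcp   42      0 10.0.0.1:3888   10.0.0.2:51234  ESTABLISHED'

echo "$sample" | awk '$2 > 0 { print "stuck:", $4, "Recv-Q=" $2 }'
```

A connection flagged this way is established at the TCP level but dead at the application level, which matches the thread's diagnosis of a killed Listener thread.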
