Hi Flavio, I think your idea of using iptables should help. I hope to have time to experiment with it. Thanks for your help. Guy
On Wed, Aug 31, 2016 at 11:15 AM, Flavio Junqueira <[email protected]> wrote:
> Ok, I think I get what you're saying. Perhaps you're missing that this is
> an issue that Guy encountered in 3.4.6 and that is fixed in a later
> release. We are discussing here a workaround for his 3.4.6 deployment, not
> a permanent solution. Does it make sense?
>
> -Flavio
>
> On 31 Aug 2016, at 01:16, David Brower <[email protected]> wrote:
> > You'd be programming iptables to pass/accept things from a whitelist of
> > peers you're willing to talk with.
> >
> > If you've got such a whitelist, you don't need to program iptables to
> > look at the peer address from a packet/socket and drop it, you can just
> > do it in your message processing code.
> >
> > The second part deals with various hang situations. If you've got a
> > critical thread selecting/reading messages, then it can't wait forever
> > for a stuck read (or write). Every operation needs to be timed out in
> > some fashion to prevent things like a hung election thread.
> >
> > You get into this sort of thing when miscreants or PEN-testers start
> > scanning your open ports and sending you malformed or fuzzed packets
> > that you don't handle cleanly, or start some exchange that they don't
> > complete.
> >
> > -dB
> >
> > Oracle RAC Database and Clusterware Architect
> >
> > On 8/30/2016 4:54 PM, Flavio Junqueira wrote:
> > > I'm not sure what you're suggesting, David. Could you be more
> > > specific, please?
> > >
> > > -Flavio
> > >
> > > On 30 Aug 2016, at 23:54, David Brower <[email protected]> wrote:
> > > > Anything you could do with iptables you can do in the process by
> > > > having it drop connections from things not on a whitelist, and not
> > > > having a thread waiting indefinitely for operations from any
> > > > connection.
> > > >
> > > > -dB
> > > >
> > > > On 8/30/2016 2:46 PM, Flavio Junqueira wrote:
> > > > > I was trying to write down an analysis and I haven't been able to
> > > > > come up with anything that is foolproof.
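David's whitelist suggestion above can be sketched with iptables roughly as follows. This is a minimal dry-run sketch, not a tested recipe: the peer IPs, the election port 3888, and the `RUN` guard (which defaults to `echo` so nothing is actually applied) are all assumptions for illustration.

```shell
# Dry-run sketch of a peer whitelist for the leader-election port.
# Peer IPs and port 3888 are assumptions; set RUN= to actually apply.
RUN="${RUN:-echo}"
ELECTION_PORT=3888

for peer in 10.0.0.1 10.0.0.2 10.0.0.3; do
  # Accept election traffic only from known ensemble members.
  $RUN iptables -A INPUT -p tcp -s "$peer" --dport "$ELECTION_PORT" -j ACCEPT
done

# Drop everything else aimed at the election port (e.g. security scanners).
$RUN iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
```

As David notes, the same effect can be had inside the process by checking the peer address in the message-handling code; the firewall variant just keeps unwanted traffic out before it reaches the Listener thread.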
> > > > > Basically, the two main issues are:
> > > > >
> > > > > - A bad server is able to connect to a good server in the case it
> > > > >   has a message outstanding and is trying to establish a
> > > > >   connection to the good server. This happens if the server is
> > > > >   LOOKING or has an outstanding message from the previous round.
> > > > >   The converse isn't true, though. A good server can't start a
> > > > >   connection to a bad server because the bad server doesn't have a
> > > > >   listener.
> > > > > - If we bounce servers sequentially, there is a chance that a bad
> > > > >   server is elected more than once along the way, which induces
> > > > >   multiple leader election rounds.
> > > > >
> > > > > Perhaps this is overkill, but I was wondering if it makes sense to
> > > > > filter election traffic to and from bad servers using, for
> > > > > example, iptables. The idea is to add a rule, local to each
> > > > > server, that prevents the server from accepting connections for
> > > > > leader election. For each bad server, we stop it, remove the rule,
> > > > > and bring it back up. We also stop a minority first before
> > > > > stopping the bad leader.
> > > > >
> > > > > -Flavio
> > > > >
> > > > > On 29 Aug 2016, at 09:29, Guy Laden <[email protected]> wrote:
> > > > > > Hi Flavio, Thanks for your reply. The situation is that indeed
> > > > > > all the servers are in a bad state, so it looks like we will
> > > > > > have to perform a cluster restart.
> > > > > >
> > > > > > We played with attempts to optimize the downtime along the lines
> > > > > > you suggested. In testing it we ran into the issue where a
> > > > > > server with no Listener thread can initiate a leader election
> > > > > > connection to a newly-restarted server that does have a
> > > > > > Listener. The result is a quorum that may include 'bad' servers,
> > > > > > even a 'bad' leader. So we tried to first restart the higher-id
> > > > > > servers, because lower-id servers will drop their
> > > > > > leader-election connections to higher-id servers. I'm told there
> > > > > > are issues with this flow as well but have not yet investigated
> > > > > > the details.
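Flavio's per-server workaround (hold election traffic off with a local rule, stop the bad server, lift the rule, restart it) could be sketched roughly as below. It is a dry-run sketch under assumptions: port 3888 for leader election, a hypothetical `zkServer.sh` path, and a `RUN` guard defaulting to `echo` so nothing is executed for real.

```shell
# Dry-run sketch of the per-server bounce workaround described above.
# Port 3888 and the ZooKeeper install path are assumptions.
RUN="${RUN:-echo}"
ELECTION_PORT=3888
ZK_BIN=/opt/zookeeper/bin   # hypothetical path; adjust for your install

# 1. Install a local rule so this server cannot accept election connections.
$RUN iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP

# 2. Stop the bad server while the rule keeps it out of elections.
$RUN "$ZK_BIN/zkServer.sh" stop

# 3. Lift the rule, then bring the server back up with a healthy Listener.
$RUN iptables -D INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
$RUN "$ZK_BIN/zkServer.sh" start
```

Per the thread, this would be repeated for each bad server, stopping a minority before stopping the bad leader, so that a healthy quorum forms as early as possible.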
> > > > > > I also worry about the leader-election retries done with
> > > > > > exponential backoff.
> > > > > >
> > > > > > I guess we will play with things a bit more, but at this point I
> > > > > > am tending towards a simple parallel restart of all servers.
> > > > > >
> > > > > > Once the clusters are healthy again we will do a rolling upgrade
> > > > > > to 3.4.8 sometime soon.
> > > > > >
> > > > > > Thanks again,
> > > > > > Guy
> > > > > >
> > > > > > On Sun, Aug 28, 2016 at 5:52 PM, Flavio Junqueira <[email protected]> wrote:
> > > > > > > Hi Guy,
> > > > > > >
> > > > > > > We don't have a way to restart the listener thread, so you
> > > > > > > really need to bounce the server. I don't think there is a way
> > > > > > > of doing this without forcing a leader election, assuming all
> > > > > > > your servers are in this bad state. To minimize downtime, one
> > > > > > > thing you can do is to avoid bouncing the current leader until
> > > > > > > it loses quorum support. Once it loses quorum support, you
> > > > > > > have a quorum of healthy servers and they will elect a new,
> > > > > > > healthy leader. At that point, you can bounce all your
> > > > > > > unhealthy servers.
> > > > > > >
> > > > > > > You may also want to move to a later 3.4 release.
> > > > > > >
> > > > > > > -Flavio
> > > > > > >
> > > > > > > On 24 Aug 2016, at 23:15, Guy Laden <[email protected]> wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > It looks like, due to a security scan sending "bad" traffic
> > > > > > > > to the leader election port, we have clusters in which the
> > > > > > > > leader election Listener thread is dead (an unchecked
> > > > > > > > exception was thrown and the thread died - seen in the log).
> > > > > > > > (This seems to be fixed in
> > > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
> > > > > > > >
> > > > > > > > In this state, when a healthy server comes up and tries to
> > > > > > > > connect to the quorum, it gets stuck on the leader election.
> > > > > > > > It establishes TCP connections to the other servers but any
> > > > > > > > traffic it sends seems to get stuck in the receiver's TCP
> > > > > > > > Recv queue (seen with netstat), and is not read/processed by
> > > > > > > > zk.
> > > > > > > >
> > > > > > > > Not a good place to be :)
> > > > > > > >
> > > > > > > > This is with 3.4.6.
> > > > > > > >
> > > > > > > > Is there a way to get such clusters back to a healthy state
> > > > > > > > without loss of quorum / client impact? Some way of
> > > > > > > > re-starting the listener thread? Or restarting the servers
> > > > > > > > in a certain order? e.g. If I restart a minority, say the
> > > > > > > > ones with lower server ids - is there a way to get the
> > > > > > > > majority servers to re-initiate leader election connections
> > > > > > > > with them so as to connect them to the quorum? (And to do
> > > > > > > > this without the majority losing quorum.)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Guy
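The symptom Guy describes (bytes piling up in the receiver's Recv-Q because the dead Listener never reads them) can be spotted from `netstat` output. A small sketch, with an illustrative sample line standing in for live output and port 3888 assumed as the election port; on a real box the `awk` filter would be fed by `netstat -tan` instead.

```shell
# Recv-Q is column 2 of netstat -tan output; a persistently nonzero value
# on the election port means the process is not reading from the socket.
# The sample line below is fabricated for illustration only.
sample='tcp   42      0 10.0.0.1:3888   10.0.0.2:51234  ESTABLISHED'

echo "$sample" | awk '$2 > 0 { print "stuck:", $4, "Recv-Q=" $2 }'
```

A connection flagged this way is established at the TCP level but dead at the application level, which matches the thread's diagnosis of a killed Listener thread.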
