Ok, I think I get what you're saying. Perhaps you're missing that this is an issue Guy encountered in 3.4.6 and that is fixed in a later release. We are discussing here a workaround for his 3.4.6 deployment, not a permanent solution. Does that make sense?
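To make the workaround concrete, here is roughly the shape of it. This is a sketch only: it assumes the default leader-election port 3888 (the third field of the server.N=host:2888:3888 lines in zoo.cfg), and it just prints the commands rather than executing them, since the real rules need root.

```shell
#!/bin/sh
# Sketch of the iptables workaround for the 3.4.6 dead-Listener issue.
# Assumption: the ensemble uses the default election port 3888.
# The script only prints the commands; drop the 'echo' to apply them.

ELECTION_PORT=3888

# While a bad server is still up, keep it out of leader election by
# dropping election traffic both to and from it:
echo iptables -A INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
echo iptables -A OUTPUT -p tcp --dport "$ELECTION_PORT" -j DROP

# Then stop the bad server, delete the rules, and bring it back up, so
# the restarted (healthy) process can take part in elections again:
echo iptables -D INPUT -p tcp --dport "$ELECTION_PORT" -j DROP
echo iptables -D OUTPUT -p tcp --dport "$ELECTION_PORT" -j DROP
```

The point of filtering both directions is the asymmetry discussed below: a bad server has no listener, but it can still initiate outbound election connections.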
-Flavio

> On 31 Aug 2016, at 01:16, David Brower <[email protected]> wrote:
>
> You'd be programming iptables to pass/accept things from a whitelist of
> peers you're willing to talk with.
>
> If you've got such a whitelist, you don't need to program iptables to look
> at the peer address from a packet/socket and drop it; you can just do it in
> your message processing code.
>
> The second part deals with various hang situations. If you've got a
> critical thread selecting/reading messages, then it can't wait forever on a
> stuck read (or write). Every operation needs to be timed out in some
> fashion to prevent things like a hung election thread.
>
> You get into this sort of thing when miscreants or pen-testers start
> scanning your open ports and sending you malformed or fuzzed packets that
> you don't handle cleanly, or start some exchange that they don't complete.
>
> -dB
>
> Oracle RAC Database and Clusterware Architect
>
>
> On 8/30/2016 4:54 PM, Flavio Junqueira wrote:
>> I'm not sure what you're suggesting, David. Could you be more specific,
>> please?
>>
>> -Flavio
>>
>>> On 30 Aug 2016, at 23:54, David Brower <[email protected]> wrote:
>>>
>>> Anything you could do with iptables you can do in the process, by having
>>> it drop connections from things not on a whitelist and by not having a
>>> thread wait indefinitely for operations from any connection.
>>>
>>> -dB
>>>
>>>
>>> On 8/30/2016 2:46 PM, Flavio Junqueira wrote:
>>>> I was trying to write down an analysis, and I haven't been able to come
>>>> up with anything that is foolproof. Basically, the two main issues are:
>>>>
>>>> - A bad server is able to connect to a good server in the case where it
>>>> has a message outstanding and is trying to establish a connection to
>>>> the good server. This happens if the server is LOOKING or has an
>>>> outstanding message from the previous round. The converse isn't true,
>>>> though.
>>>> A good server can't start a connection to a bad server, because the bad
>>>> server doesn't have a listener.
>>>> - If we bounce servers sequentially, there is a chance that a bad
>>>> server is elected more than once during the process, which induces
>>>> multiple leader election rounds.
>>>>
>>>> Perhaps this is overkill, but I was wondering if it makes sense to
>>>> filter election traffic to and from bad servers using, for example,
>>>> iptables. The idea is to add a rule, local to each server, preventing
>>>> the server from getting connections established for leader election.
>>>> For each bad server, we stop it, remove the rule, and bring it back up.
>>>> We also stop a minority first before stopping the bad leader.
>>>>
>>>> -Flavio
>>>>
>>>>> On 29 Aug 2016, at 09:29, Guy Laden <[email protected]> wrote:
>>>>>
>>>>> Hi Flavio, thanks for your reply. The situation is that indeed all the
>>>>> servers are in a bad state, so it looks like we will have to perform a
>>>>> cluster restart.
>>>>>
>>>>> We played with attempts to minimize the downtime along the lines you
>>>>> suggested. In testing it we ran into the issue where a server with no
>>>>> Listener thread can initiate a leader election connection to a
>>>>> newly-restarted server that does have a Listener. The result is a
>>>>> quorum that may include 'bad' servers, even a 'bad' leader. So we
>>>>> tried to restart the higher-id servers first, because lower-id servers
>>>>> will drop their leader-election connections to higher-id servers.
>>>>> I'm told there are issues with this flow as well, but I have not yet
>>>>> investigated the details.
>>>>> I also worry about the leader-election retries done with exponential
>>>>> backoff.
>>>>>
>>>>> I guess we will play with things a bit more, but at this point I am
>>>>> tending towards a simple parallel restart of all servers.
>>>>>
>>>>> Once the clusters are healthy again, we will do a rolling upgrade to
>>>>> 3.4.8 sometime soon.
>>>>> Thanks again,
>>>>> Guy
>>>>>
>>>>>
>>>>> On Sun, Aug 28, 2016 at 5:52 PM, Flavio Junqueira <[email protected]> wrote:
>>>>>
>>>>>> Hi Guy,
>>>>>>
>>>>>> We don't have a way to restart the listener thread, so you really
>>>>>> need to bounce the server. I don't think there is a way of doing this
>>>>>> without forcing a leader election, assuming all your servers are in
>>>>>> this bad state. To minimize downtime, one thing you can do is to
>>>>>> avoid bouncing the current leader until it loses quorum support. Once
>>>>>> it loses quorum support, you have a quorum of healthy servers and
>>>>>> they will elect a new, healthy leader. At that point, you can bounce
>>>>>> all your unhealthy servers.
>>>>>>
>>>>>> You may also want to move to a later 3.4 release.
>>>>>>
>>>>>> -Flavio
>>>>>>
>>>>>>> On 24 Aug 2016, at 23:15, Guy Laden <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> It looks like, due to a security scan sending "bad" traffic to the
>>>>>>> leader election port, we have clusters in which the leader election
>>>>>>> Listener thread is dead (an unchecked exception was thrown and the
>>>>>>> thread died, as seen in the log).
>>>>>>> (This seems to be fixed by
>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2186)
>>>>>>>
>>>>>>> In this state, when a healthy server comes up and tries to connect
>>>>>>> to the quorum, it gets stuck on leader election. It establishes TCP
>>>>>>> connections to the other servers, but any traffic it sends seems to
>>>>>>> get stuck in the receiver's TCP Recv queue (seen with netstat) and
>>>>>>> is not read/processed by zk.
>>>>>>>
>>>>>>> Not a good place to be :)
>>>>>>>
>>>>>>> This is with 3.4.6.
>>>>>>>
>>>>>>> Is there a way to get such clusters back to a healthy state without
>>>>>>> loss of quorum / client impact?
>>>>>>> Some way of re-starting the listener thread? Or restarting the
>>>>>>> servers in a certain order?
>>>>>>> e.g. If I restart a minority, say the ones with lower server ids,
>>>>>>> is there a way to get the majority servers to re-initiate leader
>>>>>>> election connections with them, so as to connect them to the quorum?
>>>>>>> (And to do this without the majority losing quorum.)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Guy
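PS: to make the bounce order concrete for the archives, here is a rough sketch. The hostnames, the leader identity, and the zkServer.sh/nc invocations are hypothetical placeholders for your own tooling, and the script only prints the plan rather than executing anything.

```shell
#!/bin/sh
# Sketch of the restart order discussed above: bounce the non-leader
# (bad) servers first, one at a time, so the bad leader eventually
# loses quorum support and a healthy quorum elects a new leader; only
# then bounce the old leader. Hostnames are placeholders, and the
# script prints the plan instead of executing it.

LEADER=zk3                   # current (bad) leader - a placeholder
FOLLOWERS="zk1 zk2 zk4 zk5"  # the rest of the ensemble - placeholders

for h in $FOLLOWERS; do
  echo "bounce $h (e.g. ssh $h zkServer.sh restart)"
  echo "wait until 'echo stat | nc $h 2181' reports a Mode again"
done

# Once a majority of restarted servers is healthy, the old leader has
# lost quorum support and a new leader has been elected; only then:
echo "bounce old leader $LEADER (e.g. ssh $LEADER zkServer.sh restart)"
```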
