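Forgot to mention one other option: increase the conntrack timeout for idle established TCP connections (nf_conntrack_tcp_timeout_established). Some issues with this direction: I think the maximum value for this comes to a bit more than a year, so it would postpone the problem rather than remove it. Also, this is a global setting for the machine, and the conntrack table is of limited size, so entries for connections that are really dead would occupy it for longer.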
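
For anyone who wants to check the current values before changing anything, here is a rough Python sketch. The timeout path is the one quoted in the thread below. nf_conntrack_max and nf_conntrack_count are the usual netfilter sysctls for the table limit and current usage. The function names are made up for illustration, and I have only reasoned through this, not run it on a ZK host.

```python
from pathlib import Path

# Timeout path as quoted in the thread; the other two are the standard
# netfilter sysctls for table capacity and current entry count.
NETFILTER_DIR = Path("/proc/sys/net/netfilter")
TIMEOUT_PATH = NETFILTER_DIR / "nf_conntrack_tcp_timeout_established"
MAX_PATH = NETFILTER_DIR / "nf_conntrack_max"
COUNT_PATH = NETFILTER_DIR / "nf_conntrack_count"


def read_sysctl_int(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (conntrack module not loaded) or unexpected content.
        return None


def seconds_to_days(seconds):
    """Convert a timeout in seconds to days."""
    return seconds / 86400


def describe_conntrack():
    """Build a short human-readable summary of the conntrack settings."""
    timeout = read_sysctl_int(TIMEOUT_PATH)
    limit = read_sysctl_int(MAX_PATH)
    count = read_sysctl_int(COUNT_PATH)

    lines = []
    if timeout is None:
        lines.append("established timeout: unavailable")
    else:
        days = seconds_to_days(timeout)
        lines.append(f"established timeout: {timeout} s ({days:.1f} days)")
    if limit is None or count is None:
        lines.append("table usage: unavailable")
    else:
        lines.append(f"table usage: {count} of {limit} entries")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_conntrack())
```

The 5 days reported in the thread corresponds to 432000 seconds. Raising the value would then be something like `sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=<seconds>`, with the table usage line giving a rough idea of how much headroom there is for longer-lived entries.
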
On Wed, Aug 31, 2016 at 9:27 PM, Guy Laden <[email protected]> wrote:

> I may be misunderstanding something but to the best of my knowledge the
> situation is that if you are running ZooKeeper on Linux+Iptables then
>
> - If you run 3.5 or later then be sure to enable the TCP keepalive flags
>
> - If you run 3.4.* or earlier - BEWARE as leader election packets will
>   eventually be dropped
>   - your options include:
>     - manually patching ZK to enable TCP keepalive on the leader
>       election connections
>     - run ZK with something like https://github.com/flonatel/libdontdie
>       (i have not tested this)
>     - any other suggestions?
>
> On Sat, Aug 27, 2016 at 4:22 AM, Patrick Hunt <[email protected]> wrote:
>
>> I've not seen this but I remember Kishore mentioning they had run with
>> iptable based testing at some point, Kishore any insight?
>>
>> Patrick
>>
>> On Thu, Aug 25, 2016 at 8:10 AM, Guy Laden <[email protected]> wrote:
>>
>> > Is anybody running 3.4 branch ZooKeeper on Linux with iptables?
>> >
>> > We are running 3.4.6 and have run into conntrack silently expiring the
>> > leader election connections after they are idle for 5 days.
>> > (/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established)
>> > We then see leader election on some machines sometimes gets stuck for 15
>> > minutes or so, until the TCP socket times-out.
>> >
>> > This JIRA seems to fix this but only in 3.5 branch
>> > https://issues.apache.org/jira/browse/ZOOKEEPER-1748
>> >
>> > Does 3.4.8 make a difference to this issue?
>> >
>> > If not then this scenario does not seem rare - perhaps it is something to
>> > add to the wiki? (Will be happy to do it)
