I may be misunderstanding something, but to the best of my knowledge the
situation is this: if you are running ZooKeeper on Linux with iptables, then

- If you run 3.5 or later, be sure to enable the TCP keepalive flags.
- If you run 3.4.* or earlier, BEWARE: leader election packets will
  eventually be dropped once conntrack expires the idle connection.
  Your options include:
    - manually patching ZK to enable TCP keepalive on the leader
      election connections
    - running ZK with something like https://github.com/flonatel/libdontdie
      (I have not tested this)
    - any other suggestions?
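
For anyone who wants to check their own boxes, here is a rough sketch (Python,
not ZooKeeper code) of the two things that matter: whether the kernel's
keepalive idle time is shorter than the conntrack established timeout, and
what setting the keepalive flag on a socket looks like. The helper names are
made up for illustration, and the fallback values are my assumption of the
usual Linux defaults (2 hours and 5 days), so please verify on your own hosts.

```python
import socket
from pathlib import Path

# Standard Linux procfs locations. The conntrack file only exists when the
# nf_conntrack module is loaded.
CONNTRACK_TIMEOUT = Path(
    "/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established"
)
KEEPALIVE_TIME = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_seconds(path, default):
    """Return the integer stored in a procfs file, or `default` if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return default


def keepalive_outlives_conntrack(keepalive_s, conntrack_s):
    """True if the first keepalive probe is sent before conntrack expires the flow."""
    return keepalive_s < conntrack_s


def enable_keepalive(sock):
    """Set SO_KEEPALIVE (the flag Java's Socket.setKeepAlive(true) sets)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Some platforms report a nonzero value other than 1, so compare against 0.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    # Fallbacks are assumed defaults, used only when procfs is not readable.
    keepalive_s = read_seconds(KEEPALIVE_TIME, 7200)
    conntrack_s = read_seconds(CONNTRACK_TIMEOUT, 432000)
    print("keepalive idle time (s):", keepalive_s)
    print("conntrack established timeout (s):", conntrack_s)
    print(
        "probes arrive in time:",
        keepalive_outlives_conntrack(keepalive_s, conntrack_s),
    )
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("SO_KEEPALIVE enabled:", enable_keepalive(s))
```

Note the sysctl comparison only helps if the flag is actually set on the
socket; without SO_KEEPALIVE the kernel sends no probes at all, whatever
tcp_keepalive_time says.
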
On Sat, Aug 27, 2016 at 4:22 AM, Patrick Hunt <[email protected]> wrote:
> I've not seen this but I remember Kishore mentioning they had run with
> iptable based testing at some point, Kishore any insight?
>
> Patrick
>
> On Thu, Aug 25, 2016 at 8:10 AM, Guy Laden <[email protected]> wrote:
>
> > Is anybody running 3.4 branch ZooKeeper on Linux with iptables?
> >
> > We are running 3.4.6 and have run into conntrack silently expiring the
> > leader election connections after they are idle for 5 days.
> > (/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established)
> > We then see leader election on some machines sometimes gets stuck for 15
> > minutes or so, until the TCP socket times-out.
> >
> > This JIRA seems to fix this but only in 3.5 branch
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> >
> > Does 3.4.8 make a difference to this issue?
> >
> > If not then this scenario does not seem rare - perhaps it is something to
> > add to the wiki? (Will be happy to do it)
> >
>