Re: Handling of xid rollover

Patrick Hunt Wed, 22 Jun 2016 11:17:28 -0700

Local sessions are only available in 3.5+, so I don't think that's an issue
for Mark (3.4.5). However, it's a really good point and I'm not sure myself
what would happen. Thanks Dan!


Patrick

On Wed, Jun 22, 2016 at 11:10 AM, Jordan Zimmerman <[email protected]> wrote:

> Curator 3.0 will simulate a session expiration when there’s a network
> partition, but Curator 2.0 does not. If you’re using ZK 3.4.5 you’d be
> using Curator 2.0 so the only way you’d see a session expiration is when
> you successfully reconnect to the ensemble.
>
> -JZ
>
> > On Jun 22, 2016, at 12:58 PM, Patrick Hunt <[email protected]> wrote:
> >
> > Hi Mark. See this jira for background:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> >
> > However, what you describe is correct behavior from our perspective.
> > When the lower 32 bits of the zxid roll over we now (that was the fix)
> > force a re-election of the leader. Leader re-election causes the quorum
> > to stop serving clients until a new quorum forms.
> >
> > Leader re-election is normal behavior for the ZK service; it happens
> > whenever the current leader is lost and a new quorum, with a (possibly
> > new) leader, needs to form, for example when the current leader process
> > is restarted. Your clients need to be able to handle this situation
> > (typically the client library does this for you).
> >
> > That said, you should not be seeing session expiration as a result of
> > this. Client timeouts certainly, but not session expiration. It might
> > happen for other reasons, but the leader is the one responsible for
> > expiring sessions. If there is no leader (e.g. one is being re-elected)
> > there is no session expiration. When the new leader is elected it resets
> > the clock on session expiration, for all sessions, from the time it is
> > elected. For example, you can shut down the entire ZK server ensemble,
> > start it back up an hour later, and the clients should all be able to
> > rejoin. Hm, that said, I'm not sure whether Curator is doing some special
> > magic; that's the behavior of the stock client that we ship.
> >
> > Patrick
> >
> >
> > On Wed, Jun 22, 2016 at 6:18 AM, Figura, Mark <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> We are using ZooKeeper 3.4.5 along with Curator to perform leader
> >> elections and also store some application data on a 3-node ensemble. Our
> >> application is not hard-realtime, but glitches in stream processing do
> >> get noticed and may raise support tickets.
> >>
> >> Yesterday, we had such a glitch and by looking through the logs, I found
> >> there was an XID rollover. When this happened, a new election within the
> >> ensemble was triggered and all client connections were closed. From our
> >> application's point of view (possibly filtered through Curator), we saw
> >> the session expire and then the connection was lost. This caused our
> >> application to shut down each component, re-perform leader elections,
> >> and eventually start back up.
> >>
> >> We do have an issue where our application is making many more writes
> >> than it should, but once this is fixed, we'll still run into an XID
> >> rollover sooner or later.
> >>
> >> Is there something our application can do to handle this situation
> >> better? Are there any plans for ZooKeeper to handle this situation
> >> without closing client connections?
> >>
> >> Thanks!
> >> Mark
> >>
>
>
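
For readers following the thread, the rollover Patrick describes can be sketched roughly as below. This is a toy model, not ZooKeeper's own code: it assumes the usual layout of a 64-bit zxid (leader epoch in the high 32 bits, a per-epoch counter in the low 32 bits), and the class and method names (ZxidSketch, next, daysUntilRollover) are made up for illustration. A re-election is modelled simply as an epoch bump with the counter reset to zero.

```java
// Toy model of zxid rollover; names are hypothetical, not ZooKeeper's API.
final class ZxidSketch {
    // Low 32 bits hold the per-epoch counter (assumed layout).
    static final long COUNTER_MASK = 0xffffffffL;

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    // Returns the next zxid. When the counter is exhausted, a re-election is
    // modelled by moving to the next epoch with the counter reset to 0.
    static long next(long zxid) {
        if (counterOf(zxid) == COUNTER_MASK) {
            return makeZxid(epochOf(zxid) + 1, 0);
        }
        return zxid + 1;
    }

    // Rough estimate of the days left before the counter is exhausted,
    // assuming a constant write rate.
    static double daysUntilRollover(long zxid, double writesPerSecond) {
        long remaining = COUNTER_MASK - counterOf(zxid);
        return remaining / writesPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(7, COUNTER_MASK - 1);
        zxid = next(zxid); // counter reaches its maximum
        zxid = next(zxid); // modelled re-election: new epoch, counter 0
        System.out.println("epoch=" + epochOf(zxid) + " counter=" + counterOf(zxid));
        System.out.println("days at 1000 writes/s: "
                + daysUntilRollover(makeZxid(7, 0), 1000.0));
    }
}
```

Under these assumptions the time between rollovers scales inversely with the write rate, which is why cutting the unnecessary writes Mark mentions would make rollovers rarer without eliminating them.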

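Patrick's point that only a leader expires sessions, and that a newly elected leader restarts the expiration clock for every session, can be illustrated with a small simulation. This is a sketch under stated assumptions, not how the ZooKeeper server is implemented: SessionClockSketch and all of its methods are hypothetical, and time is passed in explicitly as milliseconds to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of leader-driven session expiration; all names are hypothetical.
final class SessionClockSketch {
    private final Map<Long, Long> timeoutMs = new HashMap<>();  // sessionId -> timeout
    private final Map<Long, Long> deadlineMs = new HashMap<>(); // sessionId -> expiry deadline
    private boolean hasLeader = true;

    void addSession(long sessionId, long timeout, long nowMs) {
        timeoutMs.put(sessionId, timeout);
        deadlineMs.put(sessionId, nowMs + timeout);
    }

    // A client heartbeat pushes the deadline out by one timeout.
    void touch(long sessionId, long nowMs) {
        Long timeout = timeoutMs.get(sessionId);
        if (timeout != null) {
            deadlineMs.put(sessionId, nowMs + timeout);
        }
    }

    void leaderLost() {
        hasLeader = false;
    }

    // A new leader restarts the expiration clock for every known session.
    void leaderElected(long nowMs) {
        hasLeader = true;
        for (Map.Entry<Long, Long> e : timeoutMs.entrySet()) {
            deadlineMs.put(e.getKey(), nowMs + e.getValue());
        }
    }

    // Without a leader nobody expires sessions, however late it is.
    boolean isExpired(long sessionId, long nowMs) {
        Long deadline = deadlineMs.get(sessionId);
        return hasLeader && deadline != null && nowMs > deadline;
    }

    public static void main(String[] args) {
        SessionClockSketch tracker = new SessionClockSketch();
        tracker.addSession(1L, 30_000L, 0L);
        tracker.leaderLost();
        long oneHour = 3_600_000L;
        System.out.println("expired while leaderless: " + tracker.isExpired(1L, oneHour));
        tracker.leaderElected(oneHour);
        System.out.println("expired right after election: "
                + tracker.isExpired(1L, oneHour + 10_000L));
    }
}
```

In this model a session survives an hour-long outage because its deadline is recomputed from the moment the new leader takes over, matching the "shut down the ensemble for an hour" example in the thread. It says nothing about what Curator layers on top.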