Hi,

We are using ZooKeeper 3.4.5 along with Curator to perform leader elections and 
also store some application data on a 3-node ensemble. Our application is not 
hard-realtime, but glitches in stream processing do get noticed and may raise 
support tickets.

Yesterday, we had such a glitch and by looking through the logs, I found there 
was an XID rollover. When this happened, a new election within the ensemble was 
triggered and all client connections were closed. From our application's point 
of view (possibly filtered through Curator), we saw the session expire and then 
the connection was lost. This caused our application to shut down each 
component, re-perform leader elections, and eventually start back up.

We do have an issue where our application is making many more writes than it 
should, but once this is fixed, we'll still run into an XID rollover sooner or 
later.
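
For a rough sense of "sooner or later", here is a back-of-the-envelope sketch. 
It assumes the rollover is the low 32 bits of the 64-bit zxid wrapping (epoch in 
the high 32 bits, per-epoch counter in the low 32 bits), and the write rates 
are hypothetical, not measured from our system:

```python
# Rough estimate of how long one leader epoch can run before the zxid
# counter is exhausted. Assumption: the counter is the low 32 bits of the
# 64-bit zxid and restarts from zero after each new leader election.
COUNTER_BITS = 32
SECONDS_PER_DAY = 86_400


def days_until_rollover(writes_per_second: float) -> float:
    """Days of sustained writes needed to use up the 32-bit counter."""
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    total_writes = 2 ** COUNTER_BITS
    return total_writes / writes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical sustained write rates, for illustration only.
    for rate in (100, 1_000, 10_000):
        print(f"{rate:>6} writes/s -> {days_until_rollover(rate):8.1f} days")
```

If that assumption holds, a sustained 1,000 writes/s would use up the counter in 
roughly 50 days, so cutting our write rate only postpones the rollover.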

Is there something our application can do to handle this situation better? Are 
there any plans for ZooKeeper to handle this situation without closing client 
connections?

Thanks!
Mark
