Hi,

We are using ZooKeeper 3.4.5 along with Curator to perform leader elections and also to store some application data on a 3-node ensemble. Our application is not hard-realtime, but glitches in stream processing do get noticed and may raise support tickets.
Yesterday we had such a glitch, and looking through the logs I found there had been an XID rollover. When this happened, a new election was triggered within the ensemble and all client connections were closed. From our application's point of view (possibly filtered through Curator), we saw the session expire and then the connection was lost. This caused our application to shut down each component, re-run leader elections, and eventually start back up.

We do have an issue where our application makes many more writes than it should, but even once that is fixed, we will still hit an XID rollover sooner or later.

Is there something our application can do to handle this situation better? Are there any plans for ZooKeeper to handle this situation without closing client connections?

Thanks!
Mark
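P.S. To make "sooner or later" concrete, here is a rough back-of-the-envelope sketch. It assumes, as I understand it, that a zxid is a 64-bit value whose high 32 bits are the leader epoch and whose low 32 bits are a per-epoch transaction counter, and that the rollover is forced when that counter reaches its maximum. The class and method names are made up for illustration, and the zxid value and write rates in main are hypothetical, not taken from our logs.

```java
public class XidRolloverEstimate {
    // Assumption: low 32 bits of the zxid are the per-epoch counter.
    static final long COUNTER_MASK = 0xffffffffL;

    /** Leader epoch: the high 32 bits of the zxid. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Per-epoch transaction counter: the low 32 bits of the zxid. */
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    /** Transactions left before the counter reaches its maximum value. */
    static long remainingTransactions(long zxid) {
        return COUNTER_MASK - counterOf(zxid);
    }

    /** Estimated days until rollover, assuming a steady write rate. */
    static double daysUntilRollover(long zxid, double writesPerSecond) {
        if (writesPerSecond <= 0) {
            throw new IllegalArgumentException("writesPerSecond must be positive");
        }
        return remainingTransactions(zxid) / writesPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        // Hypothetical zxid: epoch 5, counter freshly reset to 0.
        long zxid = 0x0000000500000000L;
        // Hypothetical steady write rates, in transactions per second.
        for (double rate : new double[] {100, 1_000, 10_000}) {
            System.out.printf("%.0f writes/s -> %.1f days until rollover%n",
                    rate, daysUntilRollover(zxid, rate));
        }
    }
}
```

If those assumptions hold, cutting our write rate only pushes the rollover further out rather than removing it, which is why I am asking about handling it gracefully.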
