We ran into an issue where one follower in a five-node ensemble fell significantly out of sync with the rest of the nodes. Running 'stat' showed the stale server at zxid 0x900c44edd while the other servers were at zxid 0x900c4679b. I manually ran 'sync /' from the CLI, but that had no effect. While this was happening, snapshots were being created very frequently (500+ in just a few hours, without many transactions). The log entries for the snapshots are below. Eventually, restarting the server (without cleaning out the database on disk) resolved the issue, but we're now trying to understand what happened and how to prevent it.
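To put numbers on the zxids above, here is a minimal sketch in Python. It relies on the fact that a zxid is a 64-bit value whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch. The helper names (split_zxid, zxid_lag, snapshot_gaps) are made up for illustration and are not part of ZooKeeper; the input values are copied from this post.

```python
from typing import List, Optional, Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def zxid_lag(stale: int, current: int) -> Optional[int]:
    """Return how many transactions `stale` is behind `current`.

    Returns None when the epochs differ, because counters from
    different epochs cannot be subtracted meaningfully.
    """
    stale_epoch, stale_counter = split_zxid(stale)
    current_epoch, current_counter = split_zxid(current)
    if stale_epoch != current_epoch:
        return None
    return current_counter - stale_counter


def snapshot_gaps(snapshot_zxids: List[int]) -> List[int]:
    """Return the zxid distance between each pair of consecutive snapshots."""
    return [b - a for a, b in zip(snapshot_zxids, snapshot_zxids[1:])]


if __name__ == "__main__":
    # zxids reported by 'stat' on the stale follower and on the other servers
    print(zxid_lag(0x900C44EDD, 0x900C4679B))
    # zxids taken from the "Snapshotting:" log lines in this post
    print(snapshot_gaps([0x90016E292, 0x90016FEB4, 0x9001721DB]))
```

The distance between consecutive snapshot zxids can then be compared against the configured snapCount, to see whether the snapshot frequency matches the transaction volume.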
At the moment we are using an earlier version of 3.5.0 (revision 1398005). Has anyone seen this before?

...................
Mar 28 23:33:38 zookeeper - INFO [SyncThread:7:FileTxnLog@199] - Creating new log file: log.90016c3db
Mar 28 23:33:46 zookeeper - INFO [Snapshot Thread:FileTxnSnapLog@270] - Snapshotting: 0x90016e292 to /data/zookeeper/10.5.3.61/version-2/snapshot.90016e292
Mar 28 23:33:46 zookeeper - INFO [SyncThread:7:FileTxnLog@199] - Creating new log file: log.90016e294
Mar 28 23:33:54 zookeeper - INFO [Snapshot Thread:FileTxnSnapLog@270] - Snapshotting: 0x90016feb4 to /data/zookeeper/10.5.3.61/version-2/snapshot.90016feb4
Mar 28 23:33:54 zookeeper - INFO [SyncThread:7:FileTxnLog@199] - Creating new log file: log.90016feb5
Mar 28 23:34:04 zookeeper - INFO [Snapshot Thread:FileTxnSnapLog@270] - Snapshotting: 0x9001721db to /data/zookeeper/10.5.3.61/version-2/snapshot.9001721db
.......................

~Jared
