Hi All,

Can you please share your experience regarding ZK snapshot retention and recovery policies?
We have an application where we never need to roll back (i.e., revert to a previous state by using old snapshots). Given this, I am trying to understand under what circumstances we would ever need to use old ZK snapshots. I understand that a lot of these decisions depend on the application and the amount of redundancy used at every level of the product (e.g., the RAID level where the snapshots are stored). To simplify the discussion, I would like to rule out any application characteristics and focus mainly on data consistency.

- Assuming that we have a 3-node cluster, I am trying to figure out when I would really need to use old snapshot files. With 3 nodes we already have at least 2 servers with a consistent database. If I lose files on one of the servers, I can use files from another. In fact, the ZK server join will take care of this: I can remove the files from a faulty node and reboot that node, and the faulty node will sync with the leader.

- The old files will be useful if the current snapshot and/or log files are lost or corrupted on all 3 servers. If the loss is due to a disaster (a case where we lose all 3 servers), one would have to keep the snapshots on some external storage to recover. However, if the current snapshot file is corrupted on all 3 servers, then the most likely cause would be a bug in ZK. In that case, how can I trust the consistency of the old snapshots?

- Given a set of snapshot and log files, how can I verify the correctness of these files? For example, what if one of the intermediate snapshot files is corrupt?

- The Admin's guide says "Using older log and snapshot files, you can look at the previous state of ZooKeeper servers and even restore that state. The LogFormatter class allows an administrator to look at the transactions in a log." Is there a tool that does this for the admin? The LogFormatter only displays the transactions in the log file.

- Has anyone ever had to play with the snapshot files in production?

Thanks in advance.

Regards,
-Vishal