Hi All,

Can you please share your experience regarding ZK snapshot retention and
recovery policies?

We have an application where we never need to roll back (i.e., revert to
a previous state by using old snapshots). Given this, I am trying to
understand under what circumstances we would ever need to use old ZK
snapshots. I understand that many of these decisions depend on the
application and the amount of redundancy used at every level of the product
(e.g., the RAID level of the storage where the snapshots are kept). To
simplify the discussion, I would like to rule out any application
characteristics and focus mainly on data consistency.

- Assuming that we have a 3-node cluster, I am trying to figure out when I
would really need to use old snapshot files. With 3 nodes we already have
at least 2 servers with a consistent database. If I lose files on one of the
servers, I can use files from another. In fact, the ZK server join will
take care of this: I can remove the files from a faulty node and reboot that
node, and the faulty node will sync with the leader.

- The old files will be useful if the current snapshot and/or log files are
lost or corrupted on all 3 servers. If the loss is due to a disaster (a case
where we lose all 3 servers), one would have to keep the snapshots on some
external storage to recover. However, if the current snapshot file is
corrupted on all 3 servers, then the most likely cause would be a bug in ZK,
in which case, how can I trust the consistency of the old snapshots?

- Given a set of snapshots and log files, how can I verify the correctness
of these files? For example, how would I detect that one of the
intermediate snapshot files is corrupt?

- The Admin's guide says "Using older log and snapshot files, you can look
at the previous state of ZooKeeper servers and even restore that state. The
LogFormatter class allows an administrator to look at the transactions in a
log." Is there a tool that does the restore for the admin? LogFormatter
only displays the transactions in the log file.

- Has anyone ever had to play with the snapshot files in production?

Thanks in advance.

Regards,
-Vishal
