Due to issues in my fingers and brain.

On Wed, Sep 8, 2010 at 1:20 PM, Vishal K <vishalm...@gmail.com> wrote:
> Thanks Ted. Did you have to unwind the cluster due to data consistency
> issues or due to issues at the application?
>
> On Wed, Sep 8, 2010 at 4:06 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > I have used old snapshot files exactly once when I deleted a bunch of
> > server state trying to unwind a tangled cluster.
> >
> > I keep a few around just for backup purposes.
> >
> > On Wed, Sep 8, 2010 at 12:01 PM, Vishal K <vishalm...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > Can you please share your experience regarding ZK snapshot retention
> > > and recovery policies?
> > >
> > > We have an application where we never need to roll back (i.e., revert
> > > to a previous state by using old snapshots). Given this, I am trying
> > > to understand under what circumstances we would ever need to use old
> > > ZK snapshots. I understand a lot of these decisions depend on the
> > > application and the amount of redundancy used at every level (e.g.,
> > > the RAID level where the snapshots are stored) in the product. To
> > > simplify the discussion, I would like to rule out any application
> > > characteristics and focus mainly on data consistency.
> > >
> > > - Assuming that we have a 3 node cluster, I am trying to figure out
> > > when I would really need to use old snapshot files. With 3 nodes we
> > > already have at least 2 servers with a consistent database. If I lose
> > > files on one of the servers, I can use files from the other. In fact,
> > > ZK server join will take care of this. I can remove files from a
> > > faulty node and reboot that node. The faulty node will sync with the
> > > leader.
> > >
> > > - The old files will be useful if the current snapshot and/or log
> > > files are lost or corrupted on all 3 servers. If the loss is due to a
> > > disaster (the case where we lose all 3 servers), one would have to
> > > keep the snapshots on some external storage to recover. However, if
> > > the current snapshot file is corrupted on all 3 servers, then the
> > > most likely cause would be a bug in ZK. In which case, how can I
> > > trust the consistency of the old snapshots?
> > >
> > > - Given a set of snapshots and log files, how can I verify the
> > > correctness of these files? For example, if one of the intermediate
> > > snapshot files is corrupt.
> > >
> > > - The Admin's guide says "Using older log and snapshot files, you can
> > > look at the previous state of ZooKeeper servers and even restore that
> > > state. The LogFormatter class allows an administrator to look at the
> > > transactions in a log." Is there a tool that does this for the admin?
> > > The LogFormatter only displays the transactions in the log file.
> > >
> > > - Has anyone ever had to play with the snapshot files in production?
> > >
> > > Thanks in advance.
> > >
> > > Regards,
> > > -Vishal