Agree, we already have such a tool. In fact, we use it to reconstruct the sequence of events that led to a failure, restore the system to a previous stable point, and replay the events. Unfortunately, it is tied closely to Helix, but it should be easy to make it a generic tool.
Sergey, is this something that would be useful in your case?

Thanks,
Kishore G

On Mon, Jul 8, 2013 at 8:09 PM, Thawan Kooburat <[email protected]> wrote:

> On the restore part, I think having a separate utility to manipulate the
> data/snap dir (by truncating the log/removing snapshots up to a given
> zxid) would be easier than modifying the server.
>
> --
> Thawan Kooburat
>
> On 7/8/13 6:34 PM, "kishore g" <[email protected]> wrote:
>
> > I think what we are looking at is point-in-time restore functionality.
> > How about adding a feature that says "go back to a specific
> > zxid/timestamp"? This way, before making any change to ZooKeeper,
> > simply note down the timestamp/zxid on the leader. If things go wrong
> > after making changes, bring the ZooKeeper servers down and provide an
> > additional zxid/timestamp parameter while restarting. The server can
> > go to the exact point and make it current. The followers can be
> > started blank.
> >
> > On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[email protected]> wrote:
> >
> > > Just saw that this is the corresponding use case to the question
> > > posted in the dev list.
> > >
> > > In order to restore the data to a given point in time correctly, you
> > > need both the snapshot and the txnlog. This is because a ZooKeeper
> > > snapshot is fuzzy, and a snapshot alone may not represent a valid
> > > state of the server if there are in-flight requests.
> > >
> > > The 4lw command should cause the server to roll the log and take a
> > > snapshot, similar to the periodic snapshotting operation. Your
> > > backup script needs to grab the snapshot and the corresponding
> > > txnlog file from the data dir.
> > >
> > > To restore, just shut down all hosts, clear the data dir, copy over
> > > the snapshot and txnlog, and restart them.
> > >
> > > --
> > > Thawan Kooburat
> > >
> > > On 7/8/13 3:28 PM, "Sergey Maslyakov" <[email protected]> wrote:
> > >
> > > > Thank you for your response, Flavio.
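To make Thawan's "grab the snapshot and the corresponding txnlog" step concrete: ZooKeeper names its files `snapshot.<hex zxid>` and `log.<hex zxid>` in the data dir, so a backup script can pick the newest snapshot plus the logs that cover it. This is only a sketch; the helper name and the exact selection rule (keep the last log starting at or before the snapshot's zxid, plus all later logs) are my reading of the fuzzy-snapshot requirement, not tested against a live server:

```python
import os
import re

# ZooKeeper names its persistence files snapshot.<hex zxid> and log.<hex zxid>
SNAP_RE = re.compile(r"^snapshot\.([0-9a-fA-F]+)$")
LOG_RE = re.compile(r"^log\.([0-9a-fA-F]+)$")

def files_to_back_up(data_dir):
    """Pick the newest snapshot plus every txnlog that may hold
    transactions concurrent with or newer than that snapshot."""
    snaps, logs = [], []
    for name in os.listdir(data_dir):
        m = SNAP_RE.match(name)
        if m:
            snaps.append((int(m.group(1), 16), name))
        m = LOG_RE.match(name)
        if m:
            logs.append((int(m.group(1), 16), name))
    if not snaps:
        return []
    snap_zxid, snap_name = max(snaps)
    logs.sort()
    # The log that was active when the snapshot started is the last one
    # whose starting zxid is <= the snapshot zxid; keep it and all later logs.
    older = [l for l in logs if l[0] <= snap_zxid]
    newer = [l for l in logs if l[0] > snap_zxid]
    keep = ([older[-1]] if older else []) + newer
    return [snap_name] + [name for _, name in keep]
```

The restore side is then what Thawan describes: stop all servers, clear the data dirs, copy these files back onto one server, and restart.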
> > > > I apologize, I did not provide a clear explanation of the use case.
> > > >
> > > > This backup/restore is not intended to be tied to any write event;
> > > > instead, it is expected to run as a periodic (daily?) cron job on
> > > > one of the servers, which is not guaranteed to be the leader of the
> > > > ensemble. There is no expectation that all recent changes are
> > > > committed and persisted to disk. The system can sustain the loss of
> > > > several hours' worth of recent changes in the event of a restore.
> > > >
> > > > As for finding the leader dynamically and performing the backup on
> > > > it, this approach could be more difficult, as the leader can change
> > > > from time to time, and I would still need to fetch the file to
> > > > store it in my designated backup location. Taking the backup on one
> > > > server and picking it up from a local file system looks less
> > > > error-prone. Even if I went the fancy route and had ZooKeeper send
> > > > me the serialized DataTree in response to the 4lw, this approach
> > > > would involve a lot of moving parts.
> > > >
> > > > I have already made a PoC of a new 4lw that invokes takeSnapshot()
> > > > and returns an absolute path to the snapshot it drops on disk. I
> > > > have also protected takeSnapshot() from concurrent invocation,
> > > > which is likely to corrupt the snapshot file on disk. This approach
> > > > works, but I'm thinking of taking it one step further by providing
> > > > the desired path name as an argument to my new 4lw and having the
> > > > ZooKeeper server drop the snapshot into the specified file and
> > > > report success/failure back. This way I can avoid cluttering the
> > > > data directory and interfering with what ZooKeeper finds when it
> > > > scans the data directory.
> > > >
> > > > The approach of having an additional server take the leadership
> > > > and populate the ensemble is just a theory. I don't see a clean
> > > > way of making a quorum member the leader of the quorum. Am I
> > > > overlooking something simple?
> > > >
> > > > In backup and restore of an ensemble, the biggest unknown for me
> > > > remains populating the ensemble with the desired data. I can think
> > > > of two ways:
> > > >
> > > > 1. Clear out all servers by stopping them, purge the version-2
> > > > directories, restore a snapshot file on the one server that will
> > > > be brought up first, and then bring up the rest of the ensemble.
> > > > This way I somewhat force the first server to be the leader,
> > > > because it will be the only member of the quorum with data, owing
> > > > to the way I start the ensemble. This looks like a hack, though.
> > > >
> > > > 2. Clear out the ensemble and reload it with a dedicated client
> > > > using the provided ZooKeeper API.
> > > >
> > > > With the approach of backing up an actual snapshot file, option #1
> > > > appears to be more practical.
> > > >
> > > > I wish I could start the ensemble with a designated leader that
> > > > would bootstrap the ensemble with data and then let the ensemble
> > > > go about its normal business...
> > > >
> > > > On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira
> > > > <[email protected]> wrote:
> > > >
> > > > > One bit that is still confusing to me in your use case is
> > > > > whether you need to take a snapshot right after some event in
> > > > > your application. Even if you're able to tell ZooKeeper to take
> > > > > a snapshot, there is no guarantee that it will happen at the
> > > > > exact point you want if update operations keep coming.
> > > > >
> > > > > If you use your four-letter-word approach, would you search for
> > > > > the leader, or would you simply take a snapshot on any server?
> > > > > If it has to go through the leader so that you make sure to have
> > > > > the most recent committed state, then it might not be a bad idea
> > > > > to have an API call that tells the leader to take a snapshot in
> > > > > a directory of your choice. Informing you of the name of the
> > > > > snapshot file so that you can copy it sounds like an option, but
> > > > > perhaps it is not as convenient.
> > > > >
> > > > > The approach of adding another server is not very clear. How do
> > > > > you force it to be the leader? Keep in mind that if it crashes,
> > > > > it will lose leadership.
> > > > >
> > > > > -Flavio
> > > > >
> > > > > On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[email protected]> wrote:
> > > > >
> > > > > > It looks like the "dev" mailing list is rather inactive. Over
> > > > > > the past few days I only saw several automated emails from
> > > > > > JIRA, and that is pretty much it. Contrary to this, the "user"
> > > > > > mailing list seems to be more alive and more populated.
> > > > > >
> > > > > > With this in mind, please allow me to cross-post here the
> > > > > > message I sent to the "dev" list a few days ago.
> > > > > >
> > > > > > Regards,
> > > > > > /Sergey
> > > > > >
> > > > > > === forwarded message begins here ===
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I'm facing a problem that has been raised by multiple people,
> > > > > > but none of the discussion threads seem to provide a good
> > > > > > answer. I dug into the ZooKeeper source code trying to come up
> > > > > > with some possible approaches, and I would like to get your
> > > > > > input on them.
> > > > > >
> > > > > > Initial conditions:
> > > > > >
> > > > > > * I have an ensemble of five ZooKeeper servers running v3.4.5
> > > > > > code.
> > > > > > * The size of a committed snapshot file is in the vicinity of
> > > > > > 1GB.
> > > > > > * There are about 80 clients connected to the ensemble.
> > > > > > * Clients are heavily read-biased, i.e., they mostly read and
> > > > > > rarely write. I would say less than 0.1% of queries modify the
> > > > > > data.
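For reference, the four-letter-word mechanism discussed earlier in the thread is plain TCP: open a connection to the client port, write the four bytes, and read until the server closes the connection. A custom `snap` command like Sergey's PoC would be invoked the same way as the stock commands (`ruok`, `stat`); the `snap` command itself is hypothetical, only the transport below is standard:

```python
import socket

def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command (e.g. b"ruok") and return
    the server's raw response; the server closes the socket when done."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(cmd)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # EOF marks the end of the response
                break
            chunks.append(data)
    return b"".join(chunks)
```

A backup script could then call `four_letter_word(host, 2181, b"snap")` and read back the path of the snapshot file, assuming a patched server that implements such a command.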
> > > > > > Problem statement:
> > > > > >
> > > > > > * Under certain conditions, I may need to revert the data
> > > > > > stored in the ensemble to an earlier state. For example, one
> > > > > > of the clients may ruin the application-level data integrity,
> > > > > > and I need to perform disaster recovery.
> > > > > >
> > > > > > Things look nice and easy if I'm dealing with a single
> > > > > > ZooKeeper server. A file-level copy of the data and dataLog
> > > > > > directories should allow me to recover later by stopping
> > > > > > ZooKeeper, swapping the corrupted data and dataLog directories
> > > > > > with a backup, and firing ZooKeeper back up.
> > > > > >
> > > > > > Now, the ensemble deployment and the leader election algorithm
> > > > > > in the quorum make things much more difficult. In order to
> > > > > > restore from a single file-level backup, I need to take the
> > > > > > whole ensemble down, wipe out the data and dataLog directories
> > > > > > on all servers, replace these directories with the backed-up
> > > > > > content on one of the servers, bring this server up first, and
> > > > > > then bring up the rest of the ensemble. This [somewhat]
> > > > > > guarantees that the populated ZooKeeper server becomes a
> > > > > > member of a majority and populates the ensemble. This approach
> > > > > > works, but it is very involved and thus prone to human error.
> > > > > >
> > > > > > Based on a study of the ZooKeeper source code, I am
> > > > > > considering the following alternatives, and I seek advice from
> > > > > > the ZooKeeper development community as to which approach looks
> > > > > > more promising, or whether there is a better way.
> > > > > >
> > > > > > Approach #1:
> > > > > >
> > > > > > Develop a complementary pair of utilities for export and
> > > > > > import of the data. Both utilities would act as ZooKeeper
> > > > > > clients and use the existing API. The "export" utility would
> > > > > > recursively retrieve the data and store it in a file. The
> > > > > > "import" utility would first purge all data from the ensemble
> > > > > > and then reload it from the file.
> > > > > >
> > > > > > This approach seems to be the simplest, and there are similar
> > > > > > tools developed already. For example, the Guano Project:
> > > > > > https://github.com/d2fn/guano
> > > > > >
> > > > > > I don't like two things about it:
> > > > > > * Poor performance, even on a backup, for a data store of my
> > > > > > size.
> > > > > > * Possible data consistency issues due to concurrent access by
> > > > > > the export utility as well as other "normal" clients.
> > > > > >
> > > > > > Approach #2:
> > > > > >
> > > > > > Add another four-letter command that would force rolling the
> > > > > > transaction log and creating a snapshot. The result of this
> > > > > > command would be a new snapshot.XXXX file on disk, and the
> > > > > > name of the file could be reported back to the client as a
> > > > > > response to the four-letter command. This way, I would know
> > > > > > which snapshot file to grab for a possible future restore. But
> > > > > > restoring from a snapshot file is almost as involved as the
> > > > > > error-prone sequence described in the "Initial conditions"
> > > > > > above.
> > > > > >
> > > > > > Approach #3:
> > > > > >
> > > > > > Come up with a way to temporarily add a new ZooKeeper server
> > > > > > to a live ensemble that would overtake (how?) the leader role
> > > > > > and push out the snapshot that it has to all ensemble members
> > > > > > upon restore. This approach could be difficult and error-prone
> > > > > > to implement, because it would require hacking the existing
> > > > > > election algorithm to designate a leader.
> > > > > >
> > > > > > So, which of the approaches do you think works best for an
> > > > > > ensemble and for a database size of about 1GB?
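The export/import pair of Approach #1 is essentially a recursive tree walk over the ZooKeeper API. A minimal sketch against a kazoo-style client (`get`, `get_children`, `create`); ephemeral nodes, ACLs, and the concurrent-access consistency issue raised above are deliberately ignored:

```python
def export_tree(client, path="/"):
    """Recursively dump znodes into a nested dict:
    {"data": bytes, "children": {name: subtree}}."""
    data, _stat = client.get(path)
    children = {}
    for name in client.get_children(path):
        child = path.rstrip("/") + "/" + name
        children[name] = export_tree(client, child)
    return {"data": data, "children": children}

def import_tree(client, tree, path="/"):
    """Recreate a previously exported subtree under path.
    Parents are created before children by the recursion order."""
    if path != "/":  # the root znode always exists
        client.create(path, tree["data"])
    for name, sub in tree["children"].items():
        import_tree(client, sub, path.rstrip("/") + "/" + name)
```

As the thread notes, this walk sees a mix of states if other clients keep writing while it runs, which is exactly the consistency objection to this approach for a 1GB data set.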
> > > > > >
> > > > > > Any advice will be highly appreciated!
> > > > > > /Sergey
