Kishore,

This sounds like a very elaborate tool. I was trying to find a simple approach, but what Thawan said about "fuzzy snapshots" makes me a little afraid that there is no simple solution.

On Mon, Jul 8, 2013 at 11:05 PM, kishore g <[email protected]> wrote:
> Agree, we already have such a tool. In fact, we use it to reconstruct the sequence of events that led to a failure, actually restore the system to a previous stable point, and replay the events. Unfortunately, this is tied closely to Helix, but it should be easy to make it a generic tool.
>
> Sergey, is this something that will be useful in your case?
>
> Thanks,
> Kishore G
>
> On Mon, Jul 8, 2013 at 8:09 PM, Thawan Kooburat <[email protected]> wrote:
>
> > On the restore part, I think having a separate utility to manipulate the data/snap dir (by truncating the log/removing snapshots to a given zxid) would be easier than modifying the server.
> >
> > --
> > Thawan Kooburat
> >
> > On 7/8/13 6:34 PM, "kishore g" <[email protected]> wrote:
> >
> > > I think what we are looking at is a point-in-time restore functionality. How about adding a feature that says go back to a specific zxid/timestamp? This way, before making any change to ZooKeeper, simply note down the timestamp/zxid on the leader. If things go wrong after making changes, bring down the ZooKeeper servers and provide an additional zxid/timestamp parameter when restarting. The server can go to the exact point and make it current. The followers can be started blank.
> > >
> > > On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[email protected]> wrote:
> > >
> > > > Just saw that this is the corresponding use case to the question posted in the dev list.
> > > >
> > > > In order to restore the data to a given point in time correctly, you need both the snapshot and the txnlog. This is because a ZooKeeper snapshot is fuzzy, and a snapshot alone may not represent a valid state of the server if there are in-flight requests.
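Because a snapshot is fuzzy, a backup (or a point-in-time restore) has to pair a snapshot with the transaction log files that cover transactions from the snapshot's zxid onward. Below is a minimal Python sketch of that file selection, assuming the ZooKeeper 3.4 naming inside the `version-2` directory (`snapshot.<zxid in hex>` and `log.<zxid in hex>`, where a log's suffix is the zxid of its first transaction). The function names and the optional `target_zxid` cutoff are hypothetical; the sketch only chooses whole files and does not truncate a log in the middle, which a real point-in-time tool would also have to do.

```python
import re

# ZooKeeper 3.4 names its data files "<prefix>.<zxid in hex>",
# e.g. "snapshot.1ff" or "log.101", inside the version-2 directory.
_DATA_FILE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def _files_with_prefix(names, prefix):
    """Return (zxid, name) pairs for matching files, sorted by zxid."""
    found = []
    for name in names:
        m = _DATA_FILE.match(name)
        if m and m.group(1) == prefix:
            found.append((int(m.group(2), 16), name))
    return sorted(found)


def select_backup_files(names, target_zxid=None):
    """Pick the newest usable snapshot plus the txn logs needed on top of it.

    If target_zxid is given (a hypothetical point-in-time cutoff), snapshots
    and logs that start after it are ignored. Logs are selected whole; any
    transactions past target_zxid inside the last log are NOT trimmed here.
    """
    snaps = _files_with_prefix(names, "snapshot")
    logs = _files_with_prefix(names, "log")
    if target_zxid is not None:
        snaps = [s for s in snaps if s[0] <= target_zxid]
        logs = [lg for lg in logs if lg[0] <= target_zxid]
    if not snaps:
        raise ValueError("no usable snapshot found")
    snap_zxid, snap_name = snaps[-1]
    # Keep the newest log whose first zxid is <= snap_zxid as well: it may
    # contain transactions after snap_zxid, which the fuzzy snapshot may
    # reflect only partially, and keeping one extra log is cheap.
    starts = [z for z, _ in logs if z <= snap_zxid]
    first_needed = starts[-1] if starts else snap_zxid
    needed = [name for z, name in logs if z >= first_needed]
    return snap_name, needed
```

For example, given `snapshot.100`, `snapshot.1ff`, `log.50`, `log.101` and `log.200`, it returns `("snapshot.1ff", ["log.101", "log.200"])`; with `target_zxid=0x150` it falls back to `("snapshot.100", ["log.50", "log.101"])`.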
> > > >
> > > > The 4lw command should cause the server to roll the log and take a snapshot, similar to the periodic snapshotting operation. Your backup script needs to grab the snapshot and the corresponding txnlog file from the data dir.
> > > >
> > > > To restore, just shut down all hosts, clear the data dir, copy over the snapshot and txnlog, and restart them.
> > > >
> > > > --
> > > > Thawan Kooburat
> > > >
> > > > On 7/8/13 3:28 PM, "Sergey Maslyakov" <[email protected]> wrote:
> > > >
> > > > > Thank you for your response, Flavio. I apologize; I did not provide a clear explanation of the use case.
> > > > >
> > > > > This backup/restore is not intended to be tied to any write event; instead, it is expected to run as a periodic (daily?) cron job on one of the servers, which is not guaranteed to be the leader of the ensemble. There is no expectation that all recent changes are committed and persisted to disk. The system can sustain the loss of several hours' worth of recent changes in the event of a restore.
> > > > >
> > > > > As for finding the leader dynamically and performing the backup on it, this approach could be more difficult, as the leader can change from time to time and I still need to fetch the file to store it in my designated backup location. Taking a backup on one server and picking it up from a local file system looks less error-prone. Even if I went the fancy route and had Zookeeper send me the serialized DataTree in response to the 4lw, this approach would involve a lot of moving parts.
> > > > >
> > > > > I have already made a PoC for a new 4lw that invokes takeSnapshot() and returns an absolute path to the snapshot it drops on disk.
> > > > > I have already protected takeSnapshot() from concurrent invocation, which is likely to corrupt the snapshot file on disk. This approach works, but I'm thinking of taking it one step further by providing the desired path name as an argument to my new 4lw and having the Zookeeper server drop the snapshot into the specified file and report success/failure back. This way I can avoid cluttering the data directory and interfering with what Zookeeper finds when it scans the data directory.
> > > > >
> > > > > The approach of having an additional server that would take the leadership and populate the ensemble is just a theory. I don't see a clean way of making a quorum member the leader of the quorum. Am I overlooking something simple?
> > > > >
> > > > > In backup and restore of an ensemble, the biggest unknown for me remains populating the ensemble with the desired data. I can think of two ways:
> > > > >
> > > > > 1. Clear out all servers by stopping them, purge the version-2 directories, restore a snapshot file on one server that will be brought up first, and then bring up the rest of the ensemble. This way I somewhat force the first server to be the leader, because it has data and it will be the only member of a quorum with data, given the way I start the ensemble. This looks like a hack, though.
> > > > >
> > > > > 2. Clear out the ensemble and reload it with a dedicated client using the provided Zookeeper API.
> > > > >
> > > > > With the approach of backing up an actual snapshot file, option #1 appears to be more practical.
> > > > >
> > > > > I wish I could start the ensemble with a designated leader that would bootstrap the ensemble with data, after which the ensemble would go about its normal business...
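Four-letter words are plain text sent over the client port: the client connects, writes the four ASCII characters, and the server writes its reply and closes the connection. A backup cron job could drive Sergey's proposed command the same way. The following is a minimal Python sketch of that exchange; it works with stock commands such as `ruok` (which answers `imok`) or `srvr`, whereas the snapshot command exists only in Sergey's PoC, so its name (`snap` below) is a placeholder, as are the host name and port.

```python
import socket


def send_four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter word to a ZooKeeper client port; return the reply.

    The server answers and then closes the connection, and replies carry no
    length prefix, so read until EOF.
    """
    if len(cmd) != 4:
        raise ValueError("four-letter words are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If the PoC command returns the snapshot path as its reply, a cron job could call something like `send_four_letter_word("zk1.example.com", 2181, "snap")` (hypothetical host and command name) and then copy the reported file to the backup location.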
> > > > >
> > > > > On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira <[email protected]> wrote:
> > > > >
> > > > > > One bit that is still a bit confusing to me in your use case is whether you need to take a snapshot right after some event in your application. Even if you're able to tell ZooKeeper to take a snapshot, there is no guarantee that it will happen at the exact point you want it if update operations keep coming.
> > > > > >
> > > > > > If you use your four-letter word approach, would you search for the leader, or would you simply take a snapshot at any server? If it has to go through the leader so that you make sure to have the most recent committed state, then it might not be a bad idea to have an API call that tells the leader to take a snapshot in some directory of your choice. Informing you of the name of the snapshot file so that you can copy it sounds like an option, but perhaps it is not as convenient.
> > > > > >
> > > > > > The approach of adding another server is not very clear. How do you force it to be the leader? Keep in mind that if it crashes, then it will lose leadership.
> > > > > >
> > > > > > -Flavio
> > > > > >
> > > > > > On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[email protected]> wrote:
> > > > > >
> > > > > > > It looks like the "dev" mailing list is rather inactive. Over the past few days I only saw several automated emails from JIRA, and that is pretty much it. By contrast, the "user" mailing list seems to be more alive and more populated.
> > > > > > >
> > > > > > > With this in mind, please allow me to cross-post here the message I sent to the "dev" list a few days ago.
> > > > > > >
> > > > > > > Regards,
> > > > > > > /Sergey
> > > > > > >
> > > > > > > === forwarded message begins here ===
> > > > > > >
> > > > > > > Hi!
> > > > > > >
> > > > > > > I'm facing a problem that has been raised by multiple people, but none of the discussion threads seem to provide a good answer. I dug into the Zookeeper source code trying to come up with some possible approaches, and I would like to get your input on those.
> > > > > > >
> > > > > > > Initial conditions:
> > > > > > >
> > > > > > > * I have an ensemble of five Zookeeper servers running v3.4.5 code.
> > > > > > > * The size of a committed snapshot file is in the vicinity of 1GB.
> > > > > > > * There are about 80 clients connected to the ensemble.
> > > > > > > * Clients are heavily read-biased, i.e., they mostly read and rarely write. I would say less than 0.1% of queries modify the data.
> > > > > > >
> > > > > > > Problem statement:
> > > > > > >
> > > > > > > * Under certain conditions, I may need to revert the data stored in the ensemble to an earlier state. For example, one of the clients may ruin the application-level data integrity and I need to perform a disaster recovery.
> > > > > > >
> > > > > > > Things look nice and easy if I'm dealing with a single Zookeeper server. A file-level copy of the data and dataLog directories should allow me to recover later by stopping Zookeeper, swapping the corrupted data and dataLog directories with a backup, and firing Zookeeper back up.
> > > > > > >
> > > > > > > Now, the ensemble deployment and the leader election algorithm in the quorum make things much more difficult.
> > > > > > > In order to restore from a single file-level backup, I need to take the whole ensemble down, wipe out the data and dataLog directories on all servers, replace these directories with the backed-up content on one of the servers, bring this server up first, and then bring up the rest of the ensemble. This [somewhat] guarantees that the populated Zookeeper server becomes a member of a majority and populates the ensemble. This approach works, but it is very involved and, thus, prone to human error.
> > > > > > >
> > > > > > > Based on a study of the Zookeeper source code, I am considering the following alternatives, and I seek advice from the Zookeeper development community as to which approach looks more promising or whether there is a better way.
> > > > > > >
> > > > > > > Approach #1:
> > > > > > >
> > > > > > > Develop a complementary pair of utilities for export and import of the data. Both utilities will act as Zookeeper clients and use the existing API. The "export" utility will recursively retrieve data and store it in a file. The "import" utility will first purge all data from the ensemble and then reload it from the file.
> > > > > > >
> > > > > > > This approach seems to be the simplest, and similar tools have been developed already. For example, the Guano Project: https://github.com/d2fn/guano
> > > > > > >
> > > > > > > I don't like two things about it:
> > > > > > > * Poor performance, even for a backup, with a data store of my size.
> > > > > > > * Possible data consistency issues due to concurrent access by the export utility as well as other "normal" clients.
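Approach #1 boils down to a depth-first walk for export and a purge-then-recreate pass for import. Below is a minimal Python sketch, assuming a client object that exposes `get_children(path)`, `get_data(path)`, `set_data(path, data)`, `create(path, data)` and `delete(path)`; these method names are assumptions standing in for whatever ZooKeeper client library is actually used. It skips the reserved `/zookeeper` subtree, does not preserve ACLs or ephemeral/sequential flags, and, as Sergey points out, is not a consistent snapshot if other clients write during the walk.

```python
import base64

RESERVED = "/zookeeper"  # ZooKeeper's own system subtree; never exported or purged


def _child_path(parent, child):
    return parent.rstrip("/") + "/" + child


def export_tree(client, root="/"):
    """Depth-first walk from root; return {path: base64-encoded data}.

    Not a consistent snapshot: other clients may write during the walk.
    """
    out = {}
    stack = [root]
    while stack:
        path = stack.pop()
        out[path] = base64.b64encode(client.get_data(path)).decode("ascii")
        for child in client.get_children(path):
            child_path = _child_path(path, child)
            if child_path != RESERVED:
                stack.append(child_path)
    return out


def _delete_recursive(client, path):
    # Children must be deleted before their parent.
    for child in list(client.get_children(path)):
        _delete_recursive(client, _child_path(path, child))
    client.delete(path)


def import_tree(client, dump, root="/"):
    """Purge everything under root (except /zookeeper), then reload dump."""
    for child in list(client.get_children(root)):
        child_path = _child_path(root, child)
        if child_path != RESERVED:
            _delete_recursive(client, child_path)
    # Sorting by slash count, then by name, creates every parent before its
    # children (the root "/" sorts ahead of "/app" because it is a prefix).
    for path in sorted(dump, key=lambda p: (p.count("/"), p)):
        data = base64.b64decode(dump[path])
        if path == root:
            client.set_data(path, data)  # the root node already exists
        else:
            client.create(path, data)
```

Base64 keeps binary znode data intact, and because the dump is a plain dict of strings it can be written to and read from a file with `json.dump` / `json.load`.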
> > > > > > >
> > > > > > > Approach #2:
> > > > > > >
> > > > > > > Add another four-letter command that would force rolling up the transactions and creating a snapshot. The result of this command would be a new snapshot.XXXX file on disk, and the name of the file could be reported back to the client as a response to the four-letter command. This way, I would know which snapshot file to grab for a possible future restore. But restoring from a snapshot file is almost as involved as the error-prone sequence described above.
> > > > > > >
> > > > > > > Approach #3:
> > > > > > >
> > > > > > > Come up with a way to temporarily add a new Zookeeper server into a live ensemble that would take over (how?) the leader role and push the snapshot that it has out to all ensemble members upon restore. This approach could be difficult and error-prone to implement because it will require hacking the existing election algorithm to designate a leader.
> > > > > > >
> > > > > > > So, which of the approaches do you think works best for an ensemble and for a database size of about 1GB?
> > > > > > >
> > > > > > > Any advice will be highly appreciated!
> > > > > > > /Sergey
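Whichever way the snapshot is obtained, the file-level part of the restore discussed in this thread (stop every server, clear the data directory, copy the snapshot and txnlog over, restart) is mechanical and can be scripted per host. The Python sketch below is hypothetical: it assumes the ZooKeeper process on the host is already stopped, that `snap_dir` and `log_dir` point at the `version-2` subdirectories of `dataDir` and `dataLogDir` (the same directory when `dataLogDir` is not set), and that the snapshot and log file names were chosen beforehand. Stopping and starting the servers, and the order in which they come back, are left to whatever tooling manages the ensemble.

```python
import shutil
from pathlib import Path


def _empty_dir(directory):
    """Create directory if needed and delete everything inside it."""
    directory.mkdir(parents=True, exist_ok=True)
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def stage_restore(backup_dir, snap_dir, snapshot, logs, log_dir=None):
    """Replace a stopped server's version-2 contents with backed-up files.

    Pass log_dir only when dataLogDir is configured separately. myid lives in
    dataDir itself, so it is untouched as long as version-2 dirs are passed.
    """
    backup_dir = Path(backup_dir)
    snap_dir = Path(snap_dir)
    log_dir = Path(log_dir) if log_dir is not None else snap_dir
    # Check the backup before wiping anything on the server.
    sources = [backup_dir / snapshot] + [backup_dir / name for name in logs]
    missing = [str(p) for p in sources if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing backup files: {missing}")
    for directory in {snap_dir, log_dir}:
        _empty_dir(directory)
    shutil.copy2(backup_dir / snapshot, snap_dir / snapshot)
    for name in logs:
        shutil.copy2(backup_dir / name, log_dir / name)
```

Validating the backup files before deleting anything means a typo in a file name fails fast instead of leaving a server with an empty data directory.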
