On the restore part, I think having a separate utility to manipulate the data/snap dir (by truncating the log and removing snapshots back to a given zxid) would be easier than modifying the server.
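[A minimal sketch of what such a standalone utility might look like, assuming it runs offline against a stopped server's data/snap directories. The class name, command-line arguments, and snapshot-pruning logic are illustrative assumptions; FileTxnSnapLog.truncateLog() is the internal persistence helper the 3.4 server code uses for log truncation.]

import java.io.File;
import org.apache.zookeeper.server.persistence.FileTxnSnapLog;

// Offline point-in-time cleanup: discard state newer than a target zxid.
// Run only while the server is stopped, ideally against a copy of the dirs.
public class RestoreToZxid {
    public static void main(String[] args) throws Exception {
        File dataDir = new File(args[0]);              // parent of version-2 holding log.* files
        File snapDir = new File(args[1]);              // parent of version-2 holding snapshot.* files
        long targetZxid = Long.parseLong(args[2], 16); // zxid in hex, as used in the file names

        // Drop snapshots taken after the target zxid (the file suffix is the zxid in hex).
        File[] snaps = new File(snapDir, "version-2").listFiles();
        if (snaps != null) {
            for (File f : snaps) {
                String name = f.getName();
                if (name.startsWith("snapshot.")
                        && Long.parseLong(name.substring("snapshot.".length()), 16) > targetZxid) {
                    System.out.println("removing " + name);
                    f.delete();
                }
            }
        }

        // Truncate the transaction log so that it ends at the target zxid.
        FileTxnSnapLog txnSnapLog = new FileTxnSnapLog(dataDir, snapDir);
        if (!txnSnapLog.truncateLog(targetZxid)) {
            System.err.println("truncate failed for zxid 0x" + Long.toHexString(targetZxid));
        }
    }
}

[This matches the point-in-time idea below: note the leader's zxid before a risky change, and roll back to it if things go wrong.]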
--
Thawan Kooburat

On 7/8/13 6:34 PM, "kishore g" <[email protected]> wrote:

I think what we are looking at is a point-in-time restore functionality. How about adding a feature that says go back to a specific zxid/timestamp? This way, before doing any change to ZooKeeper, simply note down the timestamp/zxid on the leader. If things go wrong after making changes, bring the ZooKeeper servers down and provide an additional zxid/timestamp parameter while restarting. The server can go back to that exact point and make it current. The followers can be started blank.

On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[email protected]> wrote:

Just saw that this is the corresponding use case to the question posted in the dev list.

In order to restore the data to a given point in time correctly, you need both the snapshot and the txnlog. This is because a ZooKeeper snapshot is fuzzy, and a snapshot alone may not represent a valid state of the server if there are in-flight requests.

The 4lw command should cause the server to roll the log and take a snapshot, similar to the periodic snapshotting operation. Your backup script then needs to grab the snapshot and the corresponding txnlog file from the data dir.

To restore, just shut down all hosts, clear the data dir, copy over the snapshot and txnlog, and restart them.

--
Thawan Kooburat

On 7/8/13 3:28 PM, "Sergey Maslyakov" <[email protected]> wrote:

Thank you for your response, Flavio. I apologize, I did not provide a clear explanation of the use case.

This backup/restore is not intended to be tied to any write event. Instead, it is expected to run as a periodic (daily?) cron job on one of the servers, which is not guaranteed to be the leader of the ensemble. There is no expectation that all recent changes are committed and persisted to disk. The system can sustain the loss of several hours' worth of recent changes in the event of a restore.

As for finding the leader dynamically and performing the backup on it, this approach could be more difficult, as the leader can change from time to time and I would still need to fetch the file to store it in my designated backup location. Taking the backup on one server and picking it up from the local file system looks less error-prone. Even if I went the fancy route and had Zookeeper send me the serialized DataTree in response to the 4lw, this approach would involve a lot of moving parts.

I have already made a PoC for a new 4lw that invokes takeSnapshot() and returns an absolute path to the snapshot it drops on disk. I have also protected takeSnapshot() from concurrent invocation, which is likely to corrupt the snapshot file on disk. This approach works, but I'm thinking of taking it one step further by providing the desired path name as an argument to my new 4lw and having the Zookeeper server drop the snapshot into the specified file and report success/failure back. This way I can avoid cluttering the data directory and interfering with what Zookeeper finds when it scans the data directory.

The approach of having an additional server take the leadership and populate the ensemble is just a theory. I don't see a clean way of making a quorum member the leader of the quorum. Am I overlooking something simple?
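[For illustration, a rough sketch of the backup-script side of that PoC. The "snap" four-letter word and its reply format (the absolute snapshot path on a single line) are assumptions modeled on the PoC described above; they are not part of stock ZooKeeper.]

import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Backup client for the hypothetical "snap" 4lw, run on the same host as the server.
public class SnapshotBackup {
    public static void main(String[] args) throws IOException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);    // ZooKeeper client port, e.g. 2181
        Path backupDir = Paths.get(args[2]);

        String snapshotPath;
        try (Socket sock = new Socket(host, port)) {
            // Four-letter words are sent as raw bytes; the server replies and closes the connection.
            sock.getOutputStream().write("snap".getBytes(StandardCharsets.US_ASCII));
            sock.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));
            snapshotPath = in.readLine();        // assumed reply: absolute path of the new snapshot
        }
        if (snapshotPath == null || snapshotPath.isEmpty()) {
            throw new IOException("no snapshot path returned");
        }

        // Copy the snapshot to the backup location; per Thawan's note, the matching
        // txnlog file from the same data dir would be copied the same way.
        Path src = Paths.get(snapshotPath);
        Files.copy(src, backupDir.resolve(src.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        System.out.println("backed up " + src);
    }
}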
In the backup and restore of an ensemble, the biggest unknown for me remains populating the ensemble with the desired data. I can think of two ways:

1. Clear out all servers by stopping them, purge the version-2 directories, restore a snapshot file on the one server that will be brought up first, and then bring up the rest of the ensemble. This way I somewhat force the first server to be the leader, because it has data and it will be the only member of a quorum with data, given the way I start the ensemble. This looks like a hack, though.

2. Clear out the ensemble and reload it with a dedicated client using the provided Zookeeper API.

With the approach of backing up an actual snapshot file, option #1 appears to be more practical.

I wish I could start the ensemble with a designated leader that would bootstrap the ensemble with data, and then the ensemble would go about its normal business...

On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira <[email protected]> wrote:

One bit that is still a bit confusing to me in your use case is whether you need to take a snapshot right after some event in your application. Even if you're able to tell ZooKeeper to take a snapshot, there is no guarantee that it will happen at the exact point you want if update operations keep coming.

If you use your four-letter-word approach, would you search for the leader or would you simply take a snapshot at any server? If it has to go through the leader so that you make sure to have the most recent committed state, then it might not be a bad idea to have an API call that tells the leader to take a snapshot at some directory of your choice. Informing you of the name of the snapshot file so that you can copy it sounds like an option, but perhaps it is not as convenient.

The approach of adding another server is not very clear. How do you force it to be the leader? Keep in mind that if it crashes, then it will lose leadership.

-Flavio

On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[email protected]> wrote:

It looks like the "dev" mailing list is rather inactive. Over the past few days I only saw several automated emails from JIRA, and that is pretty much it. Contrary to this, the "user" mailing list seems to be more alive and more populated.

With this in mind, please allow me to cross-post here the message I sent to the "dev" list a few days ago.

Regards,
/Sergey

=== forwarded message begins here ===

Hi!

I'm facing a problem that has been raised by multiple people, but none of the discussion threads seem to provide a good answer. I dug into the Zookeeper source code trying to come up with some possible approaches, and I would like to get your input on those.

Initial conditions:

* I have an ensemble of five Zookeeper servers running v3.4.5 code.
* The size of a committed snapshot file is in the vicinity of 1GB.
* There are about 80 clients connected to the ensemble.
* Clients are heavily read-biased, i.e., they mostly read and rarely write. I would say less than 0.1% of queries modify the data.
Problem statement:

* Under certain conditions, I may need to revert the data stored in the ensemble to an earlier state. For example, one of the clients may ruin the application-level data integrity and I need to perform a disaster recovery.

Things look nice and easy if I'm dealing with a single Zookeeper server. A file-level copy of the data and dataLog directories should allow me to recover later by stopping Zookeeper, swapping the corrupted data and dataLog directories with the backup, and firing Zookeeper back up.

Now, the ensemble deployment and the leader election algorithm in the quorum make things much more difficult. In order to restore from a single file-level backup, I need to take the whole ensemble down, wipe out the data and dataLog directories on all servers, replace these directories with the backed-up content on one of the servers, bring this server up first, and then bring up the rest of the ensemble. This [somewhat] guarantees that the populated Zookeeper server becomes a member of a majority and populates the ensemble. This approach works, but it is very involved and, thus, error-prone due to human error.

Based on a study of the Zookeeper source code, I am considering the following alternatives, and I seek advice from the Zookeeper development community as to which approach looks more promising or whether there is a better way.

Approach #1:

Develop a complementary pair of utilities for export and import of the data. Both utilities would act as Zookeeper clients and use the existing API. The "export" utility would recursively retrieve data and store it in a file. The "import" utility would first purge all data from the ensemble and then reload it from the file.

This approach seems to be the simplest, and there are similar tools developed already, for example, the Guano Project: https://github.com/d2fn/guano

I don't like two things about it:
* Poor performance, even for a backup, on a data store of my size.
* Possible data consistency issues due to concurrent access by the export utility as well as other "normal" clients.

Approach #2:

Add another four-letter command that would force rolling the transaction log and creating a snapshot. The result of this command would be a new snapshot.XXXX file on disk, and the name of the file could be reported back to the client as the response to the four-letter command. This way, I would know which snapshot file to grab for a possible future restore. But restoring from a snapshot file is almost as involved as the error-prone sequence described in the "Initial conditions" above.

Approach #3:

Come up with a way to temporarily add a new Zookeeper server to a live ensemble that would take over (how?) the leader role and push out the snapshot that it has to all ensemble members upon restore.
This approach could be difficult and error-prone to implement because it would require hacking the existing election algorithm to designate a leader.

So, which of the approaches do you think works best for an ensemble and for a database size of about 1GB?

Any advice will be highly appreciated!
/Sergey
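[For reference, a minimal sketch of the "export" half of Approach #1, using only the standard Zookeeper client API. The class name, arguments, and output format (one znode per line, path plus Base64-encoded data) are illustrative assumptions, and, as noted above, a dump taken this way can be internally inconsistent if clients keep writing concurrently.]

import java.io.PrintWriter;
import java.util.Base64;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Recursively dumps a subtree to a file, one znode per line: <path>\t<base64 data>.
public class ZkExport {
    public static void main(String[] args) throws Exception {
        String connectString = args[0];   // e.g. "host1:2181,host2:2181"
        String root = args[1];            // e.g. "/"
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connectString, 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        try (PrintWriter out = new PrintWriter(args[2], "UTF-8")) {
            dump(zk, root, out);
        }
        zk.close();
    }

    private static void dump(ZooKeeper zk, String path, PrintWriter out) throws Exception {
        byte[] data = zk.getData(path, false, null);
        out.println(path + "\t" + (data == null ? "" : Base64.getEncoder().encodeToString(data)));
        for (String child : zk.getChildren(path, false)) {
            dump(zk, ("/".equals(path) ? "" : path) + "/" + child, out);
        }
    }
}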
