Sounds like a long transaction (or undo) log, which may impact performance.
On Mon, Jul 8, 2013 at 8:34 PM, kishore g <[email protected]> wrote:

> I think what we are looking at is point-in-time restore functionality.
> How about adding a feature that says go back to a specific zxid/timestamp?
> This way, before making any change to ZooKeeper, simply note down the
> timestamp/zxid on the leader. If things go wrong after making changes,
> bring down the ZooKeeper servers and provide an additional zxid/timestamp
> parameter while restarting. The server can go to that exact point and
> make it current. The followers can be started blank.
>
>
> On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[email protected]> wrote:
>
> > Just saw that this is the corresponding use case to the question posted
> > in the dev list.
> >
> > In order to restore the data to a given point in time correctly, you
> > need both the snapshot and the txnlog. This is because a ZooKeeper
> > snapshot is fuzzy, and a snapshot alone may not represent a valid state
> > of the server if there are in-flight requests.
> >
> > The 4lw command should cause the server to roll the log and take a
> > snapshot, similar to the periodic snapshotting operation. Your backup
> > script needs to grab the snapshot and the corresponding txnlog file
> > from the data dir.
> >
> > To restore, just shut down all hosts, clear the data dir, copy over the
> > snapshot and txnlog, and restart them.
> >
> > --
> > Thawan Kooburat
> >
> >
> > On 7/8/13 3:28 PM, "Sergey Maslyakov" <[email protected]> wrote:
> >
> > > Thank you for your response, Flavio. I apologize, I did not provide
> > > a clear explanation of the use case.
> > >
> > > This backup/restore is not intended to be tied to any write event;
> > > instead, it is expected to run as a periodic (daily?) cron job on one
> > > of the servers, which is not guaranteed to be the leader of the
> > > ensemble. There is no expectation that all recent changes are
> > > committed and persisted to disk. The system can sustain the loss of
> > > several hours' worth of recent changes in the event of a restore.
> > >
> > > As for finding the leader dynamically and performing the backup on
> > > it, this approach could be more difficult, as the leader can change
> > > from time to time, and I would still need to fetch the file to store
> > > it in my designated backup location. Taking the backup on one server
> > > and picking it up from the local file system looks less error-prone.
> > > Even if I went the fancy route and had ZooKeeper send me the
> > > serialized DataTree in response to the 4lw, this approach would
> > > involve a lot of moving parts.
> > >
> > > I have already made a PoC for a new 4lw that invokes takeSnapshot()
> > > and returns an absolute path to the snapshot it drops on disk. I have
> > > already protected takeSnapshot() from concurrent invocation, which is
> > > likely to corrupt the snapshot file on disk (a sketch of such a
> > > handler follows below). This approach works, but I'm thinking of
> > > taking it one step further by providing the desired path name as an
> > > argument to my new 4lw and having the ZooKeeper server drop the
> > > snapshot into the specified file and report success/failure back.
> > > This way I can avoid cluttering the data directory and interfering
> > > with what ZooKeeper finds when it scans the data directory.
> > >
> > > The approach of having an additional server take the leadership and
> > > populate the ensemble is just a theory. I don't see a clean way of
> > > making a quorum member the leader of the quorum. Am I overlooking
> > > something simple?
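For illustration, a minimal sketch of what such a snapshot 4lw could look like against the 3.4.x server internals. The class, the command name, and its dispatch wiring (normally done in NIOServerCnxn) are hypothetical; takeSnapshot(), getTxnLogFactory(), and findMostRecentSnapshot() are existing server-side methods, and the lock mirrors the concurrency protection described above:

    import java.io.File;
    import java.io.IOException;

    import org.apache.zookeeper.server.ZooKeeperServer;

    // Hypothetical "snap" four-letter-word handler -- NOT part of
    // ZooKeeper 3.4.5; the dispatch wiring in NIOServerCnxn is omitted.
    public class SnapCommand {
        // One snapshot at a time: concurrent takeSnapshot() calls can
        // corrupt the snapshot file on disk, as noted above.
        private static final Object SNAPSHOT_LOCK = new Object();

        public static String run(ZooKeeperServer zks) throws IOException {
            synchronized (SNAPSHOT_LOCK) {
                // Writes a new snapshot.XXXX file into the data dir.
                zks.takeSnapshot();
                // Report the file just written so a backup script can
                // grab it together with the corresponding txnlog.
                File snap = zks.getTxnLogFactory().findMostRecentSnapshot();
                return snap == null ? "snapshot failed"
                                    : snap.getAbsolutePath();
            }
        }
    }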
> > > In the backup and restore of an ensemble, the biggest unknown for me
> > > remains populating the ensemble with the desired data. I can think of
> > > two ways:
> > >
> > > 1. Clear out all servers by stopping them, purge the version-2
> > > directories, restore a snapshot file on the one server that will be
> > > brought up first, and then bring up the rest of the ensemble. This
> > > way I somewhat force the first server to be the leader, because it
> > > has data and will be the only member of a quorum with data, due to
> > > the way I start the ensemble. This looks like a hack, though.
> > >
> > > 2. Clear out the ensemble and reload it with a dedicated client using
> > > the provided ZooKeeper API.
> > >
> > > With the approach of backing up an actual snapshot file, option #1
> > > appears to be more practical.
> > >
> > > I wish I could start the ensemble with a designated leader that would
> > > bootstrap the ensemble with data, after which the ensemble would go
> > > about its normal business...
> > >
> > >
> > > On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira <[email protected]> wrote:
> > >
> > > > One bit that is still confusing to me in your use case is whether
> > > > you need to take a snapshot right after some event in your
> > > > application. Even if you're able to tell ZooKeeper to take a
> > > > snapshot, there is no guarantee that it will happen at the exact
> > > > point you want if update operations keep coming.
> > > >
> > > > If you use your four-letter-word approach, would you search for the
> > > > leader or would you simply take a snapshot at any server? If it has
> > > > to go through the leader so that you make sure to have the most
> > > > recent committed state, then it might not be a bad idea to have an
> > > > API call that tells the leader to take a snapshot in some directory
> > > > of your choice. Informing you of the name of the snapshot file so
> > > > that you can copy it sounds like an option, but perhaps it is not
> > > > as convenient.
> > > >
> > > > The approach of adding another server is not very clear. How do you
> > > > force it to be the leader? Keep in mind that if it crashes, it will
> > > > lose leadership.
> > > >
> > > > -Flavio
> > > >
> > > > On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[email protected]> wrote:
> > > >
> > > > > It looks like the "dev" mailing list is rather inactive. Over the
> > > > > past few days I only saw several automated emails from JIRA, and
> > > > > that is pretty much it. Contrary to this, the "user" mailing list
> > > > > seems to be more alive and more populated.
> > > > >
> > > > > With this in mind, please allow me to cross-post here the message
> > > > > I sent to the "dev" list a few days ago.
> > > > >
> > > > >
> > > > > Regards,
> > > > > /Sergey
> > > > >
> > > > > === forwarded message begins here ===
> > > > >
> > > > > Hi!
> > > > >
> > > > > I'm facing a problem that has been raised by multiple people, but
> > > > > none of the discussion threads seem to provide a good answer. I
> > > > > dug into the ZooKeeper source code trying to come up with some
> > > > > possible approaches, and I would like to get your input on those.
> > > > >
> > > > > Initial conditions:
> > > > >
> > > > > * I have an ensemble of five ZooKeeper servers running v3.4.5
> > > > > code.
> > > > > * The size of a committed snapshot file is in the vicinity of
> > > > > 1GB.
> > > > > * There are about 80 clients connected to the ensemble.
> > > > > * Clients are heavily read-biased, i.e., they mostly read and
> > > > > rarely write. I would say less than 0.1% of queries modify the
> > > > > data.
> > > > >
> > > > > Problem statement:
> > > > >
> > > > > * Under certain conditions, I may need to revert the data stored
> > > > > in the ensemble to an earlier state. For example, one of the
> > > > > clients may ruin the application-level data integrity and I need
> > > > > to perform a disaster recovery.
> > > > >
> > > > > Things look nice and easy if I'm dealing with a single ZooKeeper
> > > > > server. A file-level copy of the data and dataLog directories
> > > > > should allow me to recover later by stopping ZooKeeper, swapping
> > > > > the corrupted data and dataLog directories with the backup, and
> > > > > firing ZooKeeper back up.
> > > > >
> > > > > Now, the ensemble deployment and the leader election algorithm in
> > > > > the quorum make things much more difficult. In order to restore
> > > > > from a single file-level backup, I need to take the whole
> > > > > ensemble down, wipe out the data and dataLog directories on all
> > > > > servers, replace these directories with the backed-up content on
> > > > > one of the servers, bring this server up first, and then bring up
> > > > > the rest of the ensemble. This [somewhat] guarantees that the
> > > > > populated ZooKeeper server becomes a member of a majority and
> > > > > populates the ensemble. This approach works, but it is very
> > > > > involved and, thus, prone to human error.
> > > > >
> > > > > Based on a study of the ZooKeeper source code, I am considering
> > > > > the following alternatives, and I seek advice from the ZooKeeper
> > > > > development community as to which approach looks more promising,
> > > > > or whether there is a better way.
> > > > >
> > > > > Approach #1:
> > > > >
> > > > > Develop a complementary pair of utilities for export and import
> > > > > of the data. Both utilities will act as ZooKeeper clients and use
> > > > > the existing API. The "export" utility will recursively retrieve
> > > > > data and store it in a file. The "import" utility will first
> > > > > purge all data from the ensemble and then reload it from the
> > > > > file.
> > > > >
> > > > > This approach seems to be the simplest, and similar tools have
> > > > > been developed already. For example, the Guano Project:
> > > > > https://github.com/d2fn/guano
> > > > >
> > > > > I don't like two things about it:
> > > > > * Poor performance, even on a backup, for a data store of my
> > > > > size.
> > > > > * Possible data consistency issues due to concurrent access by
> > > > > the export utility as well as other "normal" clients.
> > > > >
> > > > > Approach #2:
> > > > >
> > > > > Add another four-letter command that would force rolling the
> > > > > transaction log and creating a snapshot. The result of this
> > > > > command would be a new snapshot.XXXX file on disk, and the name
> > > > > of the file could be reported back to the client as the response
> > > > > to the four-letter command. This way, I would know which snapshot
> > > > > file to grab for a possible future restore. But restoring from a
> > > > > snapshot file is almost as involved as the error-prone sequence
> > > > > described above.
> > > > >
> > > > > Approach #3:
> > > > >
> > > > > Come up with a way to temporarily add a new ZooKeeper server to a
> > > > > live ensemble that would take over (how?) the leader role and
> > > > > push the snapshot that it has out to all ensemble members upon
> > > > > restore. This approach could be difficult and error-prone to
> > > > > implement, because it would require hacking the existing election
> > > > > algorithm to designate a leader.
> > > > >
> > > > > So, which of the approaches do you think works best for an
> > > > > ensemble and for a database size of about 1GB?
> > > > >
> > > > >
> > > > > Any advice will be highly appreciated!
> > > > > /Sergey
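As a rough illustration of Approach #1, the export half can be little more than a depth-first walk over the ordinary client API. This is a minimal sketch, not Guano's actual implementation; the connect string, session timeout, and plain-text output format are placeholder assumptions, and the consistency caveat from the thread (concurrent writers during the walk) still applies:

    import java.util.List;

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Sketch of an "export" utility: dump every znode path and its data.
    public class ZkExport {
        static void dump(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            Stat stat = new Stat();
            byte[] data = zk.getData(path, false, stat);
            System.out.println(path + " = "
                    + (data == null ? "" : new String(data)));
            for (String child : zk.getChildren(path, false)) {
                dump(zk, "/".equals(path) ? "/" + child
                                          : path + "/" + child);
            }
        }

        public static void main(String[] args) throws Exception {
            // Placeholder connect string and timeout.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000,
                    new Watcher() {
                        public void process(WatchedEvent event) { }
                    });
            dump(zk, "/");
            zk.close();
        }
    }

The import half would do the inverse, deleting the tree and replaying creates from the dump. With roughly one round trip per znode, that is where the poor performance on a ~1GB data store comes from.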
