On the restore part, I think having a separate utility to manipulate the data/snap dir (by truncating the log and removing snapshots back to a given zxid) would be easier than modifying the server.
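[A minimal sketch of what such a standalone utility might look like, assuming it runs offline against a stopped server's data/snap directories. The class name, command-line arguments, and snapshot-pruning logic are illustrative assumptions; FileTxnSnapLog.truncateLog() is the internal persistence helper the 3.4 server code uses for log truncation.]

import java.io.File;
import org.apache.zookeeper.server.persistence.FileTxnSnapLog;

// Offline point-in-time cleanup: discard state newer than a target zxid.
// Run only while the server is stopped, ideally against a copy of the dirs.
public class RestoreToZxid {
    public static void main(String[] args) throws Exception {
        File dataDir = new File(args[0]);              // parent of version-2 holding log.* files
        File snapDir = new File(args[1]);              // parent of version-2 holding snapshot.* files
        long targetZxid = Long.parseLong(args[2], 16); // zxid in hex, as used in the file names

        // Drop snapshots taken after the target zxid (the file suffix is the zxid in hex).
        File[] snaps = new File(snapDir, "version-2").listFiles();
        if (snaps != null) {
            for (File f : snaps) {
                String name = f.getName();
                if (name.startsWith("snapshot.")
                        && Long.parseLong(name.substring("snapshot.".length()), 16) > targetZxid) {
                    System.out.println("removing " + name);
                    f.delete();
                }
            }
        }

        // Truncate the transaction log so that it ends at the target zxid.
        FileTxnSnapLog txnSnapLog = new FileTxnSnapLog(dataDir, snapDir);
        if (!txnSnapLog.truncateLog(targetZxid)) {
            System.err.println("truncate failed for zxid 0x" + Long.toHexString(targetZxid));
        }
    }
}

[This matches the point-in-time idea below: note the leader's zxid before a risky change, and roll back to it if things go wrong.]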
--
Thawan Kooburat

On 7/8/13 6:34 PM, "kishore g" <[email protected]> wrote:

I think what we are looking at is a point-in-time restore functionality. How about adding a feature that says go back to a specific zxid/timestamp? This way, before doing any change to ZooKeeper, simply note down the timestamp/zxid on the leader. If things go wrong after making changes, bring the ZooKeeper servers down and provide an additional zxid/timestamp parameter while restarting. The server can go back to that exact point and make it current. The followers can be started blank.

On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[email protected]> wrote:

Just saw that this is the corresponding use case to the question posted in the dev list.

In order to restore the data to a given point in time correctly, you need both the snapshot and the txnlog. This is because a ZooKeeper snapshot is fuzzy, and a snapshot alone may not represent a valid state of the server if there are in-flight requests.

The 4lw command should cause the server to roll the log and take a snapshot, similar to the periodic snapshotting operation. Your backup script then needs to grab the snapshot and the corresponding txnlog file from the data dir.

To restore, just shut down all hosts, clear the data dir, copy over the snapshot and txnlog, and restart them.

--
Thawan Kooburat

On 7/8/13 3:28 PM, "Sergey Maslyakov" <[email protected]> wrote:

Thank you for your response, Flavio. I apologize, I did not provide a clear explanation of the use case.

This backup/restore is not intended to be tied to any write event. Instead, it is expected to run as a periodic (daily?) cron job on one of the servers, which is not guaranteed to be the leader of the ensemble. There is no expectation that all recent changes are committed and persisted to disk. The system can sustain the loss of several hours' worth of recent changes in the event of a restore.

As for finding the leader dynamically and performing the backup on it, this approach could be more difficult, as the leader can change from time to time and I would still need to fetch the file to store it in my designated backup location. Taking the backup on one server and picking it up from the local file system looks less error-prone. Even if I went the fancy route and had Zookeeper send me the serialized DataTree in response to the 4lw, this approach would involve a lot of moving parts.

I have already made a PoC for a new 4lw that invokes takeSnapshot() and returns an absolute path to the snapshot it drops on disk. I have also protected takeSnapshot() from concurrent invocation, which is likely to corrupt the snapshot file on disk. This approach works, but I'm thinking of taking it one step further by providing the desired path name as an argument to my new 4lw and having the Zookeeper server drop the snapshot into the specified file and report success/failure back. This way I can avoid cluttering the data directory and interfering with what Zookeeper finds when it scans the data directory.

The approach of having an additional server take the leadership and populate the ensemble is just a theory. I don't see a clean way of making a quorum member the leader of the quorum. Am I overlooking something simple?
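[For illustration, a rough sketch of the backup-script side of that PoC. The "snap" four-letter word and its reply format (the absolute snapshot path on a single line) are assumptions modeled on the PoC described above; they are not part of stock ZooKeeper.]

import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Backup client for the hypothetical "snap" 4lw, run on the same host as the server.
public class SnapshotBackup {
    public static void main(String[] args) throws IOException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);    // ZooKeeper client port, e.g. 2181
        Path backupDir = Paths.get(args[2]);

        String snapshotPath;
        try (Socket sock = new Socket(host, port)) {
            // Four-letter words are sent as raw bytes; the server replies and closes the connection.
            sock.getOutputStream().write("snap".getBytes(StandardCharsets.US_ASCII));
            sock.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));
            snapshotPath = in.readLine();        // assumed reply: absolute path of the new snapshot
        }
        if (snapshotPath == null || snapshotPath.isEmpty()) {
            throw new IOException("no snapshot path returned");
        }

        // Copy the snapshot to the backup location; per Thawan's note, the matching
        // txnlog file from the same data dir would be copied the same way.
        Path src = Paths.get(snapshotPath);
        Files.copy(src, backupDir.resolve(src.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        System.out.println("backed up " + src);
    }
}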
In the backup and restore of an ensemble, the biggest unknown for me remains populating the ensemble with the desired data. I can think of two ways:

1. Clear out all servers by stopping them, purge the version-2 directories, restore a snapshot file on the one server that will be brought up first, and then bring up the rest of the ensemble. This way I somewhat force the first server to be the leader, because it has data and it will be the only member of a quorum with data, given the way I start the ensemble. This looks like a hack, though.

2. Clear out the ensemble and reload it with a dedicated client using the provided Zookeeper API.

With the approach of backing up an actual snapshot file, option #1 appears to be more practical.

I wish I could start the ensemble with a designated leader that would bootstrap the ensemble with data, and then the ensemble would go about its normal business...

On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira <[email protected]> wrote:

One bit that is still a bit confusing to me in your use case is whether you need to take a snapshot right after some event in your application. Even if you're able to tell ZooKeeper to take a snapshot, there is no guarantee that it will happen at the exact point you want if update operations keep coming.

If you use your four-letter-word approach, would you search for the leader or would you simply take a snapshot at any server? If it has to go through the leader so that you make sure to have the most recent committed state, then it might not be a bad idea to have an API call that tells the leader to take a snapshot at some directory of your choice. Informing you of the name of the snapshot file so that you can copy it sounds like an option, but perhaps it is not as convenient.

The approach of adding another server is not very clear. How do you force it to be the leader? Keep in mind that if it crashes, then it will lose leadership.

-Flavio

On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[email protected]> wrote:

It looks like the "dev" mailing list is rather inactive. Over the past few days I only saw several automated emails from JIRA, and that is pretty much it. Contrary to this, the "user" mailing list seems to be more alive and more populated.

With this in mind, please allow me to cross-post here the message I sent to the "dev" list a few days ago.

Regards,
/Sergey

=== forwarded message begins here ===

Hi!

I'm facing a problem that has been raised by multiple people, but none of the discussion threads seem to provide a good answer. I dug into the Zookeeper source code trying to come up with some possible approaches, and I would like to get your input on those.

Initial conditions:

* I have an ensemble of five Zookeeper servers running v3.4.5 code.
* The size of a committed snapshot file is in the vicinity of 1GB.
* There are about 80 clients connected to the ensemble.
* Clients are heavily read-biased, i.e., they mostly read and rarely write. I would say less than 0.1% of queries modify the data.
Problem statement:

* Under certain conditions, I may need to revert the data stored in the ensemble to an earlier state. For example, one of the clients may ruin the application-level data integrity and I need to perform a disaster recovery.

Things look nice and easy if I'm dealing with a single Zookeeper server. A file-level copy of the data and dataLog directories should allow me to recover later by stopping Zookeeper, swapping the corrupted data and dataLog directories with the backup, and firing Zookeeper back up.

Now, the ensemble deployment and the leader election algorithm in the quorum make things much more difficult. In order to restore from a single file-level backup, I need to take the whole ensemble down, wipe out the data and dataLog directories on all servers, replace these directories with the backed-up content on one of the servers, bring this server up first, and then bring up the rest of the ensemble. This [somewhat] guarantees that the populated Zookeeper server becomes a member of a majority and populates the ensemble. This approach works, but it is very involved and, thus, error-prone due to human error.

Based on a study of the Zookeeper source code, I am considering the following alternatives, and I seek advice from the Zookeeper development community as to which approach looks more promising or whether there is a better way.

Approach #1:

Develop a complementary pair of utilities for export and import of the data. Both utilities would act as Zookeeper clients and use the existing API. The "export" utility would recursively retrieve data and store it in a file. The "import" utility would first purge all data from the ensemble and then reload it from the file.

This approach seems to be the simplest, and there are similar tools developed already, for example, the Guano Project: https://github.com/d2fn/guano

I don't like two things about it:
* Poor performance, even for a backup, on a data store of my size.
* Possible data consistency issues due to concurrent access by the export utility as well as other "normal" clients.

Approach #2:

Add another four-letter command that would force rolling the transaction log and creating a snapshot. The result of this command would be a new snapshot.XXXX file on disk, and the name of the file could be reported back to the client as the response to the four-letter command. This way, I would know which snapshot file to grab for a possible future restore. But restoring from a snapshot file is almost as involved as the error-prone sequence described in the "Initial conditions" above.

Approach #3:

Come up with a way to temporarily add a new Zookeeper server to a live ensemble that would take over (how?) the leader role and push out the snapshot that it has to all ensemble members upon restore.
This approach could be difficult and error-prone to implement because it would require hacking the existing election algorithm to designate a leader.

So, which of the approaches do you think works best for an ensemble and for a database size of about 1GB?

Any advice will be highly appreciated!
/Sergey
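[For reference, a minimal sketch of the "export" half of Approach #1, using only the standard Zookeeper client API. The class name, arguments, and output format (one znode per line, path plus Base64-encoded data) are illustrative assumptions, and, as noted above, a dump taken this way can be internally inconsistent if clients keep writing concurrently.]

import java.io.PrintWriter;
import java.util.Base64;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Recursively dumps a subtree to a file, one znode per line: <path>\t<base64 data>.
public class ZkExport {
    public static void main(String[] args) throws Exception {
        String connectString = args[0];   // e.g. "host1:2181,host2:2181"
        String root = args[1];            // e.g. "/"
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connectString, 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        try (PrintWriter out = new PrintWriter(args[2], "UTF-8")) {
            dump(zk, root, out);
        }
        zk.close();
    }

    private static void dump(ZooKeeper zk, String path, PrintWriter out) throws Exception {
        byte[] data = zk.getData(path, false, null);
        out.println(path + "\t" + (data == null ? "" : Base64.getEncoder().encodeToString(data)));
        for (String child : zk.getChildren(path, false)) {
            dump(zk, ("/".equals(path) ? "" : path) + "/" + child, out);
        }
    }
}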
