A word of preemptive self-defense: I am not an experienced Java developer.
Please don't throw rotten eggs at me if I did not follow well-known Java
coding patterns :)


Regards,
/Sergey


On Fri, Jul 19, 2013 at 2:15 PM, Sergey Maslyakov <[email protected]> wrote:

> I can share this patch based on 3.4.5, which does the trick.
>
> It adds a "snps" 4lw command that accepts one mandatory argument, which is
> an absolute path for the directory where the snapshot file will be dropped.
> The "absoluteness" of the path is verified by UNIX rules. Not sure how it
> would work in Windows, though. The target directory must exist and be
> writeable by the effective UID of the Zookeeper server.
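
A caller could run the same kind of checks before issuing the command. This is only a minimal sketch in Python, not the code from the patch: the function name check_snapshot_dir is made up for illustration, and the check is only meaningful when it runs on the server host under the same account as the Zookeeper server.

```python
import os
import posixpath


def check_snapshot_dir(path):
    """Return None if `path` looks usable as a snapshot directory,
    otherwise a short human-readable reason.

    Hypothetical helper for illustration; it mirrors the rules described
    in the message (absolute by UNIX rules, existing, writable).
    """
    # UNIX rule for absoluteness: the path must start with '/'.
    if not posixpath.isabs(path):
        return "path must be absolute, i.e., it must start with '/'"
    if not os.path.isdir(path):
        return "directory does not exist"
    # Prefer the effective UID where the platform supports it;
    # otherwise os.access falls back to the real UID.
    use_effective = os.access in os.supports_effective_ids
    if not os.access(path, os.W_OK | os.X_OK, effective_ids=use_effective):
        return "directory is not writable by this user"
    return None
```

With this, check_snapshot_dir("blah") and check_snapshot_dir("") both return a reason string, matching the two rejected requests in the examples.
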
>
> If the operation was successful, Zookeeper server responds back with the
> absolute path of the snapshot file. You can watch for the '/' character to
> trigger your reaction to the response.
>
> In my case, a 700MB snapshot takes about 30 seconds to write out.
>
> Please see several examples below:
>
> ~ $ mkdir /tmp/snapshot-test
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps /tmp/snapshot-test
> /tmp/snapshot-test/snapshot.316c8
> Connection to localhost closed by foreign host.
>
> ~ $ ls -al /tmp/snapshot-test/snapshot.316c8
> -rw-r--r--   1 srvr     srvr     719602373 Jul 19 14:09
> /tmp/snapshot-test/snapshot.316c8
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps blah
> Snapshot directory path must be absoulte, i.e., it must start with '/'.
> Path "blah" does not meet the criteria.
> Connection to localhost closed by foreign host.
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps /tmp/blah
> Error while serializing snapshot into /tmp/blah/snapshot.316c8.
> /tmp/blah/snapshot.316c8 (No such file or directory)
> Connection to localhost closed by foreign host.
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps
> Snapshot directory path must be absoulte, i.e., it must start with '/'.
> Path "" does not meet the criteria.
> Connection to localhost closed by foreign host.
>
> ~ $
>
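
The telnet sessions above could be scripted roughly like this. It is a sketch under assumptions: "snps" exists only with the patch applied, the function names are made up, and I am assuming the patched server accepts the argument terminated by CR LF, the way telnet sends it. Only the first character of the reply is tested, because an error message can also contain '/' further in.

```python
import socket


def send_command(host, port, command, timeout=120.0):
    """Send one command line and return everything the server writes
    back before it closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the argument is terminated by CR LF, as telnet does.
        sock.sendall(command.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def request_snapshot(host, port, directory, timeout=120.0):
    """Ask the patched server for a snapshot in `directory`.

    Returns the snapshot path on success and raises RuntimeError with
    the server's message otherwise.
    """
    reply = send_command(host, port, "snps " + directory, timeout).strip()
    # Success is an absolute path; error texts start with a word.
    if reply.startswith("/"):
        return reply
    raise RuntimeError(reply or "empty response from server")
```

For the first session above, the call would be request_snapshot("localhost", 12181, "/tmp/snapshot-test"). The generous default timeout is there because a large snapshot can take a while to write.
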
>
>
>
> On Fri, Jul 19, 2013 at 1:42 PM, jack ma <[email protected]> wrote:
>
>> Thanks Sergei.
>>
>> That is a great improvement idea for zookeeper. I think that zookeeper is
>> planning to add a new 4lw command "snap", but it is not ready yet.
>>
>> My original questions are based on the current version of zookeeper
>> (3.4.5). Do you have any answers for them?
>>
>> I appreciate the help.
>>
>> thanks
>> Jack
>>
>>
>>
>>
>> On Fri, Jul 19, 2013 at 11:19 AM, Sergey Maslyakov <[email protected]> wrote:
>>
>> > Jack,
>> >
>> > Here is how I see the backup process happening.
>> >
>> > 1. Zookeeper server can be changed to support a new 4lw that will write out
>> > the current state of the DataTree into a snapshot file with the path and
>> > name provided as an argument to this new command (subject to
>> > permissions, disk space, and other system-level restrictions). I would
>> > probably ask Zookeeper to save the snapshot in a directory outside of the
>> > standard "dataLog" for the sake of cleanliness.
>> >
>> > 2. When Zookeeper server responds to the new "snapshot" command with a
>> > success indication, the requesting process knows that the file has been
>> > written out and it can go and process it. It can add some metadata and
>> > create an archive to store it somewhere, for example. Alternatively,
>> > Zookeeper server could stream the data it would have written into a
>> > snapshot as the response to the new "snapshot" command. This way, the
>> > client becomes responsible for persistence, which lifts a number of
>> > permission-related issues (but raises some other issues too). Oh, and by
>> > the way, it looks like snapshot files are rather compressible. I saw
>> > a factor of 20 and more on the data that I have.
>> >
>> > 3. Disk cleanups are performed.
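
The archiving part of step 2 could look something like the sketch below. The function name compress_snapshot is made up for illustration, and gzip is just one choice of compressor. The ratio you get depends entirely on your own data.

```python
import gzip
import os
import shutil


def compress_snapshot(src, dst=None):
    """Gzip the snapshot file `src` into `dst` (default: src + ".gz")
    and return the compression ratio (original size / compressed size).

    Hypothetical helper; the original file is left in place so that the
    cleanup step can remove it once the archive has been stored.
    """
    if dst is None:
        dst = src + ".gz"
    # Stream the file in chunks so a large snapshot is never held in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return os.path.getsize(src) / os.path.getsize(dst)
```
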
>> >
>> > With this backup procedure the restore would turn into:
>> >
>> > 1. Stopping all ensemble members
>> >
>> > 2. Wiping out dataDir/version-2 and dataLogDir/version-2
>> >
>> > 3. Restoring the snapshot taken by the above backup procedure on one of
>> > the servers into dataDir/version-2
>> >
>> > 4. Bringing this server online
>> >
>> > 5. Allowing some time for it to load the snapshot. You could send "isro"
>> > 4lw command to it to see when it stops responding with "null". When the
>> > response becomes "ro" or "rw", this is when it is ready to populate
>> > others with its own data
>> >
>> > 6. Bringing up the other servers one by one, to allow them to form a
>> > quorum with the populated server
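
The waiting in step 5 could be automated along these lines. It is a rough sketch: the function names, the retry count, and the delay are arbitrary choices of mine, and a refused connection is treated the same as "null" on the assumption that the server may not be listening yet.

```python
import socket
import time


def four_letter_word(host, port, word, timeout=5.0):
    """Send a 4lw command and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def wait_until_serving(host, port, attempts=60, delay=1.0):
    """Poll "isro" until the reply is "ro" or "rw" and return it.

    Raises TimeoutError if the server never leaves the "null" state
    within `attempts` polls.
    """
    for _ in range(attempts):
        try:
            state = four_letter_word(host, port, "isro").strip()
        except OSError:
            # Not listening yet or timed out: treat it like "null".
            state = ""
        if state in ("ro", "rw"):
            return state
        time.sleep(delay)
    raise TimeoutError("server did not report ro/rw in time")
```

Once wait_until_serving returns for the restored server, the remaining servers can be started as in step 6.
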
>> >
>> >
>> > Hope this helps! I'd be glad to hear from people who know the internals
>> > of Zookeeper server better whether this approach is flawed or robust.
>> >
>> >
>> > Regards,
>> > /Sergey
>> >
>> >
>> > On Fri, Jul 19, 2013 at 1:00 PM, jack ma <[email protected]> wrote:
>> >
>> > > I asked these questions in the thread
>> > >
>> > > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=eg2xs557+j+698j...@mail.gmail.com%3e
>> > >
>> > > but there has been no response.
>> > >
>> > > So I am posting the questions again here, hoping to get help
>> > > from the community.
>> > >
>> > > I want to make sure I fully understand the procedures for zookeeper
>> > > backup and disaster recovery:
>> > >
>> > > For the backup procedure on a zookeeper ensemble:
>> > > (1) Log in to any host whose state is "Serving"
>> > >            Question:
>> > >                   Do I have to log in to the leader node, or is any
>> > >                   node ok?
>> > > (2) Copy the latest snapshot file and transaction log from the
>> > >     version-2 directory.
>> > >            Question:
>> > >                   How do we make sure we do not copy corrupt files if
>> > >                   the snapshot/transaction log is in the middle of an
>> > >                   update? Do we have to shut down the node to make the
>> > >                   copy?
>> > >                   Besides the transaction log and snapshot, do we have
>> > >                   to copy other files, such as the epoch files?
>> > >
>> > > For the disaster recovery procedure on a zookeeper ensemble:
>> > > (1) Recreate the machines for the zookeeper ensemble
>> > > (2) Copy the snapshot/transaction log we backed up into the zookeeper
>> > >     dataDir\version-2 and logDir\version-2.
>> > >            Question:
>> > >                  Do we have to copy the epoch files?
>> > >                  Do we have to copy the backed-up snapshot/transaction
>> > >                  log to all the zookeeper nodes, or just to the first
>> > >                  node we start?
>> > >
>> > > Appreciate your time and help.
>> > > Jack
>> > >
>> >
>>
>
>
