A word of preemptive self-defense: I am not an experienced Java developer. Please don't throw rotten eggs at me if I did not follow well-known Java coding patterns :)
Regards,
/Sergey

On Fri, Jul 19, 2013 at 2:15 PM, Sergey Maslyakov <[email protected]> wrote:

> I can share this patch based on 3.4.5, which does the trick.
>
> It adds a "snps" 4lw command that accepts one mandatory argument, which is
> an absolute path for the directory where the snapshot file will be dropped.
> The "absoluteness" of the path is verified by UNIX rules. Not sure how it
> would work in Windows, though. The target directory must exist and be
> writable by the effective UID of the Zookeeper server.
>
> If the operation was successful, Zookeeper server responds back with the
> absolute path of the snapshot file. You can watch for the '/' character to
> trigger your reaction to the response.
>
> In my case, a 700MB snapshot takes about 30 seconds to write out.
>
> Please see several examples below:
>
> ~ $ mkdir /tmp/snapshot-test
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps /tmp/snapshot-test
> /tmp/snapshot-test/snapshot.316c8
> Connection to localhost closed by foreign host.
>
> ~ $ ls -al /tmp/snapshot-test/snapshot.316c8
> -rw-r--r-- 1 srvr srvr 719602373 Jul 19 14:09 /tmp/snapshot-test/snapshot.316c8
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps blah
> Snapshot directory path must be absoulte, i.e., it must start with '/'.
> Path "blah" does not meet the criteria.
> Connection to localhost closed by foreign host.
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps /tmp/blah
> Error while serializing snapshot into /tmp/blah/snapshot.316c8.
> /tmp/blah/snapshot.316c8 (No such file or directory)
> Connection to localhost closed by foreign host.
>
> ~ $ telnet localhost 12181
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> snps
> Snapshot directory path must be absoulte, i.e., it must start with '/'.
> Path "" does not meet the criteria.
> Connection to localhost closed by foreign host.
>
> ~ $
>
>
> On Fri, Jul 19, 2013 at 1:42 PM, jack ma <[email protected]> wrote:
>
>> Thanks Sergei.
>>
>> That is a great improvement idea for Zookeeper. I think that Zookeeper is
>> planning to add a new 4lw command "snap", but it is not ready yet.
>>
>> My original questions are based on the current version of Zookeeper
>> (3.4.5). Do you have any answers for them?
>>
>> I appreciate the help.
>>
>> thanks
>> Jack
>>
>>
>> On Fri, Jul 19, 2013 at 11:19 AM, Sergey Maslyakov <[email protected]> wrote:
>>
>> > Jack,
>> >
>> > Here is how I see the backup process happening.
>> >
>> > 1. Zookeeper server can be changed to support a new 4lw that will write
>> > out the current state of the DataTree into a snapshot file with the path
>> > and name provided as an argument to this new command (barring all the
>> > permissions, disk space, and other system-level restrictions). Probably,
>> > I would ask Zookeeper to save the snapshot in a directory outside of the
>> > standard "dataLog" for the sake of cleanliness.
>> >
>> > 2. When Zookeeper server responds to the new "snapshot" command with a
>> > success indication, the requesting process knows that the file has been
>> > written out and it can go and process it. It can add some metadata and
>> > create an archive to store it somewhere, for example. Alternatively,
>> > Zookeeper server could stream the data it would have written into a
>> > snapshot as the response to the new "snapshot" command. This way, the
>> > client becomes responsible for persistence, and this lifts a number of
>> > permission-related issues (but raises some other issues too). Oh, and by
>> > the way, it looks like snapshot files are rather compressible. I saw
>> > compression factors of 20 and more on the data that I have.
>> >
>> > 3. Disk cleanups are performed.
>> >
>> > With this backup procedure the restore would turn into:
>> >
>> > 1. Stopping all ensemble members
>> >
>> > 2. Wiping out dataDir/version-2 and dataLogDir/version-2
>> >
>> > 3. Restoring the snapshot taken by the above backup procedure on one of
>> > the servers into dataDir/version-2
>> >
>> > 4. Bringing this server online
>> >
>> > 5. Allowing some time for it to load the snapshot. You could send the
>> > "isro" 4lw command to it to see when it stops responding with "null".
>> > When the response becomes "ro" or "rw", it is ready to populate the
>> > others with its own data
>> >
>> > 6. Bringing up the other servers one by one, to allow them to form a
>> > quorum with the populated server
>> >
>> >
>> > Hope this helps! I'd be glad to hear from people who know the internals
>> > of Zookeeper server better whether this approach is flawed or robust.
>> >
>> >
>> > Regards,
>> > /Sergey
>> >
>> >
>> > On Fri, Jul 19, 2013 at 1:00 PM, jack ma <[email protected]> wrote:
>> >
>> > > I asked those questions in the thread
>> > >
>> > > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=eg2xs557+j+698j...@mail.gmail.com%3e
>> > > ,
>> > > but there has been no response.
>> > >
>> > > So I am posting those questions again here; hopefully I can get help
>> > > from the community.
>> > >
>> > > I want to make sure I fully understand the procedures for zookeeper
>> > > backup and disaster recovery:
>> > >
>> > > For the backup procedures for the zookeeper ensemble:
>> > > (1) Log in to any host whose state is "Serving".
>> > >     Question:
>> > >         Do I have to log in to the leader node, or is any node ok?
>> > > (2) Copy the latest snapshot file and transaction log from the
>> > > version-2 directory.
>> > >     Question:
>> > >         How do we make sure we do not copy corrupt files if the
>> > > snapshot/transaction log is in the middle of an update? Do we have to
>> > > shut down the node to make the copy?
>> > >         Besides the transaction log and snapshot, do we have to
>> > > copy other files, such as the epoch files?
>> > >
>> > > For the disaster recovery procedures for the zookeeper ensemble:
>> > > (1) Recreate the machines for the zookeeper ensemble.
>> > > (2) Copy the snapshot/transaction log we backed up into the zookeeper
>> > > dataDir\version-2 and logDir\version-2.
>> > >     Question:
>> > >         Do we have to copy the epoch files?
>> > >         Do we have to copy the backed-up snapshot/transaction log to
>> > > all the zookeeper nodes, or just to the first node we start?
>> > >
>> > > Appreciate your time and help.
>> > > Jack
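
For anyone who wants to script the telnet sessions shown in this thread, here is a minimal sketch of a client that sends one command line and reads the reply until the server closes the connection. The class name FourLetterClient and its method names are hypothetical and are not part of ZooKeeper or of the patch. It assumes the server accepts a newline-terminated command, and that "snps" is only available on a server carrying the patch described above. Error replies in the examples also contain '/', so the sketch treats a reply as a snapshot path only when it starts with '/'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of ZooKeeper or of the patch in this thread. */
final class FourLetterClient {

    /**
     * Sends one command line and returns everything the server writes
     * before it closes the connection, with surrounding whitespace trimmed.
     */
    static String send(String host, int port, String command, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Writing a large snapshot can take tens of seconds, so the
            // read timeout for "snps" should be generous.
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            // Assumption: a newline-terminated line is accepted by the server.
            out.write((command + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8).trim();
        }
    }

    /** A successful "snps" reply is an absolute path, so it starts with '/'. */
    static boolean isSnapshotPath(String reply) {
        return reply.startsWith("/");
    }

    /** Per the restore steps above, "isro" answers "ro" or "rw" once ready. */
    static boolean isReady(String isroReply) {
        return "ro".equals(isroReply) || "rw".equals(isroReply);
    }
}
```

A backup script could call FourLetterClient.send("localhost", 12181, "snps /tmp/snapshot-test", 120000) and archive the file only when isSnapshotPath returns true for the reply. During a restore, the same send method could poll "isro" on the first server every few seconds until isReady returns true, and only then start the remaining servers.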
