Your description sounds right to me. In the very worst case, you can start a server from a snapshot file alone. You'll lose the latest txns, but you'll be able to restore some state.
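If it does come to restoring from a snapshot alone, you would normally pick the newest one in the backup. Here is a minimal Python sketch of that selection. It assumes the usual ZooKeeper naming where each snapshot file is called `snapshot.<zxid in hex>`. The helper names `latest_snapshot` and `latest_snapshot_in` are hypothetical. It compares zxids numerically because plain string sorting would rank `snapshot.ff` above `snapshot.100000000`.

```python
import os
import re
from typing import Iterable, Optional

# Assumed naming convention: snapshot.<zxid in hexadecimal>
SNAPSHOT_NAME = re.compile(r"snapshot\.([0-9a-fA-F]+)")


def latest_snapshot(names: Iterable[str]) -> Optional[str]:
    """Return the snapshot file name with the highest zxid, or None."""
    best_name, best_zxid = None, -1
    for name in names:
        match = SNAPSHOT_NAME.fullmatch(name)
        if match is None:
            continue  # skip transaction logs (log.*) and unrelated files
        zxid = int(match.group(1), 16)  # compare numerically, not as strings
        if zxid > best_zxid:
            best_name, best_zxid = name, zxid
    return best_name


def latest_snapshot_in(directory: str) -> Optional[str]:
    """Hypothetical helper: pick the newest snapshot in a backup directory."""
    return latest_snapshot(os.listdir(directory))
```

For example, `latest_snapshot(["log.100000001", "snapshot.ff", "snapshot.100000000"])` picks `snapshot.100000000`.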
You may also want to have a look at Exhibitor (https://github.com/Netflix/exhibitor). I'm not sure whether it is still being maintained, but it used to be a good tool for managing ZK.

-Flavio

> On 15 Jun 2016, at 18:59, Gokul wrote:
>
> Hi,
>
> I'm working on taking periodic snapshots of the ZooKeeper data dir for
> point-in-time backup and recovery. I am using ZooKeeper version 3.4.5.
>
> I'm wondering whether there can be a race condition between my backup
> process (reading data: transaction logs and snapshots) and the ZK process
> (writing to the transaction log) that results in backing up a corrupted
> transaction log that cannot be used for recovery.
>
> 1. The ZK server appends each transaction to the transaction log
>    (FileTxnLog.append) using a BufferedOutputStream, so transactions may
>    already be written to disk at this stage (when the buffer is full).
> 2. The ZK server also forcibly flushes transactions to disk (the
>    transaction log) in SyncRequestProcessor, either when there are no
>    incoming messages or in batches of 1000 when messages keep pouring in.
> 3. In step 1, each transaction is written to disk in these 4 sequential
>    steps, and each step on its own may result in data being written to disk:
>    1. Write checksum (writeLong)
>    2. Write transaction length in bytes (writeInt)
>    3. Write transaction (header & data) bytes (write bytes)
>    4. Write EOF marker (write byte)
> 4. In the recovery flow, each transaction is read as follows:
>    1. Read checksum (readLong) - throws IOException
>    2. Read transaction length (readInt) - throws IOException
>    3. Read transaction bytes (read bytes) - throws IOException
>    4. Read EOF marker (read byte); if not found, ignore the transaction
>       and stop reading
>
> Let's say the backup process took the backup after any of steps 3.1 to 3.4
> had completed, or while 3.3 was in progress (half written). Looking at the
> read flow, I think I would still be able to restore all the transactions
> except the last, incomplete one. So is there any other way the backed-up
> transaction log could end up unreadable? For example, if the backup was
> taken while 3.1 or 3.2 was in progress?
>
> --
> Thanks and Regards,
> Gokul
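The read flow in the quoted message can be checked with a small simulation. This is a minimal Python sketch, not ZooKeeper's FileTxnLog code. It assumes a simplified record layout following steps 3.1 to 3.4: an 8-byte checksum, a 4-byte length, the payload, and a 1-byte end marker. Adler-32 as the checksum and 0x42 as the marker value are assumptions for illustration. The sketch models a backup taken mid-write as a copy cut off at an arbitrary byte. Treating a non-positive length as end of log and raising on a checksum mismatch are this sketch's own choices. In this model, every cut point, including ones inside the checksum or length fields, recovers all fully written records and drops only the partial one.

```python
import struct
import zlib
from typing import List

END_MARKER = 0x42  # assumed end-of-record byte, for illustration only
HEADER = struct.Struct(">qi")  # checksum (long) + length (int), big-endian


def encode_record(payload: bytes) -> bytes:
    """Write one transaction as in steps 3.1-3.4: checksum, length, bytes, marker."""
    checksum = zlib.adler32(payload)  # assumed checksum algorithm
    return HEADER.pack(checksum, len(payload)) + payload + bytes([END_MARKER])


def read_records(data: bytes) -> List[bytes]:
    """Read records in order, stopping at the first incomplete one.

    A short read of the header (steps 4.1/4.2), of the payload (4.3), or a
    missing end marker (4.4) ends the scan; everything before it is kept.
    """
    records: List[bytes] = []
    pos = 0
    while pos + HEADER.size <= len(data):
        checksum, length = HEADER.unpack_from(data, pos)
        if length <= 0:
            break  # sketch assumption: a non-positive length means end of log
        start = pos + HEADER.size
        end = start + length
        if end + 1 > len(data):
            break  # payload or end marker was not fully copied
        if data[end] != END_MARKER:
            break  # step 4.4: no marker, ignore this record and stop
        payload = data[start:end]
        if zlib.adler32(payload) != checksum:
            raise ValueError(f"checksum mismatch at offset {pos}")
        records.append(payload)
        pos = end + 1
    return records


if __name__ == "__main__":
    txns = [b"create /a", b"setData /a", b"delete /a"]
    log = b"".join(encode_record(t) for t in txns)
    # Cut the "backup" at every byte offset, including inside the checksum
    # and length fields, and check that only the partial record is lost.
    for cut in range(len(log) + 1):
        recovered = read_records(log[:cut])
        assert recovered == txns[: len(recovered)]
    print("every cut point recovers a clean prefix of the transactions")
```

The model only covers a copy that is a clean prefix of what the server wrote. Other ways the copied bytes could differ are outside what it checks.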
