Thanks. I think freeze and thaw APIs are needed in ZK to support a point-in-time backup.

On Thu, Jun 16, 2016 at 2:08 AM, Flavio Junqueira <[email protected]> wrote:

> Your description sounds right to me. In the very worst case, you can start
> a server with a snapshot file. You'll lose the latest txns, but will be
> able to restore some state.
>
> You may also want to have a look at Exhibitor:
>
> https://github.com/Netflix/exhibitor
>
> I'm not sure if it is being maintained, but used to be a good tool to
> manage ZK.
>
> -Flavio
>
> > On 15 Jun 2016, at 18:59, Gokul <[email protected]> wrote:
> >
> > Hi,
> >
> > I'm working on taking periodic snapshots of zookeeper data dir for
> > backup(point-in-time) and recovery. I am using Zookeeper 3.4.5 version.
> >
> > I'm just wondering whether there will be a race condition between my
> > backup process(reading data - trans log & snapshots) and the ZK
> > process(writing to trans log) resulting in backing up a
> > corrupted(non-usable for recovery) transaction log.
> >
> > 1. ZK server appends(FileTxnLog.append) each transaction to transaction
> >    log using BufferedOutputStream, so transactions may be committed(buffer
> >    full) to disk at this stage
> > 2. Also ZK server forcefully flushes(SyncRequestProcessor) transaction
> >    to disk(transaction log) if there are no incoming messages or in
> >    batches of 1000 if there are incoming messages pouring in continuously
> > 3. In step 1, each transaction is written disk in these 4 sequential
> >    steps wherein each step execution separately may result in the data
> >    being written to disk
> >    1. Write checksum (writeLong)
> >    2. Write transaction length in bytes (writeInt)
> >    3. Write transaction(header & data) in bytes (write bytes)
> >    4. Write EOF marker (write byte)
> > 4. Each transaction is read in the following way in recovery flow
> >    1. Read checksum (readLong) - throws IOException
> >    2. Read transaction length (readInt) - throws IOException
> >    3. Read transaction bytes (read bytes) - throws IOException
> >    4. Read EOF marker (read byte), if not found ignore the transaction
> >       and stop reading
> >
> > Let's say the backup process took backup when any of 3.1 to 3.4 are
> > complete or 3.3 is in progress(half written). Looking at the read flow, I
> > think even then I would be able to restore all the transactions except
> > the last partial transaction which was incomplete. So Is there any other
> > possibility of not able to read transaction log? Like when the backup was
> > done when 3.1 or 3.2 were in progress?
> >
> > --
> > Thanks and Regards,
> > Gokul

--
Thanks and Regards,
Gokul
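
P.S. The write steps (3.1 to 3.4) and read steps (4.1 to 4.4) quoted above can be sketched as a small standalone Java program. This is only an illustration of the record layout as described in this thread, not ZooKeeper's actual code: the class name TxnLogSketch, the helper methods, the use of Adler32 for the checksum, and the marker byte value 0x42 are all assumptions made for the example. It cuts a two-record log at every possible byte offset and counts how many whole transactions the reader recovers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Adler32;

// Hypothetical sketch of the record layout discussed in the thread.
public class TxnLogSketch {
    // Assumed end-of-record marker value, chosen for illustration.
    static final byte END_MARKER = 0x42;

    // Steps 3.1 to 3.4: checksum, length, payload, end marker.
    public static void append(DataOutputStream out, byte[] txn) throws IOException {
        Adler32 crc = new Adler32();
        crc.update(txn, 0, txn.length);
        out.writeLong(crc.getValue()); // 3.1 checksum
        out.writeInt(txn.length);      // 3.2 length
        out.write(txn);                // 3.3 payload
        out.writeByte(END_MARKER);     // 3.4 end marker
    }

    // Steps 4.1 to 4.4: return every complete record, stop at the first
    // record that is truncated, unterminated, or fails its checksum.
    public static List<byte[]> readComplete(byte[] log) throws IOException {
        List<byte[]> txns = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                long expected = in.readLong();          // 4.1
                int len = in.readInt();                 // 4.2
                if (len <= 0 || len > log.length) {
                    break; // no usable record here
                }
                byte[] txn = new byte[len];
                in.readFully(txn);                      // 4.3
                if (in.readByte() != END_MARKER) {      // 4.4
                    break; // partial last transaction
                }
                Adler32 crc = new Adler32();
                crc.update(txn, 0, len);
                if (crc.getValue() != expected) {
                    break; // corrupted record
                }
                txns.add(txn);
            }
        } catch (EOFException e) {
            // The copy ended in the middle of a record; keep what was read.
        }
        return txns;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        append(out, "create /a".getBytes(StandardCharsets.UTF_8));
        append(out, "create /b".getBytes(StandardCharsets.UTF_8));
        byte[] full = buf.toByteArray();

        // Simulate a backup taken at every possible byte offset.
        for (int cut = 0; cut <= full.length; cut++) {
            int recovered = readComplete(Arrays.copyOf(full, cut)).size();
            System.out.println("cut=" + cut + " recovered=" + recovered);
        }
    }
}
```

In this sketch a cut inside the checksum or length field (3.1 or 3.2 in progress) behaves the same as a cut inside the payload: the reader hits end of file and drops only the last, incomplete record. It does not model anything else the real server may do to the log file, so it supports the reasoning above without proving it for an actual 3.4.5 data dir.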
