Creating a new znode for each update isn't really necessary. Just create a
file that will contain all of the updates for the next snapshot and do
atomic updates to add to the list of updates belonging to that snapshot.
When you complete the snapshot, you will create a new file. After a time
you can delete the old snapshot lists since they are now redundant. This
will leave only a few snapshot files in your directory and getChildren will
be fast. Getting the contents of the file will give you a list of
transactions to apply and when you are done with those, you can get the file
again to get any new ones before considering yourself to be up to date. The
snapshot file doesn't need to contain the updates themselves, but instead
can contain pointers to other znodes that would actually contain the updates.

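
A rough sketch of the scheme described above, written in Python against an in-memory stand-in rather than a real ZooKeeper client. FakeZnodeStore, append_update, catch_up and the /snapshots/snap-1 path are all made up for illustration. The only thing borrowed from ZooKeeper is the idea of a versioned conditional write: in the Java API, setData(path, data, version) fails with a BadVersion error when the version you pass is stale.

```python
import threading


class BadVersionError(Exception):
    """Raised when a conditional write names a stale version."""


class FakeZnodeStore:
    """Hypothetical in-memory stand-in: each path holds (data, version)."""

    def __init__(self):
        self._nodes = {}
        self._lock = threading.Lock()

    def create(self, path, data=b""):
        with self._lock:
            if path in self._nodes:
                raise KeyError(path)
            self._nodes[path] = (data, 0)

    def get(self, path):
        with self._lock:
            return self._nodes[path]

    def set(self, path, data, expected_version):
        # The write succeeds only if nobody changed the node since we read it.
        with self._lock:
            _, version = self._nodes[path]
            if version != expected_version:
                raise BadVersionError(path)
            self._nodes[path] = (data, version + 1)


def _entries(data):
    return data.decode().split("\n") if data else []


def append_update(store, snapshot_path, update_ref):
    """Add update_ref to the snapshot's list, retrying on version conflicts."""
    while True:
        data, version = store.get(snapshot_path)
        entries = _entries(data) + [update_ref]
        try:
            store.set(snapshot_path, "\n".join(entries).encode(), version)
            return
        except BadVersionError:
            continue  # another writer got in first; re-read and retry


def catch_up(store, snapshot_path, applied_count, apply):
    """Apply unseen entries, re-reading the file until nothing new appears."""
    while True:
        data, _ = store.get(snapshot_path)
        entries = _entries(data)
        if len(entries) == applied_count:
            return applied_count  # up to date as of this read
        for ref in entries[applied_count:]:
            apply(ref)
        applied_count = len(entries)


# Usage: two logged operations, then a backup that joins and catches up.
store = FakeZnodeStore()
store.create("/snapshots/snap-1")
append_update(store, "/snapshots/snap-1", "op-a")
append_update(store, "/snapshots/snap-1", "op-b")
applied = []
count = catch_up(store, "/snapshots/snap-1", 0, applied.append)
```

The entries here are just names, which matches the idea of storing pointers to other znodes rather than the updates themselves. A backup only needs to remember how many entries it has applied, so it never has to list a large directory.
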
I think that the tendency to use file creation as the basic atomic operation
is a holdover from days when we used filesystems that way. With ZK, file
updates are ordered and atomic, and you know that you updated the right
version, which makes many uses of directory updates much less natural.

On Tue, Jul 20, 2010 at 7:26 PM, André Oriani wrote:
> Hi folks,
> I was considering using ZooKeeper to implement a replication protocol due
> to the global order guarantee. In my case, operations are logged by creating
> persistent sequential znodes. Knowing the name of the last applied znode,
> backups can identify pending operations and apply them in order. Because I
> want to allow backups to join the system at any time, I will not delete a
> znode before a checkpoint. Thus, I can end up with thousands of child
> nodes and consequently ZooKeeper.getChildren() calls might be very slow,
> since a huge list of nodes will be returned.
> I thought of using another znode to store the last created znode. So if the
> last applied znode was op-11 and last created znode was op-14, I would try
> to read op-12 and op-13. However, in order to protect against partial
> failure, I have to encode some extra information (I am using
> <session-id>-<local sequential number>) in the name of znodes. Thus it is
> not possible to predict their names (they'll be op-<almost random
> string>-<zookeeper seq number>). Consequently, I will have to call
> getChildren() anyway.
> Has somebody faced the same issue? Has anybody found a better solution?
> I was thinking of extending ZooKeeper code to have some kind of indexed
> access to child znodes, but I don't know how easy/clever that is.