I was considering using ZooKeeper to implement a replication protocol because
of its global ordering guarantee. In my case, operations are logged by creating
persistent sequential znodes. Knowing the name of the last applied znode,
backups can identify pending operations and apply them in order. Because I
want to allow backups to join the system at any time, I will not delete a
znode before a checkpoint. Thus, I can end up with thousands of child
nodes, and consequently ZooKeeper.getChildren() calls might be very expensive,
since a huge list of nodes will be returned.
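
To make the cost concrete, this is roughly what each backup has to do with the list returned by getChildren(). It is only a sketch: the class and method names (PendingOps, sequenceOf, pendingAfter) are made up for illustration, and it assumes the znode names end with the zero-padded 10-digit counter that ZooKeeper appends to sequential znodes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: filters and orders the names returned by getChildren().
final class PendingOps {
    // ZooKeeper appends a zero-padded 10-digit counter to sequential znode names.
    static final int SEQ_DIGITS = 10;

    // Extracts the trailing sequence number, e.g. "op-0000000012" -> 12.
    // Assumes the name is at least SEQ_DIGITS characters long.
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - SEQ_DIGITS));
    }

    // Returns the children created after lastAppliedSeq, in creation order.
    // Every child name is scanned, so the work grows with the size of the log.
    static List<String> pendingAfter(List<String> children, long lastAppliedSeq) {
        List<String> pending = new ArrayList<>();
        for (String name : children) {
            if (sequenceOf(name) > lastAppliedSeq) {
                pending.add(name);
            }
        }
        pending.sort(Comparator.comparingLong(PendingOps::sequenceOf));
        return pending;
    }
}
```

Since only the suffix is parsed, the same filter works whatever comes before the counter in the name, but it still needs the full child list to be transferred first.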
I thought of using another znode to store the name of the last created znode.
So if the last applied znode was op-11 and the last created znode was op-14, I
would try
to read op-12 and op-13. However, in order to protect against partial
failure, I have to encode some extra information (I am using
<session-id>-<local sequential number>) in the znode names. Thus it is
not possible to predict their names (they'll be op-<almost random
string>-<zookeeper seq number>). Consequently, I will have to call
Has somebody faced the same issue? Has anybody found a better solution?
I was thinking of extending the ZooKeeper code to support some kind of indexed
access to child znodes, but I don't know how easy or sensible that would be.
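
For comparison, this is the kind of direct lookup I had in mind for the "last created" pointer, and it only works when the names are fully predictable. Again just a sketch: GapNames and namesBetween are made-up names, and the plain "op-" prefix followed by the 10-digit counter is an assumption that does not hold once the session information is encoded in the name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: derives the znode names lying strictly between two sequence numbers.
final class GapNames {
    // Builds prefix + zero-padded 10-digit counter for every sequence number
    // strictly between lastApplied and lastCreated.
    static List<String> namesBetween(String prefix, long lastApplied, long lastCreated) {
        List<String> names = new ArrayList<>();
        for (long seq = lastApplied + 1; seq < lastCreated; seq++) {
            names.add(String.format(Locale.ROOT, "%s%010d", prefix, seq));
        }
        return names;
    }
}
```

With last applied 11 and last created 14, namesBetween("op-", 11, 14) gives op-0000000012 and op-0000000013, which could then be read one by one without listing the parent.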