Hi zookeepers,

While digging into ZooKeeper's internals, I noticed the following flaw in how 
znode versions work: a znode's version is reset when the znode is deleted and 
re-created. This is a trap for any operation that makes conditional updates 
based on the znode version. 

Here is an example: a client reads the data of a znode (e.g., /test) along with 
its version (e.g., 1), changes the data, and writes it back on the condition 
that the version has not changed (is still 1). If another client deletes and 
re-creates the znode while the first client is preparing its update, the 
version can match again, so the write succeeds even though the znode is now a 
different one and the write clobbers its data.
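
To make the scenario concrete, here is a small sketch in Python. It does not 
talk to a real ZooKeeper server: TinyStore is a hypothetical in-memory stand-in 
that I made up for illustration, and it only assumes that a version starts at 0 
on creation and increases by 1 on each successful update.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class TinyStore:
    """Hypothetical in-memory stand-in for a znode tree (not the real API)."""

    def __init__(self):
        self.nodes = {}  # path -> [data, version]

    def create(self, path, data):
        # A newly created node always starts at version 0.
        self.nodes[path] = [data, 0]

    def delete(self, path):
        del self.nodes[path]

    def get(self, path):
        data, version = self.nodes[path]
        return data, version

    def set_data(self, path, data, expected_version):
        # Conditional update: succeed only if the version is unchanged.
        node = self.nodes[path]
        if node[1] != expected_version:
            raise BadVersionError(path)
        node[0] = data
        node[1] += 1
        return node[1]


def demo_lost_update():
    store = TinyStore()
    store.create("/test", b"a")
    store.set_data("/test", b"b", 0)        # version is now 1

    # Client A reads the data and remembers version 1.
    data, version = store.get("/test")

    # Client B deletes and re-creates the node, then updates it once,
    # which brings the version counter back to 1.
    store.delete("/test")
    store.create("/test", b"x")
    store.set_data("/test", b"y", 0)

    # Client A's conditional write passes the version check and
    # silently overwrites client B's data.
    store.set_data("/test", data + b"-edited", version)
    return store.get("/test")
```

In this sketch, client A's write is accepted although the node it read from no 
longer exists.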

The root of the problem, as I see it, is that the znode version is a plain 
integer counter that restarts on re-creation, so it is only monotonically 
increasing within one incarnation of the znode. If we included the birth date 
(timestamp) of the znode, or the zxid of its creation, as part of the version, 
and incremented only the integer part on each update while keeping the 
birth-date or zxid part fixed, we could avoid the problem.
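
A rough sketch of this idea in Python follows. Again, nothing here is real 
ZooKeeper code: CompositeVersion and TinyStore are hypothetical names, and the 
store simply hands out an increasing transaction id (standing in for the zxid) 
on every write. The version becomes a pair (creation zxid, counter), and a 
conditional update must match both parts.

```python
from typing import NamedTuple


class CompositeVersion(NamedTuple):
    """Hypothetical version: creation zxid plus per-incarnation counter."""
    czxid: int
    counter: int


class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class TinyStore:
    """Hypothetical in-memory stand-in for a znode tree (not the real API)."""

    def __init__(self):
        self.nodes = {}      # path -> [data, CompositeVersion]
        self.next_zxid = 1   # assumed: every write consumes one zxid

    def _take_zxid(self):
        zxid = self.next_zxid
        self.next_zxid += 1
        return zxid

    def create(self, path, data):
        # The creation zxid is fixed for the lifetime of this incarnation.
        self.nodes[path] = [data, CompositeVersion(self._take_zxid(), 0)]

    def delete(self, path):
        self._take_zxid()
        del self.nodes[path]

    def get(self, path):
        data, version = self.nodes[path]
        return data, version

    def set_data(self, path, data, expected_version):
        node = self.nodes[path]
        if node[1] != expected_version:
            raise BadVersionError(path)
        self._take_zxid()
        node[0] = data
        # Only the counter part increases; the creation zxid stays the same.
        node[1] = CompositeVersion(node[1].czxid, node[1].counter + 1)
        return node[1]


def demo_stale_write_rejected():
    store = TinyStore()
    store.create("/test", b"a")
    store.set_data("/test", b"b", store.get("/test")[1])

    # Client A reads the data and remembers the composite version.
    data, version = store.get("/test")

    # Client B deletes, re-creates, and updates the node once.
    store.delete("/test")
    store.create("/test", b"x")
    store.set_data("/test", b"y", store.get("/test")[1])

    # The counters are equal, but the creation zxids differ,
    # so client A's stale write is refused.
    try:
        store.set_data("/test", data + b"-edited", version)
    except BadVersionError:
        return "rejected", store.get("/test")
    return "accepted", store.get("/test")
```

With the pair as the version, the counter alone coming back to the same value 
is no longer enough for a stale write to pass the check.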

Of course, the new design comes at a cost: the version field would need to be 
larger.

Thanks,
- Robin 
