Hi!

I've been trying to understand the ZooKeeper semantics when it comes
to ordering of watch notifications and other requests. Based on the
technical documentation and the book, I've been able to follow the
main rules but they seem to be a bit unclear when it comes to an
actual implementation.

(A) Order w.r.t. updated node data - I found the following statement
in the documentation: "ZooKeeper provides an ordering guarantee: a
client will never see a change for which it has set a watch until it
first sees the watch event."
At the same time, I found the following passage in the ZooKeeper book:
"One important guarantee of notifications is that they are delivered
to a client before any other change is made to the same znode. If a
client sets a watch to a znode and there are two consecutive updates
to the znode, the client receives the notification after the first
update and before it has a chance to observe the second update by,
say, reading the znode data."
(B) Order of watches - if there are two state changes u and u', watch
notifications corresponding to both of them must be delivered in the
same order.
(C) Order of system changes - if there are state updates u and u'
related to nodes a and b, respectively, and a client has set a watch
on node a, the client **cannot** read the new value of b before seeing a
watch event related to a.

Questions:
1) I'm confused by the order of operations implied by (A): clients
can't observe the new state before receiving a watch event but when
they receive it, the new state must be available. Thus, the server
must send watch notifications to each client and update the node data.
What's the order there? Whether the server first updates the node
contents or first sends the watch notification, a race seems possible:
either a client receives the watch notification and performs a read
before the node data is updated (stale data), or a client reads the
updated data before receiving the watch notification. Such scenarios
are unlikely, but they can happen with non-deterministic delays.
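
To make the race concrete, here is a minimal Python sketch of the toy
model I have in mind. It is not ZooKeeper code: the names run and
anomalies are hypothetical, and the model assumes that the server
performs the update in two separate steps ("write" and "notify") and
that a client read can be scheduled at any point between them, with no
ordering enforced between notifications and reads.

```python
from itertools import permutations


def run(server_order, schedule):
    """Replay one interleaving of two server steps and one client read.

    server_order: the order in which the (hypothetical) server performs
                  "write" (update node data) and "notify" (deliver watch event).
    schedule:     a sequence of "server"/"client" turns.
    Returns what the client observed at each read: (notified, data).
    """
    data = "old"
    notified = False
    observations = []
    server_steps = iter(server_order)
    for actor in schedule:
        if actor == "server":
            step = next(server_steps)
            if step == "write":
                data = "new"
            else:  # "notify"
                notified = True
        else:
            # The client reads the znode at this point in the schedule.
            observations.append((notified, data))
    return observations


def anomalies(server_order):
    """Collect the anomalies reachable under any interleaving."""
    found = set()
    for schedule in set(permutations(["server", "server", "client"])):
        for notified, data in run(server_order, schedule):
            if data == "new" and not notified:
                found.add("new data before watch event")
            if data == "old" and notified:
                found.add("stale read after watch event")
    return found


print(anomalies(("write", "notify")))
print(anomalies(("notify", "write")))
```

In this toy model each server ordering admits exactly one of the two
anomalies, which is why I suspect the real implementation must impose
some ordering between notifications and read responses.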

How is this resolved in ZooKeeper? Is the node data not overwritten
in place, with two copies kept until all watch notifications are
acknowledged, at which point the old copy is no longer needed to serve
the stale value? Are read requests from client X stalled until the
watch event is acknowledged by client X? Or is there another solution
employed there?

2) Does (C) imply that all updates received by a local server are
applied in a serial manner, i.e., for update u the server must receive
an acknowledgement of watch notification from each client interested
in it, before proceeding with update u'? Otherwise, a read issued by
the client might return the new data before it has received the watch
event; we can assume an unexpected network delay.

Thanks in advance for any help you can provide!
Best regards,
Marcin Copik
