Hi!

I've been trying to understand ZooKeeper's semantics for the ordering of watch notifications relative to other requests. Based on the technical documentation and the book, I've been able to follow the main rules, but they seem a bit unclear when it comes to an actual implementation.

(A) Order w.r.t. updated node data: I found the following statement in the documentation: "ZooKeeper provides an ordering guarantee: a client will never see a change for which it has set a watch until it first sees the watch event." At the same time, I found the following passage in the ZooKeeper book: "One important guarantee of notifications is that they are delivered to a client before any other change is made to the same znode. If a client sets a watch to a znode and there are two consecutive updates to the znode, the client receives the notification after the first update and before it has a chance to observe the second update by, say, reading the znode data."

(B) Order of watches: if there are two state changes u and u', the watch notifications corresponding to them must be delivered in the same order.

(C) Order of system changes: if there are state updates u and u' related to nodes a and b, respectively, and a client has set a watch on node a, the client **cannot** read the new value of b before seeing the watch event related to a.

Questions:

1) I'm confused by the order of operations implied by (A): clients can't observe the new state before receiving a watch event, but once they receive it, the new state must be available. Thus, the server must both send watch notifications to each client and update the node data. In what order does it do so? Regardless of whether the server first updates the node contents or first sends the watch notifications, it can happen that a client receives the watch notification and performs a read before the node data is updated (stale data), or that a client reads the updated data before receiving the watch notification. Such scenarios are unlikely, but they can happen with non-deterministic delays. How is this resolved in ZooKeeper? Is the node data not overwritten in place, with two copies kept until all watch notifications are acknowledged, at which point the old copy is no longer needed to serve the stale value?
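
The two problematic interleavings from question 1 can be made concrete with a toy model in Python. This is only a sketch of the scenario being asked about, not of ZooKeeper's implementation: `ToyServer`, `update_then_notify`, and `notify_then_update` are hypothetical names, and the model assumes that a client read can land between the data update and the delivery of the watch event.

```python
# Toy model only: this is NOT how ZooKeeper is implemented.
# ToyServer and both scenario functions are hypothetical names.

class ToyServer:
    """A single znode plus a log of watch events already delivered to one client."""

    def __init__(self, value):
        self.value = value
        self.delivered_events = []

    def apply_update(self, new_value):
        # Overwrite the node data in place.
        self.value = new_value

    def notify(self):
        # Deliver the watch event to the client.
        self.delivered_events.append("NodeDataChanged")

    def client_read(self):
        # Return the value the client reads and whether it has seen the event yet.
        return self.value, bool(self.delivered_events)


def update_then_notify():
    """Server overwrites the data first, then sends the watch event."""
    server = ToyServer("old")
    server.apply_update("new")
    observed = server.client_read()  # read lands between the two steps
    server.notify()
    return observed


def notify_then_update():
    """Server sends the watch event first, then overwrites the data."""
    server = ToyServer("old")
    server.notify()
    observed = server.client_read()  # read lands between the two steps
    server.apply_update("new")
    return observed


print(update_then_notify())  # ('new', False): new data seen before the event
print(notify_then_update())  # ('old', True): stale data read after the event
```

In this model either ordering produces one of the two anomalies, which is exactly why the question asks what additional mechanism rules them out.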

Alternatively, are read requests from client X stalled until the watch event is acknowledged by client X? Or is there another solution employed there?

2) Does (C) imply that all updates received by a local server are applied serially, i.e., for update u the server must receive an acknowledgement of the watch notification from each interested client before proceeding with update u'? Otherwise, a read issued by the client might return the new data before the client has received the watch event, since we can assume an unexpected network delay.

Thanks in advance for any help you can provide!

Best regards,
Marcin Copik