Wait-free means that from the user's point of view, all operations will proceed or fail immediately. Synchronization internal to the server to ensure coherent memory access doesn't really apply here.
The point is that ZK is not a lock manager and there is no corollary to a wait operation. This is extremely important because locks encourage timeout designs which are notoriously unreliable (at least in expectation when implemented by normal human engineers). The high level operations that have to happen are, roughly, a) the node receiving your request gets the request and validates that it appears well formed. b) if the request is a read operation, it is satisfied by direct access to the data c) if the request is a write operation, it is forwarded to the master d) the master looks at memory and the queue to decide if this operation will succeed or fail e) if the operation will fail, the failure is returned f) if the operation will succeed, then it is converted to an idempotent operation and sent to the rest of the cluster for committing g) once a quorum commits the operation, success is returned h) if a quorum fails to commit the operation in time, failure is returned This is a moderately long list, but nothing here has any long lived locks or much chance for starvation. That leads to the claim that all operations are lock-less from the user point of view. On Mon, Dec 10, 2012 at 9:45 AM, Dmitry Kuvayskiy < [email protected]> wrote: > Hello all. > > The paper "ZooKeeper: Wait-free coordination for Internet-scale > systems" claims that znodes are wait-free objects (citing: ``Our > system, Zookeeper, hence implements an API that manipulates simple > _wait-free_ data objects organized hierarchically as in file > systems''). But I wonder if they are really "wait-free" (in the sense > that "every operation has a bound on the number of steps the algorithm > will take before the operation completes")? > As far as I know, ZooKeeper uses locks (well, "synchronized" keyword) > internally. I also cannot imagine a system that big and complex > working only with wait-free data structures. And if we use locks > internally, how can we state that the external interface is wait-free? > Also, I didn't find anything about "wait-freedom" on the > http://zookeeper.apache.org/ site. So can anyone explain a bit, what > this "wait-free" really means and how it is implemented in the code? > > -- > Yours sincerely, > Dmitry Kuvayskiy >
