Thank you. Now it's clear. On Mon, Dec 10, 2012 at 6:58 PM, Ted Dunning <[email protected]> wrote: > Wait-free means that from the user's point of view, all operations will > proceed or fail immediately. Synchronization internal to the server to > ensure coherent memory access doesn't really apply here. > > The point is that ZK is not a lock manager and there is no corollary to a > wait operation. This is extremely important because locks encourage > timeout designs which are notoriously unreliable (at least in expectation > when implemented by normal human engineers). > > The high level operations that have to happen are, roughly, > > a) the node receiving your request gets the request and validates that it > appears well formed. > > b) if the request is a read operation, it is satisfied by direct access to > the data > > c) if the request is a write operation, it is forwarded to the master > > d) the master looks at memory and the queue to decide if this operation > will succeed or fail > > e) if the operation will fail, the failure is returned > > f) if the operation will succeed, then it is converted to an idempotent > operation and sent to the rest of the cluster for committing > > g) once a quorum commits the operation, success is returned > > h) if a quorum fails to commit the operation in time, failure is returned > > This is a moderately long list, but nothing here has any long lived locks > or much chance for starvation. That leads to the claim that all operations > are lock-less from the user point of view. > > > On Mon, Dec 10, 2012 at 9:45 AM, Dmitry Kuvayskiy < > [email protected]> wrote: > >> Hello all. >> >> The paper "ZooKeeper: Wait-free coordination for Internet-scale >> systems" claims that znodes are wait-free objects (citing: ``Our >> system, Zookeeper, hence implements an API that manipulates simple >> _wait-free_ data objects organized hierarchically as in file >> systems''). But I wonder if they are really "wait-free" (in the sense >> that "every operation has a bound on the number of steps the algorithm >> will take before the operation completes")? >> As far as I know, ZooKeeper uses locks (well, "synchronized" keyword) >> internally. I also cannot imagine a system that big and complex >> working only with wait-free data structures. And if we use locks >> internally, how can we state that the external interface is wait-free? >> Also, I didn't find anything about "wait-freedom" on the >> http://zookeeper.apache.org/ site. So can anyone explain a bit, what >> this "wait-free" really means and how it is implemented in the code? >> >> -- >> Yours sincerely, >> Dmitry Kuvayskiy >>
-- Yours sincerely, Dmitry Kuvayskiy
