Re: zookeeper client session write-read consistency

Chris Nauroth Thu, 03 Mar 2016 11:26:51 -0800

Hello Wayne,

No, this cannot happen.  ZooKeeper guarantees that a client session will
never "go back in time" and read state older than state that the client
session already read.  This is accomplished by exchanging a transaction
ID, called the "zxid", between the server and client.  Every response
from the server tells the client the server's last known committed zxid.
The client remembers this zxid.  If at any point a client needs to
reestablish its session, ZooKeeper guarantees that the session is
reestablished only with a server whose zxid is equal to or greater than
the client's last observed zxid.

For more details, please see pages 9-10 of "ZooKeeper: Wait-free
coordination for Internet-scale systems".

http://static.cs.brown.edu/courses/cs227/archives/2012/papers/replication/hunt.pdf

Quoting that paper:

ZooKeeper servers process requests from clients in
FIFO order. Responses include the zxid that the response
is relative to. Even heartbeat messages during intervals
of no activity include the last zxid seen by the server that
the client is connected to. If the client connects to a new
server, that new server ensures that its view of the ZooKeeper
data is at least as recent as the view of the client
by checking the last zxid of the client against its last zxid.
If the client has a more recent view than the server, the
server does not reestablish the session with the client until
the server has caught up. The client is guaranteed to
be able to find another server that has a recent view of the
system since the client only sees changes that have been
replicated to a majority of the ZooKeeper servers. This
behavior is important to guarantee durability.
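
As a rough illustration of the zxid check described in that passage, here
is a toy model in Python.  It is only a sketch under simplifying
assumptions: the Server and Client classes and their method names are
hypothetical, not part of the ZooKeeper API, and each server is reduced to
a single integer for the last zxid it has applied.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy replica (hypothetical): tracks only the last zxid it has applied."""
    name: str
    last_zxid: int = 0

    def accept_session(self, client_last_zxid: int) -> bool:
        # Refuse the session while this replica is behind the client's view.
        return self.last_zxid >= client_last_zxid


@dataclass
class Client:
    """Toy client (hypothetical): remembers the highest zxid seen so far."""
    last_seen_zxid: int = 0

    def observe(self, response_zxid: int) -> None:
        # Every response carries a zxid; keep the largest one seen.
        self.last_seen_zxid = max(self.last_seen_zxid, response_zxid)

    def reconnect(self, servers):
        # Try servers in order; return the first one that is caught up.
        for server in servers:
            if server.accept_session(self.last_seen_zxid):
                return server
        return None


# s1 and s2 have applied zxid 5; s3 is lagging at zxid 4.
s1, s2, s3 = Server("s1", 5), Server("s2", 5), Server("s3", 4)
client = Client()
client.observe(5)                        # the client has seen zxid 5
chosen = client.reconnect([s3, s2, s1])  # s3 is tried first and refuses
print(chosen.name)
```

In this model the lagging server s3 turns the client away, so the session
lands on s2.  Once s3 catches up to zxid 5 it would accept the same client.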

--Chris Nauroth

On 2/27/16, 4:34 PM, "wayne" <[email protected]> wrote:

>Thanks German for the awesome answer!
>
>I have an interesting follow-up question: how does ZooKeeper guarantee
>that, within a client session, a subsequent read always observes a
>previous write?
>
>More specifically, consider the following setup (to simplify the
>discussion, assume every operation is synchronous):
>
>I have ZooKeeper servers s1, s2 and s3. s1 is the ZAB master. I have a
>client c, which connects to s3 (its local replica) and establishes a
>session with s3.
>
>(1) c issues a write, which goes to s3. s3 identifies that it is not the
>master and forwards the request to s1. Then a network partition happens
>between s3 and s1, so s1 is only able to replicate the write to itself and
>s2. (The write succeeds because a majority agrees to commit the write.)
>Then the network partition heals, so s1 returns success to s3, which in
>turn returns success to the client.
>
>(2) c issues a read. The read goes to s3, and s3 serves the read locally,
>which does not reflect the write in (1).
>
>Could this ever happen? Thanks!
>
>
>
>--
>View this message in context:
>http://zookeeper-user.578899.n2.nabble.com/zookeeper-client-session-write-read-consistency-tp7579330p7582086.html
>Sent from the zookeeper-user mailing list archive at Nabble.com.
>
