Thanks German for the awesome answer! I have an interesting follow-up question: how does ZooKeeper guarantee that, within a client session, a subsequent read always observes a previous write?
More specifically, consider the following setup (to simplify the discussion, assume every operation is synchronous). I have ZooKeeper servers s1, s2 and s3, and s1 is the ZAB master. I have a client c, which connects to s3 (its local replica) and establishes a session with s3.

(1) c issues a write, which goes to s3. s3 identifies that it is not the master and forwards the request to s1. Then a network partition happens between s3 and s1, so s1 is only able to replicate the write to itself and s2. (The write still succeeds because a majority agrees to commit it.) Then the network partition heals, so s1 returns success to s3, which in turn returns success to the client.

(2) c issues a read, which goes to s3. s3 serves the read locally, and its local state does not reflect the write in (1).

Could this ever happen?

Thanks!

--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/zookeeper-client-session-write-read-consistency-tp7579330p7582086.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
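
P.S. To make the scenario concrete, here is a toy model of steps (1) and (2) in Python. It is only a sketch of the sequence as I described it, not of ZooKeeper's real implementation: the Server class, run_scenario, the write id and the "/x" key are all made up for illustration, and the model simply assumes that the success reply can reach the client before s3 has applied the write, which is exactly the part I am unsure about.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy replica: a key-value store plus the id of the last write applied."""
    name: str
    data: dict = field(default_factory=dict)
    last_applied: int = 0

    def apply(self, write_id, key, value):
        self.data[key] = value
        self.last_applied = write_id


def run_scenario():
    """Replay steps (1) and (2) from the question on three toy replicas."""
    s1, s2, s3 = Server("s1"), Server("s2"), Server("s3")
    ensemble_size = 3

    # (1) c sends a write to s3, which forwards it to the master s1.
    write_id, key, value = 1, "/x", "new"

    # Partition between s1 and s3: only s1 and s2 apply the write.
    acks = 0
    for server in (s1, s2):
        server.apply(write_id, key, value)
        acks += 1

    # A majority (2 of 3) has the write, so it counts as committed.
    committed = acks > ensemble_size // 2

    # Partition heals. ASSUMPTION under question: the success reply is
    # relayed through s3 to c even though s3 has not applied the write.
    client_saw_success = committed

    # (2) c reads from s3, which answers from its local state.
    read_value = s3.data.get(key)
    return client_saw_success, read_value, s3.last_applied


if __name__ == "__main__":
    print(run_scenario())
```

In this model run_scenario() returns (True, None, 0): the client was told the write succeeded, yet the read served by s3 finds nothing. So my question is really whether ZooKeeper's actual ordering of replies and commits rules out the assumed step.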
