Thanks Alex.Now I clearly understand what zookeeper guarantees and what doesn't. I will revisit Paxos algorithm as you pointed to me.I didn't really understand it before. Thanks a lot
At 2015-01-06 23:17:50, "Alexander Shraer" <shra...@gmail.com> wrote: >Both scenarios are possible and are ok. An operation is guaranteed not to >be lost only if a quorum acked it. If less than a quorum acked it may be >lost or may remain as in your scenarios. Imagine, for example, that a >client connects to a server, submits an operation and right after the >server receives it from the client the server crashes. Or if the server >doesn't receive it at all and the client fails. The operation can be lost >in these cases too. Intuitively what's important is that its not lost after >it is committed or after someone saw it. And this can't happen with the >guarantee that it will not be lost if a quorum acks it. > >Try reading "Paxos made simple" for a better intuition on this. >http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf > >Alex > >On Mon, Jan 5, 2015 at 10:15 PM, bit1...@163.com <bit1...@163.com> wrote: > >> Thanks Alex for the detailed explanation, I understand much better. >> >> Following the your explanation, another question hits me. Say, that, only >> one follower A persist the write to the disk and acks the proposal to the >> leader,but others don't. >> >> The all the quorum are restarted. There are two things may happen >> >> 1. B,C,D starts first, assume D is the leader. >> Then A's last write will be lost because A will sync with D, I think this >> is OK because, the client of A didn't get the response that A's write is >> successful. >> >> 2. A,B,C starts first >> Assume A is the leader since it has more recent transaction id, then the >> whole quorum will have this write because B,C will sync with A. At last, >> the whole quorum will have the write. Is this the expected behavior? >> I don't think so because 1 and 2 are conflicting. In 1, A's write is >> inaccessible,but in 2, A's write is accessible. >> >> Is there something that I miss? Thanks. >> >> >> >> >> >> >> >> >> bit1...@163.com >> >> From: Alexander Shraer >> Date: 2015-01-06 13:26 >> To: user@zookeeper.apache.org >> Subject: Re: Question about the two-phrase commit >> Hi, >> >> A few things are not accurate. First, ZooKeeper implements consensus on >> each operation, not 2 phase commit. >> There are differences in the definition and guaratees of 2PC and Consensus. >> >> > 2. Followers ack the proposal and writes the change to the disk(but not >> persisted yet?) >> >> Before acking a follower writes/persists the proposed operation to disk (an >> operations log). >> >> > 4. When each follower receives the commit request, follower commits the >> changes(persist the change for ever?) >> >> The commit operaiton does not trigger a write to disk. What it does is that >> now the state change is applied to an in memory data structure holding >> ZooKeeper state. Since reads are served from that in-memory data structure, >> the write is now visible to reads. >> >> > d. Assume that When the response from A is back to client telling the >> client that the write is successful, But in the period, the >> > other followers (B,C,D) haven't even received the commit request, and >> B,C,D are down without getting a chance to commit the >> > change. >> >> Whether an operation was "committed" or not is not important during >> recovery. What's important is that a quorum acked it. So when you restart >> B, C, D, one of them necessarily has this write in its log. The others may >> have it or not but in any case, who ever is elected leader will have this >> operation in the log. When a leader is established the log is applied to >> memory (there are also snapshots which allow truncating the log, so the >> snapshot is applied first and then the log). When A syncs with the new >> leader, the only thing that it can loose are operations that were not acked >> by a quorum previously, and hence was not committed. >> >> Alex >> >> On Sun, Jan 4, 2015 at 11:05 PM, bit1...@163.com <bit1...@163.com> wrote: >> >> > >> > In the above process, something rare could happen >> > a. Say,there are 5 nodes in the quorum(1 leader E, 4 follower A,B,C,D). >> > b. The write operation is issued by the client that connects to Follower >> A >> > c. A commits the changes and response to the client that the writer >> > succeeds. >> > d. Assume that When the response from A is back to client telling the >> > client that the write is successful, But in the period, the other >> followers >> >> >> H >>