Hi Alex,

Thanks for your reply and Flavio's.
I think I finally get the idea. :)

Would it be appropriate to see ZAB as a 3PC without the READY/WAIT status,
since all the participants will reply VOTE_COMMIT (they do not abort)?

I will read the source code and hope I can do some work with ZAB.

Thanks a lot for all the replies.

--
daidong

On Friday, April 22, 2011 at 3:54 AM, Alexander Shraer [via zookeeper-user] wrote:
> Hi Daidong,
>
> In addition to Flavio's response, I'll try to address some of your specific
> questions.
>
> > In my opinion, an atomic broadcast protocol must guarantee all the
> > non-faulty servers have the same status eventually. So in the 2PC protocol,
> > the coordinator must block until "all" the servers reply "ok".
>
> Designed this way, the protocol wouldn't be able to tolerate any failures -
> the leader could block waiting for a response from a server that had crashed.
> The idea is to receive enough "ok" messages to guarantee that even if a
> minority of servers crash, the information is still not lost. That's why
> the leader waits for a majority of acks. Messages are still sent to all
> followers, so they will eventually get them (or if they disconnect they will
> later reconnect and synch with the leader automatically).
>
> Regarding your second question - formally, sequential consistency guarantees
> that operations of each client take effect in the order they were submitted
> by the client - so a client's read is guaranteed to see its own last
> complete write.
> In the example you mention, the client first executes a create() and then
> getChildren(). If clients C1 and C2 both submit a create() concurrently, one
> of these requests will reach the leader and will be scheduled by the leader
> before the other one; suppose it is the create() request of C1.
> Then, when C2 is notified about the completion of its own create(), FIFO
> ensures that it also finds out about any operation that completed before
> that create() (these messages were sent by the leader earlier).
> So when C2 finally runs getChildren(), its local state will already have
> every operation that was scheduled by the leader before its own create()
> completed.
>
> In general, ZAB implements state-machine replication by executing consensus
> on each operation. To understand the general idea, I recommend reading
> Lamport's "Paxos made simple" paper I sent earlier - it has a constructive
> explanation of this (although the algorithm is somewhat different from ZAB).
>
> Alex
>
> > -----Original Message-----
> > From: daidong [mailto:]
> > Sent: Wednesday, April 20, 2011 11:31 PM
> > To: [hidden email]
> > Subject: Re: RE: Problems about Zab protocol
> >
> > Hi, Alex
> >
> > Thanks for your reply. :)
> >
> > I knew ZAB has two modes, but the things I do not quite understand concern
> > the broadcast mode. In the ZAB paper, the authors say ZAB is a simple
> > version of the two-phase commit protocol because there are no abort
> > actions in followers. I do not quite understand this.
> >
> > In my opinion, an atomic broadcast protocol must guarantee all the
> > non-faulty servers have the same status eventually. So in the 2PC protocol,
> > the coordinator must block until "all" the servers reply "ok". If there
> > is no abort either, consider the situation where we have a very slow
> > follower F that processes messages more slowly than the other followers.
> > Given TCP and FIFO channels, we can say all the messages will be
> > processed in order at F; however, the messages will pile up if the
> > coordinator continues broadcasting. What happens if the receive
> > buffer in F overflows?
> >
> > Is there any mechanism I have not noticed that avoids this situation in
> > ZAB?
> >
> > About my second question: I read the consistency guarantees section,
> > thanks for the tip. I still have a question: if ZooKeeper does not make
> > sure that all the clients will see the latest value, how does the lock
> > mechanism work?
> > I checked the recipe example code in ZooKeeper 3.3.3:
> > when a client tries to get the write lock, it does not sync() before
> > calling getChildren(). If another client has created an ephemeral node
> > with the lowest number suffix, this client does not get this information,
> > as getChildren() does not sync with the leader. Is there any possibility
> > that two clients will think they both got the lock?
> >
> > Thanks for any words. :)
> > --
> > daidong
> > Sent with Sparrow
> > On Thursday, April 21, 2011 at 2:30 AM, Alexander Shraer [via zookeeper-user] wrote:
> > > Hi,
> > >
> > > Regarding your first question - ZAB has two parts - the broadcast
> > > protocol you mention, which is executed by a leader, and the leader
> > > election protocol, which recovers from a leader failure.
> > > This is similar to the way other state-machine replication algorithms
> > > work, where you have a fast normal mode and a slower recovery mode (you
> > > don't need to execute both all the time - only when the leader fails).
> > > See Paxos state-machine replication for example (section 3):
> > > http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
> > >
> > > Regarding your second question - Zookeeper basically guarantees
> > > so-called "sequential consistency" semantics.
> > > This guarantees that the real execution looks to clients like some
> > > sequential execution in which the operations of every client appear in
> > > the order they were submitted. It does not guarantee that a read of one
> > > client returns the latest value written by another client. This allows
> > > reads to be executed locally. If you need to return the latest state,
> > > you can use the sync() call, which flushes the pending updates between
> > > the leader and a follower.
> > > See also the "consistency guarantees" section here:
> > > http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
> > >
> > > Alex
> > >
> > > > -----Original Message-----
> > > > From: daidong [mailto:[hidden email]]
> > > > Sent: Wednesday, April 20, 2011 2:38 AM
> > > > To: [hidden email]
> > > > Subject: Problems about Zab protocol
> > > >
> > > > Hi, everyone.
> > > >
> > > > Recently, I read the paper "A simple totally ordered broadcast
> > > > protocol" and there are some problems I cannot figure out. I hope
> > > > someone can help me... :P
> > > >
> > > > The paper describes the Zab protocol as a two-phase commit protocol
> > > > when the system is in broadcast mode. However, another paper (Skeen 82,
> > > > "A Quorum Based Commit Protocol") mentions that if we want to extend
> > > > 2PC into a quorum-based commit protocol we must introduce a
> > > > three-phase commit protocol (in fact, I haven't quite understood
> > > > this :( ). However, according to the Zab paper, this can still be
> > > > done. Why, and how?
> > > >
> > > > Secondly, ZooKeeper can guarantee that the state of different
> > > > followers is consistent. However, this consistency only holds among
> > > > the quorum of followers that have acked the COMMIT. Since a client can
> > > > connect to any follower when performing a read, what happens if the
> > > > client happens to connect to a follower that has not acked the
> > > > COMMIT? I cannot find this information in the paper...
> > > >
> > > > If I am asking a naive question, I hope someone can tell me where I
> > > > can find the answer or offer some suggestions, thanks :)
> > > >
> > > > --
> > > > View this message in context: http://zookeeper-user.578899.n2.nabble.com/Problems-about-Zab-protocol-tp6290102p6290102.html
> > > > Sent from the zookeeper-user mailing list archive at Nabble.com.
--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/Problems-about-Zab-protocol-tp6290102p6298861.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
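
For readers following the thread, the majority-ack rule Alex describes (the leader commits once a majority has acked, rather than blocking on every server) can be sketched as a toy model. This is only an illustration under stated assumptions, not ZooKeeper's actual code: the Leader class, its method names, and the follower ids are hypothetical, there is no networking or leader election, and the leader is assumed to count its own ack toward the quorum.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


class Leader:
    """Toy leader: tracks acks per proposal and commits in proposal order."""

    def __init__(self, followers):
        self.followers = list(followers)              # hypothetical follower ids
        self.ensemble_size = len(self.followers) + 1  # followers plus the leader
        self.next_zxid = 1                            # simplified transaction id
        self.acks = {}                                # zxid -> set of acking servers
        self.committed = []                           # zxids, in commit order

    def propose(self) -> int:
        """Create a new proposal; the leader's own ack is counted (assumption)."""
        zxid = self.next_zxid
        self.next_zxid += 1
        self.acks[zxid] = {"leader"}
        return zxid

    def on_ack(self, zxid: int, follower: str) -> None:
        """Record an ack from a follower and commit whatever is now ready."""
        self.acks[zxid].add(follower)
        self._commit_ready()

    def _commit_ready(self) -> None:
        # Commit strictly in zxid order: a later proposal never commits
        # before an earlier one, even if it reached a majority first.
        nxt = len(self.committed) + 1
        needed = quorum_size(self.ensemble_size)
        while nxt in self.acks and len(self.acks[nxt]) >= needed:
            self.committed.append(nxt)
            nxt += 1
```

With four followers (an ensemble of five), a proposal commits after two follower acks plus the leader's own, so two crashed or very slow followers do not block progress; they are expected to catch up later, as described in the reply above.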