Qian,

ZooKeeper guarantees that if a client sees a response for a transaction, that transaction will persist. Transactions for which the client never sees a response might be either committed or discarded. So if a quorum has not logged the transaction, a ZooKeeper server that does not have it in its log may become the leader (because the machines that did log it are down). In that case the transaction is discarded. If instead a machine that has logged the transaction becomes the leader, the transaction will be committed.
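To make the two outcomes concrete, here is a toy model in Python (not ZooKeeper code). It assumes the next leader is simply the live server with the highest logged zxid, that a majority of servers must be up for an election to succeed, and it ignores epochs. The names Server, elect_leader, fate_of and cluster are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    last_logged_zxid: int  # highest zxid this server has written to its log
    up: bool = True


def elect_leader(servers):
    """Return the live server with the highest logged zxid, or None without a quorum."""
    live = [s for s in servers if s.up]
    if len(live) <= len(servers) // 2:
        return None  # no majority reachable, so no leader can be elected
    # Ties on zxid are broken by name; that tie-break is an assumption of this toy.
    return max(live, key=lambda s: (s.last_logged_zxid, s.name))


def fate_of(zxid, servers):
    """Say what happens to transaction `zxid` after the next election."""
    leader = elect_leader(servers)
    if leader is None:
        return "undecided"
    return "committed" if leader.last_logged_zxid >= zxid else "discarded"


def cluster(logged_on, down):
    """Five servers; those in `logged_on` logged zxid 7, the rest stopped at 6."""
    return [
        Server(f"s{i}", 7 if f"s{i}" in logged_on else 6, up=f"s{i}" not in down)
        for i in range(1, 6)
    ]


# Transaction 7 was logged by only two of five servers (no quorum).
print(fate_of(7, cluster({"s1", "s2"}, down={"s1", "s2"})))  # discarded
print(fate_of(7, cluster({"s1", "s2"}, down=set())))  # committed
# Logged by a quorum (three of five): any live majority contains one of them.
print(fate_of(7, cluster({"s1", "s2", "s3"}, down={"s1", "s2"})))  # committed
```

The last call shows why the quorum case is safe in this model: any three live servers out of five must overlap with the three that logged the transaction, so the elected leader always has it.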
Hope that clears your doubt.

mahadev

On 1/28/10 6:02 PM, "Qian Ye" <yeqian....@gmail.com> wrote:

> Thanks henry and ben, actually I have read the paper henry mentioned in this
> mail, but I'm still not so clear with some of the details. Anyway, maybe
> more study on the source code can help me understanding. Since Ben said
> that, "if less than a quorum of servers have accepted a transaction, we can
> commit or discard". Would this feature cause any unexpected problem? Can you
> give some hints about this issue?
>
> On Fri, Jan 29, 2010 at 1:09 AM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>
>> henry is correct. just to state another way, Zab guarantees that if a
>> quorum of servers have accepted a transaction, the transaction will commit.
>> this means that if less than a quorum of servers have accepted a
>> transaction, we can commit or discard. the only constraint we have in
>> choosing is ordering. we have to decide which partially accepted
>> transactions are going to be committed and which discarded before we propose
>> any new messages so that ordering is preserved.
>>
>> ben
>>
>> Henry Robinson wrote:
>>
>>> Hi -
>>>
>>> Note that a machine that has the highest received zxid will necessarily have
>>> seen the most recent transaction that was logged by a quorum of followers
>>> (the FIFO property of TCP again ensures that all previous messages will have
>>> been seen). This is the property that ZAB needs to preserve. The idea is to
>>> avoid missing a commit that went to a node that has since failed.
>>>
>>> I was therefore slightly imprecise in my previous mail - it's possible for
>>> only partially-proposed proposals to be committed if the leader that is
>>> elected next has seen them. Only when another proposal is committed instead
>>> must the original proposal be discarded.
>>>
>>> I highly recommend Ben Reed's and Flavio Junqueira's LADIS paper on the
>>> subject, for those with portal.acm.org access:
>>> http://portal.acm.org/citation.cfm?id=1529978
>>>
>>> Henry
>>>
>>> On 27 January 2010 21:52, Qian Ye <yeqian....@gmail.com> wrote:
>>>
>>>> Hi Henry:
>>>>
>>>> According to your explanation, "*ZAB makes the guarantee that a proposal
>>>> which has been logged by a quorum of followers will eventually be
>>>> committed*", however, the source code of Zookeeper, the
>>>> FastLeaderElection.java file, shows that, in the election, the candidates
>>>> only provide their zxid in the votes, the one with the max zxid would win
>>>> the election. I mean, it seems that no check has been made to make sure
>>>> whether the latest proposal has been logged by a quorum of servers.
>>>>
>>>> In this situation, the zookeeper would deliver a proposal, which is known as
>>>> a failed one by the client. Imagine this scenario, a zookeeper cluster with
>>>> 5 servers, Leader only receives 1 ack for proposal A, after a timeout, the
>>>> client is told that the proposal failed. At this time, all servers restart
>>>> due to a power failure. The server have the log of proposal A would be the
>>>> leader, however, the client is told the proposal A failed.
>>>>
>>>> Do I misunderstand this?
>>>>
>>>> On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson <he...@cloudera.com> wrote:
>>>>
>>>>> Qing -
>>>>>
>>>>> That part of the documentation is slightly confusing. The elected leader
>>>>> must have the highest zxid that has been written to disk by a quorum of
>>>>> followers. ZAB makes the guarantee that a proposal which has been logged by
>>>>> a quorum of followers will eventually be committed. Conversely, any
>>>>> proposals that *don't* get logged by a quorum before the leader sending them
>>>>> dies will not be committed.
>>>>> One of the ZAB papers covers both these situations - making sure
>>>>> proposals are committed or skipped at the right moments.
>>>>>
>>>>> So you get the neat property that leader election can be live in exactly the
>>>>> case where the ZK cluster is live. If a quorum of peers aren't available to
>>>>> elect the leader, the resulting cluster won't be live anyhow, so it's ok for
>>>>> leader election to fail.
>>>>>
>>>>> FLP impossibility isn't actually strictly relevant for ZAB, because FLP
>>>>> requires that message reordering is possible (see all the stuff in that
>>>>> paper about non-deterministically drawing messages from a potentially
>>>>> deliverable set). TCP FIFO channels don't reorder, so provide the extra
>>>>> signalling that ZAB requires.
>>>>>
>>>>> cheers,
>>>>> Henry
>>>>>
>>>>> 2010/1/26 Qing Yan <qing...@gmail.com>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have question about how zookeeper *remembers* a commit operation.
>>>>>>
>>>>>> According to
>>>>>> http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperInternals.html#sc_summary
>>>>>>
>>>>>> <quote>
>>>>>> The leader will issue a COMMIT to all followers as soon as a quorum of
>>>>>> followers have ACKed a message. Since messages are ACKed in order, COMMITs
>>>>>> will be sent by the leader as received by the followers in order.
>>>>>>
>>>>>> COMMITs are processed in order. Followers deliver a proposals message when
>>>>>> that proposal is committed.
>>>>>> </quote>
>>>>>>
>>>>>> My question is will leader wait for COMMIT to be processed by quorum
>>>>>> of followers before consider COMMIT to be success? From the documentation
>>>>>> it seems that leader handles COMMIT asynchronously and
>>>>>> don't expect confirmation from followers.
>>>>>> In the extreme case, what happens if leader issue a COMMIT
>>>>>> to all followers and crash immediately before the COMMIT message can go out
>>>>>> of the network. How the system remembers the COMMIT ever happens?
>>>>>>
>>>>>> Actually this is related to the leader election process:
>>>>>>
>>>>>> <quote>
>>>>>> ZooKeeper messaging doesn't care about the exact method of electing a leader
>>>>>> as long as the following holds:
>>>>>>
>>>>>> - The leader has seen the highest zxid of all the followers.
>>>>>> - A quorum of servers have committed to following the leader.
>>>>>>
>>>>>> Of these two requirements only the first, the highest zxid among the
>>>>>> followers needs to hold for correct operation.
>>>>>> </quote>
>>>>>>
>>>>>> Is there a liveness issue try to find "The leader has seen the highest zxid
>>>>>> of all the followers"? What if some of the followers (which happens to
>>>>>> holding the highest zxid) cannot be contacted (FLP impossible result?)
>>>>>> It will be more straightforward if COMMIT requires confirmation from a
>>>>>> quorum of the followers. But I guess things get
>>>>>> optimized according to Zab's FIFO nature...just want to hear some
>>>>>> clarification about it.
>>>>>>
>>>>>> Thanks a lot!
>>>>
>>>> --
>>>> With Regards!
>>>>
>>>> Ye, Qian
>>>> Made in Zhejiang University