Hi Flavio,

> That's not accurate. Being recorded by a quorum guarantees that a txn will be
> in the initial state of future epochs, but a prospective leader might have
> txns in its log that haven't been recorded in *a log*. The prospective
> leader needs to make sure that such txns are recorded in a quorum before
> establishing a new epoch, though.

I guess you meant "a quorum", not "a log", at the word marked *log* above?

Thank you
Ibrahim

-----Original Message-----
From: Flavio Junqueira [mailto:[email protected]]
Sent: Monday, October 05, 2015 06:23 PM
To: [email protected]
Subject: Re: 3-server Zab cluster

> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <[email protected]> wrote:
>
> Hi Rakesh,
>
> In Zab, before the end of the synchronization phase, the new leader will not
> commit any proposals in its transaction log that have not received a majority
> of acks from the previous ensemble (that is what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be
in the initial state of future epochs, but a prospective leader might have
txns in its log that haven't been recorded in a log. The prospective leader
needs to make sure that such txns are recorded in a quorum before establishing
a new epoch, though.

> I think what Zab does is that before the end of the synchronization phase, in
> L and F2 (the new quorum), L (a prospective leader) will sync its own state
> with F2 as the initial state. Referring to my scenario, zxid=10 is part of
> the initial state and as a result it will be delivered in the new quorum (L
> and F2) before processing new proposals of the new epoch.

Yes, this is right.

> You can read this thread
> http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html
> for more info.
>
> What do you think? Does anyone have any questions or concerns about such a
> (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a
change or what. Are you looking for a blessing from this community? I'd like
to understand what you're trying to achieve.

-Flavio

>
> Ibrahim
>
> From: Rakesh Radhakrishnan [mailto:[email protected]]
> Sent: Thursday, October 01, 2015 06:15 PM
> To: Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
>
> >>> (***) Ok, I thought when F2 forms a quorum with L and before serving
> >>> clients, L synchronizes its state with F2, resulting in zxid=10 being
> >>> committed in L and F2 as well. I also thought this process is the
> >>> same as Zab, isn't it?
>
> Since L didn't receive any ACK responses from F1 or F2 before leaving the
> leader status previously, L won't commit transaction zxid=10. IIUC, after
> re-forming the new quorum L will not have any mechanism to re-initiate the
> proposal (active messaging phase) for the previous zxid=10.
>
> -Rakesh
>
> On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <[email protected]> wrote:
> Thank you Rakesh.
>
> >>> In your case, the zk client sees a successful response from F1. Then
> >>> assume F2 joins the quorum first and L becomes the leader again. But the
> >>> newly formed quorum will not have the zxid=10 transaction. This will
> >>> make the cluster inconsistent, won't it?
>
> (***) Ok, I thought when F2 forms a quorum with L and before serving clients,
> L synchronizes its state with F2, resulting in zxid=10 being committed in L
> and F2 as well. I also thought this process is the same as Zab, isn't it?
>
> >>> Apart from the above case I'm not seeing any other problems with a 3-node
> >>> cluster. The above data loss case can be avoided by adding an
> >>> assumption that more than the tolerated number of server failures may
> >>> affect cluster consistency and result in data loss.
>
> Yes, if the solution above (***) is not correct, your assumption makes sense.
>
> Ibrahim
>
> From: Rakesh Radhakrishnan [mailto:[email protected]]
> Sent: 01 October 2015 17:26
> To: [email protected]; Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
>
> Hi Ibrahim,
>
> The example below is taken from your older mail thread.
>
> >>> 1. Leader (L) sends a proposal p with zxid=10 to F1 and F2.
> >>> 2. F1 logs, sends an ACK, commits, replies to clients and
> >>>    crashes. F2 crashes before receiving P10. L has not received any
> >>>    ACKs.
>
> My thoughts on the above scenario:
>
> In your case, the zk client sees a successful response from F1. Then assume
> F2 joins the quorum first and L becomes the leader again. But the newly
> formed quorum will not have the zxid=10 transaction. This will make the
> cluster inconsistent, won't it?
>
> Apart from the above case I'm not seeing any other problems with a 3-node
> cluster. The above data loss case can be avoided by adding an assumption
> that more than the tolerated number of server failures may affect cluster
> consistency and result in data loss. But I feel this optimization would have
> more such cases if we scale the cluster beyond 3 servers. For now, I'm not
> thinking in that direction as your case is limited to a 3-node cluster.
>
> Regards,
> Rakesh
>
>
> On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <[email protected]> wrote:
> Yes Alex, in my post I mentioned that this (small) optimization can only work
> with a 3-server cluster.
>
> Who could confirm that the optimization can work?
>
> Ibrahim
>
> -----Original Message-----
> From: Alexander Shraer [mailto:[email protected]]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: [email protected]
> Subject: Re: 3-server Zab cluster
>
> I'm not 100% sure whether operations that were pending on the leader are sent
> out during sync when this leader loses quorum and is re-elected. If so, then
> maybe you're right. But in any case, this would not work for 5 or more
> servers...
>
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <[email protected]> wrote:
>
>> Thank you Alex for replying.
>>
>> When you said "the leader gets re-elected and the operation is
>> truncated from logs at other servers", I thought the new leader would
>> sync its log with the other followers (synchronization phase),
>> so that the operation would be committed by the new quorum. Let me lay
>> out the scenario as steps:
>>
>> 1. Leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2
>>    crashes before receiving P10. L has not received any ACKs.
>>
>> Possible solution (1)
>> The leader will move to the LOOKING phase as there is no quorum
>> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum
>> with L (the previous leader); L becomes the new leader again as it has
>> the latest zxid (10) in its log.
>> L syncs its state with F2; as a result L, F1 (before crashing) and F2
>> commit P10. Is that correct?
>>
>> Possible solution (2)
>> The leader will move to the LOOKING phase as there is no quorum
>> supporting its leadership. Now assume F1 (with zxid=10 committed)
>> wakes up.
>> I am not sure who should be the leader (F1 with zxid=10
>> committed, or L (the previous leader) with zxid=10 logged). I think F1
>> becomes the new leader as it has zxid=10 committed. F1 forms a quorum
>> with L (the previous leader); F1 becomes the new leader as it has the
>> latest zxid (10). F1 (the new leader) syncs its state with L (the
>> previous leader, now a follower); as a result zxid=10 is committed by
>> the new quorum. Is that correct?
>>
>> What do you think?
>>
>> Ibrahim
>>
>> -----Original Message-----
>> From: Alexander Shraer [mailto:[email protected]]
>> Sent: Monday, September 28, 2015 07:27 PM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: 3-server Zab cluster
>>
>> Committing locally when sending an ACK at a server would lead to loss
>> of consistency - it is possible that this is the only server that
>> acks, e.g., this server is temporarily disconnected from the leader,
>> the leader gets re-elected and the operation is truncated from logs
>> at other servers. It's OK to ACK it but it's not OK to commit, since
>> this exposes it to users as a committed operation that they can see.
>>
>> On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <[email protected]> wrote:
>>
>>> In Zab, assume we have a cluster consisting of 3 servers. To deliver a
>>> write request, it must run 3 communication steps: proposal,
>>> acknowledgement and commit.
>>> As Zab uses reliable FIFO channels, it is possible to remove the commit
>>> round. As soon as a follower receives a proposal, it logs it, sends an
>>> ACK and commits locally.
>>> Upon receiving an ACK from any follower, the leader
>>> commits the proposal locally; no COMMIT message needs to be sent to
>>> followers. In this case, all servers commit a proposal in two
>>> round-trips, which reduces latency, particularly at the followers.
>>>
>>> Note that this optimization can only work in a 3-server cluster
>>> (a follower reaches a majority as soon as it acks).
>>> Does anyone see any problems with such a (small) optimization?
>>> Ibrahim
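
For readers following the thread, the quorum arithmetic and the zxid=10
scenario can be sketched as a toy model. This is not ZooKeeper code: Server,
synchronize and run_scenario are hypothetical names made up for illustration,
and the model assumes L is re-elected with its log intact and that
synchronization simply copies the leader's log to the followers. It does not
model the case Alex describes, where the only acking follower is disconnected
and the operation is later truncated.

```python
from dataclasses import dataclass, field


def majority(n: int) -> int:
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def one_ack_is_quorum(n: int) -> bool:
    """True if the leader plus a single acking follower is a majority of n."""
    return 2 >= majority(n)


@dataclass
class Server:
    """Hypothetical toy server: a log and a set of committed zxids."""
    name: str
    log: list = field(default_factory=list)
    committed: set = field(default_factory=set)


def synchronize(leader: Server, followers: list, ensemble_size: int) -> bool:
    """Toy sync phase (an assumption, not the real protocol code).

    Copies the leader's log to the followers, then treats every logged zxid
    as part of the initial state of the new epoch. Returns False, changing
    nothing, if leader + followers are not a majority of the ensemble.
    """
    if 1 + len(followers) < majority(ensemble_size):
        return False
    for f in followers:
        f.log = list(leader.log)
    for s in [leader, *followers]:
        s.committed.update(s.log)
    return True


def run_scenario() -> dict:
    """Replay the failure scenario from the thread with zxid=10."""
    L, F1, F2 = Server("L"), Server("F1"), Server("F2")
    L.log.append(10)        # step 1: L logs and proposes zxid=10
    F1.log.append(10)       # step 2: F1 logs and ACKs ...
    F1.committed.add(10)    # ... and commits locally (the proposed optimization)
    # F1 crashes; F2 crashed before receiving the proposal; L saw no ACK.
    # F2 recovers and forms a quorum with L, which is re-elected.
    ok = synchronize(L, [F2], ensemble_size=3)
    return {"synced": ok, "L": L.committed, "F2": F2.committed}


if __name__ == "__main__":
    print(one_ack_is_quorum(3), one_ack_is_quorum(5))  # True False
    print(run_scenario())
```

Under these assumptions, zxid=10 ends up in the initial state of the new
quorum (L and F2), which matches the behaviour Flavio confirms in his reply.
The one_ack_is_quorum check is also why the idea is limited to 3 servers: with
5 servers a majority is 3, so one follower's ACK plus the leader is not enough.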
