>I'm not entirely sure what the optimization is and if you are proposing a 
>change or what. Are you looking for a blessing from this community? I'd like 
>to understand what you're trying to achieve.


Because Zab runs over reliable FIFO channels, it is possible to remove the 
commit round. As soon as a follower receives a proposal, it logs it, sends an 
ACK and commits locally. Upon receiving an ACK from any follower, the leader 
commits the proposal locally; no COMMIT message needs to be sent to the 
followers. In this case, all servers commit a proposal after two communication 
steps (proposal and ACK) instead of three, which reduces latency, particularly 
at the followers. 

Note that this optimization can only work in a 3-server cluster (the follower 
and the leader form a majority as soon as the follower acks).  
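
To make the majority argument concrete, here is a minimal Python sketch. It is 
only an illustration, not ZooKeeper code: the names quorum_size and 
follower_ack_implies_quorum are hypothetical, and it assumes the leader has 
already logged the proposal itself, so one follower ACK means two copies exist.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def follower_ack_implies_quorum(ensemble_size: int) -> bool:
    """True if the leader plus one acking follower already form a majority.

    Assumption: the leader logged the proposal before sending it, so a
    single follower ACK means the proposal is held by two servers.
    """
    copies = 2  # the leader and the acking follower
    return copies >= quorum_size(ensemble_size)


for n in (3, 5, 7):
    print(n, quorum_size(n), follower_ack_implies_quorum(n))
# 3 2 True
# 5 3 False
# 7 4 False
```

With 5 or 7 servers, a single follower ACK does not reach a majority, which is 
why the shortcut is limited to 3 servers.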

The proposal:

A 3-server ZK cluster is, I think, a more common deployment than a 5- or 
7-server ensemble. For clients that use a 3-server ensemble and want better 
latency, we could offer this optimization (the algorithm above) as an option. 

I hope my aim is clear now.

Ibrahim 

-----Original Message-----
From: Flavio Junqueira [mailto:[email protected]] 
Sent: Monday, October 05, 2015 06:23 PM
To: [email protected]
Subject: Re: 3-server Zab cluster


> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) 
> <[email protected]> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of the synchronization phase, the new leader will not 
> commit any proposals in its transaction log that have not received a majority 
> of acks from the previous ensemble (that is what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be 
in the initial state of future epochs, but a prospective leader might have txns 
in its log that haven't been recorded by a quorum. The prospective leader needs 
to make sure that such txns are recorded in a quorum before establishing a new 
epoch, though.

> I think what Zab does is that, before the end of the synchronization phase, 
> L (a prospective leader) syncs its own state with F2 as the initial state of 
> the new quorum (L and F2). In my scenario, zxid=10 is part of the initial 
> state and, as a result, it will be delivered in the new quorum (L and F2) 
> before any proposals of the new epoch are processed.

Yes, this is right.
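
A toy Python sketch of the scenario just confirmed, under loud assumptions: 
Server and elect_and_sync are hypothetical names (not ZooKeeper APIs), zxid=9 
is a made-up earlier transaction, the leader is chosen only by highest logged 
zxid, and syncing is modelled as copying the leader's log. The real election 
and synchronization are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    log: list = field(default_factory=list)       # zxids written to the log
    committed: set = field(default_factory=set)   # zxids delivered

    def last_zxid(self) -> int:
        return self.log[-1] if self.log else 0


def elect_and_sync(quorum):
    """Toy model: the server with the highest logged zxid leads, and its
    log becomes the initial state of every server in the new quorum."""
    leader = max(quorum, key=lambda s: s.last_zxid())
    for s in quorum:
        s.log = list(leader.log)
        s.committed = set(leader.log)  # initial state is delivered first
    return leader


# L logged zxid=10 but never saw an ACK; F2 crashed before receiving it.
L = Server("L", log=[9, 10], committed={9})
F2 = Server("F2", log=[9], committed={9})

leader = elect_and_sync([L, F2])
print(leader.name, F2.log, sorted(F2.committed))
# L [9, 10] [9, 10]
```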

> 
> You can read this thread for more info:
> http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html
> 
> What do you think? Does anyone have any questions or concerns about this 
> (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a 
change or what. Are you looking for a blessing from this community? I'd like to 
understand what you're trying to achieve.

-Flavio

> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:[email protected]]
> Sent: Thursday, October 01, 2015 06:15 PM
> To: Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
> 
>>>>>>>>> (***) Ok, I thought that when F2 forms a quorum with L, and before 
>>>>>>>>> serving clients, L synchronizes its state with F2, so that zxid=10 
>>>>>>>>> will be committed in L and F2 as well. I also thought this process 
>>>>>>>>> is the same as in Zab, isn't it?
> 
> Since L didn't receive any ACK responses from F1 or F2 before leaving the 
> Leader status previously, L won't commit transaction zxid=10. IIUC, after 
> re-forming the new quorum, L will not have any mechanism to re-initiate the 
> proposal (active messaging phase) for the previous zxid=10.
> 
> -Rakesh
> 
> On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) 
> <[email protected]> wrote:
> Thank you Rakesh.
> 
>>>> In your case, the zk client sees a successful response from F1. Then 
>>>> assume F2 joins the quorum first and L becomes the leader again. But the 
>>>> newly formed quorum will not have the zxid=10 transaction. This will make 
>>>> the cluster inconsistent, won't it?
> 
> (***) Ok, I thought that when F2 forms a quorum with L, and before serving 
> clients, L synchronizes its state with F2, so that zxid=10 will be committed 
> in L and F2 as well. I also thought this process is the same as in Zab, 
> isn't it?
> 
> 
>>>> Apart from the above case, I'm not seeing any other problems with a 
>>>> 3-node cluster. The above data loss case can be avoided by stating the 
>>>> assumption that more than the tolerated number of server failures may 
>>>> affect cluster consistency and result in data loss.
> 
> Yes, if the solution above (***) is not correct, your assumption makes sense.
> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:[email protected]]
> Sent: 01 October 2015 17:26
> To: [email protected]; Ibrahim El-sanosi (PGR)
> 
> Subject: Re: 3-server Zab cluster
> 
> Hi Ibrahim,
> 
> The example below is taken from your older mail thread.
> 
>>>>>> 1. Leader (L) sends a proposal p with zxid=10 to F1 and F2.
>>>>>> 2. F1 logs, sends an ACK, commits, replies to clients and 
>>>>>> crashes. F2 crashes before receiving P10. L has not received any 
>>>>>> ACKs.
> 
> My thoughts on the above scenario are:
> 
> In your case, the zk client sees a successful response from F1. Then assume 
> F2 joins the quorum first and L becomes the leader again. But the newly 
> formed quorum will not have the zxid=10 transaction. This will make the 
> cluster inconsistent, won't it?
> 
> Apart from the above case, I'm not seeing any other problems with a 3-node 
> cluster. The above data loss case can be avoided by stating the assumption 
> that more than the tolerated number of server failures may affect cluster 
> consistency and result in data loss. But I feel this optimization would hit 
> more such cases if we scale the cluster beyond 3 servers. For now, I'm not 
> thinking in that direction, as your case is limited to a 3-node cluster.
> 
> Regards,
> Rakesh
> 
> 
> On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) 
> <[email protected]> wrote:
> Yes Alex, in my post I mentioned that this (small) optimization can only work 
> with a 3-server cluster.
> 
> Who could confirm that the optimization can work?
> 
> Ibrahim
> 
> -----Original Message-----
> From: Alexander Shraer [mailto:[email protected]]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: [email protected]
> Subject: Re: 3-server Zab cluster
> 
> I'm not 100% sure whether operations that were pending on the leader are sent 
> out during sync when this leader loses quorum and is re-elected. If so, then 
> maybe you're right. But in any case, this would not work for 5 or more 
> servers...
> 
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) 
> <[email protected]> wrote:
> 
>> Thank you Alex for replying.
>> 
>> When you said " the leader gets re-elected and the operation is 
>> truncated from logs at other servers", I thought the new leader would 
>> sync its log with the other followers (synchronization phase), so 
>> that the operation is committed by the new quorum. Let me lay out the 
>> scenario as steps:
>> 
>> 1. Leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2 
>> crashes before receiving P10. L has not received any ACKs.
>> 
>> Possible solution (1)
>> The leader will move to the LOOKING phase as there is no quorum 
>> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum 
>> with L (the previous leader), and L becomes the leader again as it has 
>> the latest zxid (10) in its log.
>> L syncs its state with F2; as a result L, F1 (before crashing) and F2 
>> commit P10.  Is that correct?
>> 
>> Possible solution (2)
>> The leader will move to the LOOKING phase as there is no quorum 
>> supporting its leadership. Now assume F1 (with zxid=10 committed) 
>> wakes up. I am not sure who should be the leader: F1 with zxid=10 
>> committed, or L (the previous leader) with zxid=10 logged. I think F1 
>> becomes the new leader as it has zxid=10 committed. F1 forms a quorum 
>> with L (the previous leader) and becomes the new leader as it has the 
>> latest zxid (10). F1 (new leader) syncs its state with L (the previous 
>> leader, now a follower); as a result zxid=10 is committed by the new 
>> quorum.  Is that correct?
>> 
>> What do you think?
>> 
>> Ibrahim
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Alexander Shraer [mailto:[email protected]]
>> Sent: Monday, September 28, 2015 07:27 PM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: 3-server Zab cluster
>> 
>> Committing locally when sending an ACK at a server would lead to loss 
>> of consistency: it is possible that this is the only server that 
>> acks, e.g., this server is temporarily disconnected from the leader, 
>> the leader gets re-elected and the operation is truncated from logs 
>> at other servers. It's OK to ACK it, but it's not OK to commit, since 
>> this exposes it to users as a committed operation that they can see.
>> 
>> On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) 
>> <[email protected]> wrote:
>> 
>>> In Zab, assume we have a cluster consisting of 3 servers. To deliver a 
>>> write request, it must run 3 communication steps: proposal, 
>>> acknowledgement and commit.
>>> Because Zab runs over reliable FIFO channels, it is possible to remove 
>>> the commit round. As soon as a follower receives a proposal, it logs it, 
>>> sends an ACK and commits locally. Upon receiving an ACK from any 
>>> follower, the leader commits the proposal locally; no COMMIT message 
>>> needs to be sent to the followers. In this case, all servers commit a 
>>> proposal after two communication steps instead of three, which reduces 
>>> latency, particularly at the followers.
>>> 
>>> Note that this optimization can only work in a 3-server cluster 
>>> (the follower and the leader form a majority as soon as the follower 
>>> acks).
>>> Does anyone see any problems with this (small) optimization?
>>> Ibrahim
