Flavio Paiva Junqueira commented on ZOOKEEPER-368:

Jeff, I certainly appreciate your support. The discussion about leader election 
started less than a month ago. The first post I can see about it is from 
Oct. 19, so we haven't been discussing or ignoring it for the past 5 months. I 
must also say, as the reporter of this feature, that I'm very interested in 
having it in, but please understand that this patch touches core functionality 
and I'd like to make sure I'm comfortable with all changes. Most changes are 
fine with me, and my only point of contention has been leader election. My 
understanding from Henry's latest post is that he will add points 3 and 4 
to this patch. If this is correct, then let's focus on the leader election 

We have agreed to postpone changes to the hardcoded majority checks to separate 
jiras. In fact, there are at least two jiras open about it. The second issue is 
using FLE with Observers. I'm under the impression that it wouldn't be so 
difficult to make such changes, and I think it would make this patch stronger. 
However, I'm happy to commit it without the FLE feature implemented, and in 
fact I'd like to work on it. If you people don't mind, I'd like to be assigned 

To make sure we are on the same page, I'll review it again once Henry uploads a 
new patch with points 3 and 4 implemented (it must compile and pass tests, of 
course). Is this reasonable?


> Observers
> ---------
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>         Attachments: obs-refactor.patch, observer-refactor.patch, observers 
> sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forward 
> commit messages not only to followers but also to observers, and have 
> observers apply transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forward 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 
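
To make option 2 from the description concrete, here is a minimal, 
single-process sketch of it in Python. It is not the ZooKeeper code or anything 
from the attached patches: the Peer and Leader classes, their method names, and 
the "up" flag are hypothetical, and I am assuming the leader counts its own 
vote toward a simple majority of the voting members. It only shows that 
observers receive the proposal (and so the payload) without acknowledging it, 
and apply the transaction once the leader commits.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical follower or observer; not a ZooKeeper class."""
    name: str
    is_observer: bool = False
    up: bool = True                               # assumption: models reachability
    pending: dict = field(default_factory=dict)   # zxid -> payload seen in a proposal
    applied: list = field(default_factory=list)   # payloads applied, in commit order

    def on_proposal(self, zxid, payload):
        """Store the payload. Return True only if this peer acknowledges."""
        if not self.up:
            return False
        self.pending[zxid] = payload
        return not self.is_observer  # observers never vote

    def on_commit(self, zxid):
        """Apply a transaction whose payload arrived earlier in a proposal."""
        if self.up and zxid in self.pending:
            self.applied.append(self.pending.pop(zxid))


class Leader:
    """Hypothetical leader that sends proposals to observers too (option 2)."""

    def __init__(self, peers):
        self.peers = peers
        self.followers = [p for p in peers if not p.is_observer]

    def broadcast(self, zxid, payload):
        acks = 1  # assumption: the leader counts its own vote
        for peer in self.peers:
            if peer.on_proposal(zxid, payload):
                acks += 1
        # Majority over voting members only (leader + followers).
        voters = len(self.followers) + 1
        if acks < voters // 2 + 1:
            return False  # no quorum, so nothing is committed
        for peer in self.peers:
            peer.on_commit(zxid)
        return True
```

With two followers and one observer, a proposal still commits when one follower 
is down, and the observer applies it. With both followers down, the observer 
holds the payload but never applies it, since observers do not count toward the 
quorum.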

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
