Henry Robinson updated ZOOKEEPER-368:

    Attachment: ZOOKEEPER-368.patch

I'm attaching a first cut at this JIRA. I'd like comments on the broad approach 
- I'm aware there are more than a few rough edges in the code that need 
smoothing out.

I've introduced a PeerType enum to QuorumPeer that denotes the peer as either a 
PARTICIPANT or an OBSERVER. I've also extended PeerState with an OBSERVING 
state. It is possible for PARTICIPANT nodes to be in the OBSERVING state if 
they have joined the ensemble but aren't part of the current view (there are a 
few references to views in this patch that reflect my work on dynamic cluster 
membership, but they're typically placeholder code). As a result, I've updated 
the FollowerHandler code to send the current view to a new follower during the 
initial handshake.
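A minimal Java sketch of the role/state split described above. The names PeerType and PeerState come from the patch, but the PeerState constants other than OBSERVING are assumed from ZooKeeper's existing states, and the helper stateAfterJoin (with a "view" modelled as a set of server ids) is hypothetical, not the patch's code. It shows the rule: an OBSERVER always ends up OBSERVING, while a PARTICIPANT observes only when it is outside the current view.

```java
import java.util.Set;

/** The role a server is configured with (name from the patch). */
enum PeerType { PARTICIPANT, OBSERVER }

/** Peer states; LOOKING/FOLLOWING/LEADING assumed, OBSERVING is the new one. */
enum PeerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

/** Hypothetical helper, not part of the patch. */
final class PeerStateSketch {
    private PeerStateSketch() {}

    /**
     * Chooses the state a non-leader peer enters once a leader is known.
     * Observers always observe; participants observe only if they are not
     * in the current view (joined the ensemble, but not yet a voting member).
     */
    static PeerState stateAfterJoin(long myId, PeerType type, Set<Long> currentView) {
        if (type == PeerType.OBSERVER) {
            return PeerState.OBSERVING;
        }
        return currentView.contains(myId) ? PeerState.FOLLOWING : PeerState.OBSERVING;
    }
}
```

For example, with a view of {1, 2, 3}, participant 2 would follow, while participant 4 and any observer would observe.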

Observers hear about committed proposals through INFORM messages that the 
Leader sends to them. Apart from that, they operate much like Followers (and 
therefore share the same code) - when they connect, they sync. Eventually I 
envisage adding plugins to observers so that the proposals they see can be 
published according to whatever protocol is required. 
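To illustrate the observer side of the INFORM flow, here is a sketch that assumes each INFORM carries a committed transaction's zxid and its payload. The Inform and ObserverSketch classes are hypothetical. The observer never ACKs; it simply applies transactions in the order the leader sends them and (as an assumption) skips anything at or below the last zxid it already holds, such as transactions covered by the initial sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical INFORM message: a committed transaction and its payload. */
final class Inform {
    final long zxid;
    final byte[] txn;

    Inform(long zxid, byte[] txn) {
        this.zxid = zxid;
        this.txn = txn;
    }
}

/** Hypothetical observer: applies committed transactions from INFORMs and never ACKs. */
final class ObserverSketch {
    private long lastApplied;                             // highest zxid applied so far
    private final List<Long> applied = new ArrayList<>(); // stand-in for the local database

    ObserverSketch(long lastSyncedZxid) {
        this.lastApplied = lastSyncedZxid;                // state reached by the initial sync
    }

    void onInform(Inform msg) {
        // Assumption: anything at or below lastApplied was already covered by the sync.
        if (msg.zxid <= lastApplied) {
            return;
        }
        applied.add(msg.zxid);                            // "apply" the transaction locally
        lastApplied = msg.zxid;
    }

    List<Long> appliedZxids() { return applied; }

    long lastApplied() { return lastApplied; }
}
```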

Observers don't participate in leader elections, and therefore only use the 
LeaderElection class, which (by my reading) only deals with finding out who the 
current leader is. It is the only election class in this patch that correctly 
updates the PeerState depending on the current PeerType once a leader has been 
found. I haven't yet fully convinced myself that Observers never actively 
participate in elections in this patch, so I'll be working to verify that.
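One way to guarantee that observers cannot influence an election is to make the vote tally count only participants. The sketch below is hypothetical and is not how LeaderElection is written: votes maps a voter's server id to the leader id it proposes, and a leader is chosen only when a strict majority of the participant set agrees. Votes from any other server id, such as an observer's, are dropped before counting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical tally helper, not part of the patch. */
final class VoteTallySketch {
    private VoteTallySketch() {}

    /**
     * Returns the leader id backed by a strict majority of participants,
     * or -1 if no candidate has a quorum yet. Non-participant votes are ignored.
     */
    static long quorumLeader(Map<Long, Long> votes, Set<Long> participants) {
        Map<Long, Integer> counts = new HashMap<>();
        for (Map.Entry<Long, Long> vote : votes.entrySet()) {
            if (participants.contains(vote.getKey())) {   // drop observer votes
                counts.merge(vote.getValue(), 1, Integer::sum);
            }
        }
        for (Map.Entry<Long, Integer> c : counts.entrySet()) {
            if (c.getValue() > participants.size() / 2) { // at most one candidate can pass
                return c.getKey();
            }
        }
        return -1L;
    }
}
```

With participants {1, 2, 3}, two participant votes for server 3 elect it, while any number of votes from servers 4, 5 and 6 alone elect nobody.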

A node can be configured as an observer by having peerType=observer in its 
config file, otherwise it defaults to participant.
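A hypothetical sketch of reading this setting with java.util.Properties. It is not QuorumPeerConfig's actual code. A missing peerType key defaults to participant, and an unrecognized value is rejected rather than silently ignored, which is a design choice assumed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical config reader, not part of the patch. */
final class PeerTypeConfigSketch {
    enum PeerType { PARTICIPANT, OBSERVER }

    private PeerTypeConfigSketch() {}

    /** Reads peerType from config text; a missing key means participant. */
    static PeerType parsePeerType(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            // StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("peerType", "participant")
                .trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "observer":
                return PeerType.OBSERVER;
            case "participant":
                return PeerType.PARTICIPANT;
            default:
                throw new IllegalArgumentException("Unknown peerType: " + value);
        }
    }
}
```

For example, text containing the line peerType=observer yields OBSERVER, and text that omits the key yields PARTICIPANT.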

> Observers
> ---------
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>         Attachments: ZOOKEEPER-368.patch
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forward 
> commit messages not only to followers but also to observers, and have 
> observers apply transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload, because all servers other 
> than the leader are followers, and followers first receive the payload 
> through a proposal message. Consequently, just forwarding commit messages as 
> they currently are to an observer is not sufficient. We have a couple of 
> options:
> 1- Include the transaction payload in the commit messages sent to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forward 
> commit messages to observers. With this option, observers can connect to 
> followers and receive messages from them. This choice is important because 
> it keeps the load on the leader from growing with the number of observers. 
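The INFORM approach in the comment above corresponds roughly to option 1, assuming INFORM carries the transaction payload. A minimal leader-side sketch under that assumption: only voting followers receive PROPOSAL messages and have their ACKs counted; once a strict majority of voters (counting the leader's own ACK) has acknowledged, followers get a payload-free COMMIT and observers get an INFORM with the payload. LeaderCommitSketch and its "sent" log are hypothetical stand-ins for real networking, and the sketch ignores Zab's requirement that proposals commit in zxid order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical leader, not ZooKeeper's Leader class. */
final class LeaderCommitSketch {
    private final long leaderId;
    private final Set<Long> followers;                       // voting members other than the leader
    private final Set<Long> observers;                       // non-voting learners
    private final Map<Long, byte[]> pending = new HashMap<>(); // zxid -> payload
    private final Map<Long, Set<Long>> acks = new HashMap<>(); // zxid -> ids that ACKed
    final List<String> sent = new ArrayList<>();             // stand-in for network sends

    LeaderCommitSketch(long leaderId, Set<Long> followers, Set<Long> observers) {
        this.leaderId = leaderId;
        this.followers = followers;
        this.observers = observers;
    }

    void propose(long zxid, byte[] txn) {
        pending.put(zxid, txn);
        Set<Long> ackers = new HashSet<>();
        ackers.add(leaderId);                                // the leader ACKs its own proposal
        acks.put(zxid, ackers);
        for (long f : followers) sent.add("PROPOSAL " + zxid + " -> " + f); // observers get none
        maybeCommit(zxid);
    }

    void onAck(long zxid, long sid) {
        Set<Long> ackers = acks.get(zxid);
        // Ignore ACKs for already-committed zxids and from non-voters such as observers.
        if (ackers == null || !followers.contains(sid)) return;
        ackers.add(sid);
        maybeCommit(zxid);
    }

    private void maybeCommit(long zxid) {
        int voters = followers.size() + 1;                   // followers plus the leader
        if (acks.get(zxid).size() <= voters / 2) return;     // no strict majority yet
        byte[] txn = pending.remove(zxid);
        acks.remove(zxid);
        for (long f : followers) sent.add("COMMIT " + zxid + " -> " + f);
        for (long o : observers) sent.add("INFORM " + zxid + " (" + txn.length + " bytes) -> " + o);
    }
}
```

Option 2 would instead add observers to the PROPOSAL loop while still excluding their ACKs from the quorum count.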

