[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728797#action_12728797
 ] 

Henry Robinson commented on ZOOKEEPER-368:
------------------------------------------

{I'm trying to understand the use case for a follower that connects as an 
observer. This would adversely affect the reliability of the system, since a 
follower acting as an observer would count as a failed follower even though it 
is up. Did you have a case in mind?}

Not really. I was working on the assumption that it should be the peer and not 
the leader which decides whether or not it wants to be an observer or a 
follower. Since peers are identified on the leader by their IP address, there's 
no way for the leader to tell the difference between a follower and an observer 
connecting from the same address. 

I was thinking of situations where a follower might not be able to, e.g., write 
to disk because a partition was full and therefore wanted to indicate to the 
leader that it had effectively failed while still receiving updates. If the 
leader sees that a follower has stopped replying to pings (because they are 
queued behind a sync to disk), I think it currently disconnects the follower; 
the follower might want to reconnect as an observer to do 'best effort' 
relaying of updates to clients. 

I admit this is quite contrived! It's ok for this not to work as I describe, 
although I think it will work as a side effect of the dynamic cluster stuff.
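
To make the reliability concern concrete, here is a minimal Python sketch (not 
ZooKeeper code) in which each peer declares its own role when it connects. The 
names Leader, register and has_quorum, the addresses, and the three-server 
ensemble are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

FOLLOWER = "follower"
OBSERVER = "observer"


@dataclass
class Leader:
    """Toy leader that tracks connected peers by address (hypothetical)."""
    ensemble_size: int                         # configured voting servers, leader included
    roles: dict = field(default_factory=dict)  # address -> role declared by the peer

    def register(self, address: str, role: str) -> None:
        # The peer, not the leader, chooses the role. A reconnect from the
        # same address simply overwrites the role recorded earlier.
        self.roles[address] = role

    def active_voters(self) -> int:
        # The leader votes too; observers never count towards the quorum.
        return 1 + sum(1 for r in self.roles.values() if r == FOLLOWER)

    def has_quorum(self) -> bool:
        # A quorum is a strict majority of the configured voting servers.
        return self.active_voters() > self.ensemble_size // 2


leader = Leader(ensemble_size=3)
leader.register("10.0.0.2", FOLLOWER)
leader.register("10.0.0.3", FOLLOWER)
assert leader.active_voters() == 3

# 10.0.0.3 reconnects as an observer: it is up, but counts as a failed voter.
leader.register("10.0.0.3", OBSERVER)
assert leader.active_voters() == 2 and leader.has_quorum()

# If the other follower does the same, the ensemble can no longer commit.
leader.register("10.0.0.2", OBSERVER)
assert not leader.has_quorum()
```

In this toy model the quorum size stays tied to the configured ensemble, so 
every follower that comes back as an observer uses up one tolerated failure.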

{I think it is reasonable to turn off the sync for the observer, but we 
probably still want to log to disk so that we can recover quickly. Otherwise we 
will keep doing state transfers from the leader every time we connect. Right?}

Yep, absolutely. However, as Flavio says, the leader can rate-limit state 
transfers if this seems to be happening a lot. I would expect logging to disk 
to be turned on for most observers, but we can turn it off transparently to the 
leader if we just want to use an observer to, say, publish updates to an RSS 
feed and don't want to throw disk at the problem. In fact, 
observers could even turn off syncing with the leader for certain use cases.

As soon as I find a little bit of bandwidth I'll get the new patch on here for 
discussion. 



> Observers
> ---------
>
>                 Key: ZOOKEEPER-368
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: quorum
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Henry Robinson
>         Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple way to implement observers is to have the leader forward 
> commit messages not only to followers but also to observers, and have 
> observers apply transactions in the order the followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload, because all servers 
> other than the leader are followers and followers receive that payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer is consequently not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forward 
> commit messages to observers. With this option, observers can connect to 
> followers and receive messages from them. This choice is important to 
> avoid increasing the load on the leader as the number of observers grows. 
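
Option 2 from the description above could be sketched roughly as follows. This 
is a simplified, synchronous Python illustration and not the ZooKeeper 
implementation: the class names, the broadcast helper and the in-memory 
"pending" map are assumptions, and networking, failures and timeouts are 
ignored. The point is that a commit carries only the zxid, so the observer must 
already hold the payload from the proposal.

```python
class Observer:
    """Applies transactions on commit; never acknowledges proposals."""

    def __init__(self):
        self.pending = {}   # zxid -> payload, filled in by proposals
        self.applied = []   # payloads in the order they were committed

    def on_proposal(self, zxid, payload):
        self.pending[zxid] = payload
        return False        # no ACK: observers do not vote

    def on_commit(self, zxid):
        # The commit carries no payload, so use the one from the proposal.
        self.applied.append(self.pending.pop(zxid))


class Follower(Observer):
    def on_proposal(self, zxid, payload):
        self.pending[zxid] = payload
        return True         # ACK counts towards the quorum


def broadcast(zxid, payload, followers, observers):
    """Leader side: propose to everyone, commit once a quorum has acked."""
    peers = followers + observers
    acks = 1 + sum(p.on_proposal(zxid, payload) for p in peers)  # leader acks itself
    voters = 1 + len(followers)          # observers are not voters
    if acks <= voters // 2:
        return False                     # no majority of voters, no commit
    for p in peers:
        p.on_commit(zxid)
    return True


f1, f2, obs = Follower(), Follower(), Observer()
assert broadcast(1, "create /a", [f1, f2], [obs])
assert obs.applied == ["create /a"]
```

Adding observers here changes the number of messages the leader sends, but not 
the number of acknowledgments it has to wait for.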

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
