[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780516#action_12780516
 ] 

Hudson commented on ZOOKEEPER-368:
--

Integrated in ZooKeeper-trunk #545 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/545/])


 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779178#action_12779178
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Thanks Ben! I agree that 581 is a genuine issue, I'll take it up over on its 
JIRA. 

Mahadev - 

An Observer won't be able to connect to a pre-Observer ensemble because it 
doesn't send FOLLOWERINFO (rather, it sends OBSERVERINFO). The effect is that 
it retries, and is rejected. I have just verified this. 

If a server is brought down and reconnects as an Observer, it will be able to 
connect to the ensemble without problem. The Leader does not validate the type 
of the Learner that connects so it happily accepts the OBSERVERINFO handshake 
and carries on. It is possible that, if the process was restarted very quickly 
and the server was originally the Leader, that there might be some confusion 
when the Observer refuses to issue proposals. My belief is that the old Leader 
would be identified as failed. 

This should probably be considered user error? Users must not try and start the 
cluster with different configurations at each node. I can think of similar 
'bugs' in the current code where different servers have different 
configurations and therefore acknowledge different quorum groups, meaning that 
there wouldn't be consensus on who is the Leader, for example. 

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is 

[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779184#action_12779184
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Flavio - 

Thanks for your comments!

1. Will do. 
2. QuorumPeerTestBase is extended by QuorumPeerMainTest and ObserverTest; it 
seemed to make sense to introduce a base class rather than have ObserverTest 
extend QuorumPeerMainTest and then have to manually disable the tests that I 
didn't want to run. Also, it makes the test classes themselves shorter and 
easier to reason about, 
3. See above.

Henry

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779186#action_12779186
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

I have just made sure that the docs compile, and after Henry's response, I 
don't have further comments. + 1, good job, Henry!




 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779194#action_12779194
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

thanks henry for the comments and testing! Thanks for all the hard work and 
responses. I have one more question. Sorry I couldnt find the answer to that, 
so wanted to ask again. I know looking at the code that this shouldnt be a 
problem but I think it is worth running a small test for it.

- the test is to have 2 servers (s1, S2) from the old code and 1 server (s3) 
with the new code and verify that s1/s2 or s3 are all capable of becoming the 
leader and everything works fine with neone of them becoming a leader. This 
could be done by bringing up s1, s2 and s3 at the same time and killing them 
one at a time and bringing the other up. SOmething like testing a rolling 
upgrade wherein one server is the new code and the other servers are old code. 
This is not testing observers but just testing (though I think it should work 
fine looking at the code) that the older versions will work with the new 
version irrespective of which of them is the leader.

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779205#action_12779205
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Mahadev - 

Ah, I understand now. Yes, I just ran that test. Everything works as I expected 
it to - both when two servers are from the old code and then when two servers 
are from the new. They can be swapped out for each other will no ill effects, 
AFAIK.  

Thanks!

Henry

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779209#action_12779209
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

thats great  ill commit the patch  tonight! 

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778447#action_12778447
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry,
 you dont have to upload the diff the built docs (pdf/html). THe person 
committing the patch will run ant docs and commit the updated docs in src/docs.



 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778612#action_12778612
 ] 

Hadoop QA commented on ZOOKEEPER-368:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425104/ZOOKEEPER-368.patch
  against trunk revision 835618.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 25 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/64/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/64/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/64/console

This message is automatically generated.

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 


[jira] Commented: (ZOOKEEPER-368) Observers: core functionality

2009-11-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778704#action_12778704
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry, 
  did you get a chance to test observers with old (meaning without hte patch) 
zookeeper servers? repeating my comments here:

 -   looking at the patch it seems like it would work with servers prior to 
including this patch. Did you try some testing with current servers (killing 
one at a time and brining them up in a round robin fashion) just to make sure 
it works all fine with the current servers (not including the patch)?
 - what happens if a server configured as follower is suddenly brought down and 
is made an observer and the other way around as well? Just checking to see if 
we have these scenarios covered because such mistakes are easy to make when 
setting up servers

 Observers: core functionality 
 --

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Fix For: 3.3.0

 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Edit (Henry Robinson/henryr) 12/11/09:
 This JIRA specifically concerns the implementation of non-voting peers called 
 Observers, their documentation and their tests. 
 Explicit goals are 1. not breaking any current ZK functionality, 2. enabling 
 at least one deployment scenario involving Observers, 3. documentation 
 describing how to use the feature and 4. tests validating the correct 
 behaviour of 2. 
 Non goals of this JIRA are 1. performance optimizations specific to 
 Observers, 2. compatibility with every feature of ZooKeeper (in particular 
 all leader election protocols), which are both to be addressed in future 
 JIRAs. 
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers for more detail of use 
 cases, proposed design and usage.
 See http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide for a brief 
 commentary on the current patch. 
 -
 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-12 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777047#action_12777047
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Jeff, I surely appreciate your support. The discussion about leader election 
only started less than a month ago. The first post I can see about it is from 
Oct. 19, so we haven't been discussing or ignoring it for the past 5 months. I 
must also say, as the reporter of this feature, that I'm very interested in 
having it in, but please understand that this patch touches core functionality 
and I'd like to make sure I'm comfortable with all changes. Most changes are 
fine for me, and my only point of contention has been on leader election. My 
understanding from the latest post of Henry is that he will add points 3 and 4 
to this patch. If this is correct, then let's focus on the leader election 
issues.

We have agreed to postpone changes to the hardcoded majority checks in separate 
jiras. In fact, there are at least two jiras open about it. The second issue is 
using FLE with Observers. I'm under the impression that it wouldn't be so 
difficult to make such changes, and I think it would make this patch stronger. 
However, I'm happy to commit it without the FLE feature implemented, and in 
fact I'd like to work on it. If you people don't mind, I'd like to be assigned 
to ZOOKEEPER-578.

To make sure we are on the same page, I'll review it again once Henry uploads a 
new patch with points 3 and 4 implemented (it must compile and pass tests, of 
course). Is this reasonable?


 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-12 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12777091#action_12777091
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


Last night I was thinking about this, 368 - why the disconnect? and I believe 
I have figured out the underlying issue. JIRAs are primarily problem statements 
and resolutions (patches for the most part). In this case the solution doesn't 
fit the problem statement subject: Observers. This is not observers, really 
it's more like phase 1 of Observers - code changes and tests, limited 
functionality (UDP LE only) with additional JIRAs to address subsequent to 
this patch going in.  I know when I reviewed this patch, and if you look at my 
most recent comments, this is the mindset I had - this is observers, but 
really that's not Henry's intent. That's fine from my perspective, iterative 
development is great, improve things but don't break existing functionality, 
but the JIRA description here (esp subject) doesn't fit and that's throwing 
people. Creating additional JIRAs would also make this more clear (obs phase 2 
adding ..., phase 3 finalizing observers, code complete -- whatever). 
Changing the subject on this JIRA would make this more clear.

Ben had a good summary of next steps so I won't go through that. Flavio and 
Henry seem to have a plan in place to execute. So lets wrap this up boys and 
girls. ;-)

Finally, I want to point out that if a patch takes 5 months or 5 years, if it's 
not ready to go in it's not ready, regardless of outside pressure. It's the 
contributor's responsibility to work with the committers to get a patch 
committed. It's the committers responsibility to work with the contributor, 
review the patch, provide useful feedback and try to get the issues resolved 
with limited muss/fuss.

Henry, you've been doing a great job on this (and support of ZK in general). I 
know both you and Flavio (and the rest of the committers) have been spending a 
lot of time on this - thanks all! So like I said, let's wrap this up and move 
on.




 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776482#action_12776482
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Thanks for the updated patch and the review guide, Henry. The review guide is 
quite handy. 

To me, we need the following to complete this patch:

# Make it work with FLE, which is the default leader election;
# Get rid of all hardcoded quorum.size() / 2 and replace it with 
containsQuorum();
# Include a test with hierarchical quorums and observers;
# Prepare a forrest document for the feature, describing what it does, how to 
configure ZooKeeper to use it, and perhaps one or two cases in which it would 
be useful to use observers.


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776524#action_12776524
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

nice reviewer guide! the patch looks really good. for me it's good to go once 
you have address the 4 things that flavio raised. (the forest doc is a pain, if 
you have troubles with it i'll help with the formatting if you give me the 
content.)

for historical purposes do you have a copy of that summary that was produced of 
the differences in operation and motivation between zero-weight followers and 
observers? it's been a while and i can't remember how it was published. it 
would be good to put a comment about it here in this issue.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776566#action_12776566
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Flavio, Ben - thanks for the comments! Feels like we're getting close with this 
one.

To Flavio's specific points:

1. In order to make this work with FLE, the easiest thing is to have a 
ResponderThread be running all the time. However, a ResponderThread currently 
only runs when electionAlg=0. To make the responder thread run for all 
electionAlg types is easy, but this introduces a UDP dependency which some 
installations do not want. So we need to make ResponderThread be both UDP and 
TCP compliant. This is easy enough (I have written this code), but it also 
makes configuration yet more complicated because there is yet another port that 
needs specifying (there is some port re-use in the code currently that's a bit 
sketchy I think, and that doesn't work in all cases, we need another dedicated 
port). We will have to discuss whether we want to require strings of the form 
server.id:address:port:port:port:learnertype or if it's time to break out the 
per-server configuration into a more structured format. At this point, I feel 
like this is complicated enough, and orthogonal to Observers, to warrant its 
own JIRA - it would make the Observers patch too complicated. Also, this 
feature requires getting the race condition bug fixed. 

I've created https://issues.apache.org/jira/browse/ZOOKEEPER-578 for this issue.

So we can block the Observers patch on this feature, or we can get a reduced 
Observers patch in (and prevent another cycle of refactoring when trunk gets 
updated and the patch no longer applies). Either is good; but I'm probably in 
favour of getting the patch in now and updating once the ResponderThread JIRA 
gets closed. The change to re-enable Observers for all election types is pretty 
trivial.

2. I think this is a great idea - I'd point out that the hardcoded 
quorum.size() / 2 usages predate the Observers patch! For example, see 
termPredicate(..) in AuthFastLeaderElection.java and lookForLeader in 
LeaderElection.java. This should therefore be a separate JIRA (I'm trying to 
avoid having several issues fixed by this patch).

I've created https://issues.apache.org/jira/browse/ZOOKEEPER-577 for this 
issue. 

3. Yes, will do.

4. Yep, will do.

Ben - I didn't take great notes at that meeting (jetlag!), but my recollection 
is: we were trying to reconcile having Observers change roles and join the 
ensemble as voting members with the complications of doing so. Zero-weight 
followers are a great way to do that. However, we decided that actually that 
might not be a feature we wanted. At that point, the optimisations you can make 
with Observers, particularly for WANs such as batching and the single-message 
INFORM protocol, means it makes sense to logically separate out Observers in 
the code. We could have special-cased handling of 0-weight clients, but we felt 
that since this would involve a step-change in the behaviour of peers as the 
weight went from 0 to 0+ it would be a bit counter intuitive. 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just 

[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776569#action_12776569
 ] 

Jeff Hammerbacher commented on ZOOKEEPER-368:
-

bq. So we can block the Observers patch on this feature, or we can get a 
reduced Observers patch in (and prevent another cycle of refactoring when trunk 
gets updated and the patch no longer applies). Either is good; but I'm probably 
in favour of getting the patch in now and updating once the ResponderThread 
JIRA gets closed. The change to re-enable Observers for all election types is 
pretty trivial.

I agree with Henry. Let's get this patch in and handle the unusual case of 
TCP-only FLE in a separate patch for 578.

bq. This should therefore be a separate JIRA (I'm trying to avoid having 
several issues fixed by this patch).

Also strongly agree. Get the patch in and handle the hardcoded code removal 
this patch turned up in 577.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776654#action_12776654
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry,
thanks for the patch.
the patch looks good.

some comments:

- looking at the patch it seems like it would work with servers prior to 
including this patch. Did you try some testing with current servers (killing 
one at a time and brining them up  in a round robin fashion) just to make sure 
it works all fine with the current servers (not including the patch)?
- what happens if a server configured as follower is suddenly brought down and 
is made an observer and the other way around as well? Just checking to see if 
we have these scenarios covered because such mistakes are easy to make when 
setting up servers
- also it would be good to have more javadocs in the code. Its good to have 
javadocs just be explaining whats going on in each method (though we lack that 
kind of documentation in the code but I do hope we can get more javadoc)
- I think its fine to do FLE in another jira as long as it gets done. It would 
not be a useful feature if it does not run with FLE. I would have gone with 
making it work with FLE first and then trying to see if it works with LE or not.
- removing the quorums.getsize()/2 with containsQuorum() can surely be done in 
another jira.
- also performance benchmarking the code with this patch and without this patch 
so that we make sure that this patch doesnt degrade the performance in any way 
will be good to have

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776690#action_12776690
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

# I think it is quite important to have it working with FLE because FLE is the 
default leader election currently. My preference is to have it fixed before we 
get this patch in because it is not an unusual case. I'm happy to work with 
Henry on getting this fixed, btw;
# I didn't realize that the majority checks were the ones of the leader 
election implementations. This is pending at least for AFLE because AFLE does 
not use server identifiers, and there is a jira open to fix this issue 
(ZOOKEEPER-372). Fixing this hasn't been a priority because we haven't been 
able to decide whether we should support all implementations of leader election 
or not. We have been trying to keep FLE in good shape, though. To me, it is ok 
to postpone these changes if the checks are only on the LE and AFLE. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776774#action_12776774
 ] 

Jeff Hammerbacher commented on ZOOKEEPER-368:
-

Flavio,

This patch was submitted five months ago. Blocking its submission to trunk, 
which is not intended to be a stable release, so that another bug can be fixed 
seems imprudent. I hope you'll consider pushing the patch through and fixing 
the FLE issue separately. Mahadev seems to agree with this philosophy. Henry 
doesn't work full time on Zookeeper and we'd really like to get this JIRA in so 
that work can be done in smaller chunks moving forward.

Thanks,
Jeff

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776840#action_12776840
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

jeff, i agree that we shouldn't hold a patch to fix a bug somewhere else, but 
we also generally try to keep our trunk correct, so generally we want to see 
doc, test, and correct behavior before committing especially with something 
that touches the core. having said that i think the missing doc, functionality, 
and testing is confined to the observer function, so i think we should commit 
it and fix the rest of the observer code as separate patches to avoid having to 
refresh the patch.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775947#action_12775947
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Henry, good job so far. Please bear with me for a little longer:

# Could you please update the changes to the test files? Due to a few recently 
committed patches they don't apply to trunk any longer;
# Could you make sure to remove all unnecessary LOG statements? Some of them 
look like messages you used for your own debugging (they start with HNR) and 
others are commented out. I think I've seen a TODO comment as well;
# It sounds like this feature works with both majority and hierarchical 
quorums. Is it correct? Can I have observers with hierarchical quorums?

This might be a little late for this patch now, but for future patches that 
introduce features like this, it is probably a good idea to have a brief design 
document explaining changes to the protocol and to ensemble configuration.  


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775956#action_12775956
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


Henry, I don't see any docs for this in src/docs. I suggest that you start a 
new document (new xml file) for this feature, it should explain why/how(torun) 
at the very
least -- so that potential users can come up to speed.

Flavio, could you also review the comments on this JIRA as part of your commit 
review? We should make sure that either all of the issues are addressed,
or at the very least new JIRAs are created (Henry could you do this?) for the 
pending items so that we don't lose the comments/concerns/issues that have been 
identified
previously (this is a major new/visible feature so I think it warrants the 
extra time/effort).

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776004#action_12776004
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Hi Falvio / Patrick - 

Thanks for your comments! 

Design document - there's a brief writeup at 
http://wiki.apache.org/hadoop/ZooKeeper/Observers which very broadly covers the 
design. I will update it when I get a moment to do so. 

User documentation - yes, will do, already on my to do list. There is a section 
in the above wiki page that will be a good start. 

Quorums - yes, it should work with all mechanisms. The only caveat is that it 
only works with the simple LeaderElection protocol, which presumes a majority 
quorum approach (there are lines where votes  quorum.size() / 2 is hardcoded 
rather than using the verifier - I think this is the source of at least one of 
the to-dos). 

Debug messages: ugh, sorry about that. Will update the patch to build against 
trunk shortly and remove those messages. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-10 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776233#action_12776233
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

I just put up a set of notes on the patch on the wiki here: 
http://wiki.apache.org/hadoop/ZooKeeper/Observers/ReviewGuide to help make the 
review a little less painful - although non-comprehensive, it should help 
explain most of the major code changes. 

An updated patch will follow very shortly. 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12772213#action_12772213
 ] 

Hadoop QA commented on ZOOKEEPER-368:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423742/ZOOKEEPER-368.patch
  against trunk revision 831486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/console

This message is automatically generated.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768510#action_12768510
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Ok, that's what I'm doing - making a TCPResponderThread and a 
UDPResponderThread, and have a configuration option to choose between them 
(defaulting to UDP). 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-19 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767245#action_12767245
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Henry, are these throughput values for synchronous or asynchronous operations? 
Throughput sounds pretty low in any case. 

Also, from your description, it sounds like you're testing only with write 
operations (create). I would expect write throughput to be independent on the 
number of observers (but dependent upon the number of followers). Read 
throughput should increase with the number of observers.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767519#action_12767519
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

I have had to change the Leader Election code just a little bit to support 
Observers, and I wanted to run the decisions past everyone.

Observers don't participate in Leader Elections in the sense that they don't 
cast votes. However, they need to learn the results. The way I do this at the 
moment is to force Observers always to use LeaderElection as their election 
algorithm (and disable vote casting for them). So essentially they simply query 
the rest of the ensemble for a quorum of votes. This works well, and has the 
advantage of not needing to teach all LE algorithms about observers. The only 
change I make to the rest of the code is to always start a responder thread, no 
matter what the prevailing election type on the follower, so that they'll 
always respond to the queries from observers.

The correctness of this relies on the fact that a leader must always be 
supported by a quorum, no matter what the protocol used to elect the leader in 
the first place is. So it's always correct to believe that a leader that is 
supported by a quorum is actually the leader.

Does this sound right? Are there any gotchas about always running the responder 
thread?

Henry


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767539#action_12767539
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Flavio - 

Throughput is low - but we should be looking at the relative numbers, not the 
absolute values (I get similar numbers running the current trunk in the same 
configuration). One reason throughput might be arbitrarily low is because I'm 
running these benchmarks against a single machine, so might be hitting disk 
bottlenecks due to contention for the logs. 

These numbers were for synchronous create operations, issued from a single 
client. So read throughput would at best stay constant since the client can't 
take advantage of the parallelism offered by multiple observers. I've also 
benchmarked reads and mixed workloads (the most interesting, typically). Reads, 
as expected, are fairly constant in throughput. Mixed workloads are better in 
heterogeneous clusters, again as you would expect. 

These are just indicative numbers to ensure that we're on the right track :)

Mahadev - I ran the experiment you suggested, I'll attach the chart results 
below. 

Henry

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-16 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766785#action_12766785
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Just to update: I'm putting the patch through some tests and micro-benchmarks 
here and hope to offer a new patch very soon. The results are so far 
encouraging:

Created 1 nodes in 17.1312019825 seconds, at 583.730202365 requests/second 
(without Observers)
Created 1 nodes in 9.63481593132 seconds, at 1037.90254752 requests/second 
(with Observers)

This is a 7-node ensemble, in the first instance it's homogeneously Followers, 
in the second 4 Observers and 3 Followers. As expected, performance is 
seriously better than the homogeneous case. 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766793#action_12766793
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

just to make it more clear, you can start with an ensemble of 3 servers, then 
increase it to 4 (with one observer), then 5 (with 2 observers), 6 (with 3 
observers)  

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-16 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766792#action_12766792
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry, good to see this. one important benchmark wuld be to see how much does 
the throughput vary with increasing number of observers. Example, we can start 
with an ensemble of 3, with no observers (this performance should be comparable 
to the code without Observers) and then add one observer at a time and see how 
much the throughput varies!


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-08 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763753#action_12763753
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

great... lets work to get this in soon before the codebase changes again... 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-09-19 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12757709#action_12757709
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry,
  I tried applying the refactor patch to the trunk. It fails as it is. I did 
try mending the patch myself but looks like there has been a lot of 
refactoring, so Ill leave it up to you. Would you be able to upload a new patch 
for refactor and the other patch? Sorry about the delay on reviewing this code. 
I understand the frustration.  Hopefully we can get this done in the next few 
weeks 

thanks


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-09-15 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755382#action_12755382
 ] 

Jeff Hammerbacher commented on ZOOKEEPER-368:
-

Hey,

Now that 3.2.1 has gone out, does it make sense to review this patch?

Thanks,
Jeff

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-09-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755390#action_12755390
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

jeff, henry,
  I will probably be taking a look at it sometime this week 

thanks

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-09-15 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755490#action_12755490
 ] 

Jeff Hammerbacher commented on ZOOKEEPER-368:
-

Great! Thanks, Mahadev.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-08-12 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742532#action_12742532
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Anyone had a chance to try this out yet?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-08-12 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742550#action_12742550
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Henry, we have been concentrating on the jiras for the 3.2.1 release. We'll get 
back to observers as soon as we are done with this release. Is this ok?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-08-12 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742553#action_12742553
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Of course! That was just a keepalive message :)

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733571#action_12733571
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

I agree that using a quorum of weight 0 would emulate most of the behaviour of 
Observers. 

For:

* Would simplify the patch (presumably still need code to ensure that observers 
aren't elected as leaders)
* Exercises existing code

Against:

* Adds an extra message to each proposal instance, increasing chatter across 
the WAN (yes they don't need to send ACKs, but they will still receive PROPOSAL 
and COMMIT msgs)
* Perhaps less easy to customise request processor behaviour, although it's not 
established that we'll definitely want to do this. 
* Adding subscription behaviour (see mailing list discussion) would be less 
clean - we would need to decide if we wanted all Followers to be able to choose 
the znodes they receive updates from (I think this would overcomplicate the 
Leader which would need to determine when / if it had a quorum for each node, 
and under which conditions it was to revoke its own Leadership). The 
alternative with the proposed approach is to only allow weight-0 Followers to 
choose a subscription, which seems a bit hacky to me. 

My vote would probably be for continuing with the current approach (perhaps I'm 
biased because I don't want to throw away the work already done :)).

There are actually two orthogonal issues related to the coding of this patch.

1. Do we want to separate out Observers into a separate class, and retain the 
Peer- [Follower|Observer] hierarchy? 
2. How do we implement Observers - as members of a weight-0 quorum or with 
INFORM packets and some custom logic on the Leader side?

The main complexity of this patch is due to 1. (Note how the patch size spiked 
by 50kb when I included the class hierarchy split). The actual logic needed to 
make Observers work as implemented is fairly simple: on the server side the 
main changes are in QuorumPeer and Leader.

I would argue in favour of 1 - I think there is an increased readability with 
splitting the Follower code out and this gives a natural point to customise the 
behaviour of Followers or Observers in the future; plus if we should ever 
decide we want another kind of Peer the class hierarchy is already in place to 
make it happen more easily. Plus, as I argued above, I think only Observers 
should have partial subscription ability; this class gives us the right place 
to put it. 

Given 1, I would then argue for implementing Observers as currently imagined 
due to the small advantage of losing the extra message, plus the benefits of 
keeping the implementation separate on the Leader to make it easy to implement 
subscriptions later. The size of the implementation patch compared to the size 
of the restructuring patch is small.

If we decide against 1, then the situation is less clear as implementing 2 with 
weight-0 quorums would probably lead to a smaller patch. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't 

[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733789#action_12733789
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

i'm very sensitive to the work already done issue! i've totally been there.

the con argument for the increased chatter is actually quite minimal since the 
COMMIT message is just a few bytes that gets merged into an existing TCP 
stream.the restriction only weight-0 followers subscribing to a portion of the 
tree is a bit hacky, but it eliminates the need for a bunch of new code.

to be honest, there are two things that really concern me:

1) the amount of new code we have to add if we don't use weight-0 followers and 
the the new test cases that we have to write. since observers use a different 
code path we have to add a lot more tests.
2) one use of observers is to do graceful change over for ensemble changes. 
changing from a weight-0 follower to a follower that is a voting participant 
just means that the follower will start sending ACKs when it gets the proposal 
that it starts voting. we can do that very fast on the fly with no interruption 
to the follower. if we try to convert an observer, the new follower must switch 
from observer to follower and sync up to the leader before it can commit the 
new ensemble message. this increases the interruption of the change and the 
likelihood of failure.

btw, we could setup a phone conference if it would help. (everyone would be 
invited of course. we have global access numbers.)

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733790#action_12733790
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey i'm looking at the patch, can you comment on the VIEWCHANGE message? does 
that refer to ensemble membership change or the subscribe to a subtree that was 
mentioned.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733818#action_12733818
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

 i'm very sensitive to the work already done issue! i've totally been there. 

:) As long as we get the right solution, it's all good.

The amount of new code is an issue. I think we have to just make a judgement 
call. Most of the Observer code is shared between Observers and Followers, so I 
think that existing tests would exercise a lot of the code. I will finish up a 
tightened version of the patch in the next day or so that I think can be 
considered for commit (in terms of code quality) and post that, and I will also 
write something on the wiki page about precisely what has changed so that we 
have something concrete to discuss. The current patch is really only a 
proof-of-concept.

For example: the VIEWCHANGE message was a hangover from the dynamic ensemble 
change stuff (since removed), and certainly shouldn't be in the patch. Now I've 
got the hang of git branches, keeping things separate is a lot easier to do...

I'm up for a conference call to discuss this - I'm also in SF for a week next 
week, so maybe we can meet in person at last and talk this over :)

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733951#action_12733951
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

t seems to me that determining that a server is an observer based on its 
weight, and switching between observer and follower during ensemble changes are 
separate issues. Suppose that we go with the zero-weight option. Once we read a 
weight zero, we know that the server is an observer and we have to execute the 
corresponding code. From Ben's suggestion, the difference between a follower 
and an observer can be as simple as an if statement that guards the block that 
sends acks. In Henry's patch, there are separate classes for observers and 
followers.

It seems that we also have to make sure that during the transition from an old 
ensemble to a new ensemble we have a quorum available for the new ensemble, 
perhaps counting new members in, so it could be tricky to undergo this 
transition correctly if we have to switch from observer to follower. I think 
that what has to happen is that the update reflecting the view change has to be 
the first committed operation of the new ensemble. Consequently, an observer 
has to make sure that it has seen all updates before committing the ensemble 
update, and be ready to ack and commit the ensemble update once the leader of 
the new ensemble proposes it.

Henry, could you sketch out your thoughts on how your modifications will handle 
ensemble changes? I wonder if it is best to try to reach agreement on this 
issue first before submitting a patch. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-20 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733410#action_12733410
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

henry, i was thinking the other day that an observer is very similar to a 
follower in a flexible quorum with 0 weight. actually the more i thought about 
it, the more i realized that it should be the same. a follower with 0 weight 
really should not send ACKs back and then it would be an observer. it turns out 
that there is a comment in ZOOKEEPER-29 that makes this observation as well. in 
that issue the differences that flavio points out are no longer relevant. i 
think. what do you think?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731333#action_12731333
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

From the patch, an observer observes the leader. In the description of this 
jira, I had suggested that we also have observers of followers for scalability 
purposes. I understand that this will require some changes, like adding code 
to the followers to handle connections to observers, forwarding INFORM 
messages to the observers, syncing with observers, etc. The observer would 
also need to be slightly different because it wouldn't need to look for a 
leader.

I wonder if this is something we should pursue in this patch, in another patch, 
or simply forget about it. I think the 3rd option is a little too drastic as I 
believe it is a nice feature that will enable deployments in which the number 
of messages the leader sends per operation does not increase with the number of 
observers.

Comments? 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Gustavo Niemeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731366#action_12731366
 ] 

Gustavo Niemeyer commented on ZOOKEEPER-368:


I've checked out the patch and have some minor questions/comments to make.  
Please note that I'm a mortal as per the terminology above, so I apologize 
for the lack of deeper understanding. :-)

[1]

+for (FollowerHandler f : observingFollowers) {
+if (!self.viewContains(f.sid))
+f.queuePacket(qp);

I'm missing the reasoning why the !self.viewContains() test is needed here.  
For the untrained eye, it sounds redundant with the fact that there are 
separate lists for observers and followers.  Might be nice to add the reasoning 
as a comment in the code, for the next pair of eyes passing over it.

[2]

+ * @param zxid
+ * @param proposal
+ */
+public void inform(Proposal proposal) {   

There's no zxid parameter.

[3]

I'm slightly concerned about point (5) from Patrick too.  If anyone can 
subscribe as an Observer, it means that the ACLs for reading become very 
ineffective for protecting data in the ensemble (e.g. passwords, etc).  I 
understand that it's tricky to address this in the right way right now since 
there's no concept of authentication between servers, but it'd be awesome if 
such an approach could be considered in the near future for the right way of 
solving this, rather than simply whitelisting by IP.  The whitelist solution 
might be a good short term approach, but it's also a very weak approach for 
securing information.


[4]

-forwardingFollowers.remove(follower);
-}
+forwardingFollowers.remove(follower);
+}
(...)
-Socket s = ss.accept();
+Socket s = ss.accept();  
(...)
-self.setCurrentVote(result.vote);
+self.setCurrentVote(result.vote);  


There are a few changes in the diff which are just adding trailing whitespace 
at the end of some lines.

[5]

The VIEWCHANGE message type isn't being sent anywhere in the patch.  Is it 
preparing for a forthcoming change?

[6]

These changes look very nice overall! (again, for an untrained eye)  Thanks for 
working on this Henry. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731549#action_12731549
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


I believe another option for security today would be to use stunnel with 
certificates. This would provide both authentication of
the observer and also encryption of the connection. This should work with 
no/limited other changes. Of course there is a performance
issue to think about...

We have been considering moving the server over to some NIO framework like 
grizzly. I've heard that
doing so would have the added benefit of supporting ssl/certs with no/limited 
changes on our behalf (ie grizzly supports
this natively). This would be (another) great project for someone to work on. 
;-)


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731660#action_12731660
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

henry/flavio thanks for your patch. I have some higher leveel questions 
regarding obersvers currently connecting to leader

- does an oberserver forward update request to the leader? How is this handled?
- also, you guys mentioned that an observer should be able to connect for 
follower later sometime other than just the leader, do you have plans to do 
that? or have you thought about how you would do that and would that require 
any changes in the design/code changes you are submitting in this jira?
- also, just a cursory look at the patch shows that their is some duplicated 
code between follower and observers? Is that necessary? I havent taken a deeper 
look at the code, so dont have suggestions to do that right now, (one way 
woyuld be just have a single class that takes a parameter of diabling acks and 
other stuff and the new classess that inherit from that follower/observer just 
say if thats enabled or not)



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731662#action_12731662
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

alos the code currently doesnt compile:

{code}
 location: package org.apache.zookeeper.server
[javac] import org.apache.zookeeper.server.ObserverBean;
[javac]   ^
[javac] 
/home/mahadev/workspace/zookeeper-commit-trunk/src/java/main/org/apache/zookeeper/server/quorum/Observer.java:71:
 cannot find symbol
[javac] symbol  : class ObserverBean
[javac] location: class org.apache.zookeeper.server.quorum.Observer
[javac] zk.registerJMX(new ObserverBean(this, zk), 
self.jmxLocalPeerBean);
{code}



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731674#action_12731674
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Hi Mahadev - 

* An observer sends REQUEST packets to the Leader just like a Follower would 
(using the same code) - is this what you meant?
* See my earlier comments to Flavio - I think connecting Observers to Followers 
would be a significant change and therefore should wait for another JIRA. Most 
if not all the code would be on the Follower side, which would make me nervous 
about backwards compatibility.
* There is a single parent class (Peer) in which common code is located. More 
needs to be moved from Observer and Follower into it in the final patch.



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731687#action_12731687
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

thanks henry,

bq. See my earlier comments to Flavio - I think connecting Observers to 
Followers would be a significant change and therefore should wait for another 
JIRA. Most if not all the code would be on the Follower side, which would make 
me nervous about backwards compatibility.

- i just meant, that are you guys keeping that in mind in terms of some work 
that needs to be done right now to make that easier to do later (like: 
generalizing somethings for making it easy for observers to connect to 
followers later sometime)
- also one minor nit on QuorumPeerConfig. Should QuorumPeerConfig  just throw 
an exception and fail if peerType is specfied and it does not match either 
observer or participant? If not specified it goes to default of participant!... 
this is just to prevent any configuration mishaps.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-15 Thread Raghu S (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731711#action_12731711
 ] 

Raghu S commented on ZOOKEEPER-368:
---

Henry, I see a new compile error with your new patch:

[javac] Compiling 2 source files to 
C:\EclipseWorkspace\ZK320\ZooKeeper320Source\build\classes
[javac] 
C:\EclipseWorkspace\ZK320\ZooKeeper320Source\src\java\main\org\apache\zookeeper\server\quorum\Observer.java:165:
 cannot find symbol
[javac] symbol  : class Record
[javac] location: class org.apache.zookeeper.server.quorum.Observer
[javac] Record txn2 = SerializeUtils.deserializeTxn(ia2, hdr2);
[javac] ^

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730762#action_12730762
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Hi Flavio - 

Have you downloaded the lastest patch (78kb, dated 13th July)? For me it 
applies cleanly against a fresh checkout and builds with no errors. The patch 
you have uploaded is only 25kb so I think might be based off an earlier patch...

Henry



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730763#action_12730763
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

{quote}
what is the practical use of Observers? What is the real motivation besides 
Paxos listeners for this? Is there a valid real world scenario that we would 
solve or justify with this? I can see one use case being Read scalability 
without decreasing write throughput. anything else?
{quote}

Mahadev, given that many applications (if not most of them) of ZooKeeper have a 
high ratio of reads to writes, read scalability without affecting write 
performance is quite an important goal. Do you still think we need more 
motivation? 

You might have missed the point that correlating observers and learners was 
just to map to an abstraction that some people know.   

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730772#action_12730772
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Weird - you're applying the patch with patch -p0  ZOOKEEPER-368.patch from 
zookeeper root dir?

Anyhow, thanks for the fix! I'll make sure future patches (and certainly the 
one I propose for commit) are ASF granted.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730775#action_12730775
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

Agreed, it is weird. It works from the command line, but not from Eclipse. 
Another Eclipse mystery.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731046#action_12731046
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

bq. Mahadev, given that many applications (if not most of them) of ZooKeeper 
have a high ratio of reads to writes, read scalability without affecting write 
performance is quite an important goal. Do you still think we need more 
motivation?

my point was just that we should motivate in such a fashion that lesser mortals 
like me :) and folks who are using zookeeper understand the motivation in a 
simpler fashion.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731051#action_12731051
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


I've only been following this a bit, but some questions around the plan wrt 
manageability:

1) adding removing observers, observer itself needs to be configured, any 
changes needed to config on existing ensemble?
2) JMX - what's the plan? what additional properties/actions will be supported?
3) 4letter words - same issues as jmx
4) debug-ability - ensure adequate logging (log4j) on ensemble as well as on 
obs itself

5) security - will an ensemble allow any observer to connect to it? today we 
have ensemble participants hardwired into the config of each of the servers 
right?

testing and b/w compat were mentioned before, but I'm interested to hear/see 
more on the plan for that as well (I'm probably going to look at beefing up 
unit  systest next, esp around b/w compat, so would be good to have a better 
idea where this is headed)


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731057#action_12731057
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


btw, documentation will be needed as well.

Perhaps a wiki proposal page should be created that will capture the current 
proposal for easy review? This JIRA can capture ongoing discussion, with 
agreed upon results captured in the wiki design/functional document.


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731058#action_12731058
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


 my point was just that we should motivate in such a fashion that lesser 
 mortals like me  and folks who are using zookeeper understand the motivation 
 in a simpler fashion.

lesserMortal += 1

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731073#action_12731073
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

to address the motivation a bit consider poorly connected data centers and 
cross datacenter zookeeper. we need to put zookeeper servers in the poorly 
connected data centers because we will want to service all the reads locally in 
those data centers, but we don't want to affect reliability or latency in other 
data centers. for example, imagine we have 5 poorly connected data centers and 
3 well connected data centers. we may put two servers in each data center. that 
means that we have an ensemble of 16 servers, but because of the poorly 
connected data centers, we are more likely to lose quorum than if we made the 5 
poorly connected data centers observers and just used the 3 well connected data 
centers to commit changes. you can view observers as proxies. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731090#action_12731090
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

These are all great points. 

Addressing Patrick's points:

1. No - the ensemble doesn't 'know' about the observers (although the Leader 
maintains FollowerHandlers for them) so doesn't notice when they leave.
2. I need to look further into JMX - I don't know a lot about it. Currently 
Observers do pretty much the same thing as Followers.
3. Probably need to update 'stat'; I'll look at the others.
4. I've been trying to do this but need to do a second pass to say I'm happy 
with it.
5. Yes, for the time being any observer can connect. It would be very easy to 
whitelist say ranges of observer IPs in the config file - in fact we probably 
want to do that for all peers. 

I'll be more formal about testing and b/w compatibility although I'll 
appreciate input on both. I'll try and write a few words for a wiki page - I've 
already started work on the documentation but don't want to go too far before 
we agree on the general approach.



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731102#action_12731102
 ] 

Patrick Hunt commented on ZOOKEEPER-368:


btw, this looks great so far (both the ideas/solutions and open discussion), so 
I don't mean to scare you off. ;-)

at the same time though, my (and Mahadev I'm sure) questions are motivated as 
someone who not only develops this code but is
also responsible for supporting production deployments. Having solid answers to 
these questions will really
help in the long run. 

Thanks for everyone working on this, keep up the good work!


 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-13 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730522#action_12730522
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Hi Mahadev - 

To answer your question one by one:

* I think the read-scalability issue is an important use case. Observers are 
also a natural way to deal with peers that try to connect but are not part of 
the current view, which will be important for the dynamic membership patch. I 
also would like to use them to publish proposals; for example they are a 
convenient point to integrate a larger publish-subscribe system. Watches are 
very useful for their intended purposes but are not ideal for listening to a 
stream of proposals since they are cancelled once fired. 
* Set peerType=observer in the configuration file to make a peer an observer. 
In the dynamic membership patch, a follower that tries to connect while not in 
the current view will become an observer until the view is changed to include 
it.
* Yes, there have been no invasive changes in the Follower code (but some 
restructuring to allow code re-use). All tests currently pass for me, which 
while not conclusive proof, suggests that there have been no behavioural 
changes and none are in by design. 
* We must test in the same way we test the behaviour of Followers - their 
ability to connect, to see proposals, to withstand stress. Some important 
observer specific test cases will be:
** Ensuring an observer does not vote in an election or in a proposal (by 
testing when an ensemble is not quorate but would be if the observer were a 
follower)
** Making sure that an observer does not see messages it shouldn't (PROPOSALs 
in particular)
** Ensuring that an observer does not become a leader.




 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-13 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730542#action_12730542
 ] 

Mahadev konar commented on ZOOKEEPER-368:
-

hi henry, thanks for your response. For the case of backwards compatibility, I 
meant compatibility between 3.2 servers and this new code, 3.2 client and this 
new code. Would the new servers be able to work with old 3.2 servers? This is 
really important since quite a few of our users  using zookeeper do rolling 
upgrades, (upgrading one server at a time) so as to keep the service running. 
Also, not all the clients can be upgraded at the same time. So, any new feature 
should take this into account. does that seem reasonable?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-13 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730570#action_12730570
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Seems completely reasonable to me - it's an important property for ZK to 
maintain. My belief is that a new Follower connecting to a 3.2 server will work 
just fine, and vice versa. An observer that tries to connect to a 3.2 server 
will probably fail due to sending the wrong kind of info packet. This obviously 
needs verification, but I've designed the patch to leave current functionality 
pretty much untouched.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728593#action_12728593
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

(Reposting last bit of conversation from ZK-107, more appropriate to this jira)

_sorry to jump in late here. rather than adding the inform, why don't we just 
send the PROPOSE and COMMIT to the Observer as normal, and just make the 
Observer not send ACKs? That way we change as little code as possible with 
minimum overhead. It also makes switching from Observer to Follower as easy as 
turning on the ACKs. I also think Observers should be able to issue proposals. 
One use case for observers are remote data centers that basically proxy clients 
that connect to ZooKeeper. This means an Observer is just a Follower that 
doesn't vote (ACK)._

That's definitely one way to do it. The other side to that argument is to keep 
the message complexity down, especially if we can envisage use cases with lots 
of Observers. A connection to a remote Observer might be more likely to violate 
the FIFO requirement of ZK connections; having a single-message protocol makes 
it easier to deal with this case (not a correctness issue of Observers, just 
annoying if PROPOSALs arrive after COMMITs for some reason). I think that's a 
marginal issue though. My preference is for INFORM messages as this completely 
separates Observer logic from Follower logic and doesn't add much complexity to 
the code.

The Observer also has to take care not to participate in leader elections. I 
think Observers also need to announce themselves as such to the Leader, to 
enable the case where a Follower wishes to connect as an Observer temporarily 
(otherwise the Leader will think the Observer to be a Follower and use it as 
part of a quorum). Also if the leader can distinguish between followers and 
observers then it can treat both differently (e.g. through batching multiple 
INFORMs or allowing observers to lag by prioritising follower traffic).

Keeping Observers as special-case Followers would simplify the code for the 
observers patch (I've got a new version nearly ready to submit, just fixing 
some tests). However, it would mean that Observers are harder to customise - 
for example, there's no persistence requirement for an Observer and so some of 
the RequestProcessors can be optionally removed or replaced by something that 
only asynchronously writes to disk. Keeping them lightweight has been a goal. 
My feeling was that I was introducing too many 'if (amObserver()) {...}' 
branches to an already fairly hard to follow bit of code (in particular 
Follower.followLeader). Breaking the functionality into two separate classes 
seems to have made things cleaner.

Regarding Observers being able to issue proposals; I don't have a problem with 
that, should be reasonably easy to add. 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having 

[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728596#action_12728596
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

I prefer to have the INFORM message for a couple of reasons:

# Message complexity;
# It make the abstraction clear: Observers are only being informed of the 
outcome of agreement.

I agree that it has the disadvantage of requiring more modifications to the 
current code, which makes it sound a little scary, but I'm willing to see how 
it will end if we go down this path.

I also agree that observers should be able to propose request. I was 
considering it given, so I'm glad that Ben made it explicit. 

I was wondering how difficult it would be to make observers (perhaps even any 
replica) go into read-only mode once they disconnect from the rest of the 
ensemble. Should it be a different jira or it is something we can consider for 
this patch?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728749#action_12728749
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

since henry is doing the work, i'm certainly willing to see how it goes with a 
new message. the two disadvantages i see with this approach (of adding INFORM):
* at a high level the only difference between a follower and an observer is 
that its ACKs do not get counted. i think this high level difference can 
translate into a simple modification of the follower. (since we already have to 
make sure that the follower sees proposals before commits, we shouldn't run 
into the problem you alluded to henry.)
* for the dynamic ensemble patch we want to be able to convert observers to 
followers easily and quickly. flipping a switch at the observer to start 
sending ACKs and at the leader to start acknowledging the ACKs would be an easy 
way to do the conversion.

oh and there already is a jira about going into read-only mode on disconnect.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728751#action_12728751
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey, henry two other questions/comments for you:

* i'm trying to understand the use case for a follower that connects as an 
observer. this would adversely affect the reliability of the system since a 
follower acting as an observer would count as a failed follower even though it 
is up. did you have a case in mind?
* i think it is reasonable to turn off the sync for the observer, but we 
probably still want to log to disk so that we can recover quickly. otherwise we 
will keep doing state transfers from the leader every time we connect. right?

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728775#action_12728775
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

{quote}
for the dynamic ensemble patch we want to be able to convert observers to 
followers easily and quickly. flipping a switch at the observer to start 
sending ACKs and at the leader to start acknowledging the ACKs would be an easy 
way to do the conversion.
{quote}

We need a transition from OBSERVING to FOLLOWING on the state of QuorumPeer, 
and  I believe your concern is that we end up doing a state transfer when 
undergoing that transition. Is it right?

{quote}
i'm trying to understand the use case for a follower that connects as an 
observer. this would adversely affect the reliability of the system since a 
follower acting as an observer would count as a failed follower even though it 
is up. did you have a case in mind?
{quote}

Why would a follower connect as an observer? I guess I'm missing the point here.

{quote}
i think it is reasonable to turn off the sync for the observer, but we probably 
still want to log to disk so that we can recover quickly. otherwise we will 
keep doing state transfers from the leader every time we connect. right?
{quote}

I believe you're suggesting this because you're afraid that an observer that 
re-connects too often might end up overwhelming the leader. A third option 
would be to have the observer only keeping the database in memory and having 
the leader limiting the frequency of state transfers to observers. If you're 
concerned about the performance of an observer, then it might be an idea to 
consider. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728778#action_12728778
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

The jira on read-only mode is ZOOKEEPER-40. Thanks, Ben.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728797#action_12728797
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

{i'm trying to understand the use case for a follower that connects as an 
observer. this would adversely affect the reliability of the system since a 
follower acting as an observer would count as a failed follower even though it 
is up. did you have a case in mind?}

Not really. I was working on the assumption that it should be the peer and not 
the leader which decides whether or not it wants to be an observer or a 
follower. Since peers are identified on the leader by their IP address, there's 
no way for the leader to tell the difference between a follower or an observer 
from the same address. 

I was thinking of situations where a follower might not be able to, e.g., write 
to disk because a partition was full and therefore wanted to indicate to the 
leader that it had effectively failed while still receiving updates. If the 
leader sees that a follower has stopped replying to pings (because they are 
queued behind a sync to disk), I think it currently disconnects the follower; 
the follower might want to reconnect as an observer to do 'best effort' 
relaying of updates to clients. 

I admit this is quite contrived! It's ok for this not to work as I describe, 
although I think it will work as a side effect of the dynamic cluster stuff.

{i think it is reasonable to turn off the sync for the observer, but we 
probably still want to log to disk so that we can recover quickly. otherwise we 
will keep doing state transfers from the leader every time we connect. right?}

Yep, absolutely. However as Flavio says the leader can rate limit state 
transfers if this seems to be happening a lot. I would expect that log to disk 
be turned on for most observers, but we can turn it off transparently to the 
leader if we just want to use an observer to, say, publish updates to an rss 
feed or something and don't want to throw disk at the problem. In fact, 
observers could even turn off syncing with the leader for certain use cases.

As soon as I find a little bit of bandwidth I'll get the new patch on here for 
discussion. 



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-01 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12726056#action_12726056
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-368:
--

+1 to the idea of having a Peer base class, and specializing it to Observer and 
Follower. 

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-01 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12726058#action_12726058
 ] 

Henry Robinson commented on ZOOKEEPER-368:
--

Thanks for the new patch! 

I've done some work on splitting out the code already. I'll hopefully have some 
time to finish it this week. The patch will become quite big, but I think the 
resulting code will be pretty clean.

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple solution to implement observers is to have the leader forwarding 
 commit messages not only to followers but also to observers, and have 
 observers applying transactions according to the order followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.