[jira] Commented: (ZOOKEEPER-564) Give more feedback on that current flow of events in java client logs

2009-10-30 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771829#action_12771829
 ] 

Patrick Hunt commented on ZOOKEEPER-564:


I've worked out the following for session establishment, teardown, and 
expiration handling. I'm not convinced about the numbering; getting the 
numbers right in all cases (no gaps, for example) might be tough. 

We'd also include some documentation describing client session 
establishment, teardown, and expiration handling, which would refer to the 
messages (a bit of handwaving here because nothing exists yet, but I think it 
would be docs describing what's below). These logs also make the steps clearer 
to document: first a socket connection is created, then a session is 
established.

The following is logging at INFO level.

client side log create session

2009-10-29 21:37:05,393 - INFO  - Initiating client connection, 
connectString=localhost:2181 sessionTimeout=3 
watcher=org.apache.zookeeper.zookeepermain$mywatc...@1608e05
2009-10-29 21:37:05,449 - INFO  - Opening socket connection to server 
localhost/127.0.0.1:2181
2009-10-29 21:37:05,493 - INFO  - Socket connection established to 
localhost/127.0.0.1:2181, initiating session
Welcome to ZooKeeper!
2009-10-29 21:37:05,547 - INFO  - Session establishment complete, sessionid = 
0x124a3f255ce

client side log close session

2009-10-29 21:37:08,677 - INFO  - Session: 0x124a3f255ce closed


server watching client session creation

2009-10-29 20:57:19,748 - INFO  - Accepted socket connection from 
/127.0.0.1:49641
2009-10-29 20:57:19,784 - INFO  - Client attempting to establish new session at 
/127.0.0.1:49641
2009-10-29 20:57:19,801 - INFO  - Established session 0x124a3cdf52d for 
client /127.0.0.1:49641

server watching client close session

2009-10-29 20:57:49,772 - INFO  - Processed session termination for sessionid: 
0x124a3cdf52d
2009-10-29 20:57:49,775 - INFO  - Closed socket connection for client 
/127.0.0.1:49641 which had sessionid 0x124a3cdf52d

server expiring client session

2009-10-29 21:00:18,001 - INFO  - Expiring session 0x124a3cdf52d0001, timeout 
of 3ms exceeded
2009-10-29 21:00:18,002 - INFO  - Processed session termination for sessionid: 
0x124a3cdf52d0001
2009-10-29 21:00:18,004 - INFO  - Closed socket connection for client 
/127.0.0.1:49644 which had sessionid 0x124a3cdf52d0001

server watching client attempt to re-establish expired session

2009-10-29 21:00:28,222 - INFO  - Accepted socket connection from 
/127.0.0.1:51000
2009-10-29 21:00:28,223 - INFO  - Client attempting to renew session 
0x124a3cdf52d0001 at /127.0.0.1:51000
2009-10-29 21:00:28,225 - INFO  - Invalid session 0x124a3cdf52d0001 for client 
/127.0.0.1:51000, probably expired
2009-10-29 21:00:28,227 - INFO  - Closed socket connection for client 
/127.0.0.1:51000 which had sessionid 0
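
As a rough illustration of how the server-side messages above could be followed mechanically, here is a minimal Python sketch that maps each proposed INFO message to a lifecycle step. The regular expressions mirror the sample lines in this comment; the step labels and the `classify` helper are hypothetical names chosen for the example and are not part of ZooKeeper.

```python
import re

# Patterns copied from the sample server log lines above. The step labels
# are illustrative names, not ZooKeeper identifiers.
SERVER_STEPS = [
    ("socket_accepted",
     re.compile(r"Accepted socket connection from (\S+)")),
    ("session_requested",
     re.compile(r"Client attempting to establish new session at (\S+)")),
    ("session_established",
     re.compile(r"Established session (0x[0-9a-f]+) for client (\S+)")),
    ("renew_requested",
     re.compile(r"Client attempting to renew session (0x[0-9a-f]+) at (\S+)")),
    ("session_invalid",
     re.compile(r"Invalid session (0x[0-9a-f]+) for client (\S+), probably expired")),
    ("session_expiring",
     re.compile(r"Expiring session (0x[0-9a-f]+), timeout of (\d+)ms exceeded")),
    ("session_terminated",
     re.compile(r"Processed session termination for sessionid: (0x[0-9a-f]+)")),
    ("socket_closed",
     re.compile(r"Closed socket connection for client (\S+) which had sessionid (\S+)")),
]


def classify(line):
    """Return (step, captured_groups) for a log line, or None if unrecognised."""
    for step, pattern in SERVER_STEPS:
        match = pattern.search(line)
        if match:
            return step, match.groups()
    return None


# The "re-establish expired session" trace from above, with the wrapped
# lines joined back together.
log = [
    "2009-10-29 21:00:28,222 - INFO  - Accepted socket connection from /127.0.0.1:51000",
    "2009-10-29 21:00:28,223 - INFO  - Client attempting to renew session 0x124a3cdf52d0001 at /127.0.0.1:51000",
    "2009-10-29 21:00:28,225 - INFO  - Invalid session 0x124a3cdf52d0001 for client /127.0.0.1:51000, probably expired",
    "2009-10-29 21:00:28,227 - INFO  - Closed socket connection for client /127.0.0.1:51000 which had sessionid 0",
]

steps = [classify(line)[0] for line in log]
print(steps)
```

With one pattern per message, a gap in the expected sequence (for example, a socket accepted but no session step afterwards) is easy to spot without numbering the messages themselves.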



 Give more feedback on that current flow of events in java client logs
 -

 Key: ZOOKEEPER-564
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-564
 Project: Zookeeper
  Issue Type: Improvement
Affects Versions: 3.2.1
Reporter: Jean-Daniel Cryans

 As discussed during the 10/23 meeting, one issue we have in debugging ZK 
 client logs with HBase is that we have a hard time following the flow of 
 events. It may be obvious for a ZK dev, but in our POV that kind of trace 
 isn't very intuitive:
 {code}
 2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Attempting 
 connection to server ...
 2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Priming 
 connection to java.nio.channels.SocketChannel[connected local=/ ... remote=...
 2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Server 
 connection successful 
 2009-09-27 15:41:10,784 WARN org.apache.zookeeper.ClientCnxn: Exception 
 closing session 0x0 to sun.nio.ch.selectionkeyi...@2c9b42e6
 {code}
 This excerpt is just an example. We would like to see something like a 
 numbering of the events and possibly, in the case of an exception, at which 
 point it went wrong and what the next step is.
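
 To make the request concrete, here is a small Python sketch of what numbered events might look like. The step names, the message layout, and the assumption that a failed attempt restarts from the first step are all hypothetical; nothing here reflects the actual ClientCnxn code.

```python
# Hypothetical step names for one connection attempt; not ZooKeeper identifiers.
STEPS = ["open socket", "establish session", "session ready"]


def format_event(step, detail):
    """Prefix a message with its 1-based position in the flow."""
    return "[%d/%d %s] %s" % (step + 1, len(STEPS), STEPS[step], detail)


def format_failure(step, error):
    """Report where the flow stopped and which step is attempted next.

    Assumption for this sketch: after any failure the client starts over
    from the first step.
    """
    return "[failed at %d/%d %s] %s; next step: 1/%d %s" % (
        step + 1, len(STEPS), STEPS[step], error, len(STEPS), STEPS[0])


print(format_event(0, "connecting to localhost:2181"))
print(format_failure(1, "connection reset"))
```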

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-562) c client can flood server with pings if tcp send queue filled

2009-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771910#action_12771910
 ] 

Hudson commented on ZOOKEEPER-562:
--

Integrated in ZooKeeper-trunk #513 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/513/])
. c client can flood server with pings if tcp send queue filled. (ben reed 
via mahadev)


 c client can flood server with pings if tcp send queue filled
 -

 Key: ZOOKEEPER-562
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-562
 Project: Zookeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.2.1
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.2, 3.3.0

 Attachments: ZOOKEEPER-562.patch


 The C client can flood the server with pings if the TCP send queue is filled.
 Say the cluster is overloaded and stops processing received data. A C client
 can send a ping, but last_send is only updated when data is successfully
 pushed into the socket. If flush_send_queue fails to send any data
 (send_buffer returns 0), last_send is not updated, and zookeeper_interest
 will send another ping the next time it is woken. That wake-up delay could be
 0 if recv_to is close to 0, which could easily happen if the server is not
 sending data to the client.
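
 The effect described above can be reproduced with a toy model. The Python sketch below is not the C client code and not the attached patch: `count_pings` and its `update_on_queue` flag are hypothetical, and the "fix" variant (also moving the timestamp when a ping is queued) is just one possible mitigation used here for contrast.

```python
def count_pings(wake_times_ms, ping_interval_ms, update_on_queue):
    """Count pings queued while the socket accepts no data.

    wake_times_ms: times at which the client loop is woken, in milliseconds.
    update_on_queue: if False, mimic the reported behaviour, where the
    last-send timestamp only moves after a successful socket write. If True,
    also move it when a ping is queued (a hypothetical mitigation).
    """
    last_send = 0
    pings = 0
    for now in wake_times_ms:
        if now - last_send >= ping_interval_ms:
            pings += 1          # a ping is queued for sending
            bytes_sent = 0      # full TCP send queue: nothing goes out
            if bytes_sent > 0 or update_on_queue:
                last_send = now
    return pings


# The loop is woken every millisecond for one second, as it would be when
# the computed wait time is close to 0.
wakes = range(1000, 2000)
flooding = count_pings(wakes, ping_interval_ms=1000, update_on_queue=False)
throttled = count_pings(wakes, ping_interval_ms=1000, update_on_queue=True)
print(flooding, throttled)
```

 In this model the unpatched variant queues a ping on every wake-up, while the variant that records the attempt queues only one in the same second.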




Bugfix release 3.2.2

2009-10-30 Thread Mahadev Konar
Hi all,
  We are planning to make a bugfix release 3.2.2 which will include a
critical bugfix in the c client code. The jira is ZOOKEEPER-562,
http://issues.apache.org/jira/browse/ZOOKEEPER-562.

 If you would like some fix to be considered for this bugfix release please
feel free to post on the zookeeper-dev list.


Thanks
Mahadev



Re: Bugfix release 3.2.2

2009-10-30 Thread Henry Robinson
Will the release include all JIRAs up to 562, or a cherrypick of bugfixes?

It would be great to get zkpython fixes in:

http://issues.apache.org/jira/browse/ZOOKEEPER-538
http://issues.apache.org/jira/browse/ZOOKEEPER-554

are both genuine bug fixes.

http://issues.apache.org/jira/browse/ZOOKEEPER-510
http://issues.apache.org/jira/browse/ZOOKEEPER-540
http://issues.apache.org/jira/browse/ZOOKEEPER-541

are parts of that general patch effort and there are probably enough
dependencies for it to make sense to include all 5.

cheers,
Henry

On Fri, Oct 30, 2009 at 10:44 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi all,
  We are planning to make a bugfix release 3.2.2 which will include a
 critical bugfix in the c client code. The jira is ZOOKEEPER-562,
 http://issues.apache.org/jira/browse/ZOOKEEPER-562.

  If you would like some fix to be considered for this bugfix release please
 feel free to post on the zookeeper-dev list.


 Thanks
 Mahadev




[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover

2009-10-30 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772020#action_12772020
 ] 

Ted Dunning commented on ZOOKEEPER-22:
--


Is there progress on this issue?

 Automatic request retries on connect failover
 -

 Key: ZOOKEEPER-22
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-22
 Project: Zookeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Patrick Hunt
Assignee: Mahadev konar
 Fix For: 3.3.0


 Moved from SourceForge to Apache.
 http://sourceforge.net/tracker/index.php?func=detail&aid=1831412&group_id=209147&atid=1008547
 When a connection to a ZooKeeper server fails, all of the pending requests
 will return an error. In reality the requests should be resubmitted when
 the client reestablishes a connection to ZooKeeper.
 For read requests, it's no big deal to just reissue the request. For update
 requests, the ZooKeeper must be able to detect if the request has been
 processed and, if so, return the result of the previous execution;
 otherwise, it should process the request.
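
 One way to picture the update-request case is a server-side table of already-processed requests. The Python sketch below is only an illustration under stated assumptions: `DedupServer`, the `(session_id, xid)` key, and the counter-style update are hypothetical and do not describe how ZooKeeper or any patch for this issue works.

```python
class DedupServer:
    """Toy server that remembers the result of each processed update."""

    def __init__(self):
        self.data = {}
        # (session_id, xid) -> result of the first execution
        self.completed = {}

    def submit(self, session_id, xid, path, delta):
        key = (session_id, xid)
        if key in self.completed:
            # Already processed: return the earlier result instead of
            # applying the update a second time.
            return self.completed[key]
        self.data[path] = self.data.get(path, 0) + delta
        result = self.data[path]
        self.completed[key] = result
        return result


server = DedupServer()
first = server.submit(session_id=0x1, xid=7, path="/counter", delta=5)
# The client lost its connection before seeing the reply and resubmits
# the same request after reconnecting.
retry = server.submit(session_id=0x1, xid=7, path="/counter", delta=5)
print(first, retry, server.data["/counter"])
```

 A non-idempotent update is used on purpose: without the table, the resubmitted request would be applied twice.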




[jira] Updated: (ZOOKEEPER-555) Add stat information to GetChildrenResponse

2009-10-30 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-555:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Arni and Pat.

 Add stat information to GetChildrenResponse
 ---

 Key: ZOOKEEPER-555
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-555
 Project: Zookeeper
  Issue Type: Improvement
  Components: c client, contrib-bindings, java client, server
Affects Versions: 3.3.0
Reporter: Árni Már Jónsson
Assignee: Árni Már Jónsson
Priority: Minor
 Fix For: 3.3.0

 Attachments: getchildren_stat.patch, ZOOKEEPER-555.patch, 
 ZOOKEEPER-555.patch, ZOOKEEPER-555.patch


 GetChildren() is the only non-create/delete API which does not include the 
 node stat information. I propose that the definition of GetChildren() should 
 be:
 class GetChildrenResponse {
 vector<ustring> children;
 org.apache.zookeeper.data.Stat stat;
 }
 There is a trivial fix to the server (FinalRequestProcessor.java): rsp = new 
 GetChildrenResponse(children, stat);
 And something similar to the  client library.
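
 To show the shape of the proposed response, here is a minimal Python sketch. It is not the generated Java class: the `Stat` stand-in carries only two illustrative fields, and `get_children` with its in-memory `tree` argument is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stat:
    # Two illustrative fields only; the real
    # org.apache.zookeeper.data.Stat record has more.
    version: int = 0
    num_children: int = 0


@dataclass
class GetChildrenResponse:
    children: List[str] = field(default_factory=list)
    stat: Stat = field(default_factory=Stat)


def get_children(tree, path):
    """Build a response from a toy tree: path -> (version, child names)."""
    version, children = tree[path]
    return GetChildrenResponse(sorted(children), Stat(version, len(children)))


resp = get_children({"/app": (3, ["b", "a"])}, "/app")
print(resp.children, resp.stat.version, resp.stat.num_children)
```

 Returning both fields together lets a caller read the child list and the node's stat from one response instead of issuing a second request.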




[jira] Commented: (ZOOKEEPER-555) Add stat information to GetChildrenResponse

2009-10-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772159#action_12772159
 ] 

Mahadev konar commented on ZOOKEEPER-555:
-

+1 this looks good... 




[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover

2009-10-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772161#action_12772161
 ] 

Mahadev konar commented on ZOOKEEPER-22:


Ted, due to some laziness on my side, I haven't made much progress on this. I 
expect to make good progress next week and hope to post a patch within a week 
or two.




[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover

2009-10-30 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772165#action_12772165
 ] 

Ted Dunning commented on ZOOKEEPER-22:
--


I wouldn't call it laziness.  At most distraction.

But a lot of ZK users will breathe a sigh of relief when this fix gets deployed!

Thanks for your efforts on this.




Re: Bugfix release 3.2.2

2009-10-30 Thread Patrick Hunt

+1

Henry Robinson wrote:

Will the release include all JIRAs up to 562, or a cherrypick of bugfixes?

It would be great to get zkpython fixes in:

http://issues.apache.org/jira/browse/ZOOKEEPER-538
http://issues.apache.org/jira/browse/ZOOKEEPER-554

are both genuine bug fixes.

http://issues.apache.org/jira/browse/ZOOKEEPER-510
http://issues.apache.org/jira/browse/ZOOKEEPER-540
http://issues.apache.org/jira/browse/ZOOKEEPER-541

are parts of that general patch effort and there are probably enough
dependencies for it to make sense to include all 5.

cheers,
Henry

On Fri, Oct 30, 2009 at 10:44 AM, Mahadev Konar maha...@yahoo-inc.com wrote:


Hi all,
 We are planning to make a bugfix release 3.2.2 which will include a
critical bugfix in the c client code. The jira is ZOOKEEPER-562,
http://issues.apache.org/jira/browse/ZOOKEEPER-562.

 If you would like some fix to be considered for this bugfix release please
feel free to post on the zookeeper-dev list.


Thanks
Mahadev






[jira] Updated: (ZOOKEEPER-368) Observers

2009-10-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-368:
-

Status: Patch Available  (was: Open)

 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, observers 
 sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple way to implement observers is to have the leader forward 
 commit messages not only to followers but also to observers, and have 
 observers apply transactions in the order the followers agreed upon. 
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload, because all servers 
 other than the leader are followers, and followers receive that payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer is consequently not sufficient. We have a couple 
 of options:
 1- Include the transaction payload in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forwarding 
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 
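
 Option 2 can be sketched as a toy broadcast round. The Python code below is an illustration only: the `Peer` and `Leader` classes are hypothetical, the quorum rule assumes the leader counts itself as a voter alongside the followers, and none of it is taken from the attached patches.

```python
class Peer:
    """A follower or observer that stores proposals until they are committed."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.pending = {}      # zxid -> payload received via proposal
        self.committed = []    # payloads applied, in commit order

    def on_proposal(self, zxid, payload):
        if self.reachable:
            self.pending[zxid] = payload

    def ack(self, zxid):
        return zxid in self.pending

    def on_commit(self, zxid):
        if zxid in self.pending:
            self.committed.append(self.pending.pop(zxid))


class Leader:
    def __init__(self, followers, observers):
        self.followers = followers
        self.observers = observers

    def broadcast(self, zxid, payload):
        # Option 2: proposals go to observers as well, so a later commit
        # message does not need to carry the payload.
        for peer in self.followers + self.observers:
            peer.on_proposal(zxid, payload)
        # Only followers are asked for acknowledgments; observers never vote.
        acks = sum(1 for f in self.followers if f.ack(zxid))
        voters = len(self.followers) + 1   # assumption: leader counts itself
        if acks + 1 > voters // 2:
            for peer in self.followers + self.observers:
                peer.on_commit(zxid)
            return True
        return False


observers = [Peer(), Peer()]
leader = Leader([Peer(), Peer(reachable=False)], observers)
ok = leader.broadcast(1, "set /a=1")

# With no follower reachable, reachable observers cannot form a quorum.
stalled = Leader([Peer(reachable=False), Peer(reachable=False)], [Peer(), Peer()])
blocked = stalled.broadcast(1, "set /a=1")
print(ok, blocked, observers[0].committed)
```

 Adding observers in this model changes how many peers receive messages but not the number of acknowledgments needed to commit.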




[jira] Updated: (ZOOKEEPER-368) Observers

2009-10-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-368:
-

Attachment: ZOOKEEPER-368.patch

New patch - now that the refactor has gone in, Hudson should be able to give 
this the once over.

Findbugs is 0 for me, patch applies against trunk and tests pass. 

The only restriction with this patch is that Observers only work with the 
vanilla LeaderElection protocol. This is because they need a responder thread 
to run so that they can query votes from the ensemble, and this doesn't happen 
if electionAlg is not 0. I have a patch nearly done to start the responderThread 
for every leader election algorithm, but it's not as simple as it might seem: we 
need a TCP responder thread, a new port to run it on, and a possible race 
condition with LETest sorted out first. I've done most of this, but adding 
those to this patch would just overcomplicate things. An exception will be 
thrown if you try to start a cluster without electionAlg=0 (and there's a test 
for this). 

That aside, I'd be grateful for comments and feedback, as I think this patch is 
very nearly good to go. 





[jira] Commented: (ZOOKEEPER-368) Observers

2009-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772213#action_12772213
 ] 

Hadoop QA commented on ZOOKEEPER-368:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423742/ZOOKEEPER-368.patch
  against trunk revision 831486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/console

This message is automatically generated.
