[jira] Commented: (ZOOKEEPER-564) Give more feedback on that current flow of events in java client logs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771829#action_12771829 ]

Patrick Hunt commented on ZOOKEEPER-564:

I've worked out the following for session establishment, teardown and expiration handling. I'm not convinced about the numbering (getting the numbers right, with no gaps for example, in all cases might be tough). We'd also include some documentation describing client session establishment, teardown, and expiration handling, which would refer to the messages (a bit of handwaving here because nothing exists yet, but I think it would be docs describing what's below).

These logs make the steps clearer for documentation: first a socket connection is created, then a session is established.

The following is logging at INFO level.

client side log, create session

2009-10-29 21:37:05,393 - INFO - Initiating client connection, connectString=localhost:2181 sessionTimeout=3 watcher=org.apache.zookeeper.zookeepermain$mywatc...@1608e05
2009-10-29 21:37:05,449 - INFO - Opening socket connection to server localhost/127.0.0.1:2181
2009-10-29 21:37:05,493 - INFO - Socket connection established to localhost/127.0.0.1:2181, initiating session
Welcome to ZooKeeper!
2009-10-29 21:37:05,547 - INFO - Session establishment complete, sessionid = 0x124a3f255ce

client side log, close session

2009-10-29 21:37:08,677 - INFO - Session: 0x124a3f255ce closed

server watching client session creation

2009-10-29 20:57:19,748 - INFO - Accepted socket connection from /127.0.0.1:49641
2009-10-29 20:57:19,784 - INFO - Client attempting to establish new session at /127.0.0.1:49641
2009-10-29 20:57:19,801 - INFO - Established session 0x124a3cdf52d for client /127.0.0.1:49641

server watching client close session

2009-10-29 20:57:49,772 - INFO - Processed session termination for sessionid: 0x124a3cdf52d
2009-10-29 20:57:49,775 - INFO - Closed socket connection for client /127.0.0.1:49641 which had sessionid 0x124a3cdf52d

server expiring client session

2009-10-29 21:00:18,001 - INFO - Expiring session 0x124a3cdf52d0001, timeout of 3ms exceeded
2009-10-29 21:00:18,002 - INFO - Processed session termination for sessionid: 0x124a3cdf52d0001
2009-10-29 21:00:18,004 - INFO - Closed socket connection for client /127.0.0.1:49644 which had sessionid 0x124a3cdf52d0001

server watching client attempt to re-establish expired session

2009-10-29 21:00:28,222 - INFO - Accepted socket connection from /127.0.0.1:51000
2009-10-29 21:00:28,223 - INFO - Client attempting to renew session 0x124a3cdf52d0001 at /127.0.0.1:51000
2009-10-29 21:00:28,225 - INFO - Invalid session 0x124a3cdf52d0001 for client /127.0.0.1:51000, probably expired
2009-10-29 21:00:28,227 - INFO - Closed socket connection for client /127.0.0.1:51000 which had sessionid 0

Give more feedback on that current flow of events in java client logs

Key: ZOOKEEPER-564
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-564
Project: Zookeeper
Issue Type: Improvement
Affects Versions: 3.2.1
Reporter: Jean-Daniel Cryans

As discussed during the 10/23 meeting, one issue we have in debugging ZK client logs with HBase is that we have a hard time following the flow of events. It may be obvious to a ZK dev, but from our POV this kind of trace isn't very intuitive:

{code}
2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ...
2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/ ... remote=...
2009-09-27 15:41:10,776 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-09-27 15:41:10,784 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@2c9b42e6
{code}

This excerpt is just an example. We would like to see something like a numbering of the events and possibly, in the case of an exception, at which point it went wrong and what the next step is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
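To illustrate the kind of event numbering requested in this issue, here is a minimal Java sketch that assigns a step number and a phase name to the client-side INFO messages quoted in the comment. The class name SessionLogPhases, the Phase enum and the output format are hypothetical; nothing here is part of the ZooKeeper client, and the matching relies only on the message text shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: numbers the proposed client INFO messages by session phase. */
public class SessionLogPhases {
    enum Phase { INITIATING, SOCKET_OPENING, SOCKET_CONNECTED, SESSION_ESTABLISHED, SESSION_CLOSED, UNKNOWN }

    /** Classify one log line using the message text quoted in the JIRA comment. */
    static Phase classify(String line) {
        if (line.contains("Initiating client connection")) return Phase.INITIATING;
        if (line.contains("Opening socket connection")) return Phase.SOCKET_OPENING;
        if (line.contains("Socket connection established")) return Phase.SOCKET_CONNECTED;
        if (line.contains("Session establishment complete")) return Phase.SESSION_ESTABLISHED;
        if (line.contains("Session:") && line.endsWith("closed")) return Phase.SESSION_CLOSED;
        return Phase.UNKNOWN;
    }

    /** Prefix each recognised line with "[step:PHASE]"; other lines are only indented. */
    static List<String> number(List<String> lines) {
        List<String> out = new ArrayList<>();
        int step = 0;
        for (String line : lines) {
            Phase phase = classify(line);
            if (phase == Phase.UNKNOWN) {
                out.add("      " + line);
            } else {
                step++;
                out.add("[" + step + ":" + phase + "] " + line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2009-10-29 21:37:05,449 - INFO - Opening socket connection to server localhost/127.0.0.1:2181",
            "2009-10-29 21:37:05,493 - INFO - Socket connection established to localhost/127.0.0.1:2181, initiating session",
            "Welcome to ZooKeeper!",
            "2009-10-29 21:37:05,547 - INFO - Session establishment complete, sessionid = 0x124a3f255ce",
            "2009-10-29 21:37:08,677 - INFO - Session: 0x124a3f255ce closed");
        for (String line : number(log)) {
            System.out.println(line);
        }
    }
}
```

A gap in the printed step numbers, or a trace that stops before SESSION_ESTABLISHED, would show a reader where the flow broke, which is the feedback the reporter asks for.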
[jira] Commented: (ZOOKEEPER-562) c client can flood server with pings if tcp send queue filled
[ https://issues.apache.org/jira/browse/ZOOKEEPER-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771910#action_12771910 ]

Hudson commented on ZOOKEEPER-562:

Integrated in ZooKeeper-trunk #513 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/513/]). c client can flood server with pings if tcp send queue filled. (ben reed via mahadev)

c client can flood server with pings if tcp send queue filled

Key: ZOOKEEPER-562
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-562
Project: Zookeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.2.1
Reporter: Patrick Hunt
Assignee: Benjamin Reed
Priority: Blocker
Fix For: 3.2.2, 3.3.0
Attachments: ZOOKEEPER-562.patch

The c client can flood the server with pings if the tcp queue is filled. Say the cluster is overloaded and shuts down recv processing. A c client can send a ping, but last_send is only updated when data is successfully pushed into the socket. If flush_send_queue fails to send any data (send_buffer returns 0), then last_send is not updated and zookeeper_interest will send a ping again the next time it is woken. That wake-up could come after a timeout of 0 if recv_to is close to 0, which could easily happen if the server is not sending data to the client.
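The timing problem described in this issue can be seen in a toy model. The following Java sketch is not the C client and does not reproduce the attached patch; the class PingFloodModel, its parameters, and the "update on attempt" variant are assumptions made only to show why a timestamp that advances solely on successful sends leads to one ping per wake-up.

```java
/** Toy model (not the real C client) of the ping timing described in ZOOKEEPER-562. */
public class PingFloodModel {
    /**
     * Counts pings queued while the socket accepts no data at all.
     *
     * @param wakeups         number of times the client loop is woken
     * @param wakeIntervalMs  time between wake-ups (close to 0 in the reported case)
     * @param pingIntervalMs  idle time after which a ping is due
     * @param updateOnAttempt false: the timestamp advances only when bytes are sent
     *                        (the reported behaviour); true: it advances when the ping
     *                        is queued (one possible remedy, assumed for illustration)
     */
    static int pingsQueued(int wakeups, long wakeIntervalMs, long pingIntervalMs, boolean updateOnAttempt) {
        long now = 0;
        long lastSend = 0;
        int pings = 0;
        for (int i = 0; i < wakeups; i++) {
            now += wakeIntervalMs;
            if (now - lastSend >= pingIntervalMs) {
                pings++;              // a ping is queued
                boolean sent = false; // TCP send queue is full: nothing leaves the socket
                if (sent || updateOnAttempt) {
                    lastSend = now;
                }
            }
        }
        return pings;
    }

    public static void main(String[] args) {
        System.out.println("timestamp updated only on success: " + pingsQueued(100, 1, 10, false));
        System.out.println("timestamp updated on attempt:      " + pingsQueued(100, 1, 10, true));
    }
}
```

With 100 wake-ups 1 ms apart and a 10 ms ping interval, the first variant queues a ping on every wake-up once the interval has elapsed, while the second queues one ping per interval.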
Bugfix release 3.2.2
Hi all, We are planning to make a bugfix release 3.2.2 which will include a critical bugfix in the c client code. The jira is ZOOKEEPER-562, http://issues.apache.org/jira/browse/ZOOKEEPER-562. If you would like some fix to be considered for this bugfix release please feel free to post on the zookeeper-dev list. Thanks Mahadev
Re: Bugfix release 3.2.2
Will the release include all JIRAs up to 562, or a cherrypick of bugfixes?

It would be great to get zkpython fixes in:

http://issues.apache.org/jira/browse/ZOOKEEPER-538
http://issues.apache.org/jira/browse/ZOOKEEPER-554
are both genuine bug fixes.

http://issues.apache.org/jira/browse/ZOOKEEPER-510
http://issues.apache.org/jira/browse/ZOOKEEPER-540
http://issues.apache.org/jira/browse/ZOOKEEPER-541
are parts of that general patch effort and there are probably enough dependencies for it to make sense to include all 5.

cheers,
Henry

On Fri, Oct 30, 2009 at 10:44 AM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi all, We are planning to make a bugfix release 3.2.2 which will include a critical bugfix in the c client code. The jira is ZOOKEEPER-562, http://issues.apache.org/jira/browse/ZOOKEEPER-562. If you would like some fix to be considered for this bugfix release please feel free to post on the zookeeper-dev list. Thanks Mahadev
[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover
[ https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772020#action_12772020 ]

Ted Dunning commented on ZOOKEEPER-22:

Is there progress on this issue?

Automatic request retries on connect failover

Key: ZOOKEEPER-22
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-22
Project: Zookeeper
Issue Type: New Feature
Components: c client, java client
Reporter: Patrick Hunt
Assignee: Mahadev konar
Fix For: 3.3.0

Moved from SourceForge to Apache.
http://sourceforge.net/tracker/index.php?func=detail&aid=1831412&group_id=209147&atid=1008547

When a connection to a ZooKeeper server fails, all of the pending requests will return an error. In reality the requests should be resubmitted when the client reestablishes a connection to ZooKeeper. For read requests, it's no big deal to just reissue the request. For update requests, the ZooKeeper must be able to detect if the request has been processed and, if so, return the result of the previous execution; otherwise, it should process the request.
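The requirement for update requests (detect a request that was already processed and return the earlier result) can be sketched as a small deduplication table. This is an illustration only: the class RetryDedupSketch, the choice of a session id plus a per-session request id as the key, and the in-memory map are all assumptions, not the design that was eventually adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical server-side table that makes resubmitted updates idempotent. */
public class RetryDedupSketch {
    // Results of completed updates, keyed by "sessionId:requestId" (an assumed key format).
    private final Map<String, String> completed = new HashMap<>();
    private int executions = 0;

    /**
     * Runs the update once per (sessionId, requestId); a resubmission after a
     * connection failure gets the stored result instead of a second execution.
     */
    String submit(long sessionId, int requestId, Supplier<String> update) {
        String key = sessionId + ":" + requestId;
        String previous = completed.get(key);
        if (previous != null) {
            return previous;          // already processed: return the earlier result
        }
        executions++;
        String result = update.get(); // first time: actually process the request
        completed.put(key, result);
        return result;
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        RetryDedupSketch server = new RetryDedupSketch();
        String first = server.submit(0x124a3cdf52dL, 7, () -> "created /app/lock-0000000001");
        // The client lost the connection before seeing the reply and resubmits.
        String retry = server.submit(0x124a3cdf52dL, 7, () -> "created /app/lock-0000000002");
        System.out.println(first.equals(retry) + ", executions=" + server.executions());
    }
}
```

Read requests need no such table, since reissuing them has no side effects, which matches the distinction drawn in the issue description.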
[jira] Updated: (ZOOKEEPER-555) Add stat information to GetChildrenResponse
[ https://issues.apache.org/jira/browse/ZOOKEEPER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-555:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

I just committed this. Thanks arni and pat.

Add stat information to GetChildrenResponse

Key: ZOOKEEPER-555
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-555
Project: Zookeeper
Issue Type: Improvement
Components: c client, contrib-bindings, java client, server
Affects Versions: 3.3.0
Reporter: Árni Már Jónsson
Assignee: Árni Már Jónsson
Priority: Minor
Fix For: 3.3.0
Attachments: getchildren_stat.patch, ZOOKEEPER-555.patch, ZOOKEEPER-555.patch, ZOOKEEPER-555.patch

GetChildren() is the only non-create/delete API which does not include the node stat information. I propose that the definition of GetChildren() should be:

class GetChildrenResponse { vector<ustring> children; org.apache.zookeeper.data.Stat stat; }

There is a trivial fix to the server (FinalRequestProcessor.java):

rsp = new GetChildrenResponse(children, stat);

And something similar to the client library.
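As a rough picture of the proposed response shape, the Java sketch below pairs the child list with a stat object in one reply. NodeStat is a hypothetical, cut-down stand-in for org.apache.zookeeper.data.Stat, and the getChildren helper is invented for the example; only the two-field layout of GetChildrenResponse comes from the proposal above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of a getChildren reply that carries the parent's stat along with the children. */
public class GetChildrenWithStatSketch {
    /** Hypothetical, cut-down stand-in for org.apache.zookeeper.data.Stat. */
    static final class NodeStat {
        final int cversion;    // child-list version of the parent node
        final int numChildren; // number of children at the time of the read

        NodeStat(int cversion, int numChildren) {
            this.cversion = cversion;
            this.numChildren = numChildren;
        }
    }

    /** Mirrors the proposed definition: children plus stat in a single response. */
    static final class GetChildrenResponse {
        final List<String> children;
        final NodeStat stat;

        GetChildrenResponse(List<String> children, NodeStat stat) {
            this.children = Collections.unmodifiableList(children);
            this.stat = stat;
        }
    }

    /** Hypothetical server-side helper that builds the response in one step. */
    static GetChildrenResponse getChildren(List<String> children, int cversion) {
        return new GetChildrenResponse(children, new NodeStat(cversion, children.size()));
    }

    public static void main(String[] args) {
        GetChildrenResponse rsp = getChildren(Arrays.asList("worker-1", "worker-2"), 5);
        System.out.println(rsp.children + " cversion=" + rsp.stat.cversion
            + " numChildren=" + rsp.stat.numChildren);
    }
}
```

Returning both values in one reply lets a caller see the child list and the matching stat from the same read, rather than issuing a separate stat call whose result might already reflect a later change.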
[jira] Commented: (ZOOKEEPER-555) Add stat information to GetChildrenResponse
[ https://issues.apache.org/jira/browse/ZOOKEEPER-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772159#action_12772159 ]

Mahadev konar commented on ZOOKEEPER-555:

+1 this looks good...
[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover
[ https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772161#action_12772161 ]

Mahadev konar commented on ZOOKEEPER-22:

Ted, due to some laziness on my side, I haven't made much progress on this. I expect to make good progress next week and hope to post a patch within a week or two.
[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover
[ https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772165#action_12772165 ]

Ted Dunning commented on ZOOKEEPER-22:

I wouldn't call it laziness. At most distraction. But a lot of ZK users will breathe a sigh of relief when this fix gets deployed! Thanks for your efforts on this.
Re: Bugfix release 3.2.2
+1

Henry Robinson wrote:

Will the release include all JIRAs up to 562, or a cherrypick of bugfixes? It would be great to get zkpython fixes in:

http://issues.apache.org/jira/browse/ZOOKEEPER-538
http://issues.apache.org/jira/browse/ZOOKEEPER-554
are both genuine bug fixes.

http://issues.apache.org/jira/browse/ZOOKEEPER-510
http://issues.apache.org/jira/browse/ZOOKEEPER-540
http://issues.apache.org/jira/browse/ZOOKEEPER-541
are parts of that general patch effort and there are probably enough dependencies for it to make sense to include all 5.
[jira] Updated: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-368:

Status: Patch Available (was: Open)

Observers

Key: ZOOKEEPER-368
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
Project: Zookeeper
Issue Type: New Feature
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
Attachments: obs-refactor.patch, observer-refactor.patch, observers sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch

Currently, all servers of an ensemble participate actively in reaching agreement on the order of ZooKeeper transactions. That is, all followers receive proposals, acknowledge them, and receive commit messages from the leader. A leader issues commit messages once it receives acknowledgments from a quorum of followers.

For cross-colo operation, it would be useful to have a third role: observer. Using Paxos terminology, observers are similar to learners. An observer does not participate actively in the agreement step of the atomic broadcast protocol. Instead, it only commits proposals that have been accepted by some quorum of followers.

One simple way to implement observers is to have the leader forward commit messages not only to followers but also to observers, and have observers apply transactions in the order the followers agreed upon. In the current implementation of the protocol, however, commit messages do not carry their corresponding transaction payload, because all servers other than the leader are followers and followers receive the payload first through a proposal message. Just forwarding commit messages as they currently are to an observer is consequently not sufficient.

We have a couple of options:

1- Include the transaction payload in commit messages to observers;
2- Send proposals to observers as well.

Number 2 is simpler to implement because it doesn't require changing the protocol implementation, but it increases traffic slightly. The performance impact due to such an increase might be insignificant, though.

For scalability purposes, we may consider having followers also forward commit messages to observers. With this option, observers can connect to followers and receive messages from them. This choice is important to avoid increasing the load on the leader as the number of observers grows.
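Option 2 can be sketched in a few lines of Java. This is a simplified in-memory model, not the ZooKeeper quorum code: the classes ObserverSketch and Learner, the reachable flag, and the choice to count the leader as one voter are assumptions for the example. It shows proposals going to followers and observers alike, with only follower acknowledgments counting toward the quorum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of option 2: observers receive proposals but never vote. */
public class ObserverSketch {
    /** A non-leader server: follower (votes) or observer (does not vote). */
    static class Learner {
        final boolean votes;
        final boolean reachable;
        final Map<Long, String> pending = new HashMap<>(); // proposals by zxid
        final List<String> applied = new ArrayList<>();    // committed transactions

        Learner(boolean votes, boolean reachable) {
            this.votes = votes;
            this.reachable = reachable;
        }

        /** Stores the proposal payload; returns true only if this server acknowledges as a voter. */
        boolean propose(long zxid, String txn) {
            if (!reachable) return false;
            pending.put(zxid, txn);
            return votes;
        }

        /** The commit carries only the zxid, so the payload must already be pending. */
        void commit(long zxid) {
            String txn = pending.remove(zxid);
            if (txn != null) applied.add(txn);
        }
    }

    /** Leader side: commit only if a majority of voters (leader included, by assumption) acknowledged. */
    static boolean broadcast(long zxid, String txn, List<Learner> learners) {
        int voters = 1; // the leader itself
        int acks = 1;
        for (Learner l : learners) {
            if (l.votes) voters++;
            if (l.propose(zxid, txn)) acks++;
        }
        if (acks * 2 <= voters) return false; // no quorum: nothing is committed
        for (Learner l : learners) l.commit(zxid);
        return true;
    }

    public static void main(String[] args) {
        Learner follower = new Learner(true, true);
        Learner downFollower = new Learner(true, false);
        Learner observer = new Learner(false, true);
        boolean committed = broadcast(1L, "create /a", Arrays.asList(follower, downFollower, observer));
        System.out.println(committed + " observer applied " + observer.applied);
    }
}
```

Because the observer already holds the payload from the proposal, the commit message can stay payload-free, which is why option 2 needs no change to the commit format.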
[jira] Updated: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-368:

Attachment: ZOOKEEPER-368.patch

New patch - now that the refactor has gone in, Hudson should be able to give this the once over. Findbugs is 0 for me, the patch applies against trunk and tests pass.

The only restriction with this patch is that Observers only work with the vanilla LeaderElection protocol. This is because they need a responder thread to run so that they can query votes from the ensemble, and this doesn't happen if electionAlg is not 0. I have a patch nearly done to start the responderThread for every leader election algorithm, but it's not as simple as it might seem: we need a TCP responder thread, a new port to run it on and a possible race condition with LETest sorted out first. I've done most of this, but adding those to this patch would just overcomplicate things. An exception will be thrown if you try to start a cluster w/o electionAlg=0 (and there's a test for this).

That aside, I'd be grateful for comments and feedback, as I think this patch is very nearly good to go.
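The restriction mentioned in this update (observers only with electionAlg=0, otherwise an exception at startup) amounts to a configuration check. The Java sketch below shows one way such a check could look; the class ObserverConfigCheck, the method signature, the exception type and the message text are hypothetical and are not taken from the patch.

```java
/** Hypothetical startup check for the restriction described in the patch comment. */
public class ObserverConfigCheck {
    /**
     * Rejects a configuration that declares observers while using an election
     * algorithm other than 0 (the vanilla LeaderElection protocol).
     */
    static void validate(int electionAlg, int observerCount) {
        if (observerCount > 0 && electionAlg != 0) {
            throw new IllegalArgumentException(
                "Observers require electionAlg=0, but electionAlg=" + electionAlg);
        }
    }

    public static void main(String[] args) {
        validate(0, 2); // accepted: observers with the vanilla protocol
        try {
            validate(3, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at startup keeps a misconfigured ensemble from running with observers that can never query votes.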
[jira] Commented: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772213#action_12772213 ]

Hadoop QA commented on ZOOKEEPER-368:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423742/ZOOKEEPER-368.patch against trunk revision 831486.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 13 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/42/console

This message is automatically generated.