[jira] Created: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
Unnecessary snapshot transfers between new leader and followers
---
Key: ZOOKEEPER-876
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-876
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Diogo
Priority: Minor

When a new leader takes over, unnecessary snapshot transfers happen between the new leader and followers. This is caused by several small bugs:
1) The comparison of zxids is based on a new proposal instead of the last logged zxid. (LearnerFollower.java:310)
2) If the follower is one zxid behind, the check of the interval of committed logs excludes the follower. (LearnerFollower.java:269)
3) The bug reported in ZOOKEEPER-874 (commitLogs are empty after recover).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912979#action_12912979 ]

Diogo commented on ZOOKEEPER-876:

If the follower is one zxid behind, the check of the interval of committed logs in LearnerFollower.java:269 excludes the follower. For example, suppose the follower has snapshot-1000 and no logs, and the leader has snapshot-1000, log-1001, and so on. The check on that line would force the follower to receive a snapshot again. I think the condition should use minCommittedLog - 1 as the lower bound in the test.
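The boundary case described in this comment can be illustrated with a small sketch. It is not the code from the patch: the class, enum, and method names below are hypothetical, and it assumes the existing check is an inclusive range test on [minCommittedLog, maxCommittedLog], which the thread does not show.

```java
/** Hypothetical sketch; names are illustrative and not taken from the ZooKeeper source. */
final class SyncDecisionSketch {
    /** DIFF = send only the missing transactions, SNAP = send a full snapshot. */
    enum Mode { DIFF, SNAP }

    /** Assumed current behaviour: the follower must lie inside [minCommittedLog, maxCommittedLog]. */
    static Mode strictCheck(long peerLastZxid, long minCommittedLog, long maxCommittedLog) {
        boolean inRange = peerLastZxid >= minCommittedLog && peerLastZxid <= maxCommittedLog;
        return inRange ? Mode.DIFF : Mode.SNAP;
    }

    /** Proposed behaviour: minCommittedLog - 1 is accepted as the lower bound. */
    static Mode relaxedCheck(long peerLastZxid, long minCommittedLog, long maxCommittedLog) {
        boolean inRange = peerLastZxid >= minCommittedLog - 1 && peerLastZxid <= maxCommittedLog;
        return inRange ? Mode.DIFF : Mode.SNAP;
    }
}
```

With a follower at zxid 1000 and a leader whose committed log starts at 1001, the strict check falls back to a snapshot even though the follower is missing nothing before 1001, while the relaxed check allows a diff.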
[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912980#action_12912980 ]

Diogo commented on ZOOKEEPER-869:

Thanks for the comment. I opened a new issue based on this: ZOOKEEPER-876.

Support for election of leader with arbitrary zxid
--
Key: ZOOKEEPER-869
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
Project: Zookeeper
Issue Type: New Feature
Reporter: Diogo
Priority: Minor

Currently, the implemented leader election algorithm guarantees that the leader has the maximum zxid of the ensemble. The state synchronization after the election was built on this assumption. However, other leader election algorithms might elect leaders with an arbitrary zxid. To support such algorithms, the state synchronization should allow the leader to have an arbitrary zxid.
[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
[ https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-874:

Attachment: ZOOKEEPER-874.patch

The same as before but now with correct indentation and filename.

FileTxnSnapLog.restore does not call listener
-
Key: ZOOKEEPER-874
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-874
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.3.1
Reporter: Diogo
Priority: Trivial
Fix For: 3.4.0
Attachments: commitlog-listener.patch, ZOOKEEPER-874.patch

FileTxnSnapLog.restore() does not call listener passed as parameter. The result is that the commitLogs list is empty. When a follower connects to the leader, the leader is forced to send a snapshot to the follower instead of a couple of requests and commits.
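The effect described in this issue can be shown with a minimal sketch. It does not reproduce FileTxnSnapLog: the TxnListener interface and both restore methods are hypothetical simplifications, assuming only what the description says, namely that a listener is passed in and is supposed to be told about each replayed transaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; a simplified stand-in for the restore-with-listener pattern. */
final class RestoreListenerSketch {
    /** Illustrative callback, invoked once per transaction replayed from the log. */
    interface TxnListener {
        void onTxnLoaded(long zxid);
    }

    /** Behaviour as reported: the log is replayed but the listener is never called. */
    static long restoreIgnoringListener(long[] loggedZxids, TxnListener listener) {
        long highest = -1L;
        for (long zxid : loggedZxids) {
            highest = Math.max(highest, zxid);
        }
        return highest;
    }

    /** Intended behaviour: every replayed transaction is also reported to the listener. */
    static long restoreNotifyingListener(long[] loggedZxids, TxnListener listener) {
        long highest = -1L;
        for (long zxid : loggedZxids) {
            listener.onTxnLoaded(zxid);
            highest = Math.max(highest, zxid);
        }
        return highest;
    }

    /** Returns the committed-log list that a listener would have built during restore. */
    static List<Long> committedLogAfter(boolean notify, long[] loggedZxids) {
        List<Long> committedLog = new ArrayList<>();
        TxnListener listener = zxid -> committedLog.add(zxid);
        if (notify) {
            restoreNotifyingListener(loggedZxids, listener);
        } else {
            restoreIgnoringListener(loggedZxids, listener);
        }
        return committedLog;
    }
}
```

In the first variant the committed-log list stays empty after restore, so a leader in this sketch would have nothing to build a diff from and would have to fall back to a snapshot.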
[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
[ https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-874:

Attachment: (was: commitlog-listener.patch)
[jira] Updated: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-876:

Status: Patch Available (was: Open)

Depends on the patch ZOOKEEPER-874. (Applied in any order)
[jira] Updated: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Diogo updated ZOOKEEPER-876:

Attachment: ZOOKEEPER-876.patch

Some fixes and a unit test.
REMINDER: development on ClientCnxn is blocked for 1.5 months
Hi,

I don't know what is blocking progress right now or whether somebody is working on it, so I'm simply asking for help:

Since 08/11 I have been waiting to submit some improvements to the Java client code (ZOOKEEPER-835), which is blocked by ZOOKEEPER-823. Since 02/09 there has been a patch for ZOOKEEPER-823 waiting to be tested by Hudson; it was updated for a newer trunk on 15/09. Today is 21/09.

Could anybody please give a status update? I'll have my last exam at university this Saturday, so I'd have some time to contribute afterwards, if you could help me.

happy hacking,

Thomas Koch, http://www.koch.ro
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-823:

Status: Open (was: Patch Available)

update ZooKeeper java client to optionally use Netty for connections

Key: ZOOKEEPER-823
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
Project: Zookeeper
Issue Type: New Feature
Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.4.0
Attachments: ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch

This jira will port the client side connection code to use netty rather than direct nio.
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-823:

Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-823:

Status: Open (was: Patch Available)
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913089#action_12913089 ]

Patrick Hunt commented on ZOOKEEPER-823:

I just ran this against latest trunk and it fails with:

Testcase: testDisconnectedAddAuth took 2.167 sec
Caused an ERROR
KeeperErrorCode = ConnectionLoss for /
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1259)
at org.apache.zookeeper.test.ACLTest.testDisconnectedAddAuth(ACLTest.java:67)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:51)

(log attached)
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-823:

Attachment: TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz

Log of the test failure. All other tests passed (incl. nionetty), so this seems to be related to the netty client.
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913095#action_12913095 ]

Patrick Hunt commented on ZOOKEEPER-823:

Saw an additional test failure on another machine. It failed with:

[junit] Running org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.zookeeper.test.NettyNettySuiteTest FAILED (timeout)

That was the entire log output for the test. However, the prior test failed with a ton of warnings, including the one below, which seems suspicious. We need to track down why this could happen:

[junit] 2010-09-21 09:42:25,890 [myid:] - INFO [New I/O server worker #32-3:zookeeperser...@801] - Client attempting to establish new session at /127.0.0.1:48780
[junit] 2010-09-21 09:42:25,891 [myid:] - INFO [SyncThread:0:zookeeperser...@577] - Established session 0x12b352cbcb80001 with negotiated timeout 3 for client /127.0.0.1:48780
[junit] 2010-09-21 09:42:25,900 [myid:] - INFO [New I/O client worker #41-1:clientcnxn$sendthr...@904] - Session establishment complete on server localhost/127.0.0.1:11250, sessionid = 0x12b352cbcb80001, negotiated timeout = 3
[junit] 2010-09-21 09:42:26,205 [myid:] - WARN [New I/O client worker #41-1:clientcnxnsocketnetty$zkclienthand...@289] - Exception caught [id: 0x66bb1ead, /127.0.0.1:48780 = localhost/127.0.0.1:11250] EXCEPTION: java.io.IOException: Nothing in the queue, but got 294
[junit] java.io.IOException: Nothing in the queue, but got 294
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:968)
[junit] at org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientHandler.messageReceived(ClientCnxnSocketNetty.java:270)
[junit] at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
[junit] at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
[junit] at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
[junit] at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
[junit] at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
[junit] at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[junit] at java.lang.Thread.run(Thread.java:619)
Re: REMINDER: development on ClientCnxn is blocked for 1.5 months
On Tue, Sep 21, 2010 at 8:18 AM, Thomas Koch tho...@koch.ro wrote:

> Since 08/11 I'm waiting to submit some improvements to the java client code (ZOOKEEPER-835), which has been blocked by ZOOKEEPER-823. Since 02/09 there's a patch for ZOOKEEPER-823 waiting to be tested by hudson, updated for newer trunk on 15/09. Today is 21/09. Could anybody please give a status update?

Thomas, thanks for highlighting the issue.

Some recent changes by the builds team broke Hudson patch builds for Hadoop. Unfortunately there is no way for us to trigger this manually. Giri is working with Apache infra to resolve the issue, but it's complicated and might take some time. The current ETA is a few days (3-5, it sounds like). Giri will update us when it's addressed.

We've had a few bugs come in recently, and a fix release is in progress. This has been diverting resources from reviews.

Also note that there has been activity on 823. It's just that each time someone (usually me) looked at the jira, an issue was found. Most recently the patch was failing to apply against the latest trunk. Prior to that the tests were failing, etc. This is why I suggested to you initially that the patch be kept simple(ish), and to do further refactorings later. This is a complicated patch to complicated code, and it's going to take some time to settle. Note: keeping patches simple and discrete helps to get them reviewed quickly and cleanly (granted, not always possible).

At the same time, though, we should be getting back to these patches more quickly. I've personally spent a ton of time on this issue, initially creating it and most recently reviewing some refactorings, and I'd really like to see this go in myself! I've asked Ben/Flavio/Mahadev to take a look and help drive this to completion.

Thomas, thanks for the help. Hopefully we'll get this one sorted out asap and move on to other issues.

Regards,

Patrick

> I'll have my last exam at university this saturday, so I'd have some time to contribute afterwards, if you could help me. happy hacking, Thomas Koch, http://www.koch.ro
Snapshot on startup
Hey all,

I was looking at the code that loads the snapshots and logged transactions into the database on startup, and noticed that FileTxnSnapLog (in restore()) initializes the log iterator to the last transaction committed to the snapshot. This causes that last transaction to be processed again, even though it's already in the snapshot. I'm not sure if this is a big problem in practice, or if it was intentional. Does anyone know anything about this?

Also, it seems like loadDataBase is called twice in ZooKeeperServer.loadData(). Is that intentional for some reason?

Thanks,

Jared
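To make the observation concrete, here is a small sketch of the two possible starting points for log replay. The class and method names are hypothetical and the log is reduced to a sorted array of zxids, so this only illustrates the off-by-one and is not the FileTxnSnapLog code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the transaction log is modelled as a sorted array of zxids. */
final class ReplaySketch {
    /** Returns the zxids that would be re-applied when replay starts at startZxid (inclusive). */
    static List<Long> replayFrom(long[] sortedLogZxids, long startZxid) {
        List<Long> applied = new ArrayList<>();
        for (long zxid : sortedLogZxids) {
            if (zxid >= startZxid) {
                applied.add(zxid);
            }
        }
        return applied;
    }
}
```

With a snapshot taken at zxid 1000 and a log containing 998 through 1002, starting at 1000 re-applies transaction 1000, while starting at 1001 applies only what the snapshot lacks. Whether the extra replay matters depends on whether applying a transaction twice is harmless, which this thread leaves open.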
[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913102#action_12913102 ]

Hadoop QA commented on ZOOKEEPER-864:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12454033/ZOOKEEPER-864.diff
against trunk revision 998200.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 34 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 27 release audit warnings (more than the trunk's current 26 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/117/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/117/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/117/console

This message is automatically generated.

Hedwig C++ client improvements
--
Key: ZOOKEEPER-864
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864
Project: Zookeeper
Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Fix For: 3.4.0
Attachments: warnings.txt, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff, ZOOKEEPER-864.diff

I changed the socket code to use boost asio. Now the client only creates one thread, and all operations are non-blocking. Tests are now automated, just run make check.
[jira] Commented: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913104#action_12913104 ]

Hadoop QA commented on ZOOKEEPER-756:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12452455/zooInspectorChanges.patch
against trunk revision 998200.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/118/console

This message is automatically generated.

some cleanup and improvements for zooinspector
--
Key: ZOOKEEPER-756
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756
Project: Zookeeper
Issue Type: Improvement
Components: contrib
Affects Versions: 3.3.0
Reporter: Thomas Koch
Assignee: Colin Goodheart-Smithe
Fix For: 3.4.0
Attachments: zooInspectorChanges.patch

Copied from the already closed ZOOKEEPER-678:
* Specify the exact URL where the icons are from. It's best to also include the link in the NOTICE.txt file.
* It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the Zookeeper Debian package, I want to be able to call it regardless of the current path. Could you use getResources or something so that I can point to the icons location from the wrapper shell script?
* Can I place the zooinspector config files in /etc/zookeeper/zooinspector/ ? Could I give zooinspector a property to point to the config file location?
* There are several places where viewers is misspelled as Veiwers. Please do a case insensitive search for veiw to correct these. Even the config file defaultNodeVeiwers.cfg is misspelled like this. This has the potential to confuse the hell out of people when debugging something!
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abmar Barros updated ZOOKEEPER-702:

Attachment: ZOOKEEPER-702.patch

In this patch:
* Replaced 'heartbeat' with 'ping' in the forrest documentation
* Revised javadocs
* Expanded the hammer (client and quorum) and recovery tests to exercise all failure detector implementations.

Flavio, I have run the tests you mentioned and they seem to fail in the trunk also.

GSoC 2010: Failure Detector Model
-
Key: ZOOKEEPER-702
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch

Failure Detector Module

Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java, some distributed systems knowledge, comfort implementing distributed systems protocols

Description
ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it may be too aggressive and is not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network).

This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties.

This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code.
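As a rough illustration of what such an abstraction might look like, the sketch below puts a fixed-timeout detector and a simplified phi-accrual detector behind one interface. None of these names come from the attached patches; the interface is hypothetical, and the phi computation assumes exponentially distributed heartbeat intervals, so phi = elapsed / (mean * ln 10). That is a simplification chosen for brevity, not necessarily the estimator used in the referenced paper or in Cassandra.

```java
/** Hypothetical abstraction over failure detectors; not taken from the ZOOKEEPER-702 patches. */
interface FailureDetectorSketch {
    void heartbeat(long nowMs);
    boolean isSuspected(long nowMs);
}

/** The 'timeout' method: suspect a peer once no heartbeat has arrived for timeoutMs. */
final class FixedTimeoutDetector implements FailureDetectorSketch {
    private final long timeoutMs;
    private long lastHeartbeatMs = -1L;

    FixedTimeoutDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    @Override
    public void heartbeat(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    @Override
    public boolean isSuspected(long nowMs) {
        return lastHeartbeatMs >= 0 && nowMs - lastHeartbeatMs > timeoutMs;
    }
}

/** Simplified phi-accrual detector, assuming exponentially distributed heartbeat intervals. */
final class PhiAccrualDetector implements FailureDetectorSketch {
    private final double threshold;
    private long lastHeartbeatMs = -1L;
    private double meanIntervalMs = 0.0;
    private long samples = 0L;

    PhiAccrualDetector(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            long interval = nowMs - lastHeartbeatMs;
            samples++;
            // Running mean of the observed inter-arrival times.
            meanIntervalMs += (interval - meanIntervalMs) / samples;
        }
        lastHeartbeatMs = nowMs;
    }

    /** phi = -log10(P(no heartbeat for this long)) = elapsed / (mean * ln 10) under the assumption above. */
    double phi(long nowMs) {
        if (samples == 0 || meanIntervalMs <= 0.0) {
            return 0.0;
        }
        return (nowMs - lastHeartbeatMs) / (meanIntervalMs * Math.log(10.0));
    }

    @Override
    public boolean isSuspected(long nowMs) {
        return phi(nowMs) > threshold;
    }
}
```

The tuning difference is that the fixed detector needs an absolute timeout chosen in advance, while the accrual detector's threshold is relative to the heartbeat intervals it has actually observed, which is the kind of adaptability the description has in mind for wide-area deployments.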
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913307#action_12913307 ]

Vishal K commented on ZOOKEEPER-822:

Hi Flavio,

+1. Looks good. I remember looking at the socket.connect() method, but I don't remember why I ruled it out in favor of a thread.

Minor point: there is a missing space before error in LOG.warn(Connection broken: for id + sid + my id = + self.getId() + error..).

Thank you.
-Vishal

Leader election taking a long time to complete
---
Key: ZOOKEEPER-822
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822
Project: Zookeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.3.0
Reporter: Vishal K
Assignee: Vishal K
Priority: Blocker
Fix For: 3.3.2, 3.4.0
Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1

Created a 3 node cluster.
1. Fail the ZK leader
2. Let leader election finish. Restart the leader and let it join the
3. Repeat

After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created.

zoo.cfg is shown below:

#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000

I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for START HERE. The logs after that point should be of interest.