[jira] Updated: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Savu updated ZOOKEEPER-799: -- Attachment: (was: monitoring.tar.gz) Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Savu updated ZOOKEEPER-799: -- Attachment: (was: ZOOKEEPER-799.patch) Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Savu updated ZOOKEEPER-799: -- Attachment: monitoring.tar.gz Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrei Savu updated ZOOKEEPER-799: -- Status: Patch Available (was: Open) Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887847#action_12887847 ] Hadoop QA commented on ZOOKEEPER-799: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449351/monitoring.tar.gz against trunk revision 962697. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/141/console This message is automatically generated. Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887852#action_12887852 ] Patrick Hunt commented on ZOOKEEPER-799: I see, both files are necessary to build. I'll take a look at this asap (don't worry about hudson). Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887854#action_12887854 ] Mahadev konar commented on ZOOKEEPER-795: - ben, can you take a look at this? eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch Hi, I noticed a problem with the eventThread in ClientCnxn.java. The eventThread isn't shut down after a session expired event arrives (i.e. it never receives EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread that will never be killed (for the previous ZooKeeper object the state is already closed, so calling close again doesn't do anything). So every time I create a new ZooKeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce: - Start a ZooKeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread goes into a wait state indefinitely - If you open a new ZooKeeper connection and repeat the previous steps, another EventThread will be present in an infinite wait state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
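The leak pattern described in ZOOKEEPER-795 can be sketched as follows. This is a toy model in Python, not code from ClientCnxn.java: the `Client` class, its methods, and the `EVENT_OF_DEATH` sentinel are hypothetical names chosen for illustration. It only shows that a consumer thread blocked on a queue exits when some shutdown path enqueues the sentinel, and lingers when the expiry path skips that step and a later close() is a no-op.

```python
import queue
import threading

# Hypothetical sentinel playing the role of the "event of death" in the report.
EVENT_OF_DEATH = object()


class Client:
    """Toy model of a client with an event thread (not ZooKeeper's code)."""

    def __init__(self, stop_events_on_expiry):
        self.stop_events_on_expiry = stop_events_on_expiry
        self.closed = False
        self.events = queue.Queue()
        # Daemon thread so a leaked thread does not keep the process alive.
        self.event_thread = threading.Thread(target=self._run_events, daemon=True)
        self.event_thread.start()

    def _run_events(self):
        # Blocks until an event arrives; only the sentinel ends the loop.
        while True:
            event = self.events.get()
            if event is EVENT_OF_DEATH:
                return

    def session_expired(self):
        # The connection is torn down and the state becomes closed.
        self.closed = True
        if self.stop_events_on_expiry:
            self.events.put(EVENT_OF_DEATH)

    def close(self):
        # As in the report: close() does nothing once the state is closed,
        # so it cannot rescue a thread that was left waiting.
        if self.closed:
            return
        self.closed = True
        self.events.put(EVENT_OF_DEATH)


def thread_survives_expiry(stop_events_on_expiry):
    """Return True if the event thread is still alive after expiry + close."""
    client = Client(stop_events_on_expiry)
    client.session_expired()
    client.close()
    client.event_thread.join(timeout=1.0)
    return client.event_thread.is_alive()
```

Calling `thread_survives_expiry(False)` models the reported behaviour, and `thread_survives_expiry(True)` models one possible remedy (stopping the event thread on the expiry path); whether the attached patch takes that approach is not stated in this thread.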
[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-783: Status: Patch Available (was: Open) i think we can do without a test on this one. marking it PA committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes a NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
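The remedy suggested in ZOOKEEPER-783 (copy the list defensively in the getter and synchronize the clear) can be sketched as below. This is an illustration in Python, not the ZKDatabase code or the attached patch: the `Database` class and its method names are hypothetical, and a single `threading.Lock` stands in for Java's synchronization on the list.

```python
import threading


class Database:
    """Toy stand-in for a store that keeps a committed log (not ZKDatabase)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed_log = []

    def add_committed(self, proposal):
        with self._lock:
            self._committed_log.append(proposal)

    def get_committed_log(self):
        # Defensive copy: callers iterate over a snapshot, never the live list.
        with self._lock:
            return list(self._committed_log)

    def clear(self):
        # The mutation runs under the same lock that guards the copy.
        with self._lock:
            self._committed_log.clear()


db = Database()
for zxid in (1, 2, 3):
    db.add_committed(zxid)

snapshot = db.get_committed_log()
db.clear()  # e.g. a shutdown that runs while a caller still holds the snapshot
```

After `clear()`, `snapshot` still holds the three entries while the live log is empty, so a caller that is iterating over the snapshot is not affected by a concurrent shutdown.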
[jira] Commented: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887870#action_12887870 ] Andrei Savu commented on ZOOKEEPER-799: --- Not really. The archive only contains some extra files (screenshots). I don't understand why Hudson keeps trying to apply it as a patch even if it's not marked for inclusion. -original message- Subject: [jira] Commented: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib From: Patrick Hunt (JIRA) j...@apache.org Date: 13/07/2010 20:13 [ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887852#action_12887852 ] Patrick Hunt commented on ZOOKEEPER-799: I see, both files are necessary to build. I'll take a look at this asap (don't worry about hudson). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-800) zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reassigned ZOOKEEPER-800: --- Assignee: Mahadev konar zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE --- Key: ZOOKEEPER-800 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-800 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Reporter: Michi Mutsuzaki Assignee: Mahadev konar Priority: Minor Fix For: 3.3.2, 3.4.0 This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead. --Michi -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
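The behaviour requested in ZOOKEEPER-800 can be modelled in a few lines. This is a hypothetical Python model, not the C client: `Handle`, `State`, `Result`, and `add_auth` are illustrative names, and the enum members are symbolic only (the numeric values of the C client's constants are deliberately not reproduced). It assumes the simplest reading of the report, namely that a call on a closed handle is rejected instead of reporting success.

```python
from enum import Enum, auto


class State(Enum):
    # Symbolic states only; these are not the C client's numeric constants.
    CLOSED = auto()
    CONNECTING = auto()
    CONNECTED = auto()


class Result(Enum):
    ZOK = auto()
    ZINVALIDSTATE = auto()


class Handle:
    """Toy handle that records its state and any accepted credentials."""

    def __init__(self, state):
        self.state = state
        self.auth = []


def add_auth(handle, scheme, cert):
    """Hypothetical model of the behaviour the report asks for."""
    if handle.state is State.CLOSED:
        # Report the invalid state instead of silently claiming success.
        return Result.ZINVALIDSTATE
    handle.auth.append((scheme, cert))
    return Result.ZOK
```

With this check, a caller that adds credentials before the handle is usable sees the failure at the call site rather than discovering later that authentication never happened.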
[jira] Commented: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887891#action_12887891 ] Hadoop QA commented on ZOOKEEPER-783: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446054/ZOOKEEPER-783.patch against trunk revision 962697. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/142/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/142/console This message is automatically generated. committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. 
I have seen a bug that causes a NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-765) Add python example script
[ https://issues.apache.org/jira/browse/ZOOKEEPER-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887896#action_12887896 ] Andrei Savu commented on ZOOKEEPER-765: --- I think Henry's queue should also be part of this patch: http://github.com/henryr/pyzk-recipes What do you think? Add python example script - Key: ZOOKEEPER-765 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-765 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings, documentation Reporter: Travis Crawford Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: zk.py, ZOOKEEPER-765.patch When adding some zookeeper-based functionality to a python script I had to figure everything out without guidance, which, while doable, would have been a lot easier with an example. I extracted a skeleton program structure in the hope that it's useful to others (maybe add it as an example in the source or wiki?). This script does an aget() and sets a watch, and hopefully illustrates what's going on and where to plug in your application code that gets run when the znode changes. There are probably some bugs; if we fix them now and provide a well-reviewed example, hopefully others will not run into the same mistakes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-799) Add tools and recipes for monitoring as a contrib
[ https://issues.apache.org/jira/browse/ZOOKEEPER-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-799: --- Fix Version/s: 3.4.0 Add tools and recipes for monitoring as a contrib - Key: ZOOKEEPER-799 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-799 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Andrei Savu Assignee: Andrei Savu Fix For: 3.4.0 Attachments: monitoring.tar.gz, ZOOKEEPER-799.patch Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-780: -- Assignee: Andrei Savu zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch I'm starting to play with Zookeeper so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I've run zkCli.sh to run some commands in the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations and it seems like the problem is not including the data. A copy of the screen: [zk: localhost:2181(CONNECTED) 3] create /groups firstgroup Created /groups [zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1 Exception in thread main java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
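The crash in ZOOKEEPER-780 comes from indexing into the argument array without first checking that the data argument is present. A minimal sketch of the defensive check, in Python rather than in ZooKeeperMain.java: `parse_create` is a hypothetical helper, and the usage string is an assumption based on the commands shown in the report, not the shell's real usage text.

```python
import shlex


def parse_create(line):
    """Parse 'create [-s] [-e] path data' and report missing arguments.

    Hypothetical helper; it models the validation, not ZooKeeperMain itself.
    Returns (command, None) on success or (None, error_message) on failure.
    """
    args = shlex.split(line)
    if not args or args[0] != "create":
        return None, "not a create command"
    flags = [a for a in args[1:] if a in ("-s", "-e")]
    positional = [a for a in args[1:] if a not in ("-s", "-e")]
    if len(positional) < 2:
        # Checking the count first avoids indexing past the end of the list.
        return None, "usage: create [-s] [-e] path data"
    path, data = positional[0], positional[1]
    return {"path": path, "data": data, "flags": flags}, None
```

For the two commands in the report, `parse_create("create /groups firstgroup")` yields a parsed command, while `parse_create("create -e /groups/client_1")` yields a usage error instead of raising.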
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887941#action_12887941 ] Vishal K commented on ZOOKEEPER-790: Folks, Sorry for the delay. The patch did not work. Any other ideas? Thanks. -Vishal Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
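The failure sequence in the ZOOKEEPER-790 description can be traced with a toy model. This is a Python sketch under stated assumptions, not Leader.java or Follower.java: `Peer`, `EpochError`, `rejoin`, the ack counts, and the starting epoch of 5 are all hypothetical. The `bump_before_quorum=False` branch shows one alternative ordering for comparison only; it is not claimed to be what the attached patch does.

```python
class EpochError(Exception):
    """Raised when a follower sees a leader with an older epoch."""


class Peer:
    """Toy peer that tracks only an epoch (hypothetical, not QuorumPeer)."""

    def __init__(self, epoch):
        self.epoch = epoch

    def lead(self, acks, quorum, bump_before_quorum):
        """Try to lead; return True if a quorum of followers acknowledged."""
        if bump_before_quorum:
            self.epoch += 1  # behaviour described in the report
        if acks < quorum:
            return False  # no quorum: leadership is dropped
        if not bump_before_quorum:
            self.epoch += 1  # alternative: advance only once a quorum exists
        return True

    def follow(self, leader_epoch):
        # Follower-side check: a leader with an older epoch is rejected.
        if leader_epoch < self.epoch:
            raise EpochError(
                f"Leader epoch {leader_epoch} is less than our epoch {self.epoch}"
            )


def rejoin(bump_before_quorum, cluster_epoch=5):
    """A peer fails to gather a quorum, then tries to follow the real leader."""
    peer = Peer(cluster_epoch)
    peer.lead(acks=0, quorum=2, bump_before_quorum=bump_before_quorum)
    try:
        peer.follow(cluster_epoch)
        return "following"
    except EpochError as err:
        return str(err)
```

`rejoin(True)` reproduces the rejection message pattern from the report ("Leader epoch y is less than our epoch y + 1"), while `rejoin(False)` lets the peer fall back to following.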
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887969#action_12887969 ] Flavio Paiva Junqueira commented on ZOOKEEPER-790: -- Vishal, What exactly didn't work? Do you get the same error messages with the patch? Do you have a reliable way of reproducing it? In general, it would be useful if you could provide more detail. Thanks! Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
Hi Flavio , I got the same error messages. I can reproduce this quite easily. I will capture the logs again. Is there anything else you would like me to provide? Thanks. -Vishal On Tue, Jul 13, 2010 at 3:52 PM, Flavio Paiva Junqueira (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887969#action_12887969] Flavio Paiva Junqueira commented on ZOOKEEPER-790: -- Vishal, What exactly didn't work? Do you get the same error messages with the patch? Do you have a reliable way of reproducing it? In general, it would be useful if you could provide more detail. Thanks! Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
I forgot if you have provided already a description of how you reproduce it. If you could point me out to that, I would appreciate. -Flavio On Jul 13, 2010, at 11:33 PM, Vishal K wrote: Hi Flavio, I got the same error messages. I can reproduce this quite easily. I will capture the logs again. Is there anything else you would like me to provide? Thanks. -Vishal On Tue, Jul 13, 2010 at 3:52 PM, Flavio Paiva Junqueira (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887969#action_12887969 ] Flavio Paiva Junqueira commented on ZOOKEEPER-790: -- Vishal, What exactly didn't work? Do you get the same error messages with the patch? Do you have a reliable way of reproducing it? In general, it would be useful if you could provide more detail. Thanks! Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. flavio junqueira research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888119#action_12888119 ] Vishal K commented on ZOOKEEPER-790: copying comments from email to jira. Hi Flavio , I got the same error messages. I can reproduce this quite easily. I will capture the logs again. Is there anything else you would like me to provide? Thanks. -Vishal Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888121#action_12888121 ] Vishal K commented on ZOOKEEPER-790: copying comments from email to jira. I forgot if you have provided already a description of how you reproduce it. If you could point me out to that, I would appreciate. -Flavio Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Paiva Junqueira Assignee: Flavio Paiva Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888122#action_12888122 ] Vishal K commented on ZOOKEEPER-790:

From ZOOKEEPER-335:

Hi, I enabled tracing and did some more debugging. It looks like the restarted peer (the one trying to join the cluster) determines that it is the leader and increments its epoch. However, the rest of the nodes don't acknowledge this node as the leader and hence have an older epoch. I will attach the log. Unfortunately, I don't have traces from the other nodes. I will repeat the experiment later and attach logs from the other nodes.

Scenario:
* Form a 3-node cluster. This is not just a ZK cluster; it also involves our application cluster that uses ZK.
* Kill one of the followers.
* After a minute or so, restart the follower.
* The follower rejects the leader with "Leader epoch y is less than our epoch y + 1".

From logs:

a) Peer X restarts and starts leader election.

a) For a small window of time, X thinks that it is the new leader! During this window, for some reason, the rest of the nodes tell X that they are also trying to find a leader, i.e., all 3 nodes are in LOOKING state. After seeing that all 3 nodes are in LOOKING state, X decides to be a leader?

2010-06-20 23:22:46,421 - DEBUG [WorkerSender Thread:quorumcnxmana...@346] - Opening channel to server 1
2010-06-20 23:22:46,423 - DEBUG [WorkerReceiver Thread:fastleaderelection$messenger$workerrecei...@214] - Receive new notification message. My id = 0
2010-06-20 23:22:46,424 - INFO [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@689] - Notification: 0, 77309411393, 1, 0, LOOKING, LOOKING, 0
2010-06-20 23:22:46,424 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@495] - id: 0, proposed id: 0, zxid: 77309411393, proposed zxid: 77309411393
2010-06-20 23:22:46,424 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@717] - Adding vote: From = 0, Proposed leader = 0, Porposed zxid = 77309411393, Proposed epoch = 1
2010-06-20 23:22:46,426 - INFO [WorkerSender Thread:quorumcnxmana...@162] - Have smaller server identifier, so dropping the connection: (1, 0)
2010-06-20 23:22:46,426 - DEBUG [WorkerSender Thread:quorumcnxmana...@346] - Opening channel to server 2
2010-06-20 23:22:46,427 - DEBUG [Thread-1:quorumcnxmanager$liste...@445] - Connection request /192.168.1.182:46701
2010-06-20 23:22:46,427 - DEBUG [Thread-1:quorumcnxmanager$liste...@448] - Connection request: 0
2010-06-20 23:22:46,428 - DEBUG [Thread-1:quorumcnxmanager$sendwor...@504] - Address of remote peer: 1
2010-06-20 23:22:46,428 - INFO [WorkerSender Thread:quorumcnxmana...@162] - Have smaller server identifier, so dropping the connection: (2, 0)
2010-06-20 23:22:46,431 - DEBUG [WorkerReceiver Thread:fastleaderelection$messenger$workerrecei...@214] - Receive new notification message. My id = 0
2010-06-20 23:22:46,432 - INFO [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@689] - Notification: 1, 77309411372, 1, 0, LOOKING, LOOKING, 1
2010-06-20 23:22:46,432 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@495] - id: 1, proposed id: 0, zxid: 77309411372, proposed zxid: 77309411393
2010-06-20 23:22:46,432 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@717] - Adding vote: From = 1, Proposed leader = 1, Porposed zxid = 77309411372, Proposed epoch = 1
2010-06-20 23:22:46,436 - DEBUG [Thread-1:quorumcnxmanager$liste...@445] - Connection request /192.168.1.183:44310
2010-06-20 23:22:46,436 - DEBUG [Thread-1:quorumcnxmanager$liste...@448] - Connection request: 0
2010-06-20 23:22:46,436 - DEBUG [Thread-1:quorumcnxmanager$sendwor...@504] - Address of remote peer: 2
2010-06-20 23:22:46,440 - DEBUG [WorkerReceiver Thread:fastleaderelection$messenger$workerrecei...@214] - Receive new notification message. My id = 0
2010-06-20 23:22:46,440 - INFO [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@689] - Notification: 2, 7301097, 1, 0, LOOKING, LOOKING, 2
2010-06-20 23:22:46,440 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@495] - id: 2, proposed id: 0, zxid: 7301097, proposed zxid: 77309411393
2010-06-20 23:22:46,441 - DEBUG [QuorumPeer:/0.0.0.0:2181:fastleaderelect...@717] - Adding vote: From = 2, Proposed leader = 2, Porposed zxid = 7301097, Proposed epoch = 1
2010-06-20 23:22:46,441 - INFO [QuorumPeer:/0.0.0.0:2181:quorump...@647] - LEADING

b) As a result, X increments its epoch. Worse, since this node decided to be a leader, it starts doing transactions. The first set of transactions starts removing all ephemeral nodes. But these transactions are only done locally. Other peers do not ack these transactions since they know that this peer is not the leader.

c) After a few seconds (8 secs), X relinquishes leadership since it does not receive any ack from the rest of the peers.

d) It starts leader election
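The zxid values in the notifications above can be decoded to see why peer 0 ends up preferring itself. The sketch below relies on the documented zxid layout (a 64-bit number whose high 32 bits are the epoch and low 32 bits a counter). The vote ordering in `preferred` is a simplified assumption (highest zxid wins, ties broken by the higher server id) and is not copied from FastLeaderElection; `split_zxid` and `preferred` are hypothetical helper names.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter): high and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# (server id, zxid) pairs copied from the three notifications in the log.
votes = [(0, 77309411393), (1, 77309411372), (2, 7301097)]


def preferred(votes):
    # Simplified ordering (an assumption): highest zxid wins,
    # ties are broken by the higher server id.
    return max(votes, key=lambda vote: (vote[1], vote[0]))


decoded = {server_id: split_zxid(zxid) for server_id, zxid in votes}
winner = preferred(votes)
```

Under this model, servers 0 and 1 both report zxids in epoch 18 (counters 65 and 44), server 2 reports a zxid that decodes to epoch 0, and the preferred vote is server 0's own, which is consistent with X going to LEADING once it sees all three peers in LOOKING state.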
[jira] Updated: (ZOOKEEPER-733) use netty to handle client connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-733: --- Attachment: ZOOKEEPER-733.patch This latest patch cleans up the logging a bit more. It also adds NIO client / Netty server based unit tests; these are a subset of the base tests but use the Netty server cnxn factory. use netty to handle client connections -- Key: ZOOKEEPER-733 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-733 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Benjamin Reed Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: accessive.jar, flowctl.zip, moved.zip, QuorumTestFailed_sessionmoved_TRACE_LOG.txt.gz, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch, ZOOKEEPER-733.patch We currently have our own asynchronous NIO socket engine to be able to handle lots of clients with a single thread. Over time the engine has become more complicated. We would also like the engine to use multiple threads on machines with lots of cores. Plus, we would like to be able to support things like SSL. If we switch to Netty, we can simplify our code and get the previously mentioned benefits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.