Eclipse HELIOS release
Hi all, We are currently busy getting our Zookeeper based remote services discovery ready for Eclipse Helios. However, the build system has some kind of packing problems with java 1.6 [1]. We have bumped down the provider to 1.5 and will document to run on 1.6. Will Zookeeper 3.3.0 definitely NOT run with java 1.5? Thanks, Wim Jongman [1] https://bugs.eclipse.org/bugs/show_bug.cgi?id=315155
Re: Eclipse HELIOS release
Hi Wim, afaik it works with 1.5 java under linux, however it's been a while since I tried this. The default javac target in build.xml is 1.5. Should be relatively easy for you to verify. Patrick On 06/01/2010 05:51 AM, Wim Jongman wrote: Hi all, We are currently busy getting our Zookeeper based remote services discovery ready for Eclipse Helios. However, the build system has some kind of packing problems with java 1.6 [1]. We have bumped down the provider to 1.5 and will document to run on 1.6. Will Zookeeper 3.3.0 definitely NOT run with java 1.5? Thanks, Wim Jongman [1] https://bugs.eclipse.org/bugs/show_bug.cgi?id=315155
[jira] Updated: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-744:
Status: Patch Available (was: Open)

Add monitoring four-letter word

Key: ZOOKEEPER-744
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-744
Project: Zookeeper
Issue Type: New Feature
Components: server
Affects Versions: 3.4.0
Reporter: Travis Crawford
Assignee: Savu Andrei
Fix For: 3.4.0
Attachments: zk-ganglia.png, ZOOKEEPER-744.patch, ZOOKEEPER-744.patch

Filing a feature request based on a zookeeper-user discussion.

Zookeeper should have a new four-letter word that returns key-value pairs appropriate for importing to a monitoring system (such as Ganglia, which has a large installed base).

This command should initially export the following:
(a) Count of instances in the ensemble.
(b) Count of up-to-date instances in the ensemble.

But be designed such that in the future additional data can be added. For example, the output could define the statistic in a comment, then print a key space character value line:

    # Total number of instances in the ensemble
    zk_ensemble_instances_total 5
    # Number of instances currently participating in the quorum.
    zk_ensemble_instances_active 4

From the mailing list:

Date: Mon, 19 Apr 2010 12:10:44 -0700
From: Patrick Hunt ph...@apache.org
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Recovery issue - how to debug?

On 04/19/2010 11:55 AM, Travis Crawford wrote:

    It would be a lot easier from the operations perspective if the leader explicitly published some health stats:
    (a) Count of instances in the ensemble.
    (b) Count of up-to-date instances in the ensemble.
    This would greatly simplify monitoring and alerting - when an instance falls behind one could configure their monitoring system to let someone know and take a look at the logs.

That's a great idea. Please enter a JIRA for this - a new 4 letter word and JMX support. It would also be a great starter project for someone interested in becoming more familiar with the server code.

Patrick

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
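The output format proposed in the issue (a `#` comment describing a statistic, followed by a `key value` line) is easy to consume from a monitoring script. The following is a minimal sketch of such a consumer, not part of the attached patch: the function name `parse_monitoring_output` and the choice to keep the preceding comment as a description are assumptions made for illustration, and the sample text is the example from the issue.

```python
def parse_monitoring_output(text):
    """Parse 'key value' lines; a preceding '#' comment becomes the description.

    Hypothetical helper for illustration only, not a ZooKeeper API.
    """
    stats = {}
    pending_comment = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the comment so it can describe the next statistic.
            pending_comment = line.lstrip("#").strip()
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        try:
            parsed = int(value)
        except ValueError:
            parsed = value  # keep non-integer values as plain strings
        stats[key] = {"value": parsed, "description": pending_comment}
        pending_comment = None
    return stats


# Example output taken from the issue description.
SAMPLE = """\
# Total number of instances in the ensemble
zk_ensemble_instances_total 5
# Number of instances currently participating in the quorum.
zk_ensemble_instances_active 4
"""

stats = parse_monitoring_output(SAMPLE)
# Instances that are configured but not currently participating.
behind = (stats["zk_ensemble_instances_total"]["value"]
          - stats["zk_ensemble_instances_active"]["value"])
```

A monitoring check could alert whenever `behind` is greater than zero, which matches the use case Travis describes of noticing an instance that has fallen behind.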
[jira] Updated: (ZOOKEEPER-773) Log visualisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-773:
Fix Version/s: 3.4.0

Log visualisation

Key: ZOOKEEPER-773
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-773
Project: Zookeeper
Issue Type: Improvement
Components: contrib
Reporter: Ivan Kelly
Assignee: Ivan Kelly
Priority: Minor
Fix For: 3.4.0
Attachments: ZOOKEEPER-773.diff

Zkgraph is a log viewer for zookeeper. It can handle transaction logs and message logs. There are currently two views.

a) Server view
The server view shows the interactions between the different servers in an ensemble. The X axis represents time.
* Exceptions show up as red dots. Hovering your mouse over them will give you more details of the exception.
* The colour of the line represents the election state of the server.
  - orange means LOOKING for leader
  - dark green means the server is the leader
  - light green means the server is following a leader
  - yellow means there isn't enough information to determine the state of the server.
* The gray arrows denote election messages between servers. Pink dashed arrows are messages that were sent but never delivered.

b) Session view
The session view shows the lifetime of sessions on a server. Use the time filter to narrow down the view. Any more than about 2000 events will take a long time to view in your browser. The Y axis represents time in this case. Each line is a session. The black dots represent events on the session. You can click on the black dots for more details of the event.

2 - Compiling and Running
Run ant jar in src/contrib/zkgraph/. This will download all dependencies and compile all the zkgraph code. Once compilation has finished, you can run it with the zkgraph.sh script in src/contrib/zkgraph/bin. This will start an embedded web server on your machine. Navigate to http://localhost:8182/graph/main.html.
[jira] Commented: (ZOOKEEPER-773) Log visualisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874108#action_12874108 ]

Patrick Hunt commented on ZOOKEEPER-773:

* naming: zkgraph is not very descriptive - ppl may think it graphs zk namespace? How about logvisualizer or loggrapher (something indicating that the contrib visualizes the log). Although even this is potentially misleading (txn log vs log4j log). You have a great description at the top of the readme which is good though. (perhaps indicate that this is log4j log and not txn log?)
* lib dir missing from patch (build fails, I created by hand and it was fine, good to get into patch though)
* bin scripts are not executable - perhaps update patch, also committer should chmod the scripts to add x if missing
* add apache license to zkgraph.css
* LogFormatter.java has license twice

For some reason the app itself does not work for me. I click on the links (I tried both chrome and firefox) but nothing happens. add log for example, I click that and nothing at all happens.

Here's the console that's running the zkgraph.sh script (this is the entire content of the console):

    $ bin/zkgraph.sh
    MergedLogSource(size=0, start=0, end=0)
    2010-06-01 09:52:35.695:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
    2010-06-01 09:52:35.733:INFO::jetty-7.0.1.v20091125
    2010-06-01 09:52:35.969:INFO::Started selectchannelconnec...@0.0.0.0:8182
    log4j:WARN No appenders could be found for logger (org.apache.zookeeper.graph.servlets.NumEvents).
    log4j:WARN Please initialize the log4j system properly.
[jira] Commented: (ZOOKEEPER-773) Log visualisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874110#action_12874110 ]

Patrick Hunt commented on ZOOKEEPER-773:

I'm on karmic btw (ubuntu) with the latest 1.6 jvm.
[jira] Commented: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874111#action_12874111 ]

Patrick Hunt commented on ZOOKEEPER-744:

Andrei, are you still working on this or should I review for commit?
[jira] Commented: (ZOOKEEPER-773) Log visualisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874112#action_12874112 ]

Hadoop QA commented on ZOOKEEPER-773:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12445568/ZOOKEEPER-773.diff
against trunk revision 947063.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/111/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/111/console

This message is automatically generated.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874119#action_12874119 ]

Patrick Hunt commented on ZOOKEEPER-775:

My initial feedback:

* The build fails with protobuf errors. I already have protobuf installed for other reasons, can't mvn pull down the protobuf version it needs vs relying on the system wide libs?
* Including the bk/zk jar files in server/lib is a really bad practice. Why do we have to do this vs using versions built from the current tree? If we have to do this we should also indicate that this is not a release version - typically in mvn this means putting something like SNAPSHOT into the name of the jars.
* The README should have a tiny bit of text describing, at a very high level, what hedwig is.
* The contents of the scripts directory are not executable by default, we should either fix this in the patch or note it for committer to fix at commit time.
* ZooKeeperTestBase.java has windows eol characters. Perhaps run dos2unix on all the files and refresh the patch? (looks like all the tests suffer from this?)
* RAT identified the following files which seem like they should have license headers added:

    ./formatter.xml
    ./pom.xml
    ./client/pom.xml
    ./client/src/main/cpp/Makefile
    ./client/src/main/cpp/log4cpp.conf
    ./client/src/main/resources/log4j.properties
    ./doc/build.txt
    ./doc/dev.txt
    ./doc/doc.txt
    ./doc/user.txt
    ./protocol/Makefile
    ./protocol/pom.xml
    ./protocol/src/main/protobuf/PubSubProtocol.proto
    ./scripts/analyze.py
    ./scripts/hw.bash
    ./scripts/quote
    ./server/pom.xml

A large scale pub/sub system

Key: ZOOKEEPER-775
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775
Project: Zookeeper
Issue Type: New Feature
Components: contrib
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Fix For: 3.4.0
Attachments: libs.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch

we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper.
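Two of the review points on ZOOKEEPER-775 (Windows line endings and scripts that are not executable) can be checked mechanically before refreshing a patch. The sketch below is only an illustration of those checks: the helper names are hypothetical, it is not part of the patch or of any ZooKeeper tooling, and `to_unix_eol` covers only the basic CRLF-to-LF conversion that dos2unix performs.

```python
import os
import stat


def has_crlf(data: bytes) -> bool:
    """Return True if the bytes contain a Windows (CRLF) line ending."""
    return b"\r\n" in data


def to_unix_eol(data: bytes) -> bytes:
    """Convert CRLF line endings to LF (the basic dos2unix conversion)."""
    return data.replace(b"\r\n", b"\n")


def find_crlf_files(root: str) -> list:
    """Walk a directory tree and list files that contain CRLF line endings."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if has_crlf(handle.read()):
                    hits.append(path)
    return sorted(hits)


def make_executable(path: str) -> None:
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Running `find_crlf_files` over a checkout would list candidates such as the test sources mentioned in the review. Note that execute bits are generally not carried by a plain patch file, which is why the review also suggests the committer fix them at commit time.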
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-775:
Status: Open (was: Patch Available)

Cancelling patch until issues are addressed.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874137#action_12874137 ]

Benjamin Reed commented on ZOOKEEPER-775:

i would like to fix the build once we have it in the subversion repository.

should i just remove the README? i'm not sure it is worth expanding since it would duplicate text in the docs directory

i'll fix the scripts and the dos2unix

with respect to the headers, i notice that configs, docs, and Makefiles don't have the license header in the zk repository, which leaves:

    ./pom.xml
    ./client/pom.xml
    ./protocol/pom.xml
    ./protocol/src/main/protobuf/PubSubProtocol.proto
    ./scripts/analyze.py
    ./scripts/hw.bash
    ./scripts/quote
    ./server/pom.xml

is it okay if i just do those?
[jira] Commented: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874148#action_12874148 ]

Savu Andrei commented on ZOOKEEPER-744:

You can review the patch for commit. Right now I'm writing monitoring scripts for Nagios, Cacti and Ganglia (in this order). The script for Nagios is almost ready. Thanks.

-original message-
Subject: [jira] Commented: (ZOOKEEPER-744) Add monitoring four-letter word
From: Patrick Hunt (JIRA) j...@apache.org
Date: 01/06/2010 19:07

    Patrick Hunt commented on ZOOKEEPER-744:
    Andrei, are you still working on this or should I review for commit?
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874160#action_12874160 ]

Patrick Hunt commented on ZOOKEEPER-775:

bq. i would like to fix the build once we have it in the subversion repository.

by this I take it you mean removing the libs? Ok, but we should name the lib jars with SNAPSHOT as part of the initial commit to avoid any confusion (that's not a problem, right?)

bq. readme

It's good to have the readme even if it's a bit short/duplicative. Provides a common hook for new users.

bq. licenses

rat found all that, what you are suggesting is inline with core, so those exceptions sound fine.
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874163#action_12874163 ]

Abmar Barros commented on ZOOKEEPER-702:

I have updated the wiki page for this (http://wiki.apache.org/hadoop/ZooKeeper/GSoCFailureDetector), including a discussion on whether there should be another thread for failure detection. I have previously discussed it with Flavio in order to raise benefits and drawbacks of including another thread in the application and we are expecting feedback to make this decision.

GSoC 2010: Failure Detector Model

Key: ZOOKEEPER-702
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros

Failure Detector Module

Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java, some distributed systems knowledge, comfort implementing distributed systems protocols

Description
ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network).

This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code.
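To make the contrast in the project description concrete, here is a small sketch of the two styles of detector it mentions. It is written in Python purely for illustration (the project itself targets a Java module), the class names are hypothetical, and the phi calculation assumes exponentially distributed heartbeat intervals, which is a simplification and not necessarily the formulation in the linked paper or in Cassandra.

```python
import math


class TimeoutDetector:
    """Tick-counting detector: suspect the peer after too many silent ticks."""

    def __init__(self, max_missed_ticks):
        self.max_missed_ticks = max_missed_ticks
        self.missed = 0

    def heartbeat(self):
        self.missed = 0  # any heartbeat resets the counter

    def tick(self):
        self.missed += 1

    def suspected(self):
        return self.missed >= self.max_missed_ticks


class PhiAccrualDetector:
    """Accrual detector: outputs a suspicion level instead of a yes/no answer.

    Assumption: inter-arrival times are modelled as exponential, so
    phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10).
    """

    def __init__(self):
        self.intervals = []
        self.last = None

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        if not self.intervals:
            return 0.0  # not enough history to judge
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        return (now - self.last) / (mean * math.log(10))


# Usage: heartbeats arrive once per second, then stop.
timeout = TimeoutDetector(max_missed_ticks=3)
for _ in range(3):
    timeout.tick()

accrual = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0):
    accrual.heartbeat(t)
```

The tunability mentioned in the description shows up in how the result is used: the timeout detector has one fixed threshold, while a caller of the accrual detector can pick different phi thresholds for different actions, and the mean interval adapts to the observed network.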
[jira] Updated: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-744:
Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-744:
Status: Open (was: Patch Available)
[jira] Created: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
committedLog in ZKDatabase is not properly synchronized
Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Priority: Critical

ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but it shows up in code of mine that repeatedly starts and stops a ZK server in process (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear.
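The two remedies named in the report above, a defensive copy in the getter and synchronization around clear, can be illustrated independently of the ZooKeeper code. This is a toy Python analogue under stated assumptions, not the Java change itself: the class CommittedLog and its methods add, snapshot and clear are hypothetical names, and a threading.Lock stands in for synchronizing on the list.

```python
import threading


class CommittedLog:
    """Toy stand-in for a shared proposal list; not ZooKeeper's ZKDatabase."""

    def __init__(self):
        self._lock = threading.Lock()
        self._proposals = []

    def add(self, proposal):
        with self._lock:
            self._proposals.append(proposal)

    def snapshot(self):
        # Defensive copy: each caller iterates over its own list, so a
        # concurrent clear() cannot change it mid-iteration.
        with self._lock:
            return list(self._proposals)

    def clear(self):
        # Same lock as snapshot(), so a copy is never taken mid-clear.
        with self._lock:
            self._proposals.clear()


log = CommittedLog()
for zxid in (1, 2, 3):
    log.add(zxid)

view = log.snapshot()   # the caller's private copy
log.clear()             # e.g. during shutdown
print(view, log.snapshot())
```

The copy costs time proportional to the length of the list on every call, which is the usual trade-off for not handing out a reference to shared mutable state.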
Hudson build is back to normal : ZooKeeper-trunk #831
See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/831/
[jira] Commented: (ZOOKEEPER-774) Recipes tests are slightly outdated: they do not compile against JUnit 4.8
[ https://issues.apache.org/jira/browse/ZOOKEEPER-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874196#action_12874196 ]

Hudson commented on ZOOKEEPER-774:
Integrated in ZooKeeper-trunk #831 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/831/])

Recipes tests are slightly outdated: they do not compile against JUnit 4.8
Key: ZOOKEEPER-774 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-774 Project: Zookeeper Issue Type: Bug Components: recipes Affects Versions: 3.3.0 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-774.patch

As title
[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874192#action_12874192 ]

Hudson commented on ZOOKEEPER-769:
Integrated in ZooKeeper-trunk #831 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/831/])

Leader can treat observers as quorum members
Key: ZOOKEEPER-769 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Environment: Ubuntu Karmic x64 Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: follower.log, leader.log, observer.log, warning.patch, zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch

In short: it seems the leader can treat observers as quorum members.

Steps to repro:
1. Server configuration: 3 voters, 2 observers (attached).
2. Bring up 2 voters and one observer. That is enough for a quorum.
3. Shut down the quorum member that is the follower.

As I understand it, the expected result is that the leader will start a new election round in order to regain quorum. But what actually happens is that it just says goodbye to that follower and is still operable. (When I shut down the third one, the observer, the leader starts trying to regain a quorum.) (As expected, if in step 3 we shut down the leader instead of the follower, the remaining follower starts a new leader election, as it should.)
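The arithmetic behind the repro steps above can be checked with a few lines. This is a sketch under assumptions, not the leader's actual logic: it assumes a plain majority quorum over the voting members, and the server names s1..s3 and o1 as well as the helper has_quorum are hypothetical.

```python
def has_quorum(voters, alive):
    """True if a strict majority of voting members is alive.

    Observers never count, even when they are connected.
    """
    alive_voters = set(voters) & set(alive)
    return len(alive_voters) > len(voters) // 2


voters = {"s1", "s2", "s3"}   # hypothetical ids for the 3 voters

# Step 2: two voters and one observer are up.
step2 = has_quorum(voters, {"s1", "s2", "o1"})

# Step 3: follower s2 is shut down; leader s1 and observer o1 remain.
step3 = has_quorum(voters, {"s1", "o1"})

# What the report suspects: every connected server is counted,
# so the observer keeps the leader above the threshold.
counted_with_observer = len({"s1", "o1"}) > len(voters) // 2

print(step2, step3, counted_with_observer)
```

With only voters counted, the leader loses quorum at step 3; counting the observer as a member reproduces the symptom described in the report, where the leader stays operable until the observer is also shut down.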
[jira] Commented: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874195#action_12874195 ]

Hudson commented on ZOOKEEPER-772:
Integrated in ZooKeeper-trunk #831 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/831/])

zkpython segfaults when watcher from async get children is invoked.
Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Fix For: 3.4.0 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch, ZOOKEEPER-772.patch

When utilizing the zkpython async get children API with a watch, I consistently get segfaults when the watcher is invoked to process events.
[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-783:
Attachment: ZOOKEEPER-783.patch

Defensive copying added to getCommittedLog() and synchronization during clear(). No tests added; really not sure how best to test for this. It does fix my test case but it's very difficult to distill that into a test (plus it only fails once in about 100 runs).

committedLog in ZKDatabase is not properly synchronized
Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Priority: Critical Attachments: ZOOKEEPER-783.patch
[jira] Commented: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874198#action_12874198 ]

Hadoop QA commented on ZOOKEEPER-744:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445822/ZOOKEEPER-744.patch against trunk revision 947063.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 8 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/112/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/112/console

This message is automatically generated.

Add monitoring four-letter word
Key: ZOOKEEPER-744 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-744 Project: Zookeeper Issue Type: New Feature Components: server Affects Versions: 3.4.0 Reporter: Travis Crawford Assignee: Savu Andrei Fix For: 3.4.0 Attachments: zk-ganglia.png, ZOOKEEPER-744.patch, ZOOKEEPER-744.patch
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-775:
Attachment: ZOOKEEPER-775_3.patch, libs_2.zip

Updated to address phunt's comments.

A large scale pub/sub system
Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch

We have developed a large scale pub/sub system based on ZooKeeper and BookKeeper.
[jira] Updated: (ZOOKEEPER-773) Log visualisation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-773:
Status: Open (was: Patch Available)

Canceling patch while Ivan looks into chrome issues and other comments I identified in my review.

Log visualisation
Key: ZOOKEEPER-773 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-773 Project: Zookeeper Issue Type: Improvement Components: contrib Reporter: Ivan Kelly Assignee: Ivan Kelly Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-773.diff

Zkgraph is a log viewer for zookeeper. It can handle transaction logs and message logs. There are currently two views.

a) Server view
The server view shows the interactions between the different servers in an ensemble. The X axis represents time.
* Exceptions show up as red dots. Hovering your mouse over them will give you more details of the exception.
* The colour of the line represents the election state of the server.
  - orange means LOOKING for leader
  - dark green means the server is the leader
  - light green means the server is following a leader
  - yellow means there isn't enough information to determine the state of the server.
* The gray arrows denote election messages between servers. Pink dashed arrows are messages that were sent but never delivered.

b) Session view
The session view shows the lifetime of sessions on a server. Use the time filter to narrow down the view. Any more than about 2000 events will take a long time to view in your browser. The Y axis represents time in this case. Each line is a session. The black dots represent events on the session. You can click on the black dots for more details of the event.

2 - Compiling & Running
Run ant jar in src/contrib/zkgraph/. This will download all dependencies and compile all the zkgraph code. Once compilation has finished, you can run it with the zkgraph.sh script in src/contrib/zkgraph/bin. This will start an embedded web server on your machine. Navigate to http://localhost:8182/graph/main.html.
[jira] Updated: (ZOOKEEPER-744) Add monitoring four-letter word
[ https://issues.apache.org/jira/browse/ZOOKEEPER-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-744:
Status: Open (was: Patch Available)

Andrei, looks good, a few comments while reviewing the patch:
1) indicate in the docs that not all keys are available on all platforms (fd count only on unix for example)
2) change node_count to znode_count (reduce confusion btw serving nodes and znodes)
3) your implementation of ephemeral counting: org.apache.zookeeper.server.DataTree.getEphemeralsCount() is inefficient, use entrySet instead (rather than keyset)
4) take a look at how ephemeral counting is done here: org.apache.zookeeper.server.DataTreeBean.countEphemerals() You might refactor to use this code in both places.
5) watch_count is only counting the number of paths that are watched, not the total number of watches (a path may have multiple watches - ie multiple sessions watching the same path) Looks like this is a bug in the existing implementation (currently only exposed in the bean). You should fix this. Add a test for this while you are at it to verify correct counting.
6) good that you capture the quorum info, is there a way to capture the date/time of the last election?

Add monitoring four-letter word
Key: ZOOKEEPER-744 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-744 Project: Zookeeper Issue Type: New Feature Components: server Affects Versions: 3.4.0 Reporter: Travis Crawford Assignee: Savu Andrei Fix For: 3.4.0 Attachments: zk-ganglia.png, ZOOKEEPER-744.patch, ZOOKEEPER-744.patch
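Review comment 5 in the message above distinguishes the number of watched paths from the total number of watches. The difference is easy to see on a small table. This is an illustrative sketch only, not the server's data structure: it assumes watches are stored as a mapping from path to the set of watching sessions, and the paths, session ids and function names are hypothetical.

```python
def count_watched_paths(watch_table):
    """Number of distinct paths that have at least one watch."""
    return sum(1 for sessions in watch_table.values() if sessions)


def count_watches(watch_table):
    """Total number of watches: one per (path, session) pair."""
    # Iterating over the values avoids a second lookup per key, the same
    # idea as preferring entrySet over keySet in review comment 3.
    return sum(len(sessions) for sessions in watch_table.values())


# Hypothetical watch table: path -> sessions watching that path.
watch_table = {
    "/app/config": {"session-1", "session-2"},
    "/app/lock": {"session-1"},
}

print(count_watched_paths(watch_table), count_watches(watch_table))
```

Here two paths are watched but three watches exist, because /app/config is watched by two sessions. A test of the kind the review asks for would assert on the second number.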