[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889426#action_12889426
 ] 

Hadoop QA commented on ZOOKEEPER-821:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449737/ZOOKEEPER-821.patch
  against trunk revision 963957.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/148/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/148/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/148/console

This message is automatically generated.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Status: Patch Available  (was: Open)

Added a revised patch that includes a corresponding test case.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Attachment: (was: ZOOKEEPER-821.patch)

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Attachment: ZOOKEEPER-821.patch

Updated with a corresponding test case.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-821:
---

Status: Open  (was: Patch Available)

I'd suggest adding a check for the version in 
src/contrib/zkpython/src/test/connection_test.py

you probably just want to check there's a properly formatted version string, 
don't need to check the particular version imo.
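
A check along the lines suggested here, verifying only that the version string is well formed, might look roughly like the sketch below. This is not the test from the attached patch: the helper name `is_well_formed_version` and the MAJOR.MINOR.PATCH pattern are assumptions for illustration, and a real test would read the value from the module's `__version__` attribute.

```python
import re

# Assumed format: three dot-separated non-negative integers, e.g. "3.4.0".
# The actual zkpython version string may differ; adjust the pattern if so.
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_well_formed_version(version):
    """Return True if `version` is a string shaped like MAJOR.MINOR.PATCH."""
    return (
        isinstance(version, str)
        and VERSION_PATTERN.fullmatch(version) is not None
    )


if __name__ == "__main__":
    # In a real test this value would come from the module's __version__.
    print(is_well_formed_version("3.4.0"))
```

Checking the shape rather than a specific value keeps the test from needing an update on every release.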

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Assigned: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-821:
--

Assignee: Rich Schumacher

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889383#action_12889383
 ] 

Rich Schumacher commented on ZOOKEEPER-821:
---

I'm not sure that this warrants a corresponding test case, but let me know if 
it needs one.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889372#action_12889372
 ] 

Hadoop QA commented on ZOOKEEPER-821:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449716/ZOOKEEPER-821.patch
  against trunk revision 963957.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/147/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/147/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/147/console

This message is automatically generated.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Status: Patch Available  (was: Open)

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Schumacher updated ZOOKEEPER-821:
--

Attachment: ZOOKEEPER-821.patch

Adds the ZooKeeper server version as the '__version__' attribute in the Python 
module.

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no less than 
> four versions of the zkpython bindings.  It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Created: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-16 Thread Rich Schumacher (JIRA)
Add ZooKeeper version information to zkpython
-

 Key: ZOOKEEPER-821
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
 Project: Zookeeper
  Issue Type: Improvement
  Components: contrib-bindings
Affects Versions: 3.3.1
Reporter: Rich Schumacher
Priority: Trivial
 Fix For: 3.4.0


Since installing and using ZooKeeper I've built and installed no less than four 
versions of the zkpython bindings.  It would be really helpful if the module 
had a '__version__' attribute to easily tell which version is currently in use.




[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-16 Thread Abmar Barros (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889280#action_12889280
 ] 

Abmar Barros commented on ZOOKEEPER-702:


Hi Diogo! Thank you for pointing out this paper. I hadn't come across this type 
of failure detection before, and it is very interesting. 

It proposes a simple way of estimating heartbeat arrival times based on 
application messages. However, it does require attaching sending times to all 
application messages (or at least the ones used for the estimation), which adds 
overhead to message size. Anyway, with the separate failure detector module, it 
would be easy to implement a new Failure Detector that uses such data.

So far, I have adapted the proposed failure detectors in order to compute the 
estimated next arrival time only when a heartbeat is received.
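
As a rough illustration of recomputing the estimated next arrival time only when a heartbeat is received, here is a minimal sketch. It is not the code from the ZOOKEEPER-702 patches, nor a faithful implementation of the Chen, Bertier, or phi-accrual detectors; the class name, the sliding-window mean, and the fixed safety margin are all assumptions.

```python
from collections import deque


class ArrivalEstimator:
    """Hypothetical estimator: predicts the next heartbeat from recent gaps."""

    def __init__(self, window=100, safety_margin=0.5):
        self.intervals = deque(maxlen=window)  # most recent inter-arrival gaps
        self.safety_margin = safety_margin     # extra slack, in seconds
        self.last_arrival = None
        self.next_expected = None

    def heartbeat(self, now):
        """Record a heartbeat at time `now` and refresh the estimate."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now
        if self.intervals:
            mean_gap = sum(self.intervals) / len(self.intervals)
            self.next_expected = now + mean_gap + self.safety_margin

    def is_suspected(self, now):
        """Suspect the peer once the estimated arrival time has passed."""
        return self.next_expected is not None and now > self.next_expected
```

The estimate is touched only inside `heartbeat`, so checking `is_suspected` between heartbeats is a single comparison.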

> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: Zookeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Abmar Barros
> Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.




[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-16 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889264#action_12889264
 ] 

Patrick Hunt commented on ZOOKEEPER-816:


Typically log4j logs are expected to be consumed by users (i.e. for debugging); 
here you're really talking about output that will be machine processed. Is this 
really logging in the typical log4j sense? Perhaps a separate mechanism should 
be used. Another problem with log4j logging is that changes to the message 
format, say to make it more "readable", can cause problems for the downstream 
processor. Not to mention that a separate mechanism could be made much more 
efficient than the general log4j log mechanism. Perhaps ZK's own WAL could be 
used for this: reuse the WAL code or write the information directly to the ZK 
transaction log (might not be such a good idea, but should be considered as an 
option).

Additionally you should consider something like Aspects for this project. This 
is a cross-cutting feature, something that aspects are well suited for. A 
benefit of this approach is that it would allow those interested in the feature 
to enable it while those uninterested would incur zero performance penalty.

> Detecting and diagnosing elusive bugs and faults in Zookeeper
> -
>
> Key: ZOOKEEPER-816
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
> Project: Zookeeper
>  Issue Type: New Feature
>Reporter: Miguel Correia
>Priority: Minor
>
> Complex distributed systems like Zookeeper tend to fail in strange ways that 
> are hard to diagnose. The objective is to build a tool that helps understand 
> when and where these problems occurred based on Zookeeper's traces (i.e., 
> logs in TRACE level). Minor changes to the server code will be needed.




Re: [jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-16 Thread Patrick Hunt

Hi Ivan, good stuff, please add this as a comment on the jira.

Patrick

On 07/16/2010 08:24 AM, Ivan Kelly wrote:

Zookeeper's traces (i.e., logs in TRACE level) provide some
information that can be helpful to understand what happened. For
instance, they contain information about the clients that are
connected, the operations issued, etc. However, in real deployments
with many clients (say, hundreds), traces are typically turned off to
avoid the high overhead that they cause. Furthermore, the data in the
traces is probably not enough for our purposes because it does not
include, e.g., the replies to operations or the data values.

As far as I've seen, this overhead comes in two forms, CPU and disk. CPU
overhead is mostly due to formatting. Disk obviously because tracing
will fill your disk fairly quickly. Perhaps something could be done to
combat both of these. To fix the formatting problem we could use a
binary log format. I've seen this done in C++ but not in java. The basic
idea is that if you have TRACE("operation %x happened to %s %p", obj1,
obj2, obj3); a preprocessor replaces this with TRACE(0x1234, obj1, obj2,
obj3) where 0x1234 is an identifier for the trace. Then when the trace
occurs a binary blob [0x1234, value of obj1, value of obj2, value of
obj3] is logged. Then when the logs are pulled off the machine you run a
post processor to do all the formatting and you get your full trace.

Regarding the disk overhead, traces are usually only interesting in the
run up to a failure. We could have a ring buffer in memory that is
constantly traced to, old traces being overwritten when the ring buffer
reaches its limit. These traces should only be dumped to the filesystem
when an error or fatal level event occurs, thereby giving you a trace of
what was happening before you fell over.



-Ivan


[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-16 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889225#action_12889225
 ] 

Vishal K commented on ZOOKEEPER-790:


great! I will give it a try. Thanks.

> Last processed zxid set prematurely while establishing leadership
> -
>
> Key: ZOOKEEPER-790
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.3.1
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
>Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
> ZOOKEEPER-790.travis.log.bz2
>
>
> The leader code is setting the last processed zxid to the first of the new 
> epoch even before connecting to a quorum of followers. Because the leader 
> code sets this value before connecting to a quorum of followers 
> (Leader.java:281) and the follower code throws an IOException 
> (Follower.java:73) if the leader epoch is smaller, we have that when the 
> false leader drops leadership and becomes a follower, it finds a smaller 
> epoch and kills itself.




[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-16 Thread Flavio Paiva Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Paiva Junqueira updated ZOOKEEPER-790:
-

Attachment: ZOOKEEPER-790.patch

Hi! I'm attaching a patch that works for my setup (and passes the unit tests as 
well), so I would appreciate it if you could give it a try.

> Last processed zxid set prematurely while establishing leadership
> -
>
> Key: ZOOKEEPER-790
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.3.1
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
>Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, 
> ZOOKEEPER-790.travis.log.bz2
>
>
> The leader code is setting the last processed zxid to the first of the new 
> epoch even before connecting to a quorum of followers. Because the leader 
> code sets this value before connecting to a quorum of followers 
> (Leader.java:281) and the follower code throws an IOException 
> (Follower.java:73) if the leader epoch is smaller, we have that when the 
> false leader drops leadership and becomes a follower, it finds a smaller 
> epoch and kills itself.




[jira] Created: (ZOOKEEPER-820) update c unit tests to ensure "zombie" java server processes don't cause failure

2010-07-16 Thread Patrick Hunt (JIRA)
update c unit tests to ensure "zombie" java server processes don't cause failure


 Key: ZOOKEEPER-820
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Patrick Hunt
Priority: Critical
 Fix For: 3.3.2, 3.4.0


When the C unit tests are run, the server sometimes doesn't shut down at the end 
of the test; this causes subsequent tests (Hudson especially) to fail.

1) we should try harder to make the server shut down at the end of the test; I 
suspect this is related to test failure/cleanup
2) before the tests are run we should see if the old server is still running 
and try to shut it down





[jira] Commented: (ZOOKEEPER-820) update c unit tests to ensure "zombie" java server processes don't cause failure

2010-07-16 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889216#action_12889216
 ] 

Patrick Hunt commented on ZOOKEEPER-820:


Mahadev suggested this for item 2 from the description:

>   I think we could use some more standard tools like netstat to get the process
> using that port and try killing it. This would be less error prone and more
> reliable than other tools.
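
One way to approximate item 2 before the tests start is to probe the server port directly. The sketch below swaps in a plain TCP connect for the netstat lookup suggested above, so it only detects that something is listening; it does not identify or kill the process. The function name and timeout are assumptions, and while 2181 is ZooKeeper's conventional client port, the test harness may use a different one.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 2181 is ZooKeeper's usual client port; the tests may use another.
    if is_port_in_use(2181):
        print("a server appears to be running already; shut it down first")
```

Finding the owning process would still need a platform tool such as netstat, which is why this is only a pre-flight check.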


> update c unit tests to ensure "zombie" java server processes don't cause 
> failure
> 
>
> Key: ZOOKEEPER-820
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
>
> When the C unit tests are run, the server sometimes doesn't shut down at the 
> end of the test; this causes subsequent tests (Hudson especially) to fail.
> 1) we should try harder to make the server shut down at the end of the test; 
> I suspect this is related to test failure/cleanup
> 2) before the tests are run we should see if the old server is still running 
> and try to shut it down




Re: [jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-16 Thread Ivan Kelly
Zookeeper's traces (i.e., logs in TRACE level) provide some  
information that can be helpful to understand what happened. For  
instance, they contain information about the clients that are  
connected, the operations issued, etc. However, in real deployments  
with many clients (say, hundreds), traces are typically turned off  
to avoid the high overhead that they cause. Furthermore, the data in  
the traces is probably not enough for our purposes because it does  
not include, e.g., the replies to operations or the data values.
As far as I've seen, this overhead comes in two forms, CPU and disk.  
CPU overhead is mostly due to formatting. Disk obviously because  
tracing will fill your disk fairly quickly. Perhaps something could be  
done to combat both of these. To fix the formatting problem we could  
use a binary log format. I've seen this done in C++ but not in java.  
The basic idea is that if you have TRACE("operation %x happened to %s  
%p", obj1, obj2, obj3); a preprocessor replaces this with  
TRACE(0x1234, obj1, obj2, obj3) where 0x1234 is an identifier for the  
trace. Then when the trace occurs a binary blob [0x1234, value of  
obj1, value of obj2, value of obj3] is logged. Then when the logs are  
pulled off the machine you run a post processor to do all the  
formatting and you get your full trace.
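
To make the binary log idea concrete, here is a small sketch of the encode and post-process steps. It is only an illustration under assumptions: the `TRACE_FORMATS` table stands in for what a preprocessor would generate, the blob layout (2-byte ID, 1-byte argument count, tagged arguments) is invented for this example, and `%s` replaces `%p` because Python has no pointer format.

```python
import struct

# Hypothetical table a preprocessor would emit: trace ID -> format string.
TRACE_FORMATS = {0x1234: "operation %x happened to %s %s"}


def encode_trace(trace_id, *args):
    """Pack a trace as [id, arg count, tagged args] without formatting it."""
    parts = [struct.pack(">HB", trace_id, len(args))]
    for arg in args:
        if isinstance(arg, int):
            # Integers are stored raw as 8-byte signed values.
            parts.append(b"i" + struct.pack(">q", arg))
        else:
            data = str(arg).encode("utf-8")
            parts.append(b"s" + struct.pack(">H", len(data)) + data)
    return b"".join(parts)


def decode_trace(blob):
    """Post-processing step: rebuild the human-readable trace line."""
    trace_id, count = struct.unpack_from(">HB", blob, 0)
    offset = 3
    args = []
    for _ in range(count):
        tag = blob[offset:offset + 1]
        offset += 1
        if tag == b"i":
            (value,) = struct.unpack_from(">q", blob, offset)
            offset += 8
        else:
            (length,) = struct.unpack_from(">H", blob, offset)
            offset += 2
            value = blob[offset:offset + length].decode("utf-8")
            offset += length
        args.append(value)
    return TRACE_FORMATS[trace_id] % tuple(args)
```

The string formatting cost moves entirely into `decode_trace`, which runs offline after the logs have been collected.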


Regarding the disk overhead, traces are usually only interesting in  
the run up to a failure. We could have a ring buffer in memory that is  
constantly traced to, old traces being overwritten when the ring  
buffer reaches its limit. These traces should only be dumped to the  
filesystem when an error or fatal level event occurs, thereby giving  
you a trace of what was happening before you fell over.
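
The ring buffer described above can be sketched in a few lines. This is an assumption-laden illustration rather than a proposed implementation: the class name `TraceRing`, its capacity, and the choice to clear the buffer after a dump are all hypothetical.

```python
import sys
from collections import deque


class TraceRing:
    """Hypothetical in-memory trace buffer that keeps only recent entries."""

    def __init__(self, capacity=1024):
        # A deque with maxlen silently drops the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def trace(self, message):
        """Record a trace entry in memory only; nothing touches the disk."""
        self.entries.append(message)

    def dump(self, out):
        """Write the buffered traces, oldest first, to a file-like object."""
        for entry in self.entries:
            out.write(entry + "\n")
        self.entries.clear()


if __name__ == "__main__":
    ring = TraceRing(capacity=3)
    for i in range(5):
        ring.trace("event %d" % i)
    # Called from an error or fatal handler; only the last 3 events remain.
    ring.dump(sys.stderr)
```

Disk usage is bounded by the capacity, at the price of losing any trace older than the window when the failure occurs.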




-Ivan


[jira] Created: (ZOOKEEPER-818) improve the traces with additional information needed

2010-07-16 Thread Miguel Correia (JIRA)
improve the traces with additional information needed 
--

 Key: ZOOKEEPER-818
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-818
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Miguel Correia
Priority: Minor


The current traces do not include all the information we need to do the 
checking. The main additions would be to log the replies and hashes of values 
read/written.




[jira] Created: (ZOOKEEPER-817) improve the efficiency of tracing

2010-07-16 Thread Miguel Correia (JIRA)
improve the efficiency of tracing
-

 Key: ZOOKEEPER-817
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-817
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Miguel Correia
Priority: Minor


Zookeeper uses two kinds of logs: logs for information and debugging (the ones 
considered in this project) and transaction logs (needed for Zab/Paxos to be 
fault tolerant). The latter are very efficient, so the idea would be to make the 
former equally efficient using similar mechanisms.




[jira] Created: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-16 Thread Miguel Correia (JIRA)
Detecting and diagnosing elusive bugs and faults in Zookeeper
-

 Key: ZOOKEEPER-816
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
 Project: Zookeeper
  Issue Type: New Feature
Reporter: Miguel Correia
Priority: Minor


Complex distributed systems like Zookeeper tend to fail in strange ways that 
are hard to diagnose. The objective is to build a tool that helps understand 
when and where these problems occurred based on Zookeeper's traces (i.e., logs 
in TRACE level). Minor changes to the server code will be needed.




[jira] Created: (ZOOKEEPER-819) build the checking tool

2010-07-16 Thread Miguel Correia (JIRA)
build the checking tool
---

 Key: ZOOKEEPER-819
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-819
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Miguel Correia
Priority: Minor


Building the checking tool is the hardest part of the project. It involves 
putting the traces together in a unified trace and checking if this unified 
trace shows that Zookeeper is satisfying a set of properties (e.g., a getData 
returns what was stored by the previous setData or create).
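
As a toy illustration of the example property (a getData returns what was stored by the previous setData or create), a checker over an already unified trace might look like this. The event shape `(op, path, value)` and the function name are assumptions; the sketch ignores deletes, concurrency, and the genuinely hard step of merging per-server traces into one order.

```python
def check_reads(trace):
    """Return the indices of getData events that contradict the last write.

    `trace` is a list of (op, path, value) tuples in unified order, where
    op is "create", "setData", or "getData". This event shape is assumed.
    """
    latest = {}       # path -> value stored by the last create/setData
    violations = []
    for index, (op, path, value) in enumerate(trace):
        if op in ("create", "setData"):
            latest[path] = value
        elif op == "getData":
            # Paths written before the trace began cannot be checked.
            if path in latest and latest[path] != value:
                violations.append(index)
    return violations
```

For a trace that creates `/a` with `b"1"`, sets it to `b"2"`, and then reads `b"1"`, the function reports the index of that final read.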




[jira] Commented: (ZOOKEEPER-816) Detecting and diagnosing elusive bugs and faults in Zookeeper

2010-07-16 Thread Miguel Correia (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889202#action_12889202
 ] 

Miguel Correia commented on ZOOKEEPER-816:
--

Let me give a longer explanation of the project. Practical experience with 
Zookeeper has shown that sometimes there are failures whose causes are hard to 
understand. Some of these failures may be caused by elusive bugs in the code; 
others may be due to failures rarer than crashes, say corruptions of data 
somewhere in a server.

Zookeeper's traces (i.e., logs in TRACE level) provide some information that 
can be helpful to understand what happened. For instance, they contain 
information about the clients that are connected, the operations issued, etc. 
However, in real deployments with many clients (say, hundreds), traces are 
typically turned off to avoid the high overhead that they cause. Furthermore, 
the data in the traces is probably not enough for our purposes because it does 
not include, e.g., the replies to operations or the data values. 

The project involves 3 subtasks:

1- improve the efficiency of logging

2- improve the traces with additional information needed

3- build the checking tool


> Detecting and diagnosing elusive bugs and faults in Zookeeper
> -
>
> Key: ZOOKEEPER-816
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-816
> Project: Zookeeper
> Issue Type: New Feature
> Reporter: Miguel Correia
> Priority: Minor
>
> Complex distributed systems like Zookeeper tend to fail in strange ways that 
> are hard to diagnose. The objective is to build a tool that helps understand 
> when and where these problems occurred based on Zookeeper's traces (i.e., 
> logs at TRACE level). Minor changes to the server code will be needed.




[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-16 Thread Diogo (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889188#action_12889188
 ] 

Diogo commented on ZOOKEEPER-702:
-

Hi!
There is a paper by Satzger et al. that describes an easy method for using 
non-periodic application messages as periodic heartbeats. The paper is called 
"A Lazy Monitoring Approach for Heartbeat-Style Failure Detectors".

I hope that is still relevant.
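
To make the general idea concrete, here is a minimal Python sketch. It is not 
the algorithm from the paper, only an illustration of treating application 
traffic as heartbeats: the sender emits an explicit heartbeat only when it has 
been silent for a full interval, and the monitor refreshes its view on any 
message. The class and method names are made up for this example, and time is 
passed in explicitly so the logic can be tested without a real clock.

```python
class LazyHeartbeatSender:
    """Sends explicit heartbeats only when application traffic is absent."""

    def __init__(self, interval: float):
        self.interval = interval
        self.last_sent = float("-inf")  # nothing sent yet

    def on_message_sent(self, now: float) -> None:
        # Any outgoing message (application or heartbeat) counts.
        self.last_sent = now

    def needs_heartbeat(self, now: float) -> bool:
        return now - self.last_sent >= self.interval


class Monitor:
    """Suspects a peer that has been silent for longer than `timeout`."""

    def __init__(self, timeout: float, start: float):
        self.timeout = timeout
        self.last_heard = start

    def on_message_received(self, now: float) -> None:
        # Application messages and heartbeats are treated alike.
        self.last_heard = now

    def suspects(self, now: float) -> bool:
        return now - self.last_heard > self.timeout
```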

Cheers, Diogo

> GSoC 2010: Failure Detector Model
> -
>
> Key: ZOOKEEPER-702
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
> Project: Zookeeper
> Issue Type: Wish
> Reporter: Henry Robinson
> Assignee: Abmar Barros
> Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
> ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however, it may be too aggressive and hard to tune for some more 
> unusual ZooKeeper installations (such as in a wide-area network, or even in 
> a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.
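
For readers unfamiliar with accrual detectors, the following Python sketch 
shows the core computation. It is neither Cassandra's nor ZooKeeper's code; it 
assumes heartbeat inter-arrival times are roughly normally distributed, 
estimates their mean and standard deviation over a sliding window, and reports 
phi = -log10(P(next heartbeat arrives later than now)). The class name, the 
window size and the min_std floor are choices made for this example.

```python
import math
from collections import deque


class PhiAccrualDetector:
    def __init__(self, window: int = 100, min_std: float = 0.01):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.min_std = min_std                 # floor to avoid dividing by 0
        self.last_arrival = None

    def heartbeat(self, now: float) -> None:
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        """Suspicion level; larger means the peer is more likely down."""
        if not self.intervals:
            return 0.0  # not enough history to judge
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last_arrival
        # Probability that a heartbeat still arrives after `elapsed`.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # avoid log10(0) on underflow
        return -math.log10(p_later)
```

Instead of a fixed number of missed ticks, the caller compares phi against a 
threshold of its choosing, which is what makes this style of detector easier 
to tune per deployment.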




[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership

2010-07-16 Thread Flavio Paiva Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889072#action_12889072
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-790:
--

Some additional information. We create a new transaction log when we try to 
append a transaction and there is no log file open. Consequently, we only 
create it when there is something to write. For the bad run above, here is the 
content of the log:

{noformat}
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
7/15/10 5:27:45 PM CEST session 0x129d6b61b5b cxid 0x0 zxid 0x20001 closeSession
EOF reached after 1 txns.
{noformat}
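
The create-on-first-append behaviour described above can be sketched as 
follows. This is an illustrative Python model, not the server's Java code; the 
class name and the file naming scheme (log.<first zxid in hex>) are 
assumptions made for the example.

```python
import os


class LazyTxnLog:
    """Opens its log file only when the first transaction is appended."""

    def __init__(self, directory: str):
        self.directory = directory
        self._file = None

    def append(self, zxid: int, payload: bytes) -> None:
        if self._file is None:
            # No log file is open, so create one now (assumed naming).
            path = os.path.join(self.directory, f"log.{zxid:x}")
            self._file = open(path, "ab")
        self._file.write(payload)
        self._file.flush()

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```

A server that never appends a transaction therefore never leaves a log file 
behind, which is consistent with the bad run's log holding a single 
transaction.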

> Last processed zxid set prematurely while establishing leadership
> -
>
> Key: ZOOKEEPER-790
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.1
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2
>
>
> The leader code sets the last processed zxid to the first zxid of the new 
> epoch before connecting to a quorum of followers (Leader.java:281). The 
> follower code throws an IOException (Follower.java:73) if the leader epoch 
> is smaller than its own. As a result, when the false leader drops leadership 
> and becomes a follower, it finds a leader with a smaller epoch and kills 
> itself.
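
The interaction can be modelled in a few lines of Python. This is a simplified 
sketch, not the code in Leader.java or Follower.java: the Peer class and its 
method names are hypothetical, and the sequence of events is reduced to the two 
steps named in the description. It relies only on the fact that a zxid is a 
64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a 
counter.

```python
def epoch_of(zxid: int) -> int:
    """High 32 bits of a zxid are the epoch."""
    return zxid >> 32


def first_zxid_of_epoch(epoch: int) -> int:
    """First zxid of an epoch: the epoch with a zero counter."""
    return epoch << 32


class Peer:
    def __init__(self, last_processed_zxid: int):
        self.last_processed_zxid = last_processed_zxid

    def start_leading_prematurely(self) -> None:
        # Models the bug: the epoch is bumped before any quorum of
        # followers has connected.
        new_epoch = epoch_of(self.last_processed_zxid) + 1
        self.last_processed_zxid = first_zxid_of_epoch(new_epoch)

    def follow(self, leader_zxid: int) -> None:
        # Models the follower-side check that rejects an older leader.
        if epoch_of(leader_zxid) < epoch_of(self.last_processed_zxid):
            raise IOError("leader epoch is smaller than ours")
```

A peer at epoch 2 that calls start_leading_prematurely() moves to epoch 3 on 
its own; if it then fails to gather a quorum and tries to follow a leader 
still at epoch 2, follow() raises.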
