[jira] Updated: (ZOOKEEPER-576) docs need to be updated for session moved exception and how to handle it

2009-11-19 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-576:


Status: Patch Available  (was: Open)

> docs need to be updated for session moved exception and how to handle it
> 
>
> Key: ZOOKEEPER-576
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-576
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Benjamin Reed
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-576.patch
>
>
> the handling and implications of session moved exception should be documented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-582) ZooKeeper can revert to old data when a snapshot is created outside of normal processing

2009-11-19 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-582:


Status: Open  (was: Patch Available)

looks good mahadev, just two things:

1) (minor) in getLastLoggedZxid() you should be using maxLogZxid instead of 
calling getLastLoggedZxid() again.

2) when doing the sanity check with the leader's zxid you should be checking 
epochs, not zxids. it is possible for a follower to see something later and have 
to truncate within the same epoch, but a follower should never see a later epoch.
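For context, the check Ben describes relies on how ZooKeeper packs an epoch into a zxid: the high 32 bits of the 64-bit zxid are the epoch, the low 32 bits the counter. A minimal sketch of the comparison (method names here are illustrative, not the patch's actual code):

```java
// Sketch of the epoch-vs-zxid comparison: the high 32 bits of a zxid
// hold the epoch, the low 32 bits the per-epoch counter.
public class EpochCheck {
    static long epochOf(long zxid) {
        return zxid >> 32;          // high 32 bits = epoch
    }

    // A follower may legitimately hold a later zxid within the same epoch
    // (it then truncates), but it should never hold a later epoch.
    static boolean followerEpochOk(long followerZxid, long leaderZxid) {
        return epochOf(followerZxid) <= epochOf(leaderZxid);
    }

    public static void main(String[] args) {
        long leader = (27L << 32) | 5;    // epoch 27, counter 5
        long follower = (27L << 32) | 9;  // same epoch, later counter: ok
        System.out.println(followerEpochOk(follower, leader)); // true
        long stale = (28L << 32) | 1;     // later epoch: must be rejected
        System.out.println(followerEpochOk(stale, leader));    // false
    }
}
```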

> ZooKeeper can revert to old data when a snapshot is created outside of normal 
> processing
> 
>
> Key: ZOOKEEPER-582
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-582
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.1, 3.1.1
>Reporter: Benjamin Reed
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.2, 3.3.0, 3.1.2
>
> Attachments: test.patch, ZOOKEEPER-582.patch, ZOOKEEPER-582.patch, 
> ZOOKEEPER-582.patch, ZOOKEEPER-582.patch, ZOOKEEPER-582_3.1.patch, 
> ZOOKEEPER-582_3.2.patch
>
>
> when zookeeper starts up it will restore the most recent state (latest zxid) 
> it finds in the data directory. unfortunately, in the quorum version of 
> zookeeper updates are logged using an epoch based on the latest log file in a 
> directory. if there is a snapshot with a higher epoch than the log files, the 
> zookeeper server will start logging using an epoch one higher than the 
> highest log file.
> so if a data directory has a snapshot with an epoch of 27 and there are no 
> log files, zookeeper will start logging changes using epoch 1. if the cluster 
> restarts the state will be restored from the snapshot with the epoch of 27, 
> which, in effect, restores old data.
> normal operation of zookeeper will never result in this situation.
> this does not affect standalone zookeeper.
> a fix should make sure to use an epoch one higher than the current state, 
> whether it comes from the snapshot or log, and should include a sanity check 
> to make sure that a follower never connects to a leader that has a lower 
> epoch than its own.
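The fix direction described above can be sketched in a few lines, assuming the next epoch is derived from whichever of the snapshot or the log carries the highest epoch (names are illustrative, not the committed patch):

```java
// Hedged sketch of the fix: derive the next epoch from the most recent
// state, whether it comes from the snapshot or the log, instead of from
// the log files alone.
public class EpochRecovery {
    static long epochOf(long zxid) { return zxid >> 32; }

    static long nextEpoch(long latestSnapshotZxid, long latestLogZxid) {
        long current = Math.max(epochOf(latestSnapshotZxid),
                                epochOf(latestLogZxid));
        return current + 1;
    }

    public static void main(String[] args) {
        // Snapshot at epoch 27, no log files (zxid 0):
        // must not restart logging at epoch 1.
        System.out.println(nextEpoch(27L << 32, 0L)); // 28
    }
}
```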




[jira] Commented: (ZOOKEEPER-425) Add OSGi metadata to zookeeper.jar

2009-11-19 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780219#action_12780219
 ] 

Benjamin Reed commented on ZOOKEEPER-425:
-

david, nice work. i think we should break this patch into possibly three parts:

1) manifest update to just export the right packages
2) an activator that registers a ZooKeeper object as a service. (this would be 
a client object)
3) an activator that starts up a server.

you seem to be ignoring 2). it might be useful to have the bundles of an osgi 
framework share a ZooKeeper object. maybe not though. sharing makes it nice to 
get a preconfigured connected ZooKeeper object from the service registry.

we can't really support 3) properly right now. it is possible to shutdown a 
server and start it back up. the interfaces aren't very well exposed but the 
tests do it. a bigger problem is that we have System.exits in our code, and 
even if the bundle doesn't have permission to call System.exit, not exiting can 
cause bad things to happen.

i would suggest focusing this issue on 1) and possibly 2), but leave 3) for a 
separate issue.

> Add OSGi metadata to zookeeper.jar
> --
>
> Key: ZOOKEEPER-425
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-425
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.1
>Reporter: David Bosschaert
> Attachments: MANIFEST.MF, zk_patch3.patch
>
>
> After adding OSGi metadata to zookeeper.jar it can be used as both an OSGi 
> bundle as well as an ordinary jar file. 
> In the CXF/DOSGi project the buildsystem does this using the 
> maven-bundle-plugin: 
> http://svn.apache.org/repos/asf/cxf/dosgi/trunk/discovery/distributed/zookeeper-wrapper/pom.xml
> The MANIFEST.MF generated by maven-bundle-plugin is attached to this bug, 
> this works for the CXF/DOSGi project.
> If your buildsystem isn't using maven, I would advise using bnd 
> (http://www.aqute.biz/Code/Bnd). BND defines its own ant task in which you 
> should be able to use more or less the same instructions as were used in 
> maven:
> <instructions>
>   <Bundle-Name>ZooKeeper bundle</Bundle-Name>
>   <Bundle-Description>This bundle contains the ZooKeeper library</Bundle-Description>
>   <Bundle-SymbolicName>org.apache.hadoop.zookeeper</Bundle-SymbolicName>
>   <Bundle-Version>3.1.1</Bundle-Version>
>   <Import-Package>*</Import-Package>
>   <Export-Package>*;version=3.1.1</Export-Package>
> </instructions>
> Oh and one other thing. Is it really necessary to put the source code in the 
> Jar file too? I would put that in a separate source distribution :)
> See also: 
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200905.mbox/%3c4a2009b1.3030...@yahoo-inc.com%3e




[jira] Updated: (ZOOKEEPER-3) syncLimit has slightly different comments in the class header, and inline with the variable.

2009-11-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-3:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed revision 881849.


>  syncLimit has slightly different comments in the class header, and inline 
> with the variable.
> ---
>
> Key: ZOOKEEPER-3
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Benjamin Reed
>Assignee: Mahadev konar
>Priority: Trivial
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-3.patch
>
>
> syncLimit is documented twice in QuorumPeer, with a different aspect 
> described in each instance. It should be better documented and unified. 
> (Probably remove the second instance.)




[jira] Updated: (ZOOKEEPER-519) Followerhandler should close the socket if it gets an exception on a write.

2009-11-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-519:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed revision 881847.

> Followerhandler should close the socket if it gets an exception on a write.
> ---
>
> Key: ZOOKEEPER-519
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-519
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-519.patch
>
>
> We noticed this in our tests -
> {code}
> java.net.SocketException: Broken pipe
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
> at 
> org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:122)
> at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:126)
> at 
> org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:126)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:878)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:890)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:890)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:890)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:890)
> at 
> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:890)
> at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:940)
> at 
> org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:102)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.serializeSnapshot(ZooKeeperServer.java:269)
> at 
> org.apache.zookeeper.server.quorum.FollowerHandler.run(FollowerHandler.java:263)
> {code}
> So the followerhandler got an exception while writing to the socket, but the 
> follower was still waiting on the socket for a read and got a read timeout 
> after 60 seconds or so. To make sure we handle this correctly, we should 
> close the socket at the followerhandler when we get an exception, so that the 
> follower immediately recognizes that it's disconnected from the leader.
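The proposed behavior can be sketched as a small helper on the leader's per-follower writer. This is a hedged illustration only; FollowerHandler's actual fields and methods differ:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

// Sketch: if a write to the follower's socket fails, close the socket
// immediately so the follower sees the disconnect right away instead of
// waiting out its read timeout.
public class WriteOrClose {
    static boolean sendPacket(Socket sock, byte[] packet) {
        try {
            OutputStream out = sock.getOutputStream();
            out.write(packet);
            out.flush();
            return true;
        } catch (IOException e) {
            // Broken pipe etc.: close our end so the peer is notified promptly.
            try {
                sock.close();
            } catch (IOException ignored) {
            }
            return false;
        }
    }
}
```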




[jira] Updated: (ZOOKEEPER-532) java compiler should be target Java 1.5

2009-11-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-532:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 881841.
thanx hiram!

> java compiler should be target Java 1.5
> ---
>
> Key: ZOOKEEPER-532
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-532
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Hiram Chirino
>Assignee: Hiram Chirino
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-532.patch, ZOOKEEPER-532.patch
>
>
> The jars released in 3.2.1 will not run on Java 1.5.  With a small build 
> change, it is possible to generate jars that will run on Java 1.5.
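For reference, the kind of build change involved is small. An illustrative ant fragment (not the actual ZOOKEEPER-532 patch) that targets Java 1.5 might look like:

```xml
<!-- Illustrative only, not the actual patch: pass source/target 1.5 to
     javac so the released jars run on Java 1.5. -->
<javac srcdir="src/java/main" destdir="build/classes"
       source="1.5" target="1.5" debug="on"/>
```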




[jira] Updated: (ZOOKEEPER-532) java compiler should be target Java 1.5

2009-11-18 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-532:


Hadoop Flags: [Reviewed]

+1

> java compiler should be target Java 1.5
> ---
>
> Key: ZOOKEEPER-532
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-532
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Hiram Chirino
>Assignee: Hiram Chirino
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-532.patch, ZOOKEEPER-532.patch
>
>
> The jars released in 3.2.1 will not run on Java 1.5.  With a small build 
> change, it is possible to generate jars that will run on Java 1.5.




[jira] Updated: (ZOOKEEPER-582) ZooKeeper can revert to old data when a snapshot is created outside of normal processing

2009-11-17 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-582:


Attachment: test.patch

this patch reproduces the problems outlined in this issue.

> ZooKeeper can revert to old data when a snapshot is created outside of normal 
> processing
> 
>
> Key: ZOOKEEPER-582
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-582
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.1.1, 3.2.1
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.2, 3.1.2
>
> Attachments: test.patch
>
>
> when zookeeper starts up it will restore the most recent state (latest zxid) 
> it finds in the data directory. unfortunately, in the quorum version of 
> zookeeper updates are logged using an epoch based on the latest log file in a 
> directory. if there is a snapshot with a higher epoch than the log files, the 
> zookeeper server will start logging using an epoch one higher than the 
> highest log file.
> so if a data directory has a snapshot with an epoch of 27 and there are no 
> log files, zookeeper will start logging changes using epoch 1. if the cluster 
> restarts the state will be restored from the snapshot with the epoch of 27, 
> which, in effect, restores old data.
> normal operation of zookeeper will never result in this situation.
> this does not affect standalone zookeeper.
> a fix should make sure to use an epoch one higher than the current state, 
> whether it comes from the snapshot or log, and should include a sanity check 
> to make sure that a follower never connects to a leader that has a lower 
> epoch than its own.




[jira] Commented: (ZOOKEEPER-547) Sanity check in QuorumCnxn Manager and quorum communication port.

2009-11-17 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779236#action_12779236
 ] 

Benjamin Reed commented on ZOOKEEPER-547:
-

Committed revision 881641. (for branch 3.2.2) i had to pull in 
src/java/test/org/apache/zookeeper/PortAssignment.java
from trunk.

> Sanity check in QuorumCnxn Manager and quorum communication port.
> -
>
> Key: ZOOKEEPER-547
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-547
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-547.patch, ZOOKEEPER-547.patch, 
> ZOOKEEPER-547.patch, ZOOKEEPER-547.patch
>
>
> We need to put some sanity checks in QuorumCnxnManager and the other quorum 
> port for rogue clients. Sometimes a client might get misconfigured and 
> send random characters on such ports. We need to make sure that such 
> rogue clients do not bring down the servers, and we need to put in some sanity 
> checks with respect to packet lengths and deserialization.
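The kind of length check described can be sketched as below, assuming a cap on packet size enforced before any allocation or deserialization (the limit and names are illustrative, not the actual patch):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: validate the length prefix of an incoming quorum packet before
// allocating a buffer or deserializing the body.
public class PacketSanity {
    static final int MAX_PACKET_LENGTH = 1024 * 1024; // illustrative cap

    static boolean isSaneLength(int len) {
        return len > 0 && len <= MAX_PACKET_LENGTH;
    }

    static byte[] readPacket(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (!isSaneLength(len)) {
            // A rogue client sending random bytes usually fails here,
            // rather than driving a huge allocation or a confusing
            // deserialization error deeper in the stack.
            throw new IOException("Invalid packet length: " + len);
        }
        byte[] packet = new byte[len];
        in.readFully(packet);
        return packet;
    }
}
```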




[jira] Updated: (ZOOKEEPER-547) Sanity check in QuorumCnxn Manager and quorum communication port.

2009-11-17 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-547:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 881623.

> Sanity check in QuorumCnxn Manager and quorum communication port.
> -
>
> Key: ZOOKEEPER-547
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-547
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-547.patch, ZOOKEEPER-547.patch, 
> ZOOKEEPER-547.patch, ZOOKEEPER-547.patch
>
>
> We need to put some sanity checks in QuorumCnxnManager and the other quorum 
> port for rogue clients. Sometimes a client might get misconfigured and 
> send random characters on such ports. We need to make sure that such 
> rogue clients do not bring down the servers, and we need to put in some sanity 
> checks with respect to packet lengths and deserialization.




[jira] Created: (ZOOKEEPER-582) ZooKeeper can revert to old data when a snapshot is created outside of normal processing

2009-11-17 Thread Benjamin Reed (JIRA)
ZooKeeper can revert to old data when a snapshot is created outside of normal 
processing


 Key: ZOOKEEPER-582
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-582
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.1
Reporter: Benjamin Reed
 Fix For: 3.2.2


when zookeeper starts up it will restore the most recent state (latest zxid) it 
finds in the data directory. unfortunately, in the quorum version of zookeeper 
updates are logged using an epoch based on the latest log file in a directory. 
if there is a snapshot with a higher epoch than the log files, the zookeeper 
server will start logging using an epoch one higher than the highest log file.

so if a data directory has a snapshot with an epoch of 27 and there are no log 
files, zookeeper will start logging changes using epoch 1. if the cluster 
restarts the state will be restored from the snapshot with the epoch of 27, 
which, in effect, restores old data.

normal operation of zookeeper will never result in this situation.

this does not affect standalone zookeeper.

a fix should make sure to use an epoch one higher than the current state, 
whether it comes from the snapshot or log, and should include a sanity check to 
make sure that a follower never connects to a leader that has a lower epoch 
than its own.




[jira] Commented: (ZOOKEEPER-544) improve client testability - allow test client to access connected server location

2009-11-17 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778980#action_12778980
 ] 

Benjamin Reed commented on ZOOKEEPER-544:
-

why do you have testableLocalSocketAddress in ZooKeeper? The cnxn object is 
protected. You don't need that method in ZooKeeper, do you?

> improve client testability - allow test client to access connected server 
> location
> --
>
> Key: ZOOKEEPER-544
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-544
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client, java client, tests
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-544.patch
>
>
> This came up recently on the user list. If you are developing tests for your 
> zk client you need to be able to access the server that your
> session is currently connected to. The reason is that your test needs to know 
> which server in the quorum to shutdown in order to
> verify you are handling failover correctly. Similar for session expiration 
> testing.
> however, we should be careful: we prefer not to expose this to all clients, 
> since this is an implementation detail that we typically
> want to hide. 
> also, we should provide this in both the c and java clients.
> I suspect we should add a protected method on ZooKeeper. This sets a 
> higher bar (the user will have to subclass) for 
> the user to access this method. In tests it's fine; typically you want a 
> "TestableZooKeeper" class anyway. In c we unfortunately
> have fewer options, so we can just rely on docs for now. 
> In both cases (c/java) we need to be very very clear in the docs that this is 
> for testing only and to clearly define semantics.
> We should add the following at the same time:
> toString() method to ZooKeeper which includes server ip/port, client port, 
> any other information deemed useful (connection stats like send/recv?)
> the java ZooKeeper is missing "deterministic connection order" that the c 
> client has. this is also useful for testing. again, protected and 
> clear docs that this is for testing purposes only!
> Any other things we should expose?
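The protected-method idea above can be illustrated with a self-contained mock. This is not the real ZooKeeper API; the class and method names simply mirror the discussion:

```java
import java.net.InetSocketAddress;

// Mock illustrating the pattern: keep the accessor protected on the client
// class so ordinary users don't see it, and let tests subclass (a
// "TestableZooKeeper") to reach it.
class MockZooKeeper {
    protected InetSocketAddress connectedServer =
            new InetSocketAddress("127.0.0.1", 2181);

    // Protected: not part of the public client API.
    protected InetSocketAddress testableLocalSocketAddress() {
        return connectedServer;
    }
}

class TestableZooKeeper extends MockZooKeeper {
    // Tests widen access so they can learn which server to shut down.
    public InetSocketAddress currentServer() {
        return testableLocalSocketAddress();
    }
}
```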




[jira] Updated: (ZOOKEEPER-547) Sanity check in QuorumCnxn Manager and quorum communication port.

2009-11-17 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-547:


Hadoop Flags: [Reviewed]

+1 looks good.

> Sanity check in QuorumCnxn Manager and quorum communication port.
> -
>
> Key: ZOOKEEPER-547
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-547
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-547.patch, ZOOKEEPER-547.patch, 
> ZOOKEEPER-547.patch, ZOOKEEPER-547.patch
>
>
> We need to put some sanity checks in QuorumCnxnManager and the other quorum 
> port for rogue clients. Sometimes a clients might get misconfigured and they 
> might send random characters on such ports. We need to make sure that such 
> rogue clients do not bring down the clients and need to put in some sanity 
> checks with respect to packet lengths and deserialization.




[jira] Created: (ZOOKEEPER-581) peerType in configuration file is redundant

2009-11-17 Thread Benjamin Reed (JIRA)
peerType in configuration file is redundant
---

 Key: ZOOKEEPER-581
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-581
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Benjamin Reed
Priority: Minor


to configure a machine to be an observer you must add a peerType=observer to 
the configuration file and an observer tag to the server list. this is 
redundant. if the observer tag is on the entry of a machine it should know it 
is an observer without needing the peerType tag.

on the other hand, do we really need the observers in the server list? they 
don't vote.
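For illustration, the redundancy looks like this in a configuration (hostnames made up):

```
# illustrative zoo.cfg fragment: server.4 is already marked as an observer
# in the server list, yet its own config must also carry peerType.
peerType=observer

server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
server.4=zoo4:2888:3888:observer
```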




[jira] Updated: (ZOOKEEPER-567) javadoc for getchildren2 needs to mention "new in 3.3.0"

2009-11-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-567:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 835519.

> javadoc for getchildren2 needs to mention "new in 3.3.0"
> 
>
> Key: ZOOKEEPER-567
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-567
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, java client
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-567.patch
>
>
> the javadoc/cdoc for getchildren2 needs to mention that the methods are "new 
> in 3.3.0"




[jira] Updated: (ZOOKEEPER-567) javadoc for getchildren2 needs to mention "new in 3.3.0"

2009-11-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-567:


Hadoop Flags: [Reviewed]

> javadoc for getchildren2 needs to mention "new in 3.3.0"
> 
>
> Key: ZOOKEEPER-567
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-567
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, java client
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-567.patch
>
>
> the javadoc/cdoc for getchildren2 needs to mention that the methods are "new 
> in 3.3.0"




[jira] Updated: (ZOOKEEPER-566) "reqs" four letter word (command port) returns no information

2009-11-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-566:


  Resolution: Fixed
Release Note: Committed revision 835515.
  Status: Resolved  (was: Patch Available)

> "reqs" four letter word (command port) returns no information
> -
>
> Key: ZOOKEEPER-566
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-566
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-566.patch
>
>
> the four letter word "reqs" doesn't do anything - it always returns empty 
> data. It seems that the "outstanding" field is always empty and never set.
> we should remove outstanding and also update the reqs code to correctly 
> output the outstanding requests (if not possible then remove the cmd and 
> update docs - although this is a very useful command, I'd hate to see us lose it)




[jira] Updated: (ZOOKEEPER-566) "reqs" four letter word (command port) returns no information

2009-11-12 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-566:


Hadoop Flags: [Reviewed]

> "reqs" four letter word (command port) returns no information
> -
>
> Key: ZOOKEEPER-566
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-566
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-566.patch
>
>
> the four letter word "reqs" doesn't do anything - it always returns empty 
> data. It seems that the "outstanding" field is always empty and never set.
> we should remove outstanding and also update the reqs code to correctly 
> output the outstanding requests (if not possible then remove the cmd and 
> update docs - although this is a very useful command, I'd hate to see us lose it)




[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776840#action_12776840
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

jeff, i agree that we shouldn't hold a patch to fix a bug somewhere else, but 
we also generally try to keep our trunk correct, so we want to see 
docs, tests, and correct behavior before committing, especially with something 
that touches the core. having said that, i think the missing docs, functionality, 
and testing are confined to the observer function, so i think we should commit 
it and fix the rest of the observer code as separate patches to avoid having to 
refresh the patch.

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: obs-refactor.patch, observer-refactor.patch, observers 
> sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 
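Option 2 above can be sketched in a few lines: the observer buffers proposals exactly as a follower would, never acknowledges them, and applies each one when the plain commit (zxid only) arrives. The types here are illustrative, not ZooKeeper's actual protocol classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option 2: proposals carry the payload to the observer, so a
// plain commit message (zxid only) suffices -- the same messages followers
// already receive, with no protocol change.
public class ObserverSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private final StringBuilder state = new StringBuilder();

    void onProposal(long zxid, String payload) {
        pending.put(zxid, payload);   // buffer, but send no ACK
    }

    void onCommit(long zxid) {
        String txn = pending.remove(zxid);
        if (txn != null) {
            state.append(txn);        // apply in commit order
        }
    }

    String state() {
        return state.toString();
    }
}
```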




[jira] Commented: (ZOOKEEPER-425) Add OSGi metadata to zookeeper.jar

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776614#action_12776614
 ] 

Benjamin Reed commented on ZOOKEEPER-425:
-

oh sorry david. so i still have the same concern with the full manifest.mf, but 
before that i was wondering: are you trying to provide the bundle so that other 
bundles can use zookeeper or so that the bundle can start up a zookeeper server?

most of the packages imported and exported are internal to zookeeper and should 
be kept private. if we want to just provide access to the client API we should 
just list org.apache.zookeeper and org.apache.zookeeper.data (possibly 
org.apache.zookeeper.version). we should also use the script to set the version 
rather than hard code it. if you want to start the server, we should really 
have a separate package with just the classes/interfaces needed to manage a 
server instance and export that.

the only import we need is log4j. is there already a standard log4j bundle?
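A minimal export set along these lines might look like the manifest fragment below. This is illustrative only; per the comment above, the version string should be substituted by the build script, shown here as a placeholder:

```
Export-Package: org.apache.zookeeper;version="${version}",
 org.apache.zookeeper.data;version="${version}"
Import-Package: org.apache.log4j
```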

> Add OSGi metadata to zookeeper.jar
> --
>
> Key: ZOOKEEPER-425
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-425
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.1
>Reporter: David Bosschaert
> Attachments: MANIFEST.MF
>
>
> After adding OSGi metadata to zookeeper.jar it can be used both as an OSGi 
> bundle and as an ordinary jar file. 
> In the CXF/DOSGi project the buildsystem does this using the 
> maven-bundle-plugin: 
> http://svn.apache.org/repos/asf/cxf/dosgi/trunk/discovery/distributed/zookeeper-wrapper/pom.xml
> The MANIFEST.MF generated by maven-bundle-plugin is attached to this bug; 
> this works for the CXF/DOSGi project.
> If your buildsystem isn't using maven, I would advise using bnd 
> (http://www.aqute.biz/Code/Bnd). BND defines its own ant task in which you 
> should be able to use more or less the same instructions as were used in 
> maven:
> <instructions>
>   <Bundle-Name>ZooKeeper bundle</Bundle-Name>
>   <Bundle-Description>This bundle contains the ZooKeeper library</Bundle-Description>
>   <Bundle-SymbolicName>org.apache.hadoop.zookeeper</Bundle-SymbolicName>
>   <Bundle-Version>3.1.1</Bundle-Version>
>   <Import-Package>*</Import-Package>
>   <Export-Package>*;version=3.1.1</Export-Package>
> </instructions>
> Oh and one other thing. Is it really necessary to put the source code in the 
> Jar file too? I would put that in a separate source distribution :)
> See also: 
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200905.mbox/%3c4a2009b1.3030...@yahoo-inc.com%3e




[jira] Commented: (ZOOKEEPER-425) Add OSGi metadata to zookeeper.jar

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776610#action_12776610
 ] 

Benjamin Reed commented on ZOOKEEPER-425:
-

right, these are osgi-specific tags that will normally get ignored.

> Add OSGi metadata to zookeeper.jar
> --
>
> Key: ZOOKEEPER-425
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-425
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.1
>Reporter: David Bosschaert
> Attachments: MANIFEST.MF
>
>
> After adding OSGi metadata to zookeeper.jar it can be used both as an OSGi 
> bundle and as an ordinary jar file. 
> In the CXF/DOSGi project the buildsystem does this using the 
> maven-bundle-plugin: 
> http://svn.apache.org/repos/asf/cxf/dosgi/trunk/discovery/distributed/zookeeper-wrapper/pom.xml
> The MANIFEST.MF generated by maven-bundle-plugin is attached to this bug; 
> this works for the CXF/DOSGi project.
> If your buildsystem isn't using maven, I would advise using bnd 
> (http://www.aqute.biz/Code/Bnd). BND defines its own ant task in which you 
> should be able to use more or less the same instructions as were used in 
> maven:
> <instructions>
>   <Bundle-Name>ZooKeeper bundle</Bundle-Name>
>   <Bundle-Description>This bundle contains the ZooKeeper library</Bundle-Description>
>   <Bundle-SymbolicName>org.apache.hadoop.zookeeper</Bundle-SymbolicName>
>   <Bundle-Version>3.1.1</Bundle-Version>
>   <Import-Package>*</Import-Package>
>   <Export-Package>*;version=3.1.1</Export-Package>
> </instructions>
> Oh and one other thing. Is it really necessary to put the source code in the 
> Jar file too? I would put that in a separate source distribution :)
> See also: 
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200905.mbox/%3c4a2009b1.3030...@yahoo-inc.com%3e




[jira] Commented: (ZOOKEEPER-425) Add OSGi metadata to zookeeper.jar

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776538#action_12776538
 ] 

Benjamin Reed commented on ZOOKEEPER-425:
-

sorry i didn't notice this sooner. this is a great idea, and certainly 
reasonable. i think the import and export packages statement is incorrect. we 
should list the exact dependencies.

> Add OSGi metadata to zookeeper.jar
> --
>
> Key: ZOOKEEPER-425
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-425
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.1.1
>Reporter: David Bosschaert
> Attachments: MANIFEST.MF
>
>
> After adding OSGi metadata to zookeeper.jar it can be used both as an OSGi 
> bundle and as an ordinary jar file. 
> In the CXF/DOSGi project the buildsystem does this using the 
> maven-bundle-plugin: 
> http://svn.apache.org/repos/asf/cxf/dosgi/trunk/discovery/distributed/zookeeper-wrapper/pom.xml
> The MANIFEST.MF generated by maven-bundle-plugin is attached to this bug; 
> this works for the CXF/DOSGi project.
> If your buildsystem isn't using maven, I would advise using bnd 
> (http://www.aqute.biz/Code/Bnd). BND defines its own ant task in which you 
> should be able to use more or less the same instructions as were used in 
> maven:
> <instructions>
>   <Bundle-Name>ZooKeeper bundle</Bundle-Name>
>   <Bundle-Description>This bundle contains the ZooKeeper library</Bundle-Description>
>   <Bundle-SymbolicName>org.apache.hadoop.zookeeper</Bundle-SymbolicName>
>   <Bundle-Version>3.1.1</Bundle-Version>
>   <Import-Package>*</Import-Package>
>   <Export-Package>*;version=3.1.1</Export-Package>
> </instructions>
> Oh and one other thing. Is it really necessary to put the source code in the 
> Jar file too? I would put that in a separate source distribution :)
> See also: 
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200905.mbox/%3c4a2009b1.3030...@yahoo-inc.com%3e




[jira] Commented: (ZOOKEEPER-368) Observers

2009-11-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776524#action_12776524
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

nice reviewer guide! the patch looks really good. for me it's good to go once 
you have addressed the 4 things that flavio raised. (the forest doc is a pain; if 
you have trouble with it i'll help with the formatting if you give me the 
content.)

for historical purposes, do you have a copy of the summary that was produced of 
the differences in operation and motivation between zero-weight followers and 
observers? it's been a while and i can't remember where it was published. it 
would be good to put a comment about it here in this issue.

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: obs-refactor.patch, observer-refactor.patch, observers 
> sync benchmark.png, observers.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 
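
to make option 2 concrete, here is a toy model (invented names, not the patch's actual classes) of the observer side: proposals are buffered but never acknowledged, and a transaction is applied only once the commit for its zxid arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 2: the observer receives proposals like a
// follower does, but sends no ACKs and therefore never affects the quorum.
class ObserverModel {
    final Map<Long, String> pending = new HashMap<>(); // zxid -> txn payload
    final List<String> applied = new ArrayList<>();    // txns in commit order

    void onProposal(long zxid, String payload) {
        // buffer the payload; unlike a follower, do NOT ack to the leader
        pending.put(zxid, payload);
    }

    void onCommit(long zxid) {
        // commit messages carry no payload, so the buffered proposal supplies it
        String txn = pending.remove(zxid);
        if (txn != null) {
            applied.add(txn);
        }
    }
}
```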




[jira] Updated: (ZOOKEEPER-570) AsyncHammerTest is broken, callbacks need to validate rc parameter

2009-11-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-570:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 833938.


> AsyncHammerTest is broken, callbacks need to validate rc parameter
> --
>
> Key: ZOOKEEPER-570
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-570
> Project: Zookeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-570.patch, ZOOKEEPER-570.patch
>
>
> the asynchammertest is not validating the rc in the callback; more serious is 
> that it is using path in the create callback
> to delete the node, rather than name (which is important in the case of 
> sequential node creation, as in this case)




[jira] Updated: (ZOOKEEPER-570) AsyncHammerTest is broken, callbacks need to validate rc parameter

2009-11-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-570:


Hadoop Flags: [Reviewed]

> AsyncHammerTest is broken, callbacks need to validate rc parameter
> --
>
> Key: ZOOKEEPER-570
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-570
> Project: Zookeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-570.patch, ZOOKEEPER-570.patch
>
>
> the asynchammertest is not validating the rc in the callback; more serious is 
> that it is using path in the create callback
> to delete the node, rather than name (which is important in the case of 
> sequential node creation, as in this case)




[jira] Commented: (ZOOKEEPER-570) AsyncHammerTest is broken, callbacks need to validate rc parameter

2009-11-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774833#action_12774833
 ] 

Benjamin Reed commented on ZOOKEEPER-570:
-

+1 good job. what a messed up patch!

> AsyncHammerTest is broken, callbacks need to validate rc parameter
> --
>
> Key: ZOOKEEPER-570
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-570
> Project: Zookeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-570.patch, ZOOKEEPER-570.patch
>
>
> the asynchammertest is not validating the rc in the callback; more serious is 
> that it is using path in the create callback
> to delete the node, rather than name (which is important in the case of 
> sequential node creation, as in this case)




[jira] Updated: (ZOOKEEPER-568) SyncRequestProcessor snapping too frequently - counts non-log events as log events

2009-11-07 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-568:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed revision 833639.

> SyncRequestProcessor snapping too frequently - counts non-log events as log 
> events
> --
>
> Key: ZOOKEEPER-568
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-568
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-568.patch
>
>
> Noticed the following issues in SyncRequestProcessor:
> 1) logCount is incremented even for non-log events (say getData);
> txnlog should return an indication of whether the request was logged or not 
> (if hdr == null it returns without logging)
> also:
> 2) move r.nextInt below logCount++ (ie only count an actual log event)
> 3) fix indentation after txnlog.append (for some reason it has an unnecessary 
> 4 char indent)




[jira] Commented: (ZOOKEEPER-568) SyncRequestProcessor snapping too frequently - counts non-log events as log events

2009-11-03 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773280#action_12773280
 ] 

Benjamin Reed commented on ZOOKEEPER-568:
-

i think that r.nextInt is in the right place. we should document why it is 
there (so that all the replicas aren't snapshotting at the same time).

apart from not counting the read-only requests, we should also process them 
immediately if there are no pending writes.
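
the randomization idea can be sketched as follows (invented names, not the actual SyncRequestProcessor code): each server's snapshot threshold includes a random component that is re-rolled after every snapshot, so replicas seeing identical traffic still snapshot at different times.

```java
import java.util.Random;

// Hypothetical sketch: snapshot after snapCount/2 plus a random extra number
// of logged requests, so all replicas don't snapshot simultaneously.
class SnapRoll {
    private final int snapCount;
    private final Random rand = new Random();
    private int logCount = 0;
    private int randRoll;

    SnapRoll(int snapCount) {
        this.snapCount = snapCount;
        this.randRoll = rand.nextInt(snapCount / 2);
    }

    /** Call only when a request was actually logged (i.e. hdr != null). */
    boolean logAndCheckSnap() {
        logCount++;
        if (logCount > (snapCount / 2 + randRoll)) {
            logCount = 0;
            randRoll = rand.nextInt(snapCount / 2); // re-roll for next cycle
            return true; // caller should snapshot now
        }
        return false;
    }
}
```

with snapCount = 100, each cycle snapshots after between 51 and 100 logged requests, never at a fixed count shared by all replicas.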

> SyncRequestProcessor snapping too frequently - counts non-log events as log 
> events
> --
>
> Key: ZOOKEEPER-568
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-568
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
> Fix For: 3.3.0
>
>
> Noticed the following issues in SyncRequestProcessor
> 1) logCount is incremented even for non-log events (say getData)
> txnlog should return indication if request was logged or not (if hdr ==null 
> it returns)
> also:
> 2) move r.nextInt below logCount++ (ie if an actual log event)
> 3) fix indentation after txnlog.append (for some reason has unnecessary 4 
> char indent)




[jira] Updated: (ZOOKEEPER-562) c client can flood server with pings if tcp send queue filled

2009-10-26 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-562:


Status: Patch Available  (was: Open)

> c client can flood server with pings if tcp send queue filled
> -
>
> Key: ZOOKEEPER-562
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-562
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-562.patch
>
>
> The c client can flood the server with pings if the tcp send queue is filled.
> Say the cluster is overloaded and shuts down its recv processing:
> a c client can send a ping, but since last_send is only updated on 
> successfully pushing data into the 
> socket, if flush_send_queue fails to send any data (send_buffer returns 0) 
> then last_send is not updated, 
> and zookeeper_interest will again send a ping the next time it is woken - 
> which could be after 0ms if recv_to is close 
> to 0, as easily could happen if the server is not sending data to the client.




[jira] Updated: (ZOOKEEPER-562) c client can flood server with pings if tcp send queue filled

2009-10-26 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-562:


Attachment: ZOOKEEPER-562.patch

this patch fixes the problem by only sending a ping if there isn't something 
already queued. the test checks for clients sending gratuitous pings.
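
the fix itself is in the C client, but the guard is tiny; a java sketch of the intended policy (invented names, and the pingIntervalMs value is an assumption):

```java
// Hypothetical sketch of the anti-flood guard: only queue a heartbeat ping
// when the send queue is empty; anything already queued will demonstrate
// liveness once it flushes, so an extra ping is gratuitous.
class PingPolicy {
    /**
     * @param sendQueueEmpty  nothing is pending in the TCP send queue
     * @param idleMs          time since data was last successfully written
     * @param pingIntervalMs  heartbeat interval (e.g. a fraction of the
     *                        session timeout - an assumption here)
     */
    static boolean shouldPing(boolean sendQueueEmpty, long idleMs, long pingIntervalMs) {
        return sendQueueEmpty && idleMs >= pingIntervalMs;
    }
}
```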

> c client can flood server with pings if tcp send queue filled
> -
>
> Key: ZOOKEEPER-562
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-562
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.1
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.2, 3.3.0
>
> Attachments: ZOOKEEPER-562.patch
>
>
> The c client can flood the server with pings if the tcp send queue is filled.
> Say the cluster is overloaded and shuts down its recv processing:
> a c client can send a ping, but since last_send is only updated on 
> successfully pushing data into the 
> socket, if flush_send_queue fails to send any data (send_buffer returns 0) 
> then last_send is not updated, 
> and zookeeper_interest will again send a ping the next time it is woken - 
> which could be after 0ms if recv_to is close 
> to 0, as easily could happen if the server is not sending data to the client.




[jira] Commented: (ZOOKEEPER-462) Last hint for open ledger

2009-10-11 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764535#action_12764535
 ] 

Benjamin Reed commented on ZOOKEEPER-462:
-

if the client asks all the bookies, it may not be able to get the last committed 
entry once we allow for failures. the safest thing to do would be to get the last 
entry from each bookie and then use the entry id in the last committed field; 
that would mean that you would never be able to see the actual last committed 
record.

i think it would be good to allow the client to specify the last committed 
entry on the open. that way we allow the client to figure out the last 
committed record any way it wants (via communication from other processes, for 
example), and it would keep the open code simple: it would just use the given 
id and wouldn't need to worry about recovery.

> Last hint for open ledger
> -
>
> Key: ZOOKEEPER-462
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-462
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: contrib-bookkeeper
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-462.patch
>
>
> In some use cases of BookKeeper, it is useful to be able to read from a 
> ledger before closing the ledger. To enable such a feature, the writer has to 
> be able to communicate to a reader how many entries it has been able to write 
> successfully. The main idea of this jira is to continuously update a znode 
> with the number of successful writes, and a reader can, for example, watch 
> the node for changes.
>  I was thinking of having a configuration parameter to state how often a 
> writer should update the hint on ZooKeeper (e.g., every 1000 requests, every 
> 10,000 requests). Clearly updating more often increases the overhead of 
> writing to ZooKeeper, although the impact on the performance of writes to 
> BookKeeper should be minimal given that we make an asynchronous call to 
> update the hint.
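
the writer-side policy can be sketched like this (invented names; in the real feature the publisher would be an asynchronous setData on the hint znode, injected here so the update cadence can be shown without a live ensemble):

```java
import java.util.function.LongConsumer;

// Hypothetical sketch: publish the count of confirmed adds every `interval`
// entries, keeping the ZooKeeper write overhead proportional to 1/interval.
class LedgerHint {
    private final long interval;
    private final LongConsumer publisher; // e.g. async setData on the hint znode
    private long confirmed = 0;

    LedgerHint(long interval, LongConsumer publisher) {
        this.interval = interval;
        this.publisher = publisher;
    }

    /** Call once per successfully acknowledged add. */
    void onAddConfirmed() {
        confirmed++;
        if (confirmed % interval == 0) {
            publisher.accept(confirmed); // asynchronous in a real implementation
        }
    }
}
```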




[jira] Updated: (ZOOKEEPER-542) c-client can spin when server unresponsive

2009-10-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-542:


Attachment: ZOOKEEPER-542.patch

added comments

> c-client can spin when server unresponsive
> --
>
> Key: ZOOKEEPER-542
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Christian Wiedmann
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-542.patch, ZOOKEEPER-542.patch
>
>
> Due to a mismatch between zookeeper_interest() and zookeeper_process(), when 
> the zookeeper server is unresponsive the client can spin when reconnecting to 
> the server.
> In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is 
> data to be sent, but flush_send_queue() only writes the data if the state is 
> ZOO_CONNECTED_STATE.  When in ZOO_ASSOCIATING_STATE, this results in spinning.
> This probably doesn't affect production, but I had a runaway process in a 
> development deployment that caused performance issues on the node.  This is 
> easy to reproduce in a single node environment by doing a kill -STOP on the 
> server and waiting for the session timeout.
> Patch to be added.




[jira] Updated: (ZOOKEEPER-542) c-client can spin when server unresponsive

2009-10-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-542:


Fix Version/s: 3.3.0
   Status: Patch Available  (was: Open)

> c-client can spin when server unresponsive
> --
>
> Key: ZOOKEEPER-542
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Christian Wiedmann
> Fix For: 3.3.0
>
> Attachments: ZOOKEEPER-542.patch, ZOOKEEPER-542.patch
>
>
> Due to a mismatch between zookeeper_interest() and zookeeper_process(), when 
> the zookeeper server is unresponsive the client can spin when reconnecting to 
> the server.
> In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is 
> data to be sent, but flush_send_queue() only writes the data if the state is 
> ZOO_CONNECTED_STATE.  When in ZOO_ASSOCIATING_STATE, this results in spinning.
> This probably doesn't affect production, but I had a runaway process in a 
> development deployment that caused performance issues on the node.  This is 
> easy to reproduce in a single node environment by doing a kill -STOP on the 
> server and waiting for the session timeout.
> Patch to be added.




[jira] Commented: (ZOOKEEPER-542) c-client can spin when server unresponsive

2009-10-06 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762683#action_12762683
 ] 

Benjamin Reed commented on ZOOKEEPER-542:
-

+1 good catch and good fix. i'm going to extend the patch slightly by putting 
in a comment to document how we are handling the non-blocking connect. (somehow 
that got deleted long ago.)

> c-client can spin when server unresponsive
> --
>
> Key: ZOOKEEPER-542
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Christian Wiedmann
> Attachments: ZOOKEEPER-542.patch
>
>
> Due to a mismatch between zookeeper_interest() and zookeeper_process(), when 
> the zookeeper server is unresponsive the client can spin when reconnecting to 
> the server.
> In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is 
> data to be sent, but flush_send_queue() only writes the data if the state is 
> ZOO_CONNECTED_STATE.  When in ZOO_ASSOCIATING_STATE, this results in spinning.
> This probably doesn't affect production, but I had a runaway process in a 
> development deployment that caused performance issues on the node.  This is 
> easy to reproduce in a single node environment by doing a kill -STOP on the 
> server and waiting for the session timeout.
> Patch to be added.




[jira] Created: (ZOOKEEPER-536) the initial size of the hashsets for the watcher is too large

2009-09-25 Thread Benjamin Reed (JIRA)
the initial size of the hashsets for the watcher is too large
-

 Key: ZOOKEEPER-536
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-536
 Project: Zookeeper
  Issue Type: Improvement
Reporter: Benjamin Reed


setting watches on a lot of different nodes can be expensive if there is only 
one watch set on each node. by default the hashset we use to track watches has 
an initial capacity of 16 and takes up about 160 bytes. we should probably make 
the initial size much lower.
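
for illustration, the change amounts to passing a small initial capacity (2 here is an illustrative choice, not a value the issue settled on); the set still grows on demand if more watchers are added.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: HashSet's no-arg constructor allocates 16 buckets,
// which is wasteful when most nodes carry exactly one watcher. A small
// initial capacity shrinks the common-case footprint without limiting size.
class WatchSets {
    static Set<Object> newWatcherSet() {
        return new HashSet<>(2); // sized for the common single-watcher case
    }
}
```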




[jira] Updated: (ZOOKEEPER-520) add static/readonly client resident serverless zookeeper

2009-09-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-520:


Summary: add static/readonly client resident serverless zookeeper  (was: 
add static/readonly client session type)

> add static/readonly client resident serverless zookeeper
> 
>
> Key: ZOOKEEPER-520
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-520
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: c client, java client
>Reporter: Patrick Hunt
> Fix For: 3.3.0
>
>
> Occasionally people (typically ops) have asked for the ability to start a ZK 
> client with a hardcoded, local, non cluster based session. Meaning that you 
> can bring up a particular client with a hardcoded/readonly view of the ZK 
> namespace even if the zk cluster is not available. This seems useful for a 
> few reasons:
> 1) unforeseen problems - a client might be brought up and partial application 
> service restored even in the face of catastrophic cluster failure
> 2) testing - client could be brought up with a hardcoded configuration for 
> testing purposes. we might even be able to extend this idea over time to 
> allow "simulated changes" ie - simulate other clients making changes in the 
> namespace, perhaps simulate changes in the state of the cluster (testing 
> state change is often hard for users of the client interface)
> Seems like this shouldn't be too hard for us to add. The session could be 
> established with a URI for a local/remote file rather than a URI of the 
> cluster servers. The client would essentially read this file which would be a 
> simple representation of the znode namespace.
> /foo/bar "abc"
> /foo/bar2 "def"
> etc...
> In the pure client readonly case this is simple. We might also want to allow 
> writes to the namespace (essentially back this with an in memory hash) for 
> things like group membership (so that the client continues to function).
> Obviously this wouldn't work in some cases, but it might work in many, and 
> would allow further options for users wrt building a reliable/recoverable 
> service on top of ZK.
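
parsing the proposed file format could look roughly like this (invented names; a real implementation would also need stat fields, ACLs, and value escaping):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one znode per line, a path followed by a quoted value,
// loaded into an in-memory map that backs a readonly (or locally writable)
// client-resident namespace.
class StaticNamespace {
    static Map<String, String> parse(String contents) {
        Map<String, String> znodes = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            int space = line.indexOf(' ');
            if (space < 0) continue; // skip malformed lines in this sketch
            String path = line.substring(0, space);
            String value = line.substring(space + 1).trim();
            // strip the surrounding quotes
            znodes.put(path, value.substring(1, value.length() - 1));
        }
        return znodes;
    }
}
```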




[jira] Commented: (ZOOKEEPER-512) FLE election fails to elect leader

2009-08-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748018#action_12748018
 ] 

Benjamin Reed commented on ZOOKEEPER-512:
-

agreed. i think the problem is that under high load we don't have a period of 
error-free operation. i think it is ok to generate errors randomly as we are 
doing, but we should have periods of error-free operation so that things can 
settle down.

> FLE election fails to elect leader
> --
>
> Key: ZOOKEEPER-512
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Flavio Paiva Junqueira
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, 
> t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, 
> ZOOKEEPER-512.patch
>
>
> I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch 
> applied and noticed that after some time the ensemble failed to re-elect a 
> leader.
> See the attached log files - 5 member ensemble. typically 5 is the leader
> Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes 
> elapses w/no quorum
> environment:
> I was doing fault injection testing using aspectj. The faults are injected 
> into socketchannel read/write; I throw exceptions randomly at a 1/200 ratio 
> (rand.nextFloat() <= .005 => throw IOException).
> You can see when a fault is injected in the log via:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@38] 
> - READPACKET FORCED FAIL
> vs a read/write that didn't force fail:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@41] 
> - READPACKET OK
> otherwise standard code/config (straight fle quorum with 5 members)
> also see the attached jstack trace. this is for one of the servers. Notice in 
> particular that the number of sendworkers != the number of recv workers.
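
the injection described above boils down to a probability check around each I/O call; a standalone java sketch (invented names, seeded here for reproducibility; the actual tests wove this in with AspectJ around socketchannel read/write):

```java
import java.io.IOException;
import java.util.Random;

// Hypothetical sketch of the fault injector: fail roughly 1 in 200 I/O calls.
class FaultInjector {
    private final Random rand;
    private final double failRatio;

    FaultInjector(long seed, double failRatio) {
        this.rand = new Random(seed);
        this.failRatio = failRatio;
    }

    /** Call before each intercepted read/write. */
    void maybeFail() throws IOException {
        if (rand.nextFloat() <= failRatio) {
            throw new IOException("FORCED FAIL");
        }
    }
}
```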




[jira] Commented: (ZOOKEEPER-515) Zookeeper quorum didn't provide service when restart after an "Out of memory" crash

2009-08-25 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747532#action_12747532
 ] 

Benjamin Reed commented on ZOOKEEPER-515:
-

first, it is important to note that our limit of 1M for data is a sanity check; 
it is unwise to design your application to run on the edge of sanity. generally 
we talk about data in the kilobyte range (100 bytes to 64k). zookeeper stores 
meta-data, not application data.

do you know how big the resulting data is? what is the size of a snapshot file?

1) perhaps you are hitting the memory error again when you try to rebuild your 
in-memory data structure. you may try increasing the memory limit using the 
-Xmx flag.
2) there is a configuration option to specify the number of requests in flight, 
globalOutstandingLimit, which defaults to 1000, but with 1000 1M requests you 
need 1G for just the inflight requests, in addition to the memory needed for 
the tree. if you want to handle such large requests you need to look at the 
amount of memory we have and possibly tune that parameter. also if you have a 
large in memory tree and you need to do a state transfer for followers that are 
behind, you will need some time to push a lot of data over the network, so you 
probably also need to adjust the syncLimit and initLimit.
3) if you want to reinitialize everything you need to remove the version-2 
directory from all servers, otherwise, a server that still has the version-2 
directory will get elected and the other servers will sync with it.
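
for reference, the knobs mentioned above live in zoo.cfg, plus the JVM heap flag (the values below are purely illustrative, not recommendations):

```
# zoo.cfg - illustrative values only
globalOutstandingLimit=100    # cap on in-flight requests (default 1000)
initLimit=20                  # ticks a follower may take to sync initially
syncLimit=10                  # ticks allowed between follower heartbeats
```

and a larger heap for the server JVM, e.g. by adding -Xmx to the server's java command line or JVMFLAGS.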

> Zookeeper quorum didn't provide service when restart after an "Out of memory" 
> crash
> ---
>
> Key: ZOOKEEPER-515
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-515
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
> Environment: Linux 2.6.9-52bs-4core #2 SMP Wed Jan 16 14:44:08 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> Jdk: 1.6.0_14 
>Reporter: Qian Ye
>
> The Zookeeper quorum, containing 5 servers, didn't provide service when 
> restart after an "Out of memory" crash. 
> It happened as following:
> 1. we built  a Zookeeper quorum which contained  5 servers, say 1, 3, 4, 5, 6 
> (have no 2), and 6 was the leader.
> 2. we created 18 threads on 6 different servers to set and get data from a 
> znode in the Zookeeper at the same time.  The size of the data is 1MB. The 
> test threads did their job as fast as possible, no pause between two 
> operation, and they repeated the setting and getting 4000 times. 
> 3. the Zookeeper leader crashed about 10 mins  after the test threads 
> started. The leader printed out the log:
> 2009-08-25 12:00:12,301 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b5 due to java.io.IOException: Read error
> 2009-08-25 12:00:12,318 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b6 due to java.io.IOException: Read error
> 2009-08-25 12:03:44,086 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b8 due to java.io.IOException: Read error
> 2009-08-25 12:04:53,757 - WARN  [NIOServerCxn.Factory:2181:nioserverc...@497] 
> - Exception causing close of session 0x523
> 4223c2dc00b7 due to java.io.IOException: Read error
> 2009-08-25 12:15:45,151 - FATAL [SyncThread:0:syncrequestproces...@131] - 
> Severe unrecoverable error, exiting
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
> at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
> at 
> org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
> at org.apache.zookeeper.txn.SetDataTxn.serialize(SetDataTxn.java:42)
> at 
> org.apache.zookeeper.server.persistence.Util.marshallTxnEntry(Util.java:262)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:154)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:268)
> at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:100)
> It is clear that the leader ran out of memory. then the server 4 was down 
> almost at the same time, and printed out the log:
> 2009-08-25 12:15:45,995 - ERROR 
> [FollowerRequestProcessor:3:followerrequestproces...@91] - Unexpected 
> exception causing
> exit
> java.net.SocketException: Connection reset
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.write(BufferedOutputS

[jira] Commented: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.

2009-08-24 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747155#action_12747155
 ] 

Benjamin Reed commented on ZOOKEEPER-508:
-

+1 looks good. simple fix! :)

> proposals and commits for DIFF and Truncate messages from the leader to 
> followers is buggy.
> ---
>
> Key: ZOOKEEPER-508
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, 
> ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, ZOOKEEPER-508.patch, 
> ZOOKEEPER-508.patch-3.2
>
>
> The proposals and commits sent by the leader, after it asks the followers to 
> truncate their logs or starts sending a diff, are missing messages. This 
> causes out-of-order commit messages and makes the followers shut down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-508) proposals and commits for DIFF and Truncate messages from the leader to followers is buggy.

2009-08-19 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-508:


Attachment: ZOOKEEPER-508.patch

added a testcase for the DIFF problem. still not fixed.

> proposals and commits for DIFF and Truncate messages from the leader to 
> followers is buggy.
> ---
>
> Key: ZOOKEEPER-508
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-508
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-508.patch, ZOOKEEPER-508.patch
>
>
> The proposals and commits sent by the leader, after it asks the followers to 
> truncate their logs or starts sending a diff, are missing messages. This 
> causes out-of-order commit messages and makes the followers shut down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743547#action_12743547
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

just to be clear: this bug isn't completely fixed, and the test case should 
still be failing. i just want to make sure it fails reliably on the hudson 
machine.

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
> ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 

[jira] Commented: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743541#action_12743541
 ] 

Benjamin Reed commented on ZOOKEEPER-503:
-

i should have also mentioned that this patch was done by flavio and utkarsh. i 
will be reviewing it.

> race condition in asynchronous create
> -
>
> Key: ZOOKEEPER-503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bookkeeper
>Reporter: Benjamin Reed
> Attachments: ZOOKEEPER-503.patch
>
>
> there is a race condition between the zookeeper completion thread and the 
> bookkeeper processing queue during create. if the zookeeper completion thread 
> falls behind due to scheduling, the action counter of the create operation 
> may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Status: Patch Available  (was: Open)

> race condition in asynchronous create
> -
>
> Key: ZOOKEEPER-503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bookkeeper
>Reporter: Benjamin Reed
> Attachments: ZOOKEEPER-503.patch
>
>
> there is a race condition between the zookeeper completion thread and the 
> bookkeeper processing queue during create. if the zookeeper completion thread 
> falls behind due to scheduling, the action counter of the create operation 
> may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-503:


Attachment: ZOOKEEPER-503.patch

this patch fixes a range of problems. it is a big simplification, with a net 
removal of 700 lines of code. the metadata for a ledger was collapsed into a 
single znode. here is a description of the changes:

Index calculation in QuorumEngine must be synchronized on the LedgerHandle to 
avoid changes to the ensemble while trying to submit an operation. Such changes 
happen upon crashes of bookies. 
  

I initially thought it was not necessary, but now I think this 
synchronization block is necessary. 
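The synchronized index calculation can be sketched roughly as follows. This is 
an illustrative sketch, not the actual BookKeeper code: the class and member 
names are hypothetical, and only the locking pattern matters. Holding the 
handle's monitor while picking the target bookie means an ensemble change 
(triggered by a bookie crash) can never be observed mid-calculation.

```java
import java.util.ArrayList;
import java.util.List;

class LedgerHandleSketch {
    private final List<String> ensemble = new ArrayList<>();
    private long nextEntryId = 0;

    LedgerHandleSketch(List<String> bookies) {
        ensemble.addAll(bookies);
    }

    // Pick the bookie for the next entry; synchronizing on the handle keeps
    // the ensemble stable for the duration of the index calculation.
    synchronized String bookieForNextEntry() {
        int index = (int) (nextEntryId++ % ensemble.size());
        return ensemble.get(index);
    }

    // Invoked when a bookie crashes; also synchronized on the handle so it
    // cannot interleave with an in-flight index calculation.
    synchronized void replaceBookie(String failed, String replacement) {
        int i = ensemble.indexOf(failed);
        if (i >= 0) ensemble.set(i, replacement);
    }
}
```

With both methods synchronized on the same handle, a submit either sees the 
ensemble entirely before or entirely after a replacement, never halfway.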

If a writer adds just a few entries to a ledger, it may end up with hints that 
say "empty ledger" when trying to recover a ledger. In this case, if we receive 
an empty ledger flag as a hint, we have to switch the hint to zero, which means 
that the client will start recovery from entry zero. If no entry has been 
written, it still works as the client won't be able to read anything.   
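The hint normalization described above amounts to a one-line mapping. A sketch 
under stated assumptions: the flag constant below is illustrative, not the real 
wire value, and the method name is hypothetical.

```java
class RecoveryHintSketch {
    // Illustrative flag value; the real protocol constant may differ.
    static final long EMPTY_LEDGER = -1L;

    // A bookie that never saw an entry reports the empty-ledger flag; recovery
    // must then start scanning from entry 0. Starting at 0 is harmless even if
    // nothing was ever written -- the read simply finds no entries.
    static long startEntryFromHint(long hint) {
        return hint == EMPTY_LEDGER ? 0L : hint;
    }
}
```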
   

I have changed LedgerRecoveryTest to test for: many entries written, one entry 
written, no entry written.

I have been able to identify the problem that was causing BookieFailureTest to 
hang on Utkarsh's computer. Basically, when the queue of a BookieHandle is full 
and the corresponding bookie has crashed, we are not able to add a read 
operation to the incoming queue of the bookie handle because the 
BookieHandle is not processing new requests anymore and it is waiting to fail 
the handle. In this case, the BookieHandle throws an exception after timing out 
the call to add the read operation to the queue. We were propagating this 
exception to the application.   
  

The main problem is that we have to add the operation to the queue of 
ClientCBWorker so that we guarantee that it knows about the operation once we 
receive responses from bookies. If we throw an exception without removing the 
operation from the ClientCBWorker queue, the worker will wait forever, which I 
believe is the case Utkarsh was observing.  
   

If I reasoned about the code correctly, then my modifications fix this problem 
by retrying a few times and erroring out after a number of retries. Erroring 
out in this case means notifying the CBWorker so that we can release the 
operation. 
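The bounded retry can be sketched as follows. This is a hypothetical sketch of 
the pattern, not the patch itself: the FailureNotifier interface and method 
names are illustrative. The key point is that a final failure notifies the 
callback worker so the pending operation is released instead of waiting forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class BoundedEnqueue {
    static final int MAX_RETRIES = 3;

    // Stand-in for the ClientCBWorker-side release hook.
    interface FailureNotifier { void operationFailed(Object op); }

    // Try to hand the operation to the bookie handle's queue. If the queue
    // stays full (the bookie has likely crashed and stopped draining it),
    // error out after a bounded number of attempts and release the operation
    // via the notifier so the callback worker never waits forever.
    static boolean submit(BlockingQueue<Object> queue, Object op,
                          FailureNotifier notifier) {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            try {
                if (queue.offer(op, 100, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break; // give up immediately if interrupted
            }
        }
        notifier.operationFailed(op); // release the pending operation
        return false;
    }
}
```

A full queue thus surfaces as a failed operation at the application after a 
bounded delay, rather than as an exception thrown with the operation still 
registered in the worker's queue.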

Fixing log level in LedgerConfig. -F

I have mainly worked on the ledger recovery machinery. I made it asynchronous 
by transforming LedgerRecovery into a thread and moving some calls. We have to 
revisit this way of making it asynchronous as it might not be acceptable for 
this patch.

I still have to check why BookieFailureTest is failing for Utkarsh. It passes 
fine every time for me, so we have to find a way to reproduce it reliably on my 
machine so that I can debug it.


Took a pass over asynchronous ledger operations: create, open, close. Some 
parts are still blocking; will work on those next.

> race condition in asynchronous create
> -
>
> Key: ZOOKEEPER-503
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bookkeeper
>Reporter: Benjamin Reed
> Attachments: ZOOKEEPER-503.patch
>
>
> there is a race condition between the zookeeper completion thread and the 
> bookkeeper processing queue during create. if the zookeeper completion thread 
> falls behind due to scheduling, the action counter of the create operation 
> may go backwards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
> ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161355 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:34032]
> 2009-07-23 12:29:06,8

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

The test case exposed another bug: log truncation was not being done properly 
with the buffered inputstream. i modified the test to make it fail reliably and 
then fixed the bug.

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
> ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-14 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Open  (was: Patch Available)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: QuorumTest.log, QuorumTest.log.gz, zklogs.tar.gz, 
> ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161355 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:34032]
> 2009-07-23 12:29:06,810 INFO org.apache.zooke

[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-498:


Hadoop Flags: [Reviewed]

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Flavio Paiva Junqueira
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, 
> zk498-test.tar.gz, zoo.cfg, ZOOKEEPER-498.patch, ZOOKEEPER-498.patch, 
> ZOOKEEPER-498.patch, ZOOKEEPER-498.patch
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.
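The weighted topology quoted above maps onto ZooKeeper's hierarchical-quorum options in zoo.cfg. A minimal sketch, with placeholder hostnames (only the server.N / group.N / weight.N keys are the real configuration names; everything else is illustrative):

```
# Hypothetical zoo.cfg sketch of the topology above. DC servers carry
# voting weight 1 and pod servers weight 0, so only DC servers can be
# elected leader.
server.1=dc-host1:2888:3888
server.2=dc-host2:2888:3888
server.3=dc-host3:2888:3888
server.4=pod-host1:2888:3888
server.5=pod-host2:2888:3888
# one group per data center
group.1=1:2:3
group.2=4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=0
weight.5=0
```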

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-10 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741605#action_12741605
 ] 

Benjamin Reed commented on ZOOKEEPER-498:
-

+1 looks good. when setting the stop flags, you should really do an interrupt 
to wake up the wait, but that will cause a message to be printed to stdout. 
i'll open another jira to fix that.
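For readers following along, the flag-plus-interrupt pattern being discussed looks roughly like this generic sketch (illustrative class names, not ZooKeeper's actual election code):

```java
// Illustrative sketch only: a worker blocks in wait() and checks a
// volatile stop flag. Setting the flag alone leaves the thread parked
// until the next notify, so the stopper also interrupts the thread to
// wake the wait immediately.
public class StoppableWorker implements Runnable {
    private volatile boolean stopped = false;
    private final Object lock = new Object();

    public void run() {
        synchronized (lock) {
            while (!stopped) {
                try {
                    lock.wait();  // parked here until notify or interrupt
                } catch (InterruptedException e) {
                    // the interrupt is our wake-up call; the loop
                    // re-checks the flag and exits
                }
            }
        }
    }

    public void shutdown(Thread worker) {
        stopped = true;      // the flag alone does not wake a waiting thread
        worker.interrupt();  // force the wait() to return
    }
}
```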





[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Open  (was: Patch Available)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>
> here is the part of the log where my zookeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161355 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:34032]
> 2009-07-23 12:29:06,810 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closi

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-10 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

fixed patch to apply cleanly.

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch, 
> ZOOKEEPER-483.patch
>
>

[jira] Updated: (ZOOKEEPER-499) electionAlg should default to FLE (3) - regression

2009-08-08 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-499:


Status: Open  (was: Patch Available)

this looks good pat, but when you first get the logger, why are you using the 
package name? if you are going to use the package name, shouldn't you get the 
package from the class itself?

in the second test, you get the logger using the package name to add an appender, 
but remove the appender using the class. couldn't that cause a problem?
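The mismatch being asked about can be reproduced with the JDK's own logging API (a java.util.logging analogue of the log4j pattern under review; the logger names below are made up): a logger fetched by package name and one fetched by class name are distinct objects, so a handler added on one is not removed via the other.

```java
// java.util.logging analogue of the add/remove mismatch discussed above.
// Logger.getLogger returns a distinct Logger per name, so a handler
// attached via the package-name logger is NOT detached by a remove
// issued on the class-name logger.
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;

public class LoggerScopeDemo {
    // true if the handler is still attached after the mismatched
    // add (package name) / remove (class name) pair
    static boolean handlerLeaks() {
        Logger byPackage = Logger.getLogger("com.example.quorum");
        Logger byClass   = Logger.getLogger("com.example.quorum.QuorumPeer");
        Handler handler  = new ConsoleHandler();

        byPackage.addHandler(handler);   // appender added on the package logger
        byClass.removeHandler(handler);  // no-op: different Logger instance

        return byPackage.getHandlers().length > 0;
    }

    public static void main(String[] args) {
        System.out.println("leaked=" + handlerLeaks()); // prints leaked=true
    }
}
```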

> electionAlg should default to FLE (3) - regression
> --
>
> Key: ZOOKEEPER-499
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-499
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server, tests
>Affects Versions: 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-499.patch, ZOOKEEPER-499_br3.2.patch
>
>
> there's a regression in 3.2 - electionAlg is no longer defaulting to 3 
> (incorrectly defaults to 0)
> also - need to have tests to validate this




[jira] Created: (ZOOKEEPER-503) race condition in asynchronous create

2009-08-06 Thread Benjamin Reed (JIRA)
race condition in asynchronous create
-

 Key: ZOOKEEPER-503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-503
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bookkeeper
Reporter: Benjamin Reed


there is a race condition between the zookeeper completion thread and the 
bookkeeper processing queue during create. if the zookeeper completion thread 
falls behind due to scheduling, the action counter of the create operation may 
go backwards.




[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Component/s: contrib-bookkeeper

> bookkeeper create calls completion too many times
> -
>
> Key: ZOOKEEPER-502
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bookkeeper
>Reporter: Benjamin Reed
>Assignee: Flavio Paiva Junqueira
> Attachments: ZOOKEEPER-502.patch
>
>
> when calling the asynchronous version of create, the completion routine is 
> called more than once.




[jira] Updated: (ZOOKEEPER-502) bookkeeper create calls completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Summary: bookkeeper create calls completion too many times  (was: 
bookkeeper create call completion too many times)

> bookkeeper create calls completion too many times
> -
>
> Key: ZOOKEEPER-502
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Flavio Paiva Junqueira
> Attachments: ZOOKEEPER-502.patch
>
>
> when calling the asynchronous version of create, the completion routine is 
> called more than once.




[jira] Updated: (ZOOKEEPER-502) bookkeeper create call completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-502:


Attachment: ZOOKEEPER-502.patch

this patch adds a test case that reproduces the problem.

> bookkeeper create call completion too many times
> 
>
> Key: ZOOKEEPER-502
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Flavio Paiva Junqueira
> Attachments: ZOOKEEPER-502.patch
>
>
> when calling the asynchronous version of create, the completion routine is 
> called more than once.




[jira] Created: (ZOOKEEPER-502) bookkeeper create call completion too many times

2009-08-06 Thread Benjamin Reed (JIRA)
bookkeeper create call completion too many times


 Key: ZOOKEEPER-502
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-502
 Project: Zookeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Flavio Paiva Junqueira


when calling the asynchronous version of create, the completion routine is 
called more than once.




[jira] Updated: (ZOOKEEPER-476) upgrade junit library from 4.4 to 4.6

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-476:


Hadoop Flags: [Reviewed]

+1 looks good

> upgrade junit library from 4.4 to 4.6
> -
>
> Key: ZOOKEEPER-476
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-476
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: tests
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.0
>
> Attachments: junit-4.6.jar, junit-4.6.LICENSE.txt
>
>
> upgrade from junit 4.4 to 4.6




[jira] Updated: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-490:


Hadoop Flags: [Reviewed]

+1 looks good pat

> the java docs for session creation are misleading/incomplete
> 
>
> Key: ZOOKEEPER-490
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-490.patch
>
>
> the javadoc for ZooKeeper constructor says:
>  * The client object will pick an arbitrary server and try to connect to 
> it.
>  * If failed, it will try the next one in the list, until a connection is
>  * established, or all the servers have been tried.
> the "or all the servers have been tried" phrase is misleading; it should indicate 
> that we retry until success, connection closed, or session expired. 
> we also need to mention that the connection is asynchronous: the constructor returns 
> immediately, and you need to look for the connection event in the watcher
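The asynchronous handshake described here is conventionally handled by blocking on a latch until the watcher reports SyncConnected. A minimal sketch against the standard client API (requires the ZooKeeper client library on the classpath; the host list and timeout are placeholders):

```java
// The ZooKeeper constructor returns immediately, so the caller blocks
// on a latch until the watcher delivers the SyncConnected event.
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class ConnectExample {
    public static ZooKeeper connect(String hosts) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(hosts, 30000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == KeeperState.SyncConnected) {
                    connected.countDown();  // session established
                }
            }
        });
        connected.await();  // constructor returned immediately; wait here
        return zk;
    }
}
```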




[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-484:


Hadoop Flags: [Reviewed]

+1 looks good mahadev

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
>
> When a client is connected to a follower, gets disconnected, and then connects to the 
> leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
> feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
> NOT have this problem. The fix is to make sure the ownership of a connection 
> gets changed when a session moves from a follower to the leader. The workaround 
> in 3.2.0 would be to switch off client connections to the leader; 
> take a look at the *leaderServes* java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
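As a concrete illustration of that workaround (the command line below is a sketch with placeholder classpath and config path; only the zookeeper.leaderServes system property name comes from the admin docs):

```
# Workaround sketch for 3.2.0: keep clients off the leader so a session
# can never move to it.
java -Dzookeeper.leaderServes=no \
     -cp zookeeper.jar:conf \
     org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
```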




[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-08-06 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

commit to 3.2 branch: Committed revision 801756.
commit to trunk: Committed revision 801747.

> handle small path lengths in zoo_create()
> -
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
> len = strlen(res.path);
> } else {
> len = sc->u.str.str_len-1;
> }
> if (len > 0) {
> memcpy(sc->u.str.str, res.path, len);
> sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of 
> this code executes, which is OK.  In the case where max_realpath_len is 1, a 
> user might expect their buffer to be filled with a null terminator, but 
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is 
> its description ("the maximum length of real path you would want") in 
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
> buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space 
> for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
> returned value to 9 bytes and put the null terminator in the second-last 
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len 
> arguments to something like path_buffer and path_buffer_len, akin to 
> zoo_set().  The path_buffer_len would be treated as the full length of the 
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the 
> attached patch.
> Since this would slightly change the behaviour, or "contract", of the API, I 
> would be inclined to suggest waiting until 4.0.0 to implement this change.




[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Status: Patch Available  (was: Open)

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>

[jira] Commented: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739898#action_12739898
 ] 

Benjamin Reed commented on ZOOKEEPER-483:
-

I've addressed 1) in the attached patch.

For 2), we are not eating the IOException; we are actually shutting things down. 
The real bug is that we pass the exception up to the upper layer, which knows 
nothing about the follower thread. We need to handle it here.

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> Here is the part of the log where my ZooKeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients:
> 2009-07-23 12:29:06,769 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5161350 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:00,756 WARN org.apache.zookeeper.server.quorum.Follower: 
> Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:65)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> at 
> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:243)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:494)
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x52276d1d5161350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.168:39489]
> 2009-07-23 12:29:06,770 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0578 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46797]
> 2009-07-23 12:29:06,771 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa013e NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.153:33998]
> 2009-07-23 12:29:06,771 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x52276d1d5160593 due to 
> java.io.IOException: Read error
> 2009-07-23 12:29:06,808 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e02bb NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.158:53758]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13e4 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.154:58681]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691382 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59967]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb1354 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.163:49957]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x42276d1d3fa13cd NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.150:34212]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x22276d15e691383 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.159:46813]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x12276d15dfb0350 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.162:59956]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e139b NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.156:55138]
> 2009-07-23 12:29:06,809 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> closing session:0x32276d15d2e1398 NIOServerCnxn: 
> java.nio.channels.SocketChannel[connected local=/10.20.20.151:2181 
> remote=/10.20.20.167:41257]
> 2009-07-23 12:29:06,810 INFO org.apache.zoo

[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-08-05 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
>Assignee: Benjamin Reed
> Fix For: 3.2.1, 3.3.0
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch, ZOOKEEPER-483.patch
>
>
> Here is the part of the log where my ZooKeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients (the same 
> log excerpt is quoted in full in the comment above).

[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-466:


Status: Open  (was: Patch Available)

> crash on zookeeper_close() when using auth with empty cert
> --
>
> Key: ZOOKEEPER-466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-466.patch
>
>
> The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every 
> element in the auth list; that function frees any memory pointed to by 
> auth->auth.buff if that field is non-NULL.
> In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set 
> to 0, but then not assigned to authinfo->auth when auth.buff is NULL.  The 
> result is uninitialized data in auth->auth.buff in free_auth_info(), and 
> potential crashes.
> The attached patch adds a test which attempts to duplicate this error; it 
> works for me but may not always on all systems as it depends on the 
> uninitialized data being non-zero; there's not really a simple way I can see 
> to trigger this in the current test framework.  The patch also fixes the 
> problem, I believe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-466:


Status: Patch Available  (was: Open)

> crash on zookeeper_close() when using auth with empty cert
> --
>
> Key: ZOOKEEPER-466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-466.patch
>
>
> The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every 
> element in the auth list; that function frees any memory pointed to by 
> auth->auth.buff if that field is non-NULL.
> In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set 
> to 0, but then not assigned to authinfo->auth when auth.buff is NULL.  The 
> result is uninitialized data in auth->auth.buff in free_auth_info(), and 
> potential crashes.
> The attached patch adds a test which attempts to duplicate this error; it 
> works for me but may not always on all systems as it depends on the 
> uninitialized data being non-zero; there's not really a simple way I can see 
> to trigger this in the current test framework.  The patch also fixes the 
> problem, I believe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-482) ignore sigpipe in testRetry to avoid silent immediate failure

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-482:



+1, thanks Chris!

> ignore sigpipe in testRetry to avoid silent immediate failure
> -
>
> Key: ZOOKEEPER-482
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-482
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, tests
>Affects Versions: 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-482.patch
>
>
> The testRetry test silently exits for me periodically, especially, it seems, 
> on newer hardware.  It also spits out log messages that clutter the test 
> output.
> The silent exits turn out to be because SIGPIPE is sometimes delivered during 
> the sleep(1) in createClient(), the second time createClient() is called.  
> Since SIGPIPE is not being ignored and there is no signal handler, the 
> process exits immediately.  This leaves the test suite in a broken state, 
> with the test ZooKeeper process still running because "zkServer.sh stop" is 
> not run by tearDown().  You have to manually kill the ZK server and retry the 
> tests; sometimes they succeed and sometimes they don't.
> I described SIGPIPE handling a little in ZOOKEEPER-320.  The appropriate 
> thing, I think, is for the client application to ignore or handle SIGPIPE.  
> In this case, that falls to the test processes.  The attached patch fixes the 
> issue for me with testRetry.
> The patch uses sigaction() to ignore SIGPIPE in TestClientRetry.cc and, for 
> good measure (although I never saw it actually fail for me), TestClient.cc, 
> since that file also uses sleep() extensively.
> I also removed a couple of unused functions and a macro definition from 
> TestClientRetry.cc, just to simplify matters, and turned off log output, which 
> makes the testRetry output much, much cleaner (otherwise you get a lot of log 
> output spamming into the nice clean cppunit output :-).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager

2009-07-30 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737238#action_12737238
 ] 

Benjamin Reed commented on ZOOKEEPER-481:
-

Looks great, Flavio. I think I figured out how the test works. Do you mind 
putting a comment into the test to state your strategy for posterity?

> Add lastMessageSent to QuorumCnxManager
> ---
>
> Key: ZOOKEEPER-481
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Attachments: ZOOKEEPER-481.patch, ZOOKEEPER-481.patch
>
>
> Currently we rely on TCP for reliable delivery of FLE messages. However, as 
> we concurrently drop and create new connections, it is possible that a 
> message is sent but never received. With this patch, cnx manager keeps a list 
> of last messages sent, and resends the last one sent. Receiving multiple 
> copies is harmless. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower

2009-07-30 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737221#action_12737221
 ] 

Benjamin Reed commented on ZOOKEEPER-480:
-

Looks great, Flavio. I think I figured out how the test works. Do you mind 
putting a comment into the test to state your strategy for posterity?


> FLE should perform leader check when node is not leading and add vote of 
> follower
> -
>
> Key: ZOOKEEPER-480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-480.patch, ZOOKEEPER-480.patch
>
>
> As a server may join leader election while others have already elected a 
> leader, it is necessary that a server handles some special cases of leader 
> election when notifications are from servers that are either LEADING or 
> FOLLOWING. In such special cases, we check if we have received a message from 
> the leader to declare a leader elected. This check does not consider the case 
> that the process performing the check might be a recently elected leader, and 
> consequently the check fails.
> This patch also adds a new case, which corresponds to adding a vote to 
> recvset when the notification is from a process LEADING or FOLLOWING. This 
> fixes the case raised in ZOOKEEPER-475.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-479) QuorumHierarchical does not count groups correctly

2009-07-30 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-479:


Hadoop Flags: [Reviewed]

+1 looks good.

> QuorumHierarchical does not count groups correctly
> --
>
> Key: ZOOKEEPER-479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-479
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-479.patch, ZOOKEEPER-479.patch
>
>
> QuorumHierarchical::containsQuorum should not require that every group 
> represented in the input set holds more than half of its total weight. 
> Instead, it should check only for an overall majority of groups. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-483) ZK fataled on me, and ugly

2009-07-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-483:


Attachment: ZOOKEEPER-483.patch

I was able to reproduce the problem; the patch adds a missing catch for a 
socket exception.

> ZK fataled on me, and ugly
> --
>
> Key: ZOOKEEPER-483
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-483
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: ryan rawson
> Fix For: 3.2.1
>
> Attachments: zklogs.tar.gz, ZOOKEEPER-483.patch
>
>
> Here is the part of the log where my ZooKeeper instance crashed, taking 3 
> out of 5 servers down and thus ruining the quorum for all clients (the same 
> log excerpt is quoted in full in the earlier ZOOKEEPER-483 messages).

[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-07-23 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-484:


Attachment: sessionTest.patch

this patch recreates the problem.

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch
>
>
> When a client is connected to a follower, gets disconnected, and then connects 
> to the leader, it gets a SESSION MOVED exception. This is because of a bug in 
> the new feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 
> 3.2 DO NOT have this problem. The fix is to make sure the ownership of a 
> connection gets changed when a session moves from a follower to the leader. 
> The workaround in 3.2.0 is to switch off connections from clients to the 
> leader; take a look at the *leaderServes* java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733990#action_12733990
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

It would be great to meet when you are here in SF! It turns out that Flavio is 
also here next week. Tragically, I will be leaving on vacation Tuesday morning; 
I could meet on Monday, though. Perhaps we could meet somewhere between here and 
SF for dinner?

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-07-21 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Status: Open  (was: Patch Available)

> handle small path lengths in zoo_create()
> -
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.2.0, 3.1.1, 3.1.0, 3.0.1, 3.0.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
>Priority: Minor
> Fix For: 3.2.1
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
> len = strlen(res.path);
> } else {
> len = sc->u.str.str_len-1;
> }
> if (len > 0) {
> memcpy(sc->u.str.str, res.path, len);
> sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of 
> this code executes, which is OK.  In the case where max_realpath_len is 1, a 
> user might expect their buffer to be filled with a null terminator, but 
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is 
> its description ("the maximum length of real path you would want") in 
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
> buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space 
> for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
> returned value to 9 bytes and put the null terminator in the second-last 
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len 
> arguments to something like path_buffer and path_buffer_len, akin to 
> zoo_set().  The path_buffer_len would be treated as the full length of the 
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the 
> attached patch.
> Since this would change, slightly, the behaviour or "contract" of the API, I 
> would be inclined to suggest waiting until 4.0.0 to implement this change.
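The truncation behaviour described above can be seen in a standalone sketch. This is an illustrative Python model of the quoted C copy logic, not the actual client internals; `buf_len` plays the role of `sc->u.str.str_len` (the caller's `max_realpath_len`).

```python
def copy_path(buf_len, path):
    """Model of the synchronous-completion copy logic quoted above.

    Returns the string that would end up in the caller's buffer,
    or None when nothing is written at all.
    """
    if buf_len > len(path):
        n = len(path)
    else:
        n = buf_len - 1  # buf_len is treated as the full buffer size
    if n > 0:
        return path[:n]  # the C code also appends a NUL terminator here
    return None  # buffer left untouched, not even NUL-terminated


# max_realpath_len = 10 with a 10-char path: only 9 characters survive,
# so the final byte of an 11-byte buffer goes unused.
print(copy_path(10, "/a23456789"))  # -> /a2345678
# max_realpath_len = 1: nothing is written, not even a terminator.
print(copy_path(1, "/a23456789"))   # -> None
```

This makes the off-by-one concrete: the argument behaves as a buffer size, not as a maximum path length, which is exactly the renaming the patch proposes.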

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-07-21 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-311:


Status: Patch Available  (was: Open)

> handle small path lengths in zoo_create()
> -
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.2.0, 3.1.1, 3.1.0, 3.0.1, 3.0.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
>Priority: Minor
> Fix For: 3.2.1
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
> len = strlen(res.path);
> } else {
> len = sc->u.str.str_len-1;
> }
> if (len > 0) {
> memcpy(sc->u.str.str, res.path, len);
> sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of 
> this code executes, which is OK.  In the case where max_realpath_len is 1, a 
> user might expect their buffer to be filled with a null terminator, but 
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is 
> its description ("the maximum length of real path you would want") in 
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
> buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space 
> for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
> returned value to 9 bytes and put the null terminator in the second-last 
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len 
> arguments to something like path_buffer and path_buffer_len, akin to 
> zoo_set().  The path_buffer_len would be treated as the full length of the 
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the 
> attached patch.
> Since this would change, slightly, the behaviour or "contract" of the API, I 
> would be inclined to suggest waiting until 4.0.0 to implement this change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733790#action_12733790
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey i'm looking at the patch, can you comment on the VIEWCHANGE message? does 
that refer to ensemble membership change or to the subscribe-to-a-subtree idea that 
was mentioned?

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 
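The two options in the description can be compared with a toy model. Under option 2 the leader sends the same PROPOSAL/COMMIT stream to observers, but only follower ACKs count toward the quorum. The class and method names below are illustrative, not ZooKeeper's actual server classes.

```python
class Peer:
    """A follower (votes=True) or observer (votes=False)."""
    def __init__(self, votes):
        self.votes, self.pending, self.committed = votes, {}, []

    def proposal(self, zxid, txn):
        self.pending[zxid] = txn       # payload arrives with the proposal

    def commit(self, zxid):
        self.committed.append((zxid, self.pending.pop(zxid)))


class Leader:
    def __init__(self, followers, observers):
        self.followers, self.observers = followers, observers

    def broadcast(self, zxid, txn):
        acks = 0
        for p in self.followers + self.observers:
            p.proposal(zxid, txn)      # observers get proposals too (option 2)
            if p.votes:                # observers never ACK
                acks += 1
        if acks > len(self.followers) // 2:   # quorum counts followers only
            for p in self.followers + self.observers:
                p.commit(zxid)
            return True
        return False


followers = [Peer(votes=True) for _ in range(3)]
observers = [Peer(votes=False) for _ in range(2)]
leader = Leader(followers, observers)
leader.broadcast(1, "create /x")
# every peer, observer or not, applies the committed transaction in order
```

Option 1 would instead attach `txn` to the commit message sent to observers, which requires a protocol change but spares observers the proposal traffic.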

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-21 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733789#action_12733789
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

i'm very sensitive to the work already done issue! i've totally been there.

the con argument for the increased chatter is actually quite minimal since the 
COMMIT message is just a few bytes that gets merged into an existing TCP 
stream. the restriction that only weight-0 followers can subscribe to a portion of the 
tree is a bit hacky, but it eliminates the need for a bunch of new code.

to be honest, there are two things that really concern me:

1) the amount of new code we have to add if we don't use weight-0 followers and 
the new test cases that we have to write. since observers use a different 
code path we have to add a lot more tests.
2) one use of observers is to do graceful change over for ensemble changes. 
changing from a weight-0 follower to a follower that is a voting participant 
just means that the follower will start sending ACKs when it gets the proposal 
that it starts voting. we can do that very fast on the fly with no interruption 
to the follower. if we try to convert an observer, the new follower must switch 
from observer to follower and sync up to the leader before it can commit the 
new ensemble message. this increases the interruption of the change and the 
likelihood of failure.

btw, we could setup a phone conference if it would help. (everyone would be 
invited of course. we have global access numbers.)

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-20 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733410#action_12733410
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

henry, i was thinking the other day that an observer is very similar to a 
follower in a flexible quorum with 0 weight. actually the more i thought about 
it, the more i realized that it should be the same. a follower with 0 weight 
really should not send ACKs back and then it would be an observer. it turns out 
that there is a comment in ZOOKEEPER-29 that makes this observation as well. in 
that issue the differences that flavio points out are no longer relevant. i 
think. what do you think?
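The weight-0 equivalence above follows directly from how flexible quorums are counted: a quorum needs a strict majority of the total *weight*, so a weight-0 follower's ACK can never change whether a quorum is reached. A minimal sketch (names illustrative):

```python
def has_quorum(weights, acked):
    """True if the servers in `acked` hold a strict majority of weight."""
    total = sum(weights.values())
    return sum(weights[s] for s in acked) * 2 > total


weights = {"A": 1, "B": 1, "C": 1, "D": 0}   # D is effectively an observer
assert has_quorum(weights, {"A", "B"})        # 2 of 3 weighted votes
assert has_quorum(weights, {"A", "B", "D"})   # D's ACK changes nothing
assert not has_quorum(weights, {"A", "D"})    # still only 1 of 3
```

Whether D sends ACKs or not, the outcome is identical, which is the sense in which a weight-0 follower that simply stops ACKing behaves as an observer.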

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-311) handle small path lengths in zoo_create()

2009-07-17 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732581#action_12732581
 ] 

Benjamin Reed commented on ZOOKEEPER-311:
-

+1 great job chris! i like your test cases. thanks.  now let me see if i can 
find out why qa isn't running...

> handle small path lengths in zoo_create()
> -
>
> Key: ZOOKEEPER-311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-311
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
>Priority: Minor
> Fix For: 3.2.1
>
> Attachments: ZOOKEEPER-311.patch, ZOOKEEPER-311.patch
>
>
> The synchronous completion for zoo_create() contains the following code:\\
> {noformat}
> if (sc->u.str.str_len > strlen(res.path)) {
> len = strlen(res.path);
> } else {
> len = sc->u.str.str_len-1;
> }
> if (len > 0) {
> memcpy(sc->u.str.str, res.path, len);
> sc->u.str.str[len] = '\0';
> }
> {noformat}
> In the case where the max_realpath_len argument to zoo_create() is 0, none of 
> this code executes, which is OK.  In the case where max_realpath_len is 1, a 
> user might expect their buffer to be filled with a null terminator, but 
> again, nothing will happen (even if strlen(res.path) is 0, which is unlikely 
> since new nodes will have paths longer than "/").
> The name of the argument to zoo_create() is also a little misleading, as is 
> its description ("the maximum length of real path you would want") in 
> zookeeper.h, and the example usage in the Programmer's Guide:
> {noformat}
> int rc = zoo_create(zh,"/xyz","value", 5, &CREATE_ONLY, ZOO_EPHEMERAL, 
> buffer, sizeof(buffer)-1);
> {noformat}
> In fact this value should be the actual length of the buffer, including space 
> for the null terminator.  If the user supplies a max_realpath_len of 10 and a 
> buffer of 11 bytes, and strlen(res.path) is 10, the code will truncate the 
> returned value to 9 bytes and put the null terminator in the second-last 
> byte, leaving the final byte of the buffer unused.
> It would be better, I think, to rename the realpath and max_realpath_len 
> arguments to something like path_buffer and path_buffer_len, akin to 
> zoo_set().  The path_buffer_len would be treated as the full length of the 
> buffer (as the code does now, in fact, but the docs suggest otherwise).
> The code in the synchronous completion could then be changed as per the 
> attached patch.
> Since this would change, slightly, the behaviour or "contract" of the API, I 
> would be inclined to suggest waiting until 4.0.0 to implement this change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-472) Making DataNode not instantiate a HashMap when the node is ephmeral

2009-07-15 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731595#action_12731595
 ] 

Benjamin Reed commented on ZOOKEEPER-472:
-

i think we should expand this to not instantiate a hashmap for any znode that 
doesn't have children. it creates a fixed-size overhead for all leaf nodes, and 
since there will always be more leaves than inner nodes, it is a non-trivial 
space saving. i think it could also speed up serialization/deserialization 
since it is faster to process a null than an empty hashmap. plus i think it 
keeps the code simpler to not have a new class.
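The lazy-allocation idea suggested above can be sketched in a few lines. This is an illustrative model, not the actual `DataNode` class: the children container is created only on first use, so leaf znodes carry a null instead of an empty set.

```python
class DataNode:
    """Toy znode: allocate the children set lazily."""
    __slots__ = ("data", "_children")

    def __init__(self, data=b""):
        self.data = data
        self._children = None          # no container until the first child

    def add_child(self, name):
        if self._children is None:
            self._children = set()
        self._children.add(name)

    def get_children(self):
        # callers see an empty set, never the internal None
        return self._children if self._children is not None else set()


leaf = DataNode()
assert leaf.get_children() == set()    # nothing allocated for a leaf
leaf.add_child("a")
assert leaf.get_children() == {"a"}
```

Serialization can then write a null marker for leaves, which is cheaper than encoding and decoding an empty container.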

> Making DataNode not instantiate a HashMap when the node is ephmeral
> ---
>
> Key: ZOOKEEPER-472
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-472
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Erik Holstad
>Assignee: Erik Holstad
>Priority: Minor
> Fix For: 3.3.0
>
>
> Looking at the code, there is an overhead of a HashSet object for that nodes 
> children, even though the node might be an ephmeral node and cannot have 
> children.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-423) Add getFirstChild API

2009-07-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731114#action_12731114
 ] 

Benjamin Reed commented on ZOOKEEPER-423:
-

we should keep in mind that someday we may have a partitioned namespace. when 
that happens some of these options would be hard/very expensive/blocking. NAME 
of course is easy. the client can always do this. when the creation happens, we 
can store the xid with the child's name in the parent data structure since it 
doesn't change, so CREATED is reasonable. MODIFIED and DATA_SIZE is more 
problematic/seemingly impossible in the presence of a namespace partition.

> Add getFirstChild API
> -
>
> Key: ZOOKEEPER-423
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: contrib-bindings, documentation, java client, server
>Reporter: Henry Robinson
>
> When building the distributed queue for my tutorial blog post, it was pointed 
> out to me that there's a serious inefficiency here. 
> Informally, the items in the queue are created as sequential nodes. For a 
> 'dequeue' call, all items are retrieved and sorted by name by the client in 
> order to find the name of the next item to try and take. This costs O( n ) 
> bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this 
> doesn't scale very well. 
> If the servers were able to maintain a data structure that allowed them to 
> efficiently retrieve the children of a node in order of the zxid that created 
> them this would make successful dequeue operations O( 1 ) at the cost of O( n 
> ) memory on the server (to maintain, e.g. a singly-linked list as a queue). 
> This is a win if it is generally true that clients only want the first child 
> in creation order, rather than the whole set. 
> We could expose this to the client via this API: getFirstChild(handle, path, 
> name_buffer, watcher) which would have much the same semantics as 
> getChildren, but only return one znode name. 
> Sequential nodes would still allow the ordering of znodes to be made 
> explicitly available to the client in one RPC should it need it. Although: 
> since this ordering would now be available cheaply for every set of children, 
> it's not completely clear that there would be that many use cases left for 
> sequential nodes if this API was augmented with a getChildrenInCreationOrder 
> call. However, that's for a different discussion. 
> A halfway-house alternative with more flexibility is to add an 'order' 
> parameter to getFirstChild and have the server compute the first child 
> according to the requested order (creation time, update time, lexicographical 
> order). This saves bandwidth at the expense of increased server load, 
> although servers can be implemented to spend memory on pre-computing commonly 
> requested orders. I am only in favour of this approach if servers maintain a 
> data-structure for every possible order, and then the memory implications 
> need careful consideration.
> [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.]
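The client-side dequeue cost described above is easy to see in a sketch: with only `getChildren` available, every dequeue fetches all n sequential names and sorts them just to learn the smallest one. The znode naming below assumes the usual `prefix-<10-digit-sequence>` convention for sequential nodes.

```python
def dequeue_candidate(children):
    """Pick the next queue item from a full getChildren result.

    Sequential znodes look like "item-0000000042"; sorting by the numeric
    suffix recovers creation order -- O(n log n) per dequeue attempt.
    """
    ordered = sorted(children, key=lambda name: int(name.rsplit("-", 1)[1]))
    return ordered[0]


children = ["item-0000000003", "item-0000000001", "item-0000000002"]
print(dequeue_candidate(children))  # -> item-0000000001
```

A server-side `getFirstChild` would replace this fetch-everything-and-sort round trip with a single O(1) lookup against a creation-ordered structure the server already maintains.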

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-14 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731073#action_12731073
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

to address the motivation a bit consider poorly connected data centers and 
cross datacenter zookeeper. we need to put zookeeper servers in the poorly 
connected data centers because we will want to service all the reads locally in 
those data centers, but we don't want to affect reliability or latency in other 
data centers. for example, imagine we have 5 poorly connected data centers and 
3 well connected data centers. we may put two servers in each data center. that 
means that we have an ensemble of 16 servers, but because of the poorly 
connected data centers, we are more likely to lose quorum than if we made the 5 
poorly connected data centers observers and just used the 3 well connected data 
centers to commit changes. you can view observers as proxies. 
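The quorum arithmetic behind the deployment above: 8 data centers with 2 servers each gives 16 voters, so losing WAN links to a few poorly connected sites can cost the majority; demoting the 5 poorly connected sites to observers leaves 6 voters, all well connected.

```python
def majority(n):
    """Smallest strict majority of an n-server voting ensemble."""
    return n // 2 + 1


all_voting = 8 * 2            # every server in every data center votes
well_connected_only = 3 * 2   # 5 poorly connected DCs become observers

assert majority(all_voting) == 9          # 9 of 16, spread across WAN links
assert majority(well_connected_only) == 4  # 4 of 6, all well connected
```

With observers, the poorly connected sites still serve local reads but can no longer cost the ensemble its quorum.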

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728751#action_12728751
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

hey, henry two other questions/comments for you:

* i'm trying to understand the use case for a follower that connects as an 
observer. this would adversely affect the reliability of the system since a 
follower acting as an observer would count as a failed follower even though it 
is up. did you have a case in mind?
* i think it is reasonable to turn off the sync for the observer, but we 
probably still want to log to disk so that we can recover quickly. otherwise we 
will keep doing state transfers from the leader every time we connect. right?

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-368) Observers

2009-07-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728749#action_12728749
 ] 

Benjamin Reed commented on ZOOKEEPER-368:
-

since henry is doing the work, i'm certainly willing to see how it goes with a 
new message. the two disadvantages i see with this approach (of adding INFORM):
* at a high level the only difference between a follower and an observer is 
that its ACKs do not get counted. i think this high level difference can 
translate into a simple modification of the follower. (since we already have to 
make sure that the follower sees proposals before commits, we shouldn't run 
into the problem you alluded to henry.)
* for the dynamic ensemble patch we want to be able to convert observers to 
followers easily and quickly. flipping a switch at the observer to start 
sending ACKs and at the leader to start counting those ACKs would be an easy 
way to do the conversion.

oh and there already is a jira about going into read-only mode on disconnect.

> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: ZOOKEEPER-368.patch, ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forwarding 
> commit messages not only to followers but also to observers, and have 
> observers applying transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload because all servers 
> different from the leader are followers and followers receive such a payload 
> first through a proposal message. Just forwarding commit messages as they 
> currently are to an observer consequently is not sufficient. We have a couple 
> of options:
> 1- Include the transaction payload along in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forwarding 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2009-07-07 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728300#action_12728300
 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-

Sorry to jump in late here. Rather than adding the INFORM message, why don't 
we just send the PROPOSE and COMMIT to the Observer as normal, and just make 
the Observer not send ACKs? That way we change as little code as possible with 
minimum overhead. It also makes switching from Observer to Follower as easy as 
turning on the ACKs. I also think Observers should be able to issue proposals. 
One use case for observers is remote data centers that basically proxy clients 
connecting to ZooKeeper. This means an Observer is just a Follower that 
doesn't vote (ACK).
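The "Observer is just a Follower that doesn't ACK" idea suggested above might be sketched as a single flag on the peer, so promotion to a voting follower is a flag flip. The names here (PeerSketch, onProposal, promoteToFollower) are illustrative, not ZooKeeper's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one proposal-handling path for both roles; only the ACK differs.
class PeerSketch {
    private boolean sendAcks;                 // false => observer role
    final List<Long> ackedZxids = new ArrayList<>();

    PeerSketch(boolean sendAcks) {
        this.sendAcks = sendAcks;
    }

    void onProposal(long zxid) {
        if (sendAcks) {
            ackedZxids.add(zxid);             // follower: vote by ACKing
        }                                     // observer: stay silent
    }

    // Switching from Observer to Follower is just turning on the ACKs.
    void promoteToFollower() {
        sendAcks = true;
    }
}
```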

> Allow dynamic changes to server cluster membership
> --
>
> Key: ZOOKEEPER-107
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Patrick Hunt
>Assignee: Henry Robinson
> Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts 
> to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-440) update the performance documentation in forrest

2009-06-29 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725360#action_12725360
 ] 

Benjamin Reed commented on ZOOKEEPER-440:
-

I have created the wiki page: 
http://wiki.apache.org/hadoop/ZooKeeper/Performance

I'd like to just leave it on the wiki for this release and move it to Forrest 
when I can dedicate more time to the text and the different benchmarks.

> update the performance documentation in forrest
> ---
>
> Key: ZOOKEEPER-440
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-440
> Project: Zookeeper
>  Issue Type: Task
>  Components: documentation
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
> Fix For: 3.2.0
>
>
> Ben, it would be great if you could update the performance documentation in 
> the Forrest docs based on the 3.2 performance improvements. 
> Specifically, the scaling graphs (read vs. write load for various quorum 
> sizes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Status: Patch Available  (was: Open)

> emphemeral cleanup not happening with session timeout
> -
>
> Key: ZOOKEEPER-450
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-450.patch
>
>
> The session move patch broke ephemeral cleanup during session expiration. 
> Tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: (was: ZOOKEEPER-450.patch)

> emphemeral cleanup not happening with session timeout
> -
>
> Key: ZOOKEEPER-450
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-450.patch
>
>
> The session move patch broke ephemeral cleanup during session expiration. 
> Tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: ZOOKEEPER-450.patch

Updated the patch to comment on why the checkSession is not needed, for the 
benefit of future maintainers.

> emphemeral cleanup not happening with session timeout
> -
>
> Key: ZOOKEEPER-450
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-450.patch
>
>
> The session move patch broke ephemeral cleanup during session expiration. 
> Tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-450:


Attachment: ZOOKEEPER-450.patch

The patch detects the bug and fixes it. I'm not completely sure about the fix: 
it's simple and works, but there is a small nondeterministic corner case. If a 
client issues a close but the connection drops after the request is received 
by the server, and the client then moves to a new server and continues to use 
the session, the stray close will come in and close the session. This corner 
case is not possible with our current client implementation.
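One hedged way to close the corner case above would be to track which connection currently owns each session and drop a close arriving on a stale connection. This is a sketch only; the names (SessionOwnerSketch, register, close) are illustrative and not ZooKeeper's actual checkSession code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a close is honored only when it arrives on the connection
// that currently owns the session; otherwise it is a stray from a
// connection the session has already moved away from.
class SessionOwnerSketch {
    private final Map<Long, String> ownerBySession = new HashMap<>();

    // Reconnecting through a new server moves ownership.
    void register(long sessionId, String connectionId) {
        ownerBySession.put(sessionId, connectionId);
    }

    // Returns true if the close was honored, false if it was dropped.
    boolean close(long sessionId, String connectionId) {
        if (!connectionId.equals(ownerBySession.get(sessionId))) {
            return false; // stray close from the old connection
        }
        ownerBySession.remove(sessionId);
        return true;
    }
}
```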

> emphemeral cleanup not happening with session timeout
> -
>
> Key: ZOOKEEPER-450
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-450.patch
>
>
> The session move patch broke ephemeral cleanup during session expiration. 
> Tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-450) emphemeral cleanup not happening with session timeout

2009-06-29 Thread Benjamin Reed (JIRA)
emphemeral cleanup not happening with session timeout
-

 Key: ZOOKEEPER-450
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-450
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Benjamin Reed
Priority: Blocker
 Fix For: 3.2.0


The session move patch broke ephemeral cleanup during session expiration. 
Tragically, we didn't have test coverage to detect the bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-449) sesssionmoved in java code and ZCLOSING in C have the same value.

2009-06-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724716#action_12724716
 ] 

Benjamin Reed commented on ZOOKEEPER-449:
-

+1

> sesssionmoved in java code and ZCLOSING in C have the same value.
> -
>
> Key: ZOOKEEPER-449
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-449
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-449.patch
>
>
> SESSIONMOVED in the Java code and ZCLOSING in the C code have the same 
> value. We need to assign a new value to ZSESSIONMOVED.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-448) png files do nto work with forrest.

2009-06-26 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724659#action_12724659
 ] 

Benjamin Reed commented on ZOOKEEPER-448:
-

+1

> png files do nto work with forrest.
> ---
>
> Key: ZOOKEEPER-448
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-448
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.2.0
>
> Attachments: 2pc.jpg, ZOOKEEPER-448.patch
>
>
> PNG images are not compatible with Forrest when generating PDFs. We can 
> convert them to JPG to get them into the PDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Assignee: (was: Benjamin Reed)
  Status: Open  (was: Patch Available)

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch, ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 
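The leader-side bookkeeping proposed in the description might be sketched as below. This is a hypothetical illustration; the names (LeaderSessionTracker, registerSession, accept) are not the actual ZooKeEPER-417 patch code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the leader remembers which follower last registered each
// session and drops forwarded requests from any other follower, so a
// slow follower's stray setData cannot reorder a moved session's writes.
class LeaderSessionTracker {
    private final Map<Long, Integer> followerBySession = new HashMap<>();

    // Reconnecting through a new follower overwrites the registration.
    void registerSession(long sessionId, int followerId) {
        followerBySession.put(sessionId, followerId);
    }

    // Called when a follower forwards a client request to the leader;
    // returns false for strays, which the leader silently drops.
    boolean accept(long sessionId, int followerId) {
        Integer owner = followerBySession.get(sessionId);
        return owner != null && owner == followerId;
    }
}
```

In the scenario above, F1's late setData("/a", "1") arrives after the session registration has moved to F2, so accept() returns false and the stray write is discarded.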

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Attachment: ZOOKEEPER-417.patch

Implemented Mahadev's suggestion.

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch, ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Assignee: Benjamin Reed
  Status: Patch Available  (was: Open)

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch, ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Status: Patch Available  (was: Open)

Removed the deprecated member.

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Status: Open  (was: Patch Available)

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-417) stray message problem when changing servers

2009-06-25 Thread Benjamin Reed (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-417:


Attachment: ZOOKEEPER-417.patch

> stray message problem when changing servers
> ---
>
> Key: ZOOKEEPER-417
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-417
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Benjamin Reed
>Assignee: Benjamin Reed
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: ZOOKEEPER-417.patch, ZOOKEEPER-417.patch, 
> ZOOKEEPER-417.patch
>
>
> There is a possibility for stray messages from a previous connection to 
> violate ordering and generally cause problems. Here is a scenario: we have a 
> client, C, two followers, F1 and F2, and a leader, L. The client is connected 
> to F1, which is a slow follower. C sends setData("/a", "1") to F1 and then 
> loses the connection, so C reconnects to F2 and sends setData("/a", "2"). It 
> is possible, if F1 is slow enough and the setData("/a", "1") got onto the 
> network before the connection break, for F1 to forward the setData("/a", "1") 
> to L after F2 forwards setData("/a", "2").
> To fix this, the leader should keep track of which follower last registered a 
> session for a client and drop any requests from followers for clients for 
> which they do not have a registration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


