[jira] Commented: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements

2010-11-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930008#action_12930008
 ] 

Henry Robinson commented on ZOOKEEPER-921:
--

Nicholas - 

Good catch, thanks! Do you think you will be able to submit a patch fixing the 
args checking in check_is_acl()?

Thanks,
Henry
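For readers following along: a minimal Python-level sketch of the kind of validation check_is_acl() should perform (the real check lives in zkPython's C extension; the helper names here are illustrative, not the actual code):

```python
# Sketch only: the required keys of a zkPython ACL entry. Every element
# passed to zookeeper.create() must be a dict carrying all three keys.
REQUIRED_ACL_KEYS = {"perms", "scheme", "id"}

def is_valid_acl(acl_list):
    """Return True only if acl_list is a list of complete ACL dicts."""
    if not isinstance(acl_list, list):
        return False
    return all(
        isinstance(entry, dict) and REQUIRED_ACL_KEYS.issubset(entry)
        for entry in acl_list
    )

# A well-formed world-readable ACL (equivalent to ZOO_OPEN_ACL_UNSAFE):
OPEN_ACL = [{"perms": 0x1F, "scheme": "world", "id": "anyone"}]
```

An incomplete entry such as {{'scheme': 'world'}} should be rejected up front rather than handed to the C client, which is where the corruption reported in this issue originates.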

> zkPython incorrectly checks for existence of required ACL elements
> --
>
> Key: ZOOKEEPER-921
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1, 3.4.0
> Environment: Mac OS X 10.6.4, included Python 2.6.1
>Reporter: Nicholas Knight
>Assignee: Nicholas Knight
> Fix For: 3.3.3, 3.4.0
>
> Attachments: zktest.py
>
>
> Calling {{zookeeper.create()}} seems, under certain circumstances, to be 
> corrupting a subsequent call to Python's {{logging}} module.
> Specifically, if the node does not exist (but its parent does), I end up with 
> a traceback like this when I try to make the logging call:
> {noformat}
> Traceback (most recent call last):
>   File "zktest.py", line 21, in 
> logger.error("Boom?")
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py",
>  line 1046, in error
> if self.isEnabledFor(ERROR):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py",
>  line 1206, in isEnabledFor
> return level >= self.getEffectiveLevel()
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py",
>  line 1194, in getEffectiveLevel
> while logger:
> TypeError: an integer is required
> {noformat}
> But if the node already exists, or the parent does not exist, I get the 
> appropriate NodeExists or NoNode exceptions.
> I'll be attaching a test script that can be used to reproduce this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer

2010-10-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926016#action_12926016
 ] 

Henry Robinson commented on ZOOKEEPER-851:
--

I think what happens is that the leader happily lets the new follower connect, 
but it won't be part of any voting procedure. It shouldn't become leader, 
because no other node knows about it to propose or support a vote for it. 

To add a new node, you'll need to incrementally restart every node in your 
cluster with the new config.
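Concretely, the rolling restart only helps once every node shares the same server list; a zoo.cfg of the following shape (addresses taken from this issue's report) would need to be in place on all four nodes:

```
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
server.3=10.17.117.71:2888:3888
```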

> ZK lets any node to become an observer
> --
>
> Key: ZOOKEEPER-851
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.3.1
>Reporter: Vishal K
>Priority: Critical
> Fix For: 3.4.0
>
>
> I had a 3-node cluster running. The zoo.cfg on each contained the 3 server 
> entries shown below:
> tickTime=2000
> dataDir=/var/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.0=10.150.27.61:2888:3888
> server.1=10.150.27.62:2888:3888
> server.2=10.150.27.63:2888:3888
> I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I 
> created an entry for that node and started the zk server. The zoo.cfg on the 
> first 3 nodes was left unchanged. The fourth node was able to join the 
> cluster even though the first 3 nodes had no idea about it.
> zoo.cfg on fourth node:
> tickTime=2000
> dataDir=/var/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.0=10.150.27.61:2888:3888
> server.1=10.150.27.62:2888:3888
> server.2=10.150.27.63:2888:3888
> server.3=10.17.117.71:2888:3888
> It looks like 10.17.117.71 is becoming an observer in this case. I was 
> expecting that the leader would reject 10.17.117.71.
> # telnet 10.17.117.71 2181
> Trying 10.17.117.71...
> Connected to 10.17.117.71.
> Escape character is '^]'.
> stat
> Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
> Clients:
>  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
> Latency min/avg/max: 0/0/0
> Received: 3
> Sent: 2
> Outstanding: 0
> Zxid: 0x20065
> Mode: follower
> Node count: 288
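As a side note, the Mode line in the stat output above is easy to check programmatically; a hypothetical helper (not part of ZooKeeper itself) might look like:

```python
def server_mode(stat_output: str) -> str:
    """Extract the Mode value (leader/follower/observer/standalone)
    from the response to the four-letter 'stat' command."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line in stat output")

# Tail of the stat response captured above:
sample = "Zxid: 0x20065\nMode: follower\nNode count: 288\n"
```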




[jira] Commented: (ZOOKEEPER-851) ZK lets any node to become an observer

2010-10-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925927#action_12925927
 ] 

Henry Robinson commented on ZOOKEEPER-851:
--

Hi Vishal - 

Sorry for the slow turnaround on this one. It doesn't surprise me that this is 
the behaviour, although it's slightly unexpected that the node becomes an 
observer rather than a follower. What evidence do you have for that? Given 
that the stat output says Mode: follower - I haven't checked the code in a 
while, but I would have thought an observer would print Mode: observer.

Henry

> ZK lets any node to become an observer
> --
>
> Key: ZOOKEEPER-851
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.3.1
>Reporter: Vishal K
>Priority: Critical
> Fix For: 3.4.0
>
>
> I had a 3-node cluster running. The zoo.cfg on each contained the 3 server 
> entries shown below:
> tickTime=2000
> dataDir=/var/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.0=10.150.27.61:2888:3888
> server.1=10.150.27.62:2888:3888
> server.2=10.150.27.63:2888:3888
> I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I 
> created an entry for that node and started the zk server. The zoo.cfg on the 
> first 3 nodes was left unchanged. The fourth node was able to join the 
> cluster even though the first 3 nodes had no idea about it.
> zoo.cfg on fourth node:
> tickTime=2000
> dataDir=/var/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.0=10.150.27.61:2888:3888
> server.1=10.150.27.62:2888:3888
> server.2=10.150.27.63:2888:3888
> server.3=10.17.117.71:2888:3888
> It looks like 10.17.117.71 is becoming an observer in this case. I was 
> expecting that the leader would reject 10.17.117.71.
> # telnet 10.17.117.71 2181
> Trying 10.17.117.71...
> Connected to 10.17.117.71.
> Escape character is '^]'.
> stat
> Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
> Clients:
>  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
> Latency min/avg/max: 0/0/0
> Received: 3
> Sent: 2
> Outstanding: 0
> Zxid: 0x20065
> Mode: follower
> Node count: 288




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Assignee: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
> ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes an already-freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption
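The failure mode described above is a callback being dispatched after its context was freed. As a hedged, Python-level analogy of the guard the fix needs (not the actual C code; zkPython's watcher contexts live in the C extension):

```python
class OneShotWatcher:
    """Illustrative stand-in for a zkPython watcher context: the
    callback reference is cleared atomically before invocation, so a
    second dispatch finds nothing to call instead of freed memory."""

    def __init__(self, callback):
        self._callback = callback

    def dispatch(self, event):
        cb, self._callback = self._callback, None  # consume exactly once
        if cb is not None:
            cb(event)
```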




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-

Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. 

Thanks both!

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Assignee: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, 
> ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes an already-freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption




[jira] Commented: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922209#action_12922209
 ] 

Henry Robinson commented on ZOOKEEPER-888:
--

The patch as it stands relies on ZOOKEEPER-853 (which it also fixes), and that 
change is not in 3.3 because it is a small API change: it makes 
is_unrecoverable return Python True or False rather than ZINVALIDSTATE. 

So I'm not certain about what to do here - we try not to change APIs between 
minor versions. However, this is a very minor change, and this patch fixes a 
significant bug. I'm inclined to commit both 853 and this patch to 3.3 as well 
as trunk, and put a note in the release notes. 

Any objections?
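For clarity, here is the API difference under discussion, sketched in Python (the -9 value for ZINVALIDSTATE is an assumption about the C client's error enum, not something stated in this thread):

```python
ZINVALIDSTATE = -9  # assumed value of the C client error code

def is_unrecoverable_pre_853(session_expired):
    # old binding behaviour: the C return value leaks through as an int
    return ZINVALIDSTATE if session_expired else 0

def is_unrecoverable_post_853(session_expired):
    # after ZOOKEEPER-853: a real Python bool
    return bool(session_expired)
```

Both versions are truthy in the same situations, which is why the change is small; but any caller comparing the result to True, or printing it, sees a different value, hence the hesitation about changing it between minor versions.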

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Assignee: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes an already-freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption




[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher

2010-10-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-888:
-


The patch looks good to me - thanks! 

Could you add a test case that verifies the correct behaviour, if possible? (I 
appreciate it can be hard to fake unrecoverable session errors). We keep 
circling around the correct behaviour for this code block, and I'd like to 
capture it in a test suite.

> c-client / zkpython: Double free corruption on node watcher
> ---
>
> Key: ZOOKEEPER-888
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Lukas
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: resume-segfault.py, ZOOKEEPER-888.patch
>
>
> the c-client / zkpython wrapper invokes an already-freed watcher callback
> steps to reproduce:
>   0. start a zookeeper server on your machine
>   1. run the attached python script
>   2. suspend the zookeeper server process (e.g. using `pkill -STOP -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
>   3. wait until the connection and the node observer fired with a session 
> event
>   4. resume the zookeeper server process  (e.g. using `pkill -CONT -f 
> org.apache.zookeeper.server.quorum.QuorumPeerMain` )
> -> the client tries to dispatch the node observer function again, but it was 
> already freed -> double free corruption




[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests

2010-10-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921103#action_12921103
 ] 

Henry Robinson commented on ZOOKEEPER-893:
--

Thanks for the patch Thijs! It looks pretty good to me - good catch.

Do you think you might be able to write a test case that verifies correct 
behaviour when you send malformed messages to the control port? 

> ZooKeeper high cpu usage when invalid requests
> --
>
> Key: ZOOKEEPER-893
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Linux 2.6.16
> 4x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>Reporter: Thijs Terlouw
>Assignee: Thijs Terlouw
>Priority: Critical
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-893.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When ZooKeeper receives certain malformed messages on the internal 
> communication port (:4181 by default), it's possible for ZooKeeper to enter 
> an infinite loop, which causes 100% cpu usage. It's related to ZOOKEEPER-427, 
> but that patch does not resolve all issues.
> from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 
> the two affected parts:
> {noformat}
> int length = msgLength.getInt();
> if (length <= 0) {
>     throw new IOException("Invalid packet length:" + length);
> }
> {noformat}
> and:
> {noformat}
> while (message.hasRemaining()) {
>     temp_numbytes = channel.read(message);
>     if (temp_numbytes < 0) {
>         throw new IOException("Channel eof before end");
>     }
>     numbytes += temp_numbytes;
> }
> {noformat}
> how to replicate this bug:
> perform an nmap portscan against your zookeeper server: "nmap -sV -n 
> your.ip.here -p4181"
> wait for a while until you see some messages in the logfile, and you will 
> then see 100% cpu usage. It does not recover from this situation. With my 
> patch, this no longer occurs.
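A Python rendering of the read-loop logic quoted above (a sketch of the failure mode, not the actual patch): both a non-positive length and an early end-of-stream must abort the read, otherwise a scanner that connects and sends garbage, or closes early, can drive the reader into a busy loop.

```python
import io

def read_message(stream, max_len=1 << 20):
    """Read one length-prefixed message; raise instead of spinning."""
    header = stream.read(4)
    if len(header) < 4:
        raise IOError("short read on length header")
    length = int.from_bytes(header, "big", signed=True)
    if length <= 0 or length > max_len:
        raise IOError("Invalid packet length: %d" % length)
    body = b""
    while len(body) < length:
        chunk = stream.read(length - len(body))
        if not chunk:  # EOF before the full message arrived
            raise IOError("Channel eof before end")
        body += chunk
    return body
```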




[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-785:
-


+1, this looks good (although I'd remove the 'out of place in this class' 
comment now that you've moved it). 

>  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
> ---
>
> Key: ZOOKEEPER-785
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Tested in linux with a new jvm
>Reporter: Alex Newman
>Assignee: Patrick Hunt
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
> ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, 
> ZOOKEEPER-785_2_br33.patch
>
>
> The following config causes an infinite loop
> [zoo.cfg]
> tickTime=2000
> dataDir=/var/zookeeper/
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=localhost:2888:3888
> Output:
> 2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum 
> peer
> 2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to 
> port 0.0.0.0/0.0.0.0:2181
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10
> 2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot 
> /var/zookeeper/version-2/snapshot.c
> 2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My 
> election bind port: 3888
> 2010-06-01 16:20:32,554 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,556 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,558 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 1, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,560 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 2, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,561 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 3, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,562 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> Things like HBase require that the zookeeper servers be listed in the 
> zoo.cfg. This is a bug on their part, but zookeeper shouldn't hit a 
> NullPointerException in a loop either.
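Conceptually, the guard the fix needs is simple: a config with a single server.N entry is not a quorum configuration and should fall back to standalone mode instead of entering leader election. A hedged sketch (hypothetical helper names, not ZooKeeper's actual config parser):

```python
def parse_servers(cfg_lines):
    """Collect server.N entries from zoo.cfg-style key=value lines."""
    servers = {}
    for line in cfg_lines:
        key, _, value = line.partition("=")
        if key.startswith("server."):
            servers[int(key.split(".", 1)[1])] = value
    return servers

def is_quorum_config(cfg_lines):
    # a single server entry cannot form a quorum: run standalone instead
    return len(parse_servers(cfg_lines)) > 1
```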




[jira] Updated: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-785:
-

Hadoop Flags: [Reviewed]

>  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
> ---
>
> Key: ZOOKEEPER-785
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Tested in linux with a new jvm
>Reporter: Alex Newman
>Assignee: Patrick Hunt
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
> ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch, 
> ZOOKEEPER-785_2_br33.patch
>
>
> The following config causes an infinite loop
> [zoo.cfg]
> tickTime=2000
> dataDir=/var/zookeeper/
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=localhost:2888:3888
> Output:
> 2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum 
> peer
> 2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to 
> port 0.0.0.0/0.0.0.0:2181
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10
> 2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot 
> /var/zookeeper/version-2/snapshot.c
> 2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My 
> election bind port: 3888
> 2010-06-01 16:20:32,554 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,556 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,558 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 1, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,560 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 2, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,561 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 3, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,562 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> Things like HBase require that the zookeeper servers be listed in the 
> zoo.cfg. This is a bug on their part, but zookeeper shouldn't hit a 
> NullPointerException in a loop either.




[jira] Commented: (ZOOKEEPER-785) Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

2010-09-14 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909376#action_12909376
 ] 

Henry Robinson commented on ZOOKEEPER-785:
--

This patch looks good - a couple of comments:

1. Can you expand the comment " // Not a quorum configuration so return 
immediately" to be clear that this isn't a problem, and that the server will 
default to standalone mode?
2. Can you actually move the 'bit out of place' test to somewhere more 
sensible? :) Let's make a QuorumConfigurationTest class if we have to.



>  Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line
> ---
>
> Key: ZOOKEEPER-785
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-785
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
> Environment: Tested in linux with a new jvm
>Reporter: Alex Newman
>Assignee: Patrick Hunt
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-785.patch, ZOOKEEPER-785.patch, 
> ZOOKEEPER-785_2.patch, ZOOKEEPER-785_2_br33.patch
>
>
> The following config causes an infinite loop
> [zoo.cfg]
> tickTime=2000
> dataDir=/var/zookeeper/
> clientPort=2181
> initLimit=10
> syncLimit=5
> server.0=localhost:2888:3888
> Output:
> 2010-06-01 16:20:32,471 - INFO [main:quorumpeerm...@119] - Starting quorum 
> peer
> 2010-06-01 16:20:32,489 - INFO [main:nioservercnxn$fact...@143] - binding to 
> port 0.0.0.0/0.0.0.0:2181
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@818] - tickTime set to 2000
> 2010-06-01 16:20:32,504 - INFO [main:quorump...@829] - minSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@840] - maxSessionTimeout set 
> to -1
> 2010-06-01 16:20:32,505 - INFO [main:quorump...@855] - initLimit set to 10
> 2010-06-01 16:20:32,526 - INFO [main:files...@82] - Reading snapshot 
> /var/zookeeper/version-2/snapshot.c
> 2010-06-01 16:20:32,547 - INFO [Thread-1:quorumcnxmanager$liste...@436] - My 
> election bind port: 3888
> 2010-06-01 16:20:32,554 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,556 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,558 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 1, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,560 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,560 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 2, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,561 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
> 2010-06-01 16:20:32,561 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My 
> id = 0, Proposed zxid = 12
> 2010-06-01 16:20:32,562 - INFO 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 0, 
> 12, 3, 0, LOOKING, LOOKING, 0
> 2010-06-01 16:20:32,562 - WARN 
> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@623] - Unexpected exception
> java.lang.NullPointerException
> Things like HBase require that the zookeeper servers be listed in the 
> zoo.cfg. This is a bug on their part, but zookeeper shouldn't hit a 
> NullPointerException in a loop either.




[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer

2010-08-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-853:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this (to trunk) - thanks Andrei!

> Make zookeeper.is_unrecoverable return True or False and not an integer
> ---
>
> Key: ZOOKEEPER-853
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Andrei Savu
>Assignee: Andrei Savu
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch
>
>
> This patch fixes a TODO in the python zookeeper extension: it makes 
> {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an 
> integer. 




[jira] Updated: (ZOOKEEPER-853) Make zookeeper.is_unrecoverable return True or False and not an integer

2010-08-24 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-853:
-

Hadoop Flags: [Reviewed]

+1 This looks good to me - thanks. 

> Make zookeeper.is_unrecoverable return True or False and not an integer
> ---
>
> Key: ZOOKEEPER-853
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-853
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Andrei Savu
>Assignee: Andrei Savu
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-853.patch, ZOOKEEPER-853.patch
>
>
> This patch fixes a TODO in the python zookeeper extension: it makes 
> {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an 
> integer. 




[jira] Updated: (ZOOKEEPER-792) zkpython memory leak

2010-08-22 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-792:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I just committed this! Thanks Lei Zhang!

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, 
> ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Updated: (ZOOKEEPER-792) zkpython memory leak

2010-08-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-792:
-

Attachment: ZOOKEEPER-792.patch

I forgot --no-prefix. Plus ça change, plus c'est la même chose (the more things change, the more they stay the same). 

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch, 
> ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Updated: (ZOOKEEPER-792) zkpython memory leak

2010-08-19 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-792:
-

Attachment: ZOOKEEPER-792.patch

Updated patch to remove that bug. 

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch, ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900380#action_12900380
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Aha - I think I have found the problem, and it was related to this patch.


   PyObject *ret = Py_BuildValue( "(s#,N)", buffer,buffer_len, stat_dict );
+  free_pywatcher(pw);
   free(buffer);

We shouldn't free the pywatcher_t object here, because the watcher it wraps 
may still be invoked later; this is what was causing the segfault I was 
seeing. I'll upload a new patch with this line removed - I hope it will still 
fix your memory consumption issues. 

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-17 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899676#action_12899676
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Just to update - I've found that zkpython tests are failing in trunk, and I 
don't want to commit a patch when the tests are broken. I'll be creating a JIRA 
shortly to address the problem once I've looked into it slightly further.

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Commented: (ZOOKEEPER-792) zkpython memory leak

2010-08-16 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899004#action_12899004
 ] 

Henry Robinson commented on ZOOKEEPER-792:
--

Hi - 

Sorry for the slow response! I just took a look over the patch - good catches.

+1. I'll commit within the day. 

Henry

> zkpython memory leak
> 
>
> Key: ZOOKEEPER-792
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.1
> Environment: vmware workstation - guest OS:Linux python:2.4.3
>Reporter: Lei Zhang
>Assignee: Lei Zhang
> Fix For: 3.3.2, 3.4.0
>
> Attachments: ZOOKEEPER-792.patch
>
>
> We recently upgraded zookeeper from 3.2.1 to 3.3.1; we are now seeing fewer 
> client deadlocks on session expiration, which is a definite plus!
> Unfortunately, we are seeing a memory leak that requires our zk clients to be 
> restarted every half-day. Valgrind result:
> ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in 
> loss record 255 of 670
> ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418)
> ==8804==by 0x5047B42: parse_acls (zookeeper.c:369)
> ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
> ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)




[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-08-11 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897338#action_12897338
 ] 

Henry Robinson commented on ZOOKEEPER-784:
--

Spectacular job, Sergey. I've taken a look at the code and I'm pretty satisfied 
- you've done a great job covering little things like JMX support, and good 
code comments and documentation. 

I'm going to wait for one of the other committers to come by and also give this 
a +1, since this is a substantial change. We may also decide to run a long-lived 
test with this patch to satisfy ourselves of its stability. But this looks 
very, very solid indeed. 

> server-side functionality for read-only mode
> 
>
> Key: ZOOKEEPER-784
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
> ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
> ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch
>
>
> As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
> ReadOnlyZooKeeperServer which comes into play when peer is partitioned.




[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-08-01 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894460#action_12894460
 ] 

Henry Robinson commented on ZOOKEEPER-784:
--

Sergey this looks great - thanks! I'll take a look at it asap.

> server-side functionality for read-only mode
> 
>
> Key: ZOOKEEPER-784
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
> ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
> ZOOKEEPER-784.patch, ZOOKEEPER-784.patch
>
>
> As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
> ReadOnlyZooKeeperServer which comes into play when peer is partitioned.




[jira] Commented: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython

2010-07-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889940#action_12889940
 ] 

Henry Robinson commented on ZOOKEEPER-821:
--

Rich - 

This is a really useful contribution, thanks! The only thing I would change 
from your patch would be to use snprintf with a buffer length of 10 so as to 
avoid any potential string overflows if our version numbers ever get huge :)

Otherwise +1; if you make this change I'll commit asap. 

Thanks!
Henry

> Add ZooKeeper version information to zkpython
> -
>
> Key: ZOOKEEPER-821
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Affects Versions: 3.3.1
>Reporter: Rich Schumacher
>Assignee: Rich Schumacher
>Priority: Trivial
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-821.patch
>
>
> Since installing and using ZooKeeper I've built and installed no fewer than 
> four versions of the zkpython bindings. It would be really helpful if the 
> module had a '__version__' attribute to easily tell which version is 
> currently in use.




[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode

2010-06-23 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881746#action_12881746
 ] 

Henry Robinson commented on ZOOKEEPER-784:
--

I like the idea of fake sessions fine, although I think that the upgrade 
process might be complex. Another possibility is to do away with sessions in 
read-only mode (because they're mainly used to maintain state about watches, 
which don't make sense on a read-only server).

Sergey - just looked over your patch. Nice job! Couple of questions:

1. In QuorumPeer.java, I can't quite follow the logic in this part of the patch:

{code}
while (running) {
 switch (getPeerState()) {
 case LOOKING:
+LOG.info("LOOKING");
+ReadOnlyZooKeeperServer roZk = null;
 try {
-LOG.info("LOOKING");
+roZk = new ReadOnlyZooKeeperServer(
+logFactory, this,
+new ZooKeeperServer.BasicDataTreeBuilder(),
+this.zkDb);
+roZk.startup();
+
{code}

- is it sensible to start a ROZKServer every time a server enters the 'LOOKING' 
state, or should there be some kind of delay before it decides it is 
partitioned? Otherwise, when a leader is lost and the quorum is holding a 
re-election, r/w clients that try to connect would get (I think) 'can't be 
read-only' messages.

2. What are you doing about watches? It seems to me that setting a watch turns 
a read operation into a read / write operation, and the client should be told 
that watch registration failed. If you can do this you don't have to worry so 
much about session migration because there's very little session state 
maintained by a ROZKServer on behalf of the client.

3. This patch has got to the point where it might be good if you started adding 
some tests to validate any further development you do. 


> server-side functionality for read-only mode
> 
>
> Key: ZOOKEEPER-784
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784
> Project: Zookeeper
>  Issue Type: Sub-task
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, 
> ZOOKEEPER-784.patch
>
>
> As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create 
> ReadOnlyZooKeeperServer which comes into play when peer is partitioned.




[jira] Commented: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-06-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877227#action_12877227
 ] 

Henry Robinson commented on ZOOKEEPER-740:
--

Mike - 

Great catch, thanks for figuring this out. 

Am I correct in saying that this doesn't prevent watchers from eventually 
being freed correctly? 

If so, then it would be great if you could submit this patch formally so that 
we can get it into trunk. See 
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute for details.

Thanks,
Henry

> zkpython leading to segfault on zookeeper
> -
>
> Key: ZOOKEEPER-740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Federico
>Assignee: Henry Robinson
>Priority: Critical
> Fix For: 3.4.0
>
>
> The program that we are implementing uses the python binding for zookeeper 
> but sometimes it crash with segfault; here is the bt from gdb:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> 2488../Objects/abstract.c: No such file or directory.
> in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
> arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
> at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
> list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6




[jira] Assigned: (ZOOKEEPER-704) GSoC 2010: Read-Only Mode

2010-06-02 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-704:


Assignee: Sergey Doroshenko

> GSoC 2010: Read-Only Mode
> -
>
> Key: ZOOKEEPER-704
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-704
> Project: Zookeeper
>  Issue Type: Wish
>Reporter: Henry Robinson
>Assignee: Sergey Doroshenko
>
> Read-only mode
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java and TCP/IP networking
> Description
> When a ZooKeeper server loses contact with over half of the other servers in 
> an ensemble ('loses a quorum'), it stops responding to client requests 
> because it cannot guarantee that writes will get processed correctly. For 
> some applications, it would be beneficial if a server still responded to read 
> requests when the quorum is lost, but caused an error condition when a write 
> request was attempted.
> This project would implement a 'read-only' mode for ZooKeeper servers (maybe 
> only for Observers) that allowed read requests to be served as long as the 
> client can contact a server.
> This is a great project for getting really hands-on with the internals of 
> ZooKeeper - you must be comfortable with Java and networking otherwise you'll 
> have a hard time coming up to speed.




[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized

2010-06-01 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-783:
-

Attachment: ZOOKEEPER-783.patch

Defensive copying added to getCommittedLog() and synchronization during 
clear(). 

No tests added; really not sure how best to test for this. It does fix my test 
case but it's very difficult to distill that into a test (plus it only fails 
once in about 100 runs). 

> committedLog in ZKDatabase is not properly synchronized
> ---
>
> Key: ZOOKEEPER-783
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.1
>Reporter: Henry Robinson
>Priority: Critical
> Attachments: ZOOKEEPER-783.patch
>
>
> ZKDatabase.getCommittedLog() returns a reference to the LinkedList 
> committedLog in ZKDatabase. This is then iterated over by at least one 
> caller. 
> I have seen a bug that causes an NPE in LinkedList.clear on committedLog, 
> which I am pretty sure is due to the lack of synchronization. This bug has 
> not been apparent in normal ZK operation, but in code that I have that starts 
> and stops a ZK server in process repeatedly (clear() is called from 
> ZooKeeperServerMain.shutdown()). 
> It's better style to defensively copy the list in getCommittedLog, and to 
> synchronize on the list in ZKDatabase.clear.




[jira] Created: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized

2010-06-01 Thread Henry Robinson (JIRA)
committedLog in ZKDatabase is not properly synchronized
---

 Key: ZOOKEEPER-783
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.1
Reporter: Henry Robinson
Priority: Critical


ZKDatabase.getCommittedLog() returns a reference to the LinkedList 
committedLog in ZKDatabase. This is then iterated over by at least one caller. 

I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which 
I am pretty sure is due to the lack of synchronization. This bug has not been 
apparent in normal ZK operation, but in code that I have that starts and stops 
a ZK server in process repeatedly (clear() is called from 
ZooKeeperServerMain.shutdown()). 

It's better style to defensively copy the list in getCommittedLog, and to 
synchronize on the list in ZKDatabase.clear.






[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870179#action_12870179
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Greg - 

Don't worry - you should have seen the hash I made of my first patch!

Hudson is misbehaving at the moment, so I'm not convinced that the test 
failures are a result of your patch. You don't need to do anything right now - 
I'll take a look and update this ticket once I know what's going on.

cheers,
Henry

> API should sanity check sessionTimeout argument
> ---
>
> Key: ZOOKEEPER-776
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.2.2, 3.3.0, 3.3.1
> Environment: OSX 10.6.3, JVM 1.6.0-20
>Reporter: Gregory Haskins
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: zookeeper-776-fix.patch
>
>
> Passing in a "0" sessionTimeout to the ZooKeeper() constructor leads to 
> errors in subsequent operations. It would be ideal to capture this 
> configuration error at the source by throwing something like an 
> IllegalArgumentException when the bogus sessionTimeout is specified, instead 
> of later when it is used.




[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870164#action_12870164
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Cancelling the patch is fine but there's no need to delete it - Hudson will 
always figure out what the latest patch is and it's good to see how a ticket 
evolved.

Tests will also help :)

> API should sanity check sessionTimeout argument
> ---
>
> Key: ZOOKEEPER-776
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.2.2, 3.3.0, 3.3.1
> Environment: OSX 10.6.3, JVM 1.6.0-20
>Reporter: Gregory Haskins
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: zookeeper-776-fix.patch
>
>
> Passing in a "0" sessionTimeout to the ZooKeeper() constructor leads to 
> errors in subsequent operations. It would be ideal to capture this 
> configuration error at the source by throwing something like an 
> IllegalArgumentException when the bogus sessionTimeout is specified, instead 
> of later when it is used.




[jira] Commented: (ZOOKEEPER-776) API should sanity check sessionTimeout argument

2010-05-21 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870152#action_12870152
 ] 

Henry Robinson commented on ZOOKEEPER-776:
--

Thanks Greg - can you generate your patch from git with --no-prefix, to make it 
svn compatible?

> API should sanity check sessionTimeout argument
> ---
>
> Key: ZOOKEEPER-776
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-776
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: c client, java client
>Affects Versions: 3.2.2, 3.3.0, 3.3.1
> Environment: OSX 10.6.3, JVM 1.6.0-20
>Reporter: Gregory Haskins
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: zookeeper-776-fix.patch
>
>
> Passing in a "0" sessionTimeout to the ZooKeeper() constructor leads to 
> errors in subsequent operations. It would be ideal to capture this 
> configuration error at the source by throwing something like an 
> IllegalArgumentException when the bogus sessionTimeout is specified, instead 
> of later when it is used.




[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this - thanks Sergey!

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems the leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. That's enough for a quorum.
> 3. Shut down the quorum member that is the follower.
> As I understand it, the expected result is that the leader will start a new 
> election round to regain a quorum.
> But in reality it just says goodbye to that follower and is still operable. 
> (When I shut down the 3rd one -- the observer -- the leader starts trying to 
> regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader instead of the follower, 
> the remaining follower starts a new leader election, as it should.)




[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869822#action_12869822
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

The failures do not look related to this patch (although I could be mistaken). 
ZkDatabaseCorruptionTest is the most recent broken test - it passes fine for me 
locally?

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems the leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. That's enough for a quorum.
> 3. Shut down the quorum member that is the follower.
> As I understand it, the expected result is that the leader will start a new 
> election round to regain a quorum.
> But in reality it just says goodbye to that follower and is still operable. 
> (When I shut down the 3rd one -- the observer -- the leader starts trying to 
> regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader instead of the follower, 
> the remaining follower starts a new leader election, as it should.)




[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

  Status: Patch Available  (was: Open)
Hadoop Flags: [Reviewed]

hudson? hello?

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-20 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Open  (was: Patch Available)

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Patch Available  (was: Open)

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Attachment: ZOOKEEPER-769.patch

I made a few small changes to your patch to make the logic a little easier to 
follow. Take a look and let me know if you think this is ok; otherwise I'll 
commit the patch tomorrow. Thanks!

Henry

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-769:
-

Status: Open  (was: Patch Available)

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868780#action_12868780
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - sorry for the delay. It's on me to review this patch, and then I'll 
commit it.

Thanks for your patience!

Henry

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg, ZOOKEEPER-769.patch
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Attachment: ZOOKEEPER-772.patch

--no-prefix, predictably.

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
> ZOOKEEPER-772.patch, ZOOKEEPER-772.patch
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Open  (was: Patch Available)

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
> ZOOKEEPER-772.patch, ZOOKEEPER-772.patch
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Patch Available  (was: Open)

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
> ZOOKEEPER-772.patch, ZOOKEEPER-772.patch
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Status: Patch Available  (was: Open)

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
> ZOOKEEPER-772.patch
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-772:
-

Attachment: ZOOKEEPER-772.patch

The bug was simple once I got round to looking: we were incorrectly reusing a 
watcher that was being deallocated before it was called.
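The class of bug described here - a watcher object freed before the C client calls back into it - is common to Python C extensions generally. As an illustrative sketch (not the actual zkpython code or fix), the same hazard can be shown with ctypes: the Python-side wrapper for a C callback must stay referenced for as long as C code may invoke it.

```python
# Illustrative sketch of callback-lifetime management in a C extension.
# For qsort the callback is only used during the call, so a local reference
# suffices; for callbacks that C retains past the call (like a ZooKeeper
# watcher), the reference must be kept until the callback has fired.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def sort_with_callback(values):
    arr = (ctypes.c_int * len(values))(*values)
    # Bind the wrapped callback to a name so it cannot be garbage-collected
    # while libc still holds a pointer to it during the qsort call.
    cmp_cb = CMPFUNC(lambda a, b: a[0] - b[0])
    libc.qsort(arr, len(values), ctypes.sizeof(ctypes.c_int), cmp_cb)
    return list(arr)

print(sort_with_callback([3, 1, 2]))  # -> [1, 2, 3]
```

If the CFUNCTYPE wrapper were created inline and dropped while C still held its pointer, invoking it later would crash - which is the shape of the segfault reported in this issue.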

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, 
> ZOOKEEPER-772.patch
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.

2010-05-17 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-772:


Assignee: Henry Robinson

> zkpython segfaults when watcher from async get children is invoked.
> ---
>
> Key: ZOOKEEPER-772
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
> Environment: ubuntu lucid (10.04) / zk trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff
>
>
> When utilizing the zkpython async get children API with a watch, I 
> consistently get segfaults when the watcher is invoked to process events. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-679) Offers a node design for interacting with the Java Zookeeper client.

2010-05-09 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865639#action_12865639
 ] 

Henry Robinson commented on ZOOKEEPER-679:
--

Hi Aaron - 

The great thing about open source, and the relatively permissive Apache license 
in particular, is that Chris is free to copy any and all of ZK into github and 
continue with a development process that he finds more agreeable. It is 
completely kosher to do this. As Chris says, you are welcome to contribute, 
fork or ignore it. 

As far as I am concerned, contrib is an excellent place for projects that 
directly add functionality to their parent project (the language bindings 
and this patch are good examples), but not a great place for standalone 
projects that simply leverage the parent (an example might be a DNS server 
built on ZooKeeper). This is a necessarily vague distinction, and others will 
have different opinions.

I do not know specifically what Chris is referring to when he talks about an 
'onerous' patch process, but I speculate he might mean that the role of 
'committer' - someone who gates the submission of patches - makes it harder 
to get your patches available for others to use quickly. Of course, this 
approach also has benefits: a ready collection of experienced users is on hand 
to offer advice, and the relatively high standard for patches to be accepted 
to trunk arguably improves code quality. What's great is that the two 
development styles are not mutually exclusive and can, ideally, benefit from 
each other. If you are having difficulties with, or are frustrated by, the 
patch submission process here, ask for help. The community here is very happy 
to help, and we'll do what we can to address pain points. 

As for this patch, I'm happy it's going into contrib - users sometimes find 
ZooKeeper difficult to program against, and examples and new abstractions are 
always welcome. Keeping this patch in the main repository means that newcomers 
to ZooKeeper will find it more easily. Thanks for the contribution!

Henry

> Offers a node design for interacting with the Java Zookeeper client.
> 
>
> Key: ZOOKEEPER-679
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-679
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: contrib, java client, tests
>Reporter: Aaron Crow
>Assignee: Aaron Crow
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-679.patch, ZOOKEEPER-679.patch, 
> ZOOKEEPER-679.patch, ZOOKEEPER-679.patch
>
>
> Following up on my conversations with Patrick and Mahadev 
> (http://n2.nabble.com/Might-I-contribute-a-Node-design-for-the-Java-API-td4567695.html#a4567695).
> This patch includes the implementation as well as unit tests. The first unit 
> test gives a simple high level demo of using the node API.
> The current implementation is simple and is only what I need with the current 
> project I am working on. However, I am very open to any and all suggestions 
> for improvement.
> This is a proposal to support a simplified node (or File) like API into a 
> Zookeeper tree, by wrapping the Zookeeper Java client. It is similar to 
> Java's File API design.
> Although, I'm trying to make it easier in a few spots. For example, deleting 
> a Node recursively is done by default. I also lean toward resolving 
> Exceptions "under the hood" when it seems appropriate. For example, if you 
> ask a Node if it exists, and its parent doesn't even exist, you just get a 
> false back (rather than a nasty Exception).
> As for watches and ephemeral nodes, my current work does not need these 
> things so I currently have no handling of them. But if potential users of  
> the "Node a.k.a. File" design want these things, I'd be open to supporting 
> them as reasonable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-07 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865240#action_12865240
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - 

Great, thanks for making this patch! I seem to recall there was some reason why 
we didn't infer peerType from the servers list, but I can't remember what it was...

As for your patch, a few small comments:

1. Use --no-prefix and just attach the output of git-diff (no mail headers etc) 
- Hudson is rather picky about the patch formats it can apply
2. It would be great to include a test that reads a configuration and checks 
that the behaviour is correct
3. If the peerTypes don't match up, should we default to the server list (on 
the assumption that that will be consistent across all servers)?
4. Once you've added the patch, click 'submit patch' to start Hudson moving.

cheers,
Henry
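To make point 1 concrete, here is a hedged sketch of generating a patch Hudson can apply (the repository and file contents below are made up for demonstration; the real workflow runs against a ZooKeeper checkout):

```shell
# Sketch: produce a plain git-diff patch with no a/ b/ path prefixes
# (Hudson is picky about those) and no mail headers.
workdir=$(mktemp -d)
cd "$workdir"
git init -q repo
cd repo
echo "peerType=participant" > zoo.cfg
git add zoo.cfg
git -c user.name=dev -c user.email=dev@example.com commit -qm "initial"
echo "peerType=observer" > zoo.cfg
# --no-prefix makes paths appear as zoo.cfg rather than a/zoo.cfg
git diff --no-prefix > ZOOKEEPER-769.patch
```

The resulting file contains only the raw diff hunks, which is the format the automated patch builds expect.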

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
>Assignee: Sergey Doroshenko
> Fix For: 3.4.0
>
> Attachments: follower.log, leader.log, observer.log, warning.patch, 
> zoo1.cfg
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864953#action_12864953
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Sergey - 

In the cfg files for nodes 3 and 5, did you include the following line? 

peerType=observer

See http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html for 
details. The observer log contains this line:

2010-05-06 22:46:00,876 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2183:quorump...@642] - FOLLOWING

which is a big red flag because observers should never adopt the FOLLOWING 
state. 

If I don't have that line I can reproduce your issue. If I add it, the 
observers work as expected. Can you check your cfg files?

cheers,
Henry
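For reference, a minimal illustrative configuration for an observer node, per the Observers documentation linked above (hostnames, ports, and paths are placeholders, not the reporter's actual setup). The observer needs both the peerType line and the :observer suffix on its own server entry:

```
# zoo.cfg on server 3 (an observer in this 3-voter / 2-observer layout)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2183
peerType=observer
server.1=host1:2888:3888
server.2=host2:2888:3888
server.3=host3:2888:3888:observer
server.4=host4:2888:3888
server.5=host5:2888:3888:observer
```

Without peerType=observer on servers 3 and 5, those nodes start as followers and can be counted toward the quorum, which reproduces this issue.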

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
> Fix For: 3.3.0
>
> Attachments: follower.log, leader.log, observer.log, zoo1.cfg
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864939#action_12864939
 ] 

Henry Robinson commented on ZOOKEEPER-768:
--

Note that this is the same assertion as in ZOOKEEPER-707, which is not related 
to the python client. 

I need to understand why it's an error for the completion callback to be null 
when processing a ping, since send_ping explicitly queues a null callback... 

> zkpython segfault on close (assertion error in io thread)
> -
>
> Key: ZOOKEEPER-768
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.4.0
> Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
>Reporter: Kapil Thangavelu
> Attachments: zkpython-segfault-client-log.txt, 
> zkpython-segfault-on-close-core.bz2, zkpython-segfault-stack-traces.txt, 
> zkpython-segfault.py
>
>
> While trying to create a test case showing slow average add_auth, I stumbled 
> upon a test case that reliably segfaults for me, albeit with a variable number 
> of iterations (anywhere from 0 to 20 typically). FWIW, I've got about 220 
> processes in my test environment (ubuntu lucid 10.04). The test case opens a 
> connection, adds authentication to it, and closes the connection, in a loop. 
> I'm including the sample program and the gdb stack traces from the core file. 
> I can upload the core file if thats helpful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-769) Leader can treat observers as quorum members

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864878#action_12864878
 ] 

Henry Robinson commented on ZOOKEEPER-769:
--

Hi Sergey - 

Can you attach the logs from (at least) the leader node to this ticket? I'd 
like to figure this one out asap.

cheers,
Henry

> Leader can treat observers as quorum members
> 
>
> Key: ZOOKEEPER-769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-769
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment: Ubuntu Karmic x64
>Reporter: Sergey Doroshenko
> Fix For: 3.3.0
>
> Attachments: zoo1.cfg
>
>
> In short: it seems leader can treat observers as quorum members.
> Steps to repro:
> 1. Server configuration: 3 voters, 2 observers (attached).
> 2. Bring up 2 voters and one observer. It's enough for quorum.
> 3. Shut down the one from the quorum who is the follower.
> As I understand, the expected result is that the leader will start a new 
> election round so as to regain quorum.
> But the real situation is that it just says goodbye to that follower and is 
> still operable. (When I'm shutting down the 3rd one -- the observer -- the 
> leader starts trying to regain a quorum.)
> (Expectedly, if on step 3 we shut down the leader, not the follower, the 
> remaining follower starts a new leader election, as it should.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-768) zkpython segfault on close (assertion error in io thread)

2010-05-06 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864849#action_12864849
 ] 

Henry Robinson commented on ZOOKEEPER-768:
--

Thanks Kapil - I'll take a look. From the stack trace it looks as though a 
pending completion callback is null and therefore something weird is going on 
with a completion dispatcher being freed before it is finished being used. As 
per usual I can't reproduce on my machine, but this is enough information to 
dig into it. 

> zkpython segfault on close (assertion error in io thread)
> -
>
> Key: ZOOKEEPER-768
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-768
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.4.0
> Environment: ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython)
>Reporter: Kapil Thangavelu
> Attachments: zkpython-segfault-client-log.txt, 
> zkpython-segfault-stack-traces.txt, zkpython-segfault.py
>
>
> While trying to create a test case showing slow average add_auth, I stumbled 
> upon a test case that reliably segfaults for me, albeit with a variable number 
> of iterations (anywhere from 0 to 20 typically). FWIW, I've got about 220 
> processes in my test environment (ubuntu lucid 10.04). The test case opens a 
> connection, adds authentication to it, and closes the connection, in a loop. 
> I'm including the sample program and the gdb stack traces from the core file. 
> I can upload the core file if thats helpful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-740:
-

Fix Version/s: (was: 3.3.1)

Can't reproduce or diagnose without code; moving to 3.4.0.

> zkpython leading to segfault on zookeeper
> -
>
> Key: ZOOKEEPER-740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Federico
>Assignee: Henry Robinson
>Priority: Critical
> Fix For: 3.4.0
>
>
> The program that we are implementing uses the python binding for zookeeper 
> but sometimes it crash with segfault; here is the bt from gdb:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> 2488../Objects/abstract.c: No such file or directory.
> in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
> arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
> at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
> list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-764) Observer elected leader due to inconsistent voting view

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-764:
-

Attachment: ZOOKEEPER-764_3_3_1.patch

Patch to apply against 3_3_1

> Observer elected leader due to inconsistent voting view
> ---
>
> Key: ZOOKEEPER-764
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-764
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-690.patch, ZOOKEEPER-764_3_3_1.patch
>
>
> In ZOOKEEPER-690, we noticed that an observer was being elected, and Henry 
> proposed a patch to fix the issue. However, it seems that the patch does not 
> solve the issue one user (Alan Cabrera) has observed. Given that we would 
> like to fix this issue, and to work separately with Alan to determine the 
> problem with his setup, I'm creating this jira and re-posting Henry's patch.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Patch Available  (was: Open)

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
> ZOOKEEPER-763.patch, ZOOKEEPER-763.patch
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Open  (was: Patch Available)

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
> ZOOKEEPER-763.patch, ZOOKEEPER-763.patch
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Attachment: ZOOKEEPER-763.patch

Forgot --no-prefix again :/

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
> ZOOKEEPER-763.patch, ZOOKEEPER-763.patch
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Status: Patch Available  (was: Open)

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
> ZOOKEEPER-763.patch
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

Attachment: ZOOKEEPER-763.patch

Patch attached, with a new test (note that the test will hang if it fails; no obvious 
way to fix that - will open a JIRA to track).

Also included is a fix for a typo in acl_test.py; not sure how I allowed that to 
creep in :/

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt, 
> ZOOKEEPER-763.patch
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Updated: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-763:
-

 Assignee: Henry Robinson  (was: Mahadev konar)
Fix Version/s: 3.3.1
  Component/s: (was: c client)

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Henry Robinson
> Fix For: 3.3.1, 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864488#action_12864488
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Kapil - 

Thanks! Adding that sleep helped me understand what was going on. 

pyzoo_close holds the GIL but blocks inside zookeeper_close, waiting for the 
completion thread to finish. However, if a completion is still inside Python 
but has been pre-empted by the main thread calling pyzoo_close, the 
completion can't get the GIL back to finish executing, blocking the 
completion thread forever. The fix is simple: relinquish the GIL during 
the zookeeper_close call, then reacquire it straight after. There are even 
handy macros to do this:

Py_BEGIN_ALLOW_THREADS
ret = zookeeper_close(zhandles[zkhid]);
Py_END_ALLOW_THREADS
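The failure mode is easy to model in pure Python with an ordinary lock standing in for the GIL (an illustrative sketch, not zkpython code; `completion` and the lock are stand-ins):

```python
import threading

# A plain Lock models the GIL; `completion` models a zkpython callback
# that still needs the "GIL" to finish its Python-side work.
gil = threading.Lock()
done = []

def completion():
    with gil:
        done.append("completion finished")

t = threading.Thread(target=completion)
gil.acquire()   # main thread holds the "GIL", as pyzoo_close did
t.start()
# Joining here, while still holding the lock, would deadlock:
# completion() is blocked acquiring `gil`, so it can never exit.
gil.release()   # the Py_BEGIN_ALLOW_THREADS step: let the callback run
t.join()        # now the join returns promptly
print(done)
```

Releasing the lock before the join is exactly what the macros above do around zookeeper_close.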

This same issue will affect any part of zkpython where a call to the C client 
is blocked on some work being completed in another Python thread - in practice, 
I think this means from callbacks. I'll audit the code to see if any other API 
calls are affected. Patch to fix this issue is following shortly - Kapil, I'd 
be very grateful if you could help us by testing it. 

cheers,
Henry

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Mahadev konar
> Fix For: 3.4.0
>
> Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client

2010-05-05 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864429#action_12864429
 ] 

Henry Robinson commented on ZOOKEEPER-763:
--

Hi Kapil - 

As seems to be the norm for me this week, I'm struggling to reproduce :) It 
does seem like your python script explicitly waits for a completion to be 
called before closing a handle. Is this enough to leave an outstanding 
completion on the queue?

Can you capture the stacktrace for the completion thread? I think it must be 
getting stuck in process_completions but it would be very valuable to know 
where - if it's stuck on the callback into zkpython then that means the 
deadlock is in the python bindings and not solely in C-land.

cheers,
Henry

> Deadlock on close w/ zkpython / c client
> 
>
> Key: ZOOKEEPER-763
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib-bindings
>Affects Versions: 3.3.0
> Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>Reporter: Kapil Thangavelu
>Assignee: Mahadev konar
> Fix For: 3.4.0
>
> Attachments: deadlock.py, stack-trace-deadlock.txt
>
>
> Deadlocks occur if we attempt to close a handle while there are any 
> outstanding async requests (aget, acreate, etc.). Normally on close, both the 
> io thread and the completion thread are terminated and joined; 
> however, with outstanding async requests, the completion thread won't be in a 
> joinable state, and we effectively hang when the main thread does the join.
> AFAICS the ideal behavior on close of a handle would be to clear out 
> any remaining callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against 
> closing while there is an outstanding async completion request, but it's an 
> imperfect solution, since even after the Python callback is executed there is 
> still a window for deadlock before the completion thread finishes the 
> callback.
> A simple example to reproduce the deadlock is attached.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-05-04 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863915#action_12863915
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Weird - it looks like the test is shutting down correctly:


[junit] 2010-04-30 11:41:52,896 - INFO  [main:clientb...@222] - connecting to 
127.0.0.1 11233
[junit] 2010-04-30 11:41:52,896 - INFO  [main:quorumb...@277] - 
127.0.0.1:11233 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,896 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11234
[junit] 2010-04-30 11:41:52,897 - INFO  [main:quorumb...@277] - 
127.0.0.1:11234 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,897 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11235
[junit] 2010-04-30 11:41:52,897 - INFO  [main:quorumb...@277] - 
127.0.0.1:11235 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,897 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11236
[junit] 2010-04-30 11:41:52,898 - INFO  [main:quorumb...@277] - 
127.0.0.1:11236 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,898 - INFO  [main:clientb...@222] - connecting 
to 127.0.0.1 11237
[junit] 2010-04-30 11:41:52,898 - INFO  [main:quorumb...@277] - 
127.0.0.1:11237 is no longer accepting client connections
[junit] 2010-04-30 11:41:52,901 - INFO  
[main:junit4zktestrunner$loggedinvokemet...@56] - FINISHED TEST METHOD 
testObserversHammer
[junit] 2010-04-30 11:41:52,901 - INFO  [main:zktestcas...@59] - SUCCEEDED 
testObserversHammer
[junit] 2010-04-30 11:41:52,901 - INFO  [main:zktestcas...@54] - FINISHED 
testObserversHammer

and then it goes into trying the C tests which fail for an unrelated reason - 
does it lock up at this point or does it actually fail out to the CLI? If it 
locks up, is the jstack output you attached from that run?



> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, 
> nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, 
> nohup-AsyncHammerTest-201004301209.txt, 
> nohup-QuorumPeerMainTest-201004301209.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> The hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There is a huge set of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-05-04 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863902#action_12863902
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Hi Alan - 

Looking at this attachment: nohup-AsyncHammerTest-201004301209.txt - the tests 
appear to be run twice. The first testObserversHammer completes successfully, 
the second fails. Were you running the tests until you experienced the failure? 

Henry

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> jstack-201004291527.txt, jstack-AsyncHammerTest-201004301209.txt, 
> nohup-201004201053.txt, nohup-201004291409.txt, nohup-201004291527.txt, 
> nohup-AsyncHammerTest-201004301209.txt, 
> nohup-QuorumPeerMainTest-201004301209.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> The hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There is a huge set of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for the failure.




[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-05-01 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863082#action_12863082
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Hi Alan - Do you think you might have a chance to test whether this patch 
improves things?


> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, 
> nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
> zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> The hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There is a huge set of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for the failure.




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

   Status: Resolved  (was: Patch Available)
Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

I just committed this. Thanks Kapil!

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Fix For: 3.3.1, 3.4.0
>
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
> ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.
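In Python terms, the expanded validation amounts to something like the following (an illustrative sketch; the real check is a C macro in the zkpython binding, and `check_is_acl` here is just a stand-in mirroring its logic):

```python
# Required keys for each ACL entry, per the ZooKeeper ACL structure.
REQUIRED_ACL_KEYS = ("perms", "scheme", "id")

def check_is_acl(acl):
    """Return True only for a list of dicts carrying every required key."""
    if not isinstance(acl, list):
        return False
    for entry in acl:
        if not isinstance(entry, dict):
            return False
        if any(key not in entry for key in REQUIRED_ACL_KEYS):
            return False
    return True

# The report's typo'd ACL ("schema" instead of "scheme") is rejected
# up front instead of segfaulting deep in the C client:
bad_acl = [{"schema": "world", "id": "anyone", "perms": 0x1F}]
good_acl = [{"scheme": "world", "id": "anyone", "perms": 0x1F}]
print(check_is_acl(bad_acl), check_is_acl(good_acl))  # False True
```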




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Status: Open  (was: Patch Available)

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
> ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Status: Patch Available  (was: Open)

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
> ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Attachment: ZOOKEEPER-758.patch

forgot --no-prefix.

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch, 
> ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

  Status: Patch Available  (was: Open)
Hadoop Flags: [Reviewed]

I have reviewed this, and it looks good. Thanks Kapil!

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.




[jira] Updated: (ZOOKEEPER-758) zkpython segfaults on invalid acl with missing key

2010-04-30 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-758:
-

Attachment: ZOOKEEPER-758.patch

Kapil - 

Thanks for the patch! Unfortunately it didn't apply cleanly against trunk, 
because I think you had added 'test_acl_validity' to acl_test.py, which was not 
included in the diff.

I'm attaching a patch that applies cleanly to trunk - no code changes from your 
patch.

Thanks,

Henry

> zkpython segfaults on invalid acl with missing key
> --
>
> Key: ZOOKEEPER-758
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-758
> Project: Zookeeper
>  Issue Type: Bug
>  Components: contrib-bindings
>Affects Versions: 3.3.0, 3.4.0
> Environment: ubuntu lucid (10.04)
>Reporter: Kapil Thangavelu
> Attachments: invalid-acl-fix-and-test.diff, ZOOKEEPER-758.patch
>
>
> Currently when setting an acl, there is a minimal parse to ensure that it's a 
> list of dicts; however, if one of the dicts is missing a required key, the 
> subsequent usage doesn't check for it and will segfault. For example, using 
> an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if 
> used, because the scheme key is missing (it's been purposely typo'd to 
> schema in the example). 
> I've expanded the check_acl macro to include verifying that all keys are 
> present, and added some unit tests against trunk in the attachments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862482#action_12862482
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Ben - 

Agreed. I see this as the same as setMyid(...) - it sets an immutable value and 
should only be called once. I'd prefer it if these parameters were 'final' in 
QuorumPeer and set in the constructor, but that's not the way that 
runFromConfig (the only place outside of tests where these methods are called) 
is written. Then we could get rid of setLearnerType, for sure. 

The real error here, I think, is duplicating the learner type between QuorumPeer 
and QuorumServer. If we are going to have the list of QuorumServers, then 
getLearnerType should look up the learner type in the peer map. The same goes 
for the server id, perhaps, and we should just save a reference to the 
QuorumServer that represents our QuorumPeer. 
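The proposed refactoring can be sketched as follows (in Python rather than the actual Java, with hypothetical class shapes): the peer looks its learner type up in the shared peer map instead of caching a second copy that can drift out of sync.

```python
OBSERVER, PARTICIPANT = "observer", "participant"

class QuorumServer:
    """One entry in the ensemble's peer map."""
    def __init__(self, sid, learner_type=PARTICIPANT):
        self.sid = sid
        self.learner_type = learner_type

class QuorumPeer:
    def __init__(self, sid, peer_map):
        self.sid = sid
        self.peer_map = peer_map  # shared view of the ensemble

    @property
    def learner_type(self):
        # Single source of truth: delegate to the peer map rather than
        # duplicating the value on the peer itself.
        return self.peer_map[self.sid].learner_type

peers = {1: QuorumServer(1), 2: QuorumServer(2, OBSERVER)}
assert QuorumPeer(2, peers).learner_type == OBSERVER
```

Because there is only one stored copy, updating the map entry is automatically visible to the peer, which is exactly the invariant the duplicated field was breaking.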


> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, 
> nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
> zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

Fixing that bug unmasked another ugliness with the way that tests were started 
and stopped. This patch makes the process cleaner. Let's see if it helps, Alan?

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> jstack-201004291527.txt, nohup-201004201053.txt, nohup-201004291409.txt, 
> nohup-201004291527.txt, TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
> zoo.log, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

Alan - would you mind trying this new patch? Thanks for your patience. I 
suspect that something might still be a bit flaky with these tests (not the 
code, but the tests), but I hope this will fix this particular problem. 

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> nohup-201004201053.txt, nohup-201004291409.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch, ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862424#action_12862424
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

This map is, I think, shared between the QuorumPeers for the purposes of the 
test (and in general there aren't two QuorumPeers sharing this data structure 
when running normally). 

But! The error here is that I'm dumb (and that Java's type-checking leaves a 
little to be desired). I've written quorumPeers.containsValue up there, but 
actually it should be quorumPeers.containsKey. New patch on the way; let's see 
if that fixes it.
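The mix-up is easy to reproduce. Here is a Python analogue of the Java bug (illustrative only, not the actual ZooKeeper code): membership is tested against the map's values instead of its keys, which runs without any error but answers the wrong question.

```python
# Hypothetical stand-in for the quorumPeers map: server id -> server.
quorum_peers = {1: "server-a", 2: "server-b", 3: "server-c"}
sid = 2  # a server id, i.e. a KEY of the map

# The buggy check: the equivalent of quorumPeers.containsValue(sid).
# It executes fine but tests the wrong collection, so it is False
# even though sid is in the map.
assert sid not in quorum_peers.values()

# The intended check: the equivalent of quorumPeers.containsKey(sid).
assert sid in quorum_peers
```

In Java the bug is even quieter, since Map.containsValue(Object) accepts any argument type, so passing a key compiles without warning.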

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, jstack-201004291409.txt, 
> nohup-201004201053.txt, nohup-201004291409.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862400#action_12862400
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Note that both testHammer and testObserversHammer fail in the most recent log.

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Status: Patch Available  (was: Open)

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862351#action_12862351
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Alan - can you try this patch to see if it fixes things? 

Thanks, 

Henry


> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-29 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-690:
-

Attachment: ZOOKEEPER-690.patch

I have found what I hope is the problem.

Because QuorumPeers duplicate their 'LearnerType' in two places, there's the 
possibility that they may get out of sync. This is what was happening here - it 
was a test bug. Although the Observers knew that they were Observers, the other 
nodes did not. This affected the leader election protocol, as the other nodes 
did not know to reject an Observer.

I feel like we should refactor the QuorumPeer.QuorumServer code so as not to 
duplicate information, but for the time being I think this patch will work. 

I have also taken the opportunity to standardise the naming of 'learnertype' 
throughout the code (in some places it was called 'peertype' adding to the 
confusion).

Tests pass on my machine, but I can't guarantee that the problem is fixed as I 
could never recreate the error.

Thanks to Flavio for catching the broken invariant!

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log, 
> ZOOKEEPER-690.patch
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (ZOOKEEPER-750) move maven artifacts into "dist-maven" subdir of the release (package target)

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson resolved ZOOKEEPER-750.
--

Resolution: Fixed

I just committed ZOOKEEPER-749 (which addresses this as well). Thanks Patrick!

> move maven artifacts into "dist-maven" subdir of the release (package target)
> -
>
> Key: ZOOKEEPER-750
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-750
> Project: Zookeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.3.1, 3.4.0
>
>
> The maven artifacts are currently (3.3.0) put into the top level of the 
> release. This causes confusion
> amongst new users (i.e. "which jar do I use?"). Also, the naming of the bin 
> jar is wrong for maven (to put
> it onto the maven repo it must be named without the -bin), which adds extra 
> burden for the release
> manager. Putting the artifacts into a subdir fixes this and makes it explicit 
> what's being deployed to the maven repo.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-749:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this. Thanks Patrick!

> OSGi metadata not included in binary only jar
> -
>
> Key: ZOOKEEPER-749
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749
> Project: Zookeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-749.patch
>
>
> See this JIRA/comment for background:
> https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697
> basically the issue is that OSGi metadata is included in the legacy jar 
> (zookeeper-.jar) but not in the binary only
> jar (zookeeper--bin.jar) which is eventually deployed to the maven 
> repo.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-749) OSGi metadata not included in binary only jar

2010-04-28 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-749:
-

Hadoop Flags: [Reviewed]

+1, patch looks good to me. Tests failing was a quirk of Hudson, as this patch 
doesn't test code. ant bin-jar works correctly. 

> OSGi metadata not included in binary only jar
> -
>
> Key: ZOOKEEPER-749
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-749
> Project: Zookeeper
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-749.patch
>
>
> See this JIRA/comment for background:
> https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697
> basically the issue is that OSGi metadata is included in the legacy jar 
> (zookeeper-.jar) but not in the binary only
> jar (zookeeper--bin.jar) which is eventually deployed to the maven 
> repo.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-28 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861865#action_12861865
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Progress update - possibly to do with a bug in FLE allowing an Observer to be 
elected. We're looking into this now.

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Henry Robinson
>Priority: Blocker
> Fix For: 3.3.1, 3.4.0
>
> Attachments: jstack-201004201053.txt, nohup-201004201053.txt, 
> TEST-org.apache.zookeeper.test.AsyncHammerTest.txt, zoo.log
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-746) learner outputs session id to log in dec (should be hex)

2010-04-25 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-746:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this. Thanks Patrick!

> learner outputs session id to log in dec (should be hex)
> 
>
> Key: ZOOKEEPER-746
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-746
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum, server
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-746.patch
>
>
> usability issue, should be in hex:
> 2010-04-21 11:31:13,827 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:lear...@95] - Revalidating 
> client: 83353578391797760

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-04-23 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reopened ZOOKEEPER-740:
--


Ok, thanks for the update. Can you share the code you are running that 
produces the segfault? That will make it much easier for me to diagnose.

> zkpython leading to segfault on zookeeper
> -
>
> Key: ZOOKEEPER-740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Federico
>Assignee: Henry Robinson
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
>
> The program that we are implementing uses the python binding for zookeeper 
> but sometimes it crashes with a segfault; here is the backtrace from gdb:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> 2488../Objects/abstract.c: No such file or directory.
> in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
> arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
> at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
> list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-746) learner outputs session id to log in dec (should be hex)

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-746:
-

Hadoop Flags: [Reviewed]

+1, patch looks good to me. No tests required, pre-empting Hudsonbot. 

> learner outputs session id to log in dec (should be hex)
> 
>
> Key: ZOOKEEPER-746
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-746
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum, server
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-746.patch
>
>
> usability issue, should be in hex:
> 2010-04-21 11:31:13,827 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:lear...@95] - Revalidating 
> client: 83353578391797760

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-740:
-

Fix Version/s: 3.4.0

> zkpython leading to segfault on zookeeper
> -
>
> Key: ZOOKEEPER-740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Federico
>Assignee: Henry Robinson
>Priority: Critical
> Fix For: 3.3.1, 3.4.0
>
>
> The program that we are implementing uses the python binding for zookeeper 
> but sometimes it crashes with a segfault; here is the backtrace from gdb:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> 2488../Objects/abstract.c: No such file or directory.
> in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
> arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
> at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
> list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-631:
-

Fix Version/s: 3.4.0

> zkpython's C code could do with a style clean-up
> 
>
> Key: ZOOKEEPER-631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: ZOOKEEPER-631.patch
>
>
> Inconsistent formatting / use of parenthesis / some error checking - all need 
> fixing. 
> Also, the documentation in the header file could do with a reformat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-740:


Assignee: Henry Robinson

> zkpython leading to segfault on zookeeper
> -
>
> Key: ZOOKEEPER-740
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Federico
>Assignee: Henry Robinson
>Priority: Critical
> Fix For: 3.3.1
>
>
> The program that we are implementing uses the python binding for zookeeper 
> but sometimes it crashes with a segfault; here is the backtrace from gdb:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xad244b70 (LWP 28216)]
> 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> 2488../Objects/abstract.c: No such file or directory.
> in ../Objects/abstract.c
> (gdb) bt
> #0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
> at ../Objects/abstract.c:2488
> #1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
> arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
> #2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
> at ../Objects/abstract.c:2480
> #3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
> #4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
> path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
> #5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
> list=0xa5354140) at src/zk_hashtable.c:317
> #6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
> #7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
> #8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-21 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-631:
-

Fix Version/s: 3.3.1

> zkpython's C code could do with a style clean-up
> 
>
> Key: ZOOKEEPER-631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Fix For: 3.3.1
>
> Attachments: ZOOKEEPER-631.patch
>
>
> Inconsistent formatting / use of parenthesis / some error checking - all need 
> fixing. 
> Also, the documentation in the header file could do with a reformat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (ZOOKEEPER-745) zkpython documentation

2010-04-21 Thread Henry Robinson (JIRA)
zkpython documentation
--

 Key: ZOOKEEPER-745
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-745
 Project: Zookeeper
  Issue Type: Task
Reporter: Henry Robinson


zkpython deserves better documentation than the README I have given it. This 
jira is for tracking a document that includes at a minimum:

1. Installation instructions
2. Basic usage instructions, including common idiomatic use
3. API reference



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-690) AsyncTestHammer test fails on hudson.

2010-04-19 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858665#action_12858665
 ] 

Henry Robinson commented on ZOOKEEPER-690:
--

Alan - that would be great. If you can take a jstack dump of the process when 
it hangs we can do some forensics.

> AsyncTestHammer test fails on hudson.
> -
>
> Key: ZOOKEEPER-690
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-690
> Project: Zookeeper
>  Issue Type: Bug
>Reporter: Mahadev konar
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.3.1
>
>
> the hudson test failed on 
> http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/2/testReport/.
>  There are a huge number of CancelledKeyExceptions in the logs. Still going 
> through the logs to find out the reason for failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858377#action_12858377
 ] 

Henry Robinson commented on ZOOKEEPER-631:
--

The existing tests are the ones that validate this patch. Testing the Py_None 
and memory-allocation issues directly is hard because in the first case the GC 
behaviour is hard to force, and in the second we would have to stub out 
calloc(..) somehow!
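For the allocation half, the general technique (shown here in Python with unittest.mock rather than the actual C test harness; all names are hypothetical) is to stub the allocation call so that it reports failure, then assert that the caller fails cleanly instead of crashing:

```python
from unittest import mock

def allocate_buffer(n):
    # Stand-in for the C binding's calloc() call.
    return bytearray(n)

def make_node_payload(data):
    buf = allocate_buffer(len(data))
    if buf is None:  # allocation failure must not crash the caller
        raise MemoryError("could not allocate payload buffer")
    buf[:] = data
    return buf

# Force the allocation to "fail" and check that we get a clean error,
# not a crash -- the same property a C test would want for calloc.
with mock.patch(f"{__name__}.allocate_buffer", return_value=None):
    try:
        make_node_payload(b"hello")
        raised = False
    except MemoryError:
        raised = True
assert raised
```

In C the same idea usually means routing allocations through an injectable function pointer so the test can swap in an always-failing allocator.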

> zkpython's C code could do with a style clean-up
> 
>
> Key: ZOOKEEPER-631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Attachments: ZOOKEEPER-631.patch
>
>
> Inconsistent formatting / use of parenthesis / some error checking - all need 
> fixing. 
> Also, the documentation in the header file could do with a reformat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-18 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-631:
-

Status: Patch Available  (was: Open)

> zkpython's C code could do with a style clean-up
> 
>
> Key: ZOOKEEPER-631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Attachments: ZOOKEEPER-631.patch
>
>
> Inconsistent formatting / use of parenthesis / some error checking - all need 
> fixing. 
> Also, the documentation in the header file could do with a reformat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-742) Deallocating None on writes

2010-04-16 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858064#action_12858064
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Patch to ZOOKEEPER-631 should fix this issue - when that is committed, we can 
close out this ticket. 

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
>Assignee: Henry Robinson
> Attachments: commands.py, foo.p, ZOOKEEPER-742.patch, 
> ZOOKEEPER-742.patch
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (ZOOKEEPER-631) zkpython's C code could do with a style clean-up

2010-04-16 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-631:
-

Attachment: ZOOKEEPER-631.patch

Attached patch addresses the following:

1. Formatting redone to be (nearly) consistent
2. Comments added to every function
3. zookeeper.c reorganised logically
4. Py_None now reference counted correctly (see ZOOKEEPER-742)
5. Memory allocations now checked, and general error handling greatly improved. 
6. A variety of small bugs and typos fixed

The result is hopefully a much more stable zkpython. This patch will look like 
a rewrite - there are lots of changes. Apologies to the reviewer in advance! 

(I am happy for this patch to be used by the ASF, but the license-grant 
button was not available to be checked when attaching). 
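Point 5 above ("memory allocations now checked") is the classic check-every-allocation pattern. A hedged sketch of what that looks like in a C binding; `copy_string_checked` is a hypothetical helper, not a function from the patch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string, treating a NULL from calloc as an error
 * instead of dereferencing it. In the real binding the failure
 * branch would raise a Python MemoryError rather than return NULL. */
static char *copy_string_checked(const char *src) {
    size_t len = strlen(src);
    char *dst = calloc(len + 1, sizeof(char));
    if (dst == NULL) {
        return NULL;  /* allocation failed: propagate, don't crash */
    }
    memcpy(dst, src, len);  /* calloc already zeroed the terminator */
    return dst;
}
```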

> zkpython's C code could do with a style clean-up
> 
>
> Key: ZOOKEEPER-631
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-631
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: contrib-bindings
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Attachments: ZOOKEEPER-631.patch
>
>
> Inconsistent formatting / use of parenthesis / some error checking - all need 
> fixing. 
> Also, the documentation in the header file could do with a reformat. 





[jira] Updated: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-16 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-742:
-

Attachment: ZOOKEEPER-742.patch

This patch does a better job of handling references to Py_None - let me know if 
this helps.

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
>Assignee: Henry Robinson
> Attachments: commands.py, foo.p, ZOOKEEPER-742.patch, 
> ZOOKEEPER-742.patch
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()





[jira] Assigned: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-15 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned ZOOKEEPER-742:


Assignee: Henry Robinson

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
>Assignee: Henry Robinson
> Attachments: commands.py, foo.p, ZOOKEEPER-742.patch
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()





[jira] Updated: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-15 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-742:
-

Attachment: ZOOKEEPER-742.patch

Josh - are you able to apply this patch and try it out? It's not ready to be 
committed, but it adds a bit of defensive programming that should hopefully fix 
the error you're seeing. 

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
> Attachments: commands.py, foo.p, ZOOKEEPER-742.patch
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()





[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857667#action_12857667
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

The bad news is I can't recreate the error, but the good news is that there are 
only four call sites where this could be happening (the error is decrementing 
the reference count of Py_None, which is a bad idea). 

I can have a patch for you to try up here shortly.
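For context, the invariant being violated is that a C function handing back a singleton like Py_None must take a reference first; an unmatched decref eventually drives None's refcount to zero, producing "Fatal Python error: deallocating None". The sketch below simulates that invariant with a plain counter rather than the real CPython API; `obj_t`, `none_singleton`, and the function names are all hypothetical stand-ins (in a real extension this is `Py_INCREF(Py_None); return Py_None;`).

```c
#include <assert.h>
#include <stdio.h>

/* Minimal stand-in for a refcounted singleton such as Py_None. */
typedef struct { long ob_refcnt; } obj_t;

static obj_t none_singleton = { 1 };  /* interpreter holds one reference */

static void obj_incref(obj_t *o) { o->ob_refcnt++; }

static void obj_decref(obj_t *o) {
    if (--o->ob_refcnt == 0) {
        /* in CPython this is the fatal "deallocating None" path */
        fprintf(stderr, "fatal: deallocating singleton\n");
    }
}

/* Correct pattern: incref before returning, so the caller's
 * eventual decref is balanced and the singleton survives. */
static obj_t *return_none_correct(void) {
    obj_incref(&none_singleton);
    return &none_singleton;
}
```

The buggy call sites do the equivalent of returning the singleton without the incref, so each write leaks one decref until the fatal error fires.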

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
> Attachments: commands.py, foo.p
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()





[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857638#action_12857638
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Thanks very much for this - any chance you can share Commands as well, so that 
I can see the actual zookeeper API calls that are being made? Let me know if 
you're not comfortable posting it publicly. 

> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
> Attachments: foo.p
>
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()





[jira] Commented: (ZOOKEEPER-742) Deallocatng None on writes

2010-04-15 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857628#action_12857628
 ] 

Henry Robinson commented on ZOOKEEPER-742:
--

Thanks Josh - can you share the portion of your script that is causing the 
problem?



> Deallocatng None on writes
> --
>
> Key: ZOOKEEPER-742
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-742
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client, contrib, contrib-bindings
>Affects Versions: 3.2.2, 3.3.0
> Environment: Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 
> (python 2.5.1)
>Reporter: Josh Fraser
>
> On write operations, getting:
> Fatal Python error: deallocating None
> Aborted
> This error happens on write operations only.  Here's the backtrace:
> Fatal Python error: deallocating None
> Program received signal SIGABRT, Aborted.
> 0x00383fc30215 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00383fc30215 in raise () from /lib64/libc.so.6
> #1  0x00383fc31cc0 in abort () from /lib64/libc.so.6
> #2  0x2adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
> #3  0x2adbd0bc7493 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #4  0x2adbd0bcab66 in PyEval_EvalFrame () from 
> /usr/lib64/libpython2.4.so.1.0
> #5  0x2adbd0bcbfe5 in PyEval_EvalCodeEx () from 
> /usr/lib64/libpython2.4.so.1.0
> #6  0x2adbd0bcc032 in PyEval_EvalCode () from 
> /usr/lib64/libpython2.4.so.1.0
> #7  0x2adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
> #8  0x2adbd0be9bd8 in PyRun_SimpleFileExFlags () from 
> /usr/lib64/libpython2.4.so.1.0
> #9  0x2adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
> #10 0x00383fc1d974 in __libc_start_main () from /lib64/libc.so.6
> #11 0x00400629 in _start ()




