[jira] Updated: (ZOOKEEPER-936) zkpython is leaking ACL_vector

2010-11-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-936:
---

 Priority: Critical  (was: Major)
Fix Version/s: 3.4.0
   3.3.3
 Assignee: Gustavo Niemeyer

 zkpython is leaking ACL_vector
 --

 Key: ZOOKEEPER-936
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-936
 Project: Zookeeper
  Issue Type: Bug
  Components: contrib-bindings
Reporter: Gustavo Niemeyer
Assignee: Gustavo Niemeyer
Priority: Critical
 Fix For: 3.3.3, 3.4.0


 It looks like there are no calls to deallocate_ACL_vector() within 
 zookeeper.c in the zkpython binding, which means that (at least) the result 
 of zoo_get_acl() must be leaking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-935) Concurrent primitives library - shared lock

2010-11-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-935:
---

Fix Version/s: 3.4.0
 Assignee: ChiaHung Lin

Thanks for the patch! Slating for 3.4.0.

 Concurrent primitives library - shared lock
 ---

 Key: ZOOKEEPER-935
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-935
 Project: Zookeeper
  Issue Type: Improvement
  Components: recipes
 Environment: Debian squeeze 
 JDK 1.6.x
 zookeeper trunk
Reporter: ChiaHung Lin
Assignee: ChiaHung Lin
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-935.patch


 I created this JIRA to add a shared lock function. The function follows the recipe at 
 http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#Shared+Locks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: bug in ZooKeeperSever.java

2010-11-18 Thread Patrick Hunt
Sorry for the slow response but this went into my spam folder and I
only just noticed it.

Please do enter a JIRA for this!
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks,

Patrick

On Thu, Nov 4, 2010 at 3:29 PM, Ling Liu l...@linkedin.com wrote:
 I found a bug in this file (version 3.1.1).

 public ZooKeeperServer(File snapDir, File logDir, int tickTime)
            throws IOException {
        this( new FileTxnSnapLog(snapDir, logDir),
                tickTime, new BasicDataTreeBuilder());
    }


 The FileTxnSnapLog constructor takes logDir as the first parameter and 
 snapDir as the second. Here ZooKeeperServer passes the two 
 parameters in the wrong order.
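
A minimal, self-contained sketch of the mix-up Ling describes. The stub class below is hypothetical and only records which directory lands in which role; the real FileTxnSnapLog does actual I/O:

```java
import java.io.File;

// Hypothetical stand-in for FileTxnSnapLog, recording which directory
// ended up in which role so the argument swap is easy to see.
class FileTxnSnapLogStub {
    final File logDir;   // transaction log directory: FIRST constructor parameter
    final File snapDir;  // snapshot directory: SECOND constructor parameter

    FileTxnSnapLogStub(File logDir, File snapDir) {
        this.logDir = logDir;
        this.snapDir = snapDir;
    }

    // Buggy 3.1.1-style wiring: ZooKeeperServer(snapDir, logDir, ...) passed
    // (snapDir, logDir) straight through, reversing the two roles.
    static FileTxnSnapLogStub buggyWiring(File snapDir, File logDir) {
        return new FileTxnSnapLogStub(snapDir, logDir);
    }

    // Corrected wiring: logDir goes first, matching the constructor.
    static FileTxnSnapLogStub fixedWiring(File snapDir, File logDir) {
        return new FileTxnSnapLogStub(logDir, snapDir);
    }
}
```

With the buggy wiring, transaction logs would be written under the snapshot directory and vice versa.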

 Ling




Fwd: Problem with Zookeeper cluster configuration

2010-11-18 Thread Patrick Hunt
I'm afraid this went into my spam folder and I only just noticed it.
Is this still an issue, or did you work past it?

-- Forwarded message --
From: siddhartha banik siddhartha.ba...@gmail.com
To: zookeeper-user-subscr...@hadoop.apache.org, zookeeper-dev@hadoop.apache.org
Date: Wed, 27 Oct 2010 18:40:27 +0530
Subject: Problem with Zookeeper cluster configuration

Hi,

I am trying to configure zookeeper cluster ... with 2 server
instances. zookeeper version : 3.2.2

Config files are :

Server 1. zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper1/zookeeper-3.2.2/data/
clientPort=5181
server.1=3.7.192.142:5181:5888
server.2=3.7.192.145:5181:5888

Server 2. zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper2/zookeeper-3.2.2/data/
clientPort=5181
server.1=3.7.192.142:5181:5888
server.2=3.7.192.145:5181:5888

I have also created myid files in the respective data folders. Below are
the exceptions I am seeing:

Server 1

2010-10-27 07:43:43,411 - INFO
[QuorumPeer:/0.0.0.0:5181:quorump...@514] - LOOKING
2010-10-27 07:43:43,418 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@579] - New election: -1
2010-10-27 07:43:43,419 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 1,
-1, 382, 1, LOOKING, LOOKING, 1
2010-10-27 07:43:43,420 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,436 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 2,
0, 383, 1, LOOKING, LOOKING, 2
2010-10-27 07:43:43,442 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,443 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 2,
0, 383, 1, LOOKING, LOOKING, 1
2010-10-27 07:43:43,443 - INFO
[QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,444 - INFO
[QuorumPeer:/0.0.0.0:5181:quorump...@523] - FOLLOWING
2010-10-27 07:43:43,445 - INFO
[QuorumPeer:/0.0.0.0:5181:zookeeperser...@160] - Created server
2010-10-27 07:43:43,447 - INFO
[QuorumPeer:/0.0.0.0:5181:follo...@147] - Following /3.7.192.145:5181
2010-10-27 07:43:43,461 - INFO  [WorkerReceiver
Thread:fastleaderelection$messenger$workerrecei...@254] - Sending new
notification.
2010-10-27 07:43:43,462 - WARN
[QuorumPeer:/0.0.0.0:5181:follo...@318] - Exception when following the
leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at 
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:66)
    at 
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at 
org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
    at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:193)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:525)
2010-10-27 07:43:43,464 - INFO
[QuorumPeer:/0.0.0.0:5181:follo...@436] - shutdown called
java.lang.Exception: shutdown Follower
    at 
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:436)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:529)

Server 2

2010-10-27 07:59:22,387 - INFO
[QuorumPeer:/0.0.0.0:5181:quorump...@535] - LEADING
2010-10-27 07:59:22,388 - INFO
[QuorumPeer:/0.0.0.0:5181:zookeeperser...@160] - Created server
2010-10-27 07:59:22,390 - ERROR [QuorumPeer:/0.0.0.0:5181:lea...@127]
- Couldn't bind to port 5181
java.net.BindException: Address already in use
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
    at java.net.ServerSocket.bind(ServerSocket.java:319)
    at java.net.ServerSocket.init(ServerSocket.java:185)
    at java.net.ServerSocket.init(ServerSocket.java:97)
    at org.apache.zookeeper.server.quorum.Leader.init(Leader.java:125)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:417)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:537)
2010-10-27 07:59:22,392 - WARN
[QuorumPeer:/0.0.0.0:5181:quorump...@541] - Unexpected exception
java.net.BindException: Address already in use
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
    at java.net.ServerSocket.bind(ServerSocket.java:319)
    at java.net.ServerSocket.init(ServerSocket.java:185)
    at java.net.ServerSocket.init(ServerSocket.java:97)
    at org.apache.zookeeper.server.quorum.Leader.init(Leader.java:125)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:417)
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:537)
2010-10-27 07:59:22,393 - INFO  [WorkerReceiver
Thread:fastleaderelection$messenger$workerrecei...@254] - Sending new
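
The two traces are consistent with a port collision: in both zoo.cfg files the quorum port in the server.N lines (5181) is the same as clientPort, so whichever peer is elected leader cannot bind its quorum port ("Couldn't bind to port 5181 ... Address already in use"). A sketch of a non-colliding layout (port numbers illustrative):

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper1/zookeeper-3.2.2/data/
clientPort=5181
# quorum (first) and election (second) ports must not reuse clientPort
server.1=3.7.192.142:2888:3888
server.2=3.7.192.145:2888:3888
```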

[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-18 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933557#action_12933557
 ] 

Benjamin Reed commented on ZOOKEEPER-922:
-

camille, i also think disabling moving sessions is not a good idea or very 
useful, but it seems to be the only way to have sensible semantics. 

may i suggest that we take this discussion a bit higher? i think there are 
fundamental assumptions that you are making that i'm questioning. can you write 
up a high-level design and state your assumptions? i can't quite see how the 
math works out between the client-server timeouts, connect timeouts, and lower 
session timeout. i'm also not clear on how much you are relying on a connection 
reset for the failure detection.

 enable faster timeout of sessions in case of unexpected socket disconnect
 -

 Key: ZOOKEEPER-922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-922.patch


 In the case when a client connection is closed due to socket error instead of 
 the client calling close explicitly, it would be nice to enable the session 
 associated with that client to time out faster than the negotiated session 
 timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
 discovery provider to remove ephemeral nodes for crashed clients quickly, 
 while allowing for a longer heartbeat-based timeout for java clients that 
 need to do long stop-the-world GC. 
 I propose doing this by setting the timeout associated with the crashed 
 session to minSessionTimeout.
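
A toy model of the proposal (hypothetical names; the real change would live in the server's session tracking), showing how shrinking the timeout moves the expiry forward:

```java
// Toy model of the proposed behavior: a crashed session's negotiated
// timeout is replaced by minSessionTimeout so it expires sooner.
class SessionTimeoutModel {
    final int minSessionTimeout;   // lower bound enforced by the ensemble
    int negotiatedTimeout;         // per-session timeout in milliseconds

    SessionTimeoutModel(int minSessionTimeout, int negotiatedTimeout) {
        this.minSessionTimeout = minSessionTimeout;
        this.negotiatedTimeout = negotiatedTimeout;
    }

    // Called when the connection drops without an explicit close().
    void onUnexpectedDisconnect() {
        negotiatedTimeout = minSessionTimeout;
    }

    // When the session would be expired, given the last heartbeat time.
    long expiryTime(long lastHeardMillis) {
        return lastHeardMillis + negotiatedTimeout;
    }
}
```

A client that merely pauses for GC keeps the long negotiated timeout; only a detected socket error triggers the shorter one.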

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-18 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933560#action_12933560
 ] 

Camille Fournier commented on ZOOKEEPER-922:


My kingdom for a virtual whiteboard! 

I will take some time and write this up.

 enable faster timeout of sessions in case of unexpected socket disconnect
 -

 Key: ZOOKEEPER-922
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-922.patch


 In the case when a client connection is closed due to socket error instead of 
 the client calling close explicitly, it would be nice to enable the session 
 associated with that client to time out faster than the negotiated session 
 timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
 discovery provider to remove ephemeral nodes for crashed clients quickly, 
 while allowing for a longer heartbeat-based timeout for java clients that 
 need to do long stop-the-world GC. 
 I propose doing this by setting the timeout associated with the crashed 
 session to minSessionTimeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds

2010-11-18 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933602#action_12933602
 ] 

Vishal K commented on ZOOKEEPER-880:


Hi Benoit,

May I suggest that you try to reproduce this problem with 3.3.3
(with the patch for ZOOKEEPER-822)? I was going through
QuorumCnxManager.java for 3.2.2. It clearly leaks a SendWorker thread
for every other connection.

After receiving a connection from a peer, it creates a new thread and
inserts its reference in senderWorkerMap.

SendWorker sw = new SendWorker(s, sid);
RecvWorker rw = new RecvWorker(s, sid);
sw.setRecv(rw);

SendWorker vsw = senderWorkerMap.get(sid);
senderWorkerMap.put(sid, sw);

Then it kills the old thread for the peer (created from the earlier
connection):

if(vsw != null)
vsw.finish();

However, the SendWorker.finish method removes an entry from
senderWorkerMap. This results in removing the reference to the
recently created SendWorker thread:
senderWorkerMap.remove(sid);


Thus, it will end up removing both the entries. As a result, one thread
will be leaked for every other connection.
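
The bookkeeping above can be modeled in a few lines (names are stand-ins; the real SendWorker/RecvWorker do I/O and threading). With an unconditional remove(sid) in finish(), a second connection from the same peer evicts its own freshly registered worker:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the 3.2.2 senderWorkerMap bookkeeping in QuorumCnxManager.
class WorkerMapModel {
    static class Worker {
        final long sid;
        final Map<Long, Worker> map;
        Worker(long sid, Map<Long, Worker> map) { this.sid = sid; this.map = map; }

        // Mirrors SendWorker.finish(): removes the sid entry unconditionally,
        // even though a NEWER worker may already be registered under that sid.
        void finish() { map.remove(sid); }
    }

    final Map<Long, Worker> senderWorkerMap = new ConcurrentHashMap<>();

    // Mirrors the connection-accept path: register the new worker first,
    // then finish the old one, which also evicts the new worker's entry.
    Worker accept(long sid) {
        Worker sw = new Worker(sid, senderWorkerMap);
        Worker vsw = senderWorkerMap.get(sid);
        senderWorkerMap.put(sid, sw);
        if (vsw != null)
            vsw.finish();   // removes sid -> sw as well: the new worker leaks
        return sw;
    }
}
```

One way to avoid this would be a conditional removal in finish(), e.g. ConcurrentHashMap.remove(sid, this), so the entry is only dropped if it still points at the finishing worker; that is a sketch of a possible fix, not necessarily the patch that was applied.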

If you count the error messages in
hbase-hadoop-zookeeper-sv4borg9.log, you will see roughly twice as
many messages from RecvWorker as from SendWorker. I think this proves
the point.

$:/tmp/hadoop # grep RecvWorker  hbase-hadoop-zookeeper-sv4borg9.log | wc -l
60
$:/tmp/hadoop # grep SendWorker  hbase-hadoop-zookeeper-sv4borg9.log | wc -l
32

-Vishal

 QuorumCnxManager$SendWorker grows without bounds
 

 Key: ZOOKEEPER-880
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
 Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, 
 hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, 
 TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz


 We're seeing an issue where one server in the ensemble has a steady growing 
 number of QuorumCnxManager$SendWorker threads up to a point where the OS runs 
 out of native threads, and at the same time we see a lot of exceptions in the 
 logs.  This is on 3.2.2 and our config looks like:
 {noformat}
 tickTime=3000
 dataDir=/somewhere_thats_not_tmp
 clientPort=2181
 initLimit=10
 syncLimit=5
 server.0=sv4borg9:2888:3888
 server.1=sv4borg10:2888:3888
 server.2=sv4borg11:2888:3888
 server.3=sv4borg12:2888:3888
 server.4=sv4borg13:2888:3888
 {noformat}
 The issue is on the first server. I'm going to attach threads dumps and logs 
 in moment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID

2010-11-18 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933610#action_12933610
 ] 

Vishal K commented on ZOOKEEPER-934:



How about we reject the connection if (sid != OBSERVER_ID &&
!self.viewContains(sid))?
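
As a sketch, the proposed check (hypothetical shape; the real code would consult the QuorumPeer's view of the ensemble rather than a plain Set):

```java
import java.util.Set;

// Sketch of the proposed server-id sanity check. A handshake is accepted
// only if the peer uses the observer wildcard or a configured server id.
class ServerIdCheck {
    // Assumed to match QuorumPeer.OBSERVER_ID (the observer wildcard).
    static final long OBSERVER_ID = Long.MAX_VALUE;

    // Equivalent to rejecting when (sid != OBSERVER_ID && !view.contains(sid)).
    static boolean accept(long sid, Set<Long> view) {
        return sid == OBSERVER_ID || view.contains(sid);
    }
}
```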

 Add sanity check for server ID
 --

 Key: ZOOKEEPER-934
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Vishal K
 Fix For: 3.4.0


 2. Should I add a check to reject connections from peers that are not
 listed in the configuration file? Currently, we are not doing any
 sanity check for server IDs. I think this might fix ZOOKEEPER-851.
 The fix is simple. However, I am not sure if anyone in the community
 is relying on this ability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID

2010-11-18 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933612#action_12933612
 ] 

Vishal K commented on ZOOKEEPER-933:


It looks like we need a way to uniquely identify the observer after we receive a 
connection. One way to do this is to get the IP address from the socket, but 
this is not a good idea.

Instead, we can ask the observer to generate a unique id (a uuid or a crypto 
hash) and send (sid, role, uuid) after connecting to a peer (instead of just 
sid as in the current implementation).
From the role, QCM can figure out that the node is an observer. It can then 
ignore the sid and use the uuid passed by the observer.

For the followers and the leader we will use sid as the identifier, and for 
observers we will use the uuid.

How does that sound?
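
A sketch of the scheme (all names hypothetical): voting members stay keyed by sid, while observers are keyed by a self-generated uuid:

```java
import java.util.UUID;

// Sketch of the proposed identifier scheme: sid for voting members,
// a self-generated uuid for observers (whose sid is a wildcard).
class PeerIdentity {
    enum Role { LEADER, FOLLOWER, OBSERVER }

    final long sid;
    final Role role;
    final String uuid;   // only meaningful for observers

    PeerIdentity(long sid, Role role) {
        this.sid = sid;
        this.role = role;
        this.uuid = (role == Role.OBSERVER) ? UUID.randomUUID().toString() : null;
    }

    // The key QCM would use for its worker maps: sid for voting members,
    // the generated uuid for observers (ignoring their wildcard sid).
    String connectionKey() {
        return (role == Role.OBSERVER) ? uuid : Long.toString(sid);
    }
}
```

Two observers connecting with the same wildcard sid then get distinct keys, so neither evicts the other's worker threads.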


 Remove wildcard  QuorumPeer.OBSERVER_ID
 ---

 Key: ZOOKEEPER-933
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Vishal K
 Fix For: 3.4.0


 1. I have a question about the following piece of code in QCM:
 if (remoteSid == QuorumPeer.OBSERVER_ID) {
     /*
      * Choose identifier at random. We need a value to identify
      * the connection.
      */
     remoteSid = observerCounter--;
     LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
 }
 Should we allow this? The problem with this code is that if a peer
 connects twice with QuorumPeer.OBSERVER_ID, we will end up creating
 threads for this peer twice. This could result in redundant
 SendWorker/RecvWorker threads.
 I haven't used observers yet. The documentation
 http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html
 says that just like followers, observers should have server IDs. In
 which case, why do we want to provide a wild-card?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID

2010-11-18 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933709#action_12933709
 ] 

Flavio Junqueira commented on ZOOKEEPER-933:


+1 for the idea, sounds right to me.

 Remove wildcard  QuorumPeer.OBSERVER_ID
 ---

 Key: ZOOKEEPER-933
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Vishal K
 Fix For: 3.4.0


 1. I have a question about the following piece of code in QCM:
 if (remoteSid == QuorumPeer.OBSERVER_ID) {
  /* * Choose identifier at random. We need a value to identify * the 
 connection. */ 
 remoteSid = observerCounter--;
 LOG.info(Setting arbitrary identifier to observer:  + remoteSid); 
 }
 Should we allow this? The problem with this code is that if a peer
 connects twice with QuorumPeer.OBSERVER_ID, we will end up creating
 threads for this peer twice. This could result in redundant
 SendWorker/RecvWorker threads.
 I haven't used observers yet. The documentation
 http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html
 says that just like followers, observers should have server IDs. In
 which case, why do we want to provide a wild-card?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID

2010-11-18 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933713#action_12933713
 ] 

Flavio Junqueira commented on ZOOKEEPER-934:


I was not thinking about OBSERVER_ID, good point. I think that check should do it.

 Add sanity check for server ID
 --

 Key: ZOOKEEPER-934
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: Vishal K
 Fix For: 3.4.0


 2. Should I add a check to reject connections from peers that are not
 listed in the configuration file? Currently, we are not doing any
 sanity check for server IDs. I think this might fix ZOOKEEPER-851.
 The fix is simple. However, I am not sure if anyone in the community
 is relying on this ability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.