[jira] Updated: (ZOOKEEPER-936) zkpython is leaking ACL_vector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-936:
-----------------------------------
     Priority: Critical  (was: Major)
Fix Version/s: 3.4.0
               3.3.3
     Assignee: Gustavo Niemeyer

zkpython is leaking ACL_vector
------------------------------
            Key: ZOOKEEPER-936
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-936
        Project: Zookeeper
     Issue Type: Bug
     Components: contrib-bindings
       Reporter: Gustavo Niemeyer
       Assignee: Gustavo Niemeyer
       Priority: Critical
        Fix For: 3.3.3, 3.4.0

It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-935) Concurrent primitives library - shared lock
[ https://issues.apache.org/jira/browse/ZOOKEEPER-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-935:
-----------------------------------
Fix Version/s: 3.4.0
     Assignee: ChiaHung Lin

Thanks for the patch! Slating for 3.4.0.

Concurrent primitives library - shared lock
-------------------------------------------
            Key: ZOOKEEPER-935
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-935
        Project: Zookeeper
     Issue Type: Improvement
     Components: recipes
    Environment: Debian squeeze, JDK 1.6.x, zookeeper trunk
       Reporter: ChiaHung Lin
       Assignee: ChiaHung Lin
       Priority: Minor
        Fix For: 3.4.0
    Attachments: ZOOKEEPER-935.patch

I created this JIRA to add a shared lock function. The function follows the recipe at http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#Shared+Locks
Re: bug in ZooKeeperSever.java
Sorry for the slow response, but this went into my spam folder and I only just noticed it. Please do enter a JIRA for this! https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks,
Patrick

On Thu, Nov 4, 2010 at 3:29 PM, Ling Liu <l...@linkedin.com> wrote:

I found a bug in this file (version 3.1.1):

    public ZooKeeperServer(File snapDir, File logDir, int tickTime)
            throws IOException {
        this(new FileTxnSnapLog(snapDir, logDir), tickTime,
                new BasicDataTreeBuilder());
    }

The FileTxnSnapLog constructor needs logDir as the first parameter and snapDir as the second parameter. Here ZooKeeperServer misplaces the two parameters.

Ling
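The hazard Ling describes - two same-typed parameters passed in the wrong order - is easy to demonstrate in isolation. A minimal self-contained sketch (`TxnSnapLog` is a hypothetical stand-in, not the real ZooKeeper class):

```java
import java.io.File;

// Hypothetical stand-in for FileTxnSnapLog: its parameter order is
// (logDir, snapDir), the opposite of ZooKeeperServer's (snapDir, logDir),
// which is exactly how the 3.1.1 constructor got them crossed.
class TxnSnapLog {
    final File logDir;
    final File snapDir;

    TxnSnapLog(File logDir, File snapDir) {
        this.logDir = logDir;
        this.snapDir = snapDir;
    }

    public static void main(String[] args) {
        File snapDir = new File("/data/snap");
        File logDir = new File("/data/log");

        // Buggy call, mirroring ZooKeeperServer 3.1.1: snapDir lands in
        // the logDir slot and vice versa - it compiles without complaint.
        TxnSnapLog buggy = new TxnSnapLog(snapDir, logDir);
        System.out.println(buggy.logDir); // prints /data/snap - swapped

        // Fixed call: arguments follow the declared parameter order.
        TxnSnapLog fixed = new TxnSnapLog(logDir, snapDir);
        System.out.println(fixed.logDir); // prints /data/log
    }
}
```

Because both parameters are `File`, the compiler cannot catch the swap; only a test that checks where snapshots and logs actually land would.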
Fwd: Problem with Zookeeper cluster configuration
I'm afraid this went into my spam folder and I only just noticed it. Is this still an issue or did you work past it?

---------- Forwarded message ----------
From: siddhartha banik <siddhartha.ba...@gmail.com>
To: zookeeper-user-subscr...@hadoop.apache.org, zookeeper-dev@hadoop.apache.org
Date: Wed, 27 Oct 2010 18:40:27 +0530
Subject: Problem with Zookeeper cluster configuration

Hi,

I am trying to configure a zookeeper cluster with 2 server instances.
zookeeper version: 3.2.2

Config files are:

Server 1, zoo.cfg:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper1/zookeeper-3.2.2/data/
clientPort=5181
server.1=3.7.192.142:5181:5888
server.2=3.7.192.145:5181:5888

Server 2, zoo.cfg:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper2/zookeeper-3.2.2/data/
clientPort=5181
server.1=3.7.192.142:5181:5888
server.2=3.7.192.145:5181:5888

I have also created myid files in the respective data folders. Below are the exceptions I am seeing:

Server 1:

2010-10-27 07:43:43,411 - INFO [QuorumPeer:/0.0.0.0:5181:quorump...@514] - LOOKING
2010-10-27 07:43:43,418 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@579] - New election: -1
2010-10-27 07:43:43,419 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 1, -1, 382, 1, LOOKING, LOOKING, 1
2010-10-27 07:43:43,420 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,436 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 2, 0, 383, 1, LOOKING, LOOKING, 2
2010-10-27 07:43:43,442 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,443 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@618] - Notification: 2, 0, 383, 1, LOOKING, LOOKING, 1
2010-10-27 07:43:43,443 - INFO [QuorumPeer:/0.0.0.0:5181:fastleaderelect...@642] - Adding vote
2010-10-27 07:43:43,444 - INFO [QuorumPeer:/0.0.0.0:5181:quorump...@523] - FOLLOWING
2010-10-27 07:43:43,445 - INFO [QuorumPeer:/0.0.0.0:5181:zookeeperser...@160] - Created server
2010-10-27 07:43:43,447 - INFO [QuorumPeer:/0.0.0.0:5181:follo...@147] - Following /3.7.192.145:5181
2010-10-27 07:43:43,461 - INFO [WorkerReceiver Thread:fastleaderelection$messenger$workerrecei...@254] - Sending new notification.
2010-10-27 07:43:43,462 - WARN [QuorumPeer:/0.0.0.0:5181:follo...@318] - Exception when following the leader
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:66)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:193)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:525)
2010-10-27 07:43:43,464 - INFO [QuorumPeer:/0.0.0.0:5181:follo...@436] - shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:436)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:529)

Server 2:

2010-10-27 07:59:22,387 - INFO [QuorumPeer:/0.0.0.0:5181:quorump...@535] - LEADING
2010-10-27 07:59:22,388 - INFO [QuorumPeer:/0.0.0.0:5181:zookeeperser...@160] - Created server
2010-10-27 07:59:22,390 - ERROR [QuorumPeer:/0.0.0.0:5181:lea...@127] - Couldn't bind to port 5181
java.net.BindException: Address already in use
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
        at java.net.ServerSocket.bind(ServerSocket.java:319)
        at java.net.ServerSocket.<init>(ServerSocket.java:185)
        at java.net.ServerSocket.<init>(ServerSocket.java:97)
        at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:125)
        at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:537)
2010-10-27 07:59:22,392 - WARN [QuorumPeer:/0.0.0.0:5181:quorump...@541] - Unexpected exception
java.net.BindException: Address already in use
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
        at java.net.ServerSocket.bind(ServerSocket.java:319)
        at java.net.ServerSocket.<init>(ServerSocket.java:185)
        at java.net.ServerSocket.<init>(ServerSocket.java:97)
        at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:125)
        at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:537)
2010-10-27 07:59:22,393 - INFO [WorkerReceiver Thread:fastleaderelection$messenger$workerrecei...@254] - Sending new
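The leader's "Couldn't bind to port 5181" is likely a port collision inside the config above: each server.N entry uses 5181 as its quorum port, and clientPort=5181 is already bound on the same host, so the newly elected leader cannot bind its quorum port. A zoo.cfg sketch with the ports separated (my reading of the logs, not a fix confirmed on this thread):

```properties
# Sketch: give client, quorum, and election traffic distinct ports.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/xuser/zookeeper1/zookeeper-3.2.2/data/
clientPort=5181
# server.N=host:quorumPort:electionPort
# neither port may equal clientPort on the same machine
server.1=3.7.192.142:2888:3888
server.2=3.7.192.145:2888:3888
```

Note also that a 2-server ensemble tolerates no failures (a quorum of 2 is required); 3 servers is the usual minimum for a useful cluster.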
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933557#action_12933557 ]

Benjamin Reed commented on ZOOKEEPER-922:
-----------------------------------------
Camille, I also think disabling moving sessions is not a good idea or very useful, but it seems to be the only way to have sensible semantics. May I suggest that we take this discussion a bit higher? I think there are fundamental assumptions that you are making that I'm questioning. Can you write up a high-level design and state your assumptions? I can't quite see how the math works out between the client-server timeouts, connect timeouts, and the lower session timeout. I'm also not clear on how much you are relying on a connection reset for failure detection.

enable faster timeout of sessions in case of unexpected socket disconnect
-------------------------------------------------------------------------
            Key: ZOOKEEPER-922
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
        Project: Zookeeper
     Issue Type: Improvement
     Components: server
       Reporter: Camille Fournier
       Assignee: Camille Fournier
        Fix For: 3.4.0
    Attachments: ZOOKEEPER-922.patch

In the case when a client connection is closed due to a socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a ZooKeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing a longer heartbeat-based timeout for Java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
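The proposal in the issue description can be sketched as follows - when a connection dies from a socket error rather than an explicit close, re-register the session with the ensemble minimum timeout so it expires quickly. All names below are illustrative, not the attached patch or ZooKeeper's actual session tracker:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ZOOKEEPER-922 idea: sessions normally expire after their
// negotiated timeout, but a session whose socket died abnormally is
// clamped to the ensemble's minimum timeout, so its ephemeral nodes
// vanish quickly while healthy GC-pausing clients keep long timeouts.
class SessionTimeouts {
    private final int minSessionTimeoutMs;
    private final Map<Long, Integer> timeoutBySession = new HashMap<>();

    SessionTimeouts(int minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    void register(long sessionId, int negotiatedTimeoutMs) {
        timeoutBySession.put(sessionId, negotiatedTimeoutMs);
    }

    // Called when the connection is lost without an explicit close():
    // shrink the timeout so the session expires at the next sweep.
    void onSocketError(long sessionId) {
        timeoutBySession.computeIfPresent(sessionId,
                (id, t) -> Math.min(t, minSessionTimeoutMs));
    }

    int timeoutOf(long sessionId) {
        return timeoutBySession.getOrDefault(sessionId, minSessionTimeoutMs);
    }
}
```

The real change would live in the server's session tracking at connection-teardown time; the sketch only shows that the clamp itself is a one-line decision.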
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933560#action_12933560 ]

Camille Fournier commented on ZOOKEEPER-922:
--------------------------------------------
My kingdom for a virtual whiteboard! I will take some time and write this up.

enable faster timeout of sessions in case of unexpected socket disconnect
-------------------------------------------------------------------------
            Key: ZOOKEEPER-922
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
        Project: Zookeeper
     Issue Type: Improvement
     Components: server
       Reporter: Camille Fournier
       Assignee: Camille Fournier
        Fix For: 3.4.0
    Attachments: ZOOKEEPER-922.patch

In the case when a client connection is closed due to a socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a ZooKeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing a longer heartbeat-based timeout for Java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933602#action_12933602 ]

Vishal K commented on ZOOKEEPER-880:
------------------------------------
Hi Benoit,

May I suggest that you see if you can reproduce this problem with 3.3.3 (with the patch for ZOOKEEPER-822)?

I was going through QuorumCnxManager.java for 3.2.2. It clearly leaks a SendWorker thread for every other connection. After receiving a connection from a peer, it creates a new thread and inserts its reference in senderWorkerMap:

    SendWorker sw = new SendWorker(s, sid);
    RecvWorker rw = new RecvWorker(s, sid);
    sw.setRecv(rw);
    SendWorker vsw = senderWorkerMap.get(sid);
    senderWorkerMap.put(sid, sw);

Then it kills the old thread for the peer (created from the earlier connection):

    if (vsw != null)
        vsw.finish();

However, the SendWorker.finish method removes an entry from senderWorkerMap, which removes the reference to the recently created SendWorker thread:

    senderWorkerMap.remove(sid);

Thus, it will end up removing both entries. As a result, one thread is leaked for every other connection. If you count the number of error messages in hbase-hadoop-zookeeper-sv4borg9.log, you will see that the messages from RecvWorker are approximately twice those from SendWorker. I think this proves the point.

$:/tmp/hadoop # grep RecvWorker hbase-hadoop-zookeeper-sv4borg9.log | wc -l
60
$:/tmp/hadoop # grep SendWorker hbase-hadoop-zookeeper-sv4borg9.log | wc -l
32

-Vishal

QuorumCnxManager$SendWorker grows without bounds
------------------------------------------------
            Key: ZOOKEEPER-880
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
        Project: Zookeeper
     Issue Type: Bug
Affects Versions: 3.2.2
       Reporter: Jean-Daniel Cryans
       Priority: Critical
    Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:

{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}

The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
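The interleaving Vishal describes - install the new worker under sid, then let the old worker's finish() blindly remove the sid - can be reproduced with a plain map. A guarded remove(key, value), which only deletes the entry if the finishing worker still owns it, is one possible shape of a fix (illustrative stand-in classes, not the actual ZOOKEEPER-880 patch):

```java
import java.util.concurrent.ConcurrentHashMap;

// Mimics the 3.2.2 QuorumCnxManager leak: finish() on the *old* worker
// removes whatever is currently mapped under sid, including the newer
// worker that just replaced it. Names are illustrative stand-ins.
class Worker {
    static final ConcurrentHashMap<Long, Worker> senderWorkerMap =
            new ConcurrentHashMap<>();

    // Buggy variant, as in 3.2.2: removes by key alone.
    void finishBuggy(long sid) {
        senderWorkerMap.remove(sid);
    }

    // Guarded variant: removes the entry only if this worker still owns it,
    // so a replacement installed by a newer connection survives.
    void finishGuarded(long sid) {
        senderWorkerMap.remove(sid, this);
    }

    public static void main(String[] args) {
        long sid = 3L;
        Worker old = new Worker();
        Worker fresh = new Worker();

        // A second connection from sid arrives: install fresh, kill old.
        senderWorkerMap.put(sid, fresh);
        old.finishBuggy(sid);
        // fresh lost its map entry: its thread is now untracked (leaked).
        System.out.println(senderWorkerMap.containsKey(sid)); // prints false

        senderWorkerMap.put(sid, fresh);
        old.finishGuarded(sid);
        System.out.println(senderWorkerMap.get(sid) == fresh); // prints true
    }
}
```

This also matches the 2:1 RecvWorker/SendWorker message ratio in the grep counts above: every other connection leaves one worker alive with no map entry pointing at it.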
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933610#action_12933610 ]

Vishal K commented on ZOOKEEPER-934:
------------------------------------
How about we reject the connection if (sid != OBSERVER_ID && !self.viewContains(sid))?

Add sanity check for server ID
------------------------------
            Key: ZOOKEEPER-934
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
        Project: Zookeeper
     Issue Type: Sub-task
       Reporter: Vishal K
        Fix For: 3.4.0

2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we are not doing any sanity check for server IDs. I think this might fix ZOOKEEPER-851. The fix is simple. However, I am not sure if anyone in the community is relying on this ability.
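The check proposed in this thread - accept a connecting peer only if its sid is the observer wildcard or appears in the configured view - might look like the following sketch (hypothetical class; `view` stands in for a lookup against the QuorumPeer configuration, and the wildcard value is an assumption):

```java
import java.util.Set;

// Sketch of the ZOOKEEPER-934 sanity check: reject connections whose
// server ID is neither the observer wildcard nor in the configured view.
class ServerIdCheck {
    static final long OBSERVER_ID = Long.MAX_VALUE; // assumed wildcard value

    private final Set<Long> view; // sids listed in the configuration file

    ServerIdCheck(Set<Long> view) {
        this.view = view;
    }

    boolean acceptConnection(long sid) {
        if (sid != OBSERVER_ID && !view.contains(sid)) {
            return false; // unknown peer: drop the connection
        }
        return true;
    }

    public static void main(String[] args) {
        ServerIdCheck check = new ServerIdCheck(Set.of(1L, 2L, 3L));
        System.out.println(check.acceptConnection(2L));  // prints true
        System.out.println(check.acceptConnection(99L)); // prints false
    }
}
```

The open question from the description remains: whether anyone relies on unlisted peers being able to connect, which this check would break.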
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933612#action_12933612 ]

Vishal K commented on ZOOKEEPER-933:
------------------------------------
It looks like we need a way to uniquely identify the observer after we receive a connection. One way to do this is to get the IP address from the socket, but this is not a good idea. Instead, we can ask the observer to generate a unique id (uuid or crypto hash) and send sid, role, uuid after connecting to a peer (instead of just sid in the current implementation). From role, QCM can figure out that the node is an observer. It can then ignore the sid and use the uuid passed by the observer. For followers and the leader we will use sid as the identifier, and for observers we will use uuid. How does that sound?

Remove wildcard QuorumPeer.OBSERVER_ID
--------------------------------------
            Key: ZOOKEEPER-933
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
        Project: Zookeeper
     Issue Type: Sub-task
       Reporter: Vishal K
        Fix For: 3.4.0

1. I have a question about the following piece of code in QCM:

    if (remoteSid == QuorumPeer.OBSERVER_ID) {
        /*
         * Choose identifier at random. We need a value to identify
         * the connection.
         */
        remoteSid = observerCounter--;
        LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
    }

Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice. This could result in redundant SendWorker/RecvWorker threads. I haven't used observers yet. The documentation http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that just like followers, observers should have server IDs. In which case, why do we want to provide a wild-card?
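The handshake suggested in this comment - participants keyed by sid, observers by a self-generated unique id - could be sketched as follows (hypothetical record; the real protocol change would live in QuorumCnxManager):

```java
import java.util.UUID;

// Sketch of the proposed handshake: participants are keyed by their sid,
// observers by a UUID they generate, so two observers connecting with the
// wildcard sid no longer collide. All names are illustrative.
class PeerHandshake {
    enum Role { PARTICIPANT, OBSERVER }

    final long sid;
    final Role role;
    final String uuid; // only meaningful for observers

    PeerHandshake(long sid, Role role) {
        this.sid = sid;
        this.role = role;
        this.uuid = (role == Role.OBSERVER)
                ? UUID.randomUUID().toString() : null;
    }

    // Identifier under which SendWorker/RecvWorker threads would be mapped.
    String connectionKey() {
        return (role == Role.OBSERVER) ? "obs-" + uuid : "sid-" + sid;
    }

    public static void main(String[] args) {
        PeerHandshake o1 = new PeerHandshake(-1L, Role.OBSERVER);
        PeerHandshake o2 = new PeerHandshake(-1L, Role.OBSERVER);
        // Two observers with the same wildcard sid get distinct keys.
        System.out.println(o1.connectionKey().equals(o2.connectionKey()));
    }
}
```

Keying observer workers this way sidesteps the redundant-thread problem the issue describes, at the cost of sending an extra field during connection setup.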
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933709#action_12933709 ]

Flavio Junqueira commented on ZOOKEEPER-933:
--------------------------------------------
+1 for the idea, sounds right to me.

Remove wildcard QuorumPeer.OBSERVER_ID
--------------------------------------
            Key: ZOOKEEPER-933
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
        Project: Zookeeper
     Issue Type: Sub-task
       Reporter: Vishal K
        Fix For: 3.4.0

1. I have a question about the following piece of code in QCM:

    if (remoteSid == QuorumPeer.OBSERVER_ID) {
        /*
         * Choose identifier at random. We need a value to identify
         * the connection.
         */
        remoteSid = observerCounter--;
        LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
    }

Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice. This could result in redundant SendWorker/RecvWorker threads. I haven't used observers yet. The documentation http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that just like followers, observers should have server IDs. In which case, why do we want to provide a wild-card?
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933713#action_12933713 ]

Flavio Junqueira commented on ZOOKEEPER-934:
--------------------------------------------
I was not thinking about OBSERVER_ID. Good point, I think that should do it.

Add sanity check for server ID
------------------------------
            Key: ZOOKEEPER-934
            URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
        Project: Zookeeper
     Issue Type: Sub-task
       Reporter: Vishal K
        Fix For: 3.4.0

2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we are not doing any sanity check for server IDs. I think this might fix ZOOKEEPER-851. The fix is simple. However, I am not sure if anyone in the community is relying on this ability.