[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930953#action_12930953 ] Flavio Junqueira commented on ZOOKEEPER-928: Good point, Pat. I should have remembered this, since our earlier hack to introduce the connection timeout in QCM went through the socket directly, so it makes sense that we would have to do the same for other blocking operations. In fact, I quickly tried replacing the read call in receiveConnection with the following: {noformat} s.socket().getInputStream().read(msgBytes); {noformat} and I get a SocketTimeoutException after the specified timeout. Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical In Follower.followLeader() after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader and, if we haven't seen a ping packet from the leader for (syncLimit * tickTime), give up following the leader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
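The check proposed in the issue description (give up following once no ping has arrived within syncLimit * tickTime) could be sketched roughly as follows. This is an illustration only, not ZooKeeper code: the class `LeaderPingTracker` and its methods are hypothetical names, and the current time is passed in explicitly so the logic can be exercised without real waiting.

```java
/**
 * Hypothetical helper (not part of ZooKeeper): remembers when the last
 * ping from the leader arrived and reports whether the leader has been
 * silent for longer than syncLimit * tickTime.
 */
public class LeaderPingTracker {
    private final long timeoutMs; // syncLimit * tickTime, in milliseconds
    private long lastPingMs;      // time at which the most recent ping was seen

    public LeaderPingTracker(int syncLimit, int tickTimeMs, long nowMs) {
        this.timeoutMs = (long) syncLimit * tickTimeMs;
        this.lastPingMs = nowMs;
    }

    /** Call whenever a ping packet from the leader is processed. */
    public synchronized void onPing(long nowMs) {
        lastPingMs = nowMs;
    }

    /** True once the leader has been silent for more than the timeout. */
    public synchronized boolean leaderTimedOut(long nowMs) {
        return nowMs - lastPingMs > timeoutMs;
    }
}
```

A follower loop would presumably call `onPing` from its packet handling and consult `leaderTimedOut` on each iteration. Such a check only helps if the read in the loop is itself bounded by a timeout, since a read that blocks forever never returns control to the check.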
[jira] Commented: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930983#action_12930983 ] Andrei Savu commented on ZOOKEEPER-780: --- I will create a new test class for testing {{ZooKeeperMain}}. There are no previous tests for this class. zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch, ZOOKEEPER-780.patch, ZOOKEEPER-780.patch I'm starting to play with Zookeeper, so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I ran zkCli.sh to run some commands on the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations and it seems like the problem is not including the data. A copy of the screen: [zk: localhost:2181(CONNECTED) 3] create /groups firstgroup Created /groups [zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
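The stack trace suggests that an argument array is indexed without first checking how many arguments were typed. A minimal sketch of the kind of guard that avoids such a crash, assuming a much simplified parser: `CreateArgs` and `parseCreate` are hypothetical names and do not mirror the real `ZooKeeperMain` code or the attached patch.

```java
/** Hypothetical, simplified parser for a "create [-s] [-e] path data" line. */
public class CreateArgs {
    public final boolean ephemeral;
    public final boolean sequential;
    public final String path;
    public final String data;

    private CreateArgs(boolean ephemeral, boolean sequential, String path, String data) {
        this.ephemeral = ephemeral;
        this.sequential = sequential;
        this.path = path;
        this.data = data;
    }

    /** Returns the parsed arguments, or null (after printing usage) if the line is malformed. */
    public static CreateArgs parseCreate(String line) {
        String[] args = line.trim().split("\\s+");
        boolean ephemeral = false;
        boolean sequential = false;
        int i = 1; // args[0] is the command name itself
        while (i < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-e")) {
                ephemeral = true;
            } else if (args[i].equals("-s")) {
                sequential = true;
            } else {
                System.err.println("usage: create [-s] [-e] path data");
                return null;
            }
            i++;
        }
        // Check how many arguments remain before indexing, rather than
        // assuming that both path and data were supplied.
        if (args.length - i < 2) {
            System.err.println("usage: create [-s] [-e] path data");
            return null;
        }
        return new CreateArgs(ephemeral, sequential, args[i], args[i + 1]);
    }
}
```

With a guard of this kind, the line `create -e /groups/client_1` from the report would produce a usage message instead of terminating the shell.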
[jira] Commented: (ZOOKEEPER-921) zkPython incorrectly checks for existence of required ACL elements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931000#action_12931000 ] Nicholas Knight commented on ZOOKEEPER-921: --- Sorry, been swamped. I'll have time to get a proper patch posted over the weekend. zkPython incorrectly checks for existence of required ACL elements -- Key: ZOOKEEPER-921 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1, 3.4.0 Environment: Mac OS X 10.6.4, included Python 2.6.1 Reporter: Nicholas Knight Assignee: Nicholas Knight Fix For: 3.3.3, 3.4.0 Attachments: zktest.py Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module. Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call: {noformat} Traceback (most recent call last): File "zktest.py", line 21, in <module> logger.error("Boom?") File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error if self.isEnabledFor(ERROR): File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor return level >= self.getEffectiveLevel() File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel while logger: TypeError: an integer is required {noformat} But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions. I'll be attaching a test script that can be used to reproduce this behavior.
[jira] Updated: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-908: -- Attachment: (was: ZOOKEEPER-908.patch) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation - Key: ZOOKEEPER-908 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-908 Project: Zookeeper Issue Type: Sub-task Components: server Reporter: Thomas Koch Assignee: Thomas Koch Priority: Minor Fix For: 3.4.0 rename record -> request (since there is a counterpart record named response); rename header -> requestHeader (to distinguish from responseHeader); remove ByteBuffer creation code from primeConnection() method and use the duplicate code in the Packet constructor. Therefore the ByteBuffer bb parameter could also be removed from the constructor's parameters.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931056#action_12931056 ] Vishal K commented on ZOOKEEPER-900: Hi Flavio, I have a question regarding the logic that determines which connection to retain if peer 1 and peer 2 decide to communicate with each other. Suppose peer 1 connects to peer 2. It first sends its sid as a challenge. Peer 2 reads the sid and determines whether to keep the connection or initiate a connection back to peer 1. Both determine that peer 2 should be the one initiating the connection to peer 1, since sid of peer 2 > sid of peer 1. I am concerned that they both may not be able to maintain any connection, since the handshake is one-way. In the current implementation, peer 1 disconnects immediately after writing the challenge to peer 2. It can happen that peer 2 gets a ClosedChannelException before it reads the challenge from peer 1. As a result, peer 2 will not initiate a connection to peer 1. Is this a legitimate problem? If it is, how about we ask peer 2 to send back an ACK after it reads the challenge? Peer 1 will do a timed read() after writing a challenge to peer 2. This will hopefully give peer 2 enough time to read the challenge and take appropriate action. If peer 2 is really slow, peer 1 will time out on the read operation. -Vishal
FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne() starts a timer. connectOne() continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later, once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads, since they only block on IO to their respective peer. b) The blocking IO problem is not restricted to connectOne(); it is also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
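The quick fix described in (a), running the blocking connect on a separate thread and giving up when a timer expires, could look roughly like the sketch below. It is not the code from the issue: `BoundedConnect` and `connectWithDeadline` are hypothetical names, an executor stands in for the hand-rolled AsyncConnect thread and timer, and the connect step is passed in as a `Callable` so the example runs without a network.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking connect step with an upper bound on the wait. */
public class BoundedConnect {
    /**
     * Runs the given connect action on a worker thread. Returns its result if it
     * finishes within timeoutMs; otherwise interrupts the worker and returns empty.
     */
    public static <T> Optional<T> connectWithDeadline(Callable<T> connect, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> pending = worker.submit(connect);
        try {
            return Optional.ofNullable(pending.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupts the worker thread
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the connect attempt itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            pending.cancel(true);
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Interrupting a thread blocked in `SocketChannel.connect()` closes the channel and raises `ClosedByInterruptException` in that thread, which is why interruption is enough to unblock the worker here. When a plain `java.net.Socket` is acceptable, `Socket.connect(SocketAddress, int)` bounds the wait without an extra thread.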
[jira] Updated: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-908: -- Attachment: ZOOKEEPER-908.patch New patch against the current client after ZOOKEEPER-909. There are no new tests in this patch. Please commit it anyway. I'm planning to move the Packet class out to a separate file as part of ZOOKEEPER-894. Then I'd like to do ZOOKEEPER-878. After this, it would be the right time to see if the Packet class needs any tests. I'd like to collect this low-hanging fruit and make the Java client code more comprehensible before continuing with the Netty work in ZOOKEEPER-823.
[jira] Updated: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Koch updated ZOOKEEPER-908: -- Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931077#action_12931077 ] Hadoop QA commented on ZOOKEEPER-908: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch against trunk revision 1033770. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console This message is automatically generated.
[jira] Created: (ZOOKEEPER-929) hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested
hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested --- Key: ZOOKEEPER-929 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-929 Project: Zookeeper Issue Type: Bug Components: build Reporter: Patrick Hunt Assignee: Nigel Daley Hi Nigel, can you take a look at this? Below is the email I got. Notice that the patch is patch 908; however, if you look at the Hudson page it's linked to, the change is documented as the 909 patch file applied: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25/changes I looked at both JIRAs, ZOOKEEPER-908 and ZOOKEEPER-909. Both look good (the right names on patches), and qabot actually updated 908 with the comment (failure). However, the change is listed as 909, which is wrong. [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch [exec] against trunk revision 1033770. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests.
[exec] [exec] Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/ [exec] Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console [exec] [exec] This message is automatically generated. [exec]
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931085#action_12931085 ] Vishal K commented on ZOOKEEPER-928: Hi Flavio, That's correct. I was planning to make this change (in addition to other changes) as part of ZOOKEEPER-900. But now I think it is better to make this change first and not wait for the other changes, so that we don't have to wait until 3.4.0 for this fix. At least that will get us around the block-forever problem. -Vishal
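The timed read discussed in this thread relies on the fact that a blocking `SocketChannel.read(ByteBuffer)` does not honor SO_TIMEOUT, whereas a read through the channel's socket input stream does. A small self-contained sketch of that behavior, under stated assumptions: `TimedChannelRead` and `readTimesOut` are hypothetical names, and a local server socket that never writes stands in for an unresponsive peer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.nio.channels.SocketChannel;

/** Hypothetical demo: a read through the channel's socket honors SO_TIMEOUT. */
public class TimedChannelRead {
    /**
     * Connects to a local server that never sends anything, then reads through
     * s.socket().getInputStream(). Returns true if the read gave up with a
     * SocketTimeoutException instead of blocking indefinitely.
     */
    public static boolean readTimesOut(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             SocketChannel s = SocketChannel.open(
                     new InetSocketAddress(loopback, silentPeer.getLocalPort()))) {
            s.socket().setSoTimeout(timeoutMs); // applies to stream reads only
            byte[] msgBytes = new byte[8];
            try {
                s.socket().getInputStream().read(msgBytes);
                return false; // data or EOF arrived; not expected in this setup
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```

The channel must be in blocking mode (the default) for the stream read to work. A thread such as the Listener could treat the timeout as a failed handshake and move on to the next incoming connection.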
[jira] Updated: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-908: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 change looks good to me. Thanks Thomas! Committed to trunk.
[jira] Commented: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931092#action_12931092 ] Patrick Hunt commented on ZOOKEEPER-780: Agreed (no prev tests) but really this highlights that there should be. Thanks!
[jira] Commented: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931300#action_12931300 ] Hudson commented on ZOOKEEPER-908: -- Integrated in ZooKeeper-trunk #999 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/999/]) ZOOKEEPER-908. Remove code duplication and inconsistent naming in ClientCnxn.Packet creation