[jira] [Commented] (ZOOKEEPER-1460) IPv6 literal address not supported for quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745633#comment-14745633 ]

Hongchao Deng commented on ZOOKEEPER-1460:
------------------------------------------

+1. Thanks Raul for raising this. I do remember there are some IPv6 issues.

> IPv6 literal address not supported for quorum members
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-1460
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1460
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Chris Dolan
>            Assignee: Thawan Kooburat
>         Attachments: ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.diff
>
> Via code inspection, I see that the "server.nnn" configuration key does not support literal IPv6 addresses because the property value is split on ":". In v3.4.3, the problem is in QuorumPeerConfig:
> {noformat}
> String parts[] = value.split(":");
> InetSocketAddress addr = new InetSocketAddress(parts[0], Integer.parseInt(parts[1]));
> {noformat}
> In the current trunk (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup) this code has been refactored into QuorumPeer.QuorumServer, but the bug remains:
> {noformat}
> String serverClientParts[] = addressStr.split(";");
> String serverParts[] = serverClientParts[0].split(":");
> addr = new InetSocketAddress(serverParts[0], Integer.parseInt(serverParts[1]));
> {noformat}
> This bug probably affects very few users because most will naturally use a hostname rather than a literal IP address. But given that IPv6 addresses are supported for clients via ZOOKEEPER-667, it seems that server support should be fixed too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
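The square-bracket-delimited parsing that the attachment's title suggests can be sketched as follows. This is an illustrative stand-alone version, not the attached patch; the class and method names are hypothetical:

```java
// Hypothetical sketch: parse "host:port" where host may be a
// square-bracket-delimited IPv6 literal, e.g. "[2001:db8::1]:2888",
// instead of naively splitting the whole string on ":".
public class QuorumAddressParser {
    public static String[] parseHostPort(String value) {
        String host;
        String port;
        if (value.startsWith("[")) {
            int end = value.indexOf(']');
            if (end < 0 || end + 1 >= value.length() || value.charAt(end + 1) != ':') {
                throw new IllegalArgumentException("Bad IPv6 literal: " + value);
            }
            host = value.substring(1, end);   // literal without brackets
            port = value.substring(end + 2);  // text after "]:"
        } else {
            // IPv4 or hostname: split on the last ':' so a stray colon
            // earlier in the string cannot confuse the parse
            int colon = value.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No port in: " + value);
            }
            host = value.substring(0, colon);
            port = value.substring(colon + 1);
        }
        return new String[] { host, port };
    }
}
```

The brackets follow the usual URI convention for IPv6 literals, which is also what the attachment name implies.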
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717839#comment-14717839 ]

Hongchao Deng commented on ZOOKEEPER-2101:
------------------------------------------

I have one and only one comment, on the swallowed exception mentioned above. It would be great if other committers could review and give more feedback. [~liushaohui], are you still available for the JIRA? Otherwise I can take care of it. I want to get this done by the weekend.

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

                Key: ZOOKEEPER-2101
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
            Project: ZooKeeper
         Issue Type: Bug
         Components: jute
   Affects Versions: 3.4.4
           Reporter: Liu Shaohui
           Fix For: 3.5.2, 3.6.0
        Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff

*Problem*
For a multi operation, PrepRequestProcessor may produce a transaction whose size is larger than the max buffer size of jute. There is a check of the buffer size in the readBuffer method of BinaryInputArchive, but no check in the writeBuffer method of BinaryOutputArchive, which causes the following:

1. The leader can sync the transaction to its txn log and send the large transaction to the followers, but the followers fail to read the transaction and cannot sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}

2. The leader loses all its followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.

{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}

3. The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service is unavailable until we enlarge jute.maxbuffer and restart the zookeeper and hbase clusters.

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to the followers. But I am not sure whether there are side effects of throwing an IOException in BinaryOutputArchive and the RequestProcessors.
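The proposed write-side check can be sketched as below. This is a hedged illustration, not the attached patch: the class name is invented, and the maxBuffer field stands in for the real BinaryOutputArchive and the jute.maxbuffer property.

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the idea: mirror readBuffer()'s sanity check on the write
// side, so an oversized transaction fails fast in the leader instead
// of poisoning the txn log and the followers.
public class CheckedOutputArchive {
    private final DataOutputStream out;
    private final int maxBuffer; // stand-in for jute.maxbuffer

    public CheckedOutputArchive(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    public void writeBuffer(byte[] buf, String tag) throws IOException {
        if (buf == null) {
            out.writeInt(-1); // jute encodes a null buffer as length -1
            return;
        }
        if (buf.length > maxBuffer) {
            // Reject here, before anything reaches disk or the quorum.
            throw new IOException("Unreasonable length = " + buf.length);
        }
        out.writeInt(buf.length);
        out.write(buf);
    }
}
```

Rejecting at write time turns the scenario above into a single failed client request instead of a cluster-wide outage.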
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715503#comment-14715503 ]

Hongchao Deng commented on ZOOKEEPER-2101:
------------------------------------------

I will review it by this weekend and hopefully get it committed soon.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700205#comment-14700205 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Committed to branch-3.4: https://github.com/apache/zookeeper/commit/91f579e40755de870ed9123c8fd55925517d9aa6

Thanks [~rakeshr]!

Improve Thread handling
-----------------------

                Key: ZOOKEEPER-1907
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
            Project: ZooKeeper
         Issue Type: Improvement
         Components: server
   Affects Versions: 3.5.0
           Reporter: Rakesh R
           Assignee: Rakesh R
           Fix For: 3.4.7, 3.5.1, 3.6.0
        Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch

The server has many critical threads running and coordinating with each other, such as the RequestProcessor chains. Going through these threads, most of them have a similar structure:

{code}
public void run() {
    try {
        while (running) {
            // processing logic
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    } catch (Exception e) {
        LOG.error("Unexpected exception", e);
    }
    LOG.info("... exited loop!");
}
{code}

From the design I can see there is a chance of silently leaving the thread by swallowing an exception. If this happens in production, the server would hang forever and would not be able to deliver its role, and it is hard for a management tool to detect this. The idea of this JIRA is to discuss and improve this.

Reference: [Community discussion thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]
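One way to avoid swallowing the exception, sketched below, is to record and escalate the failure so a management layer can observe it. This is a minimal stand-alone illustration of the pattern under discussion, not code from the patch; the class name and handler are assumptions.

```java
// Sketch: a critical thread that funnels any unexpected exception to a
// handler instead of logging it and silently exiting its run() loop.
public class CriticalThread extends Thread {
    private volatile boolean running = true;
    private volatile Throwable failure; // visible to monitoring code

    private final Runnable work;

    public CriticalThread(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            while (running) {
                work.run();
                running = false; // single pass is enough for this sketch
            }
        } catch (Throwable t) {
            failure = t;      // do not swallow: record it...
            handleFailure(t); // ...and escalate to a hook a subclass can override
        }
    }

    protected void handleFailure(Throwable t) {
        System.err.println("Unexpected exception, exiting thread: " + t);
    }

    public Throwable getFailure() {
        return failure;
    }
}
```

A supervisor can poll getFailure() (or override handleFailure) to notice that the thread died, which is exactly what the flat catch-and-log structure above makes impossible.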
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692343#comment-14692343 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

+1. I have reviewed the PR and run the unit tests locally. It's nice work! Would any other committer have time to review it too? Otherwise, I will get this in, probably by next week.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658474#comment-14658474 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

I used rbtools to upload patches. The web interface has been broken for me for a long time.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658788#comment-14658788 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Yes! That would be great.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654868#comment-14654868 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

GJ Rakesh. Do you mind uploading it to ReviewBoard? I would like to give some comments and definitely get this in ASAP.
[jira] [Commented] (ZOOKEEPER-2233) Invalid description in the comment of LearnerHandler.syncFollower()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626788#comment-14626788 ]

Hongchao Deng commented on ZOOKEEPER-2233:
------------------------------------------

LGTM. +1

Invalid description in the comment of LearnerHandler.syncFollower()
-------------------------------------------------------------------

                Key: ZOOKEEPER-2233
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2233
            Project: ZooKeeper
         Issue Type: Improvement
           Reporter: Hitoshi Mitake
           Assignee: Hitoshi Mitake
           Priority: Trivial
        Attachments: ZOOKEEPER-2233.patch

LearnerHandler.syncFollower() has a comment like below: "When leader election is completed, the leader will set its lastProcessedZxid to be (epoch < 32). There will be no txn associated with this zxid." However, IIUC, the expression {{epoch < 32}} (a comparison) should be {{epoch << 32}} (a bitshift). Of course the error is very trivial, but it was a little bit confusing for me, so I'd like to fix it.
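For context, the zxid layout that comment refers to packs the epoch into the high 32 bits of the 64-bit id, so a fresh leader starts at (epoch << 32). A small sketch of that packing (illustrative helper names, not ZooKeeper's own utility class):

```java
// Sketch of the zxid layout: high 32 bits = epoch, low 32 bits = counter.
public class ZxidSketch {
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static long epochOf(long zxid) {
        return zxid >> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```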
[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603285#comment-14603285 ]

Hongchao Deng commented on ZOOKEEPER-2164:
------------------------------------------

It's on my plan to write a patch for this. I'm currently involved in internal work; I should be able to get to this after that. In the meantime, it sounds like you have a good testing plan. It would be nice if you could share it. :)

fast leader election keeps failing
----------------------------------

                Key: ZOOKEEPER-2164
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
            Project: ZooKeeper
         Issue Type: Bug
         Components: leaderElection
   Affects Versions: 3.4.5
           Reporter: Michi Mutsuzaki
           Assignee: Hongchao Deng
           Fix For: 3.5.2, 3.6.0

I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When I shut down 2, 1 and 3 keep going back to leader election. Here is what seems to be happening:

- Both 1 and 3 elect 3 as the leader.
- 1 receives votes from 3 and itself, and starts trying to connect to 3 as a follower.
- 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't time out for 5 seconds: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
- By the time 3 receives votes, 1 has given up trying to connect to 3: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247

I'm using 3.4.5, but it looks like this part of the code hasn't changed for a while, so I'm guessing later versions have the same issue.
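The timing problem described above comes from a blocking connect to a dead peer stalling the thread that also delivers election notifications. The general remedy, bounding the connect attempt, can be sketched as follows; the class and parameter names are assumptions for illustration, not ZooKeeper's API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: a connect attempt with an explicit timeout, so a dead peer
// makes the caller fail fast instead of blocking for the full default.
public class BoundedConnect {
    public static boolean tryConnect(InetSocketAddress peer, int cnxTimeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(peer, cnxTimeoutMs); // bounded, unlike new Socket(host, port)
            return true;
        } catch (IOException e) {
            return false; // peer is down or unreachable
        }
    }
}
```

Keeping this bound well under the follower's "waiting for leader" window is what prevents the two timeouts from racing each other as in the scenario above.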
[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599725#comment-14599725 ]

Hongchao Deng commented on ZOOKEEPER-1000:
------------------------------------------

Can you open a new JIRA?

Provide SSL in zookeeper to be able to run cross colos.
-------------------------------------------------------

                Key: ZOOKEEPER-1000
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
            Project: ZooKeeper
         Issue Type: Improvement
           Reporter: Mahadev konar
           Assignee: Mahadev konar
           Fix For: 3.5.2, 3.6.0

This jira is to track SSL for zookeeper. The inter-zookeeper-server communication and the client-to-server communication should be over SSL so that zookeeper can be deployed over WANs.
[jira] [Commented] (ZOOKEEPER-2220) Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599923#comment-14599923 ]

Hongchao Deng commented on ZOOKEEPER-2220:
------------------------------------------

Can you give more details? Even a log file would explain more.

Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
---------------------------------------------------------------

                Key: ZOOKEEPER-2220
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2220
            Project: ZooKeeper
         Issue Type: Bug
         Components: c client
   Affects Versions: 3.5.0
        Environment: Alpha
           Reporter: rupa mogali

I am trying to test SSL connectivity between client and server following the instructions on the following page: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

But I get the following when trying to connect to the server from the client:

{noformat}
2015-06-24 12:14:36,589 [myid:] - INFO [main:ZooKeeper@709] - Initiating client connection, connectString=localhost:2282 sessionTimeout=3 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@f2a0b8e
Exception in thread "main" java.io.IOException: Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
{noformat}

Can you tell me what I am doing wrong here? Very new to Zookeeper. Thanks!
[jira] [Commented] (ZOOKEEPER-2220) Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599946#comment-14599946 ]

Hongchao Deng commented on ZOOKEEPER-2220:
------------------------------------------

{code}
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.ClientCnxnSocketNetty
{code}

The version you are using is not up to date.
[jira] [Commented] (ZOOKEEPER-602) log all exceptions not caught by ZK threads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592272#comment-14592272 ]

Hongchao Deng commented on ZOOKEEPER-602:
-----------------------------------------

+1. Thanks Rakesh and Raul!

log all exceptions not caught by ZK threads
-------------------------------------------

                Key: ZOOKEEPER-602
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-602
            Project: ZooKeeper
         Issue Type: Bug
         Components: java client, server
   Affects Versions: 3.2.1
           Reporter: Patrick Hunt
           Assignee: Rakesh R
           Priority: Blocker
           Fix For: 3.4.7, 3.5.0
        Attachments: ZOOKEEPER-602-br3-4.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch

The java code should add a ThreadGroup exception handler that logs at ERROR level any uncaught exceptions thrown by Thread run methods.
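The mechanism the ticket asks for can be sketched with the JDK's default uncaught-exception handler. This is a minimal illustration, assuming a static field standing in for a real logger; it is not the attached patch:

```java
// Sketch: install a process-wide handler so any Thread.run() that dies
// with an uncaught exception is recorded at ERROR level instead of lost.
public class UncaughtLogger {
    // Stand-in for LOG.error(...); a real implementation would log instead.
    public static volatile String lastError;

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, t) ->
            lastError = "ERROR uncaught exception in " + thread.getName() + ": " + t);
    }
}
```

The handler runs in the dying thread itself, just before it terminates, so management tooling sees the failure even when the run() body had no catch block at all.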
[jira] [Created] (ZOOKEEPER-2214) Findbugs warning: LearnerHandler.packetToString Dead store to local variable
Hongchao Deng created ZOOKEEPER-2214: Summary: Findbugs warning: LearnerHandler.packetToString Dead store to local variable Key: ZOOKEEPER-2214 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2214 Project: ZooKeeper Issue Type: Improvement Reporter: Hongchao Deng Assignee: Hongchao Deng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2214) Findbugs warning: LearnerHandler.packetToString Dead store to local variable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2214:
-------------------------------------
    Attachment: ZOOKEEPER-2214.patch
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2213:
-------------------------------------
    Attachment: ZOOKEEPER-2213.patch

Empty path in Set crashes server and prevents restart
-----------------------------------------------------

                Key: ZOOKEEPER-2213
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2213
            Project: ZooKeeper
         Issue Type: Bug
         Components: server
   Affects Versions: 3.4.5
           Reporter: Brian Brazil
           Assignee: Hongchao Deng
           Priority: Blocker
           Fix For: 3.4.7, 3.5.1, 3.6.0
        Attachments: ZOOKEEPER-2213.patch, ZOOKEEPER-2213.patch, ZOOKEEPER-2213.patch

See https://github.com/samuel/go-zookeeper/issues/62

I've reproduced this on 3.4.5 with the code:

{code}
c, _, _ := zk.Connect([]string{"127.0.0.1"}, time.Second)
c.Set("", []byte{}, 0)
{code}

This crashes a local zookeeper 3.4.5 server:

{code}
2015-06-10 16:21:10,862 [myid:] - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.lang.IllegalArgumentException: Invalid path
        at org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
        at org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
        at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
        at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:965)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
{code}

On restart the zookeeper server crashes out:

{code}
2015-06-10 16:22:21,352 [myid:] - ERROR [main:ZooKeeperServerMain@54] - Invalid arguments, exiting abnormally
java.lang.IllegalArgumentException: Invalid path
        at org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
        at org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
        at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:198)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}
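The class of fix for this kind of bug is to validate the path before the request ever reaches the transaction log. A hedged sketch of such a guard, with an invented class name and rules modeled on ZooKeeper's path conventions (absolute, no trailing slash, no empty or relative segments), not the attached patch itself:

```java
// Sketch: reject malformed znode paths (like the empty path in the
// report above) at request-prep time, so an invalid path can never be
// persisted into a txn that later refuses to load.
public class PathValidator {
    public static boolean isValid(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            return false; // must be absolute; "" is what crashed the server
        }
        if (path.length() == 1) {
            return true;  // the root "/"
        }
        if (path.endsWith("/")) {
            return false; // no trailing slash
        }
        for (String segment : path.substring(1).split("/", -1)) {
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                return false; // no empty or relative segments
            }
        }
        return true;
    }
}
```

Validating up front turns the unrecoverable server crash into a plain bad-request error returned to the client.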
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582132#comment-14582132 ] Hongchao Deng commented on ZOOKEEPER-2213:
Hi [~rgs], thanks for the suggestion. I have created ZOOKEEPER-2214 to fix the findbugs warning. The latest patch cleans that part up.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582167#comment-14582167 ] Hongchao Deng commented on ZOOKEEPER-2213:
Thanks for the review. I will submit a patch for the 3.4 branch shortly.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582199#comment-14582199 ] Hongchao Deng commented on ZOOKEEPER-2213:
I wonder if we should add validation to OpCode.check too; I think we might have missed that. I will add that check as well.
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213-branch34.patch
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213.patch
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582210#comment-14582210 ] Hongchao Deng commented on ZOOKEEPER-2213:
The latest patch adds validation to OpCode.check too. Also submitted a patch for branch-3.4.
[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580671#comment-14580671 ] Hongchao Deng commented on ZOOKEEPER-1000:
Yes. I'm currently working on server-server as well as client-server SSL, which can be backported onto the 3.4 branch. It is taking some time, though.

Provide SSL in zookeeper to be able to run cross colos.

Key: ZOOKEEPER-1000
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
Project: ZooKeeper
Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
Fix For: 3.5.2, 3.6.0

This jira is to track SSL for zookeeper. The inter zookeeper server communication and the client to server communication should be over ssl so that zookeeper can be deployed over WAN's.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580691#comment-14580691 ] Hongchao Deng commented on ZOOKEEPER-2213:
It seems that the ZK Java client does a lot of checking locally before sending packets to the server: https://github.com/apache/zookeeper/blob/26e8dd6e90726997a37965ef469e37a96ef7085f/src/java/main/org/apache/zookeeper/common/PathUtils.java#L43 As a result, if the server receives any kind of malformed path, it breaks this assumption: https://github.com/apache/zookeeper/blob/26e8dd6e90726997a37965ef469e37a96ef7085f/src/java/main/org/apache/zookeeper/common/PathTrie.java#L258-L260 Such a user error shouldn't bring the server down. We can either return an error to the client or just close the connection. Let me think about it more.
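The client-side checks referenced in the comment above can be sketched in Go (the language of the original repro). This is an illustrative approximation of the rules the Java client enforces in PathUtils.validatePath, not code from either go-zookeeper or ZooKeeper; the function name and error messages are made up for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// validatePath approximates the client-side path validation that
// ZooKeeper's Java PathUtils.validatePath performs before a request is
// sent. The Go client in the linked report skipped such checks, which is
// how an empty path reached the server and crashed it.
func validatePath(path string) error {
	if len(path) == 0 {
		return fmt.Errorf("path cannot be empty")
	}
	if path[0] != '/' {
		return fmt.Errorf("path must start with /")
	}
	if len(path) == 1 { // "/" is the root, always valid
		return nil
	}
	if path[len(path)-1] == '/' {
		return fmt.Errorf("path must not end with /")
	}
	// Reject empty node names ("//") and relative segments ("." / "..").
	for _, seg := range strings.Split(path[1:], "/") {
		switch seg {
		case "":
			return fmt.Errorf("empty node name in path")
		case ".", "..":
			return fmt.Errorf("relative path segment %q not allowed", seg)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"", "/", "/a/b", "a/b", "/a//b", "/a/../b", "/a/"} {
		fmt.Printf("%q -> %v\n", p, validatePath(p))
	}
}
```

With a check like this in the client, the `c.Set("", ...)` call from the repro would fail locally instead of reaching the server.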
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580937#comment-14580937 ] Hongchao Deng commented on ZOOKEEPER-2213:
There is one more thing I'm not sure about. I thought SetData should return a NoNodeException, but it didn't. That's because the DataTree also treats the empty string as the root /: https://github.com/apache/zookeeper/blob/71401b4842b0486716f96d9ea3060d4fba65be96/src/java/main/org/apache/zookeeper/server/DataTree.java#L292 This is an inconsistent assumption, because path checking treats the empty string as invalid. Anyway, I agree with Raul that to fix this we only need to add validatePath() for SetData and SetACL. Adding the check is the more robust fix.
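The fix direction discussed above, validating the path before the operation is ever dispatched, can be illustrated with a hypothetical wrapper. The `setter` interface, `safeSet`, and `fakeConn` below are invented for this sketch and do not exist in go-zookeeper or ZooKeeper; the point is only that an invalid path is rejected locally and never reaches the server.

```go
package main

import (
	"errors"
	"fmt"
)

// setter abstracts the Set call of a ZooKeeper client (for example
// go-zookeeper's Conn.Set); the interface is illustrative only.
type setter interface {
	Set(path string, data []byte, version int32) error
}

// safeSet rejects obviously invalid paths locally instead of letting
// them reach a server that assumes clients have already validated them.
func safeSet(c setter, path string, data []byte, version int32) error {
	if len(path) == 0 || path[0] != '/' {
		return errors.New("zk: invalid path")
	}
	return c.Set(path, data, version)
}

// fakeConn counts how many Set calls actually reach the "server".
type fakeConn struct{ calls int }

func (f *fakeConn) Set(path string, data []byte, version int32) error {
	f.calls++
	return nil
}

func main() {
	c := &fakeConn{}
	fmt.Println(safeSet(c, "", nil, 0))      // rejected locally, never sent
	fmt.Println(safeSet(c, "/node", nil, 0)) // forwarded to the connection
	fmt.Println("server calls:", c.calls)
}
```

Server-side, the patch discussed here takes the analogous step in PrepRequestProcessor for SetData and SetACL, so a misbehaving client cannot crash the ensemble.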
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213.patch
Addressed comments and fixed the findbugs warning.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581337#comment-14581337 ] Hongchao Deng commented on ZOOKEEPER-2213: -- I will come up with a patch for the 3.4 branch if there is no other comment on the current patch. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213: - Attachment: ZOOKEEPER-2213.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng reassigned ZOOKEEPER-2213: Assignee: Hongchao Deng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2201) Network issues can cause cluster to hang due to near-deadlock
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575364#comment-14575364 ] Hongchao Deng commented on ZOOKEEPER-2201: -- +1 The patch looks good! Network issues can cause cluster to hang due to near-deadlock - Key: ZOOKEEPER-2201 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2201 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.6 Reporter: Donny Nadolny Assignee: Donny Nadolny Priority: Critical Fix For: 3.4.7, 3.5.2 Attachments: ZOOKEEPER-2201-branch-34.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch {{DataTree.serializeNode}} synchronizes on the {{DataNode}} it is about to serialize then writes it out via {{OutputArchive.writeRecord}}, potentially to a network connection. Under default linux TCP settings, a network connection where the other side completely disappears will hang (blocking on the {{java.net.SocketOutputStream.socketWrite0}} call) for over 15 minutes. During this time, any attempt to create/delete/modify the {{DataNode}} will cause the leader to hang at the beginning of the request processor chain: {noformat} ProcessThread(sid:5 cport:-1): prio=10 tid=0x026f1800 nid=0x379c waiting for monitor entry [0x7fe6c2a8c000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.zookeeper.server.PrepRequestProcessor.getRecordForPath(PrepRequestProcessor.java:163) - waiting to lock 0xd4cd9e28 (a org.apache.zookeeper.server.DataNode) - locked 0xd2ef81d0 (a java.util.ArrayList) at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:345) at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:534) at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:131) {noformat} Additionally, any attempt to send a snapshot to a follower or to disk will hang. 
Because the ping packets are sent by another thread which is unaffected, followers never time out and become leader, even though the cluster will make no progress until either the leader is killed or the TCP connection times out. This isn't exactly a deadlock since it will resolve itself eventually, but as mentioned above this will take 15 minutes with the default TCP retry settings in linux. A simple solution to this is: in {{DataTree.serializeNode}} we can take a copy of the contents of the {{DataNode}} (as is done with its children) in the synchronized block, then call {{writeRecord}} with the copy of the {{DataNode}} outside of the original {{DataNode}} synchronized block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
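The copy-then-write fix described above can be sketched as follows (illustrative stand-in types, not the actual ZooKeeper DataNode/OutputArchive classes): copy the node's state while holding its monitor, then perform the potentially blocking write without the lock, so a stalled socket can no longer block request processing on that node.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative stand-in for DataNode; the point is the locking pattern,
// not the real serialization format.
class Node {
    byte[] data = "payload".getBytes();
}

public class SerializeSketch {
    static void serializeNode(Node node, OutputStream out) throws IOException {
        byte[] copy;
        synchronized (node) {   // hold the node's lock only while copying
            copy = node.data.clone();
        }
        out.write(copy);        // slow (possibly hung) write happens unlocked
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeNode(new Node(), out);
        System.out.println(out.size());
    }
}
```

The trade-off is a transient extra copy per node during snapshot serialization, which is cheap compared to minutes of blocked writes on a dead TCP connection.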
[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573365#comment-14573365 ] Hongchao Deng commented on ZOOKEEPER-2163: -- I think it is a good feature to go into 3.5 too :) Introduce new ZNode type: container --- Key: ZOOKEEPER-2163 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163 Project: ZooKeeper Issue Type: New Feature Components: c client, java client, server Affects Versions: 3.5.0 Reporter: Jordan Zimmerman Assignee: Jordan Zimmerman Fix For: 3.6.0 Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch BACKGROUND A recurring problem for ZooKeeper users is garbage collection of parent nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a parent node under which participants create sequential nodes. When the participant is done, it deletes its node. In practice, the ZooKeeper tree begins to fill up with orphaned parent nodes that are no longer needed. The ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can become unstable due to the number of these nodes. CURRENT SOLUTIONS === Apache Curator has a workaround solution for this by providing the Reaper class which runs in the background looking for orphaned parent nodes and deleting them. This isn’t ideal and it would be better if ZooKeeper supported this directly. PROPOSAL = ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes to contain child nodes. This is not optimum as EPHEMERALs are tied to a session and the general use case of parent nodes is for PERSISTENT nodes. This proposal adds a new node type, CONTAINER. 
A CONTAINER node is the same as a PERSISTENT node with the additional property that when its last child is deleted, it is deleted (and CONTAINER nodes recursively up the tree are deleted if empty). CANONICAL USAGE
{code}
while (true) { // or some reasonable limit
    try {
        zk.create(path, ...);
        break;
    } catch (KeeperException.NoNodeException e) {
        try {
            zk.createContainer(containerPath, ...);
        } catch (KeeperException.NodeExistsException ignore) {
        }
    }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2204) LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573109#comment-14573109 ] Hongchao Deng commented on ZOOKEEPER-2204: -- +1 The patch looks good. LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally - Key: ZOOKEEPER-2204 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2204 Project: ZooKeeper Issue Type: Test Affects Versions: 3.5.0 Reporter: Donny Nadolny Assignee: Donny Nadolny Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2204.patch, ZOOKEEPER-2204.patch The {{LearnerSnapshotThrottler}} will only allow 2 concurrent snapshots to be taken, and if there are already 2 snapshots in progress it will wait up to 200ms for one to complete. This isn't enough time for {{testHighContentionWithTimeout}} to consistently pass - on a cold JVM running just the one test I was able to get it to fail 3 times in around 50 runs. This 200ms timeout will be hit if there is a delay between a thread calling {{LearnerSnapshot snap = throttler.beginSnapshot(false);}} and {{throttler.endSnapshot();}}. This also erroneously fails on the build server, see https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2747/testReport/org.apache.zookeeper.server.quorum/LearnerSnapshotThrottlerTest/testHighContentionWithTimeout/ for an example. I have bumped the timeout up to 5 seconds (which should be more than enough for warmup / gc pauses), as well as added logging to the {{catch (Exception e)}} block to assist in debugging any future issues. An alternate approach would be to separate out results gathered from the threads, because although we only record true/false there are really three outcomes: 1. The {{snapshotNumber}} was <= 2, meaning the individual call operated correctly 2. The {{snapshotNumber}} was > 2, meaning the test should definitely fail 3. 
We were unable to snapshot in the time given, so we can't determine if we should fail or pass (although if we have enough successes from #1 with no failures from #2 maybe we would pass the test anyway). Bumping up the timeout is easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
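The throttler contract described above, at most two concurrent snapshots with a bounded wait for a slot, can be modeled with a plain java.util.concurrent.Semaphore. This is a behavioral sketch only, not the real LearnerSnapshotThrottler code:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Behavioral model: 2 permits = 2 concurrent snapshots; tryAcquire with a
// timeout mirrors beginSnapshot() giving up once the wait expires.
public class ThrottleSketch {
    private final Semaphore slots = new Semaphore(2);

    boolean beginSnapshot(long timeoutMs) throws InterruptedException {
        return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void endSnapshot() {
        slots.release();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleSketch t = new ThrottleSketch();
        System.out.println(t.beginSnapshot(200)); // first slot
        System.out.println(t.beginSnapshot(200)); // second slot
        System.out.println(t.beginSnapshot(200)); // both busy: waits, then fails
        t.endSnapshot();
        System.out.println(t.beginSnapshot(200)); // a slot was freed
    }
}
```

The model also shows why the test is timing-sensitive: whether the third acquire succeeds depends entirely on whether some other thread releases a slot within the timeout window, which a GC pause or cold JVM can easily prevent.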
[jira] [Commented] (ZOOKEEPER-1546) Unable to load database on disk when restarting after node freeze
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573389#comment-14573389 ] Hongchao Deng commented on ZOOKEEPER-1546: -- Is this JIRA related to ZOOKEEPER-1573? I think they are duplicate. Unable to load database on disk when restarting after node freeze --- Key: ZOOKEEPER-1546 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1546 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.5 Reporter: Erik Forsberg One of my zookeeper servers in a quorum of 3 froze (probably due to underlying hardware problems). When restarting, zookeeper fails to start with the following in zookeeper.log: {noformat} 2012-09-04 09:02:35,300 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /etc/zookeeper/zoo.cfg 2012-09-04 09:02:35,316 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2012-09-04 09:02:35,333 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2012-09-04 09:02:35,358 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2012-09-04 09:02:35,379 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2012-09-04 09:02:35,386 - INFO [main:QuorumPeer@856] - initLimit set to 10 2012-09-04 09:02:35,523 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.500017240 2012-09-04 09:02:38,944 - ERROR [main:FileTxnSnapLog@226] - Failed to increment parent cversion for: /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at 
org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:224) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,945 - FATAL [main:QuorumPeer@400] - Unable to load database on disk java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,946 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) Caused 
by: java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 3 more {noformat} Removing data from /var/zookeeper/version-2 then restart seems to fix the
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Description: This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. == This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? was: This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. 
This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. == This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. 
Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Description: This was This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? was: This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. 
Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This was This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). 
I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
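The config conflict the reporter describes can be seen in miniature by modeling who receives an election notification. These are hypothetical types for illustration; the real logic lives in FastLeaderElection:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: server 3's config lists 1 and 2 as observers, so a
// voting-members-only broadcast from 3 reaches nobody who could dispute it.
public class NotifySketch {
    static class Member {
        final long id;
        final boolean voting;
        Member(long id, boolean voting) { this.id = id; this.voting = voting; }
    }

    static List<Long> targets(List<Member> members, boolean includeObservers) {
        List<Long> out = new ArrayList<>();
        for (Member m : members) {
            if (includeObservers || m.voting) {
                out.add(m.id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Server 3's (conflicting) view: 1 and 2 are observers, 3 is a participant.
        List<Member> config3 = Arrays.asList(
                new Member(1, false), new Member(2, false), new Member(3, true));
        System.out.println(targets(config3, false)); // voting-only: current behavior
        System.out.println(targets(config3, true));  // all members: proposed behavior
    }
}
```

Under the current behavior, server 3 hears no dissenting vote and can declare itself leader; broadcasting to all members would let 1 and 2 surface the config mismatch.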
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Summary: QuorumCnxManager: use BufferedOutputStream for initial msg (was: multiple leaders can be elected when configs conflict) QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189:
Description: This was the original JIRA of ZOOKEEPER-2203. For project-management reasons, all the issues and related discussion were moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098.
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570194#comment-14570194 ] Hongchao Deng commented on ZOOKEEPER-2189:
Hi, thanks for your understanding. I really appreciate your help. All you have to do is click More - Clone and clone this JIRA to a new one. Then I can take it myself and add a comment to tell people that the discussion here belongs to that JIRA :) Thanks for reporting the issue and contributing to the project too!
[jira] [Updated] (ZOOKEEPER-2203) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2203:
Summary: multiple leaders can be elected when configs conflict (was: CLONE - multiple leaders can be elected when configs conflict)

multiple leaders can be elected when configs conflict

Key: ZOOKEEPER-2203
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2203
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.5.0
Reporter: Akihiro Suda
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567512#comment-14567512 ] Hongchao Deng commented on ZOOKEEPER-2189:
I didn't offer anything... you might have misunderstood. What I meant is that I committed 2098, but I wrote the commit message as 2189 (I did not commit 2189). I wonder if you can give this JIRA number to me; I will replicate 2098 into this JIRA and mark it as a duplicate. You can create a new JIRA and move the discussion there. I would really appreciate it if you could help me. :)
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566120#comment-14566120 ] Hongchao Deng commented on ZOOKEEPER-2189:
Hi [~suda]. I committed ZOOKEEPER-2098 but mistakenly wrote the commit message as ZOOKEEPER-2189. Would you mind opening another JIRA and granting this JIRA number to me? Thanks!
[jira] [Commented] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565115#comment-14565115 ] Hongchao Deng commented on ZOOKEEPER-2187:
+1, thanks [~rgs]. I will commit this shortly.

remove duplicated code between CreateRequest{,2}

Key: ZOOKEEPER-2187
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2187
Project: ZooKeeper
Issue Type: Bug
Components: c client, java client, server
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Priority: Minor
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2187.patch

To avoid cargo-culting and to reduce duplicated code, we can merge most of CreateRequest and CreateRequest2, given that only the Response object is actually different. This will improve the readability of the code and make it less confusing for people adding new opcodes in the future (i.e., copying a request definition vs. reusing what's already there, etc.).
[jira] [Updated] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2187:
Fix Version/s: 3.5.1 (was: 3.5.2)
[jira] [Updated] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2187:
Release Note: (was: trunk: https://github.com/apache/zookeeper/commit/652a53618cf93165b67dce4816e4831d10393a03 branch-3.5: https://github.com/apache/zookeeper/commit/a9556fa88a441882624a0c6e57c442662514e94a)
[jira] [Commented] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565131#comment-14565131 ] Hongchao Deng commented on ZOOKEEPER-2187:
Committed.
trunk: https://github.com/apache/zookeeper/commit/652a53618cf93165b67dce4816e4831d10393a03
branch-3.5: https://github.com/apache/zookeeper/commit/a9556fa88a441882624a0c6e57c442662514e94a
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565337#comment-14565337 ] Hongchao Deng commented on ZOOKEEPER-2098:
+1, LGTM. Thanks [~rgs], I will merge this shortly.

QuorumCnxManager: use BufferedOutputStream for initial msg

Key: ZOOKEEPER-2098
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2098
Project: ZooKeeper
Issue Type: Improvement
Components: quorum, server
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2098.patch, ZOOKEEPER-2098.patch

Whilst writing fle-dump (a tool like [zk-dump|https://github.com/twitter/zktraffic/], but for dumping FastLeaderElection messages), I noticed that QCM is using DataOutputStream (which doesn't buffer) directly. So all calls to write() are written immediately to the network, which means simple messages like two participants exchanging Votes can take a couple of RTTs! This is especially terrible for global clusters (i.e., cross-country RTTs). The solution is to use BufferedOutputStream for the initial negotiation between members of the cluster. Note that there are other places where suboptimal (but not entirely unbuffered) writes to the network still exist; I'll get to those in separate tickets. After using BufferedOutputStream we get only 1 RTT for the initial message, so election time for participants joining a cluster is reduced.
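The effect described here can be illustrated outside ZooKeeper with a counting stream standing in for the peer socket. This is a sketch, not QuorumCnxManager's actual wire format: InitialMsgSketch, CountingStream, and the chosen fields (sid plus an address blob) are illustrative. Each call that reaches the unbuffered stream is a rough proxy for a separate network write.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class InitialMsgSketch {
    // Hypothetical stand-in for the peer's socket stream: counts every call
    // that reaches it, as a rough proxy for a separate network write.
    static class CountingStream extends ByteArrayOutputStream {
        int writes = 0;
        @Override public synchronized void write(int b) {
            writes++;
            super.write(b);
        }
        @Override public synchronized void write(byte[] b, int off, int len) {
            writes++;
            super.write(b, off, len);
        }
    }

    // Before the fix: DataOutputStream wraps the socket directly, so every
    // field of the initial message hits the "network" as it is written.
    static int unbufferedWrites(long sid, byte[] addr) {
        CountingStream net = new CountingStream();
        try (DataOutputStream dout = new DataOutputStream(net)) {
            dout.writeLong(sid);        // one 8-byte array write
            dout.writeInt(addr.length); // four single-byte writes
            dout.write(addr);           // one array write
            dout.flush();
        } catch (IOException e) {
            throw new AssertionError(e); // ByteArrayOutputStream never throws
        }
        return net.writes;
    }

    // After the fix: a BufferedOutputStream sits in between, so the whole
    // message reaches the "network" as a single chunk on flush().
    static int bufferedWrites(long sid, byte[] addr) {
        CountingStream net = new CountingStream();
        try (DataOutputStream dout =
                 new DataOutputStream(new BufferedOutputStream(net))) {
            dout.writeLong(sid);
            dout.writeInt(addr.length);
            dout.write(addr);
            dout.flush();
        } catch (IOException e) {
            throw new AssertionError(e);
        }
        return net.writes;
    }
}
```

With buffering, the entire initial message arrives at the stand-in stream in one chunk, which is why the negotiation fits in a single round trip.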
[jira] [Updated] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2098:
Fix Version/s: 3.5.1 (was: 3.5.2)
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565484#comment-14565484 ] Hongchao Deng commented on ZOOKEEPER-2098:
Committed:
trunk: https://github.com/apache/zookeeper/commit/0cbc8eee21bda31184d4e7f11100bc0bb300f376
branch-3.5: https://github.com/apache/zookeeper/commit/f1f7b3714c5c36f1408f5fe0f0d0b3da305b1023
[jira] [Updated] (ZOOKEEPER-2179) Typo in Watcher.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2179:
Issue Type: Improvement (was: Bug)

Typo in Watcher.java

Key: ZOOKEEPER-2179
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2179
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.5, 3.5.0
Reporter: Eunchan Kim
Priority: Trivial
Fix For: 3.4.5, 3.5.0
Attachments: ZOOKEEPER-2179.patch

In zookeeper/src/java/main/org/apache/zookeeper/Watcher.java, the line
"* implement. A ZooKeeper client will get various events from the ZooKeepr"
should be fixed to
"* implement. A ZooKeeper client will get various events from the ZooKeeper."
(ZooKeepr -> ZooKeeper)
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564191#comment-14564191 ] Hongchao Deng commented on ZOOKEEPER-832:
I mean expire the session on the client side. It's the client that's not consistent with the view. We should fix it on the client (by crashing it), not the server.

Invalid session id causes infinite loop during automatic reconnect

Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5, 3.5.0
Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
Priority: Blocker
Fix For: 3.4.7, 3.5.2, 3.6.0
Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch

Steps to reproduce:
1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect, but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session-invalid, similar to how the session-expired state is handled.
Server log output (repeats indefinitely):
2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client)

Client log output (repeats indefinitely):
11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181
11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
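One shape the suggested improvement could take on the client side, sketched with purely hypothetical names (ReconnectSketch, ClientState, onSessionRefused; the real client would surface this as an event to the default watcher, as proposed above): the client tracks consecutive session refusals and, past an illustrative threshold, enters a terminal session-invalid state instead of looping forever.

```java
// Hypothetical sketch of the proposal: rather than retrying indefinitely,
// the client counts consecutive "session refused" responses during
// reconnect and eventually surfaces a terminal SESSION_INVALID state,
// mirroring how session expiry is already handled. The threshold is an
// illustrative assumption, since an isolated refusal can also happen
// legitimately (e.g. "client must try another server").
public class ReconnectSketch {
    enum ClientState { CONNECTING, SESSION_INVALID }

    private final int maxRefusals;
    private int refusals = 0;
    private ClientState state = ClientState.CONNECTING;

    ReconnectSketch(int maxRefusals) {
        this.maxRefusals = maxRefusals;
    }

    // Called each time a server rejects the session id during reconnect.
    ClientState onSessionRefused() {
        refusals++;
        if (refusals >= maxRefusals) {
            state = ClientState.SESSION_INVALID; // terminal: stop retrying
        }
        return state;
    }
}
```

In the terminal state the application can react (e.g. rebuild its session from scratch) instead of spinning in the reconnect loop shown in the logs above.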
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564189#comment-14564189 ] Hongchao Deng commented on ZOOKEEPER-832: - I mean expire the session on client side. It's client who's not consistent with the view. We should fix it on client (by crashing it) not server. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5, 3.5.0 Environment: All Reporter: Ryan Holmes Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.7, 3.5.2, 3.6.0 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch Steps to reproduce: 1.) Connect to a standalone server using the Java client. 2.) Stop the server. 3.) Delete the contents of the data directory (i.e. the persisted session data). 4.) Start the server. The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled. 
Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564190#comment-14564190 ] Hongchao Deng commented on ZOOKEEPER-832: - I mean expire the session on client side. It's client who's not consistent with the view. We should fix it on client (by crashing it) not server. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5, 3.5.0 Environment: All Reporter: Ryan Holmes Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.7, 3.5.2, 3.6.0 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch Steps to reproduce: 1.) Connect to a standalone server using the Java client. 2.) Stop the server. 3.) Delete the contents of the data directory (i.e. the persisted session data). 4.) Start the server. The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled. 
Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
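The improvement suggested above — surfacing a terminal "session invalid" state instead of retrying forever — could be sketched on the client side roughly as follows. This is only an illustration under assumed names (ReconnectPolicy and Event are not ZooKeeper classes), and a real fix would key off the server's rejection reason rather than a bare retry count:

```java
// Hypothetical sketch: give up after a bounded number of consecutive
// refused reconnects and surface a terminal SESSION_INVALID event,
// instead of looping forever. Names are illustrative, not ZooKeeper API.
class ReconnectPolicy {
    enum Event { RETRY, SESSION_INVALID }

    private final int maxRefusals; // consecutive refusals tolerated before giving up
    private int refusals = 0;

    ReconnectPolicy(int maxRefusals) { this.maxRefusals = maxRefusals; }

    // Called after each refused connection attempt.
    Event onConnectionRefused() {
        refusals++;
        return refusals >= maxRefusals ? Event.SESSION_INVALID : Event.RETRY;
    }

    // Called after a successful (re)connect; resets the counter.
    void onConnected() { refusals = 0; }
}
```

A client embedding such a policy would deliver SESSION_INVALID to the default watcher the same way SESSION_EXPIRED is delivered today, giving applications a hook to rebuild their state.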
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561478#comment-14561478 ] Hongchao Deng commented on ZOOKEEPER-832: - Hi German, the failing test is a known flaky test. Regarding this issue, I consider it a client issue because the client has history that the server does not. The right thing to do is to expire/close the client. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557932#comment-14557932 ] Hongchao Deng commented on ZOOKEEPER-2101: -- It's just my personal opinion, but swallowing a system-failure exception doesn't look like a good choice. I usually prefer to let the system crash if the failure is not recoverable. I'm not sure why Leader and ZKDatabase did that, so I will leave it to others to comment. Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101 Project: ZooKeeper Issue Type: Bug Components: jute Affects Versions: 3.4.4 Reporter: Liu Shaohui Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff *Problem* For a multi operation, PrepRequestProcessor may produce a large transaction whose size exceeds the max buffer size of jute. There is a check of the buffer size in the readBuffer method of BinaryInputArchive, but no check in the writeBuffer method of BinaryOutputArchive, which causes the following: 1. The leader can sync the transaction to the txn log and send the large transaction to the followers, but the followers fail to read the transaction and can't sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}
2. The leader loses all followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.
{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}
3. The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}
The ZooKeeper service will be unavailable until we enlarge jute.maxbuffer and restart the ZooKeeper (HBase) cluster. *Solution* Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to the followers. But I am not sure whether there are side effects of throwing an IOException in BinaryOutputArchive
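The proposed write-side check could mirror the existing read-side one. A minimal sketch under assumed names (BoundedWriter is hypothetical; the real patch modifies BinaryOutputArchive#writeBuffer, and ZooKeeper's default jute.maxbuffer is 0xfffff bytes, about 1 MB):

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of a write-side length check mirroring the one in
// BinaryInputArchive#readBuffer. The real fix lives in
// BinaryOutputArchive#writeBuffer; 0xfffff is jute's default max buffer.
class BoundedWriter {
    static final int MAX_BUFFER = Integer.getInteger("jute.maxbuffer", 0xfffff);

    static void writeBuffer(DataOutputStream out, byte[] b) throws IOException {
        if (b != null && b.length > MAX_BUFFER) {
            // Refuse to serialize at all, instead of letting every follower
            // (and the next restart) choke on the oversized record later.
            throw new IOException("Unreasonable length = " + b.length);
        }
        out.writeInt(b == null ? -1 : b.length);
        if (b != null) {
            out.write(b);
        }
    }
}
```

Failing at write time keeps the bad transaction out of both the txn log and the quorum stream, which is exactly the unavailability scenario described above.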
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554808#comment-14554808 ] Hongchao Deng commented on ZOOKEEPER-2101: -- Another question: in SerializeUtils:
{code}
serializeRequest():
    catch (IOException e) {
        LOG.error("This really should be impossible", e);
{code}
If such an unexpected exception happens, should the exception go up and let the server fail? Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
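If the consensus is to fail fast rather than swallow the "should be impossible" exception, the pattern would look roughly like this (FailFast and SerializeFn are illustrative names, not ZooKeeper code):

```java
import java.io.IOException;

// Hypothetical sketch: escalate a "should be impossible" IOException
// instead of logging and continuing, so the server stops in a known-bad
// state rather than silently operating on unserializable data.
class FailFast {
    interface SerializeFn {
        byte[] serialize() throws IOException;
    }

    static byte[] serializeOrDie(SerializeFn fn) {
        try {
            return fn.serialize();
        } catch (IOException e) {
            // Unchecked rethrow: callers cannot accidentally swallow this.
            throw new IllegalStateException("Unrecoverable serialization failure", e);
        }
    }
}
```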
[jira] [Commented] (ZOOKEEPER-2191) Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555042#comment-14555042 ] Hongchao Deng commented on ZOOKEEPER-2191: -- +1 Thanks for your work, [~cnauroth]. Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task. - Key: ZOOKEEPER-2191 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2191 Project: ZooKeeper Issue Type: Improvement Components: build Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2191.001.patch, ZOOKEEPER-2191.002.patch ZOOKEEPER-2183 introduced usage of the threads attribute on the junit task call in build.xml to speed up test execution. This attribute is only available since Ant 1.9.4. However, we can continue to support older Ant versions by calling the antversion task and dispatching to a clone of our junit task call that doesn't use the threads attribute. Users of older Ant versions will get the slower single-process test execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Affects Version/s: 3.5.0 In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers --- Key: ZOOKEEPER-2190 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2190 Project: ZooKeeper Issue Type: Bug Reporter: Hongchao Deng Assignee: Hongchao Deng Attachments: ZOOKEEPER-2190.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Affects Version/s: (was: 3.5.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543995#comment-14543995 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I made a mistake in CHANGES.txt when switching branches... Committed another fix to branch-3.5: https://github.com/apache/zookeeper/commit/419756a3ff3be986d3bbcef12ebdfba5c1b68412 I feel bad about it and will be more careful on future commits. Concurrent Testing Processes and Port Assignments - Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
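One way to get cross-process uniqueness, sketched here with hypothetical names and constants, is to hand each concurrent test process a disjoint slice of the port space and keep the monotonic counter inside that slice (the actual ZOOKEEPER-2183 patch may derive its range differently, e.g. from a parameter passed by the build):

```java
// Hypothetical sketch: each concurrent test process gets a disjoint slice
// of the port space, so monotonic counters in different JVMs cannot collide.
// BASE, SLICE and the processIndex parameter are illustrative values, not
// the ones chosen by the actual patch.
class PortRange {
    static final int BASE = 11221;  // first port handed out to process 0
    static final int SLICE = 1000;  // ports reserved per process

    private final int first;
    private int next;

    PortRange(int processIndex) {
        this.first = BASE + processIndex * SLICE;
        this.next = first;
    }

    synchronized int unique() {
        if (next >= first + SLICE) {
            throw new IllegalStateException("port range exhausted");
        }
        return next++;
    }
}
```

With disjoint slices, two pre-commit jobs (or two forked test JVMs) on the same Jenkins host can both count upward without ever binding the same port.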
[jira] [Updated] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Summary: Concurrent Testing Processes and Port Assignments (was: Concurrent Testing Processes and Port Assignments.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Summary: Concurrent Testing Processes and Port Assignments. (was: Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544240#comment-14544240 ] Hongchao Deng commented on ZOOKEEPER-2190: -- It's good to go :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544480#comment-14544480 ] Hongchao Deng commented on ZOOKEEPER-901: - Sure. Thanks for it. I'm still catching up. It's great that we can discuss design and problems here. Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: ZooKeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Hongchao Deng Fix For: 3.6.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
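The single-thread-plus-selector design proposed above can be illustrated with a minimal, non-ZooKeeper echo loop: one thread drives a Selector that handles both accepting and reading, with no per-connection thread pair. This is only a toy sketch of the technique, not the QuorumCnxManager redesign itself:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Toy illustration: a single thread and one Selector accept connections and
// read data, replacing the pair-of-threads-per-connection pattern.
// Accepts one connection, echoes one byte back, then returns.
class SelectorSketch {
    static void serveOnce(int port) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", port));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(1);
            while (true) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        // Non-blocking accept; the new channel joins the same selector.
                        SocketChannel c = server.accept();
                        if (c != null) {
                            c.configureBlocking(false);
                            c.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        buf.clear();
                        if (c.read(buf) > 0) {
                            buf.flip();
                            c.write(buf); // echo the byte back
                            c.close();
                            return;
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }
}
```

Because accept never blocks the loop, a slow or unreachable peer cannot stall connection establishment with the other servers, which is exactly deficiency #1 above.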
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542947#comment-14542947 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I prefer 8. The threads attribute applies per class, so the test report is still clear. What's more, if it's a consistent failure, developers can reproduce it; if it's flaky, I doubt the logs help... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Attachment: ZOOKEEPER-2190.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543139#comment-14543139 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I separated it into another JIRA: ZOOKEEPER-2190. Let's fix it there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542991#comment-14542991 ] Hongchao Deng commented on ZOOKEEPER-2186: -- +1 The latest patch looks really good. Thanks for the clean patch, Raul! QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!):
{code}
public boolean receiveConnection(Socket sock) {
    Long sid = null;
    ...
    sid = din.readLong();
    // next comes the #bytes in the remainder of the message
    int num_remaining_bytes = din.readInt();
    byte[] b = new byte[num_remaining_bytes];
    // remove the remainder of the message from din
    int num_read = din.read(b);
{code}
This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
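The essence of the fix is to validate the announced length before allocating, and to read with readFully so a partial read can't be mistaken for a complete message. A simplified sketch with assumed names and an assumed cap (SafeReceive and MAX_MSG are illustrative; the actual patch chooses its own bound):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Simplified sketch of the fix: check the announced length before
// allocating, and use readFully so a short read cannot silently yield a
// truncated buffer. MAX_MSG is an assumed cap, not the real patch's bound.
class SafeReceive {
    static final int MAX_MSG = 1024;

    static byte[] readPayload(DataInputStream din) throws IOException {
        int len = din.readInt();
        if (len <= 0 || len > MAX_MSG) {
            // A malformed or malicious peer announced an absurd length.
            throw new IOException("Unreasonable message length: " + len);
        }
        byte[] b = new byte[len];
        din.readFully(b); // throws EOFException instead of returning short
        return b;
    }
}
```

Throwing before the allocation means a random or hostile connection can no longer trigger a huge `new byte[...]` inside the QuorumCnxManager thread.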
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Fix Version/s: 3.6.0 3.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Affects Version/s: 3.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
Hongchao Deng created ZOOKEEPER-2190: Summary: In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers Key: ZOOKEEPER-2190 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2190 Project: ZooKeeper Issue Type: Bug Reporter: Hongchao Deng Assignee: Hongchao Deng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543120#comment-14543120 ] Hongchao Deng commented on ZOOKEEPER-2190: -- [~michim] [~shralex] I found that the logic isn't right, as the bug shows up in ZK-2183. Can you take a look? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543079#comment-14543079 ] Hongchao Deng commented on ZOOKEEPER-2183: -- [~rakeshr] [~rgs] [~michim] [~shralex] Would any other committer like to review this? I will commit it once it gets another +1. Thanks! Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543118#comment-14543118 ] Hongchao Deng commented on ZOOKEEPER-2183: -- BTW, StandaloneDisabledTest.startSingleServerTest() fails intermittently on the following lines: {code} //reconfigure out leader and follower 1. Remaining follower //2 should elect itself as leader and run by itself reconfigServers.clear(); reconfigServers.add(Integer.toString(leaderId)); reconfigServers.add(Integer.toString(follower1)); testReconfig(follower2, false, reconfigServers); {code} I think the logic isn't correct because {code} ReconfigTest.testServerHasConfig(zkHandles[id], servers, null); {code} is testing the leaving servers as the joining servers, right? [~shralex] Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ZOOKEEPER-2094) SSL feature on Netty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng resolved ZOOKEEPER-2094. -- Resolution: Duplicate SSL feature on Netty Key: ZOOKEEPER-2094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2094 Project: ZooKeeper Issue Type: Sub-task Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Ian Dimayuga Assignee: Ian Dimayuga Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2094-git-apply.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, test.cert, testKeyStore.jks, testTrustStore.jks, testUntrustedKeyStore.jks Add SSL handler to Netty pipeline, and a default X509AuthenticationProvider to perform authentication. Review board: https://reviews.apache.org/r/30753/diff/# -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541427#comment-14541427 ] Hongchao Deng commented on ZOOKEEPER-2183: -- PortAssignment seems to make assumptions about the ant runtime environment. Do you think it would be better to write the code like: {code} public synchronized static int unique() { if (!initialized) { setupPortRange(); } } {code} A test is highly recommended for setupPortRange() too :) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
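The lazy-initialization idea from that comment could be filled out roughly as follows. This is a sketch only: the field names, the system-property keys, and the default port values are illustrative assumptions, not the committed patch.

```java
// Hypothetical expansion of the unique()/setupPortRange() pseudocode above:
// the port range is configured on the first call, regardless of when ant
// sets any properties, and the whole method stays synchronized.
public final class LazyPortAssignment {
    private static boolean initialized = false;
    private static int nextPort;
    private static int maxPort;

    public static synchronized int unique() {
        if (!initialized) {
            setupPortRange();
            initialized = true;
        }
        if (nextPort > maxPort) {
            throw new IllegalStateException("port range exhausted");
        }
        return nextPort++;
    }

    // Reads assumed system properties to select the band for this process,
    // falling back to a default band when they are absent.
    private static void setupPortRange() {
        nextPort = Integer.getInteger("test.portRange.start", 11221);
        maxPort = Integer.getInteger("test.portRange.end", 16221);
    }
}
```

Because initialization happens inside the synchronized method, no caller can observe a half-configured range, which is the property the comment is asking for.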
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Attachment: threads-change.patch Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540216#comment-14540216 ] Hongchao Deng commented on ZOOKEEPER-2183: -- The multi-threading change is very speedy. But I wonder whether the test failures are caused by the multi-threading change or by the port assignment change. Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540271#comment-14540271 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I see. That's why we need to change the port assignment at the same time. Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541324#comment-14541324 ] Hongchao Deng commented on ZOOKEEPER-2183: -- Thanks Chris! I'm reviewing the patch now. No worry about the Jenkins flood. That's what it's used for... Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539253#comment-14539253 ] Hongchao Deng commented on ZOOKEEPER-2183: -- StandaloneDisabledTest.startSingleServerTest was flaky in this case -- I ran that single test successfully locally. Let me run the entire test suite again... Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539286#comment-14539286 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Thank you! My mistake to forget close it. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538116#comment-14538116 ] Hongchao Deng commented on ZOOKEEPER-2186: -- [~rgs] Can you open a RB for this? I have some questions and comments to make. Thanks! QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
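A sketch of the defensive check this issue calls for: validate the announced length before allocating, and use readFully so a short read cannot silently truncate the message. The cap constant and helper name here are assumptions for illustration, not the committed ZOOKEEPER-2186 fix.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical bounded version of the vulnerable read in receiveConnection:
// reject absurd announced lengths instead of allocating them blindly.
final class BoundedRead {
    static final int MAX_MSG_BYTES = 512 * 1024; // illustrative sanity cap

    static byte[] readAnnounced(DataInputStream din) throws IOException {
        // next comes the #bytes in the remainder of the message
        int numRemainingBytes = din.readInt();
        if (numRemainingBytes < 0 || numRemainingBytes > MAX_MSG_BYTES) {
            throw new IOException("invalid announced length: " + numRemainingBytes);
        }
        byte[] b = new byte[numRemainingBytes];
        din.readFully(b); // unlike read(), this never returns a partial buffer
        return b;
    }
}
```

Throwing IOException here lets the caller close the offending socket and keep the QuorumCnxManager thread alive, rather than dying on an OutOfMemoryError from a garbage length field.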
[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537381#comment-14537381 ] Hongchao Deng commented on ZOOKEEPER-2171: -- Great work, folks! Just mentioning one discrepancy I found in 'CHANGES.txt': 1. in branch-3.5, it shows under bug fixes; 2. in trunk, it shows under improvements. avoid reverse lookups in QuorumCnxManager - Key: ZOOKEEPER-2171 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171 Project: ZooKeeper Issue Type: Bug Components: quorum Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2171.patch, ZOOKEEPER-2171.patch Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of getHostName() calls in QCM. Besides the overhead, these can cause problems when mixed with failing/mis-configured DNS servers. It would be nice to reduce them, if that doesn't affect operational correctness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2182: - Affects Version/s: 3.5.0 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2182: - Fix Version/s: 3.6.0 3.5.1 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537384#comment-14537384 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Thanks to Alex for the review and to [~cnauroth] for the patch. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537383#comment-14537383 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Committed: trunk: https://github.com/apache/zookeeper/commit/029d6299e006ca697c3d6f9953b3194a7c33bf19 branch-3.5: https://github.com/apache/zookeeper/commit/5b06e01de19135b6fe38a947dd1238877a647e49 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535419#comment-14535419 ] Hongchao Deng commented on ZOOKEEPER-2186: -- Good catch! I will be glad to review and commit it. QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536207#comment-14536207 ] Hongchao Deng commented on ZOOKEEPER-2186: -- Can you open a RB for it? QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1451#comment-1451 ] Hongchao Deng commented on ZOOKEEPER-2182: -- +1 for the patch. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2153: - Attachment: ZOOKEEPER-2153.patch X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch, ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530765#comment-14530765 ] Hongchao Deng commented on ZOOKEEPER-2153: -- The parenthesis is fixed: trunk: https://github.com/apache/zookeeper/commit/f45e48569b2e684378fdc56ef6bab96d3fcc0f88 branch-3.5: https://github.com/apache/zookeeper/commit/665c5aba9bba297daa8e491ff593945ab5e69a2f X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch, ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528838#comment-14528838 ] Hongchao Deng commented on ZOOKEEPER-2153: -- I was busy with a talk earlier. Let me do the delayed commit now. Thanks to [~rakeshr] for the review and [~iandi] for the work. X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng resolved ZOOKEEPER-2153. -- Resolution: Fixed Committed to: trunk: https://github.com/apache/zookeeper/commit/ea5abdb82d2e2bc4ed0559420b109da35b30bfca branch-3.5: https://github.com/apache/zookeeper/commit/da4d934b89fece39401230a6c26ce61715427960 X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2176) unclear error message should be info or warn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528739#comment-14528739 ] Hongchao Deng commented on ZOOKEEPER-2176: -- The patch is trivial and LGTM. +1 I will commit this shortly. unclear error message should be info or warn Key: ZOOKEEPER-2176 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2176 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.5.0, 3.5.1, 3.5.2 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-2176.patch Hi [~shralex], Looking at the CI output of ZOOKEEPER-2163 I see this: {noformat} [exec] [junit] 2015-04-17 17:36:23,750 [myid:] - ERROR [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:11235)(secure=disabled):QuorumPeer@1394] - writeToDisk == true but configFilename == null {noformat} Though looking at QuorumPeer#setQuorumVerifier I see: {noformat} if (configFilename != null) { try { String dynamicConfigFilename = makeDynamicConfigFilename( qv.getVersion()); QuorumPeerConfig.writeDynamicConfig( dynamicConfigFilename, qv, false); QuorumPeerConfig.editStaticConfig(configFilename, dynamicConfigFilename, needEraseClientInfoFromStaticConfig()); } catch (IOException e) { LOG.error("Error closing file: ", e.getMessage()); } } else { LOG.error("writeToDisk == true but configFilename == null"); } {noformat} there's no proper error handling so I guess maybe we should just make it a warning? Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
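Since the null-filename branch carries no error handling at all, the change under discussion is just a lower log level for that branch. A minimal sketch of the idea (class and method names are hypothetical, and java.util.logging stands in for the project's logger):

```java
import java.util.logging.Logger;

// Illustrative only: demote the "configFilename == null" message from ERROR
// to INFO, since nothing is actually handled in that branch. Returns whether
// a write would have been attempted, purely so the behavior is observable.
final class ConfigWriteSketch {
    private static final Logger LOG = Logger.getLogger("quorum");

    static boolean writeConfigIfNamed(String configFilename) {
        if (configFilename != null) {
            // ... write dynamic config, edit static config ...
            return true;
        }
        LOG.info("writeToDisk == true but configFilename == null");
        return false;
    }
}
```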
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Affects Version/s:     (was: 3.5.2)
                           (was: 3.5.1)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Summary: Unclear error message should be info not error  (was: unclear error message should be info or warn)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Fix Version/s: 3.6.0
                   3.5.1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)