[jira] [Commented] (ZOOKEEPER-1460) IPv6 literal address not supported for quorum members
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745633#comment-14745633 ]

Hongchao Deng commented on ZOOKEEPER-1460:
------------------------------------------

+1. Thanks Raul for raising this. I do remember there are some IPv6 issues.

> IPv6 literal address not supported for quorum members
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-1460
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1460
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Chris Dolan
>            Assignee: Thawan Kooburat
>         Attachments: ZOOKEEPER-1460-accept-square-bracket-delimited-IPv6-literals.diff
>
> Via code inspection, I see that the "server.nnn" configuration key does not support literal IPv6 addresses because the property value is split on ":". In v3.4.3, the problem is in QuorumPeerConfig:
> {noformat}
> String parts[] = value.split(":");
> InetSocketAddress addr = new InetSocketAddress(parts[0], Integer.parseInt(parts[1]));
> {noformat}
> In the current trunk (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup) this code has been refactored into QuorumPeer.QuorumServer, but the bug remains:
> {noformat}
> String serverClientParts[] = addressStr.split(";");
> String serverParts[] = serverClientParts[0].split(":");
> addr = new InetSocketAddress(serverParts[0], Integer.parseInt(serverParts[1]));
> {noformat}
> This bug probably affects very few users because most will naturally use a hostname rather than a literal IP address. But given that IPv6 addresses are supported for clients via ZOOKEEPER-667, it seems that server support should be fixed too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
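The square-bracket-delimited parsing that the attachment's title suggests can be sketched as follows. This is an illustrative stand-alone version, not the attached patch; the class and method names are hypothetical:

```java
// Hypothetical sketch: parse "host:port" where host may be a
// square-bracket-delimited IPv6 literal, e.g. "[2001:db8::1]:2888",
// instead of naively splitting the whole string on ":".
public class QuorumAddressParser {
    public static String[] parseHostPort(String value) {
        String host;
        String port;
        if (value.startsWith("[")) {
            int end = value.indexOf(']');
            if (end < 0 || end + 1 >= value.length() || value.charAt(end + 1) != ':') {
                throw new IllegalArgumentException("Bad IPv6 literal: " + value);
            }
            host = value.substring(1, end);   // literal without brackets
            port = value.substring(end + 2);  // text after "]:"
        } else {
            // IPv4 or hostname: split on the last ':' so a stray colon
            // earlier in the string cannot confuse the parse
            int colon = value.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No port in: " + value);
            }
            host = value.substring(0, colon);
            port = value.substring(colon + 1);
        }
        return new String[] { host, port };
    }
}
```

The brackets follow the usual URI convention for IPv6 literals, which is also what the attachment name implies.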
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717839#comment-14717839 ]

Hongchao Deng commented on ZOOKEEPER-2101:
------------------------------------------

I have one and only one comment, on the swallowed exception mentioned above. It would be great if other committers could review and give more feedback. [~liushaohui], are you still available for the JIRA? Otherwise I can take care of it. I want to get this done by the weekend.

Transaction larger than max buffer of jute makes zookeeper unavailable
----------------------------------------------------------------------

                Key: ZOOKEEPER-2101
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
            Project: ZooKeeper
         Issue Type: Bug
         Components: jute
   Affects Versions: 3.4.4
           Reporter: Liu Shaohui
           Fix For: 3.5.2, 3.6.0
        Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff

*Problem*
For a multi operation, PrepRequestProcessor may produce a transaction whose size is larger than the max buffer size of jute. There is a check of the buffer size in the readBuffer method of BinaryInputArchive, but no check in the writeBuffer method of BinaryOutputArchive, which causes the following:

1. The leader can sync the transaction to its txn log and send the large transaction to the followers, but the followers fail to read the transaction and cannot sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}

2. The leader loses all its followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.

{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}

3. The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}

The zookeeper service is unavailable until we enlarge jute.maxbuffer and restart the zookeeper and hbase clusters.

*Solution*
Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to the followers. But I am not sure whether there are side effects of throwing an IOException in BinaryOutputArchive and the RequestProcessors.
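The proposed write-side check can be sketched as below. This is a hedged illustration, not the attached patch: the class name is invented, and the maxBuffer field stands in for the real BinaryOutputArchive and the jute.maxbuffer property.

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the idea: mirror readBuffer()'s sanity check on the write
// side, so an oversized transaction fails fast in the leader instead
// of poisoning the txn log and the followers.
public class CheckedOutputArchive {
    private final DataOutputStream out;
    private final int maxBuffer; // stand-in for jute.maxbuffer

    public CheckedOutputArchive(DataOutputStream out, int maxBuffer) {
        this.out = out;
        this.maxBuffer = maxBuffer;
    }

    public void writeBuffer(byte[] buf, String tag) throws IOException {
        if (buf == null) {
            out.writeInt(-1); // jute encodes a null buffer as length -1
            return;
        }
        if (buf.length > maxBuffer) {
            // Reject here, before anything reaches disk or the quorum.
            throw new IOException("Unreasonable length = " + buf.length);
        }
        out.writeInt(buf.length);
        out.write(buf);
    }
}
```

Rejecting at write time turns the scenario above into a single failed client request instead of a cluster-wide outage.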
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715503#comment-14715503 ]

Hongchao Deng commented on ZOOKEEPER-2101:
------------------------------------------

I will review it by this weekend and hopefully get it committed soon.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700205#comment-14700205 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Committed to branch-3.4: https://github.com/apache/zookeeper/commit/91f579e40755de870ed9123c8fd55925517d9aa6

Thanks [~rakeshr]!

Improve Thread handling
-----------------------

                Key: ZOOKEEPER-1907
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
            Project: ZooKeeper
         Issue Type: Improvement
         Components: server
   Affects Versions: 3.5.0
           Reporter: Rakesh R
           Assignee: Rakesh R
           Fix For: 3.4.7, 3.5.1, 3.6.0
        Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch

The server has many critical threads running and coordinating with each other, such as the RequestProcessor chains. Going through these threads, most of them have a similar structure:

{code}
public void run() {
    try {
        while (running) {
            // processing logic
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    } catch (Exception e) {
        LOG.error("Unexpected exception", e);
    }
    LOG.info("... exited loop!");
}
{code}

From the design I can see there is a chance of silently leaving the thread by swallowing an exception. If this happens in production, the server would hang forever and would not be able to deliver its role, and it is hard for a management tool to detect this. The idea of this JIRA is to discuss and improve this.

Reference: [Community discussion thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]
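One way to avoid swallowing the exception, sketched below, is to record and escalate the failure so a management layer can observe it. This is a minimal stand-alone illustration of the pattern under discussion, not code from the patch; the class name and handler are assumptions.

```java
// Sketch: a critical thread that funnels any unexpected exception to a
// handler instead of logging it and silently exiting its run() loop.
public class CriticalThread extends Thread {
    private volatile boolean running = true;
    private volatile Throwable failure; // visible to monitoring code

    private final Runnable work;

    public CriticalThread(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            while (running) {
                work.run();
                running = false; // single pass is enough for this sketch
            }
        } catch (Throwable t) {
            failure = t;      // do not swallow: record it...
            handleFailure(t); // ...and escalate to a hook a subclass can override
        }
    }

    protected void handleFailure(Throwable t) {
        System.err.println("Unexpected exception, exiting thread: " + t);
    }

    public Throwable getFailure() {
        return failure;
    }
}
```

A supervisor can poll getFailure() (or override handleFailure) to notice that the thread died, which is exactly what the flat catch-and-log structure above makes impossible.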
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692343#comment-14692343 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

+1. I have reviewed the PR and run the unit tests locally. It's nice work! Would any other committer have time to review it too? Otherwise, I will get this in, probably by next week.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658474#comment-14658474 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

I used rbtools to upload patches. The web interface has been broken for me for a long time.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658788#comment-14658788 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Yes! That would be great.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654868#comment-14654868 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

GJ Rakesh. Do you mind uploading it to ReviewBoard? I would like to give some comments and definitely get this in ASAP.
[jira] [Commented] (ZOOKEEPER-2233) Invalid description in the comment of LearnerHandler.syncFollower()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626788#comment-14626788 ]

Hongchao Deng commented on ZOOKEEPER-2233:
------------------------------------------

LGTM. +1

Invalid description in the comment of LearnerHandler.syncFollower()
-------------------------------------------------------------------

                Key: ZOOKEEPER-2233
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2233
            Project: ZooKeeper
         Issue Type: Improvement
           Reporter: Hitoshi Mitake
           Assignee: Hitoshi Mitake
           Priority: Trivial
        Attachments: ZOOKEEPER-2233.patch

LearnerHandler.syncFollower() has a comment like below: "When leader election is completed, the leader will set its lastProcessedZxid to be (epoch < 32). There will be no txn associated with this zxid." However, IIUC, the expression {{epoch < 32}} (a comparison) should be {{epoch << 32}} (a bitshift). Of course the error is very trivial, but it was a little bit confusing for me, so I'd like to fix it.
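For context, the zxid layout that comment refers to packs the epoch into the high 32 bits of the 64-bit id, so a fresh leader starts at (epoch << 32). A small sketch of that packing (illustrative helper names, not ZooKeeper's own utility class):

```java
// Sketch of the zxid layout: high 32 bits = epoch, low 32 bits = counter.
public class ZxidSketch {
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static long epochOf(long zxid) {
        return zxid >> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```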
[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603285#comment-14603285 ]

Hongchao Deng commented on ZOOKEEPER-2164:
------------------------------------------

It's on my plan to write a patch for this. I'm currently involved in internal work; I should be able to get to this after that. In the meantime, it sounds like you have a good testing plan. It would be nice if you could share it. :)

fast leader election keeps failing
----------------------------------

                Key: ZOOKEEPER-2164
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
            Project: ZooKeeper
         Issue Type: Bug
         Components: leaderElection
   Affects Versions: 3.4.5
           Reporter: Michi Mutsuzaki
           Assignee: Hongchao Deng
           Fix For: 3.5.2, 3.6.0

I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When I shut down 2, 1 and 3 keep going back to leader election. Here is what seems to be happening:

- Both 1 and 3 elect 3 as the leader.
- 1 receives votes from 3 and itself, and starts trying to connect to 3 as a follower.
- 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't time out for 5 seconds: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
- By the time 3 receives votes, 1 has given up trying to connect to 3: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247

I'm using 3.4.5, but it looks like this part of the code hasn't changed for a while, so I'm guessing later versions have the same issue.
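The timing problem described above comes from a blocking connect to a dead peer stalling the thread that also delivers election notifications. The general remedy, bounding the connect attempt, can be sketched as follows; the class and parameter names are assumptions for illustration, not ZooKeeper's API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: a connect attempt with an explicit timeout, so a dead peer
// makes the caller fail fast instead of blocking for the full default.
public class BoundedConnect {
    public static boolean tryConnect(InetSocketAddress peer, int cnxTimeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(peer, cnxTimeoutMs); // bounded, unlike new Socket(host, port)
            return true;
        } catch (IOException e) {
            return false; // peer is down or unreachable
        }
    }
}
```

Keeping this bound well under the follower's "waiting for leader" window is what prevents the two timeouts from racing each other as in the scenario above.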
[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599725#comment-14599725 ]

Hongchao Deng commented on ZOOKEEPER-1000:
------------------------------------------

Can you open a new JIRA?

Provide SSL in zookeeper to be able to run cross colos.
-------------------------------------------------------

                Key: ZOOKEEPER-1000
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
            Project: ZooKeeper
         Issue Type: Improvement
           Reporter: Mahadev konar
           Assignee: Mahadev konar
           Fix For: 3.5.2, 3.6.0

This jira is to track SSL for zookeeper. The inter-zookeeper-server communication and the client-to-server communication should be over SSL so that zookeeper can be deployed over WANs.
[jira] [Commented] (ZOOKEEPER-2220) Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599923#comment-14599923 ]

Hongchao Deng commented on ZOOKEEPER-2220:
------------------------------------------

Can you give more details? Even a log file would explain more.

Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
---------------------------------------------------------------

                Key: ZOOKEEPER-2220
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2220
            Project: ZooKeeper
         Issue Type: Bug
         Components: c client
   Affects Versions: 3.5.0
        Environment: Alpha
           Reporter: rupa mogali

I am trying to test SSL connectivity between client and server following the instructions on the following page: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

But I get the following when trying to connect to the server from the client:

{noformat}
2015-06-24 12:14:36,589 [myid:] - INFO [main:ZooKeeper@709] - Initiating client connection, connectString=localhost:2282 sessionTimeout=3 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@f2a0b8e
Exception in thread "main" java.io.IOException: Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
{noformat}

Can you tell me what I am doing wrong here? Very new to Zookeeper. Thanks!
[jira] [Commented] (ZOOKEEPER-2220) Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599946#comment-14599946 ]

Hongchao Deng commented on ZOOKEEPER-2220:
------------------------------------------

{code}
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.ClientCnxnSocketNetty
{code}

The version you are using is not up to date.
[jira] [Commented] (ZOOKEEPER-602) log all exceptions not caught by ZK threads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592272#comment-14592272 ]

Hongchao Deng commented on ZOOKEEPER-602:
-----------------------------------------

+1. Thanks Rakesh and Raul!

log all exceptions not caught by ZK threads
-------------------------------------------

                Key: ZOOKEEPER-602
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-602
            Project: ZooKeeper
         Issue Type: Bug
         Components: java client, server
   Affects Versions: 3.2.1
           Reporter: Patrick Hunt
           Assignee: Rakesh R
           Priority: Blocker
           Fix For: 3.4.7, 3.5.0
        Attachments: ZOOKEEPER-602-br3-4.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch, ZOOKEEPER-602.patch

The java code should add a ThreadGroup exception handler that logs at ERROR level any uncaught exceptions thrown by Thread run methods.
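The mechanism the ticket asks for can be sketched with the JDK's default uncaught-exception handler. This is a minimal illustration, assuming a static field standing in for a real logger; it is not the attached patch:

```java
// Sketch: install a process-wide handler so any Thread.run() that dies
// with an uncaught exception is recorded at ERROR level instead of lost.
public class UncaughtLogger {
    // Stand-in for LOG.error(...); a real implementation would log instead.
    public static volatile String lastError;

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, t) ->
            lastError = "ERROR uncaught exception in " + thread.getName() + ": " + t);
    }
}
```

The handler runs in the dying thread itself, just before it terminates, so management tooling sees the failure even when the run() body had no catch block at all.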
[jira] [Created] (ZOOKEEPER-2214) Findbugs warning: LearnerHandler.packetToString Dead store to local variable
Hongchao Deng created ZOOKEEPER-2214: Summary: Findbugs warning: LearnerHandler.packetToString Dead store to local variable Key: ZOOKEEPER-2214 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2214 Project: ZooKeeper Issue Type: Improvement Reporter: Hongchao Deng Assignee: Hongchao Deng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2214) Findbugs warning: LearnerHandler.packetToString Dead store to local variable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2214:
-------------------------------------
    Attachment: ZOOKEEPER-2214.patch
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2213:
-------------------------------------
    Attachment: ZOOKEEPER-2213.patch

Empty path in Set crashes server and prevents restart
-----------------------------------------------------

                Key: ZOOKEEPER-2213
                URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2213
            Project: ZooKeeper
         Issue Type: Bug
         Components: server
   Affects Versions: 3.4.5
           Reporter: Brian Brazil
           Assignee: Hongchao Deng
           Priority: Blocker
           Fix For: 3.4.7, 3.5.1, 3.6.0
        Attachments: ZOOKEEPER-2213.patch, ZOOKEEPER-2213.patch, ZOOKEEPER-2213.patch

See https://github.com/samuel/go-zookeeper/issues/62

I've reproduced this on 3.4.5 with the code:

{code}
c, _, _ := zk.Connect([]string{"127.0.0.1"}, time.Second)
c.Set("", []byte{}, 0)
{code}

This crashes a local zookeeper 3.4.5 server:

{code}
2015-06-10 16:21:10,862 [myid:] - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.lang.IllegalArgumentException: Invalid path
        at org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
        at org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
        at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
        at org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:329)
        at org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:965)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:116)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
{code}

On restart the zookeeper server crashes out:

{code}
2015-06-10 16:22:21,352 [myid:] - ERROR [main:ZooKeeperServerMain@54] - Invalid arguments, exiting abnormally
java.lang.IllegalArgumentException: Invalid path
        at org.apache.zookeeper.common.PathTrie.findMaxPrefix(PathTrie.java:259)
        at org.apache.zookeeper.server.DataTree.getMaxPrefixWithQuota(DataTree.java:634)
        at org.apache.zookeeper.server.DataTree.setData(DataTree.java:616)
        at org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:807)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:198)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}
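The class of fix for this kind of bug is to validate the path before the request ever reaches the transaction log. A hedged sketch of such a guard, with an invented class name and rules modeled on ZooKeeper's path conventions (absolute, no trailing slash, no empty or relative segments), not the attached patch itself:

```java
// Sketch: reject malformed znode paths (like the empty path in the
// report above) at request-prep time, so an invalid path can never be
// persisted into a txn that later refuses to load.
public class PathValidator {
    public static boolean isValid(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            return false; // must be absolute; "" is what crashed the server
        }
        if (path.length() == 1) {
            return true;  // the root "/"
        }
        if (path.endsWith("/")) {
            return false; // no trailing slash
        }
        for (String segment : path.substring(1).split("/", -1)) {
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                return false; // no empty or relative segments
            }
        }
        return true;
    }
}
```

Validating up front turns the unrecoverable server crash into a plain bad-request error returned to the client.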
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582132#comment-14582132 ] Hongchao Deng commented on ZOOKEEPER-2213:
Hi [~rgs], thanks for the suggestion. I have created ZOOKEEPER-2214 to fix the findbugs warning. The latest patch cleans that part up.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582167#comment-14582167 ] Hongchao Deng commented on ZOOKEEPER-2213:
Thanks for the review. I will submit a patch for the 3.4 branch shortly.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582199#comment-14582199 ] Hongchao Deng commented on ZOOKEEPER-2213:
I wonder if we should add validation to OpCode.check too; I think we might have missed that. I will add that check as well.
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213-branch34.patch
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213.patch
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582210#comment-14582210 ] Hongchao Deng commented on ZOOKEEPER-2213:
The latest patch adds validation to OpCode.check too. Also submitted a patch for branch-3.4.
[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580671#comment-14580671 ] Hongchao Deng commented on ZOOKEEPER-1000:
Yes. I'm currently working on server-server as well as client-server SSL, which can be backported onto the 3.4 branch. It is taking some time, though.

Provide SSL in zookeeper to be able to run cross colos.

Key: ZOOKEEPER-1000
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
Project: ZooKeeper
Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
Fix For: 3.5.2, 3.6.0

This jira is to track SSL for zookeeper. The inter zookeeper server communication and the client to server communication should be over ssl so that zookeeper can be deployed over WAN's.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580691#comment-14580691 ] Hongchao Deng commented on ZOOKEEPER-2213:
It seems that the ZK Java client does a lot of checking locally before sending packets to the server: https://github.com/apache/zookeeper/blob/26e8dd6e90726997a37965ef469e37a96ef7085f/src/java/main/org/apache/zookeeper/common/PathUtils.java#L43 As a result, if the server receives any kind of malformed path, it breaks this assumption: https://github.com/apache/zookeeper/blob/26e8dd6e90726997a37965ef469e37a96ef7085f/src/java/main/org/apache/zookeeper/common/PathTrie.java#L258-L260 Such a user error shouldn't bring the server down. We can either return an error to the client or just close the connection. Let me think about it more.
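The client-side checks referenced in the comment above can be sketched in Go (the language of the original repro). This is an illustrative approximation of the rules the Java client enforces in PathUtils.validatePath, not code from either go-zookeeper or ZooKeeper; the function name and error messages are made up for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// validatePath approximates the client-side path validation that
// ZooKeeper's Java PathUtils.validatePath performs before a request is
// sent. The Go client in the linked report skipped such checks, which is
// how an empty path reached the server and crashed it.
func validatePath(path string) error {
	if len(path) == 0 {
		return fmt.Errorf("path cannot be empty")
	}
	if path[0] != '/' {
		return fmt.Errorf("path must start with /")
	}
	if len(path) == 1 { // "/" is the root, always valid
		return nil
	}
	if path[len(path)-1] == '/' {
		return fmt.Errorf("path must not end with /")
	}
	// Reject empty node names ("//") and relative segments ("." / "..").
	for _, seg := range strings.Split(path[1:], "/") {
		switch seg {
		case "":
			return fmt.Errorf("empty node name in path")
		case ".", "..":
			return fmt.Errorf("relative path segment %q not allowed", seg)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"", "/", "/a/b", "a/b", "/a//b", "/a/../b", "/a/"} {
		fmt.Printf("%q -> %v\n", p, validatePath(p))
	}
}
```

With a check like this in the client, the `c.Set("", ...)` call from the repro would fail locally instead of reaching the server.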
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580937#comment-14580937 ] Hongchao Deng commented on ZOOKEEPER-2213:
There is one more thing I'm not sure about. I thought SetData should return a NoNodeException, but it didn't. That's because the DataTree also treats the empty string as the root /: https://github.com/apache/zookeeper/blob/71401b4842b0486716f96d9ea3060d4fba65be96/src/java/main/org/apache/zookeeper/server/DataTree.java#L292 This is an inconsistent assumption, because path checking treats the empty string as invalid. Anyway, I agree with Raul that to fix this we only need to add validatePath() for SetData and SetACL. Adding the check is the more robust fix.
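The fix direction discussed above, validating the path before the operation is ever dispatched, can be illustrated with a hypothetical wrapper. The `setter` interface, `safeSet`, and `fakeConn` below are invented for this sketch and do not exist in go-zookeeper or ZooKeeper; the point is only that an invalid path is rejected locally and never reaches the server.

```go
package main

import (
	"errors"
	"fmt"
)

// setter abstracts the Set call of a ZooKeeper client (for example
// go-zookeeper's Conn.Set); the interface is illustrative only.
type setter interface {
	Set(path string, data []byte, version int32) error
}

// safeSet rejects obviously invalid paths locally instead of letting
// them reach a server that assumes clients have already validated them.
func safeSet(c setter, path string, data []byte, version int32) error {
	if len(path) == 0 || path[0] != '/' {
		return errors.New("zk: invalid path")
	}
	return c.Set(path, data, version)
}

// fakeConn counts how many Set calls actually reach the "server".
type fakeConn struct{ calls int }

func (f *fakeConn) Set(path string, data []byte, version int32) error {
	f.calls++
	return nil
}

func main() {
	c := &fakeConn{}
	fmt.Println(safeSet(c, "", nil, 0))      // rejected locally, never sent
	fmt.Println(safeSet(c, "/node", nil, 0)) // forwarded to the connection
	fmt.Println("server calls:", c.calls)
}
```

Server-side, the patch discussed here takes the analogous step in PrepRequestProcessor for SetData and SetACL, so a misbehaving client cannot crash the ensemble.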
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213:
Attachment: ZOOKEEPER-2213.patch
Addressed comments and fixed the findbugs warning.
[jira] [Commented] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581337#comment-14581337 ] Hongchao Deng commented on ZOOKEEPER-2213: -- I will come up with a patch for the 3.4 branch if there is no other comment on the current patch. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2213: - Attachment: ZOOKEEPER-2213.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ZOOKEEPER-2213) Empty path in Set crashes server and prevents restart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng reassigned ZOOKEEPER-2213: Assignee: Hongchao Deng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2201) Network issues can cause cluster to hang due to near-deadlock
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575364#comment-14575364 ] Hongchao Deng commented on ZOOKEEPER-2201: -- +1 The patch looks good! Network issues can cause cluster to hang due to near-deadlock - Key: ZOOKEEPER-2201 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2201 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.6 Reporter: Donny Nadolny Assignee: Donny Nadolny Priority: Critical Fix For: 3.4.7, 3.5.2 Attachments: ZOOKEEPER-2201-branch-34.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch, ZOOKEEPER-2201.patch {{DataTree.serializeNode}} synchronizes on the {{DataNode}} it is about to serialize then writes it out via {{OutputArchive.writeRecord}}, potentially to a network connection. Under default linux TCP settings, a network connection where the other side completely disappears will hang (blocking on the {{java.net.SocketOutputStream.socketWrite0}} call) for over 15 minutes. During this time, any attempt to create/delete/modify the {{DataNode}} will cause the leader to hang at the beginning of the request processor chain: {noformat} ProcessThread(sid:5 cport:-1): prio=10 tid=0x026f1800 nid=0x379c waiting for monitor entry [0x7fe6c2a8c000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.zookeeper.server.PrepRequestProcessor.getRecordForPath(PrepRequestProcessor.java:163) - waiting to lock 0xd4cd9e28 (a org.apache.zookeeper.server.DataNode) - locked 0xd2ef81d0 (a java.util.ArrayList) at org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:345) at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:534) at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:131) {noformat} Additionally, any attempt to send a snapshot to a follower or to disk will hang. 
Because the ping packets are sent by another thread which is unaffected, followers never time out and become leader, even though the cluster will make no progress until either the leader is killed or the TCP connection times out. This isn't exactly a deadlock since it will resolve itself eventually, but as mentioned above this will take 15 minutes with the default TCP retry settings in linux. A simple solution to this is: in {{DataTree.serializeNode}} we can take a copy of the contents of the {{DataNode}} (as is done with its children) in the synchronized block, then call {{writeRecord}} with the copy of the {{DataNode}} outside of the original {{DataNode}} synchronized block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
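The copy-then-write fix described above can be sketched as follows (illustrative stand-in types, not the actual ZooKeeper DataNode/OutputArchive classes): copy the node's state while holding its monitor, then perform the potentially blocking write without the lock, so a stalled socket can no longer block request processing on that node.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative stand-in for DataNode; the point is the locking pattern,
// not the real serialization format.
class Node {
    byte[] data = "payload".getBytes();
}

public class SerializeSketch {
    static void serializeNode(Node node, OutputStream out) throws IOException {
        byte[] copy;
        synchronized (node) {   // hold the node's lock only while copying
            copy = node.data.clone();
        }
        out.write(copy);        // slow (possibly hung) write happens unlocked
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeNode(new Node(), out);
        System.out.println(out.size());
    }
}
```

The trade-off is a transient extra copy per node during snapshot serialization, which is cheap compared to minutes of blocked writes on a dead TCP connection.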
[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573365#comment-14573365 ] Hongchao Deng commented on ZOOKEEPER-2163: -- I think it is a good feature to go into 3.5 too :) Introduce new ZNode type: container --- Key: ZOOKEEPER-2163 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163 Project: ZooKeeper Issue Type: New Feature Components: c client, java client, server Affects Versions: 3.5.0 Reporter: Jordan Zimmerman Assignee: Jordan Zimmerman Fix For: 3.6.0 Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, zookeeper-2163.12.patch, zookeeper-2163.13.patch, zookeeper-2163.14.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch BACKGROUND A recurring problem for ZooKeeper users is garbage collection of parent nodes. Many recipes (e.g. locks, leaders, etc.) call for the creation of a parent node under which participants create sequential nodes. When the participant is done, it deletes its node. In practice, the ZooKeeper tree begins to fill up with orphaned parent nodes that are no longer needed. The ZooKeeper APIs don’t provide a way to clean these. Over time, ZooKeeper can become unstable due to the number of these nodes. CURRENT SOLUTIONS === Apache Curator has a workaround solution for this by providing the Reaper class which runs in the background looking for orphaned parent nodes and deleting them. This isn’t ideal and it would be better if ZooKeeper supported this directly. PROPOSAL = ZOOKEEPER-723 and ZOOKEEPER-834 have been proposed to allow EPHEMERAL nodes to contain child nodes. This is not optimum as EPHEMERALs are tied to a session and the general use case of parent nodes is for PERSISTENT nodes. This proposal adds a new node type, CONTAINER. 
A CONTAINER node is the same as a PERSISTENT node with the additional property that when its last child is deleted, it is deleted (and CONTAINER nodes recursively up the tree are deleted if empty). CANONICAL USAGE
{code}
while (true) { // or some reasonable limit
    try {
        zk.create(path, ...);
        break;
    } catch (KeeperException.NoNodeException e) {
        try {
            zk.createContainer(containerPath, ...);
        } catch (KeeperException.NodeExistsException ignore) {
        }
    }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2204) LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573109#comment-14573109 ] Hongchao Deng commented on ZOOKEEPER-2204: -- +1 The patch looks good. LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally - Key: ZOOKEEPER-2204 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2204 Project: ZooKeeper Issue Type: Test Affects Versions: 3.5.0 Reporter: Donny Nadolny Assignee: Donny Nadolny Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2204.patch, ZOOKEEPER-2204.patch The {{LearnerSnapshotThrottler}} will only allow 2 concurrent snapshots to be taken, and if there are already 2 snapshots in progress it will wait up to 200ms for one to complete. This isn't enough time for {{testHighContentionWithTimeout}} to consistently pass - on a cold JVM running just the one test I was able to get it to fail 3 times in around 50 runs. This 200ms timeout will be hit if there is a delay between a thread calling {{LearnerSnapshot snap = throttler.beginSnapshot(false);}} and {{throttler.endSnapshot();}}. This also erroneously fails on the build server, see https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2747/testReport/org.apache.zookeeper.server.quorum/LearnerSnapshotThrottlerTest/testHighContentionWithTimeout/ for an example. I have bumped the timeout up to 5 seconds (which should be more than enough for warmup / gc pauses), as well as added logging to the {{catch (Exception e)}} block to assist in debugging any future issues. An alternate approach would be to separate out results gathered from the threads, because although we only record true/false there are really three outcomes: 1. The {{snapshotNumber}} was <= 2, meaning the individual call operated correctly 2. The {{snapshotNumber}} was > 2, meaning the test should definitely fail 3. 
We were unable to snapshot in the time given, so we can't determine if we should fail or pass (although if we have enough successes from #1 with no failures from #2 maybe we would pass the test anyway). Bumping up the timeout is easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
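The throttler contract described above, at most two concurrent snapshots with a bounded wait for a slot, can be modeled with a plain java.util.concurrent.Semaphore. This is a behavioral sketch only, not the real LearnerSnapshotThrottler code:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Behavioral model: 2 permits = 2 concurrent snapshots; tryAcquire with a
// timeout mirrors beginSnapshot() giving up once the wait expires.
public class ThrottleSketch {
    private final Semaphore slots = new Semaphore(2);

    boolean beginSnapshot(long timeoutMs) throws InterruptedException {
        return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void endSnapshot() {
        slots.release();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleSketch t = new ThrottleSketch();
        System.out.println(t.beginSnapshot(200)); // first slot
        System.out.println(t.beginSnapshot(200)); // second slot
        System.out.println(t.beginSnapshot(200)); // both busy: waits, then fails
        t.endSnapshot();
        System.out.println(t.beginSnapshot(200)); // a slot was freed
    }
}
```

The model also shows why the test is timing-sensitive: whether the third acquire succeeds depends entirely on whether some other thread releases a slot within the timeout window, which a GC pause or cold JVM can easily prevent.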
[jira] [Commented] (ZOOKEEPER-1546) Unable to load database on disk when restarting after node freeze
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573389#comment-14573389 ] Hongchao Deng commented on ZOOKEEPER-1546: -- Is this JIRA related to ZOOKEEPER-1573? I think they are duplicate. Unable to load database on disk when restarting after node freeze --- Key: ZOOKEEPER-1546 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1546 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.5 Reporter: Erik Forsberg One of my zookeeper servers in a quorum of 3 froze (probably due to underlying hardware problems). When restarting, zookeeper fails to start with the following in zookeeper.log: {noformat} 2012-09-04 09:02:35,300 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /etc/zookeeper/zoo.cfg 2012-09-04 09:02:35,316 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2012-09-04 09:02:35,333 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2012-09-04 09:02:35,358 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2012-09-04 09:02:35,379 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2012-09-04 09:02:35,386 - INFO [main:QuorumPeer@856] - initLimit set to 10 2012-09-04 09:02:35,523 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.500017240 2012-09-04 09:02:38,944 - ERROR [main:FileTxnSnapLog@226] - Failed to increment parent cversion for: /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at 
org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:224) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,945 - FATAL [main:QuorumPeer@400] - Unable to load database on disk java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,946 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) Caused 
by: java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 3 more {noformat} Removing data from /var/zookeeper/version-2 then restart seems to fix the
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Description: This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. == This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? was: This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. 
This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This was original JIRA of ZOOKEEPER-2203. For project management reason, all the issues and related discussion are moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098. == This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. 
Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Description: This was This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? was: This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. 
Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This was This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). 
I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
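The config conflict the reporter describes can be seen in miniature by modeling who receives an election notification. These are hypothetical types for illustration; the real logic lives in FastLeaderElection:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: server 3's config lists 1 and 2 as observers, so a
// voting-members-only broadcast from 3 reaches nobody who could dispute it.
public class NotifySketch {
    static class Member {
        final long id;
        final boolean voting;
        Member(long id, boolean voting) { this.id = id; this.voting = voting; }
    }

    static List<Long> targets(List<Member> members, boolean includeObservers) {
        List<Long> out = new ArrayList<>();
        for (Member m : members) {
            if (includeObservers || m.voting) {
                out.add(m.id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Server 3's (conflicting) view: 1 and 2 are observers, 3 is a participant.
        List<Member> config3 = Arrays.asList(
                new Member(1, false), new Member(2, false), new Member(3, true));
        System.out.println(targets(config3, false)); // voting-only: current behavior
        System.out.println(targets(config3, true));  // all members: proposed behavior
    }
}
```

Under the current behavior, server 3 hears no dissenting vote and can declare itself leader; broadcasting to all members would let 1 and 2 surface the config mismatch.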
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189: - Summary: QuorumCnxManager: use BufferedOutputStream for initial msg (was: multiple leaders can be elected when configs conflict) QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior as a bug and that ZK should fail gracefully if a participant is listed as an observer in the config. In current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from observers(1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2189) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2189:
Description: This was the original JIRA of ZOOKEEPER-2203. For project-management reasons, all the issues and related discussion were moved to ZOOKEEPER-2203. This JIRA is linked to ZOOKEEPER-2098.
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570194#comment-14570194 ] Hongchao Deng commented on ZOOKEEPER-2189:
Hi, thanks for your understanding. I really appreciate your help. All you have to do is click More - Clone and clone this JIRA to a new one. Then I can take it myself and add a comment to tell people that the discussion here belongs to that JIRA :) Thanks for reporting the issue and contributing to the project too!
[jira] [Updated] (ZOOKEEPER-2203) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2203:
Summary: multiple leaders can be elected when configs conflict (was: CLONE - multiple leaders can be elected when configs conflict)

multiple leaders can be elected when configs conflict

Key: ZOOKEEPER-2203
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2203
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.5.0
Reporter: Akihiro Suda
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567512#comment-14567512 ] Hongchao Deng commented on ZOOKEEPER-2189:
I didn't offer anything... you might have misunderstood. What I meant is that I committed 2098, but I wrote the commit message as 2189 (I did not commit 2189). I wonder if you can give this JIRA number to me; I will replicate 2098 into this JIRA and mark it as a duplicate. You can create a new JIRA and move the discussion there. I would really appreciate it if you could help me. :)
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566120#comment-14566120 ] Hongchao Deng commented on ZOOKEEPER-2189:
Hi [~suda]. I committed ZOOKEEPER-2098 but mistakenly wrote the commit message as ZOOKEEPER-2189. Would you mind opening another JIRA and granting this JIRA number to me? Thanks!
[jira] [Commented] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565115#comment-14565115 ] Hongchao Deng commented on ZOOKEEPER-2187:
+1, thanks [~rgs]. I will commit this shortly.

remove duplicated code between CreateRequest{,2}

Key: ZOOKEEPER-2187
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2187
Project: ZooKeeper
Issue Type: Bug
Components: c client, java client, server
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Priority: Minor
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2187.patch

To avoid cargo-culting and to reduce duplicated code, we can merge most of CreateRequest and CreateRequest2, given that only the Response object is actually different. This will improve the readability of the code and make it less confusing for people adding new opcodes in the future (i.e., copying a request definition vs. reusing what's already there, etc.).
[jira] [Updated] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2187:
Fix Version/s: 3.5.1 (was: 3.5.2)
[jira] [Updated] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2187:
Release Note: (was: trunk: https://github.com/apache/zookeeper/commit/652a53618cf93165b67dce4816e4831d10393a03 branch-3.5: https://github.com/apache/zookeeper/commit/a9556fa88a441882624a0c6e57c442662514e94a)
[jira] [Commented] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565131#comment-14565131 ] Hongchao Deng commented on ZOOKEEPER-2187:
Committed.
trunk: https://github.com/apache/zookeeper/commit/652a53618cf93165b67dce4816e4831d10393a03
branch-3.5: https://github.com/apache/zookeeper/commit/a9556fa88a441882624a0c6e57c442662514e94a
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565337#comment-14565337 ] Hongchao Deng commented on ZOOKEEPER-2098:
+1, LGTM. Thanks [~rgs], I will merge this shortly.

QuorumCnxManager: use BufferedOutputStream for initial msg

Key: ZOOKEEPER-2098
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2098
Project: ZooKeeper
Issue Type: Improvement
Components: quorum, server
Affects Versions: 3.5.0
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.2, 3.6.0
Attachments: ZOOKEEPER-2098.patch, ZOOKEEPER-2098.patch

Whilst writing fle-dump (a tool like [zk-dump|https://github.com/twitter/zktraffic/], but for dumping FastLeaderElection messages), I noticed that QCM is using DataOutputStream (which doesn't buffer) directly. So all calls to write() are written immediately to the network, which means simple messages like two participants exchanging Votes can take a couple of RTTs! This is especially terrible for global clusters (i.e., cross-country RTTs). The solution is to use BufferedOutputStream for the initial negotiation between members of the cluster. Note that there are other places where suboptimal (but not entirely unbuffered) writes to the network still exist; I'll get to those in separate tickets. After using BufferedOutputStream we get only 1 RTT for the initial message, so election time for participants joining a cluster is reduced.
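The effect described here can be illustrated outside ZooKeeper with a counting stream standing in for the peer socket. This is a sketch, not QuorumCnxManager's actual wire format: InitialMsgSketch, CountingStream, and the chosen fields (sid plus an address blob) are illustrative. Each call that reaches the unbuffered stream is a rough proxy for a separate network write.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class InitialMsgSketch {
    // Hypothetical stand-in for the peer's socket stream: counts every call
    // that reaches it, as a rough proxy for a separate network write.
    static class CountingStream extends ByteArrayOutputStream {
        int writes = 0;
        @Override public synchronized void write(int b) {
            writes++;
            super.write(b);
        }
        @Override public synchronized void write(byte[] b, int off, int len) {
            writes++;
            super.write(b, off, len);
        }
    }

    // Before the fix: DataOutputStream wraps the socket directly, so every
    // field of the initial message hits the "network" as it is written.
    static int unbufferedWrites(long sid, byte[] addr) {
        CountingStream net = new CountingStream();
        try (DataOutputStream dout = new DataOutputStream(net)) {
            dout.writeLong(sid);        // one 8-byte array write
            dout.writeInt(addr.length); // four single-byte writes
            dout.write(addr);           // one array write
            dout.flush();
        } catch (IOException e) {
            throw new AssertionError(e); // ByteArrayOutputStream never throws
        }
        return net.writes;
    }

    // After the fix: a BufferedOutputStream sits in between, so the whole
    // message reaches the "network" as a single chunk on flush().
    static int bufferedWrites(long sid, byte[] addr) {
        CountingStream net = new CountingStream();
        try (DataOutputStream dout =
                 new DataOutputStream(new BufferedOutputStream(net))) {
            dout.writeLong(sid);
            dout.writeInt(addr.length);
            dout.write(addr);
            dout.flush();
        } catch (IOException e) {
            throw new AssertionError(e);
        }
        return net.writes;
    }
}
```

With buffering, the entire initial message arrives at the stand-in stream in one chunk, which is why the negotiation fits in a single round trip.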
[jira] [Updated] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2098:
Fix Version/s: 3.5.1 (was: 3.5.2)
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565484#comment-14565484 ] Hongchao Deng commented on ZOOKEEPER-2098:
Committed:
trunk: https://github.com/apache/zookeeper/commit/0cbc8eee21bda31184d4e7f11100bc0bb300f376
branch-3.5: https://github.com/apache/zookeeper/commit/f1f7b3714c5c36f1408f5fe0f0d0b3da305b1023
[jira] [Updated] (ZOOKEEPER-2179) Typo in Watcher.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2179:
Issue Type: Improvement (was: Bug)

Typo in Watcher.java

Key: ZOOKEEPER-2179
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2179
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.4.5, 3.5.0
Reporter: Eunchan Kim
Priority: Trivial
Fix For: 3.4.5, 3.5.0
Attachments: ZOOKEEPER-2179.patch

In zookeeper/src/java/main/org/apache/zookeeper/Watcher.java, the line
"* implement. A ZooKeeper client will get various events from the ZooKeepr"
should be fixed to
"* implement. A ZooKeeper client will get various events from the ZooKeeper."
(ZooKeepr -> ZooKeeper)
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564191#comment-14564191 ] Hongchao Deng commented on ZOOKEEPER-832:
I mean expire the session on the client side. It's the client that's not consistent with the view. We should fix it on the client (by crashing it), not the server.

Invalid session id causes infinite loop during automatic reconnect

Key: ZOOKEEPER-832
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.5, 3.5.0
Environment: All
Reporter: Ryan Holmes
Assignee: Germán Blanco
Priority: Blocker
Fix For: 3.4.7, 3.5.2, 3.6.0
Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch

Steps to reproduce:
1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect, but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session-invalid, similar to how the session-expired state is handled.
Server log output (repeats indefinitely):
2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client)

Client log output (repeats indefinitely):
11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181
11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
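One shape the suggested improvement could take on the client side, sketched with purely hypothetical names (ReconnectSketch, ClientState, onSessionRefused; the real client would surface this as an event to the default watcher, as proposed above): the client tracks consecutive session refusals and, past an illustrative threshold, enters a terminal session-invalid state instead of looping forever.

```java
// Hypothetical sketch of the proposal: rather than retrying indefinitely,
// the client counts consecutive "session refused" responses during
// reconnect and eventually surfaces a terminal SESSION_INVALID state,
// mirroring how session expiry is already handled. The threshold is an
// illustrative assumption, since an isolated refusal can also happen
// legitimately (e.g. "client must try another server").
public class ReconnectSketch {
    enum ClientState { CONNECTING, SESSION_INVALID }

    private final int maxRefusals;
    private int refusals = 0;
    private ClientState state = ClientState.CONNECTING;

    ReconnectSketch(int maxRefusals) {
        this.maxRefusals = maxRefusals;
    }

    // Called each time a server rejects the session id during reconnect.
    ClientState onSessionRefused() {
        refusals++;
        if (refusals >= maxRefusals) {
            state = ClientState.SESSION_INVALID; // terminal: stop retrying
        }
        return state;
    }
}
```

In the terminal state the application can react (e.g. rebuild its session from scratch) instead of spinning in the reconnect loop shown in the logs above.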
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564189#comment-14564189 ] Hongchao Deng commented on ZOOKEEPER-832: - I mean expire the session on client side. It's client who's not consistent with the view. We should fix it on client (by crashing it) not server. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5, 3.5.0 Environment: All Reporter: Ryan Holmes Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.7, 3.5.2, 3.6.0 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch Steps to reproduce: 1.) Connect to a standalone server using the Java client. 2.) Stop the server. 3.) Delete the contents of the data directory (i.e. the persisted session data). 4.) Start the server. The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled. 
Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564190#comment-14564190 ] Hongchao Deng commented on ZOOKEEPER-832: - I mean expire the session on client side. It's client who's not consistent with the view. We should fix it on client (by crashing it) not server. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.5, 3.5.0 Environment: All Reporter: Ryan Holmes Assignee: Germán Blanco Priority: Blocker Fix For: 3.4.7, 3.5.2, 3.6.0 Attachments: ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch, ZOOKEEPER-832.patch Steps to reproduce: 1.) Connect to a standalone server using the Java client. 2.) Stop the server. 3.) Delete the contents of the data directory (i.e. the persisted session data). 4.) Start the server. The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is session invalid, similar to how the session expired state is handled. 
Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
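The improvement suggested above — surfacing a terminal "session invalid" state instead of retrying forever — could be sketched on the client side roughly as follows. This is only an illustration under assumed names (ReconnectPolicy and Event are not ZooKeeper classes), and a real fix would key off the server's rejection reason rather than a bare retry count:

```java
// Hypothetical sketch: give up after a bounded number of consecutive
// refused reconnects and surface a terminal SESSION_INVALID event,
// instead of looping forever. Names are illustrative, not ZooKeeper API.
class ReconnectPolicy {
    enum Event { RETRY, SESSION_INVALID }

    private final int maxRefusals; // consecutive refusals tolerated before giving up
    private int refusals = 0;

    ReconnectPolicy(int maxRefusals) { this.maxRefusals = maxRefusals; }

    // Called after each refused connection attempt.
    Event onConnectionRefused() {
        refusals++;
        return refusals >= maxRefusals ? Event.SESSION_INVALID : Event.RETRY;
    }

    // Called after a successful (re)connect; resets the counter.
    void onConnected() { refusals = 0; }
}
```

A client embedding such a policy would deliver SESSION_INVALID to the default watcher the same way SESSION_EXPIRED is delivered today, giving applications a hook to rebuild their state.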
[jira] [Commented] (ZOOKEEPER-832) Invalid session id causes infinite loop during automatic reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561478#comment-14561478 ] Hongchao Deng commented on ZOOKEEPER-832: - Hi German, the failing test is a known flaky test. Regarding this issue, I consider it a client issue because the client has history that the server does not. The right thing to do is to expire/close the client. Invalid session id causes infinite loop during automatic reconnect -- Key: ZOOKEEPER-832 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-832 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557932#comment-14557932 ] Hongchao Deng commented on ZOOKEEPER-2101: -- It's just my personal opinion, but swallowing a system-failure exception doesn't look like a good choice. I usually prefer to let the system crash if the failure is not recoverable. I'm not sure why Leader and ZKDatabase did that, so I will leave it to others to comment. Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101 Project: ZooKeeper Issue Type: Bug Components: jute Affects Versions: 3.4.4 Reporter: Liu Shaohui Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2101-v1.diff, ZOOKEEPER-2101-v2.diff, ZOOKEEPER-2101-v3.diff, ZOOKEEPER-2101-v4.diff, ZOOKEEPER-2101-v5.diff, ZOOKEEPER-2101-v6.diff, ZOOKEEPER-2101-v7.diff, test.diff *Problem* For a multi operation, PrepRequestProcessor may produce a large transaction whose size exceeds the max buffer size of jute. There is a check of the buffer size in the readBuffer method of BinaryInputArchive, but no check in the writeBuffer method of BinaryOutputArchive, which causes the following: 1. The leader can sync the transaction to the txn log and send the large transaction to the followers, but the followers fail to read the transaction and can't sync with the leader.
{code}
2015-01-04,12:42:26,474 WARN org.apache.zookeeper.server.quorum.Learner: [myid:2] Exception when following the leader
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2015-01-04,12:42:26,475 INFO org.apache.zookeeper.server.quorum.Learner: [myid:2] shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
{code}
2. The leader loses all followers, which triggers a leader election. The old leader becomes leader again because it has the most up-to-date data.
{code}
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutting down
2015-01-04,12:42:28,502 INFO org.apache.zookeeper.server.quorum.Leader: [myid:3] Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 2
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
{code}
3. The leader cannot load the transaction from the txn log because the length of the data is larger than the max buffer of jute.
{code}
2015-01-04,12:42:31,282 ERROR org.apache.zookeeper.server.quorum.QuorumPeer: [myid:3] Unable to load database on disk
java.io.IOException: Unreasonable length = 2054758
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
        at org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:546)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:690)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:737)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
{code}
The ZooKeeper service will be unavailable until we enlarge jute.maxbuffer and restart the ZooKeeper (HBase) cluster. *Solution* Add a buffer size check in BinaryOutputArchive to prevent a large transaction from being written to the log and sent to the followers. But I am not sure whether there are side effects of throwing an IOException in BinaryOutputArchive
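The proposed write-side check could mirror the existing read-side one. A minimal sketch under assumed names (BoundedWriter is hypothetical; the real patch modifies BinaryOutputArchive#writeBuffer, and ZooKeeper's default jute.maxbuffer is 0xfffff bytes, about 1 MB):

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of a write-side length check mirroring the one in
// BinaryInputArchive#readBuffer. The real fix lives in
// BinaryOutputArchive#writeBuffer; 0xfffff is jute's default max buffer.
class BoundedWriter {
    static final int MAX_BUFFER = Integer.getInteger("jute.maxbuffer", 0xfffff);

    static void writeBuffer(DataOutputStream out, byte[] b) throws IOException {
        if (b != null && b.length > MAX_BUFFER) {
            // Refuse to serialize at all, instead of letting every follower
            // (and the next restart) choke on the oversized record later.
            throw new IOException("Unreasonable length = " + b.length);
        }
        out.writeInt(b == null ? -1 : b.length);
        if (b != null) {
            out.write(b);
        }
    }
}
```

Failing at write time keeps the bad transaction out of both the txn log and the quorum stream, which is exactly the unavailability scenario described above.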
[jira] [Commented] (ZOOKEEPER-2101) Transaction larger than max buffer of jute makes zookeeper unavailable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554808#comment-14554808 ] Hongchao Deng commented on ZOOKEEPER-2101: -- Another question: in SerializeUtils:
{code}
serializeRequest():
    catch (IOException e) {
        LOG.error("This really should be impossible", e);
{code}
If such an unexpected exception happens, should the exception go up and let the server fail? Transaction larger than max buffer of jute makes zookeeper unavailable -- Key: ZOOKEEPER-2101 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2101
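If the consensus is to fail fast rather than swallow the "should be impossible" exception, the pattern would look roughly like this (FailFast and SerializeFn are illustrative names, not ZooKeeper code):

```java
import java.io.IOException;

// Hypothetical sketch: escalate a "should be impossible" IOException
// instead of logging and continuing, so the server stops in a known-bad
// state rather than silently operating on unserializable data.
class FailFast {
    interface SerializeFn {
        byte[] serialize() throws IOException;
    }

    static byte[] serializeOrDie(SerializeFn fn) {
        try {
            return fn.serialize();
        } catch (IOException e) {
            // Unchecked rethrow: callers cannot accidentally swallow this.
            throw new IllegalStateException("Unrecoverable serialization failure", e);
        }
    }
}
```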
[jira] [Commented] (ZOOKEEPER-2191) Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555042#comment-14555042 ] Hongchao Deng commented on ZOOKEEPER-2191: -- +1 Thanks for your work, [~cnauroth]. Continue supporting prior Ant versions that don't implement the threads attribute for the JUnit task. - Key: ZOOKEEPER-2191 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2191 Project: ZooKeeper Issue Type: Improvement Components: build Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2191.001.patch, ZOOKEEPER-2191.002.patch ZOOKEEPER-2183 introduced usage of the threads attribute on the junit task call in build.xml to speed up test execution. This attribute is only available since Ant 1.9.4. However, we can continue to support older Ant versions by calling the antversion task and dispatching to a clone of our junit task call that doesn't use the threads attribute. Users of older Ant versions will get the slower single-process test execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Affects Version/s: 3.5.0 In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers --- Key: ZOOKEEPER-2190 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2190 Project: ZooKeeper Issue Type: Bug Reporter: Hongchao Deng Assignee: Hongchao Deng Attachments: ZOOKEEPER-2190.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Affects Version/s: (was: 3.5.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543995#comment-14543995 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I made a mistake in CHANGES.txt when switching branches... Committed another fix to branch-3.5: https://github.com/apache/zookeeper/commit/419756a3ff3be986d3bbcef12ebdfba5c1b68412 I feel bad about it and will be more careful on future commits. Concurrent Testing Processes and Port Assignments - Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
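One way to get cross-process uniqueness, sketched here with hypothetical names and constants, is to hand each concurrent test process a disjoint slice of the port space and keep the monotonic counter inside that slice (the actual ZOOKEEPER-2183 patch may derive its range differently, e.g. from a parameter passed by the build):

```java
// Hypothetical sketch: each concurrent test process gets a disjoint slice
// of the port space, so monotonic counters in different JVMs cannot collide.
// BASE, SLICE and the processIndex parameter are illustrative values, not
// the ones chosen by the actual patch.
class PortRange {
    static final int BASE = 11221;  // first port handed out to process 0
    static final int SLICE = 1000;  // ports reserved per process

    private final int first;
    private int next;

    PortRange(int processIndex) {
        this.first = BASE + processIndex * SLICE;
        this.next = first;
    }

    synchronized int unique() {
        if (next >= first + SLICE) {
            throw new IllegalStateException("port range exhausted");
        }
        return next++;
    }
}
```

With disjoint slices, two pre-commit jobs (or two forked test JVMs) on the same Jenkins host can both count upward without ever binding the same port.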
[jira] [Updated] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Summary: Concurrent Testing Processes and Port Assignments (was: Concurrent Testing Processes and Port Assignments.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2183) Concurrent Testing Processes and Port Assignments.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Summary: Concurrent Testing Processes and Port Assignments. (was: Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544240#comment-14544240 ] Hongchao Deng commented on ZOOKEEPER-2190: -- It's good to go :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544480#comment-14544480 ] Hongchao Deng commented on ZOOKEEPER-901: - Sure. Thanks for it. I'm still catching up. It's great that we can discuss design and problems here. Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: ZooKeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Hongchao Deng Fix For: 3.6.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
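The single-thread-plus-selector design proposed above can be illustrated with a minimal, non-ZooKeeper echo loop: one thread drives a Selector that handles both accepting and reading, with no per-connection thread pair. This is only a toy sketch of the technique, not the QuorumCnxManager redesign itself:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Toy illustration: a single thread and one Selector accept connections and
// read data, replacing the pair-of-threads-per-connection pattern.
// Accepts one connection, echoes one byte back, then returns.
class SelectorSketch {
    static void serveOnce(int port) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", port));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(1);
            while (true) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        // Non-blocking accept; the new channel joins the same selector.
                        SocketChannel c = server.accept();
                        if (c != null) {
                            c.configureBlocking(false);
                            c.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        buf.clear();
                        if (c.read(buf) > 0) {
                            buf.flip();
                            c.write(buf); // echo the byte back
                            c.close();
                            return;
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }
}
```

Because accept never blocks the loop, a slow or unreachable peer cannot stall connection establishment with the other servers, which is exactly deficiency #1 above.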
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542947#comment-14542947 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I prefer 8. The threads attribute applies per class, so the test report is still clear. What's more, if it's a consistent failure, developers can reproduce it; if it's flaky, I doubt the logs help... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2190: - Attachment: ZOOKEEPER-2190.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543139#comment-14543139 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I separated it into another JIRA: ZOOKEEPER-2190. Let's fix it there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542991#comment-14542991 ] Hongchao Deng commented on ZOOKEEPER-2186: -- +1 The latest patch looks really good. Thanks for the clean patch, Raul! QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch, ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!):
{code}
public boolean receiveConnection(Socket sock) {
    Long sid = null;
    ...
    sid = din.readLong();
    // next comes the #bytes in the remainder of the message
    int num_remaining_bytes = din.readInt();
    byte[] b = new byte[num_remaining_bytes];
    // remove the remainder of the message from din
    int num_read = din.read(b);
{code}
This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
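The essence of the fix is to validate the announced length before allocating, and to read with readFully so a partial read can't be mistaken for a complete message. A simplified sketch with assumed names and an assumed cap (SafeReceive and MAX_MSG are illustrative; the actual patch chooses its own bound):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Simplified sketch of the fix: check the announced length before
// allocating, and use readFully so a short read cannot silently yield a
// truncated buffer. MAX_MSG is an assumed cap, not the real patch's bound.
class SafeReceive {
    static final int MAX_MSG = 1024;

    static byte[] readPayload(DataInputStream din) throws IOException {
        int len = din.readInt();
        if (len <= 0 || len > MAX_MSG) {
            // A malformed or malicious peer announced an absurd length.
            throw new IOException("Unreasonable message length: " + len);
        }
        byte[] b = new byte[len];
        din.readFully(b); // throws EOFException instead of returning short
        return b;
    }
}
```

Throwing before the allocation means a random or hostile connection can no longer trigger a huge `new byte[...]` inside the QuorumCnxManager thread.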
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Fix Version/s: 3.6.0 3.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Affects Version/s: 3.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
Hongchao Deng created ZOOKEEPER-2190: Summary: In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers Key: ZOOKEEPER-2190 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2190 Project: ZooKeeper Issue Type: Bug Reporter: Hongchao Deng Assignee: Hongchao Deng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2190) In StandaloneDisabledTest, testReconfig() shouldn't take leaving servers as joining servers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543120#comment-14543120 ] Hongchao Deng commented on ZOOKEEPER-2190: -- [~michim] [~shralex] I found that the logic isn't right, as the bug shows up in ZK-2183. Can you take a look? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543079#comment-14543079 ] Hongchao Deng commented on ZOOKEEPER-2183: -- [~rakeshr] [~rgs] [~michim] [~shralex] Would any other committer like to review this? I will commit it once it gets another +1. Thanks! Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543118#comment-14543118 ] Hongchao Deng commented on ZOOKEEPER-2183: -- BTW, StandaloneDisabledTest.startSingleServerTest() fails intermittently on the following lines: {code} //reconfigure out leader and follower 1. Remaining follower //2 should elect itself as leader and run by itself reconfigServers.clear(); reconfigServers.add(Integer.toString(leaderId)); reconfigServers.add(Integer.toString(follower1)); testReconfig(follower2, false, reconfigServers); {code} I think the logic isn't correct because {code} ReconfigTest.testServerHasConfig(zkHandles[id], servers, null); {code} is testing the leaving servers as the joining servers, right? [~shralex] Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, ZOOKEEPER-2183.004.patch, ZOOKEEPER-2183.005.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ZOOKEEPER-2094) SSL feature on Netty
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng resolved ZOOKEEPER-2094. -- Resolution: Duplicate SSL feature on Netty Key: ZOOKEEPER-2094 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2094 Project: ZooKeeper Issue Type: Sub-task Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Ian Dimayuga Assignee: Ian Dimayuga Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-2094-git-apply.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, ZOOKEEPER-2094.patch, test.cert, testKeyStore.jks, testTrustStore.jks, testUntrustedKeyStore.jks Add SSL handler to Netty pipeline, and a default X509AuthenticationProvider to perform authentication. Review board: https://reviews.apache.org/r/30753/diff/# -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541427#comment-14541427 ] Hongchao Deng commented on ZOOKEEPER-2183: -- PortAssignment seems to make assumptions about the ant runtime environment. Do you think it would be better to write the code like: {code} public synchronized static int unique() { if (!initialized) { setupPortRange(); } } {code} A test is highly recommended for setupPortRange() too :) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
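The lazy-initialization idea from that comment could be filled out roughly as follows. This is a sketch only: the field names, the system-property keys, and the default port values are illustrative assumptions, not the committed patch.

```java
// Hypothetical expansion of the unique()/setupPortRange() pseudocode above:
// the port range is configured on the first call, regardless of when ant
// sets any properties, and the whole method stays synchronized.
public final class LazyPortAssignment {
    private static boolean initialized = false;
    private static int nextPort;
    private static int maxPort;

    public static synchronized int unique() {
        if (!initialized) {
            setupPortRange();
            initialized = true;
        }
        if (nextPort > maxPort) {
            throw new IllegalStateException("port range exhausted");
        }
        return nextPort++;
    }

    // Reads assumed system properties to select the band for this process,
    // falling back to a default band when they are absent.
    private static void setupPortRange() {
        nextPort = Integer.getInteger("test.portRange.start", 11221);
        maxPort = Integer.getInteger("test.portRange.end", 16221);
    }
}
```

Because initialization happens inside the synchronized method, no caller can observe a half-configured range, which is the property the comment is asking for.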
[jira] [Updated] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2183: - Attachment: threads-change.patch Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540216#comment-14540216 ] Hongchao Deng commented on ZOOKEEPER-2183: -- The multi-threading change is very speedy. But I wonder whether the test failures are caused by the multi-threading change or by the port assignment change. Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540271#comment-14540271 ] Hongchao Deng commented on ZOOKEEPER-2183: -- I see. That's why we need to change the port assignment at the same time. Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541324#comment-14541324 ] Hongchao Deng commented on ZOOKEEPER-2183: -- Thanks Chris! I'm reviewing the patch now. No worry about the Jenkins flood. That's what it's used for... Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch, ZOOKEEPER-2183.002.patch, ZOOKEEPER-2183.003.patch, threads-change.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2183) Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539253#comment-14539253 ] Hongchao Deng commented on ZOOKEEPER-2183: -- StandaloneDisabledTest.startSingleServerTest was flaky in this case -- I ran that single test successfully locally. Let me run the entire test suite again... Change test port assignments to improve uniqueness of ports for multiple concurrent test processes on the same host. Key: ZOOKEEPER-2183 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2183 Project: ZooKeeper Issue Type: Improvement Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2183.001.patch Tests use {{PortAssignment#unique}} for assignment of the ports to bind during tests. Currently, this method works by using a monotonically increasing counter from a static starting point. Generally, this is sufficient to achieve uniqueness within a single JVM process, but it does not achieve uniqueness across multiple processes on the same host. This can cause tests to get bind errors if there are multiple pre-commit jobs running concurrently on the same Jenkins host. This also prevents running tests in parallel to improve the speed of pre-commit runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539286#comment-14539286 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Thank you! My mistake to forget close it. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538116#comment-14538116 ] Hongchao Deng commented on ZOOKEEPER-2186: -- [~rgs] Can you open a RB for this? I have some questions and comments to make. Thanks! QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
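A sketch of the defensive check this issue calls for: validate the announced length before allocating, and use readFully so a short read cannot silently truncate the message. The cap constant and helper name here are assumptions for illustration, not the committed ZOOKEEPER-2186 fix.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical bounded version of the vulnerable read in receiveConnection:
// reject absurd announced lengths instead of allocating them blindly.
final class BoundedRead {
    static final int MAX_MSG_BYTES = 512 * 1024; // illustrative sanity cap

    static byte[] readAnnounced(DataInputStream din) throws IOException {
        // next comes the #bytes in the remainder of the message
        int numRemainingBytes = din.readInt();
        if (numRemainingBytes < 0 || numRemainingBytes > MAX_MSG_BYTES) {
            throw new IOException("invalid announced length: " + numRemainingBytes);
        }
        byte[] b = new byte[numRemainingBytes];
        din.readFully(b); // unlike read(), this never returns a partial buffer
        return b;
    }
}
```

Throwing IOException here lets the caller close the offending socket and keep the QuorumCnxManager thread alive, rather than dying on an OutOfMemoryError from a garbage length field.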
[jira] [Commented] (ZOOKEEPER-2171) avoid reverse lookups in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537381#comment-14537381 ] Hongchao Deng commented on ZOOKEEPER-2171: -- Great work, folks! Just mentioning one discrepancy I found in 'CHANGES.txt': 1. in branch-3.5, it shows under bug fixes; 2. in trunk, it shows under improvements. avoid reverse lookups in QuorumCnxManager - Key: ZOOKEEPER-2171 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2171 Project: ZooKeeper Issue Type: Bug Components: quorum Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2171.patch, ZOOKEEPER-2171.patch Apparently, ZOOKEEPER-107 (via a quick git-blame look) introduced a bunch of getHostName() calls in QCM. Besides the overhead, these can cause problems when mixed with failing/mis-configured DNS servers. It would be nice to reduce them, if that doesn't affect operational correctness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2182: - Affects Version/s: 3.5.0 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2182: - Fix Version/s: 3.6.0 3.5.1 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537384#comment-14537384 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Thanks to Alex for the review and to [~cnauroth] for the patch. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537383#comment-14537383 ] Hongchao Deng commented on ZOOKEEPER-2182: -- Committed: trunk: https://github.com/apache/zookeeper/commit/029d6299e006ca697c3d6f9953b3194a7c33bf19 branch-3.5: https://github.com/apache/zookeeper/commit/5b06e01de19135b6fe38a947dd1238877a647e49 Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Affects Versions: 3.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535419#comment-14535419 ] Hongchao Deng commented on ZOOKEEPER-2186: -- Good catch! I will be glad to review and commit it. QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2186) QuorumCnxManager#receiveConnection may crash with random input
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536207#comment-14536207 ] Hongchao Deng commented on ZOOKEEPER-2186: -- Can you open a RB for it? QuorumCnxManager#receiveConnection may crash with random input -- Key: ZOOKEEPER-2186 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2186 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.6, 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2186.patch This will allocate an arbitrarily large byte buffer (and try to read it!): {code} public boolean receiveConnection(Socket sock) { Long sid = null; ... sid = din.readLong(); // next comes the #bytes in the remainder of the message int num_remaining_bytes = din.readInt(); byte[] b = new byte[num_remaining_bytes]; // remove the remainder of the message from din int num_read = din.read(b); {code} This will crash the QuorumCnxManager thread, so the cluster will keep going but future elections might fail to converge (ditto for leaving/joining members). Patch coming up in a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2182) Several test suites are not running during pre-commit, because their names do not end with Test.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1451#comment-1451 ] Hongchao Deng commented on ZOOKEEPER-2182: -- +1 for the patch. Several test suites are not running during pre-commit, because their names do not end with Test. -- Key: ZOOKEEPER-2182 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2182 Project: ZooKeeper Issue Type: Bug Components: tests Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: ZOOKEEPER-2182.001.patch In build.xml, the {{junit}} task definition uses an include pattern of {{\*\*/\*$\{test.category\}Test.java}}. This is important so that we don't accidentally try to run utility classes like {{PortAssignment}} or {{TestableZooKeeper}} as if they were JUnit suites. However, several test suites are misnamed so that they don't satisfy this pattern, and therefore pre-commit hasn't been running them. {{ClientRetry}} {{ReconfigFailureCases}} {{WatchEventWhenAutoReset}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng updated ZOOKEEPER-2153: - Attachment: ZOOKEEPER-2153.patch X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch, ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530765#comment-14530765 ] Hongchao Deng commented on ZOOKEEPER-2153: -- The parenthesis is fixed: trunk: https://github.com/apache/zookeeper/commit/f45e48569b2e684378fdc56ef6bab96d3fcc0f88 branch-3.5: https://github.com/apache/zookeeper/commit/665c5aba9bba297daa8e491ff593945ab5e69a2f X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch, ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528838#comment-14528838 ] Hongchao Deng commented on ZOOKEEPER-2153: -- I was busy with a talk earlier. Let me do the delayed commit now. Thanks to [~rakeshr] for the review and [~iandi] for the work. X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ZOOKEEPER-2153) X509 Authentication Documentation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongchao Deng resolved ZOOKEEPER-2153. -- Resolution: Fixed Committed to: trunk: https://github.com/apache/zookeeper/commit/ea5abdb82d2e2bc4ed0559420b109da35b30bfca branch-3.5: https://github.com/apache/zookeeper/commit/da4d934b89fece39401230a6c26ce61715427960 X509 Authentication Documentation - Key: ZOOKEEPER-2153 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2153 Project: ZooKeeper Issue Type: Sub-task Affects Versions: 3.5.0 Reporter: Hongchao Deng Assignee: Ian Dimayuga Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2153.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2176) unclear error message should be info or warn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528739#comment-14528739 ] Hongchao Deng commented on ZOOKEEPER-2176: -- The patch is trivial and LGTM. +1 I will commit this shortly. unclear error message should be info or warn Key: ZOOKEEPER-2176 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2176 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.5.0, 3.5.1, 3.5.2 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Attachments: ZOOKEEPER-2176.patch Hi [~shralex], Looking at the CI output of ZOOKEEPER-2163 I see this: {noformat} [exec] [junit] 2015-04-17 17:36:23,750 [myid:] - ERROR [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:11235)(secure=disabled):QuorumPeer@1394] - writeToDisk == true but configFilename == null {noformat} Though looking at QuorumPeer#setQuorumVerifier I see: {noformat} if (configFilename != null) { try { String dynamicConfigFilename = makeDynamicConfigFilename( qv.getVersion()); QuorumPeerConfig.writeDynamicConfig( dynamicConfigFilename, qv, false); QuorumPeerConfig.editStaticConfig(configFilename, dynamicConfigFilename, needEraseClientInfoFromStaticConfig()); } catch (IOException e) { LOG.error("Error closing file: ", e.getMessage()); } } else { LOG.error("writeToDisk == true but configFilename == null"); } {noformat} there's no proper error handling so I guess maybe we should just make it a warning? Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
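Since the null-filename branch carries no error handling at all, the change under discussion is just a lower log level for that branch. A minimal sketch of the idea (class and method names are hypothetical, and java.util.logging stands in for the project's logger):

```java
import java.util.logging.Logger;

// Illustrative only: demote the "configFilename == null" message from ERROR
// to INFO, since nothing is actually handled in that branch. Returns whether
// a write would have been attempted, purely so the behavior is observable.
final class ConfigWriteSketch {
    private static final Logger LOG = Logger.getLogger("quorum");

    static boolean writeConfigIfNamed(String configFilename) {
        if (configFilename != null) {
            // ... write dynamic config, edit static config ...
            return true;
        }
        LOG.info("writeToDisk == true but configFilename == null");
        return false;
    }
}
```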
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Affects Version/s:     (was: 3.5.2)
                           (was: 3.5.1)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Summary: Unclear error message should be info not error  (was: unclear error message should be info or warn)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2176) Unclear error message should be info not error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongchao Deng updated ZOOKEEPER-2176:
-------------------------------------
    Fix Version/s: 3.6.0
                   3.5.1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)