ZooKeeper_branch34_jdk7 - Build # 1644 - Still Failing

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1644/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
ERROR: SEVERE ERROR occurs
org.jenkinsci.lib.envinject.EnvInjectException: java.io.IOException: Remote 
call on H5 failed
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:86)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:43)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:528)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:448)
at hudson.model.Run.execute(Run.java:1735)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:405)
Caused by: java.io.IOException: Remote call on H5 failed
at hudson.remoting.Channel.call(Channel.java:838)
at hudson.FilePath.act(FilePath.java:1081)
at 
org.jenkinsci.plugins.envinject.service.EnvInjectActionSetter.addEnvVarsToRun(EnvInjectActionSetter.java:59)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:83)
... 7 more
Caused by: java.lang.OutOfMemoryError: Java heap space
ERROR: Step 'Publish JUnit test result report' failed: no workspace for 
ZooKeeper_branch34_jdk7 #1644
[EnvInject] - [ERROR] - SEVERE ERROR occurs: Remote call on H5 failed
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Comment Edited] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2017-09-06 Thread Cesar Stuardo (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156297#comment-16156297
 ] 

Cesar Stuardo edited comment on ZOOKEEPER-2778 at 9/7/17 1:42 AM:
--

Hey,

Happy to help [~hanm]! Are we correct about the issue (regarding the path)?


was (Author: castuardo):
Hey,

Happy to help! Are we correct about the issue (regarding the path)?

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Critical
>
> It's possible to have a deadlock during the recovery phase. 
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest 
> [1]. Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread that runs the follower's 
> sync-with-leader work and the listener thread of the same quorum peer's 
> QuorumCnxManager (qcm), which handles incoming connections. To finish syncing 
> with the leader, the follower must synchronize on both QV_LOCK and the qcm 
> object it owns; meanwhile, to finish setting up an incoming connection, the 
> receiver thread must synchronize on both the same qcm object and the same 
> QV_LOCK. The problem is that the two locks are acquired in different orders, 
> so depending on timing and the actual execution order, each thread can end up 
> acquiring one lock while holding the other.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
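For illustration, here is a minimal Java sketch of the inverted lock ordering 
described above. The class and field names are invented stand-ins (qvLock for 
QuorumPeer's QV_LOCK, cnxManager for the QuorumCnxManager instance) and do not 
reproduce the real ZooKeeper code:

{noformat}
// Hypothetical sketch of the two lock-acquisition orders from the thread dump.
public class LockOrderingDeadlockSketch {
    private final Object qvLock = new Object();     // stands in for QuorumPeer's QV_LOCK
    private final Object cnxManager = new Object(); // stands in for the QuorumCnxManager instance

    // QuorumPeer/follower thread: syncWithLeader -> setLastSeenQuorumVerifier -> connectOne
    void followerSyncPath() {
        synchronized (qvLock) {             // takes QV_LOCK first
            synchronized (cnxManager) {     // then needs the qcm lock to connect to new peers
                // connectNewPeers / connectOne work
            }
        }
    }

    // QuorumCnxManager.Listener thread: receiveConnection -> connectOne -> getElectionAddress
    void listenerReceivePath() {
        synchronized (cnxManager) {         // takes the qcm lock first
            synchronized (qvLock) {         // then needs QV_LOCK to read the election address
                // initiateConnection work
            }
        }
    }
    // If followerSyncPath() holds qvLock while listenerReceivePath() holds cnxManager,
    // each thread blocks waiting for the lock the other holds: the BLOCKED states above.
}
{noformat}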



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2017-09-06 Thread Cesar Stuardo (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156297#comment-16156297
 ] 

Cesar Stuardo commented on ZOOKEEPER-2778:
--

Hey,

Happy to help! Are we correct about the issue (regarding the path)?

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Critical
>
> It's possible to have a deadlock during the recovery phase. 
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest 
> [1]. Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread that runs the follower's 
> sync-with-leader work and the listener thread of the same quorum peer's 
> QuorumCnxManager (qcm), which handles incoming connections. To finish syncing 
> with the leader, the follower must synchronize on both QV_LOCK and the qcm 
> object it owns; meanwhile, to finish setting up an incoming connection, the 
> receiver thread must synchronize on both the same qcm object and the same 
> QV_LOCK. The problem is that the two locks are acquired in different orders, 
> so depending on timing and the actual execution order, each thread can end up 
> acquiring one lock while holding the other.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper_branch34 - Build # 2072 - Still Failing

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34/2072/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
ERROR: SEVERE ERROR occurs
org.jenkinsci.lib.envinject.EnvInjectException: java.io.IOException: Remote 
call on H5 failed
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:86)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:43)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:528)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:448)
at hudson.model.Run.execute(Run.java:1735)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:405)
Caused by: java.io.IOException: Remote call on H5 failed
at hudson.remoting.Channel.call(Channel.java:838)
at hudson.FilePath.act(FilePath.java:1081)
at 
org.jenkinsci.plugins.envinject.service.EnvInjectActionSetter.addEnvVarsToRun(EnvInjectActionSetter.java:59)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:83)
... 7 more
Caused by: java.lang.OutOfMemoryError: Java heap space
ERROR: Step 'Publish JUnit test result report' failed: no workspace for 
ZooKeeper_branch34 #2072
[EnvInject] - [ERROR] - SEVERE ERROR occurs: Remote call on H5 failed
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper-trunk - Build # 3525 - Still Failing

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/3525/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
ERROR: SEVERE ERROR occurs
org.jenkinsci.lib.envinject.EnvInjectException: java.io.IOException: Remote 
call on H5 failed
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:86)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:43)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:528)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:448)
at hudson.model.Run.execute(Run.java:1735)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:405)
Caused by: java.io.IOException: Remote call on H5 failed
at hudson.remoting.Channel.call(Channel.java:838)
at hudson.FilePath.act(FilePath.java:1081)
at 
org.jenkinsci.plugins.envinject.service.EnvInjectActionSetter.addEnvVarsToRun(EnvInjectActionSetter.java:59)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:83)
... 7 more
Caused by: java.lang.OutOfMemoryError: Java heap space
ERROR: Step 'Publish FindBugs analysis results' failed: no workspace for 
ZooKeeper-trunk #3525
ERROR: Step 'Scan for compiler warnings' failed: no workspace for 
ZooKeeper-trunk #3525
ERROR: Step 'Archive the artifacts' failed: no workspace for ZooKeeper-trunk 
#3525
ERROR: Step 'Record fingerprints of files to track usage' failed: no workspace 
for ZooKeeper-trunk #3525
ERROR: Step 'JIRA: Update relevant issues' failed: no workspace for 
ZooKeeper-trunk #3525
ERROR: Step 'Publish JUnit test result report' failed: no workspace for 
ZooKeeper-trunk #3525
ERROR: Step 'Publish Javadoc' failed: no workspace for ZooKeeper-trunk #3525
[EnvInject] - [ERROR] - SEVERE ERROR occurs: Remote call on H5 failed
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper-trunk-openjdk7 - Build # 1608 - Still Failing

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1608/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 66.76 MB...]
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-09-06 20:02:04,940 [myid:] - INFO  [New I/O boss 
#7497:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-09-06 20:02:04,940 [myid:127.0.0.1:11271] - INFO  
[main-SendThread(127.0.0.1:11271):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x10683f4f4bc0001 is lost, closing socket connection and attempting 
reconnect
[junit] 2017-09-06 20:02:05,050 [myid:127.0.0.1:11351] - INFO  
[main-SendThread(127.0.0.1:11351):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11351. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-09-06 20:02:05,051 [myid:] - INFO  [New I/O boss 
#15337:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:11351
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-09-06 20:02:05,051 [myid:] - WARN  [New I/O boss 
#15337:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x043b7f31] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:11351
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:11351
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-09-06 20:02:05,051 [myid:] - INFO  [New I/O boss 
#15337:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-09-06 20:02:05,051 [myid:127.0.0.1:11351] - INFO  

[jira] [Commented] (ZOOKEEPER-2471) Java Zookeeper Client incorrectly considers time spent sleeping as time spent connecting, potentially resulting in infinite reconnect loop

2017-09-06 Thread Dan Benediktson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155862#comment-16155862
 ] 

Dan Benediktson commented on ZOOKEEPER-2471:


Hey, sorry, I've been meaning to get back to this, but I wasn't expecting to 
sign up for porting the exponential backoff retry until this was in. Polishing 
that patch up so that it can be accepted in mainline will take me a fair bit 
more time, since our version of the code straight-up replaced the existing 
logic with jittered exponential backoff (we haven't run a JVM ZK client without 
jittered exponential backoff in > 1.5 years), and I doubt Apache ZK would be 
willing to accept that. I simply don't have time right now to do that work, and 
won't for at least a month. It also makes me a bit nervous to offer a pull 
request for the exponential backoff feature without this fix already checked 
in, since this was an extremely expensive bug for us, but I'm sympathetic to 
the desire to unit test it; we simply didn't have time to concoct a unit test 
back when we needed the fix urgently, since the nature of Zookeeper code makes 
it generally pretty difficult to add unit tests for most areas, including this 
one.
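As background on the feature mentioned above, here is a generic Java sketch of 
jittered exponential backoff for reconnect delays. It is illustrative only, 
with invented names, and is not the patch being discussed:

{noformat}
import java.util.concurrent.ThreadLocalRandom;

// Generic jittered exponential backoff: the ceiling doubles on each failed
// attempt up to a cap, and a random delay below the ceiling spreads clients
// apart so they do not reconnect in lockstep. Illustrative only.
public final class JitteredBackoff {
    private final long baseMs;
    private final long maxMs;
    private int attempt;

    public JitteredBackoff(long baseMs, long maxMs) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    /** Delay to sleep before the next reconnect attempt. */
    public long nextDelayMs() {
        long ceiling = Math.min(maxMs, baseMs << Math.min(attempt, 20)); // cap the shift
        attempt++;
        return ThreadLocalRandom.current().nextLong(baseMs, ceiling + 1); // jitter in [base, ceiling]
    }

    /** Call after a successful connection so the next outage starts small again. */
    public void reset() {
        attempt = 0;
    }
}
{noformat}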

> Java Zookeeper Client incorrectly considers time spent sleeping as time spent 
> connecting, potentially resulting in infinite reconnect loop
> --
>
> Key: ZOOKEEPER-2471
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2471
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.3
> Environment: all
>Reporter: Dan Benediktson
>Assignee: Dan Benediktson
> Attachments: ZOOKEEPER-2471.patch
>
>
> ClientCnxnSocket uses a member variable "now" to track the current time, and 
> lastSend / lastHeard variables to track socket liveness. Implementations, and 
> even ClientCnxn itself, are expected to call both updateNow() to reset "now" 
> to System.currentTimeMillis, and then call updateLastSend()/updateLastHeard() 
> on IO completions.
> This is a fragile contract, so it's not surprising that there's a bug 
> resulting from it: ClientCnxn.SendThread.run() calls updateLastSendAndHeard() 
> as soon as startConnect() returns, but it does not call updateNow() first. I 
> expect when this was written, either the expectation was that startConnect() 
> was an asynchronous operation and that updateNow() would have been called 
> very recently, or simply the requirement to call updateNow() was forgotten at 
> this point. As far as I can see, this bug has been present since the 
> "updateNow" method was first introduced in the distant past. As it turns out, 
> since startConnect() calls HostProvider.next(), which can sleep, quite a lot 
> of time can pass, leaving a big gap between "now" and now.
> If you are using very short session timeouts (one of our ZK ensembles has 
> many clients using a 1-second timeout), this is potentially disastrous, 
> because the sleep time may exceed the connection timeout itself, which can 
> potentially result in the Java client being stuck in a perpetual reconnect 
> loop. The exact code path it goes through in this case is complicated, 
> because there has to be a previously-closed socket still waiting in the 
> selector (otherwise, the first timeout evaluation will not fail because "now" 
> still hasn't been updated, and then the actual connect timeout will be 
> applied in ClientCnxnSocket.doTransport()) so that select() will harvest the 
> IO from the previous socket and updateNow(), resulting in the next loop 
> through ClientCnxnSocket.SendThread.run() observing the spurious timeout and 
> failing. In practice it does happen to us fairly frequently; we only got to 
> the bottom of the bug yesterday. Worse, when it does happen, the Zookeeper 
> client object is rendered unusable: it's stuck in a perpetual reconnect loop 
> where it keeps sleeping, opening a socket, and immediately closing it.
> I have a patch. Rather than calling updateNow() right after startConnect(), 
> my fix is to remove the "now" member variable and the updateNow() method 
> entirely, and to instead just call System.currentTimeMillis() whenever time 
> needs to be evaluated. I realize there is a benefit (aside from a trivial 
> micro-optimization not worth worrying about) to having the time be "fixed", 
> particularly for truth in the logging: if time is fixed by an updateNow() 
> call, then the log for a timeout will still show exactly the same value the 
> code reasoned about. However, this benefit is in my opinion not enough to 
> merit the fragility of the contract which led to this (for us) highly 
> impactful and difficult-to-find bug in the first place.
> I'm currently running ant tests locally against my patch on trunk, and then 
> I'll upload it here.
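As a rough illustration of the contract described above and the proposed fix, 
here is a simplified Java sketch. The names are placeholders, not the actual 
ClientCnxnSocket/ClientCnxn code:

{noformat}
// Simplified sketch of the cached-"now" contract and the proposed fix.
class TimeTrackingSketch {
    private long now;          // cached clock; only advances when updateNow() is called
    private long lastSend;
    private long lastHeard;

    void updateNow()              { now = System.currentTimeMillis(); }
    void updateLastSendAndHeard() { lastSend = now; lastHeard = now; }  // copies the CACHED value

    // Buggy pattern: startConnect() may sleep inside HostProvider.next(), so "now"
    // can be far in the past when updateLastSendAndHeard() copies it, and a later
    // timeout check sees a huge, spurious idle time.
    void buggyConnectStep() {
        // startConnect();          // may sleep longer than the session timeout
        updateLastSendAndHeard();   // BUG: updateNow() was never called first
    }

    // The fix proposed in the comment above: drop the cached "now" and read the
    // clock at the point of use.
    boolean fixedTimedOut(int readTimeoutMs) {
        long idle = System.currentTimeMillis() - lastHeard;
        return idle > readTimeoutMs;
    }
}
{noformat}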

[jira] [Commented] (ZOOKEEPER-2471) Java Zookeeper Client incorrectly considers time spent sleeping as time spent connecting, potentially resulting in infinite reconnect loop

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155840#comment-16155840
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2471:
---

Github user nicktrav commented on the issue:

https://github.com/apache/zookeeper/pull/330
  
@DanBenediktson - bumping this. Any thoughts?


> Java Zookeeper Client incorrectly considers time spent sleeping as time spent 
> connecting, potentially resulting in infinite reconnect loop
> --
>
> Key: ZOOKEEPER-2471
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2471
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.5.3
> Environment: all
>Reporter: Dan Benediktson
>Assignee: Dan Benediktson
> Attachments: ZOOKEEPER-2471.patch
>
>
> ClientCnxnSocket uses a member variable "now" to track the current time, and 
> lastSend / lastHeard variables to track socket liveness. Implementations, and 
> even ClientCnxn itself, are expected to call both updateNow() to reset "now" 
> to System.currentTimeMillis, and then call updateLastSend()/updateLastHeard() 
> on IO completions.
> This is a fragile contract, so it's not surprising that there's a bug 
> resulting from it: ClientCnxn.SendThread.run() calls updateLastSendAndHeard() 
> as soon as startConnect() returns, but it does not call updateNow() first. I 
> expect when this was written, either the expectation was that startConnect() 
> was an asynchronous operation and that updateNow() would have been called 
> very recently, or simply the requirement to call updateNow() was forgotten at 
> this point. As far as I can see, this bug has been present since the 
> "updateNow" method was first introduced in the distant past. As it turns out, 
> since startConnect() calls HostProvider.next(), which can sleep, quite a lot 
> of time can pass, leaving a big gap between "now" and now.
> If you are using very short session timeouts (one of our ZK ensembles has 
> many clients using a 1-second timeout), this is potentially disastrous, 
> because the sleep time may exceed the connection timeout itself, which can 
> potentially result in the Java client being stuck in a perpetual reconnect 
> loop. The exact code path it goes through in this case is complicated, 
> because there has to be a previously-closed socket still waiting in the 
> selector (otherwise, the first timeout evaluation will not fail because "now" 
> still hasn't been updated, and then the actual connect timeout will be 
> applied in ClientCnxnSocket.doTransport()) so that select() will harvest the 
> IO from the previous socket and updateNow(), resulting in the next loop 
> through ClientCnxnSocket.SendThread.run() observing the spurious timeout and 
> failing. In practice it does happen to us fairly frequently; we only got to 
> the bottom of the bug yesterday. Worse, when it does happen, the Zookeeper 
> client object is rendered unusable: it's stuck in a perpetual reconnect loop 
> where it keeps sleeping, opening a socket, and immediately closing it.
> I have a patch. Rather than calling updateNow() right after startConnect(), 
> my fix is to remove the "now" member variable and the updateNow() method 
> entirely, and to instead just call System.currentTimeMillis() whenever time 
> needs to be evaluated. I realize there is a benefit (aside from a trivial 
> micro-optimization not worth worrying about) to having the time be "fixed", 
> particularly for truth in the logging: if time is fixed by an updateNow() 
> call, then the log for a timeout will still show exactly the same value the 
> code reasoned about. However, this benefit is in my opinion not enough to 
> merit the fragility of the contract which led to this (for us) highly 
> impactful and difficult-to-find bug in the first place.
> I'm currently running ant tests locally against my patch on trunk, and then 
> I'll upload it here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper issue #330: ZOOKEEPER-2471: ZK Java client should not count sleep ...

2017-09-06 Thread nicktrav
Github user nicktrav commented on the issue:

https://github.com/apache/zookeeper/pull/330
  
@DanBenediktson - bumping this. Any thoughts?


---


Success: ZOOKEEPER- PreCommit Build #1001

2017-09-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 34.39 MB...]
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] fb9092396a2a23108444c96d1403194bb0bf3eb1 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD SUCCESSFUL
Total time: 35 minutes 19 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2809
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2809) Unnecessary stack-trace in server when the client disconnect unexpectedly

2017-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155507#comment-16155507
 ] 

Hadoop QA commented on ZOOKEEPER-2809:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1001//console

This message is automatically generated.

> Unnecessary stack-trace in server when the client disconnect unexpectedly
> -
>
> Key: ZOOKEEPER-2809
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2809
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
>Reporter: Paul Millar
>Assignee: Mark Fenes
>Priority: Minor
> Fix For: 3.5.0
>
>
> In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this 
> with a stack-trace (see 
> src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).
> This is unfortunate as we are using an embedded ZK server in our project (in 
> a test environment) and we consider all stack-traces as bugs.
> I noticed that ZK 3.5 and later no longer log a stack-trace.  This change is 
> due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems 
> to fix this issue almost as a side-effect; a similar change in master has the 
> same effect.
> I was wondering if the change in how EndOfStreamException is logged (i.e., 
> logging the message without a stack-trace) could be back-ported to the 3.4 
> branch, so it could be included in the next 3.4 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper_branch34_openjdk7 - Build # 1636 - Failure

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1636/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on H27 (ubuntu xenial) in workspace 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10
Checking out Revision 70797397f12c8a9cc04895d7ca3459f7c7134f7d 
(refs/remotes/origin/branch-3.4)
Commit message: "ZOOKEEPER-2880: Rename README.txt to README.md"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 70797397f12c8a9cc04895d7ca3459f7c7134f7d
 > git rev-list 70797397f12c8a9cc04895d7ca3459f7c7134f7d # timeout=10
No emails were triggered.
[ZooKeeper_branch34_openjdk7] $ 
/home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes 
-Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean 
test-core-java
Error: JAVA_HOME is not defined correctly.
  We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java
Build step 'Invoke Ant' marked build as failure
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2809) Unnecessary stack-trace in server when the client disconnect unexpectedly

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155453#comment-16155453
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2809:
---

GitHub user mfenes reopened a pull request:

https://github.com/apache/zookeeper/pull/355

ZOOKEEPER-2809: Unnecessary stack-trace in server when the client dis…

Unnecessary stack-trace in server when the client disconnects unexpectedly.

Backport from master/branch-3.5 to branch-3.4. Removes unnecessary stack 
traces from the catch blocks of the doIO method in NIOServerCnxn. For 
EndOfStreamException, the stack trace is replaced with logging only the 
message; the change also removes the stack traces for CancelledKeyException 
and IOException, as per commit 6206b495 referenced in the ticket.
This change is necessary because some projects treat all stack traces as bugs.
For CancelledKeyException and IOException, developers can still see the stack 
traces at DEBUG log level.
This change is in sync with master and branch-3.5.
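For illustration, here is a simplified Java sketch of the logging shape this 
backport describes; it is not the actual NIOServerCnxn patch, and the class, 
helper, and message strings are invented:

{noformat}
import java.io.IOException;
import java.nio.channels.CancelledKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Simplified shape of the change: EndOfStreamException logs only its message,
// while CancelledKeyException and IOException keep their stack traces at DEBUG.
class DoIoLoggingSketch {
    private static final Logger LOG = LoggerFactory.getLogger(DoIoLoggingSketch.class);

    static class EndOfStreamException extends IOException {
        EndOfStreamException(String msg) { super(msg); }
    }

    void doIO() {
        try {
            readFromChannel();
        } catch (CancelledKeyException e) {
            LOG.warn("Exception causing close of session: {}", e.getMessage());
            if (LOG.isDebugEnabled()) {
                LOG.debug("CancelledKeyException stack trace", e);  // full trace only at DEBUG
            }
        } catch (EndOfStreamException e) {
            LOG.warn(e.getMessage());                               // message only, no stack trace
        } catch (IOException e) {
            LOG.warn("Exception causing close of session: {}", e.getMessage());
            if (LOG.isDebugEnabled()) {
                LOG.debug("IOException stack trace", e);            // full trace only at DEBUG
            }
        }
    }

    private void readFromChannel() throws IOException {
        // placeholder for the real NIO read/write on the client channel
    }
}
{noformat}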


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mfenes/zookeeper ZOOKEEPER-2809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355






> Unnecessary stack-trace in server when the client disconnect unexpectedly
> -
>
> Key: ZOOKEEPER-2809
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2809
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
>Reporter: Paul Millar
>Assignee: Mark Fenes
>Priority: Minor
> Fix For: 3.5.0
>
>
> In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this 
> with a stack-trace (see 
> src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).
> This is unfortunate as we are using an embedded ZK server in our project (in 
> a test environment) and we consider all stack-traces as bugs.
> I noticed that ZK 3.5 and later no longer log a stack-trace.  This change is 
> due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems 
> to fix this issue almost as a side-effect; a similar change in master has the 
> same effect.
> I was wondering if the change in how EndOfStreamException is logged (i.e., 
> logging the message without a stack-trace) could be back-ported to the 3.4 
> branch, so it could be included in the next 3.4 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #355: ZOOKEEPER-2809: Unnecessary stack-trace in serv...

2017-09-06 Thread mfenes
GitHub user mfenes reopened a pull request:

https://github.com/apache/zookeeper/pull/355

ZOOKEEPER-2809: Unnecessary stack-trace in server when the client dis…

Unnecessary stack-trace in server when the client disconnects unexpectedly.

Backport from master/branch-3.5 to branch-3.4. Removes unnecessary stack 
traces from the catch blocks of the doIO method in NIOServerCnxn. For 
EndOfStreamException, the stack trace is replaced with logging only the 
message; the change also removes the stack traces for CancelledKeyException 
and IOException, as per commit 6206b495 referenced in the ticket.
This change is necessary because some projects treat all stack traces as bugs.
For CancelledKeyException and IOException, developers can still see the stack 
traces at DEBUG log level.
This change is in sync with master and branch-3.5.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mfenes/zookeeper ZOOKEEPER-2809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355






---


[jira] [Commented] (ZOOKEEPER-2809) Unnecessary stack-trace in server when the client disconnect unexpectedly

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155452#comment-16155452
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2809:
---

Github user mfenes closed the pull request at:

https://github.com/apache/zookeeper/pull/355


> Unnecessary stack-trace in server when the client disconnect unexpectedly
> -
>
> Key: ZOOKEEPER-2809
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2809
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
>Reporter: Paul Millar
>Assignee: Mark Fenes
>Priority: Minor
> Fix For: 3.5.0
>
>
> In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this 
> with a stack-trace (see 
> src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).
> This is unfortunate as we are using an embedded ZK server in our project (in 
> a test environment) and we consider all stack-traces as bugs.
> I noticed that ZK 3.5 and later no longer log a stack-trace.  This change is 
> due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems 
> to fix this issue almost as a side-effect; a similar change in master has the 
> same effect.
> I was wondering if the change in how EndOfStreamException is logged (i.e., 
> logging the message without a stack-trace) could be back-ported to the 3.4 
> branch, so it could be included in the next 3.4 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #355: ZOOKEEPER-2809: Unnecessary stack-trace in serv...

2017-09-06 Thread mfenes
Github user mfenes closed the pull request at:

https://github.com/apache/zookeeper/pull/355


---


[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155451#comment-16155451
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2892:
---

Github user asdf2014 commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
@hanm It works, thx a lot.  


> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream
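A brief, hypothetical Java sketch of the two improvements listed above; the 
names are simplified stand-ins, not the actual PrepRequestProcessor code:

{noformat}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative only: allocate per-branch objects lazily and close the output
// stream deterministically with try-with-resources.
class LazyInitAndCloseSketch {

    byte[] serialize(String record) throws IOException {
        // try-with-resources guarantees the ByteArrayOutputStream is closed
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            baos.write(record.getBytes("UTF-8"));
            return baos.toByteArray();
        }
    }

    void process(int opCode, String request) {
        switch (opCode) {
            case 1: {
                // Lazy initialization: this object exists only on the branch
                // that actually needs it, instead of being created up front
                // for every request type.
                StringBuilder changeRecord = new StringBuilder("change for ").append(request);
                // ... apply changeRecord ...
                break;
            }
            default:
                // other request types never pay for the allocation
                break;
        }
    }
}
{noformat}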



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper issue #361: ZOOKEEPER-2892: Improve lazy initialize and close stre...

2017-09-06 Thread asdf2014
Github user asdf2014 commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
@hanm It works, thx a lot. 👍 


---


Failed: ZOOKEEPER- PreCommit Build #1000

2017-09-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1000/

###
## LAST 60 LINES OF THE CONSOLE 
###
GitHub pull request #355 to apache/zookeeper
[EnvInject] - Loading node environment variables.
ERROR: SEVERE ERROR occurs
org.jenkinsci.lib.envinject.EnvInjectException: java.io.IOException: Remote 
call on H5 failed
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:86)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:43)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:528)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:448)
at hudson.model.Run.execute(Run.java:1735)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:405)
Caused by: java.io.IOException: Remote call on H5 failed
at hudson.remoting.Channel.call(Channel.java:838)
at hudson.FilePath.act(FilePath.java:1081)
at 
org.jenkinsci.plugins.envinject.service.EnvInjectActionSetter.addEnvVarsToRun(EnvInjectActionSetter.java:59)
at 
org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:83)
... 7 more
Caused by: java.lang.OutOfMemoryError: Java heap space
ERROR: Step 'Archive the artifacts' failed: no workspace for 
PreCommit-ZOOKEEPER-github-pr-build #1000
ERROR: Step 'Publish JUnit test result report' failed: no workspace for 
PreCommit-ZOOKEEPER-github-pr-build #1000
[description-setter] Could not determine description.
Putting comment on the pull request
[EnvInject] - [ERROR] - SEVERE ERROR occurs: Remote call on H5 failed
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2809) Unnecessary stack-trace in server when the client disconnect unexpectedly

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155440#comment-16155440
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2809:
---

GitHub user mfenes reopened a pull request:

https://github.com/apache/zookeeper/pull/355

ZOOKEEPER-2809: Unnecessary stack-trace in server when the client dis…

Unnecessary stack-trace in server when the client disconnects unexpectedly.

Backport from master/branch-3.5 to branch-3.4. Removes unnecessary stack 
traces from the catch blocks of the doIO method in NIOServerCnxn. For 
EndOfStreamException, the stack trace is replaced with logging only the 
message; the change also removes the stack traces for CancelledKeyException 
and IOException, as per commit 6206b495 referenced in the ticket.
This change is necessary because some projects treat all stack traces as bugs.
For CancelledKeyException and IOException, developers can still see the stack 
traces at DEBUG log level.
This change is in sync with master and branch-3.5.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mfenes/zookeeper ZOOKEEPER-2809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355






> Unnecessary stack-trace in server when the client disconnect unexpectedly
> -
>
> Key: ZOOKEEPER-2809
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2809
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
>Reporter: Paul Millar
>Assignee: Mark Fenes
>Priority: Minor
> Fix For: 3.5.0
>
>
> In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this 
> with a stack-trace (see 
> src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).
> This is unfortunate as we are using an embedded ZK server in our project (in 
> a test environment) and we consider all stack-traces as bugs.
> I noticed that ZK 3.5 and later no longer log a stack-trace.  This change is 
> due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems 
> to fix this issue almost as a side-effect; a similar change in master has the 
> same effect.
> I was wondering if the change in how EndOfStreamException is logged (i.e., 
> logging the message without a stack-trace) could be back-ported to the 3.4 
> branch, so it could be included in the next 3.4 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2809) Unnecessary stack-trace in server when the client disconnect unexpectedly

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155439#comment-16155439
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2809:
---

Github user mfenes closed the pull request at:

https://github.com/apache/zookeeper/pull/355


> Unnecessary stack-trace in server when the client disconnect unexpectedly
> -
>
> Key: ZOOKEEPER-2809
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2809
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
>Reporter: Paul Millar
>Assignee: Mark Fenes
>Priority: Minor
> Fix For: 3.5.0
>
>
> In ZK 3.4.x, if the client disconnects unexpectedly then the server logs this 
> with a stack-trace (see 
> src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java:356).
> This is unfortunate as we are using an embedded ZK server in our project (in 
> a test environment) and we consider all stack-traces as bugs.
> I noticed that ZK 3.5 and later no longer log a stack-trace.  This change is 
> due to commit 6206b495 (in branch-3.5), which adds ZOOKEEPER-1504 and seems 
> to fix this issue almost as a side-effect; a similar change in master has the 
> same effect.
> I was wondering if the change in how EndOfStreamException is logged (i.e., 
> logging the message without a stack-trace) could be back-ported to the 3.4 
> branch, so it could be included in the next 3.4 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #355: ZOOKEEPER-2809: Unnecessary stack-trace in serv...

2017-09-06 Thread mfenes
GitHub user mfenes reopened a pull request:

https://github.com/apache/zookeeper/pull/355

ZOOKEEPER-2809: Unnecessary stack-trace in server when the client dis…

Unnecessary stack-trace in server when the client disconnects unexpectedly.

Backport from master/branch-3.5 to branch-3.4. Removes unnecessary stack 
traces from the catch blocks of the doIO method in NIOServerCnxn. For 
EndOfStreamException, the stack trace is replaced with logging only the 
message; the change also removes the stack traces for CancelledKeyException 
and IOException, as per commit 6206b495 referenced in the ticket.
This change is necessary because some projects treat all stack traces as bugs.
For CancelledKeyException and IOException, developers can still see the stack 
traces at DEBUG log level.
This change is in sync with master and branch-3.5.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mfenes/zookeeper ZOOKEEPER-2809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355






---


[GitHub] zookeeper pull request #355: ZOOKEEPER-2809: Unnecessary stack-trace in serv...

2017-09-06 Thread mfenes
Github user mfenes closed the pull request at:

https://github.com/apache/zookeeper/pull/355


---


Success: ZOOKEEPER- PreCommit Build #999

2017-09-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 71.51 MB...]
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] e3ef8c7f3adb70542fce3837f46781c0e0e6cc9f logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 37 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2892
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155433#comment-16155433
 ] 

Hadoop QA commented on ZOOKEEPER-2892:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/999//console

This message is automatically generated.

> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2882) memory leak in zoo_amulti() function

2017-09-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155407#comment-16155407
 ] 

Michael Han commented on ZOOKEEPER-2882:


{{Total time: 551 minutes 16 seconds}}

It's the first time I've ever seen unit tests take this long :)

The error message here indicates that some, if not all, of the tests were 
failing. That is not cool, but usually OK since we have some flaky tests. Were 
you able to identify which tests were failing?

Another suggestion: since you are making a change to the C client, you can just 
run the C tests and skip the Java tests. Something like:
{{ant test-core-cppunit}}

> memory leak in zoo_amulti() function
> 
>
> Key: ZOOKEEPER-2882
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2882
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: guoxiang niu
>Assignee: guoxiang niu
>Priority: Minor
>
> When the default branch is executed in switch(op->type), the memory allocated 
> for the oa variable will leak, so close_buffer_oarchive(&oa, 1); should be 
> called before returning in the default branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155393#comment-16155393
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2892:
---

Github user asdf2014 closed the pull request at:

https://github.com/apache/zookeeper/pull/361


> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155389#comment-16155389
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2892:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
Looks like a transient error. You can try retriggering the Jenkins job by 
closing and reopening the pull request, or just generate a new commit hash 
(git commit --amend) and force-push to the remote.


> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155392#comment-16155392
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2892:
---

Github user asdf2014 commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
Okay, I'll try it.


> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2892) Improve lazy initialize and close stream for `PrepRequestProcessor`

2017-09-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155394#comment-16155394
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2892:
---

GitHub user asdf2014 reopened a pull request:

https://github.com/apache/zookeeper/pull/361

ZOOKEEPER-2892: Improve lazy initialize and close stream for 
`PrepRequestProcessor`

Improve lazy initialize and close stream for `PrepRequestProcessor`

* Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
* Close the `ByteArrayOutputStream` I/O stream

@hanm PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/asdf2014/zookeeper ZOOKEEPER-2892

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #361


commit 14ea808d169c0f3df9529fe4353a31c6f595337b
Author: asdf2014 
Date:   2017-09-06T02:02:28Z

ZOOKEEPER-2892: Improve lazy initialize & close stream for 
`PrepRequestProcessor`




> Improve lazy initialize and close stream for `PrepRequestProcessor`
> ---
>
> Key: ZOOKEEPER-2892
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2892
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Benedict Jin
>Assignee: Benedict Jin
>
> Improve lazy initialize and close stream for `PrepRequestProcessor`
> * Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
> * Close the `ByteArrayOutputStream` I/O stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #361: ZOOKEEPER-2892: Improve lazy initialize and clo...

2017-09-06 Thread asdf2014
GitHub user asdf2014 reopened a pull request:

https://github.com/apache/zookeeper/pull/361

ZOOKEEPER-2892: Improve lazy initialize and close stream for 
`PrepRequestProcessor`

Improve lazy initialize and close stream for `PrepRequestProcessor`

* Delay the initialization of `ChangeRecord` and `ReconfigRequest` variables
* Close the `ByteArrayOutputStream` I/O stream

@hanm PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/asdf2014/zookeeper ZOOKEEPER-2892

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #361


commit 14ea808d169c0f3df9529fe4353a31c6f595337b
Author: asdf2014 
Date:   2017-09-06T02:02:28Z

ZOOKEEPER-2892: Improve lazy initialize & close stream for 
`PrepRequestProcessor`




---


[GitHub] zookeeper issue #361: ZOOKEEPER-2892: Improve lazy initialize and close stre...

2017-09-06 Thread asdf2014
Github user asdf2014 commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
Okay, I'll try it.


---


[GitHub] zookeeper pull request #361: ZOOKEEPER-2892: Improve lazy initialize and clo...

2017-09-06 Thread asdf2014
Github user asdf2014 closed the pull request at:

https://github.com/apache/zookeeper/pull/361


---


[GitHub] zookeeper issue #361: ZOOKEEPER-2892: Improve lazy initialize and close stre...

2017-09-06 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/361
  
Looks like a transient error. You can try to retrigger the Jenkins job by closing
and reopening the pull request, or just generate a new commit hash (git commit
--amend) and force-push to the remote.


---


[jira] [Commented] (ZOOKEEPER-1654) bad documentation link on site

2017-09-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155351#comment-16155351
 ] 

Michael Han commented on ZOOKEEPER-1654:


I remember someone reported the same issue to the mailing list a while ago. Not
sure what caused this, but republishing the site could probably fix the issue
(and the current doc on the site is outdated anyway). I'll do it, since publishing
requires committer access.

> bad documentation link on site
> --
>
> Key: ZOOKEEPER-1654
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1654
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Camille Fournier
>Priority: Minor
>
> If you go to this page:
> http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html
> Then try to click on Developer -> API Docs; you'll get to 
> http://zookeeper.apache.org/doc/trunk/api/index.html
> Which does not exist. Should point to:
> http://zookeeper.apache.org/doc/current/api/index.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ZOOKEEPER-1654) bad documentation link on site

2017-09-06 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-1654:
--

Assignee: Michael Han

> bad documentation link on site
> --
>
> Key: ZOOKEEPER-1654
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1654
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Camille Fournier
>Assignee: Michael Han
>Priority: Minor
>
> If you go to this page:
> http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html
> Then try to click on Developer -> API Docs; you'll get to 
> http://zookeeper.apache.org/doc/trunk/api/index.html
> Which does not exist. Should point to:
> http://zookeeper.apache.org/doc/current/api/index.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2885) zookeeper-3.5.3-beta.tar.gz file in mirror site is corrupted

2017-09-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155338#comment-16155338
 ] 

Michael Han commented on ZOOKEEPER-2885:


Thanks for reporting the issue. Hopefully this is not a blocker for anyone, but
we'll be more careful and rigorous when verifying the release artifacts for
future releases.

> zookeeper-3.5.3-beta.tar.gz file in mirror site is corrupted
> 
>
> Key: ZOOKEEPER-2885
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2885
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.3
>Reporter: Gabriel
>Priority: Critical
> Fix For: 3.5.3
>
>
> I downloaded the zookeeper-3.5.3-beta.tar.gz file from several mirror sites 
> and all of them are corrupted.
> {quote}$ wget 
> http://www-us.apache.org/dist/zookeeper/zookeeper-3.5.3-beta/zookeeper-3.5.3-beta.tar.gz
> $:~/dockerfiles$ tar -xzvf zookeeper-3.5.3-beta.tar.gz
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now{quote}
> If this is my mistake, could you please explain what I did wrong?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2017-09-06 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155336#comment-16155336
 ] 

Michael Han commented on ZOOKEEPER-2778:


Thanks for your efforts on reproducing this bug (and many others), 
[~castuardo]! I'll resume working on this soon.

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Critical
>
> It's possible to have a deadlock during recovery phase. 
> Found this issue by analyzing thread dumps of "flaky" ReconfigRecoveryTest 
> [1]. . Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread that runs the follower's 
> sync-with-leader work, and the listener of the QuorumCnxManager (qcm) of the 
> same quorum peer that handles incoming connection requests. To finish syncing 
> with the leader, the follower needs to synchronize on both QV_LOCK and the qcm 
> object it owns; meanwhile, to finish setting up an incoming connection, the 
> receiver thread needs to synchronize on both the qcm object the quorum peer 
> owns and the same QV_LOCK. The problem is that the two locks are acquired in 
> different orders, so depending on timing / actual execution order, each thread 
> can end up holding one lock while waiting for the other (a minimal sketch of 
> this lock-ordering pattern follows below).
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
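
To make the lock-ordering hazard concrete, here is a minimal, self-contained Java
sketch of the pattern described in the report above. It is not ZooKeeper source
code: the field names qvLock and cnxManager are invented and only stand in for the
roles of QV_LOCK and the QuorumCnxManager object.

{code}
public class LockOrderDeadlockDemo {
    // Stand-ins for QV_LOCK and the QuorumCnxManager object; purely illustrative.
    private static final Object qvLock = new Object();
    private static final Object cnxManager = new Object();

    public static void main(String[] args) {
        // Path A ("follower sync with leader"): QV_LOCK first, then the manager.
        Thread syncWithLeader = new Thread(() -> {
            synchronized (qvLock) {
                sleepQuietly(100); // widen the race window so the hang shows up reliably
                synchronized (cnxManager) {
                    System.out.println("sync path acquired both locks");
                }
            }
        }, "sync-with-leader");

        // Path B ("listener receiving a connection"): manager first, then QV_LOCK.
        Thread receiveConnection = new Thread(() -> {
            synchronized (cnxManager) {
                sleepQuietly(100);
                synchronized (qvLock) {
                    System.out.println("receiver path acquired both locks");
                }
            }
        }, "receive-connection");

        syncWithLeader.start();
        receiveConnection.start();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}

With the sleeps in place, each thread usually takes its first monitor and then
blocks forever on the second one, which is the classic ABBA deadlock. The usual
remedy is to make both code paths acquire the two locks in one agreed-upon order,
or to narrow one of the critical sections so the nested acquisition is no longer
needed.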



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Regarding stable release plan from 3.5 branch

2017-09-06 Thread Michael Han
I am not aware of any concrete plans being made, though we are definitely
heading in the direction of making the first stable 3.5 release.

Speaking of that, should we do another beta release before the stable release?
Quality-wise, the current branch-3.5 is not at the same level as branch-3.4:
https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html.
branch-3.4 is very clean in terms of flaky tests and branch-3.5 is not. Also, if
we look at recent test runs [1] [2], there are quite a few batch failures (which
were filtered out by the flaky-test checking script, which is why they did not
appear in the dashboard) that are worth investigating.

Feature-wise, I think we are in good shape after ZOOKEEPER-1045 was ported to
branch-3.5, though this SASL-based quorum peer authentication feature does not
work with reconfig yet. Since reconfig is disabled by default, I think it's fine
if we ship without the two working together, but others might have a different
opinion. There are a few other patches that we should probably get in, such as
Jordan's usability improvements to reconfig [3].

There are also a few concurrency-related bugs that are worth fixing before we
get to stable, such as [4].

It'll also be cool to run Jepsen on branch-3.5 (ZOOKEEPER-2704) before we
declare the branch stable.

TL;DR - IMHO we are in good shape, but we need to do something to improve
quality before we release, as stabilization and quality are a big deal for
ZooKeeper.

[1] https://builds.apache.org/job/ZooKeeper_branch35_jdk8/
[2] https://builds.apache.org/job/ZooKeeper_branch35_jdk7/
[3] https://github.com/apache/zookeeper/pull/249
[4] https://issues.apache.org/jira/browse/ZOOKEEPER-2778


> From: bhupendra jain 
> Date: Mon, Sep 4, 2017 at 10:30 PM
> Subject: Regarding stable release plan from 3.5 branch
> To: "dev@zookeeper.apache.org" 
>
>
> Hi Guys
>
> As I read the link below, it's great to know that the community is already working
> towards a stable version from the 3.5 branch. The last beta release was on 17
> April, 2017: the 3.5.3-beta version.
> https://whimsy.apache.org/board/minutes/ZooKeeper.html
> ## Activity:
> We have completed two releases since March and the community is now working
> towards the first stable release of the 3.5 branch.
>
> So if the community already has a plan / milestone for a stable release from
> the 3.5 branch, please share.
>
> Thanks
> Bhupendra
>
>
> 
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
>
>
>
>
>
> --
> Cheers
> Michael.
>


[jira] [Comment Edited] (ZOOKEEPER-2882) memory leak in zoo_amulti() function

2017-09-06 Thread guoxiang niu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155292#comment-16155292
 ] 

guoxiang niu edited comment on ZOOKEEPER-2882 at 9/6/17 12:56 PM:
--

I just added the close_buffer_oarchive(&oa, 1); statement after the default: label
in zoo_amulti(). When I executed ant test on Windows 10, it showed the following
failure message:

BUILD FAILED
\zookeeper\zookeeper\build.xml:1339: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1220: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1224: Tests failed!

Total time: 551 minutes 16 seconds

How can I solve it?


was (Author: guoxiang):
i just added close_buffer_oarchive(, 1); statement after default: statement 
in zoo_amulti(), when i executed ant test on windows 10 OS, it showed following 
failed message:

BUILD FAILED
\zookeeper\zookeeper\build.xml:1339: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1220: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1224: Tests failed!

how to solve it?

> memory leak in zoo_amulti() function
> 
>
> Key: ZOOKEEPER-2882
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2882
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: guoxiang niu
>Assignee: guoxiang niu
>Priority: Minor
>
> When the default branch is executed in switch(op->type), the memory allocated for
> the oa variable will leak, so close_buffer_oarchive(&oa, 1); should be called
> before returning from the default branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2882) memory leak in zoo_amulti() function

2017-09-06 Thread guoxiang niu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155292#comment-16155292
 ] 

guoxiang niu commented on ZOOKEEPER-2882:
-

I just added the close_buffer_oarchive(&oa, 1); statement after the default: label
in zoo_amulti(). When I executed ant test on Windows 10, it showed the following
failure message:

BUILD FAILED
\zookeeper\zookeeper\build.xml:1339: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1220: The following error occurred while 
executing this line:
\zookeeper\zookeeper\build.xml:1224: Tests failed!

How can I solve it?

> memory leak in zoo_amulti() function
> 
>
> Key: ZOOKEEPER-2882
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2882
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: guoxiang niu
>Assignee: guoxiang niu
>Priority: Minor
>
> When the default branch is executed in switch(op->type), the memory allocated for
> the oa variable will leak, so close_buffer_oarchive(&oa, 1); should be called
> before returning from the default branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2889) Zookeeper standalone instance startup references logging classes incompatible with log4j-1.2-api

2017-09-06 Thread Nikhil Bhide (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155272#comment-16155272
 ] 

Nikhil Bhide commented on ZOOKEEPER-2889:
-

I would like to work on it. Please assign this issue to me.

> Zookeeper standalone instance startup references logging classes incompatible 
> with log4j-1.2-api
> 
>
> Key: ZOOKEEPER-2889
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2889
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.8
>Reporter: Karl Wright
>
> Starting Zookeeper in the following way causes "ClassNotFoundException" 
> errors, and aborts, in a log4j 2.x environment:
> {code}
> "%JAVA_HOME%\bin\java" %JAVAOPTIONS% 
> org.apache.zookeeper.server.quorum.QuorumPeerMain zookeeper.cfg
> {code}
> The log4j 2.x jars in the classpath are:
> {code}
> log4j-1.2-api
> log4j-core
> log4j-api
> {code}
> It appears that the Zookeeper QuorumPeerMain class is incompatible with the 
> limited log4j 1.2 API that log4j 2.x includes.  Zookeeper 3.4.8 works fine 
> with log4j 2.x except when you start it as a service in this way.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk-jdk8 - Build # 1192 - Still Failing

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1192/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 60.31 MB...]
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit] at java.lang.Thread.run(Thread.java:748)
[junit] 2017-09-06 11:56:05,531 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 101457
[junit] 2017-09-06 11:56:05,531 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 863
[junit] 2017-09-06 11:56:05,532 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-09-06 11:56:05,532 [myid:] - INFO  [main:ClientBase@601] - 
tearDown starting
[junit] 2017-09-06 11:56:05,532 [myid:] - INFO  [main:ClientBase@571] - 
STOPPING server
[junit] 2017-09-06 11:56:05,532 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:22240
[junit] 2017-09-06 11:56:05,534 [myid:] - INFO  [main:ZooKeeperServer@545] 
- shutting down
[junit] 2017-09-06 11:56:05,534 [myid:] - ERROR [main:ZooKeeperServer@509] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-09-06 11:56:05,534 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  
[main:PrepRequestProcessor@1010] - Shutting down
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  [ProcessThread(sid:0 
cport:22240)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-09-06 11:56:05,535 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port22240,name1=InMemoryDataTree]
[junit] 2017-09-06 11:56:05,536 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port22240]
[junit] 2017-09-06 11:56:05,536 [myid:] - INFO  
[main:FourLetterWordMain@87] - connecting to 127.0.0.1 22240
[junit] 2017-09-06 11:56:05,536 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-09-06 11:56:05,541 [myid:] - INFO  [main:ClientBase@626] - 
fdcount after test is: 2545 at start it was 2545
[junit] 2017-09-06 11:56:05,541 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-09-06 11:56:05,547 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 105, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
432.574 sec, Thread: 5, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] 2017-09-06 11:56:05,793 [myid:127.0.0.1:22123] - INFO  
[main-SendThread(127.0.0.1:22123):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:22123. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-09-06 11:56:05,794 [myid:127.0.0.1:22123] - WARN  
[main-SendThread(127.0.0.1:22123):ClientCnxn$SendThread@1235] - Session 
0x305d07a37de for server 127.0.0.1/127.0.0.1:22123, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-09-06 11:56:05,947 [myid:127.0.0.1:21994] - INFO  
[main-SendThread(127.0.0.1:21994):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:21994. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-09-06 11:56:05,947 [myid:127.0.0.1:21994] - WARN  
[main-SendThread(127.0.0.1:21994):ClientCnxn$SendThread@1235] - Session 
0x105d0771348 for server 127.0.0.1/127.0.0.1:21994, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit]  

ZooKeeper_branch35_openjdk7 - Build # 660 - Failure

2017-09-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/660/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 69.37 MB...]
[junit] 2017-09-06 10:10:54,501 [myid:] - WARN  [New I/O boss 
#18:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0xdc9f7247] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30073
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30073
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-09-06 10:10:54,501 [myid:] - INFO  [New I/O boss 
#18:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-09-06 10:10:54,501 [myid:127.0.0.1:30073] - INFO  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x10345a1cd8a is lost, closing socket connection and attempting 
reconnect
[junit] 2017-09-06 10:10:54,715 [myid:127.0.0.1:30199] - INFO  
[main-SendThread(127.0.0.1:30199):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30199. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-09-06 10:10:54,716 [myid:] - INFO  [New I/O boss 
#2772:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-09-06 10:10:54,717 [myid:] - WARN  [New I/O boss 
#2772:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x5257607b] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at