[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold

2017-06-08 Thread JiangJiafu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043968#comment-16043968
 ] 

JiangJiafu commented on ZOOKEEPER-2800:
---

I think this must be a bug, because the problem has occurred again in my environment.

> zookeeper ephemeral node not deleted after server restart and consistency is 
> not hold
> -
>
> Key: ZOOKEEPER-2800
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2800
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.11
> Environment: Centos6.5 java8
>Reporter: JiangJiafu
>Priority: Critical
> Attachments: zoo.cfg, zookeeper2.out, zookeeper3.out, zookeeper.out
>
>
> I deployed a ZooKeeper cluster with three nodes:
> ofs_zk1:30.0.0.72
> ofs_zk2:30.0.0.73
> ofs_zk3:30.0.0.99
> On 2017-06-02, I used the C ZooKeeper client to create some ephemeral sequential nodes:
> /adm_election/rolemgr/rolemgr08,
> /adm_election/rolemgr/rolemgr11,
> /adm_election/rolemgr/rolemgr12,
> with session timeout 2 ms.
> Then I restarted ofs_zk1 and ofs_zk2.
> On 2017-06-05, I found that these ephemeral nodes still existed on ofs_zk1.
> I can see the nodes with the zkCli.sh get command on ofs_zk1,
> but they do not exist on ofs_zk2 and ofs_zk3.
> Is this odd?
> I have uploaded the whole deploy directory of the three nodes to:
> https://pan.baidu.com/s/1miohiCo
> The log is written to log/zookeeper.out.
> The log of ofs_zk3 is too large, so I only include the first 1000 lines.
> Since I found this problem a little late, some snapshots and logs may have been deleted.
> I hope someone can help find the cause.
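
For context on the session timeout in the description above: a ZooKeeper server does not necessarily honor the timeout a client requests. It clamps the value into the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The following is a minimal sketch of that clamping, not the server's actual code. The class and method names are hypothetical, and the tickTime of 2000 ms is an assumption, because the attached zoo.cfg is not reproduced in this message.

```java
/** Hypothetical helper: illustrates how a requested session timeout is clamped. */
final class SessionTimeoutSketch {

    /** Clamps the requested timeout into [2 * tickTime, 20 * tickTime]. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;   // default minSessionTimeout
        int max = 20 * tickTimeMs;  // default maxSessionTimeout
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // assumed value, not taken from the reporter's zoo.cfg
        System.out.println(negotiate(2, tickTime));       // below the minimum, raised
        System.out.println(negotiate(20000, tickTime));   // inside the range, kept
        System.out.println(negotiate(120000, tickTime));  // above the maximum, lowered
    }
}
```

Under these assumptions a very small requested timeout would be raised to the minimum, so the effective lifetime of an ephemeral node after its client disappears is governed by the negotiated value rather than the requested one.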



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reassigned ZOOKEEPER-1748:
---

Assignee: Ben Sherman  (was: Daniel Peon)

> TCP keepalive for leader election connections
> -
>
> Key: ZOOKEEPER-1748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
> Environment: Linux, Java 1.7
>Reporter: Antal Sasvári
>Assignee: Ben Sherman
>Priority: Minor
> Fix For: 3.4.11
>
> Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable, and there is no leader election, the leader election 
> port connections are open for very long time without any packets being sent 
> on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would be only applicable in case of 
> algorithm 3 (TCP based fast leader election), having the default value false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that e.g. Apache ActiveMQ also has a config option for similar 
> purpose called transport.keepalive.
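
The proposal above can be sketched with the standard java.net.Socket API. This is only an illustration under stated assumptions: the class KeepAliveSketch and the helper keepAliveAfterConfig are hypothetical, the setSockOpts method here merely mirrors the name mentioned in the description and is not the QuorumCnxManager code, and the boolean parameter stands in for the proposed tcpKeepAlive config value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical sketch: enable SO_KEEPALIVE on a socket when a config flag is set. */
final class KeepAliveSketch {

    /** Applies the keepalive option; tcpKeepAlive stands in for the proposed config parameter. */
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    /** Configures a fresh, unconnected socket and reports the resulting SO_KEEPALIVE state. */
    static boolean keepAliveAfterConfig(boolean tcpKeepAlive) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keepalive with flag=true:  " + keepAliveAfterConfig(true));
        System.out.println("keepalive with flag=false: " + keepAliveAfterConfig(false));
    }
}
```

Note that SO_KEEPALIVE only switches probing on. The idle time before the first probe is an operating system setting (on Linux, net.ipv4.tcp_keepalive_time, 7200 seconds by default), so it may need to be lowered below the idle timeout of the network element for the option to have the intended effect.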





ZooKeeper_branch34_jdk8 - Build # 1020 - Failure

2017-06-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1020/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 18.36 MB...]
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
[junit] 2017-06-08 22:53:20,034 [myid:] - INFO  [main:ZKTestCase$1@64] - FINISHED testSessionMoved
[junit] 2017-06-08 22:53:20,035 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11341
[junit] 2017-06-08 22:53:20,035 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11342
[junit] 2017-06-08 22:53:20,035 [myid:] - INFO  [main:ZKTestCase$1@59] - STARTING testDeleteWithChildren
[junit] 2017-06-08 22:53:20,035 [myid:] - INFO  [main:QuorumBase@69] - QuorumBase.setup null
[junit] 2017-06-08 22:53:20,040 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11343
[junit] 2017-06-08 22:53:20,040 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11344
[junit] 2017-06-08 22:53:20,040 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11345
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11346
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11347
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11348
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11349
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11350
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11351
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:PortAssignment@32] - assigning port 11352
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:QuorumBase@93] - Ports are: 127.0.0.1:11343,127.0.0.1:11344,127.0.0.1:11345,127.0.0.1:11346,127.0.0.1:11347
[junit] 2017-06-08 22:53:20,041 [myid:] - INFO  [main:QuorumBase@277] - TearDown started
[junit] 2017-06-08 22:53:20,042 [myid:] - INFO  [main:QuorumBase@281] - fdcount after test is: 102
[junit] 2017-06-08 22:53:20,042 [myid:] - INFO  [main:ZKTestCase$1@74] - FAILED testDeleteWithChildren
[junit] org.junit.internal.runners.model.MultipleFailureException
[junit] at org.junit.internal.runners.model.MultipleFailureException.assertEmpty(MultipleFailureException.java:23)
[junit] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:42)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033)
[junit] 2017-06-08 22:53:20,042 [myid:] - INFO  [main:ZKTestCase$1@64] - FINISHED testDeleteWithChildren
[junit] Tests run: 12, Failures: 8, Errors: 9, Skipped: 1, Time elapsed: 11.358 sec
[junit] 2017-06-08 22:53:20,122 [myid:] - INFO  [SessionTracker:SessionTrackerImpl@163] - SessionTrackerImpl exited loop!
[junit] 2017-06-08 22:53:20,895 [myid:] - INFO  [/127.0.0.1:12244:QuorumCnxManager$Listener@773] - Leaving listener
[junit] 2017-06-08 22:53:20,940 [myid:] - INFO  [/127.0.0.1:12252:QuorumCnxManager$Listener@773] - Leaving listener

[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043588#comment-16043588
 ] 

Hadoop QA commented on ZOOKEEPER-2803:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 48 new Findbugs (version 
3.0.1) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//console

This message is automatically generated.

> Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
> --
>
> Key: ZOOKEEPER-2803
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10
>Reporter: Abraham Fine
>Assignee: Abraham Fine
>
> We have noticed on internal executions of the integration tests rare failures 
> of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.
> {code}
> java.lang.RuntimeException: Unable to run quorum server
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
>   at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
>   at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
> {code}
> along with this strange stack trace in the logs:
> {code}
> java.nio.channels.ClosedByInterruptException
>   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
>   at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
>   at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
>   at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
> {code}
> It appears that this failure is related to the usage of {{((FileOutputStream) 
> out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
> {{FileChannel#force}} appears to be interruptible, which is not desirable 
> behavior when writing the epoch file. The interrupt may be triggered by the 
> repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. 
> Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does 
> not appear to have the same problem.
> I was able to find another JIRA ticket describing a similar issue here: 
> https://issues.apache.org/jira/browse/DERBY-4963
> There is also interesting discussion in ZOOKEEPER-1835 (where the change was 
> made for 3.5) although these discussions appear to be Windows centric (we 
> noticed the issue on Linux) 
> https://issues.apache.org/jira/browse/ZOOKEEPER-1835 
> The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build 
> #3241" but jenkins cleared out the logs (I only still have the test report 
> from the mailing list).
> In addition, {{testWorkerThreads}} appears to be failing every few months on 
> Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430  
> and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote 
> this Jenkins had cleaned out the logs from the latest failed run so I have no 
> way of determining if the cause is the same.
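
The numbers in the IOException quoted above are easier to read once the zxid is split into its two halves: a zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter. The sketch below shows that 4294967296 (2^32) is epoch 1, counter 0, which is newer than a recorded current epoch of 0. The class and method names are hypothetical, and the final comparison is only an assumption about what the error message implies, not a copy of the check in QuorumPeer.loadDataBase.

```java
/** Hypothetical sketch: split a zxid into epoch (high 32 bits) and counter (low 32 bits). */
final class ZxidSketch {

    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastZxid = 4294967296L; // value from the exception message
        long currentEpoch = 0L;      // value from the exception message

        System.out.println("epoch of last zxid   = " + epochOf(lastZxid));
        System.out.println("counter of last zxid = " + counterOf(lastZxid));
        // Assumed reading of the message: the stored epoch lags behind the data.
        System.out.println("epoch file is stale  = " + (currentEpoch < epochOf(lastZxid)));
    }
}
```

Read this way, the error is consistent with an epoch file that was never durably updated while the transaction log had already advanced to epoch 1.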





Failed: ZOOKEEPER- PreCommit Build #781

2017-06-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 32.58 MB...]
 [exec] +0 tests included.  The patch appears to be a documentation patch that doesn't require tests.
 [exec] 
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 48 new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 7a0a096774669ce2581f9534a3109dfaa0ef14aa logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1703: exec returned: 2

Total time: 36 minutes 20 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2803
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043580#comment-16043580
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2803:
---

Github user afine closed the pull request at:

https://github.com/apache/zookeeper/pull/277







[GitHub] zookeeper pull request #277: ZOOKEEPER-2803 Flaky test: org.apache.zookeeper...

2017-06-08 Thread afine
Github user afine closed the pull request at:

https://github.com/apache/zookeeper/pull/277


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread Abraham Fine (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abraham Fine updated ZOOKEEPER-2803:

Description: 
We have noticed on internal executions of the integration tests rare failures 
of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.

{code}
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
    at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}

along with this strange stack trace in the logs:
{code}
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
    at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}

It appears that this failure is related to the usage of {{((FileOutputStream) 
out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
{{FileChannel#force}} appears to be interruptible, which is not desirable 
behavior when writing the epoch file. The interrupt may be triggered by the 
repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. 
Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not 
appear to have the same problem.

I was able to find another JIRA ticket describing a similar issue here: 
https://issues.apache.org/jira/browse/DERBY-4963

There is also interesting discussion in ZOOKEEPER-1835 (where the change was 
made for 3.5) although these discussions appear to be Windows centric (we 
noticed the issue on Linux) 
https://issues.apache.org/jira/browse/ZOOKEEPER-1835 

The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build #3241" 
but jenkins cleared out the logs (I only still have the test report from the 
mailing list).

In addition, {{testWorkerThreads}} appears to be failing every few months on 
Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430  
and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote this 
Jenkins had cleaned out the logs from the latest failed run so I have no way of 
determining if the cause is the same.

  was:
We have noticed on internal executions of the integration tests rare failures 
of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.

{code}
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
    at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}

along with this strange stack trace in the logs:
{code}
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
    at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}

It appears that this failure is related to the usage of {{((FileOutputStream) 
out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 

[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043538#comment-16043538
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2803:
---

GitHub user afine opened a pull request:

https://github.com/apache/zookeeper/pull/277

ZOOKEEPER-2803 Flaky test: 
org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afine/zookeeper ZOOKEEPER-2803

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #277


commit 0a6ada6ba25ab3d4b2094a4c5f1842a9a0b67dfc
Author: Abraham Fine 
Date:   2017-06-08T22:10:41Z

ZOOKEEPER-2803: Flaky test: 
org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads




> Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
> --
>
> Key: ZOOKEEPER-2803
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10
>Reporter: Abraham Fine
>Assignee: Abraham Fine
>
> We have noticed on internal executions of the integration tests rare failures 
> of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.
> {code}
> java.lang.RuntimeException: Unable to run quorum server
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
>   at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
>   at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
> {code}
> along with this strange stack trace in the logs:
> {code}
> java.nio.channels.ClosedByInterruptException
>   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
>   at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
>   at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
>   at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
> {code}
> It appears that this failure is related to the usage of {{((FileOutputStream) 
> out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
> {{FileChannel#force}} appears to be interruptible, which is not desirable 
> behavior when writing the epoch file. The interrupt may be triggered by the 
> repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. 
> Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does 
> not appear to have the same problem.
> I was able to find another JIRA ticket describing a similar issue here: 
> https://issues.apache.org/jira/browse/DERBY-4963
> There is also interesting discussion in ZOOKEEPER-1835 (where the change was 
> made for 3.5) although these discussions appear to be Windows centric (we 
> noticed the issue on Linux) 
> https://issues.apache.org/jira/browse/ZOOKEEPER-1835 
> {{testWorkerThreads}} appears to be failing every few months on Solaris on 
> Apache Jenkins (for 3.4 and 3.5), but at the time I wrote this Jenkins had 
> cleaned out the logs from the latest failed run so I have no way of 
> determining if the cause is the same.
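
The difference between the two sync calls named in the description can be reproduced with the JDK alone. The sketch below sets the calling thread's interrupt flag and then tries each call on a temporary file: FileChannel#force is expected to fail with ClosedByInterruptException, while FileDescriptor#sync is expected to complete. The class and method names are hypothetical, and this is a standalone illustration rather than the AtomicFileOutputStream code or the submitted patch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedByInterruptException;

/** Hypothetical sketch: compare FileChannel#force and FileDescriptor#sync under interrupt. */
final class InterruptedSyncSketch {

    /** Returns true if FileChannel#force fails because the thread is interrupted. */
    static boolean forceFailsWhenInterrupted() {
        try {
            File f = File.createTempFile("epoch", ".tmp");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(42);
                Thread.currentThread().interrupt(); // simulate an interrupt during shutdown
                try {
                    out.getChannel().force(true);
                    return false;
                } catch (ClosedByInterruptException e) {
                    return true; // the channel was closed before the data was forced
                } finally {
                    Thread.interrupted(); // clear the flag so the caller is unaffected
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if FileDescriptor#sync completes even though the thread is interrupted. */
    static boolean syncSucceedsWhenInterrupted() {
        try {
            File f = File.createTempFile("epoch", ".tmp");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(42);
                Thread.currentThread().interrupt();
                try {
                    out.getFD().sync(); // does not go through an interruptible channel
                    return true;
                } finally {
                    Thread.interrupted();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("force fails when interrupted:   " + forceFailsWhenInterrupted());
        System.out.println("sync succeeds when interrupted: " + syncSucceedsWhenInterrupted());
    }
}
```

The relevant design point is that an interrupted force closes the channel, so a writer that relies on it may leave the target file without the new contents, which fits the stale epoch value reported in the first stack trace.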





[jira] [Updated] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread Abraham Fine (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abraham Fine updated ZOOKEEPER-2803:

Description: 
We have noticed on internal executions of the integration tests rare failures 
of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.

{code}
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
    at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}

along with this strange stack trace in the logs:
{code}
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
    at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
    at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
    at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}

It appears that this failure is related to the usage of {{((FileOutputStream) 
out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
{{FileChannel#force}} appears to be interruptible, which is not desirable 
behavior when writing the epoch file. The interrupt may be triggered by the 
repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. 
Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not 
appear to have the same problem.
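
The difference between the two calls can be reproduced outside ZooKeeper. The sketch below is not taken from the ZooKeeper source; the class and method names (ForceVsSyncSketch, forceFailsWhenInterrupted, syncSurvivesInterrupt) are made up for illustration. It sets the calling thread's interrupt flag and then tries each way of flushing a temp file:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

// Illustrative sketch only; not ZooKeeper code. All names here are hypothetical.
class ForceVsSyncSketch {

    // Flushes via FileChannel#force with the interrupt flag already set.
    // Returns true if the call was rejected with ClosedByInterruptException.
    static boolean forceFailsWhenInterrupted(File file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(1);
            Thread.currentThread().interrupt();
            try {
                out.getChannel().force(true);
                return false;
            } catch (ClosedByInterruptException e) {
                return true; // the channel has been closed by the interrupt
            } finally {
                Thread.interrupted(); // clear the flag so later I/O is unaffected
            }
        }
    }

    // Flushes via FileDescriptor#sync with the interrupt flag already set.
    // Returns true if the sync completed without an exception.
    static boolean syncSurvivesInterrupt(File file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(1);
            Thread.currentThread().interrupt();
            try {
                out.getFD().sync();
                return true;
            } finally {
                Thread.interrupted(); // clear the flag again
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("epoch-sketch", ".tmp");
        file.deleteOnExit();
        System.out.println("force() interrupted: " + forceFailsWhenInterrupted(file));
        System.out.println("sync() completed:    " + syncSurvivesInterrupt(file));
    }
}
```

On a typical JDK the first call closes the channel and throws ClosedByInterruptException, which matches the trace above, while FileDescriptor#sync ignores the interrupt flag. If the epoch file write is cut short this way, that would be consistent with a later restart still seeing epoch 0.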

I was able to find another JIRA ticket describing a similar issue here: 
https://issues.apache.org/jira/browse/DERBY-4963

There is also interesting discussion in ZOOKEEPER-1835 (where the change was 
made for 3.5) although these discussions appear to be Windows centric (we 
noticed the issue on Linux) 
https://issues.apache.org/jira/browse/ZOOKEEPER-1835 

{{testWorkerThreads}} appears to be failing every few months on Solaris on 
Apache Jenkins (for 3.4 and 3.5), but at the time I wrote this Jenkins had 
cleaned out the logs from the latest failed run so I have no way of determining 
if the cause is the same.


[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....

2017-06-08 Thread bensherman
Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
Version number fixed!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043532#comment-16043532
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
---

Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
Version number fixed!


> TCP keepalive for leader election connections
> -
>
> Key: ZOOKEEPER-1748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
> Environment: Linux, Java 1.7
>Reporter: Antal Sasvári
>Assignee: Daniel Peon
>Priority: Minor
> Fix For: 3.4.11
>
> Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable, and there is no leader election, the leader election 
> port connections are open for very long time without any packets being sent 
> on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would be only applicable in case of 
> algorithm 3 (TCP based fast leader election), having the default value false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that e.g. Apache ActiveMQ also has a config option for similar 
> purpose called transport.keepalive.
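
The proposal quoted above amounts to one extra socket option. A minimal sketch of the idea, not the actual patch: KeepAliveSketch and applyKeepAlive are hypothetical names, and the property name zookeeper.tcpKeepAlive is taken from the documentation diff discussed in this thread.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Illustrative sketch only; not the code from the ZOOKEEPER-1748 patch.
class KeepAliveSketch {

    // Reads the flag from a Java system property; false when the property is unset.
    static boolean keepAliveRequested() {
        return Boolean.getBoolean("zookeeper.tcpKeepAlive");
    }

    // Hypothetical stand-in for the socket setup step described in the issue.
    static void applyKeepAlive(Socket sock, boolean tcpKeepAlive) throws SocketException {
        // SO_KEEPALIVE asks the OS to probe an idle connection periodically.
        sock.setKeepAlive(tcpKeepAlive);
    }

    public static void main(String[] args) throws IOException {
        try (Socket sock = new Socket()) { // unconnected socket, enough to set options
            applyKeepAlive(sock, keepAliveRequested());
            System.out.println("SO_KEEPALIVE enabled: " + sock.getKeepAlive());
        }
    }
}
```

Run with -Dzookeeper.tcpKeepAlive=true to switch the option on. The probe interval itself is an operating system setting, so whether the probes arrive before a given network element drops the connection depends on the host configuration.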



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

2017-06-08 Thread Abraham Fine (JIRA)
Abraham Fine created ZOOKEEPER-2803:
---

 Summary: Flaky test: 
org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
 Key: ZOOKEEPER-2803
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.10
Reporter: Abraham Fine
Assignee: Abraham Fine


We have noticed on internal executions of the integration tests rare failures 
of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

{code}
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
{code}

along with this strange stack trace in the logs:
{code}
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
{code}

It appears that this failure is related to the usage of {{((FileOutputStream) 
out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
{{FileChannel#force}} appears to be interruptible, which is not desirable 
behavior when writing the epoch file. Branch 3.5 uses {{FileDescriptor#sync}} 
which is not interruptible and does not appear to have the same problem.

I was able to find another JIRA ticket describing a similar issue here: 
https://issues.apache.org/jira/browse/DERBY-4963

There is also interesting discussion in ZOOKEEPER-1835 (where the change was 
made for 3.5) although these discussions appear to be Windows centric (we 
noticed the issue on Linux) 
https://issues.apache.org/jira/browse/ZOOKEEPER-1835 
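
For context on the first exception: a zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter, so the zxid 4294967296 (2^32) belongs to epoch 1 while the epoch file still says 0. The sketch below mirrors that comparison in simplified form; EpochCheckSketch and its methods are hypothetical names, not the QuorumPeer code.

```java
import java.io.IOException;

// Illustrative sketch only; a simplified stand-in for the startup check.
class EpochCheckSketch {

    // The epoch lives in the high 32 bits of the zxid.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // The per-epoch counter lives in the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Rejects a stored epoch that is behind the epoch of the last logged zxid.
    static void checkEpoch(long currentEpoch, long lastZxid) throws IOException {
        if (epochOf(lastZxid) > currentEpoch) {
            throw new IOException("The current epoch, " + currentEpoch
                    + ", is older than the last zxid, " + lastZxid);
        }
    }

    public static void main(String[] args) {
        long lastZxid = 4294967296L; // value from the stack trace above
        System.out.println("epoch=" + epochOf(lastZxid) + " counter=" + counterOf(lastZxid));
        try {
            checkEpoch(0, lastZxid); // epoch file never got past 0
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the values from the trace, the epoch of the last zxid is 1 and the stored epoch is 0, so the check fails. This fits the theory that the epoch file write was interrupted before it reached disk.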





[jira] [Commented] (ZOOKEEPER-2684) Fix a crashing bug in the mixed workloads commit processor

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043497#comment-16043497
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2684:
---

Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/167#discussion_r121006265
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java ---
@@ -254,24 +254,23 @@ public void run() {
 // If session queue != null, then it is also not empty.
 Request topPending = sessionQueue.poll();
 if (request.cxid != topPending.cxid) {
-LOG.error(
-"Got cxid 0x"
-+ Long.toHexString(request.cxid)
-+ " expected 0x" + Long.toHexString(
-topPending.cxid)
-+ " for client session id "
-+ Long.toHexString(request.sessionId));
-throw new IOException("Error: unexpected cxid for"
-+ "client session");
+// we can get commit requests that is not at the queue head when 
+// session moves (see ZOOKEEPER-2684). We will just pass the 
+// commit to the next processor and put the pending back with
+// a warning, we should not see this often under normal load
+LOG.warn("Got request " + request + 
+" but we are expecting request " + topPending);
+sessionQueue.addFirst(topPending);
+} else {
--- End diff --

Is it the case that for a given session, once we execute the else block 
once, executing the if block would be incorrect? If so, would it make sense to 
have a flag per session indicating that the else block has not been executed 
for the session? It might not even be a flag per session, but perhaps a set of 
session ids instead that we remove from once we execute the else block.   
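
For readers skimming the diff, the new behaviour can be sketched as follows: poll the head of the per-session queue, and if the committed request is not the one at the head, put the head back instead of throwing. This is a simplification under assumptions: plain int cxids stand in for Request objects, and PendingQueueSketch and onCommit are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not the CommitProcessor implementation.
class PendingQueueSketch {

    // Returns true when the committed cxid matched the queue head (head consumed),
    // false when it did not match and the head was put back at the front.
    static boolean onCommit(Deque<Integer> sessionQueue, int committedCxid) {
        Integer topPending = sessionQueue.poll();
        if (topPending == null) {
            return false; // extra guard; the diff assumes a non-empty queue here
        }
        if (committedCxid != topPending) {
            // Mismatch: warn (omitted here) and restore the pending head.
            sessionQueue.addFirst(topPending);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(5);
        queue.add(6);
        System.out.println("commit 9 matched head: " + onCommit(queue, 9));
        System.out.println("commit 5 matched head: " + onCommit(queue, 5));
        System.out.println("remaining: " + queue);
    }
}
```

The sketch does not attempt the per-session flag or session-id set raised in the question above; it only shows the put-back step that the question is about.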


> Fix a crashing bug in the mixed workloads commit processor
> --
>
> Key: ZOOKEEPER-2684
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0
> Environment: with pretty heavy load on a real cluster
>Reporter: Ryan Zhang
>Assignee: Ryan Zhang
>Priority: Blocker
> Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash 
> with the following error
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR 
> [CommitProcessor:2] 
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
>  – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR 
> [CommitProcessor:2] 
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
>  – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR 
> [CommitProcessor:2] 
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
>  – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR 
> [CommitProcessor:2] 
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
>  – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> clearly something is not right in the new commit processor per session queue 
> implementation.





[GitHub] zookeeper pull request #167: ZOOKEEPER-2684 commitProcessor does not crash w...

2017-06-08 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/167#discussion_r121006265
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java ---
@@ -254,24 +254,23 @@ public void run() {
 // If session queue != null, then it is also not empty.
 Request topPending = sessionQueue.poll();
 if (request.cxid != topPending.cxid) {
-LOG.error(
-"Got cxid 0x"
-+ Long.toHexString(request.cxid)
-+ " expected 0x" + Long.toHexString(
-topPending.cxid)
-+ " for client session id "
-+ Long.toHexString(request.sessionId));
-throw new IOException("Error: unexpected cxid for"
-+ "client session");
+// we can get commit requests that is not at the queue head when 
+// session moves (see ZOOKEEPER-2684). We will just pass the 
+// commit to the next processor and put the pending back with
+// a warning, we should not see this often under normal load
+LOG.warn("Got request " + request + 
+" but we are expecting request " + topPending);
+sessionQueue.addFirst(topPending);
+} else {
--- End diff --

Is it the case that for a given session, once we execute the else block 
once, executing the if block would be incorrect? If so, would it make sense to 
have a flag per session indicating that the else block has not been executed 
for the session? It might not even be a flag per session, but perhaps a set of 
session ids instead that we remove from once we execute the else block.   




[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043479#comment-16043479
 ] 

Hadoop QA commented on ZOOKEEPER-1748:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//console

This message is automatically generated.



Success: ZOOKEEPER- PreCommit Build #780

2017-06-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 67.26 MB...]
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Adding comment to Jira.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 53536046103656a3ef44e3fb2cd9fce903bf3cd8 logged out
 [exec] 
 [exec] 
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec] 
 [exec] 
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 47 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-1748
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/274
  
The Jenkins failures are "expected" - the findbugs / doc warnings are known 
and the failed test is a known deprecated test. I'll merge this later.




[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
It will go in semi-automatically as part of the commit process, so no need to 
send a separate PR. And the failed Jenkins test is a known flaky / buggy one. 
Patch LGTM pending the version number update in the doc.




[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043431#comment-16043431
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
It will go in semi-automatically as part of the commit process, so no need to 
send a separate PR. And the failed Jenkins test is a known flaky / buggy one. 
Patch LGTM pending the version number update in the doc.




[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043427#comment-16043427
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/276#discussion_r120997880
  
--- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml ---
@@ -1179,6 +1179,29 @@ server.3=zoo3:2888:3888
 
   
 
+  
+tcpKeepAlive
+
+
+  (Java system property: zookeeper.tcpKeepAlive)
+
+  New in 3.4.11:
--- End diff --

Please replace 3.4.11 with 3.5.4. 3.4.11 is only applicable for branch-3.4 
and 3.5.4 is the version number we use for branch-3.5.




[GitHub] zookeeper pull request #276: ZOOKEEPER-1748: add tcp keepalive option for br...

2017-06-08 Thread hanm
Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/276#discussion_r120997880
  
--- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml ---
@@ -1179,6 +1179,29 @@ server.3=zoo3:2888:3888
 
   
 
+  
+tcpKeepAlive
+
+
+  (Java system property: zookeeper.tcpKeepAlive)
+
+  New in 3.4.11:
--- End diff --

Please replace 3.4.11 with 3.5.4. 3.4.11 is only applicable for branch-3.4 
and 3.5.4 is the version number we use for branch-3.5.




[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043376#comment-16043376
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
---

Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
This applies cleanly to master too. Should that be a separate PR, or will it 
go in auto-magically?




[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....

2017-06-08 Thread bensherman
Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/276
  
This applies cleanly to master too. Should that be a separate PR, or will it 
go in auto-magically?




[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive

2017-06-08 Thread bensherman
Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/274
  
@hanm docs are fixed, 3.5 patch is in above referenced PR.  Thanks again 
for your help!




Failed: ZOOKEEPER- PreCommit Build #779

2017-06-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 69.65 MB...]
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] 2a9c6f689150e19adfcbbe0e46ddbc050b82b353 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1642:
 exec returned: 1

Total time: 13 minutes 44 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Could not determine description.
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig

Error Message:
waiting for server 2 being up

Stack Trace:
junit.framework.AssertionFailedError: waiting for server 2 being up
at 
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043323#comment-16043323
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
---

GitHub user bensherman opened a pull request:

https://github.com/apache/zookeeper/pull/276

add tcp keepalive option for branch 3.5

Adding TCP keepalive for branch 3.5, as described in 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and 
https://github.com/apache/zookeeper/pull/274

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bensherman/zookeeper ZOOKEEPER-1748-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #276






> TCP keepalive for leader election connections
> -
>
> Key: ZOOKEEPER-1748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
> Environment: Linux, Java 1.7
>Reporter: Antal Sasvári
>Assignee: Daniel Peon
>Priority: Minor
> Fix For: 3.4.11
>
> Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable and there is no leader election, the leader election 
> port connections stay open for a very long time without any packets being sent 
> on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would be only applicable in case of 
> algorithm 3 (TCP based fast leader election), having the default value false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that Apache ActiveMQ, for example, also has a config option for a 
> similar purpose, called transport.keepalive.
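
The proposal above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the submitted patch: the class
KeepAliveSketch and the helpers readTcpKeepAlive and keepAliveFor are
hypothetical names, and setSockOpts here is a stand-in for the method of the
same name mentioned in the description rather than the real QuorumCnxManager
code. Only Socket.setKeepAlive(boolean) and the tcpKeepAlive key (default
false) come from the issue text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.util.Properties;

// Hypothetical sketch class; not part of ZooKeeper.
final class KeepAliveSketch {

    /** Reads the proposed tcpKeepAlive key; a missing or non-"true" value means false. */
    static boolean readTcpKeepAlive(Properties cfg) {
        return Boolean.parseBoolean(cfg.getProperty("tcpKeepAlive", "false"));
    }

    /** Stand-in for the socket-option hook: applies the flag to a single socket. */
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws IOException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    /** Applies the configured flag to a fresh, unconnected socket and reads it back. */
    static boolean keepAliveFor(Properties cfg) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, readTcpKeepAlive(cfg));
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("tcpKeepAlive", "true");
        System.out.println("SO_KEEPALIVE enabled: " + keepAliveFor(cfg));
    }
}
```

Note that SO_KEEPALIVE only asks the operating system to send probes; how often
they are sent is governed by OS-level settings, which this sketch does not touch.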



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] zookeeper pull request #276: add tcp keepalive option for branch 3.5

2017-06-08 Thread bensherman
GitHub user bensherman opened a pull request:

https://github.com/apache/zookeeper/pull/276

add tcp keepalive option for branch 3.5

Adding TCP keepalive for branch 3.5, as described in 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and 
https://github.com/apache/zookeeper/pull/274

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bensherman/zookeeper ZOOKEEPER-1748-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #276








ZooKeeper_branch35_jdk7 - Build # 994 - Failure

2017-06-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/994/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 65.02 MB...]
[junit] 2017-06-08 19:05:24,279 [myid:127.0.0.1:13915] - INFO  
[main-SendThread(127.0.0.1:13915):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:13915. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 19:05:24,279 [myid:127.0.0.1:13915] - WARN  
[main-SendThread(127.0.0.1:13915):ClientCnxn$SendThread@1235] - Session 
0x104b3fed854 for server 127.0.0.1/127.0.0.1:13915, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 19:05:24,767 [myid:127.0.0.1:14038] - INFO  
[main-SendThread(127.0.0.1:14038):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:14038. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 19:05:24,768 [myid:127.0.0.1:14038] - WARN  
[main-SendThread(127.0.0.1:14038):ClientCnxn$SendThread@1235] - Session 
0x104b402a442 for server 127.0.0.1/127.0.0.1:14038, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 19:05:24,902 [myid:127.0.0.1:14068] - INFO  
[main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:14068. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 19:05:24,902 [myid:127.0.0.1:14068] - WARN  
[main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1235] - Session 0x0 for 
server 127.0.0.1/127.0.0.1:14068, unexpected error, closing socket connection 
and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 19:05:25,205 [myid:127.0.0.1:14044] - INFO  
[main-SendThread(127.0.0.1:14044):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:14044. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 19:05:25,206 [myid:127.0.0.1:14044] - WARN  
[main-SendThread(127.0.0.1:14044):ClientCnxn$SendThread@1235] - Session 
0x304b402a444 for server 127.0.0.1/127.0.0.1:14044, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 19:05:25,329 [myid:127.0.0.1:14068] - INFO  
[main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:14068. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 19:05:25,330 [myid:127.0.0.1:14068] - WARN  
[main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1235] - Session 
0x504b402cb96 for server 127.0.0.1/127.0.0.1:14068, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] Running 

Failed: ZOOKEEPER- PreCommit Build #778

2017-06-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 35.10 MB...]
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 
warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 48 new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] 23b941014abd9b6e3e7dbab4f0d868489a29591a logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1703:
 exec returned: 3

Total time: 32 minutes 3 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Could not determine description.
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.test.LETest.testLE

Error Message:
Threads didn't join

Stack Trace:
junit.framework.AssertionFailedError: Threads didn't join
at org.apache.zookeeper.test.LETest.testLE(LETest.java:120)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)

[jira] [Commented] (ZOOKEEPER-2755) Allow to subclass ClientCnxnSocketNetty and NettyServerCnxn in order to use Netty Local transport

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043100#comment-16043100
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2755:
---

Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/227#discussion_r120951358
  
--- Diff: src/java/test/org/apache/zookeeper/test/NettyLocalSuiteTest.java 
---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.zookeeper.test;
+
+import org.junit.runners.Suite;
+
+/**
+ * Run tests with: Netty Client against Netty server
+ */
+@Suite.SuiteClasses({
--- End diff --

Ping


> Allow to subclass ClientCnxnSocketNetty and NettyServerCnxn in order to use 
> Netty Local transport
> -
>
> Key: ZOOKEEPER-2755
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2755
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: java client, server
>Affects Versions: 3.5.2
>Reporter: Enrico Olivelli
>
> ClientCnxnSocketNetty and NettyServerCnxn explicitly use the InetSocketAddress 
> class to work with network addresses.
> We can do a little refactoring to use only SocketAddress and make it possible 
> to create subclasses of ClientCnxnSocketNetty and NettyServerCnxn which 
> leverage built-in Netty 'local' channels. 
> Such Netty local channels do not create real sockets and so allow a simple 
> ZooKeeper server + ZooKeeper client to be run on the same JVM without binding 
> to real TCP endpoints.
> Usecases:
> Ability to run tests of projects which use ZooKeeper concurrently on the same 
> machine (usually in unit tests the server and the client run inside the 
> same JVM) without dealing with random ports and, in general, using fewer network 
> resources
> Run simplified (standalone, all processes in the same JVM) versions of 
> applications which need a working ZooKeeper ensemble to run.
> Note:
> Embedding a ZooKeeper server and client in the same JVM has many risks, and in 
> general I think we should not encourage users to do so, so in this patch I will 
> not provide official implementations of ClientCnxnSocketNetty and 
> NettyServerCnxn. There will be implementations only inside the test packages, 
> in order to test that most of the features are working with custom socket 
> factories and in particular with the 'LocalAddress' specific subclass of 
> SocketAddress.
> Note:
> the 'Local' sockets feature will be available on Netty 4 too
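
The refactoring idea described above (accepting the general SocketAddress type
instead of InetSocketAddress) can be illustrated with a small sketch. It makes
several assumptions: AddressSketch, InProcessAddress and describe are
hypothetical names invented for this example, and InProcessAddress merely stands
in for Netty's LocalAddress so that the code needs no Netty dependency. It is
not code from the pull request.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Hypothetical sketch class; not part of ZooKeeper or Netty.
final class AddressSketch {

    /** Stand-in for a Netty-style local address: an in-JVM endpoint with no real socket. */
    static final class InProcessAddress extends SocketAddress {
        private static final long serialVersionUID = 1L;
        private final String id;

        InProcessAddress(String id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "local:" + id;
        }
    }

    /**
     * Code written against SocketAddress handles both address kinds;
     * TCP-specific details are used only when the address really is one.
     */
    static String describe(SocketAddress addr) {
        if (addr instanceof InetSocketAddress) {
            InetSocketAddress inet = (InetSocketAddress) addr;
            return "tcp:" + inet.getHostString() + ":" + inet.getPort();
        }
        return addr.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(InetSocketAddress.createUnresolved("zk1", 2181)));
        System.out.println(describe(new InProcessAddress("test")));
    }
}
```

A method that takes InetSocketAddress could not accept the second call at all,
which is the limitation the issue sets out to remove.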





[GitHub] zookeeper pull request #227: ZOOKEEPER-2755 Allow to subclass ClientCnxnSock...

2017-06-08 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/227#discussion_r120951358
  
--- Diff: src/java/test/org/apache/zookeeper/test/NettyLocalSuiteTest.java 
---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.zookeeper.test;
+
+import org.junit.runners.Suite;
+
+/**
+ * Run tests with: Netty Client against Netty server
+ */
+@Suite.SuiteClasses({
--- End diff --

Ping




[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043079#comment-16043079
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2798:
---

Github user afine closed the pull request at:

https://github.com/apache/zookeeper/pull/270


> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> This test appears to be failing intermittently on both 3.4 and 3.5. Here are a 
> couple of example failing jobs.
> 3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/
> 3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/





[GitHub] zookeeper pull request #270: ZOOKEEPER-2798 Fix flaky test: org.apache.zooke...

2017-06-08 Thread afine
Github user afine closed the pull request at:

https://github.com/apache/zookeeper/pull/270




[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive

2017-06-08 Thread bensherman
Github user bensherman commented on the issue:

https://github.com/apache/zookeeper/pull/274
  
I'll get the docs thing fixed right now, and I'll get the 3.5 PR done soon; 
it may take me some time, as I don't have an environment set up to test 3.5 right 
now - bear with me!  Should this also be applied to master, or is there some 
magic there that keeps 3.5 and master in line?

I am also concerned that Jenkins isn't passing its tests; if anything in my 
change is causing that, please let me know.




ZooKeeper_branch35_openjdk7 - Build # 556 - Failure

2017-06-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/556/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.35 MB...]
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 16:59:36,783 [myid:127.0.0.1:19547] - INFO  
[main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19547. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 16:59:36,783 [myid:127.0.0.1:19547] - INFO  
[main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:51824, server: 
127.0.0.1/127.0.0.1:19547
[junit] 2017-06-08 16:59:36,784 [myid:] - INFO  [New I/O worker 
#9949:ZooKeeperServer@1025] - Client attempting to renew session 
0x104b3994f32 at /127.0.0.1:51824
[junit] 2017-06-08 16:59:36,784 [myid:] - INFO  [New I/O worker 
#9949:ZooKeeperServer@727] - Established session 0x104b3994f32 with 
negotiated timeout 6000 for client /127.0.0.1:51824
[junit] 2017-06-08 16:59:36,784 [myid:127.0.0.1:19547] - INFO  
[main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@1381] - Session 
establishment complete on server 127.0.0.1/127.0.0.1:19547, sessionid = 
0x104b3994f32, negotiated timeout = 6000
[junit] 2017-06-08 16:59:36,787 [myid:] - INFO  
[SyncThread:0:FileTxnLog@206] - Creating new log file: log.7
[junit] 2017-06-08 16:59:37,567 [myid:127.0.0.1:19427] - INFO  
[main-SendThread(127.0.0.1:19427):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19427. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-06-08 16:59:37,567 [myid:127.0.0.1:19427] - WARN  
[main-SendThread(127.0.0.1:19427):ClientCnxn$SendThread@1235] - Session 
0x204b395f312 for server 127.0.0.1/127.0.0.1:19427, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-06-08 16:59:37,800 [myid:] - INFO  [ProcessThread(sid:0 
cport:19547)::PrepRequestProcessor@613] - Processed session termination for 
sessionid: 0x104b3994f32
[junit] 2017-06-08 16:59:37,801 [myid:] - INFO  
[SyncThread:0:MBeanRegistry@128] - Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=Connections,name2=127.0.0.1,name3=0x104b3994f32]
[junit] 2017-06-08 16:59:37,801 [myid:] - INFO  [main:ZooKeeper@1331] - 
Session: 0x104b3994f32 closed
[junit] 2017-06-08 16:59:37,802 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x104b3994f32
[junit] 2017-06-08 16:59:37,803 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 229900
[junit] 2017-06-08 16:59:37,804 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 2427
[junit] 2017-06-08 16:59:37,804 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-06-08 16:59:37,804 [myid:] - INFO  [main:ClientBase@586] - 
tearDown starting
[junit] 2017-06-08 16:59:37,804 [myid:] - INFO  [main:ClientBase@556] - 
STOPPING server
[junit] 2017-06-08 16:59:37,804 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:19547
[junit] 2017-06-08 16:59:37,811 [myid:] - INFO  [main:ZooKeeperServer@541] 
- shutting down
[junit] 2017-06-08 16:59:37,811 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-06-08 16:59:37,811 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-06-08 16:59:37,812 [myid:] - INFO  
[main:PrepRequestProcessor@1010] - Shutting down
[junit] 2017-06-08 16:59:37,812 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-06-08 16:59:37,812 [myid:] - INFO  [ProcessThread(sid:0 
cport:19547)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2017-06-08 16:59:37,813 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-06-08 16:59:37,813 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-06-08 16:59:37,813 [myid:] - INFO  

[jira] [Commented] (ZOOKEEPER-2801) address spelling errors/typos

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043023#comment-16043023
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2801:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/275
  
@tmancill Also, I am curious what tools you use to catch the spelling 
errors. I think having such a tool as part of the commit workflow or the daily 
build would be helpful.


> address spelling errors/typos
> -
>
> Key: ZOOKEEPER-2801
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2801
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.5.3
>Reporter: tony mancill
>Assignee: tony mancill
>Priority: Trivial
>
> This is a follow-on for ZOOKEEPER-2617 (for which I only supplied a patch for 
> branch-3.4), that addresses minor typos in master.  With a slight 
> modification, the patch also applies against the branch-3.5 branch.
> If folks are curious, the typos are spotted with the "spellintian" shipped 
> with Debian's lintian package.





[GitHub] zookeeper issue #275: ZOOKEEPER-2801: address spelling errors/typos

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/275
  
@tmancill Also, I am curious what tools you use to catch the spelling 
errors. I think having such a tool as part of the commit workflow or the daily 
build would be helpful.




[jira] [Commented] (ZOOKEEPER-2801) address spelling errors/typos

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043012#comment-16043012
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2801:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/275
  
Nice pull request.

Please don't directly modify the documentation artifacts (HTML files, etc.). The 
documentation is updated by modifying the source of the docs, located at 
src/docs/src/documentation/content/xdocs. After modifying the source, please 
verify locally that the generated documentation is correct by using Apache Forrest 
(https://forrest.apache.org/). After verification, please submit only the 
documentation source change - the compiled artifacts (HTML files, etc.) don't 
need to be submitted, because we regenerate the documentation in every release. 
Please check the commit history of 
https://github.com/apache/zookeeper/tree/master/src/docs/src/documentation/content/xdocs
 to get a concrete idea of how to make doc changes; it should be pretty 
straightforward.

It would also be good to fix the similar typos in branch-3.5 and 
branch-3.4, to which this PR does not apply directly because of many merge conflicts. 


> address spelling errors/typos
> -
>
> Key: ZOOKEEPER-2801
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2801
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.5.3
>Reporter: tony mancill
>Assignee: tony mancill
>Priority: Trivial
>
> This is a follow-on for ZOOKEEPER-2617 (for which I only supplied a patch for 
> branch-3.4), that addresses minor typos in master.  With a slight 
> modification, the patch also applies against the branch-3.5 branch.
> If folks are curious, the typos are spotted with the "spellintian" shipped 
> with Debian's lintian package.





[GitHub] zookeeper issue #275: ZOOKEEPER-2801: address spelling errors/typos

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/275
  
Nice pull request.

Please don't directly modify the documentation artifacts (HTML files, etc.). The 
documentation is updated by modifying the source of the docs, located at 
src/docs/src/documentation/content/xdocs. After modifying the source, please 
verify locally that the generated documentation is correct by using Apache Forrest 
(https://forrest.apache.org/). After verification, please submit only the 
documentation source change - the compiled artifacts (HTML files, etc.) don't 
need to be submitted, because we regenerate the documentation in every release. 
Please check the commit history of 
https://github.com/apache/zookeeper/tree/master/src/docs/src/documentation/content/xdocs
 to get a concrete idea of how to make doc changes; it should be pretty 
straightforward.

It would also be good to fix the similar typos in branch-3.5 and 
branch-3.4, to which this PR does not apply directly because of many merge conflicts. 




[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042987#comment-16042987
 ] 

Hudson commented on ZOOKEEPER-2798:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3419 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3419/])
ZOOKEEPER-2798: Fix flaky test: (hanm: rev 
1038966e8289c09a6f3b863dd2713b9f1c83b4cf)
* (edit) src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java
* (edit) src/java/test/org/apache/zookeeper/test/ClientBase.java


> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> This test appears to be failing intermittently on both 3.4 and 3.5. Here are a 
> couple of example failing jobs.
> 3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/
> 3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/





[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

2017-06-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042988#comment-16042988
 ] 

Hudson commented on ZOOKEEPER-2775:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3419 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3419/])
ZOOKEEPER-2775: ZK Client not able to connect with Xid out of order (hanm: rev 
fa1dc109d4c1bb7913fee43170ed6131e3dc1b1f)
* (edit) src/java/main/org/apache/zookeeper/ClientCnxn.java
* (delete) src/java/test/org/apache/zookeeper/test/SaslAuthTest.java
* (add) src/java/test/org/apache/zookeeper/SaslAuthTest.java


> ZK Client not able to connect with Xid out of order error 
> --
>
> Key: ZOOKEEPER-2775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-2775-01.patch
>
>
> During a network-unreachable scenario in one of the clusters, we observed "Xid 
> out of order" and "Nothing in the queue" errors continuously, and the ZK client 
> was finally unable to connect successfully to the ZK server. 
> *Logs:*
> unexpected error, closing socket connection and attempting reconnect | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) 
> java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 
> for a packet with details: clientPath:null serverPath:null finished:false 
> header:: 53,101  replyHeader:: 0,0,-4  request:: 
> 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes}
>   response:: null
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect 
> java.io.IOException: Nothing in the queue, but got 1
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
>   
> *Analysis:* 
> 1) The first time, the client fails to do SASL login due to a network-unreachable 
> problem.
> 2017-03-29 10:03:59,377 | WARN  | [main-SendThread(192.168.130.8:24002)] | 
> SASL configuration failed: javax.security.auth.login.LoginException: Network 
> is unreachable (sendto failed) Will continue connection to Zookeeper server 
> without SASL authentication, if Zookeeper server allows it. | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) 
>   Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client is able to 
> log in successfully, but the boolean saslLoginFailed is not reset to false. 
> 3) SASL negotiation between client and server now starts, and during 
> this time no user request is sent (the socket channel is 
> closed for writes until SASL negotiation completes).
> 4) The server's response to the SASL packet is then processed by the client. 
> The client assumes that SASL negotiation is finished (tunnelAuthInProgress() 
> checks the saslLoginFailed boolean; since it is true, it assumes negotiation is 
> done), tries to process the packet as an ordinary packet, and this results in 
> the above errors. 
> *Solution:*  Reset the saslLoginFailed boolean every time before client login.
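
The analysis above can be illustrated with a small, self-contained model. This
is only a sketch under stated assumptions: the class SaslFlagSketch, its nested
Client class, and the methods login(), classifyIncoming() and scenario() are
hypothetical names invented for illustration and are not ZooKeeper's ClientCnxn
code. Only the names saslLoginFailed and tunnelAuthInProgress() come from the
issue description.

```java
/**
 * Hypothetical, simplified model of the flag handling described in the
 * analysis. It is NOT ZooKeeper's implementation; it only mimics the
 * described behaviour of saslLoginFailed and tunnelAuthInProgress().
 */
public class SaslFlagSketch {

    /** Stand-in for the client-side SASL state (assumed structure). */
    static final class Client {
        boolean saslLoginFailed = false;
        boolean negotiationComplete = false;
        final boolean resetBeforeLogin;

        Client(boolean resetBeforeLogin) {
            this.resetBeforeLogin = resetBeforeLogin;
        }

        /** One login attempt; networkUp == false models "Network is unreachable". */
        void login(boolean networkUp) {
            if (resetBeforeLogin) {
                // Proposed fix: clear any stale failure before each login.
                saslLoginFailed = false;
            }
            negotiationComplete = false;
            if (!networkUp) {
                saslLoginFailed = true;
            }
        }

        /** As described: a failed login is treated as "nothing left to negotiate". */
        boolean tunnelAuthInProgress() {
            if (saslLoginFailed) {
                return false;
            }
            return !negotiationComplete;
        }

        /** How the first packet received after login would be classified. */
        String classifyIncoming() {
            return tunnelAuthInProgress() ? "sasl" : "request-reply";
        }
    }

    /** Failed login, then a successful one, then classify the server's SASL packet. */
    public static String scenario(boolean resetBeforeLogin) {
        Client c = new Client(resetBeforeLogin);
        c.login(false); // step 1: network unreachable
        c.login(true);  // step 2: network recovered
        return c.classifyIncoming(); // step 4: SASL response arrives
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + scenario(false));
        System.out.println("with reset:    " + scenario(true));
    }
}
```

In this model, a stale flag makes the SASL response look like a reply to a user
request, which is the kind of mismatch that would surface as the "Xid out of
order" and "Nothing in the queue" errors; resetting the flag before each login
keeps the packet on the SASL path.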





[jira] [Updated] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

2017-06-08 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-2775:
---
Fix Version/s: 3.6.0
   3.5.4

> ZK Client not able to connect with Xid out of order error 
> --
>
> Key: ZOOKEEPER-2775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-2775-01.patch
>
>


[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

2017-06-08 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042923#comment-16042923
 ] 

Michael Han commented on ZOOKEEPER-2775:


Committed to master
https://github.com/apache/zookeeper/commit/fa1dc109d4c1bb7913fee43170ed6131e3dc1b1f
branch-3.5
https://github.com/apache/zookeeper/commit/0026e27e81f4889816bec162964e2a721cc53db9

JIRA will be resolved pending the pull request for branch-3.4.

> ZK Client not able to connect with Xid out of order error 
> --
>
> Key: ZOOKEEPER-2775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-2775-01.patch
>
>


[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042918#comment-16042918
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2775:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/254


> ZK Client not able to connect with Xid out of order error 
> --
>
> Key: ZOOKEEPER-2775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Mohammad Arshad
>Priority: Critical
> Attachments: ZOOKEEPER-2775-01.patch
>
>


[GitHub] zookeeper pull request #254: ZOOKEEPER-2775: ZK Client not able to connect w...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/254




[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042910#comment-16042910
 ] 

Michael Han commented on ZOOKEEPER-2798:


Committed to master 
https://github.com/apache/zookeeper/commit/1038966e8289c09a6f3b863dd2713b9f1c83b4cf,
branch-3.5
https://github.com/apache/zookeeper/commit/643e551eacc1fb76c40e04b5d857aaac77089343
branch-3.4
https://github.com/apache/zookeeper/commit/06889c82fdf2093aba800b31f89628fbfd0c08a5

> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>


[jira] [Resolved] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2798.

   Resolution: Fixed
Fix Version/s: 3.4.11
   3.6.0
   3.5.4

> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>


[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042906#comment-16042906
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2798:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/270
  
Merged, please close this @afine 


> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
>


[GitHub] zookeeper issue #270: ZOOKEEPER-2798 Fix flaky test: org.apache.zookeeper.te...

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/270
  
Merged, please close this @afine 




[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042902#comment-16042902
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2798:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/271


> Fix flaky test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Abraham Fine
>Assignee: Abraham Fine
>


[GitHub] zookeeper pull request #271: ZOOKEEPER-2798 Fix flaky test: org.apache.zooke...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/271




[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive

2017-06-08 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/274
  
@bensherman : Merged, please close this pull request. 




Re: Build failure for recent commit

2017-06-08 Thread Flavio Junqueira
One of the project committers needs to cut a release candidate and put it up 
for a vote. If it is a bug fix release, then it should be relatively 
straightforward.

-Flavio

> On 06 Jun 2017, at 20:39, Ben Sherman  wrote:
> 
> Looking at https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and
> https://github.com/apache/zookeeper/pull/83
> 
> It looks like jenkins is trying to post that the build worked and can't,
> resulting in what looks like a failure.  Can I get a hand on fixing this?
> 
> Also, what is the process for proposing a new release getting cut?  I'd
> like to see this change go into 3.4.11 asap.



[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042892#comment-16042892
 ] 

Michael Han commented on ZOOKEEPER-1748:


Merged to branch-3.4: 
https://github.com/apache/zookeeper/commit/51cdeb407cfb7887e647ba7d34718232e6108409

[~rakeshr] Can you please add [~bensherman] to the contributor list and assign 
this issue to him?

[~bensherman] If you have time to send a pull request targeting branch-3.5, 
that would be great. The current patch does not apply to branch-3.5.

> TCP keepalive for leader election connections
> -
>
> Key: ZOOKEEPER-1748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
> Environment: Linux, Java 1.7
>Reporter: Antal Sasvári
>Assignee: Daniel Peon
>Priority: Minor
> Fix For: 3.4.11
>
> Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable, and there is no leader election, the leader election 
> port connections stay open for a very long time without any packets being sent 
> on them.
> Some network elements silently drop the established TCP connection after a 
> timeout if there are no packets being sent on it. In this case the ZK servers 
> will not notice the connection loss. This causes additional delay later when 
> the next leader election is started, as the TCP connections are not alive any 
> more.
> We would like to be able to enable TCP keepalive on the leader election 
> sockets in order to prevent the connection timeout in some network elements 
> due to connection inactivity.
> This could be controlled by adding a new config parameter called tcpKeepAlive 
> in the ZooKeeper configuration file. It would be only applicable in case of 
> algorithm 3 (TCP based fast leader election), having the default value false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
> the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
> sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to 
> submit a patch.
> I've been told that e.g. Apache ActiveMQ also has a config option for a similar 
> purpose called transport.keepalive.
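
The proposal above can be sketched as follows. This is a minimal illustration
under stated assumptions, not the committed patch: the class KeepAliveSketch
and the helper keepAliveAfterApplying() are hypothetical, and setSockOpts() here
is a standalone stand-in for the QuorumCnxManager.setSockOpts() method named in
the description. The only library calls used are java.net.Socket.setKeepAlive()
and getKeepAlive().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical sketch of enabling TCP keepalive from a boolean config value. */
public class KeepAliveSketch {

    /**
     * Applies the keepalive setting to a socket. The tcpKeepAlive parameter
     * stands for the proposed config value (default false in the proposal).
     */
    public static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    /** Applies the flag to a fresh, unconnected socket and reads the option back. */
    public static boolean keepAliveAfterApplying(boolean tcpKeepAlive) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE with tcpKeepAlive=true: " + keepAliveAfterApplying(true));
    }
}
```

Note that SO_KEEPALIVE only turns the probes on; how often the operating system
sends them is controlled by system-level settings, so whether this prevents an
intermediate network element from dropping an idle connection depends on those
settings as well.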





[jira] [Resolved] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2017-06-08 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-1748.

   Resolution: Fixed
Fix Version/s: (was: 3.5.4)
   (was: 3.6.0)
   3.4.11

Issue resolved by pull request 274
[https://github.com/apache/zookeeper/pull/274]

> TCP keepalive for leader election connections
> -
>
> Key: ZOOKEEPER-1748
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Affects Versions: 3.4.5, 3.5.0
> Environment: Linux, Java 1.7
>Reporter: Antal Sasvári
>Assignee: Daniel Peon
>Priority: Minor
> Fix For: 3.4.11
>
> Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>


ZooKeeper_branch34_openjdk7 - Build # 1527 - Still Failing

2017-06-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1527/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on qnode3 (ubuntu) in workspace 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10
Checking out Revision 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c 
(refs/remotes/origin/branch-3.4)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c
 > git rev-list 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c # timeout=10
No emails were triggered.
[ZooKeeper_branch34_openjdk7] $ 
/home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes 
-Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean 
test-core-java
Error: JAVA_HOME is not defined correctly.
  We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java
Build step 'Invoke Ant' marked build as failure
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper-trunk-jdk8 - Build # 1078 - Failure

2017-06-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1078/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.80 MB...]
[junit] 2017-06-08 11:58:05,744 [myid:] - WARN  [New I/O worker 
#8383:NettyServerCnxnFactory$CnxnChannelHandler@142] - Exception caught [id: 
0x48f46c30, /127.0.0.1:43882 :> /127.0.0.1:11468] EXCEPTION: 
java.nio.channels.ClosedChannelException
[junit] java.nio.channels.ClosedChannelException
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
[junit] at 
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
[junit] at 
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
[junit] at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
[junit] at 
org.jboss.netty.channel.SimpleChannelHandler.closeRequested(SimpleChannelHandler.java:334)
[junit] at 
org.jboss.netty.channel.SimpleChannelHandler.handleDownstream(SimpleChannelHandler.java:260)
[junit] at 
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
[junit] at 
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
[junit] at org.jboss.netty.channel.Channels.close(Channels.java:812)
[junit] at 
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:206)
[junit] at 
org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:118)
[junit] at 
org.apache.zookeeper.server.NettyServerCnxn.sendBuffer(NettyServerCnxn.java:221)
[junit] at 
org.apache.zookeeper.server.NettyServerCnxn.sendCloseSession(NettyServerCnxn.java:460)
[junit] at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:461)
[junit] at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:182)
[junit] at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
[junit] 2017-06-08 11:58:05,744 [myid:] - INFO  [New I/O worker 
#8334:ClientCnxnSocketNetty$ZKClientHandler@384] - channel is disconnected: 
[id: 0x113b8508, /127.0.0.1:43882 :> 127.0.0.1/127.0.0.1:11468]
[junit] 2017-06-08 11:58:05,748 [myid:] - INFO  [New I/O worker 
#8334:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-06-08 11:58:05,748 [myid:] - INFO  
[main:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-06-08 11:58:05,748 [myid:] - INFO  [main:ZooKeeper@1329] - 
Session: 0x10400ff22e9 closed
[junit] 2017-06-08 11:58:05,748 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x10400ff22e9
[junit] 2017-06-08 11:58:05,749 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 129832
[junit] 2017-06-08 11:58:05,750 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 948
[junit] 2017-06-08 11:58:05,750 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-06-08 11:58:05,750 [myid:] - INFO  [main:ClientBase@582] - 
tearDown starting
[junit] 2017-06-08 11:58:05,750 [myid:] - INFO  [main:ClientBase@552] - 
STOPPING server
[junit] 2017-06-08 11:58:05,751 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468
[junit] 2017-06-08 11:58:05,752 [myid:] - INFO  [main:ZooKeeperServer@542] 
- shutting down
[junit] 2017-06-08 11:58:05,752 [myid:] - ERROR [main:ZooKeeperServer@506] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-06-08 11:58:05,753 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-06-08 11:58:05,753 [myid:] - INFO  
[main:PrepRequestProcessor@1014] - Shutting down
[junit] 2017-06-08 11:58:05,753 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-06-08 11:58:05,753 [myid:] - INFO  [ProcessThread(sid:0 
cport:11468)::PrepRequestProcessor@157] - PrepRequestProcessor exited loop!
[junit] 2017-06-08 11:58:05,753 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-06-08 11:58:05,754 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-06-08 11:58:05,754 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 

[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

2017-06-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042542#comment-16042542
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2775:
---

Github user arshadmohammad commented on the issue:

https://github.com/apache/zookeeper/pull/254
  
This PR can be merged to master and branch-3.5 only. I will raise another 
pull request for branch-3.4.


> ZK Client not able to connect with Xid out of order error 
> --
>
> Key: ZOOKEEPER-2775
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Mohammad Arshad
>Priority: Critical
> Attachments: ZOOKEEPER-2775-01.patch
>
>
> During a network-unreachable scenario in one of the clusters, we observed "Xid 
> out of order" and "Nothing in the queue" errors continuously, and the ZK client 
> was ultimately unable to connect to the ZK server. 
> *Logs:*
> unexpected error, closing socket connection and attempting reconnect | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) 
> java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 
> for a packet with details: clientPath:null serverPath:null finished:false 
> header:: 53,101  replyHeader:: 0,0,-4  request:: 
> 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes}
>   response:: null
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect 
> java.io.IOException: Nothing in the queue, but got 1
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
>   
> *Analysis:* 
> 1) The first time, the client fails to do SASL login because the network is 
> unreachable.
> 2017-03-29 10:03:59,377 | WARN  | [main-SendThread(192.168.130.8:24002)] | 
> SASL configuration failed: javax.security.auth.login.LoginException: Network 
> is unreachable (sendto failed) Will continue connection to Zookeeper server 
> without SASL authentication, if Zookeeper server allows it. | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) 
>   Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client is able to 
> log in successfully, but the boolean saslLoginFailed is still not reset to false. 
> 3) SASL negotiation between client and server now starts, and during 
> this time no user request is sent (the socket channel is 
> closed for writes until SASL negotiation completes).
> 4) The server's response to the SASL packet is then processed by the client. 
> The client assumes that tunneled authentication is finished 
> (tunnelAuthInProgress() checks the saslLoginFailed boolean; since it is true, 
> the method treats authentication as done), tries to process the packet as an 
> ordinary packet, and this results in the above errors. 
> *Solution:*  Reset the saslLoginFailed boolean every time before client login
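
The analysis above can be illustrated with a minimal sketch. It is not the
actual ClientCnxn code: the class ClientState, its fields other than
saslLoginFailed, and the login() helper are hypothetical stand-ins, and the
body of tunnelAuthInProgress() is an assumption based only on the description
in this issue. It shows how a stale failure flag makes the client believe
SASL negotiation is over, and how resetting the flag before each login avoids
that.

```java
public class SaslLoginFlagSketch {

    /** Hypothetical stand-in for the client connection state (not ZooKeeper's real class). */
    static class ClientState {
        boolean saslLoginFailed = false;  // set when a login attempt fails
        boolean saslNegotiating = false;  // hypothetical: true while SASL packets are in flight
        final boolean resetFlagBeforeLogin;

        ClientState(boolean resetFlagBeforeLogin) {
            this.resetFlagBeforeLogin = resetFlagBeforeLogin;
        }

        /** Simulates one login attempt; networkReachable decides whether it succeeds. */
        void login(boolean networkReachable) {
            if (resetFlagBeforeLogin) {
                // The proposed solution: clear the flag before every login attempt.
                saslLoginFailed = false;
            }
            if (networkReachable) {
                saslNegotiating = true;   // negotiation with the server begins
            } else {
                saslLoginFailed = true;   // step 1 of the analysis
                saslNegotiating = false;
            }
        }

        /** Assumed shape of the check: a failed login means "not in progress". */
        boolean tunnelAuthInProgress() {
            if (saslLoginFailed) {
                return false;
            }
            return saslNegotiating;
        }
    }

    public static void main(String[] args) {
        // Without the reset: the flag from the failed attempt survives the retry.
        ClientState withoutReset = new ClientState(false);
        withoutReset.login(false);  // network unreachable
        withoutReset.login(true);   // network recovered
        System.out.println("without reset: " + withoutReset.tunnelAuthInProgress());

        // With the reset: the retry starts from a clean flag.
        ClientState withReset = new ClientState(true);
        withReset.login(false);
        withReset.login(true);
        System.out.println("with reset: " + withReset.tunnelAuthInProgress());
    }
}
```

In the first case the sketch reports that authentication is not in progress
even though negotiation has just started, which is the state in which a SASL
response would be handled as an ordinary reply. In the second case the check
reports that negotiation is still in progress.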





[GitHub] zookeeper issue #254: ZOOKEEPER-2775: ZK Client not able to connect with Xid...

2017-06-08 Thread arshadmohammad
Github user arshadmohammad commented on the issue:

https://github.com/apache/zookeeper/pull/254
  
This PR can be merged to master and branch-3.5 only. I will raise another 
pull request for branch-3.4.

