[jira] [Commented] (ZOOKEEPER-2800) zookeeper ephemeral node not deleted after server restart and consistency is not hold
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043968#comment-16043968 ]

JiangJiafu commented on ZOOKEEPER-2800:
---

I think this must be a bug, because the PR happened again in my environment.

> zookeeper ephemeral node not deleted after server restart and consistency is not hold
> -
>
> Key: ZOOKEEPER-2800
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2800
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.11
> Environment: Centos6.5 java8
> Reporter: JiangJiafu
> Priority: Critical
> Attachments: zoo.cfg, zookeeper2.out, zookeeper3.out, zookeeper.out
>
>
> I deployed a ZooKeeper cluster with three nodes:
> ofs_zk1:30.0.0.72
> ofs_zk2:30.0.0.73
> ofs_zk3:30.0.0.99
> On 2017-06-02, I used the C ZK client to create some ephemeral sequential nodes:
> /adm_election/rolemgr/rolemgr08,
> /adm_election/rolemgr/rolemgr11,
> /adm_election/rolemgr/rolemgr12,
> with session timeout 2 ms.
> Then I restarted ofs_zk1 and ofs_zk2.
> On 2017-06-05, I found that these ephemeral nodes still existed on ofs_zk1.
> I can check the nodes with the zkCli.sh get command on ofs_zk1.
> But these nodes do not exist on ofs_zk2 and ofs_zk3.
> Is it odd?
> I have uploaded the whole deploy directory of the three nodes to:
> https://pan.baidu.com/s/1miohiCo ,
> The log is printed in log/zookeeper.out
> The log of ofs_zk3 is too large, so I only show the first 1000 lines.
> Since I found this PR a little late, some snapshots and logs may have been deleted.
> I hope someone can help find the reason.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Assigned] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R reassigned ZOOKEEPER-1748: --- Assignee: Ben Sherman (was: Daniel Peon) > TCP keepalive for leader election connections > - > > Key: ZOOKEEPER-1748 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Affects Versions: 3.4.5, 3.5.0 > Environment: Linux, Java 1.7 >Reporter: Antal Sasvári >Assignee: Ben Sherman >Priority: Minor > Fix For: 3.4.11 > > Attachments: Zookeeper-1748-add_tcp_keepalive.patch > > > In our system we encountered the following problem: > If the system is stable, and there is no leader election, the leader election > port connections are open for very long time without any packets being sent > on them. > Some network elements silently drop the established TCP connection after a > timeout if there are no packets being sent on it. In this case the ZK servers > will not notice the connection loss. This causes additional delay later when > the next leader election is started, as the TCP connections are not alive any > more. > We would like to be able to enable TCP keepalive on the leader election > sockets in order to prevent the connection timeout in some network elements > due to connection inactivity. > This could be controlled by adding a new config parameter called tcpKeepAlive > in the ZooKeeper configuration file. It would be only applicable in case of > algorithm 3 (TCP based fast leader election), having the default value false. > If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for > the leader election sockets in QuorumCnxManager.setSockOpts() by calling > sock.setKeepAlive(true). > We have tested this change successfully in our environment. > Please comment whether you see any problem with this. If not, I am going to > submit a patch. > I've been told that e.g. 
Apache ActiveMQ also has a config option for a similar > purpose called transport.keepalive.
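The proposal in ZOOKEEPER-1748 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class KeepAliveSketch and both of its methods are hypothetical stand-ins (the real QuorumCnxManager.setSockOpts() may look different), and the tcpKeepAlive flag is passed in directly instead of being read from the ZooKeeper configuration file. Only Socket.setKeepAlive and Socket.getKeepAlive are actual JDK calls.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

class KeepAliveSketch {
    // Hypothetical helper named after the method mentioned in the issue.
    // It only applies the keepalive flag; other socket options are omitted.
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    // Applies the option to a fresh, unconnected socket and reads it back.
    // Socket options can be set before connect(), so no peer is needed here.
    static boolean keepAliveAfterSetup(boolean tcpKeepAlive) throws IOException {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("tcpKeepAlive=false -> SO_KEEPALIVE " + keepAliveAfterSetup(false));
        System.out.println("tcpKeepAlive=true  -> SO_KEEPALIVE " + keepAliveAfterSetup(true));
    }
}
```

Note that setKeepAlive only switches SO_KEEPALIVE on. How often probes are sent is decided by the operating system's TCP settings, so whether this keeps a connection alive through a given network element depends on those settings as well.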
ZooKeeper_branch34_jdk8 - Build # 1020 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1020/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 18.36 MB...] [junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) [junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) [junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236) [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033) [junit] 2017-06-08 22:53:20,034 [myid:] - INFO [main:ZKTestCase$1@64] - FINISHED testSessionMoved [junit] 2017-06-08 22:53:20,035 [myid:] - INFO [main:PortAssignment@32] - assigning port 11341 [junit] 2017-06-08 22:53:20,035 [myid:] - INFO [main:PortAssignment@32] - assigning port 11342 [junit] 2017-06-08 22:53:20,035 [myid:] - INFO [main:ZKTestCase$1@59] - STARTING testDeleteWithChildren [junit] 2017-06-08 22:53:20,035 [myid:] - INFO [main:QuorumBase@69] - QuorumBase.setup null [junit] 2017-06-08 22:53:20,040 [myid:] - INFO [main:PortAssignment@32] - assigning port 11343 [junit] 2017-06-08 22:53:20,040 [myid:] - INFO [main:PortAssignment@32] - assigning port 11344 [junit] 2017-06-08 22:53:20,040 [myid:] - INFO [main:PortAssignment@32] - assigning port 11345 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11346 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11347 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11348 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11349 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11350 [junit] 
2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11351 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:PortAssignment@32] - assigning port 11352 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:QuorumBase@93] - Ports are: 127.0.0.1:11343,127.0.0.1:11344,127.0.0.1:11345,127.0.0.1:11346,127.0.0.1:11347 [junit] 2017-06-08 22:53:20,041 [myid:] - INFO [main:QuorumBase@277] - TearDown started [junit] 2017-06-08 22:53:20,042 [myid:] - INFO [main:QuorumBase@281] - fdcount after test is: 102 [junit] 2017-06-08 22:53:20,042 [myid:] - INFO [main:ZKTestCase$1@74] - FAILED testDeleteWithChildren [junit] org.junit.internal.runners.model.MultipleFailureException [junit] at org.junit.internal.runners.model.MultipleFailureException.assertEmpty(MultipleFailureException.java:23) [junit] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:42) [junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) [junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) [junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) [junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) [junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) [junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) [junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) [junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236) [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:535) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1182) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1033) [junit] 
2017-06-08 22:53:20,042 [myid:] - INFO [main:ZKTestCase$1@64] - FINISHED testDeleteWithChildren [junit] Tests run: 12, Failures: 8, Errors: 9, Skipped: 1, Time elapsed: 11.358 sec [junit] 2017-06-08 22:53:20,122 [myid:] - INFO [SessionTracker:SessionTrackerImpl@163] - SessionTrackerImpl exited loop! [junit] 2017-06-08 22:53:20,895 [myid:] - INFO [/127.0.0.1:12244:QuorumCnxManager$Listener@773] - Leaving listener [junit] 2017-06-08 22:53:20,940 [myid:] - INFO [/127.0.0.1:12252:QuorumCnxManager$Listener@773] - Leaving listener
[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043588#comment-16043588 ]

Hadoop QA commented on ZOOKEEPER-2803:
--

-1 overall. GitHub Pull Request Build

+1 @author. The patch does not contain any @author tags.

+0 tests included. The patch appears to be a documentation patch that doesn't require tests.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 48 new Findbugs (version 3.0.1) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//console

This message is automatically generated.

> Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
> --
>
> Key: ZOOKEEPER-2803
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10
> Reporter: Abraham Fine
> Assignee: Abraham Fine
>
> We have noticed on internal executions of the integration tests rare failures
> of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.
> {code}
> java.lang.RuntimeException: Unable to run quorum server
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520)
>   at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328)
>   at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296
>   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546)
> {code}
> along with this strange stack trace in the logs:
> {code}
> java.nio.channels.ClosedByInterruptException
>   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380)
>   at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253)
>   at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412)
>   at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
>   at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851)
> {code}
> It appears that this failure is related to the usage of
> {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}.
> {{FileChannel#force}} appears to be interruptible, which is not desirable
> behavior when writing the epoch file. The interrupt may be triggered by the
> repeated starting and shutting down of quorum peers in {{testWorkerThreads}}.
> Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does
> not appear to have the same problem.
> I was able to find another JIRA ticket describing a similar issue here:
> https://issues.apache.org/jira/browse/DERBY-4963
> There is also interesting discussion in ZOOKEEPER-1835 (where the change was
> made for 3.5) although these discussions appear to be Windows centric (we
> noticed the issue on Linux)
> https://issues.apache.org/jira/browse/ZOOKEEPER-1835
> The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build
> #3241" but jenkins cleared out the logs (I only still have the test report
> from the mailing list).
> In addition, {{testWorkerThreads}} appears to be failing every few months on
> Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430
> and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote
> this Jenkins had cleaned out the logs from the latest failed run so I have no
> way of determining if the cause is the same.
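The difference between the two sync calls named in the ZOOKEEPER-2803 description can be reproduced in isolation. The sketch below is an assumption-laden illustration, not ZooKeeper code: the class InterruptibleSyncSketch and its methods are hypothetical, and the interrupt is simulated by setting the calling thread's interrupt flag just before the sync call instead of by shutting down a quorum peer. FileChannel#force, FileDescriptor#sync and ClosedByInterruptException are the real JDK APIs mentioned in the report.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

class InterruptibleSyncSketch {
    // Returns true if FileChannel#force aborts with ClosedByInterruptException
    // when the calling thread's interrupt flag is already set.
    static boolean forceFailsWhenInterrupted() throws IOException {
        File f = File.createTempFile("epoch", ".tmp");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(42);
            Thread.currentThread().interrupt(); // simulate a pending interrupt
            try {
                out.getChannel().force(true);
                return false;
            } catch (ClosedByInterruptException e) {
                return true; // the channel was closed before the data was forced
            } finally {
                Thread.interrupted(); // clear the flag so later code is unaffected
            }
        } finally {
            f.delete();
        }
    }

    // Returns true if FileDescriptor#sync completes and the descriptor is
    // still valid even though the interrupt flag is set.
    static boolean syncSurvivesInterrupt() throws IOException {
        File f = File.createTempFile("epoch", ".tmp");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(42);
            Thread.currentThread().interrupt(); // same simulated interrupt
            try {
                out.getFD().sync();
                return out.getFD().valid();
            } finally {
                Thread.interrupted();
            }
        } finally {
            f.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("force() interrupted: " + forceFailsWhenInterrupted());
        System.out.println("sync() survived:     " + syncSurvivesInterrupt());
    }
}
```

If the interrupt arrives while the epoch file is being closed, the first variant leaves the write unfinished, which is consistent with the ClosedByInterruptException stack trace quoted in the issue.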
Failed: ZOOKEEPER- PreCommit Build #781
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 32.58 MB...] [exec] +0 tests included. The patch appears to be a documentation patch that doesn't require tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 48 new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/781//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 7a0a096774669ce2581f9534a3109dfaa0ef14aa logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1703: exec returned: 2 Total time: 36 minutes 20 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-2803 Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043580#comment-16043580 ] ASF GitHub Bot commented on ZOOKEEPER-2803: --- Github user afine closed the pull request at: https://github.com/apache/zookeeper/pull/277
[GitHub] zookeeper pull request #277: ZOOKEEPER-2803 Flaky test: org.apache.zookeeper...
Github user afine closed the pull request at: https://github.com/apache/zookeeper/pull/277 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abraham Fine updated ZOOKEEPER-2803: Description: We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads. {code} java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) {code} along with this strange stack trace in the logs: {code} java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) {code} It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. {{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. 
The interrupt may be triggered by the repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not appear to have the same problem. I was able to find another JIRA ticket describing a similar issue here: https://issues.apache.org/jira/browse/DERBY-4963 There is also interesting discussion in ZOOKEEPER-1835 (where the change was made for 3.5) although these discussions appear to be Windows centric (we noticed the issue on Linux) https://issues.apache.org/jira/browse/ZOOKEEPER-1835 The failure appears to have popped up on "ZOOKEEPER-2297 PreCommit Build #3241" but jenkins cleared out the logs (I only still have the test report from the mailing list). In addition, {{testWorkerThreads}} appears to be failing every few months on Solaris on Apache Jenkins (for 3.4 ZooKeeper_branch34_solaris - Build # 1430 and 3.5 ZooKeeper_branch35_solaris - Build # 387), but at the time I wrote this Jenkins had cleaned out the logs from the latest failed run so I have no way of determining if the cause is the same. was: We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads. 
{code} java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) {code} along with this strange stack trace in the logs: {code} java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) {code} It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}.
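The numbers in the error message "The current epoch, 0, is older than the last zxid, 4294967296" are easier to read once the zxid is split into its two halves. The sketch below assumes the usual ZooKeeper zxid layout, with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits. The class ZxidEpochSketch and its method names are hypothetical and are not ZooKeeper's own utility class.

```java
class ZxidEpochSketch {
    // Epoch part of a zxid: the high 32 bits (assumed layout, see above).
    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    // Counter part of a zxid: the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastZxid = 4294967296L; // value taken from the error message
        long currentEpoch = 0L;      // value taken from the error message
        System.out.println("epoch of last zxid   = " + epochOf(lastZxid));
        System.out.println("counter of last zxid = " + counterOf(lastZxid));
        System.out.println("stored epoch is behind: " + (epochOf(lastZxid) > currentEpoch));
    }
}
```

Under that assumption, 4294967296 is 2^32, that is epoch 1 with counter 0, while the stored current epoch is still 0. That would fit the explanation in the issue that the write of the epoch file was interrupted before it reached disk.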
[jira] [Commented] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043538#comment-16043538 ]

ASF GitHub Bot commented on ZOOKEEPER-2803:
---

GitHub user afine opened a pull request:

    https://github.com/apache/zookeeper/pull/277

    ZOOKEEPER-2803 Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afine/zookeeper ZOOKEEPER-2803

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #277

commit 0a6ada6ba25ab3d4b2094a4c5f1842a9a0b67dfc
Author: Abraham Fine
Date: 2017-06-08T22:10:41Z

    ZOOKEEPER-2803: Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

> Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
> --
>
> Key: ZOOKEEPER-2803
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10
> Reporter: Abraham Fine
> Assignee: Abraham Fine
>
> We have noticed on internal executions of the integration tests rare failures
> of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads.
> {code} > java.lang.RuntimeException: Unable to run quorum server > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) > at > org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) > at > org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > Caused by: java.io.IOException: The current epoch, 0, is older than the last > zxid, 4294967296 > at > org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) > {code} > along with this strange stack trace in the logs: > {code} > java.nio.channels.ClosedByInterruptException > at > java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) > at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) > at > org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) > at > org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) > at > org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) > at > org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) > at > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) > {code} > It appears that this failure is related to the usage of {{((FileOutputStream) > out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. > {{FileChannel#force}} appears to be interruptible, which is not desirable > behavior when writing the epoch file. The interrupt may be triggered by the > repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. > Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does > not appear to have the same problem. 
> I was able to find another JIRA ticket describing a similar issue here: > https://issues.apache.org/jira/browse/DERBY-4963 > There is also interesting discussion in ZOOKEEPER-1835 (where the change was > made for 3.5) although these discussions appear to be Windows centric (we > noticed the issue on Linux) > https://issues.apache.org/jira/browse/ZOOKEEPER-1835 > {{testWorkerThreads}} appears to be failing every few months on Solaris on > Apache Jenkins (for 3.4 and 3.5), but at the time I wrote this Jenkins had > cleaned out the logs from the latest failed run so I have no way of > determining if the cause is the same. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abraham Fine updated ZOOKEEPER-2803: Description: We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads. {code} java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) {code} along with this strange stack trace in the logs: {code} java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) {code} It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. {{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. 
The interrupt may be triggered by the repeated starting and shutting down of quorum peers in {{testWorkerThreads}}. Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not appear to have the same problem. I was able to find another JIRA ticket describing a similar issue here: https://issues.apache.org/jira/browse/DERBY-4963 There is also interesting discussion in ZOOKEEPER-1835 (where the change was made for 3.5) although these discussions appear to be Windows centric (we noticed the issue on Linux) https://issues.apache.org/jira/browse/ZOOKEEPER-1835 {{testWorkerThreads}} appears to be failing every few months on Solaris on Apache Jenkins (for 3.4 and 3.5), but at the time I wrote this Jenkins had cleaned out the logs from the latest failed run so I have no way of determining if the cause is the same. was: We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads {code} java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) {code} along with this strange stack trace in the logs: {code} java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) at 
org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) {code} It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. {{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not appear to have the same problem. I was able to find another JIRA ticket describing
[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....
Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/276 Version number fixed! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043532#comment-16043532 ] ASF GitHub Bot commented on ZOOKEEPER-1748: --- Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/276 Version number fixed! > TCP keepalive for leader election connections > - > > Key: ZOOKEEPER-1748 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Affects Versions: 3.4.5, 3.5.0 > Environment: Linux, Java 1.7 >Reporter: Antal Sasvári >Assignee: Daniel Peon >Priority: Minor > Fix For: 3.4.11 > > Attachments: Zookeeper-1748-add_tcp_keepalive.patch > > > In our system we encountered the following problem: > If the system is stable, and there is no leader election, the leader election > port connections are open for very long time without any packets being sent > on them. > Some network elements silently drop the established TCP connection after a > timeout if there are no packets being sent on it. In this case the ZK servers > will not notice the connection loss. This causes additional delay later when > the next leader election is started, as the TCP connections are not alive any > more. > We would like to be able to enable TCP keepalive on the leader election > sockets in order to prevent the connection timeout in some network elements > due to connection inactivity. > This could be controlled by adding a new config parameter called tcpKeepAlive > in the ZooKeeper configuration file. It would be only applicable in case of > algorithm 3 (TCP based fast leader election), having the default value false. > If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for > the leader election sockets in QuorumCnxManager.setSockOpts() by calling > sock.setKeepAlive(true). > We have tested this change successfully in our environment. 
> Please comment whether you see any problem with this. If not, I am going to > submit a patch. > I've been told that e.g. Apache ActiveMQ also has a config option for similar > purpose called transport.keepalive. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
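The proposal quoted above can be sketched in a few lines of Java. This is a minimal illustration, not the attached patch: the class `KeepAliveSketch` and its `setSockOpts` helper are hypothetical stand-ins for the `QuorumCnxManager.setSockOpts()` method named in the description, and reading the flag from the `zookeeper.tcpKeepAlive` system property is an assumption based on the documentation diff discussed in pull request #276.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class KeepAliveSketch {
    // Hypothetical helper modelled on the description of
    // QuorumCnxManager.setSockOpts(); the real method sets other options too.
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws IOException {
        // Asks the OS to send keepalive probes on an otherwise idle connection.
        sock.setKeepAlive(tcpKeepAlive);
    }

    // Applies the option to an unconnected socket and reads it back.
    static boolean demo(boolean tcpKeepAlive) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed property name; Boolean.getBoolean returns false when unset,
        // which matches the proposed default of false.
        boolean tcpKeepAlive = Boolean.getBoolean("zookeeper.tcpKeepAlive");
        System.out.println("SO_KEEPALIVE enabled: " + demo(tcpKeepAlive));
    }
}
```

Running it with `-Dzookeeper.tcpKeepAlive=true` should enable the flag. Note that the probe interval itself is an operating-system setting and is not controlled by `setKeepAlive`.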
[jira] [Created] (ZOOKEEPER-2803) Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads
Abraham Fine created ZOOKEEPER-2803: --- Summary: Flaky test: org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads Key: ZOOKEEPER-2803 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2803 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.10 Reporter: Abraham Fine Assignee: Abraham Fine We have noticed on internal executions of the integration tests rare failures of org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads {code} java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:565) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:520) at org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads(CnxManagerTest.java:328) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Caused by: java.io.IOException: The current epoch, 0, is older than the last zxid, 4294967296 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:546) {code} along with this strange stack trace in the logs: {code} java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:380) at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:71) at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1232) at org.apache.zookeeper.server.quorum.QuorumPeer.setCurrentEpoch(QuorumPeer.java:1253) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:412) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:851) {code} It appears that this failure is related to the usage of {{((FileOutputStream) out).getChannel().force(true)}} in {{AtomicFileOutputStream}}. 
{{FileChannel#force}} appears to be interruptible, which is not desirable behavior when writing the epoch file. Branch 3.5 uses {{FileDescriptor#sync}} which is not interruptible and does not appear to have the same problem. I was able to find another JIRA ticket describing a similar issue here: https://issues.apache.org/jira/browse/DERBY-4963 There is also interesting discussion in ZOOKEEPER-1835 (where the change was made for 3.5) although these discussions appear to be Windows centric (we noticed the issue on Linux) https://issues.apache.org/jira/browse/ZOOKEEPER-1835 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
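The two points in the report above can be illustrated with a small Java sketch. It is not the ZooKeeper code: `EpochFileSketch`, `epochOf`, `writeLongSynced` and `roundTrip` are hypothetical names, and the write is deliberately simplified (no temp-file-and-rename step as an atomic writer would use). It shows that a zxid carries its epoch in the high 32 bits, so the zxid 4294967296 (2^32) from the error message belongs to epoch 1 while the epoch file still said 0, and it shows flushing via `FileDescriptor#sync`, the call the report says branch 3.5 uses instead of `FileChannel#force`.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EpochFileSketch {
    // The epoch occupies the high 32 bits of a zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Writes the value as text and forces it to disk with FileDescriptor#sync
    // rather than FileChannel#force (an interruptible channel operation).
    static void writeLongSynced(Path file, long value) {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(Long.toString(value).getBytes(StandardCharsets.UTF_8));
            out.flush();
            out.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the value to a temporary file, reads it back, and cleans up.
    static long roundTrip(long value) {
        try {
            Path tmp = Files.createTempFile("currentEpoch", ".tmp");
            try {
                writeLongSynced(tmp, value);
                String text = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
                return Long.parseLong(text.trim());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long zxid = 4294967296L; // value from the IOException in the report
        System.out.println("epoch of zxid " + zxid + " = " + epochOf(zxid));
        System.out.println("read back: " + roundTrip(epochOf(zxid)));
    }
}
```

If the epoch write is cut short by an interrupt, the file can keep its old value (0 here) while the transaction log already holds entries from epoch 1, which is consistent with the mismatch reported by `loadDataBase`.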
[jira] [Commented] (ZOOKEEPER-2684) Fix a crashing bug in the mixed workloads commit processor
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043497#comment-16043497 ] ASF GitHub Bot commented on ZOOKEEPER-2684: --- Github user fpj commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/167#discussion_r121006265 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java --- @@ -254,24 +254,23 @@ public void run() { // If session queue != null, then it is also not empty. Request topPending = sessionQueue.poll(); if (request.cxid != topPending.cxid) { -LOG.error( -"Got cxid 0x" -+ Long.toHexString(request.cxid) -+ " expected 0x" + Long.toHexString( -topPending.cxid) -+ " for client session id " -+ Long.toHexString(request.sessionId)); -throw new IOException("Error: unexpected cxid for" -+ "client session"); +// we can get commit requests that is not at the queue head when +// session moves (see ZOOKEEPER-2684). We will just pass the +// commit to the next processor and put the pending back with +// a warning, we should not see this often under normal load +LOG.warn("Got request " + request + +" but we are expecting request " + topPending); +sessionQueue.addFirst(topPending); +} else { --- End diff -- Is it the case that for a given session, once we execute the else block once, executing the if block would be incorrect? If so, would it make sense to have a flag per session indicating that the else block has not been executed for the session? It might not even be a flag per session, but perhaps a set of session ids instead that we remove from once we execute the else block. 
> Fix a crashing bug in the mixed workloads commit processor > -- > > Key: ZOOKEEPER-2684 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.0 > Environment: with pretty heavy load on a real cluster >Reporter: Ryan Zhang >Assignee: Ryan Zhang >Priority: Blocker > Attachments: ZOOKEEPER-2684.patch > > > We deployed our build with ZOOKEEPER-2024 and it quickly started to crash > with the following error > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055 > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251 > atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR > [CommitProcessor:2] > -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) > – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc > clearly something is not right in the new commit processor per session queue > implementation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] zookeeper pull request #167: ZOOKEEPER-2684 commitProcessor does not crash w...
Github user fpj commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/167#discussion_r121006265 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java --- @@ -254,24 +254,23 @@ public void run() { // If session queue != null, then it is also not empty. Request topPending = sessionQueue.poll(); if (request.cxid != topPending.cxid) { -LOG.error( -"Got cxid 0x" -+ Long.toHexString(request.cxid) -+ " expected 0x" + Long.toHexString( -topPending.cxid) -+ " for client session id " -+ Long.toHexString(request.sessionId)); -throw new IOException("Error: unexpected cxid for" -+ "client session"); +// we can get commit requests that is not at the queue head when +// session moves (see ZOOKEEPER-2684). We will just pass the +// commit to the next processor and put the pending back with +// a warning, we should not see this often under normal load +LOG.warn("Got request " + request + +" but we are expecting request " + topPending); +sessionQueue.addFirst(topPending); +} else { --- End diff -- Is it the case that for a given session, once we execute the else block once, executing the if block would be incorrect? If so, would it make sense to have a flag per session indicating that the else block has not been executed for the session? It might not even be a flag per session, but perhaps a set of session ids instead that we remove from once we execute the else block.
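The behaviour change in the diff above can be sketched in isolation. This is a simplified model, not the `CommitProcessor` code: `SessionQueueSketch`, its `Request` class and the `onCommit` method are hypothetical, and the empty-queue guard is an addition for safety (the original comment states the queue is never empty at this point). It shows the patched branch putting the pending request back at the head of the per-session queue when the cxids differ, instead of throwing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SessionQueueSketch {
    // Hypothetical, minimal stand-in for a ZooKeeper request.
    static final class Request {
        final long sessionId;
        final int cxid;

        Request(long sessionId, int cxid) {
            this.sessionId = sessionId;
            this.cxid = cxid;
        }
    }

    // Returns true when the committed request matches the head of the
    // pending queue (the head is consumed); returns false when it does not
    // (the head is put back, as the patched code does with addFirst).
    static boolean onCommit(Deque<Request> sessionQueue, Request committed) {
        Request topPending = sessionQueue.poll();
        if (topPending == null) {
            return false; // defensive guard, not part of the quoted diff
        }
        if (committed.cxid != topPending.cxid) {
            // Previously an IOException was thrown here; now the pending
            // request is restored and the commit is passed along.
            sessionQueue.addFirst(topPending);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Deque<Request> queue = new ArrayDeque<>();
        queue.add(new Request(1L, 5));
        queue.add(new Request(1L, 6));
        System.out.println(onCommit(queue, new Request(1L, 9)) + ", pending=" + queue.size());
        System.out.println(onCommit(queue, new Request(1L, 5)) + ", pending=" + queue.size());
    }
}
```

The review question about a per-session flag, or a set of session ids, would add state on top of this; the sketch leaves that out because the thread does not settle on a design.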
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043479#comment-16043479 ] Hadoop QA commented on ZOOKEEPER-1748: -- +1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//console This message is automatically generated.
Success: ZOOKEEPER- PreCommit Build #780
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 67.26 MB...] [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +0 tests included. The patch appears to be a documentation patch that doesn't require tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/780//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 53536046103656a3ef44e3fb2cd9fce903bf3cd8 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD SUCCESSFUL Total time: 18 minutes 47 seconds Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-1748 Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Success Sending email for trigger: Success Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## All tests passed
[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/274 The Jenkins failures are "expected" - the findbugs / doc warnings are known and the failed test is a known deprecated test. I'll merge this later.
[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/276 It will go in semi automatically as part of commit process so no need to send a separate PR. And the failed jenkins test is a known flaky / buggy one. Patch lgtm pending update version number in doc.
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043431#comment-16043431 ] ASF GitHub Bot commented on ZOOKEEPER-1748: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/276 It will go in semi automatically as part of commit process so no need to send a separate PR. And the failed jenkins test is a known flaky / buggy one. Patch lgtm pending update version number in doc.
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043427#comment-16043427 ] ASF GitHub Bot commented on ZOOKEEPER-1748: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/276#discussion_r120997880 --- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml --- @@ -1179,6 +1179,29 @@ server.3=zoo3:2888:3888 + +tcpKeepAlive + + + (Java system property: zookeeper.tcpKeepAlive) + + New in 3.4.11: --- End diff -- Please replace 3.4.11 with 3.5.4. 3.4.11 is only applicable for branch-3.4 and 3.5.4 is the version number we use for branch-3.5.
[GitHub] zookeeper pull request #276: ZOOKEEPER-1748: add tcp keepalive option for br...
Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/276#discussion_r120997880 --- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml --- @@ -1179,6 +1179,29 @@ server.3=zoo3:2888:3888 + +tcpKeepAlive + + + (Java system property: zookeeper.tcpKeepAlive) + + New in 3.4.11: --- End diff -- Please replace 3.4.11 with 3.5.4. 3.4.11 is only applicable for branch-3.4 and 3.5.4 is the version number we use for branch-3.5.
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043376#comment-16043376 ] ASF GitHub Bot commented on ZOOKEEPER-1748: --- Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/276 This applies cleanly to master too. Should that be a separate PR, or will it go in auto-magically?
[GitHub] zookeeper issue #276: ZOOKEEPER-1748: add tcp keepalive option for branch 3....
Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/276 This applies cleanly to master too. Should that be a separate PR, or will it go in auto-magically?
[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive
Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/274 @hanm docs are fixed, 3.5 patch is in above referenced PR. Thanks again for your help!
Failed: ZOOKEEPER- PreCommit Build #779
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 69.65 MB...] [exec] [exec] +0 tests included. The patch appears to be a documentation patch that doesn't require tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/779//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Error: No value specified for option "issue" [exec] 2a9c6f689150e19adfcbbe0e46ddbc050b82b353 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1642: exec returned: 1 Total time: 13 minutes 44 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Could not determine description. Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig Error Message: waiting for server 2 being up Stack Trace: junit.framework.AssertionFailedError: waiting for server 2 being up at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043323#comment-16043323 ] ASF GitHub Bot commented on ZOOKEEPER-1748: --- GitHub user bensherman opened a pull request: https://github.com/apache/zookeeper/pull/276 add tcp keepalive option for branch 3.5 Adding TCP keepalive for branch 3.5, as described in https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and https://github.com/apache/zookeeper/pull/274 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bensherman/zookeeper ZOOKEEPER-1748-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/276.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #276 > TCP keepalive for leader election connections > - > > Key: ZOOKEEPER-1748 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Affects Versions: 3.4.5, 3.5.0 > Environment: Linux, Java 1.7 >Reporter: Antal Sasvári >Assignee: Daniel Peon >Priority: Minor > Fix For: 3.4.11 > > Attachments: Zookeeper-1748-add_tcp_keepalive.patch > > > In our system we encountered the following problem: > If the system is stable, and there is no leader election, the leader election > port connections are open for very long time without any packets being sent > on them. > Some network elements silently drop the established TCP connection after a > timeout if there are no packets being sent on it. In this case the ZK servers > will not notice the connection loss. This causes additional delay later when > the next leader election is started, as the TCP connections are not alive any > more. 
> We would like to be able to enable TCP keepalive on the leader election > sockets in order to prevent the connection timeout in some network elements > due to connection inactivity. > This could be controlled by adding a new config parameter called tcpKeepAlive > in the ZooKeeper configuration file. It would be only applicable in case of > algorithm 3 (TCP based fast leader election), having the default value false. > If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for > the leader election sockets in QuorumCnxManager.setSockOpts() by calling > sock.setKeepAlive(true). > We have tested this change successfully in our environment. > Please comment whether you see any problem with this. If not, I am going to > submit a patch. > I've been told that e.g. Apache ActiveMQ also has a config option for similar > purpose called transport.keepalive. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
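The change proposed in ZOOKEEPER-1748 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: the class name KeepAliveSketch, the helper keepAliveAfterConfig and the boolean parameter are hypothetical, and only the java.net.Socket calls are real APIs. The issue text itself names QuorumCnxManager.setSockOpts() as the place where sock.setKeepAlive(true) would be called when tcpKeepAlive is true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class KeepAliveSketch {

    // Hypothetical stand-in for QuorumCnxManager.setSockOpts(): the flag would
    // come from a tcpKeepAlive entry in the configuration (default false).
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws IOException {
        // Real java.net.Socket API: toggles SO_KEEPALIVE on the socket.
        sock.setKeepAlive(tcpKeepAlive);
    }

    // Applies the option to a fresh, unconnected socket and reads it back.
    static boolean keepAliveAfterConfig(boolean tcpKeepAlive) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tcpKeepAlive=true:  SO_KEEPALIVE=" + keepAliveAfterConfig(true));
        System.out.println("tcpKeepAlive=false: SO_KEEPALIVE=" + keepAliveAfterConfig(false));
    }
}
```

Note that SO_KEEPALIVE only turns the probes on; how long an idle connection waits before the first probe is decided by the operating system, so the kernel keepalive timers may still need tuning to stay below the timeout of the network element that drops idle connections.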
[GitHub] zookeeper pull request #276: add tcp keepalive option for branch 3.5
GitHub user bensherman opened a pull request: https://github.com/apache/zookeeper/pull/276 add tcp keepalive option for branch 3.5 Adding TCP keepalive for branch 3.5, as described in https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and https://github.com/apache/zookeeper/pull/274 You can merge this pull request into a Git repository by running: $ git pull https://github.com/bensherman/zookeeper ZOOKEEPER-1748-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/276.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #276 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
ZooKeeper_branch35_jdk7 - Build # 994 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/994/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 65.02 MB...] [junit] 2017-06-08 19:05:24,279 [myid:127.0.0.1:13915] - INFO [main-SendThread(127.0.0.1:13915):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:13915. Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 19:05:24,279 [myid:127.0.0.1:13915] - WARN [main-SendThread(127.0.0.1:13915):ClientCnxn$SendThread@1235] - Session 0x104b3fed854 for server 127.0.0.1/127.0.0.1:13915, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 19:05:24,767 [myid:127.0.0.1:14038] - INFO [main-SendThread(127.0.0.1:14038):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:14038. 
Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 19:05:24,768 [myid:127.0.0.1:14038] - WARN [main-SendThread(127.0.0.1:14038):ClientCnxn$SendThread@1235] - Session 0x104b402a442 for server 127.0.0.1/127.0.0.1:14038, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 19:05:24,902 [myid:127.0.0.1:14068] - INFO [main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:14068. Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 19:05:24,902 [myid:127.0.0.1:14068] - WARN [main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1235] - Session 0x0 for server 127.0.0.1/127.0.0.1:14068, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 19:05:25,205 [myid:127.0.0.1:14044] - INFO [main-SendThread(127.0.0.1:14044):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:14044. 
Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 19:05:25,206 [myid:127.0.0.1:14044] - WARN [main-SendThread(127.0.0.1:14044):ClientCnxn$SendThread@1235] - Session 0x304b402a444 for server 127.0.0.1/127.0.0.1:14044, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 19:05:25,329 [myid:127.0.0.1:14068] - INFO [main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:14068. Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 19:05:25,330 [myid:127.0.0.1:14068] - WARN [main-SendThread(127.0.0.1:14068):ClientCnxn$SendThread@1235] - Session 0x504b402cb96 for server 127.0.0.1/127.0.0.1:14068, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] Running
Failed: ZOOKEEPER- PreCommit Build #778
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 35.10 MB...] [exec] [exec] +0 tests included. The patch appears to be a documentation patch that doesn't require tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 48 new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/778//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Error: No value specified for option "issue" [exec] 23b941014abd9b6e3e7dbab4f0d868489a29591a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1703: exec returned: 3 Total time: 32 minutes 3 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Could not determine description. Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Failure - Any Sending email for trigger: Failure - Any Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.zookeeper.test.LETest.testLE Error Message: Threads didn't join Stack Trace: junit.framework.AssertionFailedError: Threads didn't join at org.apache.zookeeper.test.LETest.testLE(LETest.java:120) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
[jira] [Commented] (ZOOKEEPER-2755) Allow to subclass ClientCnxnSocketNetty and NettyServerCnxn in order to use Netty Local transport
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043100#comment-16043100 ] ASF GitHub Bot commented on ZOOKEEPER-2755: --- Github user eolivelli commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/227#discussion_r120951358 --- Diff: src/java/test/org/apache/zookeeper/test/NettyLocalSuiteTest.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.zookeeper.test; + +import org.junit.runners.Suite; + +/** + * Run tests with: Netty Client against Netty server + */ +@Suite.SuiteClasses({ --- End diff -- Ping > Allow to subclass ClientCnxnSocketNetty and NettyServerCnxn in order to use > Netty Local transport > - > > Key: ZOOKEEPER-2755 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2755 > Project: ZooKeeper > Issue Type: New Feature > Components: java client, server >Affects Versions: 3.5.2 >Reporter: Enrico Olivelli > > ClientCnxnSocketNetty and NettyServerCnxn use explicitly InetSocketAddress > class to work with network addresses. 
> We can do a little refactoring to use only SocketAddress and make it possible
> to create subclasses of ClientCnxnSocketNetty and NettyServerCnxn which
> leverage built-in Netty 'local' channels.
> Such Netty local channels do not create real sockets and so allow a simple
> ZooKeeper server + ZooKeeper client to be run in the same JVM without binding
> to real TCP endpoints.
> Use cases:
> Run tests of projects which use ZooKeeper concurrently on the same machine
> (in unit tests the server and the client usually run inside the same JVM)
> without dealing with random ports, and in general using fewer network
> resources.
> Run simplified (standalone, all processes in the same JVM) versions of
> applications which need a working ZooKeeper ensemble to run.
> Note:
> Embedding ZooKeeper server + client in the same JVM has many risks and in
> general I think we should not encourage users to do so, so in this patch I will
> not provide official implementations of ClientCnxnSocketNetty and
> NettyServerCnxn. There will be implementations only inside the test packages,
> in order to test that most of the features work with custom socket
> factories and in particular with the 'LocalAddress' specific subclass of
> SocketAddress.
> Note:
> the 'Local' sockets feature will be available on Netty 4 too

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
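The refactoring idea in ZOOKEEPER-2755 (program against SocketAddress instead of InetSocketAddress) can be illustrated with the JDK alone. The sketch below is an assumption-laden illustration, not code from the pull request: InJvmAddress is a hypothetical stand-in for Netty's LocalAddress, and describe() is an invented helper showing that code typed against SocketAddress accepts both kinds of address.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

public class SocketAddressSketch {

    // Hypothetical stand-in for Netty's LocalAddress: no host, no port, only an id.
    static final class InJvmAddress extends SocketAddress {
        private static final long serialVersionUID = 1L;
        private final String id;

        InJvmAddress(String id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "local:" + id;
        }
    }

    // Accepts the abstract SocketAddress, so TCP and in-JVM addresses both work.
    // Code that needs host and port has to check for InetSocketAddress explicitly.
    static String describe(SocketAddress addr) {
        if (addr instanceof InetSocketAddress) {
            InetSocketAddress inet = (InetSocketAddress) addr;
            return "tcp:" + inet.getHostString() + ":" + inet.getPort();
        }
        return addr.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(InetSocketAddress.createUnresolved("localhost", 2181)));
        System.out.println(describe(new InJvmAddress("zk-test")));
    }
}
```

The design point is that only the places that genuinely need a host or port should downcast; everything else can carry the address around as an opaque SocketAddress.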
[GitHub] zookeeper pull request #227: ZOOKEEPER-2755 Allow to subclass ClientCnxnSock...
Github user eolivelli commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/227#discussion_r120951358

    --- Diff: src/java/test/org/apache/zookeeper/test/NettyLocalSuiteTest.java ---
    @@ -0,0 +1,35 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.zookeeper.test;
    +
    +import org.junit.runners.Suite;
    +
    +/**
    + * Run tests with: Netty Client against Netty server
    + */
    +@Suite.SuiteClasses({
    --- End diff --

    Ping
[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043079#comment-16043079 ]

ASF GitHub Bot commented on ZOOKEEPER-2798:
---
Github user afine closed the pull request at: https://github.com/apache/zookeeper/pull/270

> Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10, 3.5.3
> Reporter: Abraham Fine
> Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
> This test appears to be failing intermittently on both 3.4 and 3.5. Here are a
> couple of example failing jobs.
> 3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/
> 3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[GitHub] zookeeper pull request #270: ZOOKEEPER-2798 Fix flaky test: org.apache.zooke...
Github user afine closed the pull request at: https://github.com/apache/zookeeper/pull/270
[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive
Github user bensherman commented on the issue: https://github.com/apache/zookeeper/pull/274

I'll get the docs issue fixed right now, and I'll get the 3.5 PR done soon. It may take me some time, as I don't have an environment set up to test 3.5 right now, so bear with me! Should this also be applied to master, or is there some magic that keeps 3.5 and master in line? I am also concerned that Jenkins isn't passing its tests. If anything in my change is causing that, please let me know.
ZooKeeper_branch35_openjdk7 - Build # 556 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/556/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 62.35 MB...] [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 16:59:36,783 [myid:127.0.0.1:19547] - INFO [main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:19547. Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 16:59:36,783 [myid:127.0.0.1:19547] - INFO [main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@946] - Socket connection established, initiating session, client: /127.0.0.1:51824, server: 127.0.0.1/127.0.0.1:19547 [junit] 2017-06-08 16:59:36,784 [myid:] - INFO [New I/O worker #9949:ZooKeeperServer@1025] - Client attempting to renew session 0x104b3994f32 at /127.0.0.1:51824 [junit] 2017-06-08 16:59:36,784 [myid:] - INFO [New I/O worker #9949:ZooKeeperServer@727] - Established session 0x104b3994f32 with negotiated timeout 6000 for client /127.0.0.1:51824 [junit] 2017-06-08 16:59:36,784 [myid:127.0.0.1:19547] - INFO [main-SendThread(127.0.0.1:19547):ClientCnxn$SendThread@1381] - Session establishment complete on server 127.0.0.1/127.0.0.1:19547, sessionid = 0x104b3994f32, negotiated timeout = 6000 [junit] 2017-06-08 16:59:36,787 [myid:] - INFO [SyncThread:0:FileTxnLog@206] - Creating new log file: log.7 [junit] 2017-06-08 16:59:37,567 [myid:127.0.0.1:19427] - INFO [main-SendThread(127.0.0.1:19427):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:19427. 
Will not attempt to authenticate using SASL (unknown error) [junit] 2017-06-08 16:59:37,567 [myid:127.0.0.1:19427] - WARN [main-SendThread(127.0.0.1:19427):ClientCnxn$SendThread@1235] - Session 0x204b395f312 for server 127.0.0.1/127.0.0.1:19427, unexpected error, closing socket connection and attempting reconnect [junit] java.net.ConnectException: Connection refused [junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [junit] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214) [junit] 2017-06-08 16:59:37,800 [myid:] - INFO [ProcessThread(sid:0 cport:19547)::PrepRequestProcessor@613] - Processed session termination for sessionid: 0x104b3994f32 [junit] 2017-06-08 16:59:37,801 [myid:] - INFO [SyncThread:0:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=Connections,name2=127.0.0.1,name3=0x104b3994f32] [junit] 2017-06-08 16:59:37,801 [myid:] - INFO [main:ZooKeeper@1331] - Session: 0x104b3994f32 closed [junit] 2017-06-08 16:59:37,802 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for session: 0x104b3994f32 [junit] 2017-06-08 16:59:37,803 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 229900 [junit] 2017-06-08 16:59:37,804 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 2427 [junit] 2017-06-08 16:59:37,804 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD testWatcherAutoResetWithLocal [junit] 2017-06-08 16:59:37,804 [myid:] - INFO [main:ClientBase@586] - tearDown starting [junit] 2017-06-08 16:59:37,804 [myid:] - INFO [main:ClientBase@556] - STOPPING server [junit] 2017-06-08 16:59:37,804 [myid:] - INFO [main:NettyServerCnxnFactory@464] - shutdown called 
0.0.0.0/0.0.0.0:19547 [junit] 2017-06-08 16:59:37,811 [myid:] - INFO [main:ZooKeeperServer@541] - shutting down [junit] 2017-06-08 16:59:37,811 [myid:] - ERROR [main:ZooKeeperServer@505] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes [junit] 2017-06-08 16:59:37,811 [myid:] - INFO [main:SessionTrackerImpl@232] - Shutting down [junit] 2017-06-08 16:59:37,812 [myid:] - INFO [main:PrepRequestProcessor@1010] - Shutting down [junit] 2017-06-08 16:59:37,812 [myid:] - INFO [main:SyncRequestProcessor@191] - Shutting down [junit] 2017-06-08 16:59:37,812 [myid:] - INFO [ProcessThread(sid:0 cport:19547)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop! [junit] 2017-06-08 16:59:37,813 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited! [junit] 2017-06-08 16:59:37,813 [myid:] - INFO [main:FinalRequestProcessor@481] - shutdown of request processor complete [junit] 2017-06-08 16:59:37,813 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-2801) address spelling errors/typos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043023#comment-16043023 ]

ASF GitHub Bot commented on ZOOKEEPER-2801:
---
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/275

@tmancill Also, I am curious what tools you use to catch the spelling errors. I think having such a tool as part of the commit workflow or the daily build would be helpful.

> address spelling errors/typos
> -
>
> Key: ZOOKEEPER-2801
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2801
> Project: ZooKeeper
> Issue Type: Improvement
> Affects Versions: 3.5.3
> Reporter: tony mancill
> Assignee: tony mancill
> Priority: Trivial
>
> This is a follow-on for ZOOKEEPER-2617 (for which I only supplied a patch for
> branch-3.4) that addresses minor typos in master. With a slight
> modification, the patch also applies against the branch-3.5 branch.
> If folks are curious, the typos are spotted with the "spellintian" shipped
> with Debian's lintian package.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[GitHub] zookeeper issue #275: ZOOKEEPER-2801: address spelling errors/typos
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/275

@tmancill Also, I am curious what tools you use to catch the spelling errors. I think having such a tool as part of the commit workflow or the daily build would be helpful.
[jira] [Commented] (ZOOKEEPER-2801) address spelling errors/typos
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043012#comment-16043012 ]

ASF GitHub Bot commented on ZOOKEEPER-2801:
---
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/275

Nice pull request. Please don't directly modify the document artifacts (html files, etc). The documentation is updated by modifying the source of the docs, located at src/docs/src/documentation/content/xdocs. After modifying the source, please verify locally that the generated documentation is correct by using Apache Forrest https://forrest.apache.org/. After verification, please submit the documentation source change only. The compiled document artifacts (html files, etc) don't need to be submitted, because we regenerate the documentation in every release.

Please check the commit history of https://github.com/apache/zookeeper/tree/master/src/docs/src/documentation/content/xdocs to get a concrete idea of how to make doc changes; it should be pretty straightforward.

It would also be good to fix the similar typos in branch-3.5 and branch-3.4, to which this PR does not apply directly because of many merge conflicts.

> address spelling errors/typos
> -
>
> Key: ZOOKEEPER-2801
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2801
> Project: ZooKeeper
> Issue Type: Improvement
> Affects Versions: 3.5.3
> Reporter: tony mancill
> Assignee: tony mancill
> Priority: Trivial
>
> This is a follow-on for ZOOKEEPER-2617 (for which I only supplied a patch for
> branch-3.4) that addresses minor typos in master. With a slight
> modification, the patch also applies against the branch-3.5 branch.
> If folks are curious, the typos are spotted with the "spellintian" shipped
> with Debian's lintian package.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[GitHub] zookeeper issue #275: ZOOKEEPER-2801: address spelling errors/typos
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/275

Nice pull request. Please don't directly modify the document artifacts (html files, etc). The documentation is updated by modifying the source of the docs, located at src/docs/src/documentation/content/xdocs. After modifying the source, please verify locally that the generated documentation is correct by using Apache Forrest https://forrest.apache.org/. After verification, please submit the documentation source change only. The compiled document artifacts (html files, etc) don't need to be submitted, because we regenerate the documentation in every release.

Please check the commit history of https://github.com/apache/zookeeper/tree/master/src/docs/src/documentation/content/xdocs to get a concrete idea of how to make doc changes; it should be pretty straightforward.

It would also be good to fix the similar typos in branch-3.5 and branch-3.4, to which this PR does not apply directly because of many merge conflicts.
[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042987#comment-16042987 ]

Hudson commented on ZOOKEEPER-2798:
---
SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3419 (See [https://builds.apache.org/job/ZooKeeper-trunk/3419/])
ZOOKEEPER-2798: Fix flaky test: (hanm: rev 1038966e8289c09a6f3b863dd2713b9f1c83b4cf)
* (edit) src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java
* (edit) src/java/test/org/apache/zookeeper/test/ClientBase.java

> Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
> ---
>
> Key: ZOOKEEPER-2798
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10, 3.5.3
> Reporter: Abraham Fine
> Assignee: Abraham Fine
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
> This test appears to be failing intermittently on both 3.4 and 3.5. Here are a
> couple of example failing jobs.
> 3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/
> 3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042988#comment-16042988 ] Hudson commented on ZOOKEEPER-2775: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3419 (See [https://builds.apache.org/job/ZooKeeper-trunk/3419/]) ZOOKEEPER-2775: ZK Client not able to connect with Xid out of order (hanm: rev fa1dc109d4c1bb7913fee43170ed6131e3dc1b1f) * (edit) src/java/main/org/apache/zookeeper/ClientCnxn.java * (delete) src/java/test/org/apache/zookeeper/test/SaslAuthTest.java * (add) src/java/test/org/apache/zookeeper/SaslAuthTest.java > ZK Client not able to connect with Xid out of order error > -- > > Key: ZOOKEEPER-2775 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Bhupendra Kumar Jain >Assignee: Mohammad Arshad >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: ZOOKEEPER-2775-01.patch > > > During Network unreachable scenario in one of the cluster, we observed Xid > out of order and Nothing in the queue error continously. And ZK client it > finally not able to connect successully to ZK server. > *Logs:* > unexpected error, closing socket connection and attempting reconnect | > org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) > java.io.IOException: Xid out of order. 
Got Xid 52 with err 0 expected Xid 53 for a packet with details: clientPath:null serverPath:null finished:false header:: 53,101 replyHeader:: 0,0,-4 request:: 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes} response:: null
> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Nothing in the queue, but got 1
> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
>
> *Analysis:*
> 1) The first time, the client fails to do the SASL login because the network is unreachable.
> 2017-03-29 10:03:59,377 | WARN | [main-SendThread(192.168.130.8:24002)] | SASL configuration failed: javax.security.auth.login.LoginException: Network is unreachable (sendto failed) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. | org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307)
> Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client is able to log in successfully, but the boolean saslLoginFailed is still not reset to false.
> 3) SASL negotiation between client and server then starts, and during this time no user request is sent (the socket channel is closed for writes until SASL negotiation completes).
> 4) The server's response to the SASL packet is then processed by the client. The client assumes that tunnelAuthInProgress() is finished (the method checks the saslLoginFailed boolean; since the boolean is true, it assumes negotiation is done), tries to process the packet as an ordinary packet, and runs into the errors above.
> *Solution:* Reset the saslLoginFailed boolean every time before client login

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
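The stale-flag problem described in the analysis of ZOOKEEPER-2775 can be modelled in a few lines. This is a toy model under stated assumptions, not the ZooKeeper client code: the class SaslLoginStateSketch, the resetOnLogin switch and the login(boolean) signature are all hypothetical, and only the names saslLoginFailed and tunnelAuthInProgress() are taken from the issue text.

```java
public class SaslLoginStateSketch {

    // Hypothetical switch: false models the reported behaviour, true the proposed fix.
    private final boolean resetOnLogin;
    private boolean saslLoginFailed;
    private boolean negotiationComplete;

    SaslLoginStateSketch(boolean resetOnLogin) {
        this.resetOnLogin = resetOnLogin;
    }

    // Models one login attempt; a failure is remembered in saslLoginFailed.
    void login(boolean networkReachable) {
        if (resetOnLogin) {
            saslLoginFailed = false; // proposed fix: clear the flag before every login
        }
        negotiationComplete = false;
        if (!networkReachable) {
            saslLoginFailed = true;
        }
    }

    // Per the analysis: a set saslLoginFailed flag makes the client believe
    // that SASL negotiation is over.
    boolean tunnelAuthInProgress() {
        if (saslLoginFailed) {
            return false;
        }
        return !negotiationComplete;
    }

    // Replays the scenario from the issue: one failed login, then a successful one.
    static boolean inProgressAfterRecovery(boolean resetOnLogin) {
        SaslLoginStateSketch client = new SaslLoginStateSketch(resetOnLogin);
        client.login(false); // network unreachable
        client.login(true);  // network recovered, negotiation starts again
        return client.tunnelAuthInProgress();
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + inProgressAfterRecovery(false));
        System.out.println("with reset:    " + inProgressAfterRecovery(true));
    }
}
```

Without the reset, the model reports that no negotiation is in progress even though one has just started, which is the state in which the client would misread a SASL response as an ordinary reply.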
[jira] [Updated] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han updated ZOOKEEPER-2775: --- Fix Version/s: 3.6.0 3.5.4 > ZK Client not able to connect with Xid out of order error > -- > > Key: ZOOKEEPER-2775 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Bhupendra Kumar Jain >Assignee: Mohammad Arshad >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: ZOOKEEPER-2775-01.patch > > > During a network-unreachable scenario in one of the clusters, we observed the "Xid > out of order" and "Nothing in the queue" errors continuously, and the ZK client was > finally not able to connect successfully to the ZK server. > *Logs:* > unexpected error, closing socket connection and attempting reconnect | > org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) > java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 > for a packet with details: clientPath:null serverPath:null finished:false > header:: 53,101 replyHeader:: 0,0,-4 request:: > 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes} > response:: null > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426) > unexpected error, closing socket connection and attempting reconnect > java.io.IOException: Nothing in the queue, but got 1 > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370) > at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426) > > *Analysis:* > 1) The first time, the client fails to do the SASL login because the network is > unreachable. > 2017-03-29 10:03:59,377 | WARN | [main-SendThread(192.168.130.8:24002)] | > SASL configuration failed: javax.security.auth.login.LoginException: Network > is unreachable (sendto failed) Will continue connection to Zookeeper server > without SASL authentication, if Zookeeper server allows it. | > org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) > Here the boolean saslLoginFailed becomes true. > 2) After some time the network connection recovers and the client is able to > log in successfully, but the boolean saslLoginFailed is still not reset to false. > 3) Now SASL negotiation between client and server starts, and during > this time no user request will be sent. (The socket channel is > closed for writes until SASL negotiation completes.) > 4) The server's response to the SASL packet is then processed by the client, > which assumes that SASL authentication is finished (tunnelAuthInProgress() checks > the saslLoginFailed boolean; since it is true, it assumes negotiation is done) > and tries to process the packet as an ordinary packet, which results in the above > errors. > *Solution:* Reset the saslLoginFailed boolean every time before client login. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
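The analysis in this report can be modelled in a few lines of Java. This is only a minimal sketch under stated assumptions, not the actual ZooKeeper client code: the class SaslFlagSketch, its nested Client type, the negotiationComplete field and the resetBeforeLogin switch are hypothetical names made up for illustration. Only saslLoginFailed and tunnelAuthInProgress() are names taken from the report.

```java
public class SaslFlagSketch {

    /** Hypothetical stand-in for the client state described in the analysis. */
    static class Client {
        boolean saslLoginFailed = false;      // name taken from the report
        boolean negotiationComplete = false;  // assumed field, for illustration only
        final boolean resetBeforeLogin;       // true models the proposed fix

        Client(boolean resetBeforeLogin) {
            this.resetBeforeLogin = resetBeforeLogin;
        }

        /** One login attempt; the flag is set when the network is unreachable. */
        void login(boolean networkReachable) {
            if (resetBeforeLogin) {
                saslLoginFailed = false;      // proposed fix: clear stale state first
            }
            negotiationComplete = false;      // a new negotiation starts with each login
            if (!networkReachable) {
                saslLoginFailed = true;       // step 1 of the analysis
            }
        }

        /** As described in step 4: a failed login is treated as "nothing in progress". */
        boolean tunnelAuthInProgress() {
            if (saslLoginFailed) {
                return false;
            }
            return !negotiationComplete;
        }
    }

    /** Replays steps 1 to 3: a failed login, then a successful one after recovery. */
    static boolean inProgressAfterRecovery(boolean resetBeforeLogin) {
        Client client = new Client(resetBeforeLogin);
        client.login(false);                  // step 1: network unreachable
        client.login(true);                   // step 2: network recovered
        return client.tunnelAuthInProgress(); // step 3: negotiation is under way
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + inProgressAfterRecovery(false));
        System.out.println("with reset:    " + inProgressAfterRecovery(true));
    }
}
```

In this model, the run without the reset reports that no negotiation is in progress even though one has just started, which is the condition under which a SASL response would be handled as an ordinary packet. With the reset, the negotiation is reported as in progress.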
[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042923#comment-16042923 ] Michael Han commented on ZOOKEEPER-2775: Committed to master https://github.com/apache/zookeeper/commit/fa1dc109d4c1bb7913fee43170ed6131e3dc1b1f branch-3.5 https://github.com/apache/zookeeper/commit/0026e27e81f4889816bec162964e2a721cc53db9 JIRA will be resolved pending the pull request for branch-3.4.
[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042918#comment-16042918 ] ASF GitHub Bot commented on ZOOKEEPER-2775: --- Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/254
[GitHub] zookeeper pull request #254: ZOOKEEPER-2775: ZK Client not able to connect w...
Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/254 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042910#comment-16042910 ] Michael Han commented on ZOOKEEPER-2798: Committed to master https://github.com/apache/zookeeper/commit/1038966e8289c09a6f3b863dd2713b9f1c83b4cf, branch-3.5 https://github.com/apache/zookeeper/commit/643e551eacc1fb76c40e04b5d857aaac77089343, branch-3.4 https://github.com/apache/zookeeper/commit/06889c82fdf2093aba800b31f89628fbfd0c08a5 > Fix flaky test: > org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents > --- > > Key: ZOOKEEPER-2798 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2798 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.10, 3.5.3 >Reporter: Abraham Fine >Assignee: Abraham Fine > Fix For: 3.5.4, 3.6.0, 3.4.11 > > > This test appears to be failing intermittently on both 3.4 and 3.5. Here are a > couple of example failing jobs. > 3.4: https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1404/ > 3.5: https://builds.apache.org/job/ZooKeeper_branch35_jdk8/459/ -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han resolved ZOOKEEPER-2798. Resolution: Fixed Fix Version/s: 3.4.11 3.6.0 3.5.4
[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042906#comment-16042906 ] ASF GitHub Bot commented on ZOOKEEPER-2798: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/270 Merged, please close this @afine
[GitHub] zookeeper issue #270: ZOOKEEPER-2798 Fix flaky test: org.apache.zookeeper.te...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/270 Merged, please close this @afine
[jira] [Commented] (ZOOKEEPER-2798) Fix flaky test: org.apache.zookeeper.test.ReadOnlyModeTest.testConnectionEvents
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042902#comment-16042902 ] ASF GitHub Bot commented on ZOOKEEPER-2798: --- Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/271
[GitHub] zookeeper pull request #271: ZOOKEEPER-2798 Fix flaky test: org.apache.zooke...
Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/271
[GitHub] zookeeper issue #274: Zookeeper 1748: Add option for tcp keepalive
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/274 @bensherman : Merged, please close this pull request.
Re: Build failure for recent commit
One of the project committers needs to cut a release candidate and put it up for a vote. If it is a bug fix release, then it should be relatively straightforward. -Flavio > On 06 Jun 2017, at 20:39, Ben Sherman wrote: > > Looking at https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and > https://github.com/apache/zookeeper/pull/83 > > It looks like jenkins is trying to post that the build worked and can't, > resulting in what looks like a failure. Can I get a hand on fixing this? > > Also, what is the process for proposing a new release getting cut? I'd > like to see this change go into 3.4.11 asap.
[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042892#comment-16042892 ] Michael Han commented on ZOOKEEPER-1748: Merged to branch-3.4: https://github.com/apache/zookeeper/commit/51cdeb407cfb7887e647ba7d34718232e6108409 [~rakeshr] Can you please add [~bensherman] to contributor list and assign this issue to him. [~bensherman] If you have time to send a pull request targeting branch-3.5, that would be great. The current patch does not apply to branch-3.5. > TCP keepalive for leader election connections > - > > Key: ZOOKEEPER-1748 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Affects Versions: 3.4.5, 3.5.0 > Environment: Linux, Java 1.7 >Reporter: Antal Sasvári >Assignee: Daniel Peon >Priority: Minor > Fix For: 3.4.11 > > Attachments: Zookeeper-1748-add_tcp_keepalive.patch > > > In our system we encountered the following problem: > If the system is stable, and there is no leader election, the leader election > port connections are open for very long time without any packets being sent > on them. > Some network elements silently drop the established TCP connection after a > timeout if there are no packets being sent on it. In this case the ZK servers > will not notice the connection loss. This causes additional delay later when > the next leader election is started, as the TCP connections are not alive any > more. > We would like to be able to enable TCP keepalive on the leader election > sockets in order to prevent the connection timeout in some network elements > due to connection inactivity. > This could be controlled by adding a new config parameter called tcpKeepAlive > in the ZooKeeper configuration file. It would be only applicable in case of > algorithm 3 (TCP based fast leader election), having the default value false. 
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for > the leader election sockets in QuorumCnxManager.setSockOpts() by calling > sock.setKeepAlive(true). > We have tested this change successfully in our environment. > Please comment whether you see any problem with this. If not, I am going to > submit a patch. > I've been told that e.g. Apache ActiveMQ also has a config option for similar > purpose called transport.keepalive. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
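The proposal in this issue amounts to a single standard socket option. The following is a minimal sketch, assuming a boolean flag that mirrors the proposed tcpKeepAlive setting; the class KeepAliveSketch and both of its methods are hypothetical and are not the real QuorumCnxManager code. Only java.net.Socket.setKeepAlive and getKeepAlive are existing JDK calls. How the flag would be read from the ZooKeeper configuration is not shown, since the issue text does not specify it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

public class KeepAliveSketch {

    /**
     * Hypothetical helper modelled on the description of setSockOpts():
     * enables SO_KEEPALIVE only when the (assumed) tcpKeepAlive flag is true.
     */
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        if (tcpKeepAlive) {
            sock.setKeepAlive(true);
        }
    }

    /** Applies the options to an unconnected socket and reads the flag back. */
    static boolean keepAliveAfterSetup(boolean tcpKeepAlive) {
        try (Socket sock = new Socket()) {
            setSockOpts(sock, tcpKeepAlive);
            return sock.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tcpKeepAlive=true  -> SO_KEEPALIVE " + keepAliveAfterSetup(true));
        System.out.println("tcpKeepAlive=false -> SO_KEEPALIVE " + keepAliveAfterSetup(false));
    }
}
```

Note that SO_KEEPALIVE only turns probing on. The probe interval is an operating system setting, so it may need tuning to be shorter than the idle timeout of the network element that drops the connection.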
[jira] [Resolved] (ZOOKEEPER-1748) TCP keepalive for leader election connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han resolved ZOOKEEPER-1748. Resolution: Fixed Fix Version/s: (was: 3.5.4) (was: 3.6.0) 3.4.11 Issue resolved by pull request 274 [https://github.com/apache/zookeeper/pull/274]
ZooKeeper_branch34_openjdk7 - Build # 1527 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1527/ ### ## LAST 60 LINES OF THE CONSOLE ### Started by timer [EnvInject] - Loading node environment variables. Building remotely on qnode3 (ubuntu) in workspace /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7 > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10 Cleaning workspace > git rev-parse --verify HEAD # timeout=10 Resetting working tree > git reset --hard # timeout=10 > git clean -fdx # timeout=10 Fetching upstream changes from git://git.apache.org/zookeeper.git > git --version # timeout=10 > git fetch --tags --progress git://git.apache.org/zookeeper.git > +refs/heads/*:refs/remotes/origin/* > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10 Checking out Revision 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c (refs/remotes/origin/branch-3.4) > git config core.sparsecheckout # timeout=10 > git checkout -f 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c > git rev-list 3289ebbaa48d85ceb9dc5154f5547f37cf7d300c # timeout=10 No emails were triggered. [ZooKeeper_branch34_openjdk7] $ /home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes -Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean test-core-java Error: JAVA_HOME is not defined correctly. We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java Build step 'Invoke Ant' marked build as failure Recording test results ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error? Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## No tests ran.
ZooKeeper-trunk-jdk8 - Build # 1078 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1078/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 62.80 MB...] [junit] 2017-06-08 11:58:05,744 [myid:] - WARN [New I/O worker #8383:NettyServerCnxnFactory$CnxnChannelHandler@142] - Exception caught [id: 0x48f46c30, /127.0.0.1:43882 :> /127.0.0.1:11468] EXCEPTION: java.nio.channels.ClosedChannelException [junit] java.nio.channels.ClosedChannelException [junit] at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433) [junit] at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373) [junit] at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81) [junit] at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36) [junit] at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779) [junit] at org.jboss.netty.channel.SimpleChannelHandler.closeRequested(SimpleChannelHandler.java:334) [junit] at org.jboss.netty.channel.SimpleChannelHandler.handleDownstream(SimpleChannelHandler.java:260) [junit] at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591) [junit] at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582) [junit] at org.jboss.netty.channel.Channels.close(Channels.java:812) [junit] at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:206) [junit] at org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:118) [junit] at org.apache.zookeeper.server.NettyServerCnxn.sendBuffer(NettyServerCnxn.java:221) [junit] at org.apache.zookeeper.server.NettyServerCnxn.sendCloseSession(NettyServerCnxn.java:460) [junit] at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:461) [junit] at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:182) [junit] at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113) [junit] 2017-06-08 11:58:05,744 [myid:] - INFO [New I/O worker #8334:ClientCnxnSocketNetty$ZKClientHandler@384] - channel is disconnected: [id: 0x113b8508, /127.0.0.1:43882 :> 127.0.0.1/127.0.0.1:11468] [junit] 2017-06-08 11:58:05,748 [myid:] - INFO [New I/O worker #8334:ClientCnxnSocketNetty@208] - channel is told closing [junit] 2017-06-08 11:58:05,748 [myid:] - INFO [main:ClientCnxnSocketNetty@208] - channel is told closing [junit] 2017-06-08 11:58:05,748 [myid:] - INFO [main:ZooKeeper@1329] - Session: 0x10400ff22e9 closed [junit] 2017-06-08 11:58:05,748 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for session: 0x10400ff22e9 [junit] 2017-06-08 11:58:05,749 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 129832 [junit] 2017-06-08 11:58:05,750 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 948 [junit] 2017-06-08 11:58:05,750 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD testWatcherAutoResetWithLocal [junit] 2017-06-08 11:58:05,750 [myid:] - INFO [main:ClientBase@582] - tearDown starting [junit] 2017-06-08 11:58:05,750 [myid:] - INFO [main:ClientBase@552] - STOPPING server [junit] 2017-06-08 11:58:05,751 [myid:] - INFO [main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468 [junit] 2017-06-08 11:58:05,752 [myid:] - INFO [main:ZooKeeperServer@542] - shutting down [junit] 2017-06-08 11:58:05,752 [myid:] - ERROR [main:ZooKeeperServer@506] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes [junit] 2017-06-08 11:58:05,753 [myid:] - INFO [main:SessionTrackerImpl@232] - Shutting down [junit] 2017-06-08 11:58:05,753 [myid:] - INFO [main:PrepRequestProcessor@1014] 
- Shutting down [junit] 2017-06-08 11:58:05,753 [myid:] - INFO [main:SyncRequestProcessor@191] - Shutting down [junit] 2017-06-08 11:58:05,753 [myid:] - INFO [ProcessThread(sid:0 cport:11468)::PrepRequestProcessor@157] - PrepRequestProcessor exited loop! [junit] 2017-06-08 11:58:05,753 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited! [junit] 2017-06-08 11:58:05,754 [myid:] - INFO [main:FinalRequestProcessor@481] - shutdown of request processor complete [junit] 2017-06-08 11:58:05,754 [myid:] - INFO [main:MBeanRegistry@128] - Unregister MBean
[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042542#comment-16042542 ] ASF GitHub Bot commented on ZOOKEEPER-2775: --- Github user arshadmohammad commented on the issue: https://github.com/apache/zookeeper/pull/254 This PR can be merged to master and branch-3.5 only. I will raise another pull request for branch-3.4
[GitHub] zookeeper issue #254: ZOOKEEPER-2775: ZK Client not able to connect with Xid...
Github user arshadmohammad commented on the issue: https://github.com/apache/zookeeper/pull/254 This PR can be merged to master and branch-3.5 only. I will raise another pull request for branch-3.4