[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131788#comment-16131788 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133886561

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                         numRetries = 0;
                     }
                 } catch (IOException e) {
    -                if (shutdown) {
    --- End diff --

    Update. I initially thought that `closeSocket(client);` should also be executed when an exception was thrown.

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>                      Java Version: jdk64/jdk1.8.0_40
>                      zookeeper version: 3.4.6.2.3.2.0-2950
>            Reporter: Amarjeet Singh
>            Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket.accept(), but we are getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the current code there is a 3-times retry, and after that it says "_As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $/$:3888_". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $/$:3888'.
> As no timeout is specified for the ServerSocket it should never time out, but there are already-discussed issues where people have seen this and added explicit checks for SocketTimeoutException, e.g. https://issues.apache.org/jira/browse/KARAF-3325 .
> I think we need to handle SocketTimeoutException along similar lines in ZooKeeper as well.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
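The handling the thread converges on can be sketched as follows. This is my own minimal illustration, not the actual patch: an accept loop in which a spurious `SocketTimeoutException` (spurious because no SO_TIMEOUT is ever set on the ServerSocket) is logged but does not consume one of the limited retries, so a one-off kernel timer glitch cannot permanently kill the listener. The helper name `countsTowardRetries` is hypothetical.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class ListenerRetryPolicy {
    static final int MAX_RETRIES = 3;

    // Hypothetical helper: only "real" I/O failures should consume a retry.
    // A spurious accept() timeout is logged by the caller but tolerated,
    // since the listener never asked for a timeout in the first place.
    static boolean countsTowardRetries(IOException e) {
        return !(e instanceof SocketTimeoutException);
    }

    public static void main(String[] args) {
        int numRetries = 0;
        IOException[] failures = {
            new SocketTimeoutException("Accept timed out"), // spurious: tolerated
            new IOException("socket error")                  // real: counted
        };
        for (IOException e : failures) {
            if (countsTowardRetries(e)) {
                numRetries++;
            }
        }
        System.out.println("retries consumed: " + numRetries); // prints "retries consumed: 1"
    }
}
```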
[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133886561

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                         numRetries = 0;
                     }
                 } catch (IOException e) {
    -                if (shutdown) {
    --- End diff --

    Update. I initially thought that `closeSocket(client);` should also be executed when an exception was thrown.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131777#comment-16131777 ]

ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------

Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133885327

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                         numRetries = 0;
                     }
                 } catch (IOException e) {
    -                if (shutdown) {
    -                    break;
    -                }
                     LOG.error("Exception while listening", e);
    -                numRetries++;
    +                if (!(e instanceof SocketTimeoutException)) {
    --- End diff --

    Update: I checked the native method `java.net.PlainSocketImpl.socketAccept(Native Method)` in [openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c), **lines 709-721**, in which the timeout changed from 0 to -1, and a timeout of -1 is then interpreted as an infinite timeout. In some cases, [-1 was interpreted as a larger positive integer](https://lwn.net/Articles/483078/), so this issue always happened after ~49 days. It's my wild conjecture.

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2836
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum
>    Affects Versions: 3.4.6
>         Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
>                      Java Version: jdk64/jdk1.8.0_40
>                      zookeeper version: 3.4.6.2.3.2.0-2950
>            Reporter: Amarjeet Singh
>            Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket.accept(), but we are getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the current code there is a 3-times retry, and after that it says "_As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $/$:3888_". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $/$:3888'.
> As no timeout is specified for the ServerSocket it should never time out, but there are already-discussed issues where people have seen this and added explicit checks for SocketTimeoutException, e.g. https://issues.apache.org/jira/browse/KARAF-3325 .
> I think we need to handle SocketTimeoutException along similar lines in ZooKeeper as well.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException
Github user bitgaoshu commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/336#discussion_r133885327

    --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
    @@ -647,11 +648,10 @@ public void run() {
                         numRetries = 0;
                     }
                 } catch (IOException e) {
    -                if (shutdown) {
    -                    break;
    -                }
                     LOG.error("Exception while listening", e);
    -                numRetries++;
    +                if (!(e instanceof SocketTimeoutException)) {
    --- End diff --

    Update: I checked the native method `java.net.PlainSocketImpl.socketAccept(Native Method)` in [openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c), **lines 709-721**, in which the timeout changed from 0 to -1, and a timeout of -1 is then interpreted as an infinite timeout. In some cases, [-1 was interpreted as a larger positive integer](https://lwn.net/Articles/483078/), so this issue always happened after ~49 days. It's my wild conjecture.
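The "~49 days" in the conjecture above lines up with a 32-bit millisecond counter wrapping: 2^32 milliseconds is just under 49.71 days, which matches the reported "49 days 17 hours". A quick sanity check (my own arithmetic, not from the thread):

```java
public class FortyNineDays {
    public static void main(String[] args) {
        // A signed 32-bit millisecond tick counter wraps after 2^32 ms.
        long wrapMs = 1L << 32;
        double days = wrapMs / (1000.0 * 60 * 60 * 24);
        System.out.printf("2^32 ms = %.2f days%n", days); // prints "2^32 ms = 49.71 days"
    }
}
```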
ZooKeeper_branch34_openjdk7 - Build # 1614 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1614/

###
## LAST 60 LINES OF THE CONSOLE
###
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on qnode1 (ubuntu) in workspace /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10
Checking out Revision b903a07c4944cb0a90045e686b7c3f153aee6153 (refs/remotes/origin/branch-3.4)
Commit message: "ZOOKEEPER-2874: Windows Debug builds don't link with `/MTd`"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b903a07c4944cb0a90045e686b7c3f153aee6153
 > git rev-list 1f811a6281090e1b24152dc51507aa6a2bdeafe3 # timeout=10
No emails were triggered.
[ZooKeeper_branch34_openjdk7] $ /home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes -Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean test-core-java
Error: JAVA_HOME is not defined correctly.
We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java
Build step 'Invoke Ant' marked build as failure
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

###
## FAILED TESTS (if any)
##
No tests ran.
[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131746#comment-16131746 ] Hudson commented on ZOOKEEPER-2874: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3503 (See [https://builds.apache.org/job/ZooKeeper-trunk/3503/]) ZOOKEEPER-2874: Windows Debug builds don't link with `/MTd` (hanm: rev ab182d4561f1c6725af0e89e0b76d92186732195) * (edit) src/c/CMakeLists.txt > Windows Debug builds don't link with `/MTd` > --- > > Key: ZOOKEEPER-2874 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874 > Project: ZooKeeper > Issue Type: Bug > Environment: Windows 10 using CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Fix For: 3.5.4, 3.6.0, 3.4.11 > > > While not apparent when building ZooKeeper stand-alone, further testing when > linking with Mesos revealed it was ZooKeeper that was causing the warning: > {noformat} > LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' > conflicts with use of other libs; use /NODEFAULTLIB:library > [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj] > {noformat} > As Mesos is linking with {{/MTd}} in Debug configuration (which is the most > common practice). > Once I found the source of the warning, the fix is trivial and I am posting a > patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131747#comment-16131747 ] Hudson commented on ZOOKEEPER-2872: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3503 (See [https://builds.apache.org/job/ZooKeeper-trunk/3503/]) ZOOKEEPER-2872: Interrupted snapshot sync causes data loss (hanm: rev 0706b40afad079f19fe9f76c99bbb7ec69780dbd) * (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java * (edit) src/java/test/org/apache/zookeeper/test/TruncateTest.java * (edit) src/java/main/org/apache/zookeeper/server/quorum/Learner.java * (edit) src/java/main/org/apache/zookeeper/server/persistence/SnapShot.java * (edit) src/java/main/org/apache/zookeeper/server/persistence/FileSnap.java * (edit) src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java * (edit) src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java > Interrupted snapshot sync causes data loss > -- > > Key: ZOOKEEPER-2872 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Brian Nixon > > There is a way for observers to permanently lose data from their local data > tree while remaining members of good standing with the ensemble and > continuing to serve client traffic when the following chain of events occurs. > 1. The observer dies in epoch N from machine failure. > 2. The observer comes back up in epoch N+1 and requests a snapshot sync to > catch up. > 3. The machine powers off before the snapshot is synced to disc and after > some txn's have been logged (depending on the OS, this can happen!). > 4. The observer comes back a second time and replays its most recent snapshot > (epoch <= N) as well as the txn logs (epoch N+1). > 5. A diff sync is requested from the leader and the observer broadcasts > availability. 
> In this scenario, any commits from epoch N that the observer did not receive > before it died the first time will never be exposed to the observer and no > part of the ensemble will complain. > This situation is not unique to observers and can happen to any learner. As a > simple fix, fsync-ing the snapshots received from the leader will avoid the > case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
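The simple fix proposed above, fsync-ing a snapshot received from the leader before trusting it, can be sketched generically. This is an illustration of the idea, not the actual FileSnap/Learner change; the class and method names are made up:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SyncedSnapshotWrite {
    // Write snapshot bytes and force them to stable storage before returning,
    // so a power loss after this call cannot leave the snapshot missing while
    // later txn logs survive (the data-loss scenario described above).
    static void writeSnapshot(Path file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile())) {
            fos.write(data);
            fos.flush();
            fos.getFD().sync(); // conceptually, the fsync ZOOKEEPER-2872 adds
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("snapshot", ".bin");
        writeSnapshot(tmp, new byte[] {1, 2, 3});
        System.out.println("bytes on disk: " + Files.size(tmp)); // prints "bytes on disk: 3"
        Files.delete(tmp);
    }
}
```

Note that `flush()` alone only empties user-space buffers; it is `FileDescriptor.sync()` (or `FileChannel.force`) that asks the OS to push the page cache to the device.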
[jira] [Commented] (ZOOKEEPER-2804) Node creation fails with NPE if ACLs are null
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131709#comment-16131709 ] ASF GitHub Bot commented on ZOOKEEPER-2804: --- Github user jainbhupendra24 commented on the issue: https://github.com/apache/zookeeper/pull/279 @hanm , I will update the patch > Node creation fails with NPE if ACLs are null > - > > Key: ZOOKEEPER-2804 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2804 > Project: ZooKeeper > Issue Type: Bug >Reporter: Bhupendra Kumar Jain > > If null ACLs are passed then zk node creation or set ACL fails with NPE > {code} > java.lang.NullPointerException > at > org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178) > {code} > Expected to handle null in server and return proper error code to client -- This message was sent by Atlassian JIRA (v6.4.14#64029)
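The null guard the issue asks for can be sketched with stand-in code. The real fix belongs in `PrepRequestProcessor.fixupACL`/`removeDuplicates` and would surface as a proper `KeeperException.InvalidACLException` to the client; this generic version uses `IllegalArgumentException` and plain strings purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class AclGuard {
    // Stand-in for the removeDuplicates step that currently NPEs on a null
    // list: reject null/empty ACLs up front instead of dereferencing them.
    static <T> List<T> removeDuplicates(List<T> acls) {
        if (acls == null || acls.isEmpty()) {
            throw new IllegalArgumentException("Invalid ACL: null or empty");
        }
        List<T> unique = new ArrayList<>();
        for (T acl : acls) {
            if (!unique.contains(acl)) {
                unique.add(acl);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates(List.of("world:anyone", "world:anyone")));
        // prints "[world:anyone]"
        try {
            removeDuplicates(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```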
[GitHub] zookeeper issue #279: ZOOKEEPER-2804:Node creation fails with NPE if ACLs ar...
Github user jainbhupendra24 commented on the issue:

    https://github.com/apache/zookeeper/pull/279

    @hanm, I will update the patch.
[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...
Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/333

    Committed to master: 0706b40afad079f19fe9f76c99bbb7ec69780dbd
    Pending JIRA resolve after fixing merge conflicts and commit into branch-3.4 and 3.5.
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131705#comment-16131705 ] ASF GitHub Bot commented on ZOOKEEPER-2872: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/333 Committed to master: 0706b40afad079f19fe9f76c99bbb7ec69780dbd Pending JIRA resolve after fixing merge conflicts and commit into branch-3.4 and 3.5. > Interrupted snapshot sync causes data loss > -- > > Key: ZOOKEEPER-2872 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Brian Nixon > > There is a way for observers to permanently lose data from their local data > tree while remaining members of good standing with the ensemble and > continuing to serve client traffic when the following chain of events occurs. > 1. The observer dies in epoch N from machine failure. > 2. The observer comes back up in epoch N+1 and requests a snapshot sync to > catch up. > 3. The machine powers off before the snapshot is synced to disc and after > some txn's have been logged (depending on the OS, this can happen!). > 4. The observer comes back a second time and replays its most recent snapshot > (epoch <= N) as well as the txn logs (epoch N+1). > 5. A diff sync is requested from the leader and the observer broadcasts > availability. > In this scenario, any commits from epoch N that the observer did not receive > before it died the first time will never be exposed to the observer and no > part of the ensemble will complain. > This situation is not unique to observers and can happen to any learner. As a > simple fix, fsync-ing the snapshots received from the leader will avoid the > case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131704#comment-16131704 ] Hadoop QA commented on ZOOKEEPER-2770: -- -1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 3 new Findbugs (version 3.0.1) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//console This message is automatically generated. > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch > > > ZooKeeper is a complex distributed application. There are many reasons why > any given read or write operation may become slow: a software bug, a protocol > problem, a hardware issue with the commit log(s), a network issue. 
If the > problem is constant it is trivial to come to an understanding of the cause. > However in order to diagnose intermittent problems we often don't know where, > or when, to begin looking. We need some sort of timestamped indication of the > problem. Although ZooKeeper is not a datastore, it does persist data, and can > suffer intermittent performance degradation, and should consider implementing > a 'slow query' log, a feature very common to services which persist > information on behalf of clients which may be sensitive to latency while > waiting for confirmation of successful persistence. > Log the client and request details if the server discovers, when finally > processing the request, that the current time minus arrival time of the > request is beyond a configured threshold. > Look at the HBase {{responseTooSlow}} feature for inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
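The threshold check at the core of the slow-operation log described above is simple: compare the request's arrival time against the time it is finally processed. A hedged sketch (the names are mine, not from the attached patches):

```java
public class SlowOpLog {
    // Flag a request whose queue-to-processing latency exceeds a configured
    // threshold, in the spirit of HBase's responseTooSlow feature.
    static boolean isSlow(long arrivalMs, long processedMs, long thresholdMs) {
        return processedMs - arrivalMs > thresholdMs;
    }

    public static void main(String[] args) {
        long threshold = 1000; // e.g. 1 s; would come from server configuration
        System.out.println(isSlow(0, 1500, threshold)); // prints "true"
        System.out.println(isSlow(0, 200, threshold));  // prints "false"
    }
}
```

In the real server the caller would log the client address and request details whenever `isSlow` returns true, timestamped so intermittent degradation can be correlated after the fact.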
Failed: ZOOKEEPER- PreCommit Build #944
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 72.33 MB...]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     -1 findbugs. The patch appears to introduce 3 new Findbugs (version 3.0.1) warnings.
     [exec]
     [exec]     -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).
     [exec]
     [exec]     -1 core tests. The patch failed core unit tests.
     [exec]
     [exec]     +1 contrib tests. The patch passed contrib unit tests.
     [exec]
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//testReport/
     [exec] Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//console
     [exec]
     [exec] This message is automatically generated.
     [exec]
     [exec] ==
     [exec] Adding comment to Jira.
     [exec] ==
     [exec]
     [exec] Comment added.
     [exec] 93f9d2049e6d1a45a4db24a64472d4e1adabae4f logged out
     [exec]
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec]
     [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1643: exec returned: 3

Total time: 12 minutes 48 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[Fast Archiver] Compressed 591.81 KB of artifacts by 32.4% relative to #943
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2770
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

###
## FAILED TESTS (if any)
##
5 tests failed.
FAILED: org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging

Error Message:
mockAppender.doAppend();
Wanted 3 times:
-> at org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
But was 2 times:
-> at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

Stack Trace:
junit.framework.AssertionFailedError: mockAppender.doAppend();
Wanted 3 times:
-> at org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
But was 2 times:
-> at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
    at org.mockito.internal.runners.JUnit45AndHigherRunnerImpl.run(JUnit45AndHigherRunnerImpl.java:37)
    at org.mockito.runners.MockitoJUnitRunner.run(MockitoJUnitRunner.java:62)

FAILED:
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131702#comment-16131702 ] ASF GitHub Bot commented on ZOOKEEPER-2872: --- Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/333 > Interrupted snapshot sync causes data loss > -- > > Key: ZOOKEEPER-2872 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Brian Nixon > > There is a way for observers to permanently lose data from their local data > tree while remaining members of good standing with the ensemble and > continuing to serve client traffic when the following chain of events occurs. > 1. The observer dies in epoch N from machine failure. > 2. The observer comes back up in epoch N+1 and requests a snapshot sync to > catch up. > 3. The machine powers off before the snapshot is synced to disc and after > some txn's have been logged (depending on the OS, this can happen!). > 4. The observer comes back a second time and replays its most recent snapshot > (epoch <= N) as well as the txn logs (epoch N+1). > 5. A diff sync is requested from the leader and the observer broadcasts > availability. > In this scenario, any commits from epoch N that the observer did not receive > before it died the first time will never be exposed to the observer and no > part of the ensemble will complain. > This situation is not unique to observers and can happen to any learner. As a > simple fix, fsync-ing the snapshots received from the leader will avoid the > case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper pull request #333: ZOOKEEPER-2872: Interrupted snapshot sync cause...
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/333
[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131698#comment-16131698 ] ASF GitHub Bot commented on ZOOKEEPER-2874: --- Github user andschwa commented on the issue: https://github.com/apache/zookeeper/pull/335 Thank you @hanm! > Windows Debug builds don't link with `/MTd` > --- > > Key: ZOOKEEPER-2874 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874 > Project: ZooKeeper > Issue Type: Bug > Environment: Windows 10 using CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Fix For: 3.5.4, 3.6.0, 3.4.11 > > > While not apparent when building ZooKeeper stand-alone, further testing when > linking with Mesos revealed it was ZooKeeper that was causing the warning: > {noformat} > LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' > conflicts with use of other libs; use /NODEFAULTLIB:library > [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj] > {noformat} > As Mesos is linking with {{/MTd}} in Debug configuration (which is the most > common practice). > Once I found the source of the warning, the fix is trivial and I am posting a > patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper issue #335: ZOOKEEPER-2874: Windows Debug builds don't link with `...
Github user andschwa commented on the issue:

    https://github.com/apache/zookeeper/pull/335

    Thank you @hanm!
[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131695#comment-16131695 ] ASF GitHub Bot commented on ZOOKEEPER-2874: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/335 Committed to master: ab182d4561f1c6725af0e89e0b76d92186732195 branch-3.5: 8f68c04838c3d034bcef7e937a3c23f3cfef8065 branch-3.4: b903a07c4944cb0a90045e686b7c3f153aee6153 > Windows Debug builds don't link with `/MTd` > --- > > Key: ZOOKEEPER-2874 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874 > Project: ZooKeeper > Issue Type: Bug > Environment: Windows 10 using CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Fix For: 3.5.4, 3.6.0, 3.4.11 > > > While not apparent when building ZooKeeper stand-alone, further testing when > linking with Mesos revealed it was ZooKeeper that was causing the warning: > {noformat} > LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' > conflicts with use of other libs; use /NODEFAULTLIB:library > [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj] > {noformat} > As Mesos is linking with {{/MTd}} in Debug configuration (which is the most > common practice). > Once I found the source of the warning, the fix is trivial and I am posting a > patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131694#comment-16131694 ] ASF GitHub Bot commented on ZOOKEEPER-2770: --- Github user karanmehta93 commented on the issue: https://github.com/apache/zookeeper/pull/307 @hanm @eribeiro @tdunning @skamille Please review. Now that I have added rate limiting to logging, can we also turn this on by default? > ZooKeeper slow operation log > > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Karan Mehta >Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, > ZOOKEEPER-2770.003.patch > > > ZooKeeper is a complex distributed application. There are many reasons why > any given read or write operation may become slow: a software bug, a protocol > problem, a hardware issue with the commit log(s), a network issue. If the > problem is constant it is trivial to come to an understanding of the cause. > However in order to diagnose intermittent problems we often don't know where, > or when, to begin looking. We need some sort of timestamped indication of the > problem. Although ZooKeeper is not a datastore, it does persist data, and can > suffer intermittent performance degradation, and should consider implementing > a 'slow query' log, a feature very common to services which persist > information on behalf of clients which may be sensitive to latency while > waiting for confirmation of successful persistence. > Log the client and request details if the server discovers, when finally > processing the request, that the current time minus arrival time of the > request is beyond a configured threshold. > Look at the HBase {{responseTooSlow}} feature for inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
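The "rate limiting to logging" mentioned in the comment above could take several forms; one minimal sketch (my own, not the PR's actual implementation) is a fixed-window limiter that caps how many slow-operation lines may be emitted per interval, so a latency spike cannot flood the log:

```java
public class LogRateLimiter {
    private final long windowMs;
    private final int maxPerWindow;
    private long windowStart = Long.MIN_VALUE / 2; // forces a reset on first use
    private int count;

    LogRateLimiter(long windowMs, int maxPerWindow) {
        this.windowMs = windowMs;
        this.maxPerWindow = maxPerWindow;
    }

    // Allow at most maxPerWindow log lines per windowMs; callers skip logging
    // when this returns false.
    synchronized boolean allow(long nowMs) {
        if (nowMs - windowStart >= windowMs) {
            windowStart = nowMs;
            count = 0;
        }
        return ++count <= maxPerWindow;
    }

    public static void main(String[] args) {
        LogRateLimiter limiter = new LogRateLimiter(1000, 2);
        System.out.println(limiter.allow(0));    // prints "true"
        System.out.println(limiter.allow(10));   // prints "true"
        System.out.println(limiter.allow(20));   // prints "false" (window full)
        System.out.println(limiter.allow(1500)); // prints "true" (new window)
    }
}
```

With a cap like this in place, enabling the feature by default becomes much safer, which is presumably the point of the question.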
[GitHub] zookeeper issue #335: ZOOKEEPER-2874: Windows Debug builds don't link with `...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/335 Committed to master: ab182d4561f1c6725af0e89e0b76d92186732195 branch-3.5: 8f68c04838c3d034bcef7e937a3c23f3cfef8065 branch-3.4: b903a07c4944cb0a90045e686b7c3f153aee6153 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Success: ZOOKEEPER- PreCommit Build #943
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 69.17 MB...] [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +0 tests included. The patch appears to be a documentation patch that doesn't require tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 2d9bc96aa913d3439ae248983e08fef507f4510a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file BUILD SUCCESSFUL Total time: 19 minutes 43 seconds Archiving artifacts Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Recording test results Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 [description-setter] Description set: ZOOKEEPER-2836 Putting comment on the pull request Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Email was triggered for: Success Sending email for trigger: Success Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7 ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131692#comment-16131692 ] ASF GitHub Bot commented on ZOOKEEPER-2874: --- Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/335 > Windows Debug builds don't link with `/MTd` > --- > > Key: ZOOKEEPER-2874 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874 > Project: ZooKeeper > Issue Type: Bug > Environment: Windows 10 using CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Fix For: 3.5.4, 3.6.0, 3.4.11 > > > While not apparent when building ZooKeeper stand-alone, further testing when > linking with Mesos revealed it was ZooKeeper that was causing the warning: > {noformat} > LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' > conflicts with use of other libs; use /NODEFAULTLIB:library > [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj] > {noformat} > As Mesos is linking with {{/MTd}} in Debug configuration (which is the most > common practice). > Once I found the source of the warning, the fix is trivial and I am posting a > patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper issue #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user karanmehta93 commented on the issue: https://github.com/apache/zookeeper/pull/307 @hanm @eribeiro @tdunning @skamille Please review. Now that I have added rate limiting to logging, can we also turn this on by default?
[jira] [Resolved] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han resolved ZOOKEEPER-2874. Resolution: Fixed Fix Version/s: 3.5.4 3.6.0 3.4.11 Issue resolved by pull request 335 [https://github.com/apache/zookeeper/pull/335] > Windows Debug builds don't link with `/MTd` > --- > > Key: ZOOKEEPER-2874 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874 > Project: ZooKeeper > Issue Type: Bug > Environment: Windows 10 using CMake >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Fix For: 3.4.11, 3.6.0, 3.5.4 > > > While not apparent when building ZooKeeper stand-alone, further testing when > linking with Mesos revealed it was ZooKeeper that was causing the warning: > {noformat} > LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' > conflicts with use of other libs; use /NODEFAULTLIB:library > [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj] > {noformat} > As Mesos is linking with {{/MTd}} in Debug configuration (which is the most > common practice). > Once I found the source of the warning, the fix is trivial and I am posting a > patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper pull request #335: ZOOKEEPER-2874: Windows Debug builds don't link...
Github user asfgit closed the pull request at: https://github.com/apache/zookeeper/pull/335
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131691#comment-16131691 ] Hadoop QA commented on ZOOKEEPER-2836: -- +1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console This message is automatically generated. > QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException > -- > > Key: ZOOKEEPER-2836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection, quorum >Affects Versions: 3.4.6 > Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 > x86_64 GNU/Linux > Java Version: jdk64/jdk1.8.0_40 > zookeeper version: 3.4.6.2.3.2.0-2950 >Reporter: Amarjeet Singh >Priority: Critical > > QuorumCnxManager Listener thread blocks SocketServer on accept but we are > getting SocketTimeoutException on our boxes after 49days 17 hours . 
> As per the current code there is a retry up to 3 times, and after that it says "_As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $/$:3888_". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $/$:3888'. > As there is no timeout specified for the ServerSocket it should never time out, but there are already-discussed issues where people have seen this problem and added explicit checks for SocketTimeoutException, like https://issues.apache.org/jira/browse/KARAF-3325 . > I think we need to handle SocketTimeoutException along similar lines for ZooKeeper as well
[jira] [Commented] (ZOOKEEPER-2804) Node creation fails with NPE if ACLs are null
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131689#comment-16131689 ] ASF GitHub Bot commented on ZOOKEEPER-2804: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/279 Let's wrap this up before it becoming more stale. I believe the only remaining work item is the last review comment @arshadmohammad made: >> As this Validation we are doing multiple places it would be better if this piece of code is extracted to method. @jainbhupendra24 Do you mind update this pull request and do what Arshad suggested? > Node creation fails with NPE if ACLs are null > - > > Key: ZOOKEEPER-2804 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2804 > Project: ZooKeeper > Issue Type: Bug >Reporter: Bhupendra Kumar Jain > > If null ACLs are passed then zk node creation or set ACL fails with NPE > {code} > java.lang.NullPointerException > at > org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301) > at > org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519) > at > org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126) > at > org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178) > {code} > Expected to handle null in server and return proper error code to client -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper issue #279: ZOOKEEPER-2804:Node creation fails with NPE if ACLs ar...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/279 Let's wrap this up before it becomes more stale. I believe the only remaining work item is the last review comment @arshadmohammad made: >> As this validation is done in multiple places, it would be better if this piece of code were extracted to a method. @jainbhupendra24 Do you mind updating this pull request and doing what Arshad suggested?
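The refactoring suggested in the review comment above could look something like the sketch below. The class and method names are hypothetical, and a real patch would return a proper KeeperException error code to the client rather than the IllegalArgumentException used here for illustration:

```java
import java.util.List;

// Hypothetical helper in the spirit of the review comment quoted above: since
// the null/empty ACL validation is repeated in multiple places, extract it into
// one method that rejects bad input up front with a clear error, instead of a
// later NullPointerException deep in PrepRequestProcessor. The use of
// IllegalArgumentException (instead of a KeeperException error code) is
// illustrative only.
class AclValidator {
    static void validateAcl(List<?> acl) {
        if (acl == null || acl.isEmpty()) {
            throw new IllegalArgumentException("ACL list must be non-null and non-empty");
        }
    }
}
```

Each call site (create, setACL, ...) would then invoke the single helper before any per-entry processing, which is what makes the NPE path unreachable.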
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131683#comment-16131683 ] ASF GitHub Bot commented on ZOOKEEPER-2872: --- Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/333 >> it seems best to keep snapshot taking a lighter weight operation. Sounds reasonable. >> I am unable to reproduce the test failure in Zab1_0Test I think it's a flaky test. Filed ZOOKEEPER-2877 for this. > Interrupted snapshot sync causes data loss > -- > > Key: ZOOKEEPER-2872 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Brian Nixon > > There is a way for observers to permanently lose data from their local data > tree while remaining members of good standing with the ensemble and > continuing to serve client traffic when the following chain of events occurs. > 1. The observer dies in epoch N from machine failure. > 2. The observer comes back up in epoch N+1 and requests a snapshot sync to > catch up. > 3. The machine powers off before the snapshot is synced to disc and after > some txn's have been logged (depending on the OS, this can happen!). > 4. The observer comes back a second time and replays its most recent snapshot > (epoch <= N) as well as the txn logs (epoch N+1). > 5. A diff sync is requested from the leader and the observer broadcasts > availability. > In this scenario, any commits from epoch N that the observer did not receive > before it died the first time will never be exposed to the observer and no > part of the ensemble will complain. > This situation is not unique to observers and can happen to any learner. As a > simple fix, fsync-ing the snapshots received from the leader will avoid the > case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/333 >> it seems best to keep snapshot taking a lighter weight operation. Sounds reasonable. >> I am unable to reproduce the test failure in Zab1_0Test I think it's a flaky test. Filed ZOOKEEPER-2877 for this.
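The "simple fix" the issue proposes (fsync-ing snapshots received from the leader) amounts to forcing the snapshot to stable storage before the learner acknowledges and starts serving. A minimal sketch, with hypothetical names and a byte-array payload standing in for the real serialized snapshot stream:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative sketch of the fix proposed in ZOOKEEPER-2872: after a learner
// writes a snapshot received from the leader, fsync it before acknowledging,
// so a power loss cannot leave fresher txn logs on disk next to a snapshot
// that never actually reached stable storage.
class SnapshotSync {
    static void writeSnapshotDurably(File snapFile, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(snapFile)) {
            fos.write(data);
            fos.flush();        // drain user-space buffers
            fos.getFD().sync(); // block until the OS reports the bytes are on disk
        }
    }
}
```

Without the `sync()` call, step 3 of the failure chain above is possible: the OS can persist later txn-log writes while the snapshot itself is still only in the page cache when power is lost.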
[jira] [Created] (ZOOKEEPER-2877) Flaky Test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun
Michael Han created ZOOKEEPER-2877: -- Summary: Flaky Test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun Key: ZOOKEEPER-2877 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2877 Project: ZooKeeper Issue Type: Bug Components: tests Reporter: Michael Han {noformat} Error Message expected:<1> but was:<0> Stacktrace junit.framework.AssertionFailedError: expected:<1> but was:<0> at org.apache.zookeeper.server.quorum.Zab1_0Test$6.converseWithLeader(Zab1_0Test.java:939) at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderConversation(Zab1_0Test.java:398) at org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun(Zab1_0Test.java:906) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79) {noformat}
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131588#comment-16131588 ] ASF GitHub Bot commented on ZOOKEEPER-2836: --- Github user maoling commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/336#discussion_r133865685 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java --- @@ -647,11 +648,10 @@ public void run() { numRetries = 0; } } catch (IOException e) { -if (shutdown) { -break; -} LOG.error("Exception while listening", e); -numRetries++; +if (!(e instanceof SocketTimeoutException)) { --- End diff -- - can we reproduce this issue?(haha,49days)? This should never happen theoretically.According to [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) or [tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684),they also didn't find the root-cause,just do like [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354) to add some protections against this issue here. - One assumption is SocketServer.accept() use the default infinite value(2 ^ 32 -1=4294967295) without no timeout specified or setSoTimeout(0) > a call to accept() for this ServerSocket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the ServerSocket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout. 
so this issuse always happended after 49days 17 hours(4294967295/1000/60/60/24=49.7days) > QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException > -- > > Key: ZOOKEEPER-2836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection, quorum >Affects Versions: 3.4.6 > Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 > x86_64 GNU/Linux > Java Version: jdk64/jdk1.8.0_40 > zookeeper version: 3.4.6.2.3.2.0-2950 >Reporter: Amarjeet Singh >Priority: Critical > > QuorumCnxManager Listener thread blocks SocketServer on accept but we are > getting SocketTimeoutException on our boxes after 49days 17 hours . As per > current code there is a 3 times retry and after that it says "_As I'm leaving > the listener thread, I won't be able to participate in leader election any > longer: $/$:3888__" , Once server nodes reache this state and > we restart or add a new node ,it fails to join cluster and logs 'WARN > QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open > channel to 3 at election address $/$:3888' . > As there is no timeout specified for ServerSocket it should never > timeout but there are some already discussed issues where people have seen > this issue and added checks for SocketTimeoutException explicitly like > https://issues.apache.org/jira/browse/KARAF-3325 . > I think we need to handle SocketTimeoutException on similar lines for > zookeeper as well -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException
Github user maoling commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/336#discussion_r133865685 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java --- @@ -647,11 +648,10 @@ public void run() { numRetries = 0; } } catch (IOException e) { -if (shutdown) { -break; -} LOG.error("Exception while listening", e); -numRetries++; +if (!(e instanceof SocketTimeoutException)) { --- End diff -- - Can we reproduce this issue? (haha, 49 days?) Theoretically it should never happen. According to [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and [tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684), those projects also didn't find the root cause; they just added protections against the issue, like [this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354). - One assumption is that ServerSocket.accept() ends up using the maximum timeout value (2^32 - 1 = 4294967295 ms) when no timeout is specified or setSoTimeout(0) is used: > a call to accept() for this ServerSocket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the ServerSocket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout. That would explain why this issue always happened after 49 days 17 hours (4294967295/1000/60/60/24 = 49.7 days).
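The defensive pattern under discussion in this diff (and in the KARAF-3325 fix it references) can be sketched as below. This is an illustration with hypothetical names, not the actual patch:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustrative sketch of the retry pattern discussed in this thread: a
// SocketTimeoutException from accept() is treated as spurious -- per the
// Javadoc quoted above, the ServerSocket is still valid -- so the listener
// keeps listening instead of consuming one of the retries that eventually
// make it leave leader election.
class ListenerLoop {
    static Socket acceptWithRetry(ServerSocket ss, int maxRetries) throws IOException {
        int numRetries = 0;
        while (numRetries < maxRetries) {
            try {
                return ss.accept();
            } catch (SocketTimeoutException e) {
                // Spurious timeout (observed in the wild after ~49.7 days):
                // keep listening; do not count this against the retry budget.
            } catch (IOException e) {
                numRetries++; // only "real" I/O errors consume retries
            }
        }
        throw new IOException("Listener giving up after " + maxRetries + " retries");
    }
}
```

The key difference from the pre-patch code is that the timeout branch neither increments `numRetries` nor closes the socket, so the listener thread survives the spurious wakeup.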
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131577#comment-16131577 ] ASF GitHub Bot commented on ZOOKEEPER-2836: --- Github user maoling commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/336#discussion_r133864927 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java --- @@ -647,11 +648,10 @@ public void run() { numRetries = 0; } } catch (IOException e) { -if (shutdown) { --- End diff -- why we need to move code block **Line650-Line652** to code block **Line665-Line667** ? > QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException > -- > > Key: ZOOKEEPER-2836 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection, quorum >Affects Versions: 3.4.6 > Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 > x86_64 GNU/Linux > Java Version: jdk64/jdk1.8.0_40 > zookeeper version: 3.4.6.2.3.2.0-2950 >Reporter: Amarjeet Singh >Priority: Critical > > QuorumCnxManager Listener thread blocks SocketServer on accept but we are > getting SocketTimeoutException on our boxes after 49days 17 hours . As per > current code there is a 3 times retry and after that it says "_As I'm leaving > the listener thread, I won't be able to participate in leader election any > longer: $/$:3888__" , Once server nodes reache this state and > we restart or add a new node ,it fails to join cluster and logs 'WARN > QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open > channel to 3 at election address $/$:3888' . > As there is no timeout specified for ServerSocket it should never > timeout but there are some already discussed issues where people have seen > this issue and added checks for SocketTimeoutException explicitly like > https://issues.apache.org/jira/browse/KARAF-3325 . 
> I think we need to handle SocketTimeoutException on similar lines for > zookeeper as well
[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException
Github user maoling commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/336#discussion_r133864927 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java --- @@ -647,11 +648,10 @@ public void run() { numRetries = 0; } } catch (IOException e) { -if (shutdown) { --- End diff -- Why do we need to move the code block at **Line650-Line652** to **Line665-Line667**?
[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131435#comment-16131435 ] ASF GitHub Bot commented on ZOOKEEPER-1416: --- Github user Randgalt commented on the issue: https://github.com/apache/zookeeper/pull/136 Another goal is feature parity with other consensus tools such as etcd/consul. I added TTL nodes with this (and other) goals earlier in the year (or was it last year?). Watches in consul are persistent and optionally recursive. > Persistent Recursive Watch > -- > > Key: ZOOKEEPER-1416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation, java client, server >Reporter: Phillip Liu >Assignee: Jordan Zimmerman > Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > h4. The Problem > A ZooKeeper Watch can be placed on a single znode and when the znode changes > a Watch event is sent to the client. If there are thousands of znodes being > watched, when a client (re)connect, it would have to send thousands of watch > requests. At Facebook, we have this problem storing information for thousands > of db shards. Consequently a naming service that consumes the db shard > definition issues thousands of watch requests each time the service starts > and changes client watcher. > h4. Proposed Solution > We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent > means no Watch reset is necessary after a watch-fire. Recursive means the > Watch applies to the node and descendant nodes. A Persistent Recursive Watch > behaves as follows: > # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS. > # CHILDREN and DATA Recursive Watches can be placed on any znode. > # EXISTS Recursive Watches can be placed on any path. > # A Recursive Watch behaves like a auto-watch registrar on the server side. 
> Setting a Recursive Watch means to set watches on all descendant znodes. > # When a watch on a descendant fires, no subsequent event is fired until a > corresponding getData(..) on the znode is called, then Recursive Watch > automically apply the watch on the znode. This maintains the existing Watch > semantic on an individual znode. > # A Recursive Watch overrides any watches placed on a descendant znode. > Practically this means the Recursive Watch Watcher callback is the one > receiving the event and event is delivered exactly once. > A goal here is to reduce the number of semantic changes. The guarantee of no > intermediate watch event until data is read will be maintained. The only > difference is we will automatically re-add the watch after read. At the same > time we add the convience of reducing the need to add multiple watches for > sibling znodes and in turn reduce the number of watch messages sent from the > client to the server. > There are some implementation details that needs to be hashed out. Initial > thinking is to have the Recursive Watch create per-node watches. This will > cause a lot of watches to be created on the server side. Currently, each > watch is stored as a single bit in a bit set relative to a session - up to 3 > bits per client per znode. If there are 100m znodes with 100k clients, each > watching all nodes, then this strategy will consume approximately 3.75TB of > ram distributed across all Observers. Seems expensive. > Alternatively, a blacklist of paths to not send Watches regardless of Watch > setting can be set each time a watch event from a Recursive Watch is fired. > The memory utilization is relative to the number of outstanding reads and at > worst case it's 1/3 * 3.75TB using the parameters given above. > Otherwise, a relaxation of no intermediate watch event until read guarantee > is required. 
If the server can send watch events regardless of one has > already been fired without corresponding read, then the server can simply > fire watch events without tracking.
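As a sanity check, the 3.75 TB estimate in the proposal above follows directly from its stated parameters (up to 3 watch bits per client per znode, 100m znodes, 100k clients all watching every node):

```latex
3\,\text{bits} \times 10^{8}\,\text{znodes} \times 10^{5}\,\text{clients}
  = 3\times10^{13}\,\text{bits}
  = \frac{3\times10^{13}}{8}\,\text{bytes}
  = 3.75\times10^{12}\,\text{bytes} \approx 3.75\,\text{TB}
```

The blacklist alternative drops the 3-bits-per-znode factor to 1 (only paths with an outstanding read need tracking), which is where the 1/3 * 3.75 TB worst case in the proposal comes from.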
[GitHub] zookeeper issue #136: [ZOOKEEPER-1416] Persistent Recursive Watch
Github user Randgalt commented on the issue: https://github.com/apache/zookeeper/pull/136 Another goal is feature parity with other consensus tools such as etcd/consul. I added TTL nodes with this (and other) goals earlier in the year (or was it last year?). Watches in consul are persistent and optionally recursive.
[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131426#comment-16131426 ] ASF GitHub Bot commented on ZOOKEEPER-1416: --- Github user skamille commented on the issue: https://github.com/apache/zookeeper/pull/136 We have to remember that people who don't use TreeCache will still use this feature. Not to say that we shouldn't keep it in mind as an important user, but presumably people who don't actually do anything with curator will decide to use this feature. Does the design make sense absent that consideration? Specifically, if you weren't thinking of this as a feature for TreeCache, would we implement it to automatically watch children changes as well, or would it be broken up into two modes: persistent no children, persistent children. > Persistent Recursive Watch > -- > > Key: ZOOKEEPER-1416 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416 > Project: ZooKeeper > Issue Type: Improvement > Components: c client, documentation, java client, server >Reporter: Phillip Liu >Assignee: Jordan Zimmerman > Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch > > Original Estimate: 504h > Remaining Estimate: 504h > > h4. The Problem > A ZooKeeper Watch can be placed on a single znode and when the znode changes > a Watch event is sent to the client. If there are thousands of znodes being > watched, when a client (re)connect, it would have to send thousands of watch > requests. At Facebook, we have this problem storing information for thousands > of db shards. Consequently a naming service that consumes the db shard > definition issues thousands of watch requests each time the service starts > and changes client watcher. > h4. Proposed Solution > We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent > means no Watch reset is necessary after a watch-fire. Recursive means the > Watch applies to the node and descendant nodes. 
A Persistent Recursive Watch > behaves as follows: > # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS. > # CHILDREN and DATA Recursive Watches can be placed on any znode. > # EXISTS Recursive Watches can be placed on any path. > # A Recursive Watch behaves like a auto-watch registrar on the server side. > Setting a Recursive Watch means to set watches on all descendant znodes. > # When a watch on a descendant fires, no subsequent event is fired until a > corresponding getData(..) on the znode is called, then Recursive Watch > automically apply the watch on the znode. This maintains the existing Watch > semantic on an individual znode. > # A Recursive Watch overrides any watches placed on a descendant znode. > Practically this means the Recursive Watch Watcher callback is the one > receiving the event and event is delivered exactly once. > A goal here is to reduce the number of semantic changes. The guarantee of no > intermediate watch event until data is read will be maintained. The only > difference is we will automatically re-add the watch after read. At the same > time we add the convience of reducing the need to add multiple watches for > sibling znodes and in turn reduce the number of watch messages sent from the > client to the server. > There are some implementation details that needs to be hashed out. Initial > thinking is to have the Recursive Watch create per-node watches. This will > cause a lot of watches to be created on the server side. Currently, each > watch is stored as a single bit in a bit set relative to a session - up to 3 > bits per client per znode. If there are 100m znodes with 100k clients, each > watching all nodes, then this strategy will consume approximately 3.75TB of > ram distributed across all Observers. Seems expensive. > Alternatively, a blacklist of paths to not send Watches regardless of Watch > setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads, and in the worst case it is 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read guarantee is required. If the server can send watch events regardless of whether one has already been fired without a corresponding read, then the server can simply fire watch events without tracking.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
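The 3.75TB estimate in the ticket (and the 1/3 worst case for the blacklist alternative) can be reproduced with a quick back-of-the-envelope check; the constants are taken straight from the issue text, and the class name is illustrative:

```java
// Back-of-the-envelope check of the memory figures in ZOOKEEPER-1416:
// up to 3 watch bits per client per znode, 100m znodes, 100k clients
// each watching all nodes.
public class WatchMemoryEstimate {
    public static void main(String[] args) {
        long znodes = 100_000_000L;  // 100m znodes
        long clients = 100_000L;     // 100k clients, each watching everything
        long bitsPerWatch = 3L;      // 3 bits per client per znode

        double totalBytes = (double) znodes * clients * bitsPerWatch / 8.0;
        double totalTb = totalBytes / 1e12;

        System.out.println(totalTb);       // prints 3.75 (TB, per-node watches)
        System.out.println(totalTb / 3.0); // prints 1.25 (TB, blacklist worst case)
    }
}
```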
[GitHub] zookeeper issue #136: [ZOOKEEPER-1416] Persistent Recursive Watch
Github user skamille commented on the issue:

    https://github.com/apache/zookeeper/pull/136

    We have to remember that people who don't use TreeCache will still use this feature. Not to say that we shouldn't keep it in mind as an important user, but presumably people who don't actually do anything with Curator will also decide to use this feature. Does the design make sense absent that consideration? Specifically, if you weren't thinking of this as a feature for TreeCache, would we implement it to automatically watch children changes as well, or would it be broken up into two modes: persistent without children and persistent with children?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
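The "two modes" distinction in the comment can be made concrete with a toy model of which events each mode delivers; the enum and method names below are illustrative, not ZooKeeper API:

```java
// Toy model of the two modes discussed on the PR: "persistent without
// children" fires only for the watched node itself, while "persistent
// with children" also fires for paths below it. Illustrative only.
public class WatchModes {
    enum Mode { PERSISTENT_NO_CHILDREN, PERSISTENT_CHILDREN }

    static boolean fires(String watchedPath, Mode mode, String changedPath) {
        if (changedPath.equals(watchedPath)) {
            return true; // both modes fire for the watched node itself
        }
        // only the children-tracking mode fires for paths below the node
        return mode == Mode.PERSISTENT_CHILDREN
                && changedPath.startsWith(watchedPath + "/");
    }

    public static void main(String[] args) {
        System.out.println(fires("/app", Mode.PERSISTENT_NO_CHILDREN, "/app"));        // true
        System.out.println(fires("/app", Mode.PERSISTENT_NO_CHILDREN, "/app/shard1")); // false
        System.out.println(fires("/app", Mode.PERSISTENT_CHILDREN, "/app/shard1"));    // true
    }
}
```

For what it is worth, ZooKeeper's eventual persistent-watch API exposes a similar split between a per-node persistent mode and a recursive one.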
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130996#comment-16130996 ]

ASF GitHub Bot commented on ZOOKEEPER-2872:
-------------------------------------------

Github user enixon commented on the issue:

    https://github.com/apache/zookeeper/pull/333

    I am unable to reproduce the test failure in Zab1_0Test

> Interrupted snapshot sync causes data loss
> ------------------------------------------
>
>               Key: ZOOKEEPER-2872
>               URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
>           Project: ZooKeeper
>        Issue Type: Bug
>        Components: server
>  Affects Versions: 3.4.10, 3.5.3, 3.6.0
>          Reporter: Brian Nixon
>
> There is a way for observers to permanently lose data from their local data tree while remaining members of good standing with the ensemble and continuing to serve client traffic, when the following chain of events occurs:
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to catch up.
> 3. The machine powers off before the snapshot is synced to disk and after some txns have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot (epoch <= N) as well as the txn logs (epoch N+1).
> 5. A diff sync is requested from the leader and the observer broadcasts availability.
> In this scenario, any commits from epoch N that the observer did not receive before it died the first time will never be exposed to the observer, and no part of the ensemble will complain.
> This situation is not unique to observers and can happen to any learner. As a simple fix, fsync-ing the snapshots received from the leader will avoid the case of missing snapshots causing data loss.
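The proposed fix - fsync the snapshot a learner receives before acting on it - can be sketched as below. The method name and raw byte[] payload are illustrative, not the actual ZooKeeper Learner code, which goes through FileTxnSnapLog:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch of the proposed fix: force the snapshot received from the leader
// to disk before the learner proceeds, so a power loss between snapshot
// sync and txn logging cannot leave a stale snapshot on disk.
// writeSnapshotDurably and the byte[] payload are illustrative names.
public class DurableSnapshotWrite {
    static void writeSnapshotDurably(File snapFile, byte[] snapshotBytes) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(snapFile)) {
            fos.write(snapshotBytes);
            fos.flush();          // drain user-space buffers to the OS
            fos.getFD().sync();   // fsync: block until the bytes reach disk
        }
    }

    public static void main(String[] args) throws IOException {
        File snap = File.createTempFile("snapshot", ".bin");
        snap.deleteOnExit();
        writeSnapshotDurably(snap, new byte[] {1, 2, 3});
        System.out.println(snap.length()); // prints 3
    }
}
```

Without the sync() call, the write may sit in the OS page cache indefinitely, which is exactly the window step 3 of the scenario exploits.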
[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...
Github user enixon commented on the issue:

    https://github.com/apache/zookeeper/pull/333

    I am unable to reproduce the test failure in Zab1_0Test
ZooKeeper_branch35_jdk7 - Build # 1079 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/1079/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 66.99 MB...]
[junit] 2017-08-17 08:50:04,407 [myid:] - WARN [New I/O boss #5236:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 0x2d9f4d89] EXCEPTION: java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:27506
[junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:27506
[junit]     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit]     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit]     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit]     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit]     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit]     at java.lang.Thread.run(Thread.java:745)
[junit] 2017-08-17 08:50:04,407 [myid:] - INFO [New I/O boss #5236:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-08-17 08:50:04,408 [myid:127.0.0.1:27506] - INFO [main-SendThread(127.0.0.1:27506):ClientCnxn$SendThread@1231] - channel for sessionid 0x20568d0c015 is lost, closing socket connection and attempting reconnect
[junit] 2017-08-17 08:50:04,421 [myid:127.0.0.1:27444] - INFO [main-SendThread(127.0.0.1:27444):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:27444. Will not attempt to authenticate using SASL (unknown error)
[junit] 2017-08-17 08:50:04,422 [myid:] - INFO [New I/O boss #3723:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:27444
[junit]     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit]     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit]     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit]     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit]     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit]     at java.lang.Thread.run(Thread.java:745)
[junit] 2017-08-17 08:50:04,473 [myid:] - WARN [New I/O boss #3723:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 0x1544a03b] EXCEPTION: java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:27444
[junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:27444
[junit]     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit]     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit]     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit]     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit]     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit]     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit]     at java.util.concurrent.ThreadPoolExecutor