[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131788#comment-16131788
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133886561
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
--- End diff --

Update: I had assumed that `closeSocket(client);` should also be 
executed when the exception was thrown.


> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum
>Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
>Reporter: Amarjeet Singh
>Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket.accept(), but we are 
> getting a SocketTimeoutException on our boxes after 49 days 17 hours. As per the 
> current code there are 3 retries, after which it logs "_As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: $/$:3888_". Once server nodes reach this state and 
> we restart or add a new node, it fails to join the cluster and logs 'WARN  
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
> channel to 3 at election address $/$:3888'.
> Since no timeout is specified for the ServerSocket it should never 
> time out, but there are previously discussed issues where people have seen 
> the same behavior and added explicit checks for SocketTimeoutException, e.g. 
> https://issues.apache.org/jira/browse/KARAF-3325 . 
> I think we need to handle SocketTimeoutException along similar lines in 
> ZooKeeper as well.
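For context, the patch hunk quoted above shows the shape of the fix: the Listener's accept loop counts every IOException toward a small retry limit, and the change excludes SocketTimeoutException from that count. A minimal, hypothetical sketch of this handling (not the actual QuorumCnxManager code; names like `countsTowardRetries` are illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustrative sketch only -- not the actual QuorumCnxManager.Listener code.
// The idea: a spurious accept timeout should not count toward the retry
// limit, so it can never push the listener out of leader election.
class ListenerSketch {
    static final int MAX_RETRIES = 3;

    // SocketTimeoutException extends IOException, so it must be excluded
    // explicitly if all IOExceptions share one catch block.
    static boolean countsTowardRetries(IOException e) {
        return !(e instanceof SocketTimeoutException);
    }

    void run(ServerSocket ss) {
        int numRetries = 0;
        while (numRetries < MAX_RETRIES) {
            try {
                Socket client = ss.accept();   // blocks; no SO_TIMEOUT is set
                handle(client);
                numRetries = 0;                // a successful accept resets the count
            } catch (IOException e) {
                System.err.println("Exception while listening: " + e);
                if (countsTowardRetries(e)) {
                    numRetries++;              // only non-timeout errors burn a retry
                }
            }
        }
    }

    void handle(Socket client) { /* hand the connection off for leader election */ }
}
```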



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131777#comment-16131777
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user bitgaoshu commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133885327
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
--- End diff --

- update

- I checked the native method `java.net.PlainSocketImpl.socketAccept(Native 
Method)` in 
[openjdk](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/9d617cfd6717/src/solaris/native/java/net/PlainSocketImpl.c),
 **lines 709-721**, where the timeout is changed from 0 to -1, and a timeout of 
-1 is then interpreted as an infinite timeout. In some cases, [-1 was interpreted 
as a large positive integer](https://lwn.net/Articles/483078/), which would 
explain why this issue always happens after 49 days. This is just my conjecture.
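The 49-day figure in the bug report is consistent with this conjecture: -1 reinterpreted as an unsigned 32-bit millisecond count is 2^32 - 1 ms, which is almost exactly 49 days 17 hours. A quick sanity check (illustrative arithmetic, not ZooKeeper code):

```java
// Sanity check of the "49 days 17 hours" conjecture: -1 reinterpreted as an
// unsigned 32-bit millisecond count is 2^32 - 1 ms.
class TimeoutWrap {
    static long toUnsigned32(int timeout) {
        return timeout & 0xFFFFFFFFL;  // -1 -> 4294967295
    }

    public static void main(String[] args) {
        long ms = toUnsigned32(-1);
        long days = ms / 86_400_000L;                  // whole days
        long hours = (ms % 86_400_000L) / 3_600_000L;  // leftover hours
        System.out.println(days + " days " + hours + " hours"); // 49 days 17 hours
    }
}
```

This matches the interval reported by the original submitter, which lends some weight to the wrap-around theory.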






ZooKeeper_branch34_openjdk7 - Build # 1614 - Failure

2017-08-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1614/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on qnode1 (ubuntu) in workspace 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10
Checking out Revision b903a07c4944cb0a90045e686b7c3f153aee6153 
(refs/remotes/origin/branch-3.4)
Commit message: "ZOOKEEPER-2874: Windows Debug builds don't link with `/MTd`"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b903a07c4944cb0a90045e686b7c3f153aee6153
 > git rev-list 1f811a6281090e1b24152dc51507aa6a2bdeafe3 # timeout=10
No emails were triggered.
[ZooKeeper_branch34_openjdk7] $ 
/home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes 
-Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean 
test-core-java
Error: JAVA_HOME is not defined correctly.
  We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java
Build step 'Invoke Ant' marked build as failure
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`

2017-08-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131746#comment-16131746
 ] 

Hudson commented on ZOOKEEPER-2874:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3503 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3503/])
ZOOKEEPER-2874: Windows Debug builds don't link with `/MTd` (hanm: rev 
ab182d4561f1c6725af0e89e0b76d92186732195)
* (edit) src/c/CMakeLists.txt


> Windows Debug builds don't link with `/MTd`
> ---
>
> Key: ZOOKEEPER-2874
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874
> Project: ZooKeeper
>  Issue Type: Bug
> Environment: Windows 10 using CMake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> While not apparent when building ZooKeeper stand-alone, further testing when 
> linking with Mesos revealed that it was ZooKeeper causing the warning:
> {noformat}
> LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' 
> conflicts with use of other libs; use /NODEFAULTLIB:library 
> [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj]
> {noformat}
> This is because Mesos links with {{/MTd}} in the Debug configuration (which is 
> the most common practice).
> Once I found the source of the warning, the fix was trivial, and I am posting a 
> patch.





[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131747#comment-16131747
 ] 

Hudson commented on ZOOKEEPER-2872:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #3503 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/3503/])
ZOOKEEPER-2872: Interrupted snapshot sync causes data loss (hanm: rev 
0706b40afad079f19fe9f76c99bbb7ec69780dbd)
* (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
* (edit) src/java/test/org/apache/zookeeper/test/TruncateTest.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/Learner.java
* (edit) src/java/main/org/apache/zookeeper/server/persistence/SnapShot.java
* (edit) src/java/main/org/apache/zookeeper/server/persistence/FileSnap.java
* (edit) src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
* (edit) 
src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java


> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members in good standing with the ensemble and 
> continuing to serve client traffic, when the following chain of events occurs:
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disk but after 
> some txns have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer, and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid 
> missing snapshots causing data loss.
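The proposed fix, fsync-ing a received snapshot, can be sketched as follows. This is illustrative only: the real change touches FileSnap/FileTxnSnapLog (per the commit's file list), and the helper name here is invented.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

// Illustrative sketch of the fix: force the received snapshot to stable
// storage before the learner starts logging epoch-N+1 transactions, so a
// power loss cannot leave txn logs that outrun the on-disk snapshot.
class SnapshotWriter {
    static void writeDurably(Path snapFile, byte[] snapshotBytes) throws IOException {
        try (FileOutputStream out = new FileOutputStream(snapFile.toFile())) {
            out.write(snapshotBytes);
            out.flush();
            out.getFD().sync();  // fsync: data is on disk, not just in the page cache
        }
    }
}
```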





[jira] [Commented] (ZOOKEEPER-2804) Node creation fails with NPE if ACLs are null

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131709#comment-16131709
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2804:
---

Github user jainbhupendra24 commented on the issue:

https://github.com/apache/zookeeper/pull/279
  
@hanm , I will update the patch


> Node creation fails with NPE if ACLs are null
> -
>
> Key: ZOOKEEPER-2804
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2804
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Bhupendra Kumar Jain
>
> If null ACLs are passed then zk node creation or set ACL fails with NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178)
> {code}
> Expected to handle null in server and return proper error code to client
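A minimal sketch of the kind of server-side guard being requested here (the class name, helper name, and exception type are illustrative, not the actual PrepRequestProcessor API):

```java
import java.util.List;

// Illustrative guard: validate the ACL list before it reaches
// removeDuplicates()/fixupACL(), so a null list yields a clear client-facing
// error instead of a server-side NullPointerException.
class AclValidator {
    static <T> List<T> checkAcls(List<T> acls) {
        if (acls == null || acls.isEmpty()) {
            // In the real server this would map to a proper error code
            // (e.g. an "invalid ACL" error) returned to the client.
            throw new IllegalArgumentException("ACL list must be non-empty");
        }
        return acls;
    }
}
```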







[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...

2017-08-17 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
Committed to master: 0706b40afad079f19fe9f76c99bbb7ec69780dbd

Pending JIRA resolve after fixing merge conflicts and commit into 
branch-3.4 and 3.5.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-08-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131704#comment-16131704
 ] 

Hadoop QA commented on ZOOKEEPER-2770:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 3.0.1) 
warnings.

-1 release audit.  The applied patch generated 1 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//console

This message is automatically generated.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), or a network issue. If the 
> problem is constant, it is trivial to come to an understanding of the cause. 
> However, to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data and can 
> suffer intermittent performance degradation, so it should consider implementing 
> a 'slow query' log: a feature very common in services that persist 
> information on behalf of clients which may be latency-sensitive while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus the arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
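The core check described in the issue is simple: compare the request's age at processing time against a configured threshold. A hedged sketch (the class name, constructor, and threshold value are illustrative, not the patch's actual API):

```java
// Illustrative sketch of the slow-operation check: a request is "slow" when
// (now - arrival time) exceeds a configured threshold, at which point the
// server would log the client and request details.
class SlowOpCheck {
    private final long thresholdMs;

    SlowOpCheck(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    boolean isSlow(long arrivalMs, long nowMs) {
        return nowMs - arrivalMs > thresholdMs;  // strictly beyond the threshold
    }
}
```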





Failed: ZOOKEEPER- PreCommit Build #944

2017-08-17 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 72.33 MB...]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 3 new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 1 release audit 
warnings (more than the trunk's current 0 warnings).
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//testReport/
 [exec] Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/944//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 93f9d2049e6d1a45a4db24a64472d4e1adabae4f logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1643:
 exec returned: 3

Total time: 12 minutes 48 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[Fast Archiver] Compressed 591.81 KB of artifacts by 32.4% relative to #943
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2770
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
5 tests failed.
FAILED:  
org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging

Error Message:

mockAppender.doAppend();
Wanted 3 times:
-> at 
org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
But was 2 times:
-> at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)


Stack Trace:
junit.framework.AssertionFailedError: 
mockAppender.doAppend();
Wanted 3 times:
-> at 
org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
But was 2 times:
-> at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

at 
org.apache.zookeeper.server.HighLatencyRequestLoggingTest.testFrequentRequestWarningThresholdLogging(HighLatencyRequestLoggingTest.java:241)
at 
org.mockito.internal.runners.JUnit45AndHigherRunnerImpl.run(JUnit45AndHigherRunnerImpl.java:37)
at 
org.mockito.runners.MockitoJUnitRunner.run(MockitoJUnitRunner.java:62)


FAILED:  

[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131702#comment-16131702
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2872:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/333






[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131698#comment-16131698
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2874:
---

Github user andschwa commented on the issue:

https://github.com/apache/zookeeper/pull/335
  
Thank you @hanm!






[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131695#comment-16131695
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2874:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/335
  
Committed to master: ab182d4561f1c6725af0e89e0b76d92186732195
branch-3.5: 8f68c04838c3d034bcef7e937a3c23f3cfef8065
branch-3.4: b903a07c4944cb0a90045e686b7c3f153aee6153




[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131694#comment-16131694
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user karanmehta93 commented on the issue:

https://github.com/apache/zookeeper/pull/307
  
@hanm @eribeiro @tdunning @skamille 
Please review.
Now that I have added rate limiting to logging, can we also turn this on by 
default?


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data and can 
> suffer intermittent performance degradation, so it should consider implementing 
> a 'slow query' log, a feature very common to services that persist 
> information on behalf of clients that may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
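The mechanism described in the issue (log the client and request details when completion time minus arrival time exceeds a configured threshold, with rate limiting so a burst of slow requests cannot flood the log) can be sketched roughly as below. Class and parameter names here are illustrative, not the actual patch:

```java
public class SlowRequestLogger {
    private final long thresholdMs;      // requests slower than this get reported
    private final long minLogIntervalMs; // rate limit: at most one report per interval
    // Start far in the past so the first slow request is always reported;
    // MIN_VALUE / 2 avoids overflow in the subtraction below.
    private long lastLogTimeMs = Long.MIN_VALUE / 2;

    public SlowRequestLogger(long thresholdMs, long minLogIntervalMs) {
        this.thresholdMs = thresholdMs;
        this.minLogIntervalMs = minLogIntervalMs;
    }

    /** Returns true if this request should be reported as slow right now. */
    public synchronized boolean shouldLog(long arrivalMs, long doneMs) {
        if (doneMs - arrivalMs <= thresholdMs) {
            return false; // fast enough, nothing to report
        }
        if (doneMs - lastLogTimeMs < minLogIntervalMs) {
            return false; // slow, but suppressed by rate limiting
        }
        lastLogTimeMs = doneMs;
        return true;
    }
}
```

A caller in the request pipeline would invoke `shouldLog(request.arrivalTime, now)` after processing and, on `true`, emit one log line with the client and request details.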





[GitHub] zookeeper issue #335: ZOOKEEPER-2874: Windows Debug builds don't link with `...

2017-08-17 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/335
  
Committed to master: ab182d4561f1c6725af0e89e0b76d92186732195
branch-3.5: 8f68c04838c3d034bcef7e937a3c23f3cfef8065
branch-3.4: b903a07c4944cb0a90045e686b7c3f153aee6153


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Success: ZOOKEEPER- PreCommit Build #943

2017-08-17 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 69.17 MB...]
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 2d9bc96aa913d3439ae248983e08fef507f4510a logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 19 minutes 43 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2836
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131692#comment-16131692
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2874:
---

Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/335


> Windows Debug builds don't link with `/MTd`
> ---
>
> Key: ZOOKEEPER-2874
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874
> Project: ZooKeeper
>  Issue Type: Bug
> Environment: Windows 10 using CMake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
> Fix For: 3.5.4, 3.6.0, 3.4.11
>
>
> While not apparent when building ZooKeeper stand-alone, further testing when 
> linking with Mesos revealed it was ZooKeeper that was causing the warning:
> {noformat}
> LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' 
> conflicts with use of other libs; use /NODEFAULTLIB:library 
> [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj]
> {noformat}
> This surfaced because Mesos links with {{/MTd}} in its Debug configuration (which 
> is the most common practice).
> Once I found the source of the warning, the fix is trivial and I am posting a 
> patch.





[GitHub] zookeeper issue #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-08-17 Thread karanmehta93
Github user karanmehta93 commented on the issue:

https://github.com/apache/zookeeper/pull/307
  
@hanm @eribeiro @tdunning @skamille 
Please review.
Now that I have added rate limiting to logging, can we also turn this on by 
default?




[jira] [Resolved] (ZOOKEEPER-2874) Windows Debug builds don't link with `/MTd`

2017-08-17 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2874.

   Resolution: Fixed
Fix Version/s: 3.5.4
   3.6.0
   3.4.11

Issue resolved by pull request 335
[https://github.com/apache/zookeeper/pull/335]

> Windows Debug builds don't link with `/MTd`
> ---
>
> Key: ZOOKEEPER-2874
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2874
> Project: ZooKeeper
>  Issue Type: Bug
> Environment: Windows 10 using CMake
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
> Fix For: 3.4.11, 3.6.0, 3.5.4
>
>
> While not apparent when building ZooKeeper stand-alone, further testing when 
> linking with Mesos revealed it was ZooKeeper that was causing the warning:
> {noformat}
> LIBCMTD.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' 
> conflicts with use of other libs; use /NODEFAULTLIB:library 
> [C:\Users\andschwa\src\mesos\build\src\slave\mesos-agent.vcxproj]
> {noformat}
> This surfaced because Mesos links with {{/MTd}} in its Debug configuration (which 
> is the most common practice).
> Once I found the source of the warning, the fix is trivial and I am posting a 
> patch.





[GitHub] zookeeper pull request #335: ZOOKEEPER-2874: Windows Debug builds don't link...

2017-08-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/335




[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131691#comment-16131691
 ] 

Hadoop QA commented on ZOOKEEPER-2836:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/943//console

This message is automatically generated.

> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum
>Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
>Reporter: Amarjeet Singh
>Priority: Critical
>
> QuorumCnxManager's Listener thread blocks its ServerSocket on accept, but we are 
> getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the 
> current code there are 3 retries, after which it logs "_As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: $/$:3888_". Once server nodes reach this state and 
> we restart or add a new node, it fails to join the cluster and logs 'WARN  
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
> channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket, it should never 
> time out, but there are already-discussed issues where people have seen 
> this and added explicit checks for SocketTimeoutException, e.g. 
> https://issues.apache.org/jira/browse/KARAF-3325. 
> I think we need to handle SocketTimeoutException along similar lines in 
> ZooKeeper as well.
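The fix under review boils down to one decision: which IOExceptions thrown by accept() should consume one of the listener's three retries. A minimal sketch of that predicate is below; the class and method names are illustrative (the real change lives inside QuorumCnxManager.Listener.run()):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class ListenerRetryPolicy {
    static final int MAX_RETRIES = 3; // mirrors the listener's existing limit

    /**
     * A spurious SocketTimeoutException (observed after ~49.7 days even though
     * no soTimeout was set) should not push the listener toward giving up on
     * leader election; only other IOExceptions count against the retry budget.
     * SocketTimeoutException extends IOException, so the instanceof test works.
     */
    public static boolean countsTowardRetries(IOException e) {
        return !(e instanceof SocketTimeoutException);
    }
}
```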





[jira] [Commented] (ZOOKEEPER-2804) Node creation fails with NPE if ACLs are null

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131689#comment-16131689
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2804:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/279
  
Let's wrap this up before it becomes more stale. I believe the only 
remaining work item is the last review comment @arshadmohammad made:
>> As we do this validation in multiple places, it would be better if 
this piece of code were extracted into a method. 
@jainbhupendra24 Do you mind updating this pull request and doing what Arshad 
suggested?


> Node creation fails with NPE if ACLs are null
> -
>
> Key: ZOOKEEPER-2804
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2804
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Bhupendra Kumar Jain
>
> If null ACLs are passed then zk node creation or set ACL fails with NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178)
> {code}
> Expected to handle null in server and return proper error code to client
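Arshad's suggestion of extracting the repeated null/empty validation into a single helper, so the server fails with a clear client-facing error instead of an NPE deep inside fixupACL(), could look roughly like this. This is a sketch; the class, method name, and exception type are illustrative (in the server this would map to an InvalidACL error code):

```java
import java.util.List;

public class AclValidation {
    /** Rejects null or empty ACL lists up front, before any deduplication. */
    public static <T> List<T> requireNonEmptyAcl(List<T> acl) {
        if (acl == null || acl.isEmpty()) {
            // Clear, early error instead of an NPE in removeDuplicates()
            throw new IllegalArgumentException("ACL list must not be null or empty");
        }
        return acl;
    }
}
```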





[GitHub] zookeeper issue #279: ZOOKEEPER-2804:Node creation fails with NPE if ACLs ar...

2017-08-17 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/279
  
Let's wrap this up before it becomes more stale. I believe the only 
remaining work item is the last review comment @arshadmohammad made:
>> As we do this validation in multiple places, it would be better if 
this piece of code were extracted into a method. 
@jainbhupendra24 Do you mind updating this pull request and doing what Arshad 
suggested?




[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131683#comment-16131683
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2872:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
>> it seems best to keep snapshot taking a lighter weight operation.

Sounds reasonable.

>> I am unable to reproduce the test failure in Zab1_0Test

I think it's a flaky test. Filed ZOOKEEPER-2877 for this.


> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members of good standing with the ensemble and 
> continuing to serve client traffic when the following chain of events occurs.
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disk and after 
> some txn's have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid the 
> case of missing snapshots causing data loss.
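The "simple fix" proposed, fsync-ing the snapshot received from the leader before acknowledging it, comes down to forcing the file to stable storage. A generic sketch (not the actual patch; the class and method are hypothetical):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SnapshotWriter {
    /**
     * Writes snapshot bytes and forces them to disk before returning, so a
     * power failure after this call cannot leave the learner replaying an
     * older snapshot against newer txn logs (the data-loss scenario above).
     */
    public static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(true); // fsync: flush both file data and metadata
        }
    }
}
```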





[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...

2017-08-17 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
>> it seems best to keep snapshot taking a lighter weight operation.

Sounds reasonable.

>> I am unable to reproduce the test failure in Zab1_0Test

I think it's a flaky test. Filed ZOOKEEPER-2877 for this.




[jira] [Created] (ZOOKEEPER-2877) Flaky Test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun

2017-08-17 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-2877:
--

 Summary: Flaky Test: 
org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun
 Key: ZOOKEEPER-2877
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2877
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Michael Han


{noformat}
Error Message

expected:<1> but was:<0>
Stacktrace

junit.framework.AssertionFailedError: expected:<1> but was:<0>
at 
org.apache.zookeeper.server.quorum.Zab1_0Test$6.converseWithLeader(Zab1_0Test.java:939)
at 
org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderConversation(Zab1_0Test.java:398)
at 
org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalRun(Zab1_0Test.java:906)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
{noformat}





[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131588#comment-16131588
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133865685
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
--- End diff --

-  Can we reproduce this issue? (haha, 49 days?) Theoretically it should never 
happen. [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and 
[tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684) also never 
found the root cause; they just added protections against it, like 
[this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
-  One assumption is that ServerSocket.accept() falls back to an effective 
timeout of 2^32 - 1 = 4294967295 ms when no timeout is specified 
(setSoTimeout(0)):
> a call to accept() for this ServerSocket will block for only this 
amount of time. If the timeout expires, a java.net.SocketTimeoutException is 
raised, though the ServerSocket is still valid. The option must be enabled 
prior to entering the blocking operation to have effect. The timeout must be > 
0. A timeout of zero is interpreted as an infinite timeout.

   That would explain why this issue always happens after 49 days 17 hours 
(4294967295/1000/60/60/24 = 49.7 days).
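The back-of-the-envelope arithmetic above checks out: 2^32 - 1 milliseconds is almost exactly the observed 49 days 17 hours, which supports the theory that an unsigned 32-bit millisecond counter is involved somewhere below accept(). A tiny check:

```java
public class TimeoutArithmetic {
    public static void main(String[] args) {
        long maxUnsigned32 = 4294967295L;     // 2^32 - 1, in milliseconds
        long msPerDay = 24L * 60 * 60 * 1000; // 86,400,000
        long days = maxUnsigned32 / msPerDay;
        long hours = (maxUnsigned32 % msPerDay) / (60 * 60 * 1000);
        // Matches the interval reported in the bug
        System.out.println(days + " days " + hours + " hours"); // 49 days 17 hours
    }
}
```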


> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum
>Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
>Reporter: Amarjeet Singh
>Priority: Critical
>
> QuorumCnxManager's Listener thread blocks its ServerSocket on accept, but we are 
> getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the 
> current code there are 3 retries, after which it logs "_As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: $/$:3888_". Once server nodes reach this state and 
> we restart or add a new node, it fails to join the cluster and logs 'WARN  
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
> channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket, it should never 
> time out, but there are already-discussed issues where people have seen 
> this and added explicit checks for SocketTimeoutException, e.g. 
> https://issues.apache.org/jira/browse/KARAF-3325. 
> I think we need to handle SocketTimeoutException along similar lines in 
> ZooKeeper as well.





[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

2017-08-17 Thread maoling
Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133865685
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
-break;
-}
 LOG.error("Exception while listening", e);
-numRetries++;
+if (!(e instanceof SocketTimeoutException)) {
--- End diff --

-  Can we reproduce this issue? (haha, 49 days?) Theoretically it should never 
happen. [KARAF-3325](https://issues.apache.org/jira/browse/KARAF-3325) and 
[tomcat-56684](https://bz.apache.org/bugzilla/show_bug.cgi?id=56684) also never 
found the root cause; they just added protections against it, like 
[this](https://github.com/apache/karaf/pull/50/commits/0349d582c4899f19ad73ee37c8c688660cbc7354).
-  One assumption is that ServerSocket.accept() falls back to an effective 
timeout of 2^32 - 1 = 4294967295 ms when no timeout is specified 
(setSoTimeout(0)):
> a call to accept() for this ServerSocket will block for only this 
amount of time. If the timeout expires, a java.net.SocketTimeoutException is 
raised, though the ServerSocket is still valid. The option must be enabled 
prior to entering the blocking operation to have effect. The timeout must be > 
0. A timeout of zero is interpreted as an infinite timeout.

   That would explain why this issue always happens after 49 days 17 hours 
(4294967295/1000/60/60/24 = 49.7 days).




[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131577#comment-16131577
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2836:
---

Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133864927
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
--- End diff --

Why do we need to move code block **Line650-Line652** to code block 
**Line665-Line667**?


> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum
>Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 
> x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version:  3.4.6.2.3.2.0-2950 
>Reporter: Amarjeet Singh
>Priority: Critical
>
> QuorumCnxManager's Listener thread blocks its ServerSocket on accept, but we are 
> getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the 
> current code there are 3 retries, after which it logs "_As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: $/$:3888_". Once server nodes reach this state and 
> we restart or add a new node, it fails to join the cluster and logs 'WARN  
> QuorumPeer/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open 
> channel to 3 at election address $/$:3888'.
> As there is no timeout specified for the ServerSocket, it should never 
> time out, but there are already-discussed issues where people have seen 
> this and added explicit checks for SocketTimeoutException, e.g. 
> https://issues.apache.org/jira/browse/KARAF-3325. 
> I think we need to handle SocketTimeoutException along similar lines in 
> ZooKeeper as well.





[GitHub] zookeeper pull request #336: ZOOKEEPER-2836: fix SocketTimeoutException

2017-08-17 Thread maoling
Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/336#discussion_r133864927
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -647,11 +648,10 @@ public void run() {
 numRetries = 0;
 }
 } catch (IOException e) {
-if (shutdown) {
--- End diff --

Why do we need to move code block **Line650-Line652** to code block 
**Line665-Line667**?




[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131435#comment-16131435
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
---

Github user Randgalt commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Another goal is feature parity with other consensus tools such as 
etcd/consul. I added TTL nodes with this (and other) goals earlier in the year 
(or was it last year?). Watches in consul are persistent and optionally 
recursive.


> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, then when a client (re)connects it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently, a naming service that consumes the db shard 
> definitions issues thousands of watch requests each time the service starts 
> or changes its client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called; the Recursive Watch then 
> automatically re-applies the watch on the znode. This maintains the existing 
> Watch semantics on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of no intermediate watch event until read guarantee 
> is required. If the server can send watch events regardless of one has 
> already been fired without corresponding read, then the server can simply 
> fire watch events without tracking.
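To make the "recursive" semantics above concrete, the server-side question for each event is simply whether the event path falls under the watched subtree. A minimal path-prefix predicate (illustrative only, not the proposed implementation, which must also track per-session watch state):

```java
public class RecursiveWatchPaths {
    /** True if eventPath is the watched znode itself or any descendant of it. */
    public static boolean inSubtree(String watchPath, String eventPath) {
        if (eventPath.equals(watchPath)) {
            return true;
        }
        // Append "/" so that a watch on "/a" matches "/a/b" but not "/ab"
        String prefix = watchPath.endsWith("/") ? watchPath : watchPath + "/";
        return eventPath.startsWith(prefix);
    }
}
```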





[GitHub] zookeeper issue #136: [ZOOKEEPER-1416] Persistent Recursive Watch

2017-08-17 Thread Randgalt
Github user Randgalt commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Another goal is feature parity with other consensus tools such as 
etcd/consul. I added TTL nodes with this (and other) goals earlier in the year 
(or was it last year?). Watches in consul are persistent and optionally 
recursive.




[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131426#comment-16131426
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
---

Github user skamille commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
We have to remember that people who don't use TreeCache will still use this 
feature. Not to say that we shouldn't keep it in mind as an important user, but 
presumably people who don't actually do anything with curator will decide to 
use this feature. Does the design make sense absent that consideration? 
Specifically, if you weren't thinking of this as a feature for TreeCache, would 
we implement it to automatically watch children changes as well, or would it be 
broken up into two modes: persistent no children, persistent children.



> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connects, it has to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently, a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> or changes its client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called; the Recursive Watch then 
> automatically re-applies the watch on the znode. This maintains the existing 
> Watch semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically, this means the Recursive Watch Watcher callback is the one 
> receiving the event and the event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read 
> guarantee is required. If the server can send watch events regardless of 
> whether one has already been fired without a corresponding read, then the 
> server can simply fire watch events without tracking.
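
The per-node re-arm semantics proposed above (persistent, recursive, at most one event per znode until that znode is read) can be sketched in a few lines. This is a hypothetical illustration of the rule, not ZooKeeper's implementation; the class and method names are invented:

```python
# Sketch of the proposed semantics: a recursive watch covers a subtree,
# fires at most once per znode until that znode is read, then re-arms
# automatically -- no client-side watch re-registration needed.
class RecursiveWatch:
    def __init__(self, root, callback):
        self.root = root.rstrip("/") or "/"
        self.callback = callback
        self.suppressed = set()  # znodes with a fired-but-not-yet-read event

    def covers(self, path):
        return path == self.root or path.startswith(self.root + "/")

    def on_change(self, path):
        # No intermediate events: stay silent until the client reads the node.
        if self.covers(path) and path not in self.suppressed:
            self.suppressed.add(path)
            self.callback(path)

    def on_read(self, path):
        # A getData(..) on the znode re-arms the watch for that node.
        self.suppressed.discard(path)

w = RecursiveWatch("/shards", print)
w.on_change("/shards/db1")  # fires
w.on_change("/shards/db1")  # suppressed: no read has happened yet
w.on_read("/shards/db1")
w.on_change("/shards/db1")  # fires again: persistent, auto re-armed
```

Note how this preserves the existing single-znode guarantee (no second event before a read) while removing both the per-sibling registration traffic and the post-fire reset.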







[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130996#comment-16130996
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2872:
---

Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
I am unable to reproduce the test failure in Zab1_0Test


> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members of good standing with the ensemble and 
> continuing to serve client traffic when the following chain of events occurs.
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disk and after 
> some txn's have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid the 
> case of missing snapshots causing data loss.
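
The fsync fix suggested above can be sketched as follows. This is a minimal illustration of the durability pattern, not ZooKeeper's actual snapshot code; the function name and on-disk layout are invented for the example:

```python
# Sketch of the proposed fix: force a leader-supplied snapshot to disk
# before acting on it, so a power loss cannot leave the learner with
# newer txn logs but no durable snapshot (the scenario in step 3 above).
import os

def write_snapshot_durably(path, data):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # flush snapshot bytes to stable storage
    os.replace(tmp, path)         # atomic rename into the final name
    # fsync the containing directory so the rename itself survives power loss
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Writing to a temp file and renaming ensures a crash mid-write never leaves a truncated snapshot under the final name; the directory fsync makes the rename durable on POSIX filesystems.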







ZooKeeper_branch35_jdk7 - Build # 1079 - Still Failing

2017-08-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/1079/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 66.99 MB...]
[junit] 2017-08-17 08:50:04,407 [myid:] - WARN  [New I/O boss 
#5236:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x2d9f4d89] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:27506
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:27506
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-08-17 08:50:04,407 [myid:] - INFO  [New I/O boss 
#5236:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-08-17 08:50:04,408 [myid:127.0.0.1:27506] - INFO  
[main-SendThread(127.0.0.1:27506):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x20568d0c015 is lost, closing socket connection and attempting 
reconnect
[junit] 2017-08-17 08:50:04,421 [myid:127.0.0.1:27444] - INFO  
[main-SendThread(127.0.0.1:27444):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:27444. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-17 08:50:04,422 [myid:] - INFO  [New I/O boss 
#3723:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:27444
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-08-17 08:50:04,473 [myid:] - WARN  [New I/O boss 
#3723:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x1544a03b] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:27444