[jira] [Updated] (ZOOKEEPER-2995) ant docs fails when Java 1.9 is present on my system

2018-03-07 Thread Abraham Fine (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abraham Fine updated ZOOKEEPER-2995:

Description: 
When attempting to compile the documentation (with JAVA_HOME set to 1.7) I see 
output like this:
{code}
$ ant clean docs -Dforrest.home=$(brew info apache-forrest | grep /Cellar | awk '{print $1;}') -d
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Trying the default build file: build.xml
Buildfile: REDACTED/zookeeper/build.xml
Adding reference: ant.PropertyHelper
Detected Java version: 1.7 in: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre

OTHER STUFF

docs:
Class org.apache.tools.ant.taskdefs.condition.Os loaded from parent loader 
(parentFirst)
Condition false; setting forrest.exec to forrest
Setting project property: forrest.exec -> forrest
 [exec] Current OS is Mac OS X
 [exec] Executing '/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
 [exec] The ' characters around the executable and arguments are
 [exec] not part of the command.
Execute:Java13CommandLauncher: Executing 
'/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
The ' characters around the executable and arguments are
not part of the command.
 [exec] Apache Forrest.  Run 'forrest -projecthelp' to list options
 [exec]
 [exec] Buildfile: 
/usr/local/Cellar/apache-forrest/0.9/libexec/main/forrest.build.xml
 [exec]
 [exec] check-java-version:
 [exec] This is apache-forrest-0.9
 [exec] Using Java 1.6 from 
/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home

MORE STUFF

 [exec]
 [exec] BUILD FAILED
 [exec] 
/usr/local/Cellar/apache-forrest/0.9/libexec/main/targets/site.xml:180: 
Warning: Could not find file 
REDACTED/zookeeper/src/docs/build/tmp/brokenlinks.xml to copy.
 [exec]
 [exec] Total time: 3 seconds
 [exec] 
-Djava.endorsed.dirs=/usr/local/Cellar/apache-forrest/0.9/libexec/lib/endorsed:${java.endorsed.dirs}
 is not supported. Endorsed standards and standalone APIs
 [exec] Error: Could not create the Java Virtual Machine.
 [exec] in modular form will be supported via the concept of upgradeable 
modules.
 [exec] Error: A fatal exception has occurred. Program will exit.
 [exec]
 [exec]   Copying broken links file to site root.
 [exec]

BUILD FAILED
REDACTED/zookeeper/build.xml:501: exec returned: 1
at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:644)
at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:670)
at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:496)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
at org.apache.tools.ant.Project.executeTarget(Project.java:1376)
at 
org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
at org.apache.tools.ant.Main.runBuild(Main.java:854)
at org.apache.tools.ant.Main.startAnt(Main.java:236)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:285)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:112)
{code}

The build succeeds when I uninstall Java 9.


  was:
When attempting to compile the documentation (with JAVA_HOME set to 1.7) I see 
output like this:
{code}
$ ant docs -Dforrest.home=$(brew info apache-forrest | grep /Cellar | awk '{print $1;}') -d
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Trying the default build file: build.xml
Buildfile: REDACTED/zookeeper/build.xml
Adding reference: ant.PropertyHelper
Detected Java version: 1.7 in: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre

OTHER STUFF

docs:
Class org.apache.tools.ant.taskdefs.condition.Os loaded from parent loader 
(parentFirst)
Condition false; setting forrest.exec to forrest
Setting project property: forrest.exec -> forrest
 [exec] Current OS is Mac OS X
 [exec] Executing '/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
 [exec] The ' characters around the executable and arguments are
 [exec] not part of the command.
Execute:Java13CommandLauncher: Executing 
'/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
The ' characters around the executable and arguments are
not part of the 

[jira] [Created] (ZOOKEEPER-2995) ant docs fails when Java 1.9 is present on my system

2018-03-07 Thread Abraham Fine (JIRA)
Abraham Fine created ZOOKEEPER-2995:
---

 Summary: ant docs fails when Java 1.9 is present on my system
 Key: ZOOKEEPER-2995
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2995
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.11, 3.5.3, 3.6.0
Reporter: Abraham Fine
Assignee: Abraham Fine


When attempting to compile the documentation (with JAVA_HOME set to 1.7) I see 
output like this:
{code}
$ ant docs -Dforrest.home=$(brew info apache-forrest | grep /Cellar | awk '{print $1;}') -d
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Trying the default build file: build.xml
Buildfile: REDACTED/zookeeper/build.xml
Adding reference: ant.PropertyHelper
Detected Java version: 1.7 in: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/jre

OTHER STUFF

docs:
Class org.apache.tools.ant.taskdefs.condition.Os loaded from parent loader 
(parentFirst)
Condition false; setting forrest.exec to forrest
Setting project property: forrest.exec -> forrest
 [exec] Current OS is Mac OS X
 [exec] Executing '/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
 [exec] The ' characters around the executable and arguments are
 [exec] not part of the command.
Execute:Java13CommandLauncher: Executing 
'/usr/local/Cellar/apache-forrest/0.9/bin/forrest'
The ' characters around the executable and arguments are
not part of the command.
 [exec] Apache Forrest.  Run 'forrest -projecthelp' to list options
 [exec]
 [exec] Buildfile: 
/usr/local/Cellar/apache-forrest/0.9/libexec/main/forrest.build.xml
 [exec]
 [exec] check-java-version:
 [exec] This is apache-forrest-0.9
 [exec] Using Java 1.6 from 
/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home

MORE STUFF

 [exec]
 [exec] BUILD FAILED
 [exec] 
/usr/local/Cellar/apache-forrest/0.9/libexec/main/targets/site.xml:180: 
Warning: Could not find file 
REDACTED/zookeeper/src/docs/build/tmp/brokenlinks.xml to copy.
 [exec]
 [exec] Total time: 3 seconds
 [exec] 
-Djava.endorsed.dirs=/usr/local/Cellar/apache-forrest/0.9/libexec/lib/endorsed:${java.endorsed.dirs}
 is not supported. Endorsed standards and standalone APIs
 [exec] Error: Could not create the Java Virtual Machine.
 [exec] in modular form will be supported via the concept of upgradeable 
modules.
 [exec] Error: A fatal exception has occurred. Program will exit.
 [exec]
 [exec]   Copying broken links file to site root.
 [exec]

BUILD FAILED
REDACTED/zookeeper/build.xml:501: exec returned: 1
at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:644)
at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:670)
at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:496)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1405)
at org.apache.tools.ant.Project.executeTarget(Project.java:1376)
at 
org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1260)
at org.apache.tools.ant.Main.runBuild(Main.java:854)
at org.apache.tools.ant.Main.startAnt(Main.java:236)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:285)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:112)
{code}

The build succeeds when I uninstall Java 9.
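
The log above suggests that Forrest picked up the Java 9 JVM and passed it -Djava.endorsed.dirs, an option that was removed in Java 9. A minimal sketch of a pre-flight check follows. The helper names java_major and supports_endorsed_dirs are hypothetical, and the sketch assumes version strings shaped like the ones in the log (1.7.0_80, 9.0.1):

```python
def java_major(version: str) -> int:
    """Return the major Java release for strings like '1.7.0_80' or '9.0.1'."""
    parts = version.split(".")
    first = int(parts[0])
    # Releases before Java 9 report themselves as 1.x (1.7 means Java 7).
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def supports_endorsed_dirs(version: str) -> bool:
    """The endorsed-standards override mechanism was removed in Java 9."""
    return java_major(version) < 9


for v in ("1.7.0_80", "9.0.1"):
    print(v, supports_endorsed_dirs(v))
```

A check like this could run before invoking Forrest, so the build fails with a clear message instead of a JVM start-up error.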




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2995) ant docs fails when Java 1.9 is present on my system

2018-03-07 Thread Abraham Fine (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abraham Fine reassigned ZOOKEEPER-2995:
---

Assignee: (was: Abraham Fine)

> ant docs fails when Java 1.9 is present on my system
> 
>
> Key: ZOOKEEPER-2995
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2995
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.3, 3.4.11, 3.6.0
>Reporter: Abraham Fine
>Priority: Major





ZooKeeper_branch34 - Build # 2267 - Failure

2018-03-07 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34/2267/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 125.28 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
24.373 sec
[junit] Running org.apache.zookeeper.test.RepeatStartupTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.175 sec
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.37 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.697 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.603 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.643 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.528 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.645 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.577 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.079 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.668 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
33.677 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.871 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.808 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.228 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.592 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.692 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.311 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.09 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.371 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
29.693 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.089 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.742 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34/build.xml:1474: The 
following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34/build.xml:1382: The 
following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34/build.xml:1385: Tests 
failed!

Total time: 41 minutes 13 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalFollowerRunWithDiff

Error Message:
expected:<4294967298> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<4294967298> but was:<0>
at 
org.apache.zookeeper.server.quorum.Zab1_0Test$5.converseWithFollower(Zab1_0Test.java:861)
at 
org.apache.zookeeper.server.quorum.Zab1_0Test.testFollowerConversation(Zab1_0Test.java:507)
at 
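
The expected value in the assertion above, 4294967298, is easier to read when split the way ZooKeeper composes a zxid: the epoch in the high 32 bits and a counter in the low 32 bits. The report does not say that the asserted value is a zxid, so that is an assumption here, and the helper name split_zxid is illustrative only:

```python
def split_zxid(zxid: int) -> tuple:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


print(split_zxid(4294967298))  # (1, 2)
print(split_zxid(0))           # (0, 0)
```

Under that assumption, the test expected epoch 1, counter 2, and observed a value that was never set.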

[jira] [Updated] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors

2018-03-07 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-2994:

Description: 
In the event that the zookeeper transaction log or snapshot becomes corrupted 
and fails CRC checks (preventing startup) we should have a mechanism to get the 
cluster running again.

Previously we achieved this by loading the broken transaction log with a 
modified version of ZK with disabled CRC check and forced it to snapshot.

It'd be very handy to have a tool which can do this for us. LogFormatter and 
SnapshotFormatter have already been designed to dump log and snapshot files; 
it'd be nice to extend their functionality and add ability for such recovery.

  was:
In the event that the zookeeper transaction log or snapshot becomes corrupted 
and fails CRC checks (preventing startup) we should have a mechanism to get the 
cluster running again.

Previously with achieved this by loading the broken transaction log with a 
modified version of ZK with disabled CRC check and forced it to snapshot.

It'd be very handy to have a tool which can do this for us. LogFormatter and 
SnapshotFormatter have already been designed to dump log and snapshot files; 
it'd be nice to extend their functionality and add ability for such recovery.


> Tool required to recover log and snapshot entries with CRC errors
> -
>
> Key: ZOOKEEPER-2994
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0





Failed: ZOOKEEPER-2770 PreCommit Build #3660

2018-03-07 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3660/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1.26 KB...]
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision 99c9bbb0ab1eef469e1662086532c58078b9909a 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99c9bbb0ab1eef469e1662086532c58078b9909a
Commit message: "ZOOKEEPER-2992: The eclipse build target fails due to protocol 
redirection: http->https"
 > git rev-list --no-walk 99c9bbb0ab1eef469e1662086532c58078b9909a # timeout=10
No emails were triggered.
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/jenkins5209685587972065547.sh
/home/jenkins/tools/java/latest1.7/bin/java
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386417
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 10240
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/tools/ant/launch/Launcher : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2992
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.
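
The UnsupportedClassVersionError in the console output above reports class-file major version 52, which corresponds to Java 8, while the job runs java 1.7.0_80. A short sketch of that mapping follows. The function names are illustrative, and the arithmetic relies on the class-file convention that major version 45 is JDK 1.1 and each later release adds one:

```python
def java_release_for_class_major(major: int) -> int:
    """Map a class-file major version to the Java release that introduced it."""
    # 51 is Java 7, 52 is Java 8, 53 is Java 9.
    return major - 44


def can_load(class_major: int, runtime_release: int) -> bool:
    """A JVM can load class files compiled for its own release or older."""
    return java_release_for_class_major(class_major) <= runtime_release


print(java_release_for_class_major(52))  # 8
print(can_load(52, 7))                   # False
```

This suggests the Ant installation on the build host was compiled for Java 8 and cannot be launched by the Java 7 runtime selected for the job.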

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389945#comment-16389945
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user karanmehta93 commented on the issue:

https://github.com/apache/zookeeper/pull/307
  
Hello everyone,
Appreciate your efforts in reviewing this patch. @hanm @tdunning @eribeiro 
@skamille 
Is there any possibility that the patch will get merged in (with minor 
changes if required) or shall we 'never' this JIRA and close this PR?
Thanks!


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>Priority: Major
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant, it is trivial to come to an understanding of the cause. 
> However, when diagnosing intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
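
The rule described above (log when current time minus arrival time exceeds a configured threshold) can be sketched as follows. This is only an illustration under assumed names: maybe_log_slow, its parameters, and the message format are hypothetical and are not taken from the patch under review:

```python
def maybe_log_slow(client, request, arrival_ms, now_ms, threshold_ms, log=print):
    """Log client and request details if the request waited longer than the threshold."""
    elapsed_ms = now_ms - arrival_ms
    if elapsed_ms > threshold_ms:
        log(f"slow request: client={client} request={request} elapsed_ms={elapsed_ms}")
        return True
    return False


# A request that arrived at t=1000 ms and is processed at t=1350 ms,
# checked against a 300 ms threshold.
maybe_log_slow("10.0.0.2:55890", "setData /a", 1000, 1350, 300)
```

Passing the clock values in explicitly keeps the check easy to test; a server would supply its own timestamps.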





[GitHub] zookeeper issue #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2018-03-07 Thread karanmehta93
Github user karanmehta93 commented on the issue:

https://github.com/apache/zookeeper/pull/307
  
Hello everyone,
Appreciate your efforts in reviewing this patch. @hanm @tdunning @eribeiro 
@skamille 
Is there any possibility that the patch will get merged in (with minor 
changes if required) or shall we 'never' this JIRA and close this PR?
Thanks!


---


[jira] [Updated] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors

2018-03-07 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-2994:

Fix Version/s: 3.5.4

> Tool required to recover log and snapshot entries with CRC errors
> -
>
> Key: ZOOKEEPER-2994
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0





[jira] [Updated] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors

2018-03-07 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-2994:

Fix Version/s: 3.6.0

> Tool required to recover log and snapshot entries with CRC errors
> -
>
> Key: ZOOKEEPER-2994
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0





[jira] [Created] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors

2018-03-07 Thread Andor Molnar (JIRA)
Andor Molnar created ZOOKEEPER-2994:
---

 Summary: Tool required to recover log and snapshot entries with 
CRC errors
 Key: ZOOKEEPER-2994
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Andor Molnar
Assignee: Andor Molnar


In the event that the zookeeper transaction log or snapshot becomes corrupted 
and fails CRC checks (preventing startup) we should have a mechanism to get the 
cluster running again.

Previously we achieved this by loading the broken transaction log with a 
modified version of ZK with disabled CRC check and forced it to snapshot.

It'd be very handy to have a tool which can do this for us. LogFormatter and 
SnapshotFormatter have already been designed to dump log and snapshot files; 
it'd be nice to extend their functionality and add ability for such recovery.
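
To illustrate the kind of recovery pass described above, here is a minimal sketch that scans framed records, verifies a checksum on each, and keeps only the ones that pass. The record framing (8-byte checksum, 4-byte length, payload) and the use of Adler-32 are assumptions made for the example; this is not the actual ZooKeeper log format or the LogFormatter API:

```python
import struct
import zlib

# Hypothetical framing: 8-byte checksum followed by 4-byte payload length.
HEADER = struct.Struct(">qi")


def pack_record(payload: bytes) -> bytes:
    """Frame a payload with its Adler-32 checksum and length."""
    return HEADER.pack(zlib.adler32(payload), len(payload)) + payload


def scan(data: bytes):
    """Yield (payload, checksum_ok) for each framed record in data."""
    offset = 0
    while offset + HEADER.size <= len(data):
        checksum, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if length < 0 or offset + length > len(data):
            break  # truncated or nonsensical length: stop scanning
        payload = data[offset:offset + length]
        offset += length
        yield payload, zlib.adler32(payload) == checksum


good = pack_record(b"create /a")
bad = bytearray(pack_record(b"create /b"))
bad[-1] ^= 0xFF  # corrupt the last payload byte
log = good + bytes(bad) + pack_record(b"create /c")

recovered = [payload for payload, ok in scan(log) if ok]
print(recovered)  # [b'create /a', b'create /c']
```

Whether a real tool should skip a corrupt entry, as here, or stop at the first one is a design decision this sketch does not settle.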





[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389636#comment-16389636
 ] 

Andor Molnar commented on ZOOKEEPER-2172:
-

Yes, please subscribe to 'users' list.

> Cluster crashes when reconfig a new node as a participant
> -
>
> Key: ZOOKEEPER-2172
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.0
> Environment: Ubuntu 12.04 + java 7
>Reporter: Ziyou Wang
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2172-02.patch, ZOOKEEPER-2172-03.patch, 
> ZOOKEEPER-2172-04.patch, ZOOKEEPER-2172-06.patch, ZOOKEEPER-2172-07.patch, 
> ZOOKEEPER-2172.patch, ZOOKEPER-2172-05.patch, history.txt, node-1.log, 
> node-2.log, node-3.log, zoo-1.log, zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, 
> zoo-2.log, zoo-2212-1.log, zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, 
> zoo-3-2.log, zoo-3-3.log, zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, 
> zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, 
> zookeeper-1.out, zookeeper-2.log, zookeeper-2.out, zookeeper-3.log, 
> zookeeper-3.out
>
>
> The operations are quite simple: start three zk servers one by one, then 
> reconfig the cluster to add the new one as a participant. When I add the  
> third one, the zk cluster may enter a weird state and cannot recover.
>  
>   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
> cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
> So the first node received the reconfig cmd at 12:53:48. Later, it logged 
> “2015-04-20  12:53:52,230 [myid:1] - ERROR 
> [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
> causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
> - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
>  /10.0.0.2:55890 ”. From then on, the first node and second node 
> rejected all client connections and the third node didn’t join the cluster as 
> a participant. The whole cluster was done.
>  
>  When the problem happened, all three nodes just used the same dynamic 
> config file zoo.cfg.dynamic.1005d which only contained the first two 
> nodes. But there was another unused dynamic config file in node-1 directory 
> zoo.cfg.dynamic.next  which already contained three nodes.
>  
>  When I extended the waiting time between starting the third node and 
> reconfiguring the cluster, the problem didn’t show again. So it should be a 
> race condition problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Yuval Dori (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389619#comment-16389619
 ] 

Yuval Dori commented on ZOOKEEPER-2172:
---

So, in order to investigate this further, do I need to subscribe here:

https://zookeeper.apache.org/lists.html ?

Thanks,

Yuval



[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389608#comment-16389608
 ] 

Andor Molnar commented on ZOOKEEPER-2172:
-

Could be.



[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Yuval Dori (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389604#comment-16389604
 ] 

Yuval Dori commented on ZOOKEEPER-2172:
---

Thanks Andor.

Do you think it's another ZK bug or something else?

 

 



[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389444#comment-16389444
 ] 

Andor Molnar commented on ZOOKEEPER-2172:
-

I believe these are two totally different errors.

I'm pretty sure they're not related, because this Jira is about a feature that 
is only available from version 3.5 onward, as mentioned.

Would you please kindly move this discussion to the ZooKeeper 'user' mailing list 
and provide some more information (ensemble topology, config files, log files, 
step-by-step scenario, etc.)?



Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-03-07 Thread Andor Molnar
Okay, I dropped a mail on the user list to get some feedback.


Regards,
Andor


On Thu, Feb 22, 2018 at 5:59 PM, Patrick Hunt  wrote:

> Perhaps discuss on the user list, as Flavio mentioned, prior to calling a
> vote? Has anyone looked at dependencies? Is this consistent with what the
> rest of the ecosystem has defined (Hadoop/HBase/Kafka/... components,
> Curator, etc.)?
>
> Regards,
>
> Patrick
>
> On Thu, Feb 22, 2018 at 7:52 AM, Andor Molnar  wrote:
>
> > Is everybody happy with the plan that Tamaas suggested?
> > Shall we start a vote?
> >
> > Andor
> >
> >
> >
> > On Wed, Feb 21, 2018 at 11:34 PM, Mark Fenes wrote:
> >
> > > Hi All,
> > >
> > > I totally support the idea of upgrading to Java 8 and I agree with Abe that
> > > we should not require different minimum versions of Java for the client and
> > > the server.
> > > Also skipping the non-LTS versions sounds reasonable.
> > >
> > > Regards,
> > > Mark
> > >
> > >
> > > On Tue, Feb 20, 2018 at 8:49 PM, Tamás Pénzes wrote:
> > >
> > > > Hi All,
> > > >
> > > > Just to add my 2 cents. // Might be five, I write long. :)
> > > > Hope you find valuable bits.
> > > >
> > > > Like many of us, I also hope that ZooKeeper 3.5 will be released soon.
> > > > Until then most of the changes go into master and branch-3.5 too, so I
> > > > would keep them on the same Java version for code compatibility. At the
> > > > same time I'd be happy if it was Java 8.
> > > >
> > > > ZK 3.5+ supports Java 7 since December 2014, an almost 7 year old Java
> > > > version today.
> > > > It was a perfect decision in 2014, when nobody expected ZK 3.5 coming so
> > > > late, but things might be different four years later.
> > > >
> > > > Since we have to keep compatibility with Java 6 on branch-3.4 we already
> > > > need manual changes when cherry picking into that branch. Not much
> > > > difference if branch-3.5 is Java 8.
> > > >
> > > >
> > > > As Flavio said, changing branch-3.5 to Java 8 might cause issues for users
> > > > already using ZK 3.5.x-beta.
> > > > I totally agree with that concern, but using beta-state software means
> > > > you accept the risk of facing changes.
> > > > And Java 8 is four years old now, so we would not change to bleeding edge,
> > > > which I guess nobody wanted.
> > > >
> > > >
> > > > So what I would propose is the following:
> > > >
> > > > - Upgrade branches "master" and "branch-3.5" to Java 8 (LTS) asap.
> > > > - After releasing 3.5 GA and the next LTS Java version (Java 11 /
> > > >   18.9-LTS) gets released upgrade "master" branch to Java 11-LTS.
> > > >   (http://www.oracle.com/technetwork/java/eol-135779.html)
> > > > - I would not upgrade Java to a non-LTS version.
> > > >
> > > >
> > > > What do you think about it?
> > > >
> > > > Thanks, Tamaas
> > > >
> > > >
> > > > On Mon, Feb 19, 2018 at 10:32 PM, Flavio Junqueira wrote:
> > > >
> > > > > I'm fine with moving to Java 8 or even 9 in 3.6. Does anyone have a
> > > > > different option? Otherwise, should we start a vote?
> > > > >
> > > > > -Flavio
> > > > >
> > > > >
> > > > > > > On 16 Feb 2018, at 21:28, Abraham Fine wrote:
> > > > > > >
> > > > > > > I'm a -1 on requiring different minimum versions of java for the client
> > > > > > > and the server.  I think this has the potential to create a lot of
> > > > > > > confusion for users and contributors.
> > > > > > >
> > > > > > > I would support moving master (3.6) to java 8, I also think it is worth
> > > > > > > considering moving to java 9. Given how long our release cycle tends to be
> > > > > > > I think targeting the latest and greatest this early in the development
> > > > > > > cycle is reasonable.
> > > > > >
> > > > > > Thanks,
> > > > > > Abe
> > > > > >
> > > > > > On Fri, Feb 16, 2018, at 06:48, Enrico Olivelli wrote:
> > > > > >> 2018-02-16 14:20 GMT+01:00 Andor Molnar :
> > > > > >>
> > > > > > >>> +1 for setting the Java8 requirement on server side.
> > > > > > >>>
> > > > > > >>> *Client side.*
> > > > > > >>> I like the idea of setting the requirement on client side too without
> > > > > > >>> introducing anything Java8 specific. I'm not planning to use Java8 features
> > > > > > >>> right away, just thinking that opening the gates would be useful in the long
> > > > > > >>> run.
> > > > > > >>>
> > > > > > >>> Additionally, I don't see heavy development on the client side. Users who
> > > > > > >>> are tightly coupled to Java7 are still able to use existing clients until
> > > > > > >>> we introduce something breaking which they're forced to upgrade to for
> > > > > > >>> whatever reason. I'm not sure what the odds of that happening are.
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >> My two cents
> > > > > >> Actually 

[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Yuval Dori (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389390#comment-16389390
 ] 

Yuval Dori commented on ZOOKEEPER-2172:
---

1. The use case in this issue was adding a node to a ZK cluster using 
zookeeper-3.5.jar. That is not the same use case as our customers'. The first 
customer uses 5 machines with 3 ZK instances. They shut down 2 machines (one of 
them running ZK, so 2 ZK instances were left) and got 
"java.lang.IllegalStateException: instance must be started before calling this 
method". The second customer got this error when deploying the application.

This is the stack trace from this issue: 

2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:493)
2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890

And this is our customers' stack trace: 

2018-02-15T09:58:12.094+0100; ERROR; WSOSTSLXWIT01/MANAGER; P3424/T194; [SPACE/LearnerHandler-/10.17.46.142:49336/LearnerHandler]; Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542)

As you can see, the line numbers for BinaryInputArchive.java and 
LearnerHandler.java are different, but I assumed that is due to the different 
versions (3.4.8 vs 3.5). 
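
For comparison, the two exception types in the traces above can be reproduced outside ZooKeeper. The following is only a sketch with plain JDK sockets, not ZooKeeper code: the class name `ReadFailureDemo`, the method `failureWhenPeer`, and the 200 ms read timeout are made up for illustration. It suggests that an `EOFException` from `readInt` means the peer closed the connection, while a `SocketTimeoutException` means the connection stayed open but no data arrived within the read timeout.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadFailureDemo {

    /**
     * Accepts one loopback connection and calls readInt() on it, the same JDK
     * call that appears in both stack traces. Returns the name of the
     * exception that was observed.
     */
    static String failureWhenPeer(boolean closesImmediately) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket peer = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(200); // hypothetical read timeout in milliseconds
            if (closesImmediately) {
                peer.close(); // the peer goes away without sending anything
            }
            DataInputStream in = new DataInputStream(accepted.getInputStream());
            try {
                in.readInt();
                return "none";
            } catch (EOFException e) {
                return "EOFException"; // stream ended: the other side closed
            } catch (SocketTimeoutException e) {
                return "SocketTimeoutException"; // still connected, but no data in time
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peer closed: " + failureWhenPeer(true));
        System.out.println("peer silent: " + failureWhenPeer(false));
    }
}
```

If that reading is right, the trace in this issue would point to the learner dropping the connection, whereas our customers' trace would point to a learner that stayed connected but stopped sending within the timeout.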

 

The first customer tested it with ZK 3.5.3 and it did not reproduce!

What is this new feature that was added in 3.5?

I'd be happy to hear whether you think it's related or not.

Thanks,

Yuval


[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389384#comment-16389384
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2930:
---

Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r172802731
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -318,76 +318,167 @@ public Thread newThread(Runnable r) {
  */
 public void testInitiateConnection(long sid) throws Exception {
 LOG.debug("Opening channel to server " + sid);
-Socket sock = new Socket();
-setSockOpts(sock);
-sock.connect(self.getVotingView().get(sid).electionAddr, cnxTO);
-initiateConnection(sock, sid);
+initiateConnection(sid, self.getVotingView().get(sid).electionAddr);
+}
+
+private Socket openChannel(long sid, InetSocketAddress electionAddr) {
+LOG.debug("Opening channel to server " + sid);
+try {
+final Socket sock = new Socket();
+setSockOpts(sock);
+sock.connect(electionAddr, cnxTO);
+LOG.debug("Connected to server " + sid);
+return sock;
+} catch (UnresolvedAddressException e) {
+// Sun doesn't include the address that causes this
+// exception to be thrown, also UAE cannot be wrapped cleanly
+// so we log the exception in order to capture this critical
+// detail.
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr, e);
+throw e;
+} catch (IOException e) {
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr,
+e);
+return null;
+}
 }
 
 /**
  * If this server has initiated the connection, then it gives up on the
  * connection if it loses challenge. Otherwise, it keeps the connection.
  */
-public void initiateConnection(final Socket sock, final Long sid) {
+public boolean initiateConnection(final Long sid, InetSocketAddress electionAddr) {
 try {
-startConnection(sock, sid);
-} catch (IOException e) {
-LOG.error("Exception while connecting, id: {}, addr: {}, closing learner connection",
-new Object[] { sid, sock.getRemoteSocketAddress() }, e);
-closeSocket(sock);
-return;
+Socket sock = openChannel(sid, electionAddr);
+if (sock != null) {
+try {
+startConnection(sock, sid);
+} catch (IOException e) {
+LOG.error("Exception while connecting, id: {}, addr: {}, closing learner connection",
+new Object[]{sid, sock.getRemoteSocketAddress()}, e);
+closeSocket(sock);
+}
+return true;
+} else {
+return false;
+}
+} finally {
+inprogressConnections.remove(sid);
 }
 }
 
-/**
- * Server will initiate the connection request to its peer server
- * asynchronously via separate connection thread.
- */
-public void initiateConnectionAsync(final Socket sock, final Long sid) {
+synchronized private void connectOneAsync(final Long sid, final ZooKeeperThread connectorThread) {
+if (senderWorkerMap.get(sid) != null) {
+LOG.debug("There is a connection already for server " + sid);
+return;
+}
 if(!inprogressConnections.add(sid)){
 // simply return as there is a connection request to
 // server 'sid' already in progress.
 LOG.debug("Connection request to server id: {} is already in progress, so skipping this request",
 sid);
-closeSocket(sock);
 return;
 }
 try {
-connectionExecutor.execute(
-new QuorumConnectionReqThread(sock, sid));
+connectionExecutor.execute(connectorThread);
 connectionThreadCnt.incrementAndGet();
 } catch (Throwable e) {
 // Imp: Safer side catching all type of exceptions and remove 'sid'
 // from inprogress connections. This is to avoid blocking further
 // connection requests from this 'sid' in case of errors.
 

[GitHub] zookeeper pull request #456: ZOOKEEPER-2930: Leader cannot be elected due to...

2018-03-07 Thread JonathanO
Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r172802731
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -318,76 +318,167 @@ public Thread newThread(Runnable r) {
  */
 public void testInitiateConnection(long sid) throws Exception {
 LOG.debug("Opening channel to server " + sid);
-Socket sock = new Socket();
-setSockOpts(sock);
-sock.connect(self.getVotingView().get(sid).electionAddr, cnxTO);
-initiateConnection(sock, sid);
+initiateConnection(sid, self.getVotingView().get(sid).electionAddr);
+}
+
+private Socket openChannel(long sid, InetSocketAddress electionAddr) {
+LOG.debug("Opening channel to server " + sid);
+try {
+final Socket sock = new Socket();
+setSockOpts(sock);
+sock.connect(electionAddr, cnxTO);
+LOG.debug("Connected to server " + sid);
+return sock;
+} catch (UnresolvedAddressException e) {
+// Sun doesn't include the address that causes this
+// exception to be thrown, also UAE cannot be wrapped cleanly
+// so we log the exception in order to capture this critical
+// detail.
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr, e);
+throw e;
+} catch (IOException e) {
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr,
+e);
+return null;
+}
 }
 
 /**
  * If this server has initiated the connection, then it gives up on the
  * connection if it loses challenge. Otherwise, it keeps the connection.
  */
-public void initiateConnection(final Socket sock, final Long sid) {
+public boolean initiateConnection(final Long sid, InetSocketAddress electionAddr) {
 try {
-startConnection(sock, sid);
-} catch (IOException e) {
-LOG.error("Exception while connecting, id: {}, addr: {}, closing learner connection",
-new Object[] { sid, sock.getRemoteSocketAddress() }, e);
-closeSocket(sock);
-return;
+Socket sock = openChannel(sid, electionAddr);
+if (sock != null) {
+try {
+startConnection(sock, sid);
+} catch (IOException e) {
+LOG.error("Exception while connecting, id: {}, addr: {}, closing learner connection",
+new Object[]{sid, sock.getRemoteSocketAddress()}, e);
+closeSocket(sock);
+}
+return true;
+} else {
+return false;
+}
+} finally {
+inprogressConnections.remove(sid);
 }
 }
 
-/**
- * Server will initiate the connection request to its peer server
- * asynchronously via separate connection thread.
- */
-public void initiateConnectionAsync(final Socket sock, final Long sid) {
+synchronized private void connectOneAsync(final Long sid, final ZooKeeperThread connectorThread) {
+if (senderWorkerMap.get(sid) != null) {
+LOG.debug("There is a connection already for server " + sid);
+return;
+}
 if(!inprogressConnections.add(sid)){
 // simply return as there is a connection request to
 // server 'sid' already in progress.
 LOG.debug("Connection request to server id: {} is already in progress, so skipping this request",
 sid);
-closeSocket(sock);
 return;
 }
 try {
-connectionExecutor.execute(
-new QuorumConnectionReqThread(sock, sid));
+connectionExecutor.execute(connectorThread);
 connectionThreadCnt.incrementAndGet();
 } catch (Throwable e) {
 // Imp: Safer side catching all type of exceptions and remove 'sid'
 // from inprogress connections. This is to avoid blocking further
 // connection requests from this 'sid' in case of errors.
 inprogressConnections.remove(sid);
 LOG.error("Exception while submitting quorum connection request", e);
-closeSocket(sock);
 }
 }
 
+/**
+ * Try to establish a connection to 
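
The guard visible in `connectOneAsync` above (add the sid to a set of in-progress connections, skip if it is already there, and remove it again when the attempt ends or cannot be submitted) can be looked at in isolation. What follows is a simplified sketch of that pattern only, under stated assumptions, and not the code from the pull request: `InProgressGuardDemo`, its fields, and the thread pool size are hypothetical, and it rethrows a failed submission instead of logging it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InProgressGuardDemo {
    // Server ids that currently have a connection attempt running (hypothetical name).
    private final Set<Long> inProgress = ConcurrentHashMap.newKeySet();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    final AtomicInteger attempts = new AtomicInteger();

    /** Returns true if a new attempt was scheduled, false if one was already running. */
    boolean connectOneAsync(final long sid, final Runnable connect) {
        if (!inProgress.add(sid)) {
            return false; // an attempt for this sid is already in flight, skip
        }
        try {
            executor.execute(() -> {
                try {
                    attempts.incrementAndGet();
                    connect.run();
                } finally {
                    inProgress.remove(sid); // always free the slot, success or failure
                }
            });
            return true;
        } catch (RuntimeException e) {
            inProgress.remove(sid); // submission failed, do not block later attempts
            throw e;
        }
    }

    void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        InProgressGuardDemo demo = new InProgressGuardDemo();
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowConnect = () -> {
            try {
                release.await(); // stands in for a slow connect() call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean first = demo.connectOneAsync(1L, slowConnect);
        boolean second = demo.connectOneAsync(1L, slowConnect); // skipped while the first runs
        release.countDown();
        demo.shutdown();
        System.out.println(first + " " + second + " attempts=" + demo.attempts.get());
    }
}
```

The point of releasing the slot in a `finally` block is that a failed or timed-out attempt does not leave the sid marked as in progress, which would otherwise block every later attempt to reach that server.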

[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389378#comment-16389378
 ] 

Andor Molnar commented on ZOOKEEPER-2172:
-

[~yuvald]

Are you sure it's the same issue?

Dynamic reconfig is a 3.5+ feature.

> Cluster crashes when reconfig a new node as a participant
> -
>
> Key: ZOOKEEPER-2172
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.0
> Environment: Ubuntu 12.04 + java 7
>Reporter: Ziyou Wang
>Assignee: Mohammad Arshad
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2172-02.patch, ZOOKEEPER-2172-03.patch, 
> ZOOKEEPER-2172-04.patch, ZOOKEEPER-2172-06.patch, ZOOKEEPER-2172-07.patch, 
> ZOOKEEPER-2172.patch, ZOOKEPER-2172-05.patch, history.txt, node-1.log, 
> node-2.log, node-3.log, zoo-1.log, zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, 
> zoo-2.log, zoo-2212-1.log, zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, 
> zoo-3-2.log, zoo-3-3.log, zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, 
> zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, 
> zookeeper-1.out, zookeeper-2.log, zookeeper-2.out, zookeeper-3.log, 
> zookeeper-3.out
>
>
> The operations are quite simple: start three zk servers one by one, then 
> reconfig the cluster to add the new one as a participant. When I add the  
> third one, the zk cluster may enter a weird state and cannot recover.
>  
>   I found “2015-04-20 12:53:48,236 [myid:1] - INFO  [ProcessThread(sid:1 
> cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in node-1 log. 
> So the first node received the reconfig cmd at 12:53:48. Latter, it logged 
> “2015-04-20  12:53:52,230 [myid:1] - ERROR 
> [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception 
> causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] 
> - WARN  [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE 
>  /10.0.0.2:55890 ”. From then on, the first node and second node 
> rejected all client connections and the third node didn’t join the cluster as 
> a participant. The whole cluster was done.
>  
>  When the problem happened, all three nodes just used the same dynamic 
> config file zoo.cfg.dynamic.1005d which only contained the first two 
> nodes. But there was another unused dynamic config file in node-1 directory 
> zoo.cfg.dynamic.next  which already contained three nodes.
>  
>  When I extended the waiting time between starting the third node and 
> reconfiguring the cluster, the problem didn’t show again. So it should be a 
> race condition problem.
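
The workaround in the last paragraph (waiting longer between starting the third node and reconfiguring) could be sketched as a readiness poll instead of a fixed sleep. This is only an illustration, not the fix in the attached patches. The helper names `four_letter_word` and `wait_until_serving` are hypothetical. The sketch relies on the `ruok` four-letter command, and an `imok` reply only shows that the server process is up and bound to its client port, not that it has joined the quorum. Whether that is a sufficient signal to avoid this race is an assumption.

```python
import socket
import time


def four_letter_word(host, port, cmd=b"ruok", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks)


def wait_until_serving(host, port, deadline_s=60.0, interval_s=1.0,
                       probe=four_letter_word):
    """Poll until the server answers 'imok' or the deadline passes.

    Returns True if the server answered in time, False otherwise.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            if probe(host, port) == b"imok":
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        time.sleep(interval_s)
    return False
```

A caller would run something like `wait_until_serving("10.0.0.3", 2181)` (address hypothetical) for the newly started node and issue the reconfig only if it returns True. Newer releases may require four-letter commands to be explicitly enabled, so the probe may need adjusting for a given deployment.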



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant

2018-03-07 Thread Yuval Dori (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389307#comment-16389307
 ] 

Yuval Dori commented on ZOOKEEPER-2172:
---

Hi,

This issue occurs for a few of our customers running version 3.4.8.
We are currently upgrading to 3.4.10.

As 3.5.3 is still in beta, is it possible to backport this fix to 3.4?

Thanks,
Yuval



