ZooKeeper-trunk - Build # 3736 - Still Failing

2018-02-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/3736/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 11.24 KB...]
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git clean -fdx" returned status code 1:
stdout: 
stderr: warning: failed to remove build/test/tmp/test7299294491915492585.junit.dir/data/version-2/snapshot.0

at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1996)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1964)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1960)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1597)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1609)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.clean(CliGitAPIImpl.java:787)
at hudson.plugins.git.GitAPI.clean(GitAPI.java:311)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:922)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:896)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
at hudson.remoting.UserRequest.perform(UserRequest.java:207)
at hudson.remoting.UserRequest.perform(UserRequest.java:53)
at hudson.remoting.Request$2.run(Request.java:358)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to H12
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:310)
at hudson.remoting.Channel.call(Channel.java:908)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:281)
at com.sun.proxy.$Proxy110.clean(Unknown Source)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl.clean(RemoteGitImpl.java:450)
at hudson.plugins.git.extensions.impl.CleanBeforeCheckout.decorateFetchCommand(CleanBeforeCheckout.java:30)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:858)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1129)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1160)
at hudson.scm.SCM.checkout(SCM.java:495)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
ERROR: Error fetching remote repo 'origin'
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Recording fingerprints
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Publishing Javadoc
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper-trunk-openjdk7 - Build # 1810 - Failure

2018-02-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1810/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.12 KB...]
[junit] Running org.apache.zookeeper.test.SaslClientTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.111 sec, Thread: 2, Class: org.apache.zookeeper.test.SaslClientTest
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 7
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 2
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.776 sec, Thread: 7, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.595 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 7
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 8
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.104 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.457 sec, Thread: 2, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.552 sec, Thread: 2, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.875 sec, Thread: 2, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 2
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.775 sec, Thread: 2, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec, Thread: 2, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.703 sec, Thread: 2, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.873 sec, Thread: 3, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.834 sec, Thread: 2, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.092 sec, Thread: 2, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.002 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 84.199 sec, Thread: 1, Class: org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.773 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 2
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec, Thread: 1, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in thread 1
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.759 sec, Thread: 8, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.41 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.871 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Test org.apache.zookeeper.test.WatchEventWhenAutoResetTest FAILED
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.582 sec, Thread: 1, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest

[jira] [Updated] (ZOOKEEPER-2986) My id not in the peer list

2018-02-22 Thread Mohammad Etemad (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Etemad updated ZOOKEEPER-2986:
---
Description: 
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
  
 The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
  
 In my cfg file I have:
  
{code:java}
standaloneEnabled=false 
tickTime=2000 
initLimit=10 
syncLimit=5 
#snapshot file dir 
dataDir=/data 
#tran log dir 
dataLogDir=/dataLog 
#zk log dir 
logDir=/logs 
4lw.commands.whitelist=* 
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
  
 and in my cfg.dynamic file I have:
   
{code:java}
server.0=zookeeper-0:2888:3888 
server.1=zookeeper-1:2888:3888 
server.2=zookeeper-2:2888:3888{code}
  
 Has there been any change on the clustering side of things that makes the new 
version not work?
 Sample logs:
{code:java}
2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
 java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79){code}

  was:
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
  
 The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
  
 In my cfg file I have:
  
  
{code:java}
standaloneEnabled=false 
tickTime=2000 
initLimit=10 
syncLimit=5 
#snapshot file dir 
dataDir=/data 
#tran log dir 
dataLogDir=/dataLog 
#zk log dir 
logDir=/logs 
4lw.commands.whitelist=* 
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
 
  
 and in my cfg.dynamic file I have:
  
  
{code:java}
server.0=zookeeper-0:2888:3888 
server.1=zookeeper-1:2888:3888 
server.2=zookeeper-2:2888:3888{code}
 
  
 Has there been any change on the clustering side of things that makes the new 
version not work?
 Sample logs:
{code:java}
2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
 java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79){code}


> My id not in the peer list
> --
>
> Key: ZOOKEEPER-2986
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2986
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.3
> Environment: Running in a docker container in kubernetes 1.5
> Reporter: Mohammad Etemad
> Priority: Major

[jira] [Updated] (ZOOKEEPER-2986) My id not in the peer list

2018-02-22 Thread Mohammad Etemad (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Etemad updated ZOOKEEPER-2986:
---
Affects Version/s: 3.5.3

> My id not in the peer list
> --
>
> Key: ZOOKEEPER-2986
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2986
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.3
> Environment: Running in a docker container in kubernetes 1.5
> Reporter: Mohammad Etemad
> Priority: Major
>
> Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue
> of "My id 1 not in the peer list". If I use the alpha version (3.5.2) and
> then upgrade to the 3.5.3 beta version, the problem goes away. But if I
> implement the 3.5.3 version directly, the clustering never happens and I get
> the error. To give you a bit more overview of the implementation:
>   
>  The pods use a persistent volume claim on a gluster volume. Each pod is 
> assigned its own volume on the gluster file system. I run zookeeper as a 
> stateful set with 3 pods. 
>   
>  In my cfg file I have:
>   
>   
> {code:java}
> standaloneEnabled=false 
> tickTime=2000 
> initLimit=10 
> syncLimit=5 
> #snapshot file dir 
> dataDir=/data 
> #tran log dir 
> dataLogDir=/dataLog 
> #zk log dir 
> logDir=/logs 
> 4lw.commands.whitelist=* 
> dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
>  
>   
>  and in my cfg.dynamic file I have:
>   
>   
> {code:java}
> server.0=zookeeper-0:2888:3888 
> server.1=zookeeper-1:2888:3888 
> server.2=zookeeper-2:2888:3888{code}
>  
>   
>  Has there been any change on the clustering side of things that makes the 
> new version not work?
>  Sample logs:
> {code:java}
> 2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - 
> Unexpected exception, exiting abnormally
>  java.lang.RuntimeException: My id 1 not in the peer list
>  at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
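
The membership check behind this error message can be illustrated with a minimal Java sketch. It is not ZooKeeper's own parsing code: the class `PeerListCheckSketch` and its methods are hypothetical names, and the only assumption taken from the report is the `server.N=host:port:port` line format of the dynamic file. The sketch collects the ids from such lines and tests whether a given id (the value a server would read from its myid file) is among them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class PeerListCheckSketch {

    // Collects N from every "server.N=..." line; comments and other keys are ignored.
    static Set<Long> serverIds(List<String> configLines) {
        Set<Long> ids = new TreeSet<>();
        for (String raw : configLines) {
            String line = raw.trim();
            if (line.startsWith("#") || !line.startsWith("server.")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line
            }
            ids.add(Long.parseLong(line.substring("server.".length(), eq).trim()));
        }
        return ids;
    }

    // True when myId appears as one of the server.N entries.
    static boolean containsId(List<String> configLines, long myId) {
        return serverIds(configLines).contains(myId);
    }

    public static void main(String[] args) {
        // The three lines quoted in the report.
        List<String> dynamic = Arrays.asList(
                "server.0=zookeeper-0:2888:3888",
                "server.1=zookeeper-1:2888:3888",
                "server.2=zookeeper-2:2888:3888");
        System.out.println(serverIds(dynamic));      // prints [0, 1, 2]
        System.out.println(containsId(dynamic, 1L)); // prints true
    }
}
```

For the three lines quoted in the report, id 1 is present, so the sketch does not explain the failure by itself; it only shows what "in the peer list" means for a well-formed dynamic file.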


[jira] [Updated] (ZOOKEEPER-2986) My id not in the peer list

2018-02-22 Thread Mohammad Etemad (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Etemad updated ZOOKEEPER-2986:
---
Description: 
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
  
 The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
  
In my cfg file I have:
 
 
{code:java}
standaloneEnabled=false 
tickTime=2000 
initLimit=10 
syncLimit=5 
#snapshot file dir 
dataDir=/data 
#tran log dir 
dataLogDir=/dataLog 
#zk log dir 
logDir=/logs 
4lw.commands.whitelist=* 
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
 
 
and in my cfg.dynamic file I have:
 
 
{code:java}
server.0=zookeeper-0:2888:3888 
server.1=zookeeper-1:2888:3888 
server.2=zookeeper-2:2888:3888{code}
 
 
Has there been any change on the clustering side of things that makes the new 
version not work?
 Sample logs:

2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
 java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)

  was:
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
 
The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
 
Has there been any change on the clustering side of things that makes the new 
version not work?
Sample logs:

2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)


> My id not in the peer list
> --
>
> Key: ZOOKEEPER-2986
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2986
> Project: ZooKeeper
> Issue Type: Bug
> Environment: Running in a docker container in kubernetes 1.5
> Reporter: Mohammad Etemad
> Priority: Major

[jira] [Updated] (ZOOKEEPER-2986) My id not in the peer list

2018-02-22 Thread Mohammad Etemad (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Etemad updated ZOOKEEPER-2986:
---
Description: 
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
  
 The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
  
 In my cfg file I have:
  
  
{code:java}
standaloneEnabled=false 
tickTime=2000 
initLimit=10 
syncLimit=5 
#snapshot file dir 
dataDir=/data 
#tran log dir 
dataLogDir=/dataLog 
#zk log dir 
logDir=/logs 
4lw.commands.whitelist=* 
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
 
  
 and in my cfg.dynamic file I have:
  
  
{code:java}
server.0=zookeeper-0:2888:3888 
server.1=zookeeper-1:2888:3888 
server.2=zookeeper-2:2888:3888{code}
 
  
 Has there been any change on the clustering side of things that makes the new 
version not work?
 Sample logs:
{code:java}
2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
 java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79){code}

  was:
Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
  
 The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
  
In my cfg file I have:
 
 
{code:java}
standaloneEnabled=false 
tickTime=2000 
initLimit=10 
syncLimit=5 
#snapshot file dir 
dataDir=/data 
#tran log dir 
dataLogDir=/dataLog 
#zk log dir 
logDir=/logs 
4lw.commands.whitelist=* 
dynamicConfigFile=/opt/zookeeper/conf/zoo_replicated1.cfg.dynamic{code}
 
 
and in my cfg.dynamic file I have:
 
 
{code:java}
server.0=zookeeper-0:2888:3888 
server.1=zookeeper-1:2888:3888 
server.2=zookeeper-2:2888:3888{code}
 
 
Has there been any change on the clustering side of things that makes the new 
version not work?
 Sample logs:

2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
 java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)


> My id not in the peer list
> --
>
> Key: ZOOKEEPER-2986
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2986
> Project: ZooKeeper
> Issue Type: Bug
> Environment: Running in a docker container in kubernetes 1.5
> Reporter: Mohammad Etemad
> Priority: Major

[jira] [Created] (ZOOKEEPER-2986) My id not in the peer list

2018-02-22 Thread Mohammad Etemad (JIRA)
Mohammad Etemad created ZOOKEEPER-2986:
--

 Summary: My id not in the peer list
 Key: ZOOKEEPER-2986
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2986
 Project: ZooKeeper
 Issue Type: Bug
 Environment: Running in a docker container in kubernetes 1.5
 Reporter: Mohammad Etemad


Zookeeper 3.5.3-beta is throwing the following error. I am facing the issue of
"My id 1 not in the peer list". If I use the alpha version (3.5.2) and then
upgrade to the 3.5.3 beta version, the problem goes away. But if I implement
the 3.5.3 version directly, the clustering never happens and I get the error.
To give you a bit more overview of the implementation:
 
The pods use a persistent volume claim on a gluster volume. Each pod is 
assigned its own volume on the gluster file system. I run zookeeper as a 
stateful set with 3 pods. 
 
Has there been any change on the clustering side of things that makes the new 
version not work?
Sample logs:

2018-02-22 19:21:18,078 [myid:1] - ERROR [main:QuorumPeerMain@98] - Unexpected 
exception, exiting abnormally
java.lang.RuntimeException: My id 1 not in the peer list
 at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:770)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:185)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
 at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)





Re: Name resolution in StaticHostProvider

2018-02-22 Thread Andor Molnar
Did anybody happen to take a quick look by any chance?

I don't want to push this too hard, because I know it's a time consuming
topic to think about, but this is a blocker in 3.5 which has been hanging
around for a while and any feedback would be extremely helpful to close it
quickly.

Thanks,
Andor



On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar wrote:

> Hi all,
>
> We need more eyes and brains on the following PR:
>
> https://github.com/apache/zookeeper/pull/451
>
> I added a comment a few days ago about the way we currently do DNS name
> resolution in this class and a suggestion on how we could simplify things a
> little bit. We talked about it with Abe Fine, but we're a little bit unsure
> and cannot reach a conclusion. It would be extremely handy to get more
> feedback from you.
>
> To add some colour to it, let me elaborate on the situation here:
>
> In general, StaticHostProvider takes a list of potentially unresolved
> InetSocketAddress objects, resolves them, and lets callers iterate over
> the resolved addresses by calling the next() method.
>
> *Option #1 (current logic)*
> - Resolve addresses with getAllByName(), which returns all IP addresses
> associated with the name.
> - Cache all these IPs, shuffle them and iterate over them.
> - If the client is unable to connect to an IP, remove from the list all IPs
> that the original server name resolved to and re-resolve it.
>
> *Option #2 (getByName())*
> - Resolve the address with getByName() instead, which returns only the
> first IP address of the name,
> - Do not cache IPs,
> - Shuffle the *names* and resolve with getByName() *every time*
> next() is called,
> - The JDK's built-in caching will prevent name servers from being flooded
> and will do the re-resolution automatically when the cache expires,
> - Names with multiple IPs will be handled by DNS servers which (if
> configured properly) return IPs in a different order each time (this is
> called DNS Round Robin), so getByName() will return a different IP on each
> call.
>
> *Option #3*
> - There's a small problem with option #2: if the DNS server is not
> configured properly and always returns the IP list in the same order,
> getByName() will never return the next IP,
> - In order to overcome that, use getAllByName() instead, shuffle the list
> and return the first IP.
>
> All feedback is much appreciated.
>
> Thanks,
> Andor
>
>
>
>
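
Option #3 above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the class `ShuffledResolverSketch`, its constructor and the `seed` parameter are hypothetical, and only `InetAddress.getAllByName()`, `InetSocketAddress` and `Collections.shuffle()` are real JDK APIs. The sketch keeps the unresolved names, shuffles them once, and on every `next()` call resolves the current name to all of its IPs, shuffles that list and returns the first entry, so it does not depend on the DNS server rotating its answers.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of "Option #3"; not the actual StaticHostProvider.
class ShuffledResolverSketch {
    private final List<InetSocketAddress> servers; // unresolved host:port pairs
    private final Random random;
    private int index = -1;

    ShuffledResolverSketch(List<InetSocketAddress> unresolved, long seed) {
        if (unresolved.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(unresolved);
        this.random = new Random(seed);
        Collections.shuffle(this.servers, this.random); // shuffle the names once
    }

    // Resolves the next name on every call; no IPs are cached in this class,
    // so repeated lookups rely on the JDK's own name-service cache.
    InetSocketAddress next() {
        index = (index + 1) % servers.size();
        InetSocketAddress server = servers.get(index);
        try {
            List<InetAddress> ips = new ArrayList<>(
                    Arrays.asList(InetAddress.getAllByName(server.getHostString())));
            Collections.shuffle(ips, random); // do not rely on DNS round robin
            return new InetSocketAddress(ips.get(0), server.getPort());
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An IP literal is used so the example runs without a DNS lookup.
        ShuffledResolverSketch resolver = new ShuffledResolverSketch(
                Arrays.asList(InetSocketAddress.createUnresolved("127.0.0.1", 2181)), 42L);
        System.out.println(resolver.next());
    }
}
```

A real implementation would also have to decide what to do when a name fails to resolve; the sketch simply rethrows the failure as an unchecked exception.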


Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-22 Thread Patrick Hunt
Perhaps discuss on the user list, as Flavio mentioned, prior to calling a
vote? Has anyone looked at dependencies? Is this consistent with what the
rest of the ecosystem (Hadoop, HBase, Kafka, Curator, etc.) has defined?

Regards,

Patrick

On Thu, Feb 22, 2018 at 7:52 AM, Andor Molnar wrote:

> Is everybody happy with the plan that Tamaas suggested?
> Shall we start a vote?
>
> Andor
>
>
>
> On Wed, Feb 21, 2018 at 11:34 PM, Mark Fenes wrote:
>
> > Hi All,
> >
> > I totally support the idea of upgrading to Java 8 and I agree with Abe
> > that we should not require different minimum versions of Java for the
> > client and the server.
> > Also skipping the non-LTS versions sounds reasonable.
> >
> > Regards,
> > Mark
> >
> >
> > On Tue, Feb 20, 2018 at 8:49 PM, Tamás Pénzes wrote:
> >
> > > Hi All,
> > >
> > > Just to add my 2 cents. // Might be five, I write long. :)
> > > Hope, you find valuable bits.
> > >
> > > As many of us I also hope that ZooKeeper 3.5 will be released soon.
> > > Until then most of the changes go into master and branch-3.5 too, so I
> > > would keep them on the same Java version for code compatibility. In the
> > > same time I'd be happy if it was Java 8.
> > >
> > > ZK 3.5+ supports Java 7 since December 2014, an almost 7 year old Java
> > > version today.
> > > It was a perfect decision in 2014, when nobody expected ZK 3.5 coming
> so
> > > late, but things might be different four years later.
> > >
> > > Since we have to keep compatibility with Java 6 on branch-3.4 we already
> > > need manual changes when cherry picking into that branch. Not much
> > > difference if branch-3.5 is Java 8.
> > >
> > >
> > > As Flavio said changing branch-3.5 to Java 8 might cause issues for users
> > > already using ZK 3.5.x-beta.
> > > I totally agree with that concern, but using a beta state software means
> > > you accept the risk of facing changes.
> > > And Java 8 is four years old now, so we would not change to bleeding edge,
> > > which I guess nobody wanted.
> > >
> > >
> > > So what I would propose is the following:
> > >
> > >- Upgrade branches "master" and "branch-3.5" to Java 8 (LTS) asap.
> > >- After releasing 3.5 GA and the next LTS Java version (Java 11 /
> > >18.9-LTS) gets released upgrade "master" branch to Java 11-LTS. (
> > >http://www.oracle.com/technetwork/java/eol-135779.html)
> > >- I would not upgrade Java to a non-LTS version.
> > >
> > >
> > > What do you think about it?
> > >
> > > Thanks, Tamaas
> > >
> > >
> > > On Mon, Feb 19, 2018 at 10:32 PM, Flavio Junqueira 
> > wrote:
> > >
> > > > > I'm fine with moving to Java 8 or even 9 in 3.6. Does anyone have a
> > > > > different opinion? Otherwise, should we start a vote?
> > > >
> > > > -Flavio
> > > >
> > > >
> > > > > On 16 Feb 2018, at 21:28, Abraham Fine  wrote:
> > > > >
> > > > > > I'm a -1 on requiring different minimum versions of java for the client
> > > > > > and the server.  I think this has the potential to create a lot of
> > > > > > confusion for users and contributors.
> > > > > >
> > > > > > I would support moving master (3.6) to java 8, I also think it is worth
> > > > > > considering moving to java 9. Given how long our release cycle tends to be
> > > > > > I think targeting the latest and greatest this early in the development
> > > > > > cycle is reasonable.
> > > > >
> > > > > Thanks,
> > > > > Abe
> > > > >
> > > > > On Fri, Feb 16, 2018, at 06:48, Enrico Olivelli wrote:
> > > > >> 2018-02-16 14:20 GMT+01:00 Andor Molnar :
> > > > >>
> > > > > >>> +1 for setting the Java8 requirement on server side.
> > > > > >>>
> > > > > >>> *Client side.*
> > > > > >>> I like the idea of setting the requirement on client side too without
> > > > > >>> introducing anything Java8 specific. I'm not planning to use Java8 features
> > > > > >>> right away, just thinking that opening the gates would be useful in the long
> > > > > >>> run.
> > > > > >>>
> > > > > >>> Additionally, I don't see heavy development on the client side. Users who
> > > > > >>> are tightly coupled to Java7 are still able to use existing clients until
> > > > > >>> we introduce something breaking which they're forced to upgrade to for
> > > > > >>> whatever reason. I'm not sure what the odds of that happening are.
> > > > >>>
> > > > >>
> > > > >>
> > > > >> My two cents
> > > > > >> Actually ZooKeeper is distributed as a single JAR which contains both
> > > > > >> server and client side code, requiring Java 7 for the client and Java 8 for
> > > > > >> the server will require a new way of packaging the artifacts and building
> > > > > >> the project (and this will require separating client side and server side
> > > > > >> code base).
> > > > > >> Maybe I am missing something.
> > > > >>
> > > > >>
> > > > 

[jira] [Created] (ZOOKEEPER-2985) Expired session may unexpired after leader failover

2018-02-22 Thread Chris Thunes (JIRA)
Chris Thunes created ZOOKEEPER-2985:
---

 Summary: Expired session may unexpired after leader failover
 Key: ZOOKEEPER-2985
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2985
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.11, 3.5.3
Reporter: Chris Thunes


We recently observed an inconsistency in our Kafka cluster which we tracked 
down to ZooKeeper sessions expiring and then re-appearing after a ZooKeeper 
leadership failover. The Kafka nodes received session "Expired" events, leading 
to them starting new sessions and attempting to re-create some ephemeral nodes 
(broker ID nodes in kafka/brokers/ids specifically). However, between receiving 
the session Expired event and establishing a new session a leadership failover 
occurred within the ZooKeeper cluster which resulted in the expired session 
re-appearing. When Kafka attempted to re-create the ephemeral nodes mentioned 
above it (unexpectedly) received NODEEXISTS errors.

This behavior is a result of how session expiration is handled by the leader. 
Specifically, the expired session is marked as "closing" immediately upon 
expiration (in SessionTrackerImpl) and _before_ the corresponding 
"closeSession" entry is committed. A client can therefore receive a session 
Expired event before its session is fully closed. A leadership failover which 
results in the loss of the (uncommitted) closeSession entry thus leads to the 
sessions' ephemeral nodes "re-appearing" until another expiration of the old 
session on the new leader takes place.

I'm not certain if this should be considered a bug or an edge case that clients 
are expected to handle. If it is the latter then I think it would be good to 
include this in the Programmer's Guide in the documentation.

If it's helpful I have code to reproduce this on an in-process cluster running 
3.4.11 or 3.5.3-beta.
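
The sequence described above can be modelled with a toy simulation. This is only a sketch of the reported race, not the reporter's reproduction code and not ZooKeeper's implementation: `ToyLeader`, the string log entries and the session ID 42 are all made up for illustration. The only behaviour it assumes is the one stated in the report, namely that the "closing" mark is local to the leader while a new leader rebuilds its session table from committed entries alone.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExpiryRaceSketch {
    /** Toy leader: session state is rebuilt purely from committed log entries. */
    static class ToyLeader {
        final Set<Long> liveSessions = new HashSet<>();
        final Set<Long> closing = new HashSet<>(); // local only, lost on failover

        ToyLeader(List<String> committedLog) {
            for (String entry : committedLog) {
                String[] parts = entry.split(" ");
                long id = Long.parseLong(parts[1]);
                if (parts[0].equals("createSession")) {
                    liveSessions.add(id);
                } else if (parts[0].equals("closeSession")) {
                    liveSessions.remove(id);
                }
            }
        }

        /** Marks the session as closing immediately and returns the not-yet-committed entry. */
        String expire(long id) {
            closing.add(id); // at this point the client may already be told "Expired"
            return "closeSession " + id;
        }
    }

    /** Returns true if the expired session is alive again on the new leader. */
    public static boolean sessionSurvivesFailover(boolean closeCommittedBeforeFailover) {
        List<String> committedLog = new ArrayList<>();
        committedLog.add("createSession 42");
        ToyLeader oldLeader = new ToyLeader(committedLog);

        String proposal = oldLeader.expire(42L);
        if (closeCommittedBeforeFailover) {
            committedLog.add(proposal);
        }

        // Failover: the new leader only knows what was committed.
        ToyLeader newLeader = new ToyLeader(committedLog);
        return newLeader.liveSessions.contains(42L);
    }

    public static void main(String[] args) {
        System.out.println("uncommitted close, session survives: " + sessionSurvivesFailover(false));
        System.out.println("committed close, session survives:   " + sessionSurvivesFailover(true));
    }
}
```

In the model, the session (and with it any ephemeral nodes it owns) only reappears when the failover happens before the closeSession entry is committed.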



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-02-22 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373047#comment-16373047
 ] 

Andor Molnar commented on ZOOKEEPER-2982:
-

[~fpj] Sounds reasonable to me.

I'll think about how to test this and will let you know when I can come up with 
something. 

> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address. For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
> The server should eventually succeed but doesn't.
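
One way to picture the behaviour the issue asks for is to keep the host name and port and perform a fresh lookup on every connection attempt, instead of holding on to an address object created at startup. The sketch below is a hedged illustration only: `ReResolvingAddress`, its `Resolver` hook, the placeholder host name and the port are hypothetical and are not the ZooKeeper fix. Note also that the JVM keeps its own negative lookup cache (the `networkaddress.cache.negative.ttl` security property), which this sketch does not touch.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ReResolvingAddress {
    /** Hypothetical hook so the lookup can be replaced in tests. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final int port;
    private final Resolver resolver;

    public ReResolvingAddress(String host, int port, Resolver resolver) {
        this.host = host;
        this.port = port;
        this.resolver = resolver;
    }

    /** Performs a fresh lookup on every call instead of reusing an earlier result. */
    public InetSocketAddress current() throws UnknownHostException {
        return new InetSocketAddress(resolver.resolve(host), port);
    }

    /** Retries the lookup up to maxAttempts times, sleeping between failed attempts. */
    public InetSocketAddress resolveWithRetry(int maxAttempts, long sleepMillis)
            throws UnknownHostException, InterruptedException {
        UnknownHostException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return current();
            } catch (UnknownHostException e) {
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        throw last != null ? last : new UnknownHostException(host);
    }

    /** Demo with a fake resolver that fails the first failFirst lookups; returns lookups made, or -1. */
    public static int attemptsUntilResolved(int failFirst, int maxAttempts) {
        int[] calls = {0};
        Resolver fake = h -> {
            if (++calls[0] <= failFirst) {
                throw new UnknownHostException(h);
            }
            return InetAddress.getLoopbackAddress();
        };
        try {
            new ReResolvingAddress("zk-2.example.invalid", 2888, fake)
                    .resolveWithRetry(maxAttempts, 0L);
            return calls[0];
        } catch (UnknownHostException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

With the real JDK lookup, the resolver argument would simply be `InetAddress::getByName`.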



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type

2018-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372998#comment-16372998
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2901:
---

Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r170009123
  
--- Diff: src/java/main/org/apache/zookeeper/server/OldEphemeralType.java 
---
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+/**
+ * See https://issues.apache.org/jira/browse/ZOOKEEPER-2901
+ *
+ * version 3.5.3 introduced bugs associated with how TTL nodes were implemented. version 3.5.4
+ * fixes the problems but makes TTL nodes created in 3.5.3 invalid. OldEphemeralType is a copy
+ * of the old - bad - implementation that is provided as a workaround. {@link EphemeralType#TTL_3_5_3_EMULATION_PROPERTY}
+ * can be used to emulate support of the badly specified TTL nodes.
+ */
+public enum OldEphemeralType {
+/**
+ * Not ephemeral
+ */
+VOID,
+/**
+ * Standard, pre-3.5.x EPHEMERAL
+ */
+NORMAL,
+/**
+ * Container node
+ */
+CONTAINER,
+/**
+ * TTL node
+ */
+TTL;
+
+public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
+public static final long MAX_TTL = 0x0fffL;
+public static final long TTL_MASK = 0x8000L;
+
+public static OldEphemeralType get(long ephemeralOwner) {
--- End diff --

I put it in `OldEphemeralType` so it's easier to remove and reason about. 
The emulation is a separate concern.


> Session ID that is negative causes mis-calculation of Ephemeral Type
> 
>
> Key: ZOOKEEPER-2901
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
> Environment: Running 3.5.3-beta in Docker container
>Reporter: Mark Johnson
>Assignee: Jordan Zimmerman
>Priority: Blocker
>
> In the code that determines the EphemeralType it is looking at the owner 
> (which is the client ID or connection ID):
> EphemeralType.java:
>    public static EphemeralType get(long ephemeralOwner) {
>        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>            return CONTAINER;
>        }
>        if (ephemeralOwner < 0) {
>            return TTL;
>        }
>        return (ephemeralOwner == 0) ? VOID : NORMAL;
>    }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why this is random - if my client ID is non-negative
> then the node gets added correctly.
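
The misclassification in the quoted description can be reproduced in isolation. The sketch below copies the decision logic from the snippet in the issue into a stand-alone class; the class name `EphemeralOwnerDemo` and the nested `Type` enum are hypothetical stand-ins, and the only inputs used are the session ID from the report and its positive counterpart.

```java
public class EphemeralOwnerDemo {
    public enum Type { VOID, NORMAL, CONTAINER, TTL }

    public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;

    /** Same decision logic as the snippet quoted in the issue description. */
    public static Type get(long ephemeralOwner) {
        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
            return Type.CONTAINER;
        }
        if (ephemeralOwner < 0) {
            // Any negative owner lands here, including ordinary session IDs.
            return Type.TTL;
        }
        return (ephemeralOwner == 0) ? Type.VOID : Type.NORMAL;
    }

    public static void main(String[] args) {
        // Session ID taken from the report: classified as TTL although it is a normal session.
        System.out.println(get(-720548323429908480L));
        // A non-negative ID is classified as a normal ephemeral owner.
        System.out.println(get(720548323429908480L));
    }
}
```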



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...

2018-02-22 Thread Randgalt
Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r170009123
  
--- Diff: src/java/main/org/apache/zookeeper/server/OldEphemeralType.java 
---
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+/**
+ * See https://issues.apache.org/jira/browse/ZOOKEEPER-2901
+ *
+ * version 3.5.3 introduced bugs associated with how TTL nodes were implemented. version 3.5.4
+ * fixes the problems but makes TTL nodes created in 3.5.3 invalid. OldEphemeralType is a copy
+ * of the old - bad - implementation that is provided as a workaround. {@link EphemeralType#TTL_3_5_3_EMULATION_PROPERTY}
+ * can be used to emulate support of the badly specified TTL nodes.
+ */
+public enum OldEphemeralType {
+/**
+ * Not ephemeral
+ */
+VOID,
+/**
+ * Standard, pre-3.5.x EPHEMERAL
+ */
+NORMAL,
+/**
+ * Container node
+ */
+CONTAINER,
+/**
+ * TTL node
+ */
+TTL;
+
+public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
+public static final long MAX_TTL = 0x0fffL;
+public static final long TTL_MASK = 0x8000L;
+
+public static OldEphemeralType get(long ephemeralOwner) {
--- End diff --

I put it in `OldEphemeralType` so it's easier to remove and reason about. 
The emulation is a separate concern.


---


[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...

2018-02-22 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r170008461
  
--- Diff: src/java/main/org/apache/zookeeper/server/OldEphemeralType.java 
---
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+/**
+ * See https://issues.apache.org/jira/browse/ZOOKEEPER-2901
+ *
+ * version 3.5.3 introduced bugs associated with how TTL nodes were implemented. version 3.5.4
+ * fixes the problems but makes TTL nodes created in 3.5.3 invalid. OldEphemeralType is a copy
+ * of the old - bad - implementation that is provided as a workaround. {@link EphemeralType#TTL_3_5_3_EMULATION_PROPERTY}
+ * can be used to emulate support of the badly specified TTL nodes.
+ */
+public enum OldEphemeralType {
+/**
+ * Not ephemeral
+ */
+VOID,
+/**
+ * Standard, pre-3.5.x EPHEMERAL
+ */
+NORMAL,
+/**
+ * Container node
+ */
+CONTAINER,
+/**
+ * TTL node
+ */
+TTL;
+
+public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
+public static final long MAX_TTL = 0x0fffL;
+public static final long TTL_MASK = 0x8000L;
+
+public static OldEphemeralType get(long ephemeralOwner) {
--- End diff --

Yes, that's fine with the emulate353 flag.
My concern is that we keep 2 enums in the codebase: `EphemeralType` and 
`OldEphemeralType` while I think it'd be nicer to build the functionality of 
the old enum into the new one.


---


Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-22 Thread Andor Molnar
Is everybody happy with the plan that Tamaas suggested?
Shall we start a vote?

Andor



On Wed, Feb 21, 2018 at 11:34 PM, Mark Fenes  wrote:

> Hi All,
>
> I totally support the idea of upgrading to Java 8 and I agree with Abe that
> we should not require different minimum versions of Java for the client and
> the server.
> Also skipping the non-LTS versions sounds reasonable.
>
> Regards,
> Mark
>
>
> On Tue, Feb 20, 2018 at 8:49 PM, Tamás Pénzes  wrote:
>
> > Hi All,
> >
> > Just to add my 2 cents. // Might be five, I write long. :)
> > Hope, you find valuable bits.
> >
> > Like many of us, I also hope that ZooKeeper 3.5 will be released soon.
> > Until then most of the changes go into master and branch-3.5 too, so I
> > would keep them on the same Java version for code compatibility. At the
> > same time I'd be happy if it was Java 8.
> >
> > ZK 3.5+ has supported Java 7 since December 2014, an almost 7 year old Java
> > version today.
> > It was a perfect decision in 2014, when nobody expected ZK 3.5 coming so
> > late, but things might be different four years later.
> >
> > Since we have to keep compatibility with Java 6 on branch-3.4 we already
> > need manual changes when cherry picking into that branch. Not much
> > difference if branch-3.5 is Java 8.
> >
> >
> > As Flavio said changing branch-3.5 to Java 8 might cause issues for users
> > already using ZK 3.5.x-beta.
> > I totally agree with that concern, but using a beta state software means
> > you accept the risk of facing changes.
> > And Java 8 is four years old now, so we would not change to bleeding edge,
> > which I guess nobody wanted.
> >
> >
> > So what I would propose is the following:
> >
> >- Upgrade branches "master" and "branch-3.5" to Java 8 (LTS) asap.
> >- After releasing 3.5 GA and the next LTS Java version (Java 11 /
> >18.9-LTS) gets released upgrade "master" branch to Java 11-LTS. (
> >http://www.oracle.com/technetwork/java/eol-135779.html)
> >- I would not upgrade Java to a non-LTS version.
> >
> >
> > What do you think about it?
> >
> > Thanks, Tamaas
> >
> >
> > On Mon, Feb 19, 2018 at 10:32 PM, Flavio Junqueira 
> wrote:
> >
> > > I'm fine with moving to Java 8 or even 9 in 3.6. Does anyone have a
> > > different opinion? Otherwise, should we start a vote?
> > >
> > > -Flavio
> > >
> > >
> > > > On 16 Feb 2018, at 21:28, Abraham Fine  wrote:
> > > >
> > > > > I'm a -1 on requiring different minimum versions of java for the client
> > > > > and the server.  I think this has the potential to create a lot of
> > > > > confusion for users and contributors.
> > > > >
> > > > > I would support moving master (3.6) to java 8, I also think it is worth
> > > > > considering moving to java 9. Given how long our release cycle tends to be
> > > > > I think targeting the latest and greatest this early in the development
> > > > > cycle is reasonable.
> > > >
> > > > Thanks,
> > > > Abe
> > > >
> > > > On Fri, Feb 16, 2018, at 06:48, Enrico Olivelli wrote:
> > > >> 2018-02-16 14:20 GMT+01:00 Andor Molnar :
> > > >>
> > > > >>> +1 for setting the Java8 requirement on server side.
> > > > >>>
> > > > >>> *Client side.*
> > > > >>> I like the idea of setting the requirement on client side too without
> > > > >>> introducing anything Java8 specific. I'm not planning to use Java8 features
> > > > >>> right away, just thinking that opening the gates would be useful in the long
> > > > >>> run.
> > > > >>>
> > > > >>> Additionally, I don't see heavy development on the client side. Users who
> > > > >>> are tightly coupled to Java7 are still able to use existing clients until
> > > > >>> we introduce something breaking which they're forced to upgrade to for
> > > > >>> whatever reason. I'm not sure what the odds of that happening are.
> > > >>>
> > > >>
> > > >>
> > > >> My two cents
> > > > >> Actually ZooKeeper is distributed as a single JAR which contains both
> > > > >> server and client side code, requiring Java 7 for the client and Java 8 for
> > > > >> the server will require a new way of packaging the artifacts and building
> > > > >> the project (and this will require separating client side and server side
> > > > >> code base).
> > > > >> Maybe I am missing something.
> > > >>
> > > >>
> > > >> Enrico
> > > >>
> > > >>
> > > >>>
> > > >>> Andor
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, Feb 16, 2018 at 12:31 PM, Flavio Junqueira  >
> > > wrote:
> > > >>>
> > >  We have this section in the admin doc that talks about the system
> > >  requirements:
> > > 
> > >  https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperAdmin.
> > html#sc_
> > >  requiredSoftware  > >  zookeeperAdmin.html#sc_requiredSoftware>
> > > 
> > >  If we change, then we have to update that section. Specifically
> > about
> > >  

ZooKeeper_branch34_openjdk7 - Build # 1824 - Still Failing

2018-02-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1824/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 39.21 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.639 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.665 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.715 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.831 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.608 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.63 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.786 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.084 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.687 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.529 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.873 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.048 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.347 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.658 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.516 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.621 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.147 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.475 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.39 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.177 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.875 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1382: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1385: Tests failed!

Total time: 38 minutes 31 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type

2018-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372909#comment-16372909
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2901:
---

Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r169985373
  
--- Diff: src/java/main/org/apache/zookeeper/server/EphemeralType.java ---
@@ -37,41 +77,152 @@
 /**
  * TTL node
  */
-TTL;
+TTL() {
+@Override
+public long maxValue() {
+return EXTENDED_FEATURE_VALUE_MASK;  // 12725 days, about 34 years
+}
+
+@Override
+public long toEphemeralOwner(long ttl) {
+if ((ttl > TTL.maxValue()) || (ttl <= 0)) {
+throw new IllegalArgumentException("ttl must be positive and cannot be larger than: " + TTL.maxValue());
+}
+//noinspection PointlessBitwiseExpression
+return EXTENDED_MASK | EXTENDED_BIT_TTL | ttl;  // TTL_RESERVED_BIT is actually zero - but it serves to document that the proper extended bit needs to be set
+}
+
+@Override
+public long getValue(long ephemeralOwner) {
+return getExtendedFeatureValue(ephemeralOwner);
+}
+};
+
+/**
+ * For types that support it, the maximum extended value
+ *
+ * @return 0 or max
+ */
+public long maxValue() {
+return 0;
+}
+
+/**
+ * For types that support it, convert a value to an extended ephemeral owner
+ *
+ * @return 0 or extended ephemeral owner
+ */
+public long toEphemeralOwner(long value) {
+return 0;
+}
+
+/**
+ * For types that support it, return the extended value from an extended ephemeral owner
+ *
+ * @return 0 or extended value
+ */
+public long getValue(long ephemeralOwner) {
+return 0;
+}
 
 public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
-public static final long MAX_TTL = 0x0fffL;
-public static final long TTL_MASK = 0x8000L;
+public static final long MAX_EXTENDED_SERVER_ID = 0xfe;  // 254
+
+private static final long EXTENDED_MASK = 0xff00L;
+private static final long EXTENDED_BIT_TTL = 0x;
+private static final long RESERVED_BITS_MASK = 0x0000L;
+private static final long RESERVED_BITS_SHIFT = 40;
+
+private static final Map extendedFeatureMap;
 
+static {
+Map map = new HashMap<>();
+map.put(EXTENDED_BIT_TTL, TTL);
+extendedFeatureMap = Collections.unmodifiableMap(map);
+}
+
+private static final long EXTENDED_FEATURE_VALUE_MASK = ~(EXTENDED_MASK | RESERVED_BITS_MASK);
+
+// Visible for testing
+static final String EXTENDED_TYPES_ENABLED_PROPERTY = "zookeeper.extendedTypesEnabled";
+static final String TTL_3_5_3_EMULATION_PROPERTY = "zookeeper.emulate353TTLNodes";
+
+/**
+ * Return true if extended ephemeral types are enabled
+ *
+ * @return true/false
+ */
+public static boolean extendedEphemeralTypesEnabled() {
--- End diff --

The way it's implemented now, we can easily add new features in the future. 
Why code ourselves into a corner when we can leave some room? BTW - I'd 
discussed this offline with @phunt 
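
To make the extensibility argument concrete, here is a sketch of how an extended ephemeral owner could be packed and unpacked. The mask values are assumptions: the constants in the archived diff above are truncated, so the layout used here (top byte all ones as the "extended" marker, 16 feature bits, 40 value bits) is inferred from `RESERVED_BITS_SHIFT = 40`, `MAX_EXTENDED_SERVER_ID = 0xfe` and the "12725 days" comment, and may not match the pull request exactly. The class and method names are hypothetical.

```java
public class ExtendedOwnerSketch {
    // Assumed values; the archived diff shows these constants in truncated form.
    static final long EXTENDED_MASK = 0xff00000000000000L;      // top byte all ones marks "extended"
    static final long RESERVED_BITS_MASK = 0x00ffff0000000000L; // 16 bits identifying the feature
    static final int RESERVED_BITS_SHIFT = 40;
    static final long EXTENDED_BIT_TTL = 0x0000L;               // feature id 0 stands for TTL
    static final long VALUE_MASK = ~(EXTENDED_MASK | RESERVED_BITS_MASK); // low 40 bits

    /** Packs a TTL in milliseconds into an extended ephemeral owner. */
    public static long toTtlOwner(long ttl) {
        if (ttl <= 0 || ttl > VALUE_MASK) {
            throw new IllegalArgumentException("ttl must be positive and at most " + VALUE_MASK);
        }
        return EXTENDED_MASK | (EXTENDED_BIT_TTL << RESERVED_BITS_SHIFT) | ttl;
    }

    /** True only when the whole top byte is set, unlike a plain sign check. */
    public static boolean isExtended(long owner) {
        return (owner & EXTENDED_MASK) == EXTENDED_MASK;
    }

    /** Extracts the 16-bit feature id. */
    public static long featureId(long owner) {
        return (owner & RESERVED_BITS_MASK) >>> RESERVED_BITS_SHIFT;
    }

    /** Extracts the 40-bit feature value (the TTL for feature id 0). */
    public static long value(long owner) {
        return owner & VALUE_MASK;
    }

    public static void main(String[] args) {
        long owner = toTtlOwner(1000L);
        System.out.println(isExtended(owner) + " " + featureId(owner) + " " + value(owner));
        // The session ID from the original report has a top byte other than 0xff.
        System.out.println(isExtended(-720548323429908480L));
    }
}
```

Under these assumed masks, a new feature would only need a new feature id in the reserved bits, and a negative session ID whose top byte is not 0xff is no longer mistaken for a TTL owner.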


> Session ID that is negative causes mis-calculation of Ephemeral Type
> 
>
> Key: ZOOKEEPER-2901
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
> Environment: Running 3.5.3-beta in Docker container
>Reporter: Mark Johnson
>Assignee: Jordan Zimmerman
>Priority: Blocker
>
> In the code that determines the EphemeralType it is looking at the owner 
> (which is the client ID or connection ID):
> EphemeralType.java:
>    public static EphemeralType get(long ephemeralOwner) {
>        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>            return CONTAINER;
>        }
>        if (ephemeralOwner < 0) {
>            return TTL;
>        }
>        return (ephemeralOwner == 0) ? VOID : NORMAL;
>    }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why 

[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...

2018-02-22 Thread Randgalt
Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r169985373
  
--- Diff: src/java/main/org/apache/zookeeper/server/EphemeralType.java ---
@@ -37,41 +77,152 @@
 /**
  * TTL node
  */
-TTL;
+TTL() {
+@Override
+public long maxValue() {
+return EXTENDED_FEATURE_VALUE_MASK;  // 12725 days, about 34 years
+}
+
+@Override
+public long toEphemeralOwner(long ttl) {
+if ((ttl > TTL.maxValue()) || (ttl <= 0)) {
+throw new IllegalArgumentException("ttl must be positive and cannot be larger than: " + TTL.maxValue());
+}
+//noinspection PointlessBitwiseExpression
+return EXTENDED_MASK | EXTENDED_BIT_TTL | ttl;  // TTL_RESERVED_BIT is actually zero - but it serves to document that the proper extended bit needs to be set
+}
+
+@Override
+public long getValue(long ephemeralOwner) {
+return getExtendedFeatureValue(ephemeralOwner);
+}
+};
+
+/**
+ * For types that support it, the maximum extended value
+ *
+ * @return 0 or max
+ */
+public long maxValue() {
+return 0;
+}
+
+/**
+ * For types that support it, convert a value to an extended ephemeral owner
+ *
+ * @return 0 or extended ephemeral owner
+ */
+public long toEphemeralOwner(long value) {
+return 0;
+}
+
+/**
+ * For types that support it, return the extended value from an extended ephemeral owner
+ *
+ * @return 0 or extended value
+ */
+public long getValue(long ephemeralOwner) {
+return 0;
+}
 
 public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
-public static final long MAX_TTL = 0x0fffL;
-public static final long TTL_MASK = 0x8000L;
+public static final long MAX_EXTENDED_SERVER_ID = 0xfe;  // 254
+
+private static final long EXTENDED_MASK = 0xff00L;
+private static final long EXTENDED_BIT_TTL = 0x;
+private static final long RESERVED_BITS_MASK = 0x0000L;
+private static final long RESERVED_BITS_SHIFT = 40;
+
+private static final Map extendedFeatureMap;
 
+static {
+Map map = new HashMap<>();
+map.put(EXTENDED_BIT_TTL, TTL);
+extendedFeatureMap = Collections.unmodifiableMap(map);
+}
+
+private static final long EXTENDED_FEATURE_VALUE_MASK = ~(EXTENDED_MASK | RESERVED_BITS_MASK);
+
+// Visible for testing
+static final String EXTENDED_TYPES_ENABLED_PROPERTY = "zookeeper.extendedTypesEnabled";
+static final String TTL_3_5_3_EMULATION_PROPERTY = "zookeeper.emulate353TTLNodes";
+
+/**
+ * Return true if extended ephemeral types are enabled
+ *
+ * @return true/false
+ */
+public static boolean extendedEphemeralTypesEnabled() {
--- End diff --

The way it's implemented now, we can easily add new features in the future. 
Why code ourselves into a corner when we can leave some room? BTW - I'd 
discussed this offline with @phunt 


---


[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type

2018-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372907#comment-16372907
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2901:
---

Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r169984867
  
--- Diff: src/java/main/org/apache/zookeeper/server/OldEphemeralType.java 
---
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+/**
+ * See https://issues.apache.org/jira/browse/ZOOKEEPER-2901
+ *
+ * version 3.5.3 introduced bugs associated with how TTL nodes were implemented. version 3.5.4
+ * fixes the problems but makes TTL nodes created in 3.5.3 invalid. OldEphemeralType is a copy
+ * of the old - bad - implementation that is provided as a workaround. {@link EphemeralType#TTL_3_5_3_EMULATION_PROPERTY}
+ * can be used to emulate support of the badly specified TTL nodes.
+ */
+public enum OldEphemeralType {
+/**
+ * Not ephemeral
+ */
+VOID,
+/**
+ * Standard, pre-3.5.x EPHEMERAL
+ */
+NORMAL,
+/**
+ * Container node
+ */
+CONTAINER,
+/**
+ * TTL node
+ */
+TTL;
+
+public static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;
+public static final long MAX_TTL = 0x0fffL;
+public static final long TTL_MASK = 0x8000L;
+
+public static OldEphemeralType get(long ephemeralOwner) {
--- End diff --

We should create a new task to delete it after a while. However, I know 
that this will be important. Whoever used TTLs in 3.5.3 will run into problems 
when they upgrade. We need to have a workaround for these users.


> Session ID that is negative causes mis-calculation of Ephemeral Type
> 
>
>                 Key: ZOOKEEPER-2901
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.3
>         Environment: Running 3.5.3-beta in Docker container
>            Reporter: Mark Johnson
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>
> In the code that determines the EphemeralType it is looking at the owner 
> (which is the client ID or connection ID):
> EphemeralType.java:
>     public static EphemeralType get(long ephemeralOwner) {
>         if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>             return CONTAINER;
>         }
>         if (ephemeralOwner < 0) {
>             return TTL;
>         }
>         return (ephemeralOwner == 0) ? VOID : NORMAL;
>     }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why this is random - if my client ID is non-negative
> then the node gets added correctly.
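
The misclassification described in the report can be reproduced in isolation. The following minimal sketch copies the quoted `get` logic into a standalone class and feeds it the client ID from the report. The class name `OwnerSignCheckDemo` and the `Kind` enum are made up for illustration and are not ZooKeeper types.

```java
// Standalone illustration; OwnerSignCheckDemo and Kind are hypothetical names.
// The body of classify() mirrors the logic quoted in the report.
final class OwnerSignCheckDemo {
    enum Kind { VOID, NORMAL, CONTAINER, TTL }

    static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;

    static Kind classify(long ephemeralOwner) {
        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
            return Kind.CONTAINER;
        }
        if (ephemeralOwner < 0) {
            // Any owner with the sign bit set lands here, including an
            // ordinary session ID that merely happens to be negative.
            return Kind.TTL;
        }
        return (ephemeralOwner == 0) ? Kind.VOID : Kind.NORMAL;
    }

    public static void main(String[] args) {
        long reportedClientId = -720548323429908480L; // value from the report
        System.out.println(classify(reportedClientId));  // prints TTL
        System.out.println(classify(-reportedClientId)); // prints NORMAL
    }
}
```

The sign test alone cannot tell a TTL owner from a session ID whose high bit is set, which matches the reporter's observation that the failure depends on the client ID.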



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...

2018-02-22 Thread Randgalt
Github user Randgalt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/377#discussion_r169984867
  

We should create a new task to delete it after a while. However, I know 
that this will be important. Whoever used TTLs in 3.5.3 will run into problems 
when they upgrade. We need to have a workaround for these users.


---


[GitHub] zookeeper pull request #472: clean resource when the client node is not lear...

2018-02-22 Thread luoxn28
Github user luoxn28 closed the pull request at:

https://github.com/apache/zookeeper/pull/472


---


[GitHub] zookeeper issue #472: clean resource when the client node is not learner.

2018-02-22 Thread luoxn28
Github user luoxn28 commented on the issue:

https://github.com/apache/zookeeper/pull/472
  
@anmolnar 
Sorry, this method is a little long and I had not noticed the finally/shutdown(). 
The code is OK.


---


Success: ZOOKEEPER- PreCommit Build #1516

2018-02-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1516/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 38.97 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1516//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1516//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1516//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 36 minutes 47 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2930
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372624#comment-16372624
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2930:
---

Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r169910520
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -318,76 +318,167 @@ public Thread newThread(Runnable r) {
  */
 public void testInitiateConnection(long sid) throws Exception {
 LOG.debug("Opening channel to server " + sid);
-Socket sock = new Socket();
-setSockOpts(sock);
-sock.connect(self.getVotingView().get(sid).electionAddr, cnxTO);
-initiateConnection(sock, sid);
+initiateConnection(sid, 
self.getVotingView().get(sid).electionAddr);
+}
+
+private Socket openChannel(long sid, InetSocketAddress electionAddr) {
+LOG.debug("Opening channel to server " + sid);
+try {
+final Socket sock = new Socket();
+setSockOpts(sock);
+sock.connect(electionAddr, cnxTO);
+LOG.debug("Connected to server " + sid);
+return sock;
+} catch (UnresolvedAddressException e) {
+// Sun doesn't include the address that causes this
+// exception to be thrown, also UAE cannot be wrapped cleanly
+// so we log the exception in order to capture this critical
+// detail.
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr, e);
+throw e;
+} catch (IOException e) {
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr,
+e);
+return null;
+}
 }
 
 /**
  * If this server has initiated the connection, then it gives up on the
  * connection if it loses challenge. Otherwise, it keeps the 
connection.
  */
-public void initiateConnection(final Socket sock, final Long sid) {
+public boolean initiateConnection(final Long sid, InetSocketAddress 
electionAddr) {
 try {
-startConnection(sock, sid);
-} catch (IOException e) {
-LOG.error("Exception while connecting, id: {}, addr: {}, 
closing learner connection",
-new Object[] { sid, sock.getRemoteSocketAddress() }, 
e);
-closeSocket(sock);
-return;
+Socket sock = openChannel(sid, electionAddr);
+if (sock != null) {
+try {
+startConnection(sock, sid);
+} catch (IOException e) {
+LOG.error("Exception while connecting, id: {}, addr: 
{}, closing learner connection",
+new Object[]{sid, 
sock.getRemoteSocketAddress()}, e);
+closeSocket(sock);
+}
+return true;
+} else {
+return false;
+}
+} finally {
+inprogressConnections.remove(sid);
 }
 }
 
-/**
- * Server will initiate the connection request to its peer server
- * asynchronously via separate connection thread.
- */
-public void initiateConnectionAsync(final Socket sock, final Long sid) 
{
+synchronized private void connectOneAsync(final Long sid, final 
ZooKeeperThread connectorThread) {
+if (senderWorkerMap.get(sid) != null) {
+LOG.debug("There is a connection already for server " + sid);
+return;
+}
 if(!inprogressConnections.add(sid)){
 // simply return as there is a connection request to
 // server 'sid' already in progress.
 LOG.debug("Connection request to server id: {} is already in 
progress, so skipping this request",
 sid);
-closeSocket(sock);
 return;
 }
 try {
-connectionExecutor.execute(
-new QuorumConnectionReqThread(sock, sid));
+connectionExecutor.execute(connectorThread);
 connectionThreadCnt.incrementAndGet();
 } catch (Throwable e) {
 // Imp: Safer side catching all type of exceptions and remove 
'sid'
 // from inprogress connections. This is to avoid blocking 
further
 // connection requests from this 'sid' in case of errors.
 

[GitHub] zookeeper pull request #456: ZOOKEEPER-2930: Leader cannot be elected due to...

2018-02-22 Thread JonathanO
Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r169910520
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -318,76 +318,167 @@ public Thread newThread(Runnable r) {
  */
 public void testInitiateConnection(long sid) throws Exception {
 LOG.debug("Opening channel to server " + sid);
-Socket sock = new Socket();
-setSockOpts(sock);
-sock.connect(self.getVotingView().get(sid).electionAddr, cnxTO);
-initiateConnection(sock, sid);
+initiateConnection(sid, 
self.getVotingView().get(sid).electionAddr);
+}
+
+private Socket openChannel(long sid, InetSocketAddress electionAddr) {
+LOG.debug("Opening channel to server " + sid);
+try {
+final Socket sock = new Socket();
+setSockOpts(sock);
+sock.connect(electionAddr, cnxTO);
+LOG.debug("Connected to server " + sid);
+return sock;
+} catch (UnresolvedAddressException e) {
+// Sun doesn't include the address that causes this
+// exception to be thrown, also UAE cannot be wrapped cleanly
+// so we log the exception in order to capture this critical
+// detail.
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr, e);
+throw e;
+} catch (IOException e) {
+LOG.warn("Cannot open channel to " + sid
++ " at election address " + electionAddr,
+e);
+return null;
+}
 }
 
 /**
  * If this server has initiated the connection, then it gives up on the
  * connection if it loses challenge. Otherwise, it keeps the 
connection.
  */
-public void initiateConnection(final Socket sock, final Long sid) {
+public boolean initiateConnection(final Long sid, InetSocketAddress 
electionAddr) {
 try {
-startConnection(sock, sid);
-} catch (IOException e) {
-LOG.error("Exception while connecting, id: {}, addr: {}, 
closing learner connection",
-new Object[] { sid, sock.getRemoteSocketAddress() }, 
e);
-closeSocket(sock);
-return;
+Socket sock = openChannel(sid, electionAddr);
+if (sock != null) {
+try {
+startConnection(sock, sid);
+} catch (IOException e) {
+LOG.error("Exception while connecting, id: {}, addr: 
{}, closing learner connection",
+new Object[]{sid, 
sock.getRemoteSocketAddress()}, e);
+closeSocket(sock);
+}
+return true;
+} else {
+return false;
+}
+} finally {
+inprogressConnections.remove(sid);
 }
 }
 
-/**
- * Server will initiate the connection request to its peer server
- * asynchronously via separate connection thread.
- */
-public void initiateConnectionAsync(final Socket sock, final Long sid) 
{
+synchronized private void connectOneAsync(final Long sid, final 
ZooKeeperThread connectorThread) {
+if (senderWorkerMap.get(sid) != null) {
+LOG.debug("There is a connection already for server " + sid);
+return;
+}
 if(!inprogressConnections.add(sid)){
 // simply return as there is a connection request to
 // server 'sid' already in progress.
 LOG.debug("Connection request to server id: {} is already in 
progress, so skipping this request",
 sid);
-closeSocket(sock);
 return;
 }
 try {
-connectionExecutor.execute(
-new QuorumConnectionReqThread(sock, sid));
+connectionExecutor.execute(connectorThread);
 connectionThreadCnt.incrementAndGet();
 } catch (Throwable e) {
 // Imp: Safer side catching all type of exceptions and remove 
'sid'
 // from inprogress connections. This is to avoid blocking 
further
 // connection requests from this 'sid' in case of errors.
 inprogressConnections.remove(sid);
 LOG.error("Exception while submitting quorum connection 
request", e);
-closeSocket(sock);
 }
 }
 
+/**
+ * Try to establish a connection to 
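
The guard that the diff builds around `inprogressConnections` can be sketched in isolation: record the server id before starting an attempt, skip the attempt if the id is already recorded, and always clear the record when the attempt ends. The version below is a simplified, synchronous illustration of that pattern only. The class and method names are hypothetical, and the real code hands the attempt to an executor and clears the id in `initiateConnection`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified illustration of the guard pattern; these names are
// not ZooKeeper's, and the attempt runs on the caller's thread.
final class InProgressGuardSketch {
    private final Set<Long> inProgress = ConcurrentHashMap.newKeySet();

    /** Runs the attempt unless one for the same sid is already in flight. */
    boolean connectOnce(long sid, Runnable attempt) {
        if (!inProgress.add(sid)) {
            return false; // an attempt for this sid is already running
        }
        try {
            attempt.run();
            return true;
        } finally {
            // Always clear the marker, even if the attempt throws, so later
            // requests for this sid are not blocked forever.
            inProgress.remove(sid);
        }
    }

    boolean isInProgress(long sid) {
        return inProgress.contains(sid);
    }

    public static void main(String[] args) {
        InProgressGuardSketch guard = new InProgressGuardSketch();
        System.out.println(guard.connectOnce(1L, () -> { }));
        System.out.println(guard.isInProgress(1L));
    }
}
```

Clearing the marker in `finally` is the important part: if it were cleared only on success, one failed attempt would block every later attempt for that server id.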

[jira] [Issue Comment Deleted] (ZOOKEEPER-2984) Master

2018-02-22 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-2984:

Comment: was deleted

(was: Hi [~Yayan]

Have you created this intentionally?)

> Master
> --
>
>                 Key: ZOOKEEPER-2984
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2984
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Yayan Sinchan
>            Priority: Major
>         Attachments: firefox-2.0.complete.mar
>
>
> h2.  





Failed: ZOOKEEPER- PreCommit Build #1515

2018-02-22 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1515/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 76.97 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1515//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1515//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1515//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 19 minutes 52 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Could not determine description.
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalObserverRun

Error Message:
Timeout occurred. Please note the time in the report does not reflect the time 
until the timeout.

Stack Trace:
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in 
the report does not reflect the time until the timeout.
at java.lang.Thread.run(Thread.java:745)

[GitHub] zookeeper pull request #472: clean resource when the client node is not lear...

2018-02-22 Thread luoxn28
GitHub user luoxn28 opened a pull request:

https://github.com/apache/zookeeper/pull/472

clean resource when the client node is not learner.

When the client node is not a learner, close the socket; otherwise call 
`leader.addLearnerHandler(this)`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/luoxn28/zookeeper master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #472


commit f67f9dfdf46715ca3ed6985339d4e8faab58a021
Author: luoxn28 
Date:   2018-02-22T08:56:41Z

clean resource when the client node is not learner.




---


[jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-02-22 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372542#comment-16372542
 ] 

Flavio Junqueira commented on ZOOKEEPER-2982:
-

I have tried your recipe for reproducing as well [~andorm] by changing 
{{/etc/hosts}} and got the same issue. The problem is that the leader fails to 
bind to the port, which actually makes me wonder whether we need to do anything 
about the leader with respect to this issue:

```
java.net.SocketException: Unresolved address
    at java.net.ServerSocket.bind(ServerSocket.java:368)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:240)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
```

Your suggestion of the alternative change is sensible, but I'd say that for 
consistency, it is better that we simply do the same that we have in 3.4, which 
is to make the change in {{findLeader}}.

One thing that I believe we haven't been able to do is to have a test case to 
report it. It would be good to have it, but I'm not sure what would be a good 
way.

> Re-try DNS hostname -> IP resolution
> 
>
>                 Key: ZOOKEEPER-2982
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.3
>            Reporter: Eron Wright 
>            Priority: Blocker
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.  For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>     at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
> The server should eventually succeed but doesn't.
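
One way to avoid holding on to a failed lookup is to rebuild the socket address from its host name before each connection attempt. The sketch below shows only that idea, using the standard `java.net.InetSocketAddress` API. The helper name `reResolve` and the class are hypothetical, and this is not claimed to be the code from ZOOKEEPER-1506 or the eventual 3.5 fix.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration; not taken from the ZooKeeper sources.
final class ReResolveSketch {
    /**
     * Builds a fresh InetSocketAddress from the host string and port of an
     * existing one. The (String, int) constructor attempts name resolution
     * again, so a lookup that failed earlier gets another chance.
     */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        // getHostString() returns the original host name (or literal IP)
        // without triggering a reverse lookup.
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // A literal IP keeps the demo independent of DNS.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 2888);
        InetSocketAddress fresh = reResolve(stale);
        System.out.println(stale.isUnresolved()); // prints true
        System.out.println(fresh.isUnresolved()); // prints false
    }
}
```

Note that the JVM keeps its own negative lookup cache, controlled by the `networkaddress.cache.negative.ttl` security property, so a rebuilt address can still see a cached failure for a short time.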





[jira] [Comment Edited] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-02-22 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372542#comment-16372542
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-2982 at 2/22/18 8:33 AM:
--

I have tried your recipe for reproducing as well [~andorm] by changing 
{{/etc/hosts}} and got the same issue. The problem is that the leader fails to 
bind to the port, which actually makes me wonder whether we need to do anything 
about the leader with respect to this issue:

{noformat}
java.net.SocketException: Unresolved address
    at java.net.ServerSocket.bind(ServerSocket.java:368)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:240)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
{noformat}

Your suggestion of the alternative change is sensible, but I'd say that for 
consistency, it is better that we simply do the same that we have in 3.4, which 
is to make the change in {{findLeader}}.

One thing that I believe we haven't been able to do is to have a test case to 
report it. It would be good to have it, but I'm not sure what would be a good 
way.


was (Author: fpj):
I have tried your recipe for reproducing as well [~andorm] by changing 
{{/etc/hosts}} and got the same issue. The problem is that the leader fails to 
bind to the port, which actually makes me wonder whether we need to do anything 
about the leader with respect to this issue:

```
java.net.SocketException: Unresolved address
    at java.net.ServerSocket.bind(ServerSocket.java:368)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:240)
    at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
```

Your suggestion of the alternative change is sensible, but I'd say that for 
consistency, it is better that we simply do the same that we have in 3.4, which 
is to make the change in {{findLeader}}.

One thing that I believe we haven't been able to do is to have a test case to 
report it. It would be good to have it, but I'm not sure what would be a good 
way.
