ZooKeeper_branch34_jdk8 - Build # 1286 - Failure

2018-02-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1286/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 10.39 KB...]
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
    at hudson.model.Run.execute(Run.java:1724)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:97)
    at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git config remote.origin.url git://git.apache.org/zookeeper.git" returned status code 4:
stdout: 
stderr: error: failed to write new configuration file /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/.git/config.lock

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1970)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1938)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1934)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1572)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1584)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.setRemoteUrl(CliGitAPIImpl.java:1218)
    at hudson.plugins.git.GitAPI.setRemoteUrl(GitAPI.java:160)
    at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:922)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:896)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
    at hudson.remoting.UserRequest.perform(UserRequest.java:207)
    at hudson.remoting.UserRequest.perform(UserRequest.java:53)
    at hudson.remoting.Request$2.run(Request.java:358)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to H22
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
        at hudson.remoting.UserResponse.retrieve(UserRequest.java:310)
        at hudson.remoting.Channel.call(Channel.java:908)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:281)
        at com.sun.proxy.$Proxy110.setRemoteUrl(Unknown Source)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl.setRemoteUrl(RemoteGitImpl.java:295)
        at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:813)
        at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1092)
        at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1123)
        at hudson.scm.SCM.checkout(SCM.java:495)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
        at hudson.model.Run.execute(Run.java:1724)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:97)
        at hudson.model.Executor.run(Executor.java:421)
ERROR: Error fetching remote repo 'origin'
Archiving artifacts
[Fast Archiver] Compressed 79.50 MB of artifacts by 100.0% relative to #1285
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found but none of them are new. Did leafNodes run? 
For example, /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build/test/logs/TEST-org.apache.jute.BinaryInputArchiveTest.xml is 1 day 2 hr old

Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
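The JUnit step failed because every report it found predated the build; the example file was 1 day 2 hr old. That fits the earlier "Error fetching remote repo 'origin'": presumably no tests ran, so only reports left over from an earlier build remained in build/test/logs. The sketch below is a hypothetical illustration of that kind of staleness check, comparing each TEST-*.xml file's modification time with the build start time. It is not the Jenkins JUnit plugin's code (whose exact rule may differ), and `StaleReportCheck` is an invented name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the Jenkins JUnit plugin's code): approximates a
// "none of the reports are new" check by comparing each report's
// modification time with the time the build started.
final class StaleReportCheck {
    private StaleReportCheck() {}

    /** Lists the TEST-*.xml files in dir (an empty array if dir is unreadable). */
    static File[] reports(File dir) {
        File[] files = dir.listFiles((d, name) -> name.startsWith("TEST-") && name.endsWith(".xml"));
        return files == null ? new File[0] : files;
    }

    /** Returns the reports last modified before the build started. */
    static List<File> staleReports(File dir, long buildStartMillis) {
        List<File> stale = new ArrayList<>();
        for (File f : reports(dir)) {
            if (f.lastModified() < buildStartMillis) {
                stale.add(f);
            }
        }
        return stale;
    }

    /** True when reports exist but every one of them predates the build. */
    static boolean allReportsStale(File dir, long buildStartMillis) {
        int total = reports(dir).length;
        return total > 0 && staleReports(dir, buildStartMillis).size() == total;
    }
}
```

A fresh checkout, or deleting build/test/logs before the test target runs, keeps old reports from masking a build that never reached its tests.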



###
## FAILED TESTS (if any) 

Re: ZooKeeper 3.4.11 bug: dataDir and dataLogDir swapped

2018-02-05 Thread Patrick Hunt
This is a good point, Andor. I've updated the release page on the website to
reflect the regression addressed in ZOOKEEPER-2960 and the upcoming fix.

Thanks!

Patrick

On Fri, Feb 2, 2018 at 1:07 AM, Andor Molnar  wrote:

> Hi all,
>
> Please be aware that 3.4.11 has quite an unfortunate bug which causes
> ZooKeeper to swap the dataDir and dataLogDir parameters. If you configured ZK
> to use separate txn and snapshot folders in these two options and plan to
> upgrade, you'll find that ZK tries to load transaction logs from the
> snapshot folder and vice versa.
>
> A fix is on the way: 3.4.12 will be released soon, and it's recommended to
> postpone upgrading ZooKeeper until then.
>
> *dev*
> I think it'd be useful to add a similar warning message to the Releases
> page too.
>
> Regards,
> Andor
>
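When dataDir and dataLogDir point at different folders, ZooKeeper 3.4 keeps snapshot files (snapshot.<zxid>) under dataDir and transaction logs (log.<zxid>) under dataLogDir, each inside a version-2 subdirectory. As a hedged illustration, the hypothetical `DataDirLayoutCheck` below reads those two settings (zoo.cfg uses key=value lines, so java.util.Properties can read it) and flags files sitting in the wrong folder, which is one way to spot the swap described above on disk. It only inspects the layout; it does not detect or work around the 3.4.11 regression itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-upgrade check: flags transaction logs found in dataDir and
// snapshots found in dataLogDir. Layout inspection only.
final class DataDirLayoutCheck {
    private DataDirLayoutCheck() {}

    static List<String> findMisplacedFiles(Properties cfg) {
        List<String> problems = new ArrayList<>();
        String dataDir = cfg.getProperty("dataDir");
        String dataLogDir = cfg.getProperty("dataLogDir", dataDir);
        if (dataDir == null || dataDir.equals(dataLogDir)) {
            return problems; // nothing to compare when both kinds of file share a folder
        }
        // ZooKeeper stores its files in a "version-2" subdirectory of each folder.
        for (String name : list(new File(dataDir, "version-2"))) {
            if (name.startsWith("log.")) {
                problems.add("transaction log in dataDir: " + name);
            }
        }
        for (String name : list(new File(dataLogDir, "version-2"))) {
            if (name.startsWith("snapshot.")) {
                problems.add("snapshot in dataLogDir: " + name);
            }
        }
        return problems;
    }

    /** Loads a zoo.cfg file as key=value properties. */
    static Properties load(File zooCfg) throws IOException {
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(zooCfg)) {
            p.load(in);
        }
        return p;
    }

    private static String[] list(File dir) {
        String[] names = dir.list();
        return names == null ? new String[0] : names; // missing folder: nothing to report
    }
}
```

Typical use would be `DataDirLayoutCheck.findMisplacedFiles(DataDirLayoutCheck.load(new File("conf/zoo.cfg")))`, where the path is an assumption about the local install.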


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-02-05 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r166097161
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
+ * 
--- End diff --

was this accidental?


---


[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-02-05 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352910#comment-16352910 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r166105622
  
--- Diff: src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java ---
@@ -117,8 +116,32 @@ public void testTwoInvalidHostAddresses() {
 list.add(new InetSocketAddress("a", 2181));
 list.add(new InetSocketAddress("b", 2181));
 new StaticHostProvider(list);
+   }
+
+@Test
+public void testReResolving() {
+byte size = 1;
+ArrayList<InetSocketAddress> list = new ArrayList<InetSocketAddress>(size);
+
+// Test a hostname that resolves to multiple addresses
+list.add(InetSocketAddress.createUnresolved("www.apache.org", 1234));
--- End diff --

I'm wondering if it's possible to mock this out? It would be great if our 
unit tests were not dependent on some other infrastructure.
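One common way to take DNS out of a test like testReResolving is to put name resolution behind a small interface: production code delegates to InetAddress.getAllByName, and the test swaps in a fake with canned answers. The sketch below assumes such a seam; `HostResolver`, `DnsResolver` and `FakeResolver` are hypothetical names, not part of PR #451 or the ZooKeeper API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical seam: production code calls InetAddress.getAllByName, while a
// unit test supplies canned answers and never touches real DNS.
interface HostResolver {
    InetAddress[] getAllByName(String host) throws UnknownHostException;
}

final class DnsResolver implements HostResolver {
    @Override
    public InetAddress[] getAllByName(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }
}

// Test double: answers come from a map the test can change between calls,
// e.g. to simulate a ZooKeeper container receiving a new IP address.
final class FakeResolver implements HostResolver {
    private final Map<String, InetAddress[]> answers = new HashMap<>();
    private final List<String> lookups = new ArrayList<>();

    void answer(String host, InetAddress... addresses) {
        answers.put(host, addresses);
    }

    /** Hostnames looked up so far, in order. */
    List<String> lookups() {
        return lookups;
    }

    @Override
    public InetAddress[] getAllByName(String host) throws UnknownHostException {
        lookups.add(host);
        InetAddress[] result = answers.get(host);
        if (result == null) {
            throw new UnknownHostException(host);
        }
        return result.clone();
    }
}
```

A host provider would then accept a resolver in its constructor (or a test-only constructor), so the test can change the fake's answer between calls and assert that the provider picked up the new address.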


> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: easyfix, patch
> Fix For: 3.5.4, 3.4.12
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
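The issue asks the client to re-resolve hostnames when connection attempts fail, instead of keeping only the addresses resolved at construction time. A minimal sketch, assuming a simple policy of resolving the configured host strings again after every address in the current list has been handed out once: `ReResolvingHostProvider` is a hypothetical class that does not implement ZooKeeper's HostProvider interface, and it is not the approach taken in PR #451.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: keep the configured host strings and resolve them
// again each time a full pass over the resolved list has been handed out.
final class ReResolvingHostProvider {
    private final List<InetSocketAddress> configured;
    private List<InetSocketAddress> resolved = new ArrayList<>();
    private int currentIndex = -1;
    private int resolutions = 0;

    ReResolvingHostProvider(Collection<InetSocketAddress> serverAddresses) {
        this.configured = new ArrayList<>(serverAddresses);
        resolveAll();
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException("A HostProvider may not be empty!");
        }
    }

    /** Returns the next address to try, re-resolving after each full pass. */
    InetSocketAddress next() {
        ++currentIndex;
        if (currentIndex >= resolved.size()) {
            // Every address has been tried once: look the names up again in
            // case DNS or /etc/hosts now points somewhere else.
            List<InetSocketAddress> previous = resolved;
            resolveAll();
            if (resolved.isEmpty()) {
                resolved = previous; // nothing resolved: keep the old list
            }
            currentIndex = 0;
        }
        return resolved.get(currentIndex);
    }

    int size() {
        return resolved.size();
    }

    /** Number of resolution rounds so far (for tests). */
    int resolutions() {
        return resolutions;
    }

    private void resolveAll() {
        List<InetSocketAddress> fresh = new ArrayList<>();
        for (InetSocketAddress address : configured) {
            try {
                // getHostString() (Java 7+) avoids a reverse lookup for literal IPs.
                for (InetAddress ia : InetAddress.getAllByName(address.getHostString())) {
                    fresh.add(new InetSocketAddress(ia, address.getPort()));
                }
            } catch (UnknownHostException e) {
                // Skip hosts that do not resolve right now; they may resolve later.
            }
        }
        Collections.shuffle(fresh);
        resolved = fresh;
        resolutions++;
    }
}
```

Re-resolving once per pass keeps DNS traffic proportional to failed rounds rather than to individual attempts; a production version would probably also need logging and a spin delay like the one next(long spinDelay) takes.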


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-02-05 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r166103404
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---
@@ -58,48 +61,122 @@
 public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) {
 for (InetSocketAddress address : serverAddresses) {
 try {
-InetAddress ia = address.getAddress();
-InetAddress resolvedAddresses[] = InetAddress.getAllByName((ia != null) ? ia.getHostAddress() :
-address.getHostName());
+InetAddress resolvedAddresses[] = InetAddress.getAllByName(getHostString(address));
 for (InetAddress resolvedAddress : resolvedAddresses) {
-// If hostName is null but the address is not, we can tell that
-// the hostName is an literal IP address. Then we can set the host string as the hostname
-// safely to avoid reverse DNS lookup.
-// As far as i know, the only way to check if the hostName is null is use toString().
-// Both the two implementations of InetAddress are final class, so we can trust the return value of
-// the toString() method.
-if (resolvedAddress.toString().startsWith("/")
-&& resolvedAddress.getAddress() != null) {
-this.serverAddresses.add(
-new InetSocketAddress(InetAddress.getByAddress(
-address.getHostName(),
-resolvedAddress.getAddress()),
-address.getPort()));
-} else {
-this.serverAddresses.add(new InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
-}
+this.serverAddresses.add(new InetSocketAddress(resolvedAddress, address.getPort()));
 }
 } catch (UnknownHostException e) {
 LOG.error("Unable to connect to server: {}", address, e);
 }
 }
-
+
 if (this.serverAddresses.isEmpty()) {
 throw new IllegalArgumentException(
 "A HostProvider may not be empty!");
 }
 Collections.shuffle(this.serverAddresses);
 }
 
+/**
+ * Evaluate to a hostname if one is available and otherwise it returns the
+ * string representation of the IP address.
+ *
+ * In Java 7, we have a method getHostString, but earlier versions do not support it.
+ * This method is to provide a replacement for InetSocketAddress.getHostString().
+ *
+ * @param addr
+ * @return Hostname string of address parameter
+ */
+private String getHostString(InetSocketAddress addr) {
+String hostString = "";
+
+if (addr == null) {
+return hostString;
+}
+if (!addr.isUnresolved()) {
+InetAddress ia = addr.getAddress();
+
+// If the string starts with '/', then it has no hostname
+// and we want to avoid the reverse lookup, so we return
+// the string representation of the address.
+if (ia.toString().startsWith("/")) {
+hostString = ia.getHostAddress();
+} else {
+hostString = addr.getHostName();
+}
+} else {
+// According to the Java 6 documentation, if the hostname is
+// unresolved, then the string before the colon is the hostname.
+String addrString = addr.toString();
+hostString = addrString.substring(0, addrString.lastIndexOf(':'));
+}
+
+return hostString;
+}
+
 public int size() {
 return serverAddresses.size();
 }
 
+// Counts the number of addresses added and removed during
+// the last call to next. Used mainly for test purposes.
+// See StaticHostProviderTest.
+private int nextAdded = 0;
+private int nextRemoved = 0;
+
+public int getNextAdded() {
+return nextAdded;
+}
+
+public int getNextRemoved() {
+return nextRemoved;
+}
+
 public InetSocketAddress next(long spinDelay) {
-++currentIndex;
-if (currentIndex == serverAddresses.size()) {
-currentIndex = 0;
+// Handle possible connection error by re-resolving hostname if possible
+if 
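The patch's getHostString(InetSocketAddress) helper reimplements InetSocketAddress.getHostString(), which is public from Java 7 onward and returns the hostname when one is known and otherwise the literal IP address, without a reverse lookup. The demonstration below applies the built-in method to the three kinds of address the helper distinguishes (unresolved, resolved without a hostname, resolved with a hostname); `HostStrings` is a hypothetical wrapper. One caveat about the Java 6 fallback: the text of InetSocketAddress.toString() is not a specified format, and newer JDK releases print unresolved addresses as hostname/<unresolved>:port, which the substring-before-the-last-colon approach would not handle.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Demonstrates InetSocketAddress.getHostString() (Java 7+). HostStrings is a
// hypothetical wrapper used for illustration only.
final class HostStrings {
    private HostStrings() {}

    /** Hostname if known, else the literal IP; "" for null, like the patch's helper. */
    static String of(InetSocketAddress addr) {
        return addr == null ? "" : addr.getHostString();
    }

    /** Builds a resolved address from raw bytes, optionally tagged with a hostname. */
    static InetSocketAddress resolved(String hostnameOrNull, byte[] ip, int port)
            throws UnknownHostException {
        // getByAddress never queries DNS; it only checks the array length.
        InetAddress ia = hostnameOrNull == null
                ? InetAddress.getByAddress(ip)
                : InetAddress.getByAddress(hostnameOrNull, ip);
        return new InetSocketAddress(ia, port);
    }
}
```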

[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-02-05 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r166102194
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---

Re: Stable version for SSL/TLS support

2018-02-05 Thread Abraham Fine
Hi Makarand-

There are currently no stable versions of ZooKeeper that support SSL, and I
cannot comment on when they will be available.

To the best of my knowledge, there is no ongoing effort to backport TLS support
to 3.4.

Thanks,
Abe

On Wed, Jan 24, 2018, at 01:43, Andor Molnar wrote:
> Hi Makarand,
> 
> AFAIK there's an ongoing effort to backport TLS support to version 3.4,
> which is currently the stable branch of ZooKeeper. Abe Fine can comment on
> the progress, I believe.
> As you said, versions 3.5 and 3.6 are still in alpha and not recommended for
> production use. There's no ETA yet for making them stable.
> 
> Regards,
> Andor
> 
> 
> 
> On Wed, Jan 24, 2018 at 9:52 AM, Makarand Manohar Sarmalkar Manohar
> Sarmalkar  wrote:
> 
> > Hello,
> >
> > We want to enable SSL/TLS communication for Kafka and ZooKeeper.
> > For ZooKeeper, I found the JIRA ticket
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2125, whose fix versions are
> > 3.5.1 and 3.6.0, which apparently means SSL is enabled in version 3.5.1. But
> > versions 3.5.1 and above are alpha and beta (they cannot be used in production).
> >
> > So can you please tell me which version is stable and supports
> > SSL/TLS communication?
> > If there is no stable version yet, will a stable release be
> > available in the coming weeks?
> >
> >
> > --
> > Thanks & Regards
> > Makarand Sarmalkar
> >
> >
> >
> >


[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-02-05 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352637#comment-16352637 ] 

ASF GitHub Bot commented on ZOOKEEPER-2930:
---

Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r166036745
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -689,15 +669,15 @@ synchronized void connectOne(long sid){
 Map lastProposedView = lastSeenQV.getAllMembers();
 if (lastCommittedView.containsKey(sid)) {
 knownId = true;
-if (connectOne(sid, lastCommittedView.get(sid).electionAddr))
-return;
--- End diff --

This part of the change isn't quite right since it relied on connectOne 
returning false on an IOException calling sock.connect(). We will no longer 
attempt to use lastProposedView.get(sid).electionAddr in the case that a 
connection using the lastCommittedView failed and the electionAddr has changed. 
I don't know what effect this will have. Maybe I need to move this condition 
into the async connection mechanism too?
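Before the change, connectOne(sid) tried the election address from the last committed view and, if that failed and the last proposed view listed a different address, tried that one too. If connecting moves onto a background thread, one way to keep that fallback is to compute the ordered candidate list up front and let the background task walk it. The sketch below assumes that design; `ElectionAddressFallback` and its `Connector` callback are hypothetical and do not correspond to QuorumCnxManager methods.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: keep the "committed view first, then proposed view if
// its address differs" fallback when the connect runs on another thread.
final class ElectionAddressFallback {
    /** Attempts one connection; returns false on failure (e.g. an IOException). */
    interface Connector {
        boolean tryConnect(long sid, InetSocketAddress electionAddr);
    }

    private ElectionAddressFallback() {}

    /** Candidate addresses in the order the pre-change code tried them. */
    static List<InetSocketAddress> candidates(long sid,
            Map<Long, InetSocketAddress> committed,
            Map<Long, InetSocketAddress> proposed) {
        List<InetSocketAddress> result = new ArrayList<>();
        InetSocketAddress c = committed.get(sid);
        InetSocketAddress p = proposed.get(sid);
        if (c != null) {
            result.add(c);
        }
        if (p != null && !p.equals(c)) {
            result.add(p); // only worth a second attempt if the address changed
        }
        return result;
    }

    /** Runs the whole fallback chain off the caller's thread. */
    static Future<Boolean> connectAsync(ExecutorService pool, Connector connector, long sid,
            Map<Long, InetSocketAddress> committed, Map<Long, InetSocketAddress> proposed) {
        List<InetSocketAddress> order = candidates(sid, committed, proposed);
        return pool.submit(() -> {
            for (InetSocketAddress addr : order) {
                if (connector.tryConnect(sid, addr)) {
                    return true;
                }
            }
            return false;
        });
    }
}
```

Snapshotting the candidates before submitting means the background task does not need to read the views later, when they may have changed.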


> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.5.4, 3.4.12
> Environment: Java 8
> ZooKeeper 3.4.11 (from GitHub)
> CentOS 6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deployed a ZooKeeper cluster with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" command.
> A new leader should have been elected within a few seconds, but instead
> ofs_zk1 and ofs_zk3 kept running the election again and again, and neither
> of them became the new leader.
> I changed the log level to DEBUG (the default is INFO) and restarted the ZooKeeper
> servers on ofs_zk1 and ofs_zk2, but that did not fix the problem.
> I read the log and the ZooKeeper source code, and I think I found the reason.
> When the potential leader (say, ofs_zk3) begins the election
> (FastLeaderElection.lookForLeader()), it sends notifications to all the servers.
> When it fails to receive any notification within a timeout, it resends
> the notifications and doubles the timeout. This repeats until a
> notification is received or the timeout reaches a maximum value.
> FastLeaderElection.sendNotifications() just puts the notification message
> into a queue and returns. The WorkerSender is responsible for sending the
> notifications.
> The WorkerSender processes the notifications one by one, passing each of them
> to QuorumCnxManager. Here is the problem: QuorumCnxManager.toSend() blocks
> for a long time when a notification is sent to ofs_zk2 (whose network is down),
> so other notifications (those addressed to ofs_zk1) are also blocked for a long
> time. The repeated notifications from FastLeaderElection.sendNotifications()
> just make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
>     /*
>      * If sending message to myself, then simply enqueue it (loopback).
>      */
>     if (this.mySid == sid) {
>         b.position(0);
>         addToRecvQueue(new Message(b.duplicate(), sid));
>     /*
>      * Otherwise send to the corresponding thread to send.
>      */
>     } else {
>         /*
>          * Start a new connection if doesn't have one already.
>          */
>         ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
>         ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
>         if (bqExisting != null) {
>             addToSendQueue(bqExisting, b);
>         } else {
>             addToSendQueue(bq, b);
>         }
>
>         // This may block!!!
>         connectOne(sid);
>     }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for the
> epoch ack, but ofs_zk1 has in fact not received the notification (which
> says the leader is ofs_zk3), because ofs_zk3 has not sent that
> notification yet (it may still be in the send queue of WorkerSender). In the
> end, the potential leader ofs_zk3 fails to
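The report's analysis is that toSend() enqueues a message and then calls connectOne(sid) on the caller's thread, so the single WorkerSender stalls on an unreachable peer and delays notifications meant for the others. A minimal sketch of one alternative, assuming a separate executor for connection attempts: toSend() only enqueues and schedules at most one in-flight attempt per peer. `NonBlockingSender`, its `Connector` callback and the queue capacity of one are assumptions for illustration (the loopback branch is omitted); this is not the fix in PR #456.

```java
import java.nio.ByteBuffer;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch: toSend() never blocks on a connection attempt, so a
// peer whose network is down cannot stall messages for the other peers.
final class NonBlockingSender {
    interface Connector {
        void connect(long sid); // may block for a long time, e.g. on a socket timeout
    }

    private static final int SEND_CAPACITY = 1; // arbitrary for this sketch
    private final ConcurrentHashMap<Long, BlockingQueue<ByteBuffer>> queueSendMap = new ConcurrentHashMap<>();
    private final Set<Long> inFlight = ConcurrentHashMap.newKeySet();
    private final ExecutorService connectPool;
    private final Connector connector;

    NonBlockingSender(ExecutorService connectPool, Connector connector) {
        this.connectPool = connectPool;
        this.connector = connector;
    }

    void toSend(long sid, ByteBuffer b) {
        BlockingQueue<ByteBuffer> bq = queueSendMap.computeIfAbsent(sid,
                k -> new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY));
        // When the queue is full, drop the oldest message so the newest one wins.
        while (!bq.offer(b)) {
            bq.poll();
        }
        // Schedule at most one connection attempt per peer at a time.
        if (inFlight.add(sid)) {
            connectPool.execute(() -> {
                try {
                    connector.connect(sid);
                } finally {
                    inFlight.remove(sid);
                }
            });
        }
    }

    BlockingQueue<ByteBuffer> queueFor(long sid) {
        return queueSendMap.get(sid);
    }
}
```

With a bounded pool, enough simultaneously unreachable peers could still use up every thread, so the pool size (or per-attempt timeouts) matters as much as moving the call off the sender thread.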

[GitHub] zookeeper pull request #456: ZOOKEEPER-2930: Leader cannot be elected due to...

2018-02-05 Thread JonathanO
Github user JonathanO commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/456#discussion_r166036745
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java ---
@@ -689,15 +669,15 @@ synchronized void connectOne(long sid){
 Map lastProposedView = lastSeenQV.getAllMembers();
 if (lastCommittedView.containsKey(sid)) {
 knownId = true;
-if (connectOne(sid, lastCommittedView.get(sid).electionAddr))
-return;
--- End diff --

This part of the change isn't quite right, since it relied on connectOne 
returning false when sock.connect() throws an IOException. We will no longer 
attempt to use lastProposedView.get(sid).electionAddr when a connection using 
the lastCommittedView address fails and the electionAddr has changed. 
I don't know what effect this will have. Maybe I need to move this condition 
into the async connection mechanism too?
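One way to keep the old fallback once connection setup moves off the calling thread is to run the whole try-committed-then-proposed sequence inside the background task, so that a failed attempt still falls through to the proposed address. The sketch below assumes a hypothetical `tryConnect` callback that returns `false` where `sock.connect()` would throw an `IOException`, and it uses plain `InetSocketAddress` maps in place of ZooKeeper's quorum views. It is not the code in PR #456; the host names and port are made up for illustration.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of connectOne(sid)'s committed-then-proposed fallback. */
class ConnectFallback {
    /**
     * Tries the election address from the last committed view first; if that
     * fails and the last proposed view has a different address for the same
     * sid, tries that one too. Returns the address that worked, or null.
     */
    static InetSocketAddress connectWithFallback(long sid,
            Map<Long, InetSocketAddress> committedView,
            Map<Long, InetSocketAddress> proposedView,
            Predicate<InetSocketAddress> tryConnect) {
        InetSocketAddress committed = committedView.get(sid);
        if (committed != null && tryConnect.test(committed)) {
            return committed;
        }
        InetSocketAddress proposed = proposedView.get(sid);
        // Only retry when the electionAddr actually changed between the views.
        if (proposed != null && !proposed.equals(committed) && tryConnect.test(proposed)) {
            return proposed;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Long, InetSocketAddress> committed = new HashMap<>();
        Map<Long, InetSocketAddress> proposed = new HashMap<>();
        committed.put(3L, InetSocketAddress.createUnresolved("old-host", 3888));
        proposed.put(3L, InetSocketAddress.createUnresolved("new-host", 3888));
        // Pretend only the new address accepts connections.
        InetSocketAddress used = connectWithFallback(3L, committed, proposed,
                addr -> addr.getHostString().equals("new-host"));
        System.out.println(used.getHostString()); // prints new-host
    }
}
```

In an async design, a method like this would be the body of the per-sid connection task, so the caller of `toSend()` never waits on either attempt.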


---


Failed: ZOOKEEPER- PreCommit Build #1446

2018-02-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1446/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.50 MB...]
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 1 release audit 
warnings (more than the trunk's current 0 warnings).
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1446//testReport/
 [exec] Release audit warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1446//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1446//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1446//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 2

Total time: 13 minutes 52 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-1534
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.EphemeralNodeDeletionTest.testEphemeralNodeDeletion

Error Message:
After session close ephemeral node must be deleted expected null, but 
was:<4294967302,4294967302,1517848434347,1517848434347,0,0,0,144226611127517184,1,0,4294967302
>

Stack Trace:
junit.framework.AssertionFailedError: After session close ephemeral node must 
be deleted expected null, but 
was:<4294967302,4294967302,1517848434347,1517848434347,0,0,0,144226611127517184,1,0,4294967302
>
at 
org.apache.zookeeper.server.quorum.EphemeralNodeDeletionTest.testEphemeralNodeDeletion(EphemeralNodeDeletionTest.java:156)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.lang.Thread.run(Thread.java:745)
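The assertion message prints the leftover node's Stat as comma-separated values. Assuming the fields appear in the Stat record's declaration order (czxid, mzxid, ctime, mtime, version, cversion, aversion, ephemeralOwner, dataLength, numChildren, pzxid) and the usual layouts (a zxid's high 32 bits are the epoch and its low 32 bits a counter; a session id's top byte is the id of the server that created the session), the values can be decoded as below. A non-zero ephemeralOwner means the node was still attached to a session when the assertion ran. The helper class is hypothetical and not part of the test.

```java
/** Hypothetical helpers for reading zxids and session ids from test output. */
class StatDecode {
    static long epoch(long zxid) { return zxid >>> 32; }           // high 32 bits
    static long counter(long zxid) { return zxid & 0xFFFFFFFFL; }  // low 32 bits
    static long creatorServerId(long sessionId) { return sessionId >>> 56; } // top byte

    public static void main(String[] args) {
        String stat = "4294967302,4294967302,1517848434347,1517848434347,0,0,0,"
                + "144226611127517184,1,0,4294967302";
        String[] f = stat.split(",");
        long czxid = Long.parseLong(f[0]);
        long ephemeralOwner = Long.parseLong(f[7]);
        System.out.println("czxid epoch=" + epoch(czxid) + " counter=" + counter(czxid));
        // prints czxid epoch=1 counter=6
        System.out.println("ephemeralOwner server id=" + creatorServerId(ephemeralOwner));
        // prints ephemeralOwner server id=2
    }
}
```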

[jira] [Commented] (ZOOKEEPER-1534) Zookeeper server do not send Sal authentication failure notification to the client

2018-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352617#comment-16352617
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1534:
---

GitHub user craz186 opened a pull request:

https://github.com/apache/zookeeper/pull/457

ZOOKEEPER-1534: ZookeeperServer now returns AuthFailed events for SASL cred 
failures

ZookeeperServer previously closed client connections instead of returning 
AuthFailed events for SASL authentication failures.
This PR changes the ZooKeeper server to return an AuthFailed event and then 
close the connection.
I am unsure of the convention for SetSaslResponses and would appreciate any 
feedback on how to represent a failed authentication through SetSaslResponse 
objects. Currently I am just returning a string.

Note: The unit test I've supplied only works with a real ZKServer; the testing 
server seems to hide this bug, and I have been unable to reproduce it with the 
testing server.
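The ordering this PR describes (report the failure to the client, then close) can be sketched as follows. The `Connection` and `SaslStep` interfaces are hypothetical stand-ins for the server-side connection and the SASL evaluation step, not ZooKeeper's actual classes; the failure is reported as a plain string, as the PR currently does.

```java
import java.util.ArrayList;
import java.util.List;
import javax.security.sasl.SaslException;

/** Hypothetical sketch of "send AuthFailed, then close" on SASL failure. */
class SaslFailureSketch {
    /** Stand-in for a server-side client connection. */
    interface Connection {
        void sendSaslResponse(byte[] token);
        void sendAuthFailed(String reason);
        void close();
    }

    /** Stand-in for one SASL round, e.g. a wrapper around SaslServer.evaluateResponse. */
    interface SaslStep {
        byte[] evaluate(byte[] clientToken) throws SaslException;
    }

    static void handleSaslToken(Connection cnxn, SaslStep step, byte[] clientToken) {
        try {
            cnxn.sendSaslResponse(step.evaluate(clientToken));
        } catch (SaslException e) {
            // Previously the connection was simply closed here, so the client never
            // learned why. Reporting the failure first lets it surface AuthFailed.
            cnxn.sendAuthFailed(e.getMessage());
            cnxn.close();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        Connection recorder = new Connection() {
            public void sendSaslResponse(byte[] token) { events.add("response"); }
            public void sendAuthFailed(String reason) { events.add("authFailed: " + reason); }
            public void close() { events.add("close"); }
        };
        handleSaslToken(recorder, token -> { throw new SaslException("bad credentials"); }, new byte[0]);
        System.out.println(events); // prints [authFailed: bad credentials, close]
    }
}
```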

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/craz186/zookeeper ZOOKEEPER-1534

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit add6963b8e62f3ccdaf80f1a02428544c3a105d8
Author: sean.gibbons 
Date:   2018-02-05T16:09:59Z

ZOOKEEPER-1534: ZookeeperServer now returns AuthFailed events instead of 
closing client connection when SASL authentication uses invalid credentials, 
added unit test to demonstrate




> Zookeeper server do not send Sal authentication failure notification to the 
> client
> --
>
> Key: ZOOKEEPER-1534
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1534
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.3
> Environment: Windows 7. Zookeeper 3.4.3 Curator 1.1.15  Java 1.6
>Reporter: Tally Tsabary
>Priority: Major
>
> Server side: zookeeper 3.4.3 with patch ZOOKEEPER-1437.patch 22/Jun/12 00:24
> Client side: java, Curator 1.1.15, zookeeper 3.4.3 with patch 
> ZOOKEEPER-1437.patch 22/Jun/12 00:24
> The environment is configured to use SASL authentication.
> When authentication succeeds, everything works fine.
> When authentication fails, the zk server appears to catch the SaslException 
> and close the socket without sending any further notification to the client, 
> so although the client implements handling for SASL authentication failure, 
> that code is never used.
>  
> Details:
> =
>  
>  
> zk server log:
> {noformat}
> 2012-08-10 11:00:46,730 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:50208
> 2012-08-10 11:00:46,731 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@780] - Session establishment request from client /127.0.0.1:50208 client's lastZxid is 0x0
> 2012-08-10 11:00:46,731 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:50208
> 2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@88] - Processing request:: sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a
> 2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a
> 2012-08-10 11:00:46,734 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@604] - Established session 0x1390fd2ee630004 with negotiated timeout 4 for client /127.0.0.1:50208
> 2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token.
> 2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 0
> 2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@954] - Size of server SASL response: 101
> 2012-08-10 11:00:46,740 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token.
> 2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 272
> 2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@106] - client supplied realm: 

[GitHub] zookeeper pull request #457: ZOOKEEPER-1534: ZookeeperServer now returns Aut...

2018-02-05 Thread craz186
GitHub user craz186 opened a pull request:

https://github.com/apache/zookeeper/pull/457

ZOOKEEPER-1534: ZookeeperServer now returns AuthFailed events for SASL cred 
failures

ZookeeperServer previously closed client connections instead of returning 
AuthFailed events for SASL authentication failures.
This PR changes the ZooKeeper server to return an AuthFailed event and then 
close the connection.
I am unsure of the convention for SetSaslResponses and would appreciate any 
feedback on how to represent a failed authentication through SetSaslResponse 
objects. Currently I am just returning a string.

Note: The unit test I've supplied only works with a real ZKServer; the testing 
server seems to hide this bug, and I have been unable to reproduce it with the 
testing server.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/craz186/zookeeper ZOOKEEPER-1534

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit add6963b8e62f3ccdaf80f1a02428544c3a105d8
Author: sean.gibbons 
Date:   2018-02-05T16:09:59Z

ZOOKEEPER-1534: ZookeeperServer now returns AuthFailed events instead of 
closing client connection when SASL authentication uses invalid credentials, 
added unit test to demonstrate




---


[jira] [Assigned] (ZOOKEEPER-1990) suspicious instantiation of java Random instances

2018-02-05 Thread Mark Fenes (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Fenes reassigned ZOOKEEPER-1990:
-

Assignee: Mark Fenes

> suspicious instantiation of java Random instances
> -
>
> Key: ZOOKEEPER-1990
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1990
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Patrick Hunt
>Assignee: Mark Fenes
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
>
> It's not clear to me why we are doing this, but it looks very suspicious. Why 
> aren't we just calling "new Random()" in these cases? (even for the tests I 
> don't really see it - typically that would just be for repeatability)
> {noformat}
> ag "new Random[ \t]*\(" .
> src/java/main/org/apache/zookeeper/ClientCnxn.java
> 817:private Random r = new Random(System.nanoTime());
> src/java/main/org/apache/zookeeper/client/StaticHostProvider.java
> 75:   sourceOfRandomness = new Random(System.currentTimeMillis() ^ this.hashCode());
> 98:sourceOfRandomness = new Random(randomnessSeed);
> src/java/main/org/apache/zookeeper/server/quorum/AuthFastLeaderElection.java
> 420:rand = new Random(java.lang.Thread.currentThread().getId()
> src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java
> 64:private final Random r = new Random(System.nanoTime());
> src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
> 537:Random r = new Random(id ^ superSecret);
> 554:Random r = new Random(sessionId ^ superSecret);
> src/java/test/org/apache/zookeeper/server/quorum/WatchLeakTest.java
> 271:Random r = new Random(SESSION_ID ^ superSecret);
> src/java/test/org/apache/zookeeper/server/quorum/CommitProcessorTest.java
> 151:Random rand = new Random(Thread.currentThread().getId());
> 258:Random rand = new Random(Thread.currentThread().getId());
> 288:Random rand = new Random(Thread.currentThread().getId());
> src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java
> 40:private Random r = new Random(1);
> {noformat}
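A short illustration of the three seeding patterns in the list above. Java's no-argument `Random()` constructor already picks a seed that is very likely distinct from other instances, so an explicit `System.nanoTime()` seed adds little. A fixed seed such as `new Random(1)` replays the same sequence, which is what makes a test repeatable. A seed derived from an id, as in the `id ^ superSecret` lines, reproduces the same bytes whenever the same id is supplied, so the value can be recomputed instead of stored. The secret value and method names below are hypothetical, and whether ZooKeeperServer relies on that reproducibility is an assumption worth confirming in its source.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of seeded vs. unseeded java.util.Random. */
class RandomSeeding {
    // Hypothetical constant standing in for a server-side secret.
    static final long SECRET = 0x5EC2E7L;

    /** Same id, same 16 bytes: the value can be recomputed later instead of stored. */
    static byte[] derive(long id) {
        Random r = new Random(id ^ SECRET);
        byte[] p = new byte[16];
        r.nextBytes(p);
        return p;
    }

    /** Verifies a candidate by regenerating the value from the id. */
    static boolean check(long id, byte[] candidate) {
        return Arrays.equals(candidate, derive(id));
    }

    /** A fixed seed replays the same sequence, which is what makes a test repeatable. */
    static int[] firstInts(long seed, int n) {
        Random r = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(check(42L, derive(42L)));                            // prints true
        System.out.println(Arrays.equals(firstInts(1L, 5), firstInts(1L, 5))); // prints true
        // Unseeded instances need no help from System.nanoTime() to diverge.
        System.out.println(new Random().nextLong() + " " + new Random().nextLong());
    }
}
```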



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review request for ZOOKEEPER-2845

2018-02-05 Thread Bobby Evans
I was really hoping to get a review for ZOOKEEPER-2845

https://github.com/apache/zookeeper/pull/453 (master)
https://github.com/apache/zookeeper/pull/454 (3.5 line)
https://github.com/apache/zookeeper/pull/455 (3.4 line)

The bug was exposed by changes made in ZOOKEEPER-2678 which went into
3.4.10.
There is a real, although rare, possibility of data corruption and because
ZooKeeper is mission critical to so many other projects I would love to get
this in before 3.4.12 is released.

Thanks,

Bobby Evans


ZooKeeper_branch35_jdk8 - Build # 835 - Still Failing

2018-02-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/835/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 59.93 KB...]
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 3
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.382 sec, Thread: 3, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.573 sec, Thread: 8, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.753 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 8
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.117 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
15.206 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.785 sec, Thread: 3, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.471 sec, Thread: 3, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
161.013 sec, Thread: 7, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 7
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.09 sec, Thread: 3, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.109 sec, Thread: 3, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.109 sec, Thread: 7, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 7
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 3
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
35.262 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 8
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.19 sec, Thread: 8, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 8
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.259 sec, Thread: 8, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 8
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.083 sec, Thread: 7, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.273 sec, Thread: 7, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
124.284 sec, Thread: 2, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.108 sec, Thread: 2, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
27.904 sec, Thread: 3, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
12.669 sec, Thread: 7, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
31.571 sec, Thread: 8, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
278.793 sec, Thread: 5, Class: org.apache.zookeeper.test.ReconfigTest

ZooKeeper-trunk-jdk8 - Build # 1365 - Failure

2018-02-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1365/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 60.80 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.076 sec, Thread: 6, Class: org.apache.zookeeper.test.SaslClientTest
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 1
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.653 sec, Thread: 4, Class: 
org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.675 sec, Thread: 1, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.583 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.079 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
36.044 sec, Thread: 8, Class: org.apache.zookeeper.test.ReadOnlyModeTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 2
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 1
[junit] Running org.apache.zookeeper.test.StatTest in thread 8
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.679 sec, Thread: 8, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.46 sec, Thread: 6, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.523 sec, Thread: 1, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.064 sec, Thread: 6, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 6
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.666 sec, Thread: 6, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.331 sec, Thread: 8, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 6
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 8
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.068 sec, Thread: 8, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 8
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.915 sec, Thread: 8, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 8
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.898 sec, Thread: 1, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.076 sec, Thread: 1, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.482 sec, Thread: 1, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.677 sec, Thread: 1, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.706 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.674 sec, Thread: 6, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
79.956 sec, Thread: 7, Class: org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.371 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest

ZooKeeper_branch35_openjdk7 - Build # 836 - Failure

2018-02-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/836/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 59.81 KB...]
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.648 sec, Thread: 8, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.626 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 8
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.104 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.467 sec, Thread: 7, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.536 sec, Thread: 7, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.783 sec, Thread: 7, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 7
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
70.853 sec, Thread: 5, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.638 sec, Thread: 7, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.075 sec, Thread: 7, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 5
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.732 sec, Thread: 5, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 5
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
80.002 sec, Thread: 4, Class: org.apache.zookeeper.test.QuorumTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.087 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.202 sec, Thread: 7, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.05 sec, Thread: 4, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 7
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.097 sec, Thread: 4, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.796 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.704 sec, Thread: 3, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.374 sec, Thread: 4, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.649 sec, Thread: 5, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
33.317 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
27.408 sec, Thread: 7, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
262.614 sec, Thread: 1, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
359.574 sec, Thread: 2, Class: org.apache.zookeeper.test.NettyNettySuiteTest
   

[jira] [Commented] (ZOOKEEPER-2930) Leader cannot be elected due to network timeout of some members.

2018-02-05 Thread Jonathan Oddy (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352198#comment-16352198
 ] 

Jonathan Oddy commented on ZOOKEEPER-2930:
--

So, I think what happens is this: if the 2nd node in the list dies in a way that 
causes new connections to time out, then the notification messages to the 3rd 
node are delayed by >=5s while those to the 1st node are delivered on time. 
(sendNotifications() queues a notification for all three nodes, including the 
local node, in order, and toSend() blocks while sending the message to the 
2nd node.)

This 5s delay means that if the 3rd node is elected, it will see the election 
complete >= 5s after the 1st node does. The 1st node attempts to connect to the 
3rd on the leader port 5 times with a 1s delay (both hard coded) but, since the 
3rd node hasn't seen the election complete, it hasn't started listening on that 
port yet. Unless you're very lucky with timing, the 1st node will give up and 
start a new election round before the 3rd realises that it has been elected. 
The 3rd node then sits there for initLimit before going back to the LOOKING 
state, leaving you with a broken cluster for at least initLimit.
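The race in the paragraph above comes down to simple arithmetic, sketched here under stated assumptions: the follower makes a fixed number of attempts (5) to reach the leader port with a fixed pause (1000 ms) between them, each refused attempt fails instantly, and the leader starts listening `leaderDelayMs` after the follower's first attempt. The class and method names are hypothetical.

```java
/** Hypothetical model of the follower's leader-port retry window. */
class LeaderPortRace {
    /**
     * Returns true if at least one connection attempt happens after the
     * leader has started listening.
     */
    static boolean followerConnects(long leaderDelayMs, int attempts, long retryDelayMs) {
        for (int i = 0; i < attempts; i++) {
            long attemptAtMs = i * retryDelayMs; // refused attempts are assumed to be instant
            if (attemptAtMs >= leaderDelayMs) {
                return true; // the port is open by the time this attempt is made
            }
        }
        return false; // the follower gives up and starts a new election round
    }

    public static void main(String[] args) {
        // With 5 attempts 1 s apart the last attempt is at 4000 ms,
        // so a leader that starts listening >= 5000 ms late is never reached.
        System.out.println(followerConnects(4000, 5, 1000)); // prints true
        System.out.println(followerConnects(5000, 5, 1000)); // prints false
    }
}
```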

My patch attempts to fix this by making the entire process of establishing a 
connection async, avoiding it blocking toSend().
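A minimal sketch of the idea, assuming the send queue is still filled synchronously and only connection setup moves to a background executor, so `toSend()` returns at once even when a peer is unreachable. It is not the code in PR #456: `blockingConnect` is a hypothetical stand-in for the real socket connect and handshake (assumed, like `connectOne()`, to return quickly if a connection already exists), `SEND_CAPACITY` is assumed to be 1, and the `connecting` set is one hypothetical way to avoid starting two connects to the same sid.

```java
import java.nio.ByteBuffer;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical non-blocking variant of QuorumCnxManager.toSend(); not the PR #456 code. */
class AsyncSender {
    static final int SEND_CAPACITY = 1; // assumed capacity of each per-peer queue

    final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap =
            new ConcurrentHashMap<>();
    private final Set<Long> connecting = ConcurrentHashMap.newKeySet();
    private final ExecutorService connectExecutor;
    private final LongConsumer blockingConnect; // stands in for socket connect + handshake

    AsyncSender(ExecutorService connectExecutor, LongConsumer blockingConnect) {
        this.connectExecutor = connectExecutor;
        this.blockingConnect = blockingConnect;
    }

    void toSend(long sid, ByteBuffer b) {
        ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.computeIfAbsent(
                sid, k -> new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY));
        // When the queue is full, drop the oldest message so the newest one is kept.
        while (!bq.offer(b)) {
            bq.poll();
        }
        // Start at most one background connect per sid; the caller never waits for it.
        if (connecting.add(sid)) {
            connectExecutor.execute(() -> {
                try {
                    blockingConnect.accept(sid);
                } finally {
                    connecting.remove(sid);
                }
            });
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService pool = Executors.newCachedThreadPool();
        // sid 2 behaves like an unreachable peer: its connect hangs until released.
        AsyncSender sender = new AsyncSender(pool, peer -> {
            if (peer == 2) {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        sender.toSend(2, ByteBuffer.wrap(new byte[] {2}));
        sender.toSend(1, ByteBuffer.wrap(new byte[] {1})); // not held up by sid 2
        System.out.println(sender.queueSendMap.get(1L).size()); // prints 1
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The trade-off is that a failed connect is no longer visible to the caller, which is why any fallback logic (such as trying the proposed-view address) has to live inside the background task.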

> Leader cannot be elected due to network timeout of some members.
> 
>
> Key: ZOOKEEPER-2930
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.5.4, 3.4.12
> Environment: Java 8
> ZooKeeper 3.4.11(from github)
> Centos6.5
>Reporter: Jiafu Jiang
>Priority: Critical
> Attachments: zoo.cfg, zookeeper1.log, zookeeper2.log
>
>
> I deployed a ZooKeeper cluster with three nodes:
> ofs_zk1:20.10.11.101, 30.10.11.101
> ofs_zk2:20.10.11.102, 30.10.11.102
> ofs_zk3:20.10.11.103, 30.10.11.103
> I shut down the network interfaces of ofs_zk2 using the "ifdown eth0 eth1" command.
> A new leader was expected to be elected within a few seconds, but in fact 
> ofs_zk1 and ofs_zk3 kept running elections again and again, and neither of 
> them became the new leader.
> I changed the log level to DEBUG (the default is INFO) and restarted the 
> zookeeper servers on ofs_zk1 and ofs_zk2, but that did not fix the problem.
> I read the log and the ZooKeeper source code, and I think I have found the reason.
> When the potential leader (say, ofs_zk3) begins the election 
> (FastLeaderElection.lookForLeader()), it sends notifications to all the 
> servers.
> If it receives no notification within a timeout, it resends the notifications 
> and doubles the timeout. This repeats until a notification is received or the 
> timeout reaches a maximum value.
> FastLeaderElection.sendNotifications() just puts the notification messages 
> into a queue and returns; the WorkerSender is responsible for sending them.
> The WorkerSender processes the notifications one by one, passing each to 
> QuorumCnxManager. Here is the problem: QuorumCnxManager.toSend() blocks for a 
> long time when the notification is sent to ofs_zk2 (whose network is down), so 
> notifications queued behind it (those for ofs_zk1) are also held up for a long 
> time. The repeated notifications from FastLeaderElection.sendNotifications() 
> only make things worse.
> Here is the related source code:
> {code:java}
> public void toSend(Long sid, ByteBuffer b) {
>     /*
>      * If sending message to myself, then simply enqueue it (loopback).
>      */
>     if (this.mySid == sid) {
>         b.position(0);
>         addToRecvQueue(new Message(b.duplicate(), sid));
>     /*
>      * Otherwise send to the corresponding thread to send.
>      */
>     } else {
>         /*
>          * Start a new connection if doesn't have one already.
>          */
>         ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
>         ArrayBlockingQueue<ByteBuffer> bqExisting = queueSendMap.putIfAbsent(sid, bq);
>         if (bqExisting != null) {
>             addToSendQueue(bqExisting, b);
>         } else {
>             addToSendQueue(bq, b);
>         }
>
>         // This may block!!!
>         connectOne(sid);
>     }
> }
> {code}
> Therefore, when ofs_zk3 believes that it is the leader, it begins to wait for 
> the epoch ack, but ofs_zk1 has not actually received the notification (which 
> says the leader is ofs_zk3), because ofs_zk3 has not yet sent the 
> notification (which may still exist in