Reply: Re: Leader election

2018-12-06 Thread 毛蛤丝
---> "Can it happen that we end up with 2 leaders or 0 leader for some period of
time (for example, during network delays/partitions)?"
look at the code:
https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java#L340
it can guarantee exactly one leader all the time(EPHEMERAL_SEQUENTIAL zk-node) 
which has not too much correlations with the network partitions of zk ensembles 
itself.
I guess,haha!
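
For anyone who wants to try it, a minimal LeaderSelector usage sketch looks roughly
like this (the connection string and election path are illustrative and error
handling is omitted). Note the caveat raised later in this thread: leadership is
only held while takeLeadership() blocks, and the adapter gives it up when the
session is suspended or lost, so brief overlaps are still possible.

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LeaderElectionExample {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "host1:2181,host2:2181,host3:2181",          // illustrative ensemble
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            LeaderSelector selector = new LeaderSelector(client, "/my-service/leader",
                    new LeaderSelectorListenerAdapter() {
                        @Override
                        public void takeLeadership(CuratorFramework curator) throws Exception {
                            // Leadership is held only while this method blocks; the adapter
                            // interrupts it when the connection is SUSPENDED or LOST.
                            System.out.println("I am the leader now");
                            Thread.currentThread().join();       // do leader work here instead
                        }
                    });
            selector.autoRequeue();   // rejoin the election after giving up leadership
            selector.start();
        }
    }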
- Original Message -
From: Michael Borokhovich 
To: dev@zookeeper.apache.org, maoling199210...@sina.com
Subject: Re: Leader election
Date: 2018-12-06 15:18

Thanks, I will check it out.
However, do you know if it gives any better guarantees?
Can it happen that we end up with 2 leaders or 0 leader for some period of
time (for example, during network delays/partitions)?
On Wed, Dec 5, 2018 at 10:54 PM 毛蛤丝  wrote:
> I suggest you use the ready-made implementation from Curator:
> http://curator.apache.org/curator-recipes/leader-election.html
> - Original Message -
> From: Michael Borokhovich 
> To: "dev@zookeeper.apache.org" 
> Subject: Leader election
> Date: 2018-12-06 07:29
>
> Hello,
> We have a service that runs on 3 hosts for high availability. However, at
> any given time, exactly one instance must be active. So, we are thinking to
> use Leader election using Zookeeper.
> To this goal, on each service host we also start a ZK server, so we have a
> 3-nodes ZK cluster and each service instance is a client to its dedicated
> ZK server.
> Then, we implement a leader election on top of Zookeeper using a basic
> recipe:
> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
> I have the following questions and doubts regarding the approach:
> 1. It seems like we can run into inconsistency issues when network
> partition occurs. Zookeeper documentation says that the inconsistency
> period may last “tens of seconds”. Am I understanding correctly that during
> this time we may have 0 or 2 leaders?
> 2. Is it possible to reduce this inconsistency time (let's say to 3
> seconds) by tweaking tickTime and syncLimit parameters?
> 3. Is there a way to guarantee exactly one leader all the time? Should we
> implement a more complex leader election algorithm than the one suggested
> in the recipe (using ephemeral_sequential nodes)?
> Thanks,
> Michael.
>
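
For reference, the recipe referenced in the quoted question boils down to roughly
the sketch below, written against the plain ZooKeeper client API. The election
path, the watcher wiring, and the omitted retry/error handling are illustrative
simplifications, not production code.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ElectionSketch {
        private static final String ROOT = "/election";   // illustrative; must already exist
        private final ZooKeeper zk;
        private String myNode;                             // e.g. /election/n_0000000042

        ElectionSketch(ZooKeeper zk) { this.zk = zk; }

        // Join the election once: create our ephemeral sequential znode.
        void enter() throws KeeperException, InterruptedException {
            myNode = zk.create(ROOT + "/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }

        // Returns true if our znode has the lowest sequence number, i.e. we lead.
        boolean checkLeadership() throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren(ROOT, false);
            Collections.sort(children);
            int myIndex = children.indexOf(myNode.substring(ROOT.length() + 1));
            if (myIndex == 0) {
                return true;
            }
            // Watch only the znode just before ours (not the leader's) to avoid a
            // herd effect; when it is deleted, this check should be run again.
            String previous = ROOT + "/" + children.get(myIndex - 1);
            if (zk.exists(previous, event -> { /* re-run checkLeadership() */ }) == null) {
                return checkLeadership();   // predecessor already gone; re-check immediately
            }
            return false;
        }
    }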


[GitHub] zookeeper issue #703: [ZOOKEEPER-1818] Correctly handle potential inconsiste...

2018-12-06 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/703
  
Committed to master branch. Thanks @lvfangmin !
Please create another pull request for branch-3.5.


---


[GitHub] zookeeper pull request #703: [ZOOKEEPER-1818] Correctly handle potential inc...

2018-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/703


---


[jira] [Commented] (ZOOKEEPER-3207) Watch related code being copied over twice when doing maven migration

2018-12-06 Thread Norbert Kalmar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711956#comment-16711956
 ] 

Norbert Kalmar commented on ZOOKEEPER-3207:
---

Thanks [~lvfangmin] for the catch!
I'll create the patch ASAP, but possibly only tomorrow. Or do you have one 
already (you assigned the patch to yourself)?

> Watch related code being copied over twice when doing maven migration
> -
>
> Key: ZOOKEEPER-3207
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3207
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Files like WatchManager.java and WatchesPathReport.java exist in both the 
> org/apache/zookeeper/server and org/apache/zookeeper/server/watch folders; 
> org/apache/zookeeper/server/watch is the right one, and it looks like we introduced 
> the other one by mistake in ZOOKEEPER-3032.





[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239615050
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/FileChangeWatcher.java
 ---
@@ -0,0 +1,253 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.common;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+import org.apache.zookeeper.server.ZooKeeperThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.file.ClosedWatchServiceException;
+import java.nio.file.FileSystem;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+import java.util.function.Consumer;
+
+/**
+ * Instances of this class can be used to watch a directory for file 
changes. When a file is added to, deleted from,
+ * or is modified in the given directory, the callback provided by the 
user will be called from a background thread.
+ * Some things to keep in mind:
+ * 
+ * The callback should be thread-safe.
+ * Changes that happen around the time the thread is started may be 
missed.
+ * There is a delay between a file changing and the callback 
firing.
+ * The watch is not recursive - changes to subdirectories will not 
trigger a callback.
+ * 
+ */
+public final class FileChangeWatcher {
--- End diff --

Leave as it is.


---


[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239614732
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java ---
@@ -446,4 +458,119 @@ private void configureSSLServerSocket(SSLServerSocket 
sslServerSocket) {
 LOG.debug("Using Java8-optimized cipher suites for Java version 
{}", javaVersion);
 return DEFAULT_CIPHERS_JAVA8;
 }
+
+/**
+ * Enables automatic reloading of the trust store and key store files 
when they change on disk.
+ *
+ * @throws IOException if creating the FileChangeWatcher objects fails.
+ */
+public void enableCertFileReloading() throws IOException {
+LOG.info("enabling cert file reloading");
+ZKConfig config = new ZKConfig();
+String keyStoreLocation = 
config.getProperty(sslKeystoreLocationProperty);
+if (keyStoreLocation != null && !keyStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(keyStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Key store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newKeyStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+}
+keyStoreFileWatcher = newKeyStoreFileWatcher;
+keyStoreFileWatcher.start();
+}
+String trustStoreLocation = 
config.getProperty(sslTruststoreLocationProperty);
+if (trustStoreLocation != null && !trustStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(trustStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Trust store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newTrustStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+}
+trustStoreFileWatcher = newTrustStoreFileWatcher;
+trustStoreFileWatcher.start();
+}
+}
+
+/**
+ * Disables automatic reloading of the trust store and key store files 
when they change on disk.
+ * Stops background threads and closes WatchService instances.
+ */
+public void disableCertFileReloading() {
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+keyStoreFileWatcher = null;
+}
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+trustStoreFileWatcher = null;
+}
+}
+
+// Finalizer guardian object, see Effective Java item 7
+// TODO: finalize() is deprecated starting with Java 10. This needs to 
be
+// replaced with an explicit shutdown call.
+@SuppressWarnings("unused")
+private final Object finalizerGuardian = new Object() {
--- End diff --

I get your point, but in my understanding this is not the recommended way 
to implement destructor-like behaviour in Java. The article advises using the 
guardian pattern only as a safety net alongside the `close()` method, logging a 
warning message if the class has not been properly closed by the user. This 
could be a valid case for us too, but given that this is purely our own codebase, 
not a public API, I think it is preferable to expect that `X509Util` is 
closed properly everywhere.
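
For what it's worth, a minimal sketch of that explicit-close usage could look like
the following. It assumes the public enableCertFileReloading() from this diff plus a
Closeable-style close() that stops the watcher threads; the concrete subclass name
is also an assumption here, not the final API.

    import org.apache.zookeeper.common.ClientX509Util;

    public class CertReloadExample {
        public static void main(String[] args) throws Exception {
            // Assumption: close() exists and stops the FileChangeWatcher threads.
            // try-with-resources guarantees it runs even if an exception is thrown.
            try (ClientX509Util x509Util = new ClientX509Util()) {
                x509Util.enableCertFileReloading();
                // ... build SSL contexts/sockets while certs reload in the background ...
            }
        }
    }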


---


Re: Leader election

2018-12-06 Thread Jordan Zimmerman
> it seems like the
> inconsistency may be caused by the partition of the Zookeeper cluster
> itself

Yes - there are many ways in which you can end up with 2 leaders. However, if 
properly tuned and configured, it will be for a few seconds at most. During a 
GC pause no work is being done anyway. But, this stuff is very tricky. 
Requiring an atomically unique leader is actually a design smell and you should 
reconsider your architecture.

> Maybe we can use a more
> lightweight Hazelcast for example?

There is no distributed system that can guarantee a single leader. Instead you 
need to adjust your design and algorithms to deal with this (using optimistic 
locking, etc.).

-Jordan
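
One way to picture the optimistic-locking suggestion with the plain ZooKeeper API
is the sketch below (the znode path is illustrative): every state update is made
conditional on the version the writer last read, so a stale leader's write fails
with BadVersion instead of silently clobbering the new leader's work.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class OptimisticUpdate {
        static void updateState(ZooKeeper zk, byte[] newState)
                throws KeeperException, InterruptedException {
            Stat stat = new Stat();
            zk.getData("/my-service/state", false, stat);   // remember the version we read
            try {
                zk.setData("/my-service/state", newState, stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                // another writer (e.g. another self-believed leader) got there first;
                // re-read the state and decide whether to retry or step down
            }
        }
    }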

> On Dec 6, 2018, at 1:52 PM, Michael Borokhovich  wrote:
> 
> Thanks Jordan,
> 
> Yes, I will try Curator.
> Also, beyond the problem described in the Tech Note, it seems like the
> inconsistency may be caused by the partition of the Zookeeper cluster
> itself. E.g., if a "leader" client is connected to the partitioned ZK node,
> it may be notified not at the same time as the other clients connected to
> the other ZK nodes. So, another client may take leadership while the
> current leader is still unaware of the change. Is that true?
> 
> Another follow up question. If Zookeeper can guarantee a single leader, is
> it worth using it just for leader election? Maybe we can use a more
> lightweight Hazelcast for example?
> 
> Michael.
> 
> 
> On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman 
> wrote:
> 
>> It is not possible to achieve the level of consistency you're after in an
>> eventually consistent system such as ZooKeeper. There will always be an
>> edge case where two ZooKeeper clients will believe they are leaders (though
>> for a short period of time). In terms of how it affects Apache Curator, we
>> have this Tech Note on the subject:
>> https://cwiki.apache.org/confluence/display/CURATOR/TN10 <
>> https://cwiki.apache.org/confluence/display/CURATOR/TN10> (the
>> description is true for any ZooKeeper client, not just Curator clients). If
>> you do still intend to use a ZooKeeper lock/leader, I suggest you try Apache
>> Curator, as writing these "recipes" is not trivial and they have many gotchas
>> that aren't obvious.
>> 
>> -Jordan
>> 
>> http://curator.apache.org 
>> 
>> 
>>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich 
>> wrote:
>>> 
>>> Hello,
>>> 
>>> We have a service that runs on 3 hosts for high availability. However, at
>>> any given time, exactly one instance must be active. So, we are thinking
>> to
>>> use Leader election using Zookeeper.
>>> To this goal, on each service host we also start a ZK server, so we have
>> a
>>> 3-nodes ZK cluster and each service instance is a client to its dedicated
>>> ZK server.
>>> Then, we implement a leader election on top of Zookeeper using a basic
>>> recipe:
>>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
>>> 
>>> I have the following questions and doubts regarding the approach:
>>> 
>>> 1. It seems like we can run into inconsistency issues when network
>>> partition occurs. Zookeeper documentation says that the inconsistency
>>> period may last “tens of seconds”. Am I understanding correctly that
>> during
>>> this time we may have 0 or 2 leaders?
>>> 2. Is it possible to reduce this inconsistency time (let's say to 3
>>> seconds) by tweaking tickTime and syncLimit parameters?
>>> 3. Is there a way to guarantee exactly one leader all the time? Should we
>>> implement a more complex leader election algorithm than the one suggested
>>> in the recipe (using ephemeral_sequential nodes)?
>>> 
>>> Thanks,
>>> Michael.
>> 
>> 



ZooKeeper_branch34_openjdk7 - Build # 2144 - Failure

2018-12-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/2144/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 43.48 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.555 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.542 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.659 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.536 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.084 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.486 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.411 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.783 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.883 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.71 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.818 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.679 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.373 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.368 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.096 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.062 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
29.121 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
15.298 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.668 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.083 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1408:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1411:
 Tests failed!

Total time: 34 minutes 16 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.SaslAuthTest.testZKOperationsAfterClientSaslAuthFailure

Error Message:
Did not connect

Stack Trace:
java.util.concurrent.TimeoutException: Did not connect
at 
org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:152)
at 
org.apache.zookeeper.SaslAuthTest.testZKOperationsAfterClientSaslAuthFailure(SaslAuthTest.java:174)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)

Re: Leader election

2018-12-06 Thread Jordan Zimmerman
> Old service leader will detect network partition max 15 seconds after it
> happened.

If the old service leader is in a very long GC it will not detect the 
partition. In the face of VM pauses, etc. it's not possible to avoid 2 leaders 
for a short period of time.

-JZ

[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239593442
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/FileChangeWatcher.java
 ---
@@ -0,0 +1,253 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.common;
+
+import com.sun.nio.file.SensitivityWatchEventModifier;
+import org.apache.zookeeper.server.ZooKeeperThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.file.ClosedWatchServiceException;
+import java.nio.file.FileSystem;
+import java.nio.file.Path;
+import java.nio.file.StandardWatchEventKinds;
+import java.nio.file.WatchEvent;
+import java.nio.file.WatchKey;
+import java.nio.file.WatchService;
+import java.util.function.Consumer;
+
+/**
+ * Instances of this class can be used to watch a directory for file 
changes. When a file is added to, deleted from,
+ * or is modified in the given directory, the callback provided by the 
user will be called from a background thread.
+ * Some things to keep in mind:
+ * 
+ * The callback should be thread-safe.
+ * Changes that happen around the time the thread is started may be 
missed.
+ * There is a delay between a file changing and the callback 
firing.
+ * The watch is not recursive - changes to subdirectories will not 
trigger a callback.
+ * 
+ */
+public final class FileChangeWatcher {
--- End diff --

I prefer to use composition over inheritance in cases where inheritance is 
not clearly better - it tends to avoid problems. If FileChangeWatcher extends 
Thread, then it will have an "is-a" relationship with Thread, and can be used 
anywhere a Thread is used. Since it's not a generic Thread, it's not clear to 
me that this would be correct. But I don't feel too strongly about it.
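
As a standalone illustration of the composition option being weighed here (names
are made up; this is not the actual FileChangeWatcher code), the watcher can own
its thread instead of being one:

    import java.nio.file.Path;
    import java.nio.file.WatchEvent;
    import java.util.function.Consumer;

    public final class DirWatcher {
        private final Thread worker;   // implementation detail, never exposed as "is-a Thread"

        public DirWatcher(Path dir, Consumer<WatchEvent<?>> callback) {
            this.worker = new Thread(() -> watchLoop(dir, callback), "DirWatcher-" + dir);
            this.worker.setDaemon(true);
        }

        public void start() { worker.start(); }

        public void stop() throws InterruptedException {
            worker.interrupt();
            worker.join();
        }

        private void watchLoop(Path dir, Consumer<WatchEvent<?>> callback) {
            // register a WatchService on dir and hand each event to the callback;
            // body omitted to keep the sketch short
        }
    }

Callers only ever see start()/stop(), so the threading strategy can change later
without touching them.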


---


[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239593417
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java ---
@@ -446,4 +458,119 @@ private void configureSSLServerSocket(SSLServerSocket 
sslServerSocket) {
 LOG.debug("Using Java8-optimized cipher suites for Java version 
{}", javaVersion);
 return DEFAULT_CIPHERS_JAVA8;
 }
+
+/**
+ * Enables automatic reloading of the trust store and key store files 
when they change on disk.
+ *
+ * @throws IOException if creating the FileChangeWatcher objects fails.
+ */
+public void enableCertFileReloading() throws IOException {
+LOG.info("enabling cert file reloading");
+ZKConfig config = new ZKConfig();
+String keyStoreLocation = 
config.getProperty(sslKeystoreLocationProperty);
+if (keyStoreLocation != null && !keyStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(keyStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Key store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newKeyStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+}
+keyStoreFileWatcher = newKeyStoreFileWatcher;
+keyStoreFileWatcher.start();
+}
+String trustStoreLocation = 
config.getProperty(sslTruststoreLocationProperty);
+if (trustStoreLocation != null && !trustStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(trustStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Trust store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newTrustStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+}
+trustStoreFileWatcher = newTrustStoreFileWatcher;
+trustStoreFileWatcher.start();
+}
+}
+
+/**
+ * Disables automatic reloading of the trust store and key store files 
when they change on disk.
+ * Stops background threads and closes WatchService instances.
+ */
+public void disableCertFileReloading() {
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+keyStoreFileWatcher = null;
+}
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+trustStoreFileWatcher = null;
+}
+}
+
+// Finalizer guardian object, see Effective Java item 7
+// TODO: finalize() is deprecated starting with Java 10. This needs to 
be
+// replaced with an explicit shutdown call.
+@SuppressWarnings("unused")
+private final Object finalizerGuardian = new Object() {
--- End diff --

I'm worried about forgetting to call `close()` and leaking the background 
threads. X509Util is created in other places besides QuorumPeer. But I'll see 
what I can do about it; we only need to call close() if we called 
`enableCertFileReloading()`, so it might be OK ...


---


[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239593458
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java ---
@@ -446,4 +458,119 @@ private void configureSSLServerSocket(SSLServerSocket 
sslServerSocket) {
 LOG.debug("Using Java8-optimized cipher suites for Java version 
{}", javaVersion);
 return DEFAULT_CIPHERS_JAVA8;
 }
+
+/**
+ * Enables automatic reloading of the trust store and key store files 
when they change on disk.
+ *
+ * @throws IOException if creating the FileChangeWatcher objects fails.
+ */
+public void enableCertFileReloading() throws IOException {
+LOG.info("enabling cert file reloading");
+ZKConfig config = new ZKConfig();
+String keyStoreLocation = 
config.getProperty(sslKeystoreLocationProperty);
+if (keyStoreLocation != null && !keyStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(keyStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Key store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newKeyStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+}
+keyStoreFileWatcher = newKeyStoreFileWatcher;
+keyStoreFileWatcher.start();
+}
+String trustStoreLocation = 
config.getProperty(sslTruststoreLocationProperty);
+if (trustStoreLocation != null && !trustStoreLocation.isEmpty()) {
--- End diff --

Will do.


---


[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239593424
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java ---
@@ -446,4 +458,119 @@ private void configureSSLServerSocket(SSLServerSocket 
sslServerSocket) {
 LOG.debug("Using Java8-optimized cipher suites for Java version 
{}", javaVersion);
 return DEFAULT_CIPHERS_JAVA8;
 }
+
+/**
+ * Enables automatic reloading of the trust store and key store files 
when they change on disk.
+ *
+ * @throws IOException if creating the FileChangeWatcher objects fails.
+ */
+public void enableCertFileReloading() throws IOException {
+LOG.info("enabling cert file reloading");
+ZKConfig config = new ZKConfig();
+String keyStoreLocation = 
config.getProperty(sslKeystoreLocationProperty);
+if (keyStoreLocation != null && !keyStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(keyStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Key store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newKeyStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+}
+keyStoreFileWatcher = newKeyStoreFileWatcher;
+keyStoreFileWatcher.start();
+}
+String trustStoreLocation = 
config.getProperty(sslTruststoreLocationProperty);
+if (trustStoreLocation != null && !trustStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(trustStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Trust store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newTrustStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+}
+trustStoreFileWatcher = newTrustStoreFileWatcher;
+trustStoreFileWatcher.start();
+}
+}
+
+/**
+ * Disables automatic reloading of the trust store and key store files 
when they change on disk.
+ * Stops background threads and closes WatchService instances.
+ */
+public void disableCertFileReloading() {
--- End diff --

Will do.


---


Re: Leader election

2018-12-06 Thread Maciej Smoleński
Hello,

Ensuring reliability requires using consensus directly in your service, or
changing the service to use a distributed log/journal (e.g. BookKeeper).

However, the following idea is simple and in many situations good enough.
If you configure the session timeout to 15 seconds, then a partitioned
ZooKeeper client will be disconnected after at most 15 seconds.
The old service leader will detect the network partition at most 15 seconds
after it happened.
The new service leader should stay idle for the initial 15+ seconds (let's say
30 seconds).
In this way you avoid the situation of 2 concurrently working leaders.

The described solution has time dependencies and in some situations leads to
an incorrect state, e.g.:
high load on a machine might cause the ZooKeeper client to detect the
disconnection only after 60 seconds (instead of the expected 15 seconds). In
such a situation there will be 2 concurrent leaders.
Maciej
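
A minimal sketch of the timing-based hand-off described above, with illustrative
constants (and, as noted, the timing assumption can be broken by GC pauses or an
overloaded host):

    public class DelayedTakeover {
        // Illustrative values; TAKEOVER_DELAY_MS must comfortably exceed the ZK session timeout.
        static final int SESSION_TIMEOUT_MS = 15_000;
        static final int TAKEOVER_DELAY_MS  = 30_000;

        // Called by whatever election primitive is in use, once we have won the election.
        void onElectedLeader() throws InterruptedException {
            // Wait out the old leader's session timeout (plus margin) before acting,
            // so a partitioned old leader has normally already stepped down.
            Thread.sleep(TAKEOVER_DELAY_MS);
            // ... start leader-only work here ...
        }
    }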




On Thu, Dec 6, 2018 at 8:09 PM Jordan Zimmerman 
wrote:

> > it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself
>
> Yes - there are many ways in which you can end up with 2 leaders. However,
> if properly tuned and configured, it will be for a few seconds at most.
> During a GC pause no work is being done anyway. But, this stuff is very
> tricky. Requiring an atomically unique leader is actually a design smell
> and you should reconsider your architecture.
>
> > Maybe we can use a more
> > lightweight Hazelcast for example?
>
> There is no distributed system that can guarantee a single leader. Instead
> you need to adjust your design and algorithms to deal with this (using
> optimistic locking, etc.).
>
> -Jordan
>
> > On Dec 6, 2018, at 1:52 PM, Michael Borokhovich 
> wrote:
> >
> > Thanks Jordan,
> >
> > Yes, I will try Curator.
> > Also, beyond the problem described in the Tech Note, it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself. E.g., if a "leader" client is connected to the partitioned ZK
> node,
> > it may be notified not at the same time as the other clients connected to
> > the other ZK nodes. So, another client may take leadership while the
> > current leader is still unaware of the change. Is that true?
> >
> > Another follow up question. If Zookeeper can guarantee a single leader,
> is
> > it worth using it just for leader election? Maybe we can use a more
> > lightweight Hazelcast for example?
> >
> > Michael.
> >
> >
> > On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman <
> jor...@jordanzimmerman.com>
> > wrote:
> >
> >> It is not possible to achieve the level of consistency you're after in
> an
> >> eventually consistent system such as ZooKeeper. There will always be an
> >> edge case where two ZooKeeper clients will believe they are leaders
> (though
> >> for a short period of time). In terms of how it affects Apache Curator,
> we
> >> have this Tech Note on the subject:
> >> https://cwiki.apache.org/confluence/display/CURATOR/TN10 <
> >> https://cwiki.apache.org/confluence/display/CURATOR/TN10> (the
> >> description is true for any ZooKeeper client, not just Curator
> clients). If
> >> you do still intend to use a ZooKeeper lock/leader, I suggest you try
> Apache
> >> Curator, as writing these "recipes" is not trivial and they have many gotchas
> >> that aren't obvious.
> >>
> >> -Jordan
> >>
> >> http://curator.apache.org 
> >>
> >>
> >>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich 
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> We have a service that runs on 3 hosts for high availability. However,
> at
> >>> any given time, exactly one instance must be active. So, we are
> thinking
> >> to
> >>> use Leader election using Zookeeper.
> >>> To this goal, on each service host we also start a ZK server, so we
> have
> >> a
> >>> 3-nodes ZK cluster and each service instance is a client to its
> dedicated
> >>> ZK server.
> >>> Then, we implement a leader election on top of Zookeeper using a basic
> >>> recipe:
> >>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection
> .
> >>>
> >>> I have the following questions and doubts regarding the approach:
> >>>
> >>> 1. It seems like we can run into inconsistency issues when network
> >>> partition occurs. Zookeeper documentation says that the inconsistency
> >>> period may last “tens of seconds”. Am I understanding correctly that
> >> during
> >>> this time we may have 0 or 2 leaders?
> >>> 2. Is it possible to reduce this inconsistency time (let's say to 3
> >>> seconds) by tweaking tickTime and syncLimit parameters?
> >>> 3. Is there a way to guarantee exactly one leader all the time? Should
> we
> >>> implement a more complex leader election algorithm than the one
> suggested
> >>> in the recipe (using ephemeral_sequential nodes)?
> >>>
> >>> Thanks,
> >>> Michael.
> >>
> >>
>
>


[GitHub] zookeeper pull request #628: ZOOKEEPER-3140: Allow Followers to host Observe...

2018-12-06 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/628#discussion_r239578960
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/ObserverMaster.java
 ---
@@ -0,0 +1,514 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.quorum;
+
+import org.apache.zookeeper.jmx.MBeanRegistry;
+import org.apache.zookeeper.server.Request;
+import org.apache.zookeeper.server.ZKDatabase;
+
+import java.io.BufferedInputStream;
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.net.ServerSocket;
+import java.net.Socket;
+import java.net.SocketAddress;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.zookeeper.server.quorum.auth.QuorumAuthServer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Used by Followers to host Observers. This reduces the network load on 
the Leader process by pushing
+ * the responsibility for keeping Observers in sync off the leading peer.
+ *
+ * It is expected that Observers will continue to perform the initial 
vetting of clients and requests.
+ * Observers send the request to the follower where it is received by an 
ObserverMaster.
+ *
+ * The ObserverMaster forwards a copy of the request to the ensemble 
Leader and inserts it into its own
+ * request processor pipeline where it can be matched with the response 
comes back. All commits received
+ * from the Leader will be forwarded along to every Learner connected to 
the ObserverMaster.
+ *
+ * New Learners connecting to a Follower will receive a LearnerHandler 
object and be party to its syncing logic
+ * to be brought up to date.
+ *
+ * The logic is quite a bit simpler than the corresponding logic in Leader 
because it only hosts observers.
+ */
+public class ObserverMaster implements LearnerMaster, Runnable {
+private static final Logger LOG = 
LoggerFactory.getLogger(ObserverMaster.class);
+
+//Follower counter
+private final AtomicLong followerCounter = new AtomicLong(-1);
+
+private QuorumPeer self;
+private FollowerZooKeeperServer zks;
+private int port;
+
+private Set activeObservers = 
Collections.newSetFromMap(new ConcurrentHashMap());
+
+private final ConcurrentHashMap 
connectionBeans = new ConcurrentHashMap<>();
+
+/**
+ * we want to keep a log of past txns so that observers can sync up 
with us when we connect,
+ * but we can't keep everything in memory, so this limits how much 
memory will be dedicated
+ * to keeping recent txns.
+ */
+private final static int PKTS_SIZE_LIMIT = 32 * 1024 * 1024;
+private static volatile int pktsSizeLimit = 
Integer.getInteger("zookeeper.observerMaster.sizeLimit", PKTS_SIZE_LIMIT);
+private ConcurrentLinkedQueue proposedPkts = new 
ConcurrentLinkedQueue<>();
+private ConcurrentLinkedQueue committedPkts = new 
ConcurrentLinkedQueue<>();
+private int pktsSize = 0;
+
+private long lastProposedZxid;
+
+// ensure ordering of revalidations returned to this learner
+private final Object revalidateSessionLock = new Object();
+
+// Throttle when there are too many concurrent snapshots being sent to 
observers
+private static final String MAX_CONCURRENT_SNAPSHOTS = 
"zookeeper.leader.maxConcurrentSnapshots";
+private static final int maxConcurrentSnapshots;
+
+private 

[jira] [Commented] (ZOOKEEPER-3188) Improve resilience to network

2018-12-06 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712162#comment-16712162
 ] 

Michael Han commented on ZOOKEEPER-3188:


I appreciate the detailed reply; I agree with the replies on 1 and 2.

bq. Such changes should be handled exactly the way they are now and there 
should be no interactions with the changes to the networking stack. 

Agreed. I think I was just looking for more elaborate use cases around using 
reconfig to manipulate multiple server addresses, as the proposal does not go 
into details other than 'support dynamic reconfiguration'. I expect dynamic 
reconfiguration will just work out of the box with proper abstractions, without 
touching too much of the reconfiguration code path, but there are some 
subtleties to consider. A couple of examples:

* Properly rebalance client connections - this was discussed on the dev mailing list.
* Avoid unnecessary leader elections during reconfig - this change will 
probably change the abstraction of server addresses (QuorumServer) and we 
should be careful how the QuorumServers will be compared, to avoid unnecessary 
leader elections in cases where the server set is the same but some servers 
have new server addresses.
There might be more cases to consider...

bq. The documentation, in particular, should be essentially identical except 
that an example of adding an address might be nice

I am thinking that at least 
[this|https://zookeeper.apache.org/doc/r3.5.4-beta/zookeeperReconfig.html#sc_reconfig_clientport]
 should be updated to reflect the fact that 1. the config format is changed, and 
2. multiple server addresses can be manipulated via reconfig.


> Improve resilience to network
> -
>
> Key: ZOOKEEPER-3188
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We propose to add network level resiliency to Zookeeper. The ideas that we 
> have on the topic have been discussed on the mailing list and via a 
> specification document that is located at 
> [https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]
> That document is copied to this issue which is being created to report the 
> results of experimental implementations.
> h1. Zookeeper Network Resilience
> h2. Background
> Zookeeper is designed to help in building distributed systems. It provides a 
> variety of operations for doing this and all of these operations have rather 
> strict guarantees on semantics. Zookeeper itself is a distributed system made 
> up of cluster containing a leader and a number of followers. The leader is 
> designated in a process known as leader election in which a majority of all 
> nodes in the cluster must agree on a leader. All subsequent operations are 
> initiated by the leader and completed when a majority of nodes have confirmed 
> the operation. Whenever an operation cannot be confirmed by a majority or 
> whenever the leader goes missing for a time, a new leader election is 
> conducted and normal operations proceed once a new leader is confirmed.
>  
> The details of this are not important relative to this discussion. What is 
> important is that the semantics of the operations conducted by a Zookeeper 
> cluster and the semantics of how client processes communicate with the 
> cluster depend only on the basic fact that messages sent over TCP connections 
> will never appear out of order or missing. Central to the design of ZK is 
> that a server to server network connection is used as long as it works to use 
> it and a new connection is made when it appears that the old connection isn't 
> working.
>  
> As currently implemented, however, each member of a Zookeeper cluster can 
> have only a single address as viewed from some other process. This means, 
> absent network link bonding, that the loss of a single switch or a few 
> network connections could completely stop the operations of the Zookeeper 
> cluster. It is the goal of this work to address this issue by allowing each 
> server to listen on multiple network interfaces and to connect to other 
> servers via any of several addresses. The effect will be to allow servers to 
> communicate over redundant network paths to improve resiliency to network 
> failures without changing any core algorithms.
> h2. Proposed Change
> Interestingly, the correct operations of a Zookeeper cluster do not depend on 
> _how_ a TCP connection was made. There is no reason at all not to advertise 
> multiple addresses for members of a Zookeeper cluster. 
>  
> Connections between members of a Zookeeper cluster and between a client and a 
> cluster member are established by referencing a 

ZooKeeper_branch34_jdk8 - Build # 1618 - Failure

2018-12-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1618/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 42.78 KB...]
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
21.956 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.84 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.683 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.467 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.47 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.757 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.311 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.125 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.539 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.355 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.427 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.221 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.834 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.122 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.879 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.01 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.708 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.162 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.647 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
31.89 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.413 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.744 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.074 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1408: 
The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk8/build.xml:1411: 
Tests failed!

Total time: 44 minutes 41 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  
org.apache.zookeeper.test.WatcherTest.testWatcherAutoResetDisabledWithLocal

Error Message:
KeeperErrorCode = ConnectionLoss for /watchtest/child2

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /watchtest/child2
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:882)
at 

[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/680#discussion_r239641391
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java ---
@@ -446,4 +458,119 @@ private void configureSSLServerSocket(SSLServerSocket 
sslServerSocket) {
 LOG.debug("Using Java8-optimized cipher suites for Java version 
{}", javaVersion);
 return DEFAULT_CIPHERS_JAVA8;
 }
+
+/**
+ * Enables automatic reloading of the trust store and key store files 
when they change on disk.
+ *
+ * @throws IOException if creating the FileChangeWatcher objects fails.
+ */
+public void enableCertFileReloading() throws IOException {
+LOG.info("enabling cert file reloading");
+ZKConfig config = new ZKConfig();
+String keyStoreLocation = 
config.getProperty(sslKeystoreLocationProperty);
+if (keyStoreLocation != null && !keyStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(keyStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Key store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newKeyStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+}
+keyStoreFileWatcher = newKeyStoreFileWatcher;
+keyStoreFileWatcher.start();
+}
+String trustStoreLocation = 
config.getProperty(sslTruststoreLocationProperty);
+if (trustStoreLocation != null && !trustStoreLocation.isEmpty()) {
+final Path filePath = 
Paths.get(trustStoreLocation).toAbsolutePath();
+Path parentPath = filePath.getParent();
+if (parentPath == null) {
+throw new IOException(
+"Trust store path does not have a parent: " + 
filePath);
+}
+FileChangeWatcher newTrustStoreFileWatcher = new 
FileChangeWatcher(
+parentPath,
+watchEvent -> {
+handleWatchEvent(filePath, watchEvent);
+});
+// stop old watcher if there is one
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+}
+trustStoreFileWatcher = newTrustStoreFileWatcher;
+trustStoreFileWatcher.start();
+}
+}
+
+/**
+ * Disables automatic reloading of the trust store and key store files 
when they change on disk.
+ * Stops background threads and closes WatchService instances.
+ */
+public void disableCertFileReloading() {
+if (keyStoreFileWatcher != null) {
+keyStoreFileWatcher.stop();
+keyStoreFileWatcher = null;
+}
+if (trustStoreFileWatcher != null) {
+trustStoreFileWatcher.stop();
+trustStoreFileWatcher = null;
+}
+}
+
+// Finalizer guardian object, see Effective Java item 7
+// TODO: finalize() is deprecated starting with Java 10. This needs to 
be
+// replaced with an explicit shutdown call.
+@SuppressWarnings("unused")
+private final Object finalizerGuardian = new Object() {
--- End diff --

Done.


---


[GitHub] zookeeper issue #680: ZOOKEEPER-3174: Quorum TLS - support reloading trust/k...

2018-12-06 Thread ivmaykov
Github user ivmaykov commented on the issue:

https://github.com/apache/zookeeper/pull/680
  
@anmolnar removed finalizer, use explicit close()


---


Re: Leader election

2018-12-06 Thread Jordan Zimmerman
It is not possible to achieve the level of consistency you're after in an 
eventually consistent system such as ZooKeeper. There will always be an edge 
case where two ZooKeeper clients will believe they are leaders (though for a 
short period of time). In terms of how it affects Apache Curator, we have this 
Tech Note on the subject: 
https://cwiki.apache.org/confluence/display/CURATOR/TN10 
 (the description is 
true for any ZooKeeper client, not just Curator clients). If you do still 
intend to use a ZooKeeper lock/leader, I suggest you try Apache Curator, as 
writing these "recipes" is not trivial and they have many gotchas that aren't 
obvious. 

-Jordan

http://curator.apache.org 


> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich  wrote:
> 
> Hello,
> 
> We have a service that runs on 3 hosts for high availability. However, at
> any given time, exactly one instance must be active. So, we are thinking to
> use Leader election using Zookeeper.
> To this goal, on each service host we also start a ZK server, so we have a
> 3-nodes ZK cluster and each service instance is a client to its dedicated
> ZK server.
> Then, we implement a leader election on top of Zookeeper using a basic
> recipe:
> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
> 
> I have the following questions and doubts regarding the approach:
> 
> 1. It seems like we can run into inconsistency issues when network
> partition occurs. Zookeeper documentation says that the inconsistency
> period may last “tens of seconds”. Am I understanding correctly that during
> this time we may have 0 or 2 leaders?
> 2. Is it possible to reduce this inconsistency time (let's say to 3
> seconds) by tweaking tickTime and syncLimit parameters?
> 3. Is there a way to guarantee exactly one leader all the time? Should we
> implement a more complex leader election algorithm than the one suggested
> in the recipe (using ephemeral_sequential nodes)?
> 
> Thanks,
> Michael.



Re: Leader election

2018-12-06 Thread Michael Han
Tweaking timeouts is tempting, as your solution might work most of the time yet
fail in certain cases (which others have pointed out). If the goal is
absolute correctness then we should avoid relying on timeouts, which do not
guarantee correctness; they only make the problem harder to manifest. Fencing is
the right solution here - the zxid and also the znode cversion can be used as a
fencing token if you use ZooKeeper. Fencing guarantees that at any single
point in time only one active leader is actually in action (it does not
prevent multiple parties from *thinking* they are the leader at the same
point in time). An alternative solution, depending on your use case, is to
not require a single active leader in action at any time, and instead make
your workload idempotent so multiple active leaders don't do any damage.
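
A small sketch of the fencing idea using ZooKeeper metadata as the token (the path
is illustrative, and the downstream systems still have to compare tokens on every
write and reject smaller ones):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class FencingTokenSketch {
        // After winning the election, derive a monotonically increasing token from
        // ZooKeeper metadata and attach it to every downstream write.
        static long acquireFencingToken(ZooKeeper zk)
                throws KeeperException, InterruptedException {
            String me = zk.create("/election/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            Stat stat = zk.exists(me, false);
            // The czxid of each successive election znode only ever grows, so it can
            // serve as the fencing token (the parent's cversion works similarly).
            return stat.getCzxid();
        }
    }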

On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman 
wrote:

> > Old service leader will detect network partition max 15 seconds after it
> > happened.
>
> If the old service leader is in a very long GC it will not detect the
> partition. In the face of VM pauses, etc. it's not possible to avoid 2
> leaders for a short period of time.
>
> -JZ


Re: Leader election

2018-12-06 Thread Michael Borokhovich
We are planning to run Zookeeper nodes embedded with the client nodes.
I.e., each client runs also a ZK node. So, network partition will
disconnect a ZK node and not only the client.
My concern is about the following statement from the ZK documentation:

"Timeliness: The clients view of the system is guaranteed to be up-to-date
within a certain time bound. (*On the order of tens of seconds.*) Either
system changes will be seen by a client within this bound, or the client
will detect a service outage."

What are these "*tens of seconds*"? Can we reduce this time by configuring
"syncLimit" and "tickTime" to let's say 5 seconds? Can we have a strong
guarantee on this time bound?
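
For what it's worth on the configuration question: the client requests a session
timeout when it connects, and the server clamps the request to [minSessionTimeout,
maxSessionTimeout], which default to 2*tickTime and 20*tickTime. A rough sketch of
checking what actually gets negotiated is below (connection string illustrative);
note this only bounds failure detection for a cleanly partitioned client, not GC
pauses or leader-election time, so it is not a hard guarantee of a single leader.

    import java.io.IOException;
    import org.apache.zookeeper.ZooKeeper;

    public class SessionTimeoutCheck {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Request a 5 second session timeout; with the default tickTime=2000 this
            // falls inside [4000, 40000] ms and is honored as requested.
            ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 5000, event -> { });
            Thread.sleep(2000);   // crude wait; the negotiated value is set once connected
            System.out.println("negotiated session timeout = " + zk.getSessionTimeout() + " ms");
            zk.close();
        }
    }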


On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman 
wrote:

> > Old service leader will detect network partition max 15 seconds after it
> > happened.
>
> If the old service leader is in a very long GC it will not detect the
> partition. In the face of VM pauses, etc. it's not possible to avoid 2
> leaders for a short period of time.
>
> -JZ


[jira] [Commented] (ZOOKEEPER-1818) Fix don't care for trunk

2018-12-06 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711371#comment-16711371
 ] 

Hudson commented on ZOOKEEPER-1818:
---

FAILURE: Integrated in Jenkins build Zookeeper-trunk-single-thread #135 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/135/])
ZOOKEEPER-1818: Correctly handle potential inconsistent (andor: rev 
b7403b790ff8729f817680afcdef38cb98b87720)
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/FLEOutOfElectionTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Vote.java


> Fix don't care for trunk
> 
>
> Key: ZOOKEEPER-1818
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1818
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Fangmin Lv
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1818.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> See umbrella jira.





ZooKeeper-trunk - Build # 296 - Failure

2018-12-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/296/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 177.58 KB...]
[junit] Running org.apache.zookeeper.test.SessionTest in thread 1
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
16.361 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.379 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.089 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 1
[junit] Tests run: 106, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
334.325 sec, Thread: 3, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Test org.apache.zookeeper.test.NettyNettySuiteTest FAILED
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.631 sec, Thread: 3, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.044 sec, Thread: 3, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 3
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.743 sec, Thread: 3, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.074 sec, Thread: 3, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.917 sec, Thread: 3, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.719 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 3
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.933 sec, Thread: 3, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.09 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.045 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.569 sec, Thread: 1, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.093 sec, Thread: 1, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.838 sec, Thread: 1, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.728 sec, Thread: 1, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Running org.apache.zookeeper.util.PemReaderTest in thread 1
[junit] Tests run: 64, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.077 sec, Thread: 1, Class: org.apache.zookeeper.util.PemReaderTest
[junit] Running org.apache.jute.BinaryInputArchiveTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.074 sec, Thread: 1, Class: org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
29.808 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 106, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
352.607 sec, Thread: 4, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
262.526 sec, Thread: 2, Class: 

[jira] [Commented] (ZOOKEEPER-1818) Fix don't care for trunk

2018-12-06 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711396#comment-16711396
 ] 

Hudson commented on ZOOKEEPER-1818:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #296 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/296/])
ZOOKEEPER-1818: Correctly handle potential inconsistent (andor: rev 
b7403b790ff8729f817680afcdef38cb98b87720)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/FLEOutOfElectionTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Vote.java


> Fix don't care for trunk
> 
>
> Key: ZOOKEEPER-1818
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1818
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Flavio Junqueira
>Assignee: Fangmin Lv
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1818.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> See umbrella jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #732: typo

2018-12-06 Thread stanlyDoge
GitHub user stanlyDoge opened a pull request:

https://github.com/apache/zookeeper/pull/732

typo



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stanlyDoge/zookeeper patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #732


commit 7681b79cff34904c5c02396a086f1e3918cf7e1c
Author: Stanislav Knot 
Date:   2018-12-06T13:25:00Z

typo




---


Re: Leader election

2018-12-06 Thread Ted Dunning
ZK is able to guarantee that there is only one leader for the purposes of
updating ZK data. That is because all commits have to originate with the
current quorum leader and then be acknowledged by a quorum of the current
cluster. If the leader can't get enough acks, then it has de facto lost
leadership.

The problem comes when there is another system that depends on ZK data.
Such data might record which node is the leader for some other purpose.
A node will only assume it has become leader if it succeeds in writing to
ZK, but if there is a partition, the old leader may not be notified that
another leader has established itself until some time after it has
happened. Of course, if the erstwhile leader tried to validate its position
with a write to ZK, that write would fail, but you can't spend 100% of your
time doing that.
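
As a rough illustration of the "validate its position with a write" point
(a sketch only; the znode path and how often you call this are application
choices, and it does not remove the window described above): a conditional
setData succeeds only while nobody else has written the leader znode since
our last successful check-in, so a fenced-out leader fails fast with
BadVersionException.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Sketch: periodic leadership check-in via a conditional write.
    public class LeadershipCheck {
        private final ZooKeeper zk;
        private final String leaderPath;  // e.g. "/myapp/leader-epoch" (made up)
        private int lastKnownVersion;     // version observed when we won

        public LeadershipCheck(ZooKeeper zk, String leaderPath, int versionAtElection) {
            this.zk = zk;
            this.leaderPath = leaderPath;
            this.lastKnownVersion = versionAtElection;
        }

        /** Returns true while our conditional writes keep succeeding. */
        public synchronized boolean stillLeader() throws InterruptedException {
            try {
                // Succeeds only if nobody wrote the znode since our last check-in.
                Stat stat = zk.setData(leaderPath, new byte[0], lastKnownVersion);
                lastKnownVersion = stat.getVersion(); // remember the new version
                return true;
            } catch (KeeperException.BadVersionException e) {
                return false; // someone else wrote it: we have been fenced out
            } catch (KeeperException e) {
                return false; // connection trouble: cannot prove leadership
            }
        }
    }

Because each check is itself a write, you can only afford to run it
periodically, which is exactly why it cannot close the window on its own.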

It all comes down to the fact that a ZK client determines that it is
connected to a ZK cluster member by pinging, and that cluster member in
turn sees heartbeats from the leader. You can't tune these pings to be
much faster than a certain level, though, because you start to see lots of
false positives for loss of connection. And it isn't just loss of
connection that matters here; any kind of delay has the same effect.
Getting these ping intervals below one second makes for a very twitchy
system.



On Fri, Dec 7, 2018 at 11:03 AM Michael Borokhovich 
wrote:

> We are planning to run Zookeeper nodes embedded with the client nodes.
> I.e., each client runs also a ZK node. So, network partition will
> disconnect a ZK node and not only the client.
> My concern is about the following statement from the ZK documentation:
>
> "Timeliness: The clients view of the system is guaranteed to be up-to-date
> within a certain time bound. (*On the order of tens of seconds.*) Either
> system changes will be seen by a client within this bound, or the client
> will detect a service outage."
>
> What are these "*tens of seconds*"? Can we reduce this time by configuring
> "syncLimit" and "tickTime" to let's say 5 seconds? Can we have a strong
> guarantee on this time bound?
>
>
> On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <
> jor...@jordanzimmerman.com>
> wrote:
>
> > > Old service leader will detect network partition max 15 seconds after
> it
> > > happened.
> >
> > If the old service leader is in a very long GC it will not detect the
> > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > leaders for a short period of time.
> >
> > -JZ
>


[GitHub] zookeeper pull request #733: hotfix: Fix type in zookeeperInternals.md

2018-12-06 Thread TisonKun
GitHub user TisonKun opened a pull request:

https://github.com/apache/zookeeper/pull/733

hotfix: Fix type in zookeeperInternals.md

I think this quick fix is far from needing a JIRA. If needed, I will create 
one.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TisonKun/zookeeper patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #733


commit e7ab19e6840295c9c157520162351e8d347271e6
Author: Tzu-Li Chen 
Date:   2018-12-07T07:07:14Z

hotfix: Fix type in zookeeperInternals.md




---


[GitHub] zookeeper issue #703: [ZOOKEEPER-1818] Correctly handle potential inconsiste...

2018-12-06 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/703
  
Thanks @anmolnar, I'll send out the PR for 3.5.


---


[GitHub] zookeeper issue #628: ZOOKEEPER-3140: Allow Followers to host Observers

2018-12-06 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/628
  
Thanks @enixon for doing the update and rebase. I went through this again 
and it looks legit to me. I also compared it with the internal version and 
made sure it includes all the improvements and bug fixes.

Maybe consider adding the om_proposal_process_time_ms and 
om_commit_process_time_ms metrics, but this is not a must-have.

+1 this should be ready to go.



---


Re: Leader election

2018-12-06 Thread Michael Borokhovich
Thanks Jordan,

Yes, I will try Curator.
Also, beyond the problem described in the Tech Note, it seems like the
inconsistency may be caused by a partition of the Zookeeper cluster
itself. E.g., if a "leader" client is connected to the partitioned ZK node,
it may not be notified at the same time as the other clients connected to
the other ZK nodes. So, another client may take leadership while the
current leader is still unaware of the change. Is that true?

Another follow-up question: if Zookeeper cannot strictly guarantee a single
leader, is it worth using it just for leader election? Maybe we could use
something more lightweight, Hazelcast for example?

Michael.
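
Since the quoted reply below points at Curator, here is a minimal
LeaderSelector sketch for comparison (assuming the curator-recipes
dependency is on the classpath; the connect string and election path are
placeholders, and takeLeadership still has to tolerate the short
two-leaders window discussed in this thread):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderSelector;
    import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LeaderElectionSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "127.0.0.1:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            LeaderSelector selector = new LeaderSelector(client, "/myapp/election",
                    new LeaderSelectorListenerAdapter() {
                        @Override
                        public void takeLeadership(CuratorFramework cf) throws Exception {
                            // Leadership lasts only while this method runs;
                            // returning (or a SUSPENDED/LOST connection state)
                            // relinquishes it.
                            System.out.println("acting as leader");
                            Thread.sleep(10_000); // placeholder for real work
                        }
                    });
            selector.autoRequeue(); // rejoin the election after losing leadership
            selector.start();

            Thread.sleep(60_000);   // keep the demo process alive
            selector.close();
            client.close();
        }
    }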


On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman 
wrote:

> It is not possible to achieve the level of consistency you're after in an
> eventually consistent system such as ZooKeeper. There will always be an
> edge case where two ZooKeeper clients will believe they are leaders (though
> for a short period of time). In terms of how it affects Apache Curator, we
> have this Tech Note on the subject:
> https://cwiki.apache.org/confluence/display/CURATOR/TN10 <
> https://cwiki.apache.org/confluence/display/CURATOR/TN10> (the
> description is true for any ZooKeeper client, not just Curator clients). If
> you do still intend to use a ZooKeeper lock/leader I suggest you try Apache
> Curator as writing these "recipes" is not trivial and have many gotchas
> that aren't obvious.
>
> -Jordan
>
> http://curator.apache.org 
>
>
> > On Dec 5, 2018, at 6:20 PM, Michael Borokhovich 
> wrote:
> >
> > Hello,
> >
> > We have a service that runs on 3 hosts for high availability. However, at
> > any given time, exactly one instance must be active. So, we are thinking
> to
> > use Leader election using Zookeeper.
> > To this goal, on each service host we also start a ZK server, so we have
> a
> > 3-nodes ZK cluster and each service instance is a client to its dedicated
> > ZK server.
> > Then, we implement a leader election on top of Zookeeper using a basic
> > recipe:
> > https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
> >
> > I have the following questions doubts regarding the approach:
> >
> > 1. It seems like we can run into inconsistency issues when network
> > partition occurs. Zookeeper documentation says that the inconsistency
> > period may last “tens of seconds”. Am I understanding correctly that
> during
> > this time we may have 0 or 2 leaders?
> > 2. Is it possible to reduce this inconsistency time (let's say to 3
> > seconds) by tweaking tickTime and syncLimit parameters?
> > 3. Is there a way to guarantee exactly one leader all the time? Should we
> > implement a more complex leader election algorithm than the one suggested
> > in the recipe (using ephemeral_sequential nodes)?
> >
> > Thanks,
> > Michael.
>
>