Success: ZOOKEEPER- PreCommit Build #1498

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1498/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 40.14 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1498//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1498//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1498//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD SUCCESSFUL
Total time: 37 minutes 0 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2967
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2967) Add check to validate dataDir and dataLogDir parameters at startup

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367982#comment-16367982
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2967:
---

Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/450#discussion_r168891236
  
--- Diff: src/java/test/org/apache/zookeeper/server/persistence/FileTxnSnapLogTest.java ---
@@ -35,92 +38,181 @@
 
 public class FileTxnSnapLogTest {
 
-/**
- * Test verifies the auto creation of data dir and data log dir.
- * Sets "zookeeper.datadir.autocreate" to true.
- */
-@Test
-public void testWithAutoCreateDataLogDir() throws IOException {
-File tmpDir = ClientBase.createEmptyTestDir();
-File dataDir = new File(tmpDir, "data");
-File snapDir = new File(tmpDir, "data_txnlog");
-Assert.assertFalse("data directory already exists", dataDir.exists());
-Assert.assertFalse("snapshot directory already exists", snapDir.exists());
+private File tmpDir;
+
+private File logDir;
+
+private File snapDir;
+
+private File logVersionDir;
+
+private File snapVersionDir;
+
+@Before
+public void setUp() throws Exception {
+tmpDir = ClientBase.createEmptyTestDir();
--- End diff --

@eolivelli Thanks for the idea. I'll check and use this next time.


> Add check to validate dataDir and dataLogDir parameters at startup
> --
>
> Key: ZOOKEEPER-2967
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2967
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.11
>Reporter: Andor Molnar
>Assignee: Mark Fenes
>Priority: Major
>  Labels: startup, validation
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> According to -ZOOKEEPER-2960- we should add a startup check to validate 
> that dataDir and dataLogDir parameters are set correctly.
> Perhaps we should introduce a check of some kind? If dataLogDir is 
> different from dataDir and snapshots exist in dataLogDir, we throw an 
> exception and quit.
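
A minimal sketch of the kind of check being proposed (a hypothetical helper, 
not the committed patch: the method name, the exception message, and the 
"snapshot." file-name prefix test are assumptions; uses java.io.File, 
java.io.FilenameFilter and java.io.IOException):

    // Hypothetical startup validation: refuse to start when dataLogDir is
    // configured separately from dataDir but snapshot files are found in it,
    // which suggests the two parameters were swapped in the config.
    static void validateDataDirs(File dataDir, File dataLogDir) throws IOException {
        if (dataDir.equals(dataLogDir)) {
            return; // one directory serves both roles; nothing to cross-check
        }
        File[] snapshots = dataLogDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.startsWith("snapshot.");
            }
        });
        if (snapshots != null && snapshots.length > 0) {
            throw new IOException("Found snapshot files in dataLogDir "
                    + dataLogDir + "; dataDir and dataLogDir may be swapped");
        }
    }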



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #450: ZOOKEEPER-2967: Add check to validate dataDir a...

2018-02-16 Thread mfenes
Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/450#discussion_r168891236
  
--- Diff: src/java/test/org/apache/zookeeper/server/persistence/FileTxnSnapLogTest.java ---
@@ -35,92 +38,181 @@
 
 public class FileTxnSnapLogTest {
 
-/**
- * Test verifies the auto creation of data dir and data log dir.
- * Sets "zookeeper.datadir.autocreate" to true.
- */
-@Test
-public void testWithAutoCreateDataLogDir() throws IOException {
-File tmpDir = ClientBase.createEmptyTestDir();
-File dataDir = new File(tmpDir, "data");
-File snapDir = new File(tmpDir, "data_txnlog");
-Assert.assertFalse("data directory already exists", dataDir.exists());
-Assert.assertFalse("snapshot directory already exists", snapDir.exists());
+private File tmpDir;
+
+private File logDir;
+
+private File snapDir;
+
+private File logVersionDir;
+
+private File snapVersionDir;
+
+@Before
+public void setUp() throws Exception {
+tmpDir = ClientBase.createEmptyTestDir();
--- End diff --

@eolivelli Thanks for the idea. I'll check and use this next time.


---


Success: ZOOKEEPER- PreCommit Build #1497

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1497/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 80.60 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1497//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1497//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1497//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD SUCCESSFUL
Total time: 19 minutes 25 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2967
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2967) Add check to validate dataDir and dataLogDir parameters at startup

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367972#comment-16367972
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2967:
---

Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/459#discussion_r168889616
  
--- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
@@ -351,20 +351,37 @@ static void verifyThreadTerminated(Thread thread, long millis)
 }
 }
 
+public static File createEmptyTestDir() throws IOException {
+return createTmpDir(BASETEST, false);
+}
 
 public static File createTmpDir() throws IOException {
-return createTmpDir(BASETEST);
+return createTmpDir(BASETEST, true);
 }
-static File createTmpDir(File parentDir) throws IOException {
+
+static File createTmpDir(File parentDir, boolean createInitFile) throws IOException {
 File tmpFile = File.createTempFile("test", ".junit", parentDir);
 // don't delete tmpFile - this ensures we don't attempt to create
 // a tmpDir with a duplicate name
 File tmpDir = new File(tmpFile + ".dir");
 Assert.assertFalse(tmpDir.exists()); // never true if tmpfile does it's job
 Assert.assertTrue(tmpDir.mkdirs());
 
+// todo not every tmp directory needs this file
--- End diff --

Unfortunately this todo comment is in the master branch, from which I 
backported these changes to ClientBase.
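
For context, a minimal usage sketch of the two helpers in the diff above 
(variable names are illustrative; behavior as described by the createInitFile 
flag):

    // createEmptyTestDir() passes createInitFile = false, so the returned
    // directory starts without the init file; createTmpDir() keeps the old
    // behavior and seeds it.
    File emptyDir = ClientBase.createEmptyTestDir(); // truly empty, safe for exists()/empty-dir assertions
    File seededDir = ClientBase.createTmpDir();      // pre-populated, as before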


> Add check to validate dataDir and dataLogDir parameters at startup
> --
>
> Key: ZOOKEEPER-2967
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2967
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.11
>Reporter: Andor Molnar
>Assignee: Mark Fenes
>Priority: Major
>  Labels: startup, validation
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> According to -ZOOKEEPER-2960- we should add a startup check to validate 
> that dataDir and dataLogDir parameters are set correctly.
> Perhaps we should introduce a check of some kind? If dataLogDir is 
> different from dataDir and snapshots exist in dataLogDir, we throw an 
> exception and quit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #459: ZOOKEEPER-2967: Add check to validate dataDir a...

2018-02-16 Thread mfenes
Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/459#discussion_r168889616
  
--- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
@@ -351,20 +351,37 @@ static void verifyThreadTerminated(Thread thread, long millis)
 }
 }
 
+public static File createEmptyTestDir() throws IOException {
+return createTmpDir(BASETEST, false);
+}
 
 public static File createTmpDir() throws IOException {
-return createTmpDir(BASETEST);
+return createTmpDir(BASETEST, true);
 }
-static File createTmpDir(File parentDir) throws IOException {
+
+static File createTmpDir(File parentDir, boolean createInitFile) throws IOException {
 File tmpFile = File.createTempFile("test", ".junit", parentDir);
 // don't delete tmpFile - this ensures we don't attempt to create
 // a tmpDir with a duplicate name
 File tmpDir = new File(tmpFile + ".dir");
 Assert.assertFalse(tmpDir.exists()); // never true if tmpfile does it's job
 Assert.assertTrue(tmpDir.mkdirs());
 
+// todo not every tmp directory needs this file
--- End diff --

Unfortunately this todo comment is in the master branch, from which I 
backported these changes to ClientBase.


---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367963#comment-16367963
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168884569
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -465,6 +470,37 @@ private void waitForAll(ZooKeeper[] zks, States state) throws InterruptedExcepti
 private static class Servers {
 MainThread mt[];
 ZooKeeper zk[];
+int[] clientPorts;
+
+public void shutDownAllServers() throws InterruptedException {
+for (MainThread t: mt) {
+t.shutdown();
+}
+}
+
+public void restartAllServersAndClients(Watcher watcher) throws IOException {
+for (MainThread t : mt) {
+if (!t.isAlive()) {
+t.start();
+}
+}
+for (int i = 0; i < zk.length; i++) {
+restartClient(i, watcher);
+}
+}
+
+public void restartClient(int i, Watcher watcher) throws IOException {
--- End diff --

annoying nitpick: let's use a better argument name than `i`
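
For instance, a sketch of the suggested rename (not the committed change):

    public void restartClient(int clientIndex, Watcher watcher) throws IOException {
        // same body as above, with the index parameter given a descriptive name
        zk[clientIndex] = new ZooKeeper("127.0.0.1:" + clientPorts[clientIndex],
                ClientBase.CONNECTION_TIMEOUT, watcher);
    }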


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, the SyncRequestProcessor queue 
> will be drained during shutdown, so the snapshot and txn file will stay 
> consistent before leader election happens; this case is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with 
> a data inconsistency issue. Here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 does not exist on A and B
> 3. A and B form a new quorum after restart; let's say B is the leader
> 4. C changes to the looking state because it no longer has enough 
> followers; it syncs with leader B with last zxid T0, which results in an 
> empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; it now has node N, but A and B do not.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367961#comment-16367961
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168884819
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -465,6 +470,37 @@ private void waitForAll(ZooKeeper[] zks, States state) throws InterruptedExcepti
 private static class Servers {
 MainThread mt[];
 ZooKeeper zk[];
+int[] clientPorts;
+
+public void shutDownAllServers() throws InterruptedException {
+for (MainThread t: mt) {
+t.shutdown();
+}
+}
+
+public void restartAllServersAndClients(Watcher watcher) throws IOException {
+for (MainThread t : mt) {
+if (!t.isAlive()) {
+t.start();
+}
+}
+for (int i = 0; i < zk.length; i++) {
+restartClient(i, watcher);
+}
+}
+
+public void restartClient(int i, Watcher watcher) throws IOException {
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, watcher);
+}
+
+public int findLeader() {
--- End diff --

there are other places in this test class that benefit from this 
refactoring. Would you mind cleaning that up?
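
For reference, the shape of the helper under discussion, based on the 
leader-detection loop used elsewhere in this test class (a sketch, not 
necessarily the committed body):

    public int findLeader() {
        // the peer whose quorumPeer holds a non-null leader role is the leader
        for (int i = 0; i < mt.length; i++) {
            if (mt[i].main.quorumPeer.leader != null) {
                return i;
            }
        }
        return -1; // no leader elected yet
    }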


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, the SyncRequestProcessor queue 
> will be drained during shutdown, so the snapshot and txn file will stay 
> consistent before leader election happens; this case is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with 
> a data inconsistency issue. Here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 does not exist on A and B
> 3. A and B form a new quorum after restart; let's say B is the leader
> 4. C changes to the looking state because it no longer has enough 
> followers; it syncs with leader B with last zxid T0, which results in an 
> empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; it now has node N, but A and B do not.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367962#comment-16367962
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168886064
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+servers = LaunchServers(SERVER_COUNT);
+
+waitForAll(servers, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+servers.shutDownAllServers();
+waitForAll(servers, States.CONNECTING);
+servers.restartAllServersAndClients(this);
+waitForAll(servers, States.CONNECTED);
+
+// 2. kill all followers
+int leader = servers.findLeader();
+Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+servers.mt[leader].main.quorumPeer.tickTime = 1;
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the new leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+servers.restartClient(i, this);
+waitForOne(servers.zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to old leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertEquals(1, outstanding.size());
+Proposal p = outstanding.values().iterator().next();
+Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+// make sure it has a chance to write it to disk
+int sleepTime = 0;
+Long longLeader = new Long(leader);
+while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+if (sleepTime > 2000) {
+Assert.fail("Transaction not synced to disk within 1 
second " + p.qvAcksetPairs.get(0).getAckset()
++ " expected " + leader);
+}
+Thread.sleep(100);
+sleepTime += 100;
+}
+
+// 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+sleepTime = 0;
+Follower f = servers.mt[leader].main.quorumPeer.follower;
+while (f == null || !f.isRunning()) {
+if (sleepTime > 10_000) {
--- End diff --

nitpick: can we reuse the ticktime here to make the relationship more 
obvious?
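
One way to make that relationship explicit (a sketch; the multiplier and 
variable names are illustrative assumptions, not the committed change):

    // Derive the wait bound from the tick time the test just set on the old
    // leader, instead of the magic 10_000 literal.
    final int tickTime = servers.mt[leader].main.quorumPeer.tickTime;
    final int maxWaitMs = 10_000 * tickTime; // tickTime is 1 here, so the bound is unchanged
    while (f == null || !f.isRunning()) {
        if (sleepTime > maxWaitMs) {
            Assert.fail("Took too long for old leader to time out "
                    + servers.mt[leader].main.quorumPeer.getPeerState());
        }
        Thread.sleep(100);
        sleepTime += 100;
        f = servers.mt[leader].main.quorumPeer.follower;
    }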


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> 

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367960#comment-16367960
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168887935
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+servers = LaunchServers(SERVER_COUNT);
+
+waitForAll(servers, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+servers.shutDownAllServers();
+waitForAll(servers, States.CONNECTING);
+servers.restartAllServersAndClients(this);
+waitForAll(servers, States.CONNECTED);
+
+// 2. kill all followers
+int leader = servers.findLeader();
+Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+servers.mt[leader].main.quorumPeer.tickTime = 1;
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the new leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+servers.restartClient(i, this);
+waitForOne(servers.zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to old leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertEquals(1, outstanding.size());
+Proposal p = outstanding.values().iterator().next();
+Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+// make sure it has a chance to write it to disk
+int sleepTime = 0;
+Long longLeader = new Long(leader);
+while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+if (sleepTime > 2000) {
+Assert.fail("Transaction not synced to disk within 1 
second " + p.qvAcksetPairs.get(0).getAckset()
++ " expected " + leader);
+}
+Thread.sleep(100);
+sleepTime += 100;
+}
+
+// 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+sleepTime = 0;
+Follower f = servers.mt[leader].main.quorumPeer.follower;
+while (f == null || !f.isRunning()) {
+if (sleepTime > 10_000) {
+Assert.fail("Took too long for old leader to time out " + 
servers.mt[leader].main.quorumPeer.getPeerState());
+}
+Thread.sleep(100);
+sleepTime += 100;
+f = servers.mt[leader].main.quorumPeer.follower;
+}
+servers.mt[leader].shutdown();
--- End diff --

why do we need this?


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 

[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168886064
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+servers = LaunchServers(SERVER_COUNT);
+
+waitForAll(servers, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+servers.shutDownAllServers();
+waitForAll(servers, States.CONNECTING);
+servers.restartAllServersAndClients(this);
+waitForAll(servers, States.CONNECTED);
+
+// 2. kill all followers
+int leader = servers.findLeader();
+Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+servers.mt[leader].main.quorumPeer.tickTime = 1;
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the new leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+servers.restartClient(i, this);
+waitForOne(servers.zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to old leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertEquals(1, outstanding.size());
+Proposal p = outstanding.values().iterator().next();
+Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+// make sure it has a chance to write it to disk
+int sleepTime = 0;
+Long longLeader = new Long(leader);
+while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+if (sleepTime > 2000) {
+Assert.fail("Transaction not synced to disk within 1 
second " + p.qvAcksetPairs.get(0).getAckset()
++ " expected " + leader);
+}
+Thread.sleep(100);
+sleepTime += 100;
+}
+
+// 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+sleepTime = 0;
+Follower f = servers.mt[leader].main.quorumPeer.follower;
+while (f == null || !f.isRunning()) {
+if (sleepTime > 10_000) {
--- End diff --

nitpick: can we reuse the ticktime here to make the relationship more 
obvious?


---


[jira] [Commented] (ZOOKEEPER-2967) Add check to validate dataDir and dataLogDir parameters at startup

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367959#comment-16367959
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2967:
---

Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/458#discussion_r16396
  
--- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
@@ -368,22 +368,37 @@ static void verifyThreadTerminated(Thread thread, long millis)
 }
 }
 
+public static File createEmptyTestDir() throws IOException {
+return createTmpDir(BASETEST, false);
+}
 
 public static File createTmpDir() throws IOException {
-return createTmpDir(BASETEST);
+return createTmpDir(BASETEST, true);
 }
 
-static File createTmpDir(File parentDir) throws IOException {
+static File createTmpDir(File parentDir, boolean createInitFile) throws IOException {
 File tmpFile = File.createTempFile("test", ".junit", parentDir);
 // don't delete tmpFile - this ensures we don't attempt to create
 // a tmpDir with a duplicate name
 File tmpDir = new File(tmpFile + ".dir");
 Assert.assertFalse(tmpDir.exists()); // never true if tmpfile does it's job
 Assert.assertTrue(tmpDir.mkdirs());
 
+// todo not every tmp directory needs this file
--- End diff --

Unfortunately this todo comment is in the master branch, from which I 
backported these changes to ClientBase. 


> Add check to validate dataDir and dataLogDir parameters at startup
> --
>
> Key: ZOOKEEPER-2967
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2967
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.11
>Reporter: Andor Molnar
>Assignee: Mark Fenes
>Priority: Major
>  Labels: startup, validation
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> According to -ZOOKEEPER-2960- we should add a startup check to validate 
> that dataDir and dataLogDir parameters are set correctly.
> Perhaps we should introduce a check of some kind? If dataLogDir is 
> different from dataDir and snapshots exist in dataLogDir, we throw an 
> exception and quit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168887935
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+servers = LaunchServers(SERVER_COUNT);
+
+waitForAll(servers, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+servers.shutDownAllServers();
+waitForAll(servers, States.CONNECTING);
+servers.restartAllServersAndClients(this);
+waitForAll(servers, States.CONNECTED);
+
+// 2. kill all followers
+int leader = servers.findLeader();
+Map outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+servers.mt[leader].main.quorumPeer.tickTime = 1;
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+servers.mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the new leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+servers.restartClient(i, this);
+waitForOne(servers.zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to old leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertEquals(1, outstanding.size());
+Proposal p = outstanding.values().iterator().next();
+Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+// make sure it has a chance to write it to disk
+int sleepTime = 0;
+Long longLeader = new Long(leader);
+while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+if (sleepTime > 2000) {
+Assert.fail("Transaction not synced to disk within 1 
second " + p.qvAcksetPairs.get(0).getAckset()
++ " expected " + leader);
+}
+Thread.sleep(100);
+sleepTime += 100;
+}
+
+// 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+sleepTime = 0;
+Follower f = servers.mt[leader].main.quorumPeer.follower;
+while (f == null || !f.isRunning()) {
+if (sleepTime > 10_000) {
+Assert.fail("Took too long for old leader to time out " + 
servers.mt[leader].main.quorumPeer.getPeerState());
+}
+Thread.sleep(100);
+sleepTime += 100;
+f = servers.mt[leader].main.quorumPeer.follower;
+}
+servers.mt[leader].shutdown();
--- End diff --

why do we need this?


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168884819
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -465,6 +470,37 @@ private void waitForAll(ZooKeeper[] zks, States state) throws InterruptedExcepti
 private static class Servers {
 MainThread mt[];
 ZooKeeper zk[];
+int[] clientPorts;
+
+public void shutDownAllServers() throws InterruptedException {
+for (MainThread t: mt) {
+t.shutdown();
+}
+}
+
+public void restartAllServersAndClients(Watcher watcher) throws IOException {
+for (MainThread t : mt) {
+if (!t.isAlive()) {
+t.start();
+}
+}
+for (int i = 0; i < zk.length; i++) {
+restartClient(i, watcher);
+}
+}
+
+public void restartClient(int i, Watcher watcher) throws IOException {
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, watcher);
+}
+
+public int findLeader() {
--- End diff --

there are other places in this test class that benefit from this 
refactoring. Would you mind cleaning that up?


---


[GitHub] zookeeper pull request #458: ZOOKEEPER-2967: Add check to validate dataDir a...

2018-02-16 Thread mfenes
Github user mfenes commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/458#discussion_r16396
  
--- Diff: src/java/test/org/apache/zookeeper/test/ClientBase.java ---
@@ -368,22 +368,37 @@ static void verifyThreadTerminated(Thread thread, long millis)
 }
 }
 
+public static File createEmptyTestDir() throws IOException {
+return createTmpDir(BASETEST, false);
+}
 
 public static File createTmpDir() throws IOException {
-return createTmpDir(BASETEST);
+return createTmpDir(BASETEST, true);
 }
 
-static File createTmpDir(File parentDir) throws IOException {
+static File createTmpDir(File parentDir, boolean createInitFile) throws IOException {
 File tmpFile = File.createTempFile("test", ".junit", parentDir);
 // don't delete tmpFile - this ensures we don't attempt to create
 // a tmpDir with a duplicate name
 File tmpDir = new File(tmpFile + ".dir");
 Assert.assertFalse(tmpDir.exists()); // never true if tmpfile does it's job
 Assert.assertTrue(tmpDir.mkdirs());
 
+// todo not every tmp directory needs this file
--- End diff --

Unfortunately this todo comment is in the master branch, from which I 
backported these changes to ClientBase. 


---


[jira] [Commented] (ZOOKEEPER-2955) Enable Clover code coverage report

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367949#comment-16367949
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2955:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/443#discussion_r168883423
  
--- Diff: build.xml ---
@@ -1861,4 +1876,18 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant">

  
 
+
--- End diff --

As we discussed offline please move these useful targets to a new JIRA.


> Enable Clover code coverage report
> --
>
> Key: ZOOKEEPER-2955
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2955
> Project: ZooKeeper
>  Issue Type: Test
>  Components: tests
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Mark Fenes
>Assignee: Mark Fenes
>Priority: Major
>
> We have limited code coverage support in ZK. Clover for Java was running in 
> the past but was turned off. 
> Enable the Clover code coverage report to make us more confident in the 
> quality, stability and compatibility of future ZK releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #443: ZOOKEEPER-2955: Enable Clover code coverage rep...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/443#discussion_r168883423
  
--- Diff: build.xml ---
@@ -1861,4 +1876,18 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant">

  
 
+
--- End diff --

As we discussed offline please move these useful targets to a new JIRA.


---


Success: ZOOKEEPER- PreCommit Build #1496

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1496/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 72.50 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1496//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1496//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1496//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 52 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-16 Thread Abraham Fine
I'm a -1 on requiring different minimum versions of java for the client and the 
server.  I think this has the potential to create a lot of confusion for users 
and contributors. 

I would support moving master (3.6) to java 8, I also think it is worth 
considering moving to java 9. Given how long our release cycle tends to be I 
think targeting the latest and greatest this early in the development cycle is 
reasonable.

Thanks,
Abe

On Fri, Feb 16, 2018, at 06:48, Enrico Olivelli wrote:
> 2018-02-16 14:20 GMT+01:00 Andor Molnar :
> 
> > +1 for setting the Java8 requirement on server side.
> >
> > *Client side.*
> > I like the idea of setting the requirement on the client side too, without
> > introducing anything Java8-specific. I'm not planning to use Java8 features
> > right away, just thinking that opening the gates would be useful in the long
> > run.
> >
> > Additionally, I don't see heavy development on the client side. Users who
> > are tightly coupled to Java7 are still able to use existing clients unless
> > we introduce something breaking which they're forced to upgrade to for
> > whatever reason. I'm not sure what the odds are of that happening.
> >
> 
> 
> My two cents
> Actually ZooKeeper is distributed as a single JAR which contains both
> server-side and client-side code, so requiring Java 7 for the client and
> Java 8 for the server would require a new way of packaging the artifacts
> and building the project (and this would mean separating the client-side
> and server-side code bases).
> Maybe I am missing something.
> 
> 
> Enrico
> 
> 
> >
> > Andor
> >
> >
> >
> > On Fri, Feb 16, 2018 at 12:31 PM, Flavio Junqueira  wrote:
> >
> > > We have this section in the admin doc that talks about the system
> > > requirements:
> > >
> > > https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperAdmin.html#sc_requiredSoftware
> > >
> > > If we change, then we have to update that section. Specifically about
> > > client and server, I'd think that there is no problem with requiring
> > Java 8
> > > on the server. The potential concern is with the client as it affects
> > > applications that build against it. It would be best to not force
> > > applications to upgrade themselves. Looking at the compatibility guide
> > for
> > > Java 8:
> > >
> > > http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html
> > >
> > > The risk is that an application is strictly using Java 7 because of some
> > > incompatibility listed in that guide, in which case, it wouldn't be able
> > to
> > > compile the ZK client assuming we get it to use some Java 8 construct.
> > One
> > > option is that we raise the requirement to Java 8, but we do not really
> > > introduce anything that breaks compatibility for the next version. Users
> > > should take this as a warning that they need to migrate to Java 8. I'm
> > not
> > > sure this makes the situation any better, though. Another option is that
> > we
> > > set a release to be the one in which we migrate and let everyone know
> > that
> > > they need to migrate.
> > >
> > > -Flavio
> > >
> > >
> > > > On 16 Feb 2018, at 12:05, Andor Molnar  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I think it would be nice to draw a line at branch-3.5 and target Java
> > > > version 8 onwards. It seems to be a good opportunity for the upgrade
> > > before
> > > > we release a stable version of 3.5.
> > > >
> > > > The benefit would be the ability to use new features of Java 8 in the
> > > code:
> > > >
> > > > Do you think it's feasible?
> > > >
> > > > Regards,
> > > > Andor
> > >
> > >
> >


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367823#comment-16367823
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@afine and @anmolnar I think I have addressed all of your review comments, 
except for the one about the change to `waitForOne` and I am happy to adjust 
however you want there.


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, the SyncRequestProcessor queue 
> will be drained during shutdown, so the snapshot and txn file will stay 
> consistent before leader election happens; this case is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with 
> a data inconsistency issue. Here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble; servers A and B are followers, 
> C is the leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it is converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 does not exist on A and B
> 3. A and B form a new quorum after restart; let's say B is the leader
> 4. C changes to the looking state because it no longer has enough 
> followers; it syncs with leader B with last zxid T0, which results in an 
> empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; it now has node N, but A and B do not.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #453: ZOOKEEPER-2845: Apply commit log when restarting serve...

2018-02-16 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/zookeeper/pull/453
  
@afine and @anmolnar I think I have addressed all of your review comments, 
except for the one about the change to `waitForOne` and I am happy to adjust 
however you want there.


---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367814#comment-16367814
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168857757
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

@revans2 take a look at `testElectionFraud`, specifically: 

[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168857757
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to disk,
+// which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

@revans2 take a look at `testElectionFraud`, specifically: 
https://github.com/apache/zookeeper/blob/master/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java#L383
 and 
https://github.com/apache/zookeeper/blob/master/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java#L397

You can manually start and stop leader election; I think that may solve this.
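
A rough sketch of that suggestion (hedged: it assumes `setPeerState` and 
`startLeaderElection` on `QuorumPeer` are reachable from the test, in the 
spirit of the manual election control in `testElectionFraud`) would force the 
old leader back into election deterministically instead of sleeping until its 
shortened tick time expires:

    QuorumPeer peer = mt[leader].main.quorumPeer;
    // Drop leadership explicitly rather than waiting for the 1 ms tickTime
    // to push the peer out of LEADING on its own.
    peer.setPeerState(QuorumPeer.ServerState.LOOKING);
    // Rejoin leader election immediately.
    peer.startLeaderElection();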

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367809#comment-16367809
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168857052
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

Since @anmolnar thinks it is valuable, I think it is fine for it to be left 
in. 


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread afine
Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168857052
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

Since @anmolnar thinks it is valuable, I think it is fine for it to be left 
in. 


---


Failed: ZOOKEEPER- PreCommit Build #1495

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1495/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.34 MB...]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1495//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1495//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1495//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 13 minutes 12 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss

Error Message:
Took too long for old leader to time out

Stack Trace:
junit.framework.AssertionFailedError: Took too long for old leader to time out
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss(QuorumPeerMainTest.java:997)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367559#comment-16367559
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807853
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I was able to do what you said and drop the 1 second sleep, but for the sleep 
at step 6 I am going to need something else, because the leader is only in the 

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367562#comment-16367562
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807943
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
+p.qvAcksetPairs.get(0).getAckset().contains(leader);
+
+// 6. wait the leader to quit due to no enough followers
+Thread.sleep(4000);
+

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367563#comment-16367563
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807976
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
--- End diff --

done


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367561#comment-16367561
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807914
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
--- End diff --

removed the cast


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
> 

[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807943
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
+p.qvAcksetPairs.get(0).getAckset().contains(leader);
+
+// 6. wait the leader to quit due to no enough followers
+Thread.sleep(4000);
+//waitForOne(zk[leader], States.CONNECTING);
--- End diff --

done


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807976
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
--- End diff --

done


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807914
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
--- End diff --

removed the cast


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168807853
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I was able to do what you said and drop the 1 second sleep, but for the sleep 
at step 6 I am going to need something else, because the leader is only in the 
LOOKING state for 2 ms. Leader election happens way too fast for us to be able 
to catch that by polling.  

If I remove the 4 second sleep it does not trigger the error case; I don't 
completely know why. I'll spend some time looking at it, 
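
One hedged option for the step 6 wait (it reuses names from the test above and 
assumes `getPeerState()` is the public state accessor on `QuorumPeer`): rather 
than trying to catch the ~2 ms LOOKING window, poll for the old leader leaving 
LEADING, which is a lasting state change:

    // Bounded wait for the old leader to abandon leadership; avoids both the
    // fixed 4 second sleep and the race on the very short LOOKING window.
    long deadline = System.currentTimeMillis() + ClientBase.CONNECTION_TIMEOUT;
    while (mt[leader].main.quorumPeer.getPeerState()
            == QuorumPeer.ServerState.LEADING) {
        if (System.currentTimeMillis() > deadline) {
            Assert.fail("Took too long for old leader to give up leadership");
        }
        Thread.sleep(10);
    }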

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367504#comment-16367504
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168795646
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

@anmolnar and @afine I put this in for my own debugging and forgot to remove 
it.  If you want me to, I am happy to either remove it, file a separate JIRA 
and put it up as a separate pull request, or just leave it.  Either way is 
fine with me.


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367503#comment-16367503
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168795633
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

Use the `LaunchServers(numServers, tickTime)` method in this class.
It gives you a collection of `MainThread` and `ZooKeeper` objects properly 
initialized.
The test's `tearDown()` will take care of destroying it. 
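
A minimal sketch of that setup (the holder's field and helper names below are 
assumptions inferred from how `mt` and `zk` are used in this test, not a 
confirmed API):

    // Replaces the hand-rolled port/config/MainThread/ZooKeeper loops; the
    // test class tearDown() then shuts everything down.
    Servers svrs = LaunchServers(SERVER_COUNT, 1000 /* tickTime */);
    waitForAll(svrs.zk, States.CONNECTED);
    // e.g. svrs.mt[i] in place of mt[i] for the restart/shutdown steps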


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168795646
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

@anmolnar and @afine I put this in for my own debugging and forgot to remove 
it.  If you want me to, I am happy to either remove it, file a separate JIRA 
and put it up as a separate pull request, or just leave it.  Either way is 
fine with me.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168795633
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

Use the `LaunchServers(numServers, tickTime)` method in this class.
It gives you a collection of `MainThread` and `ZooKeeper` objects properly 
initialized.
The test's `tearDown()` will take care of destroying it. 


---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367497#comment-16367497
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168794042
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I will see if I can make it work.  I agree I would love to kill as many of 
the sleeps as possible.
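
For the sleeps that are left, a small bounded-wait helper along these lines 
could replace most of them (purely illustrative and JDK 1.7 friendly, since 
that is what the build runs on; `Condition` and `waitFor` are names invented 
for this sketch):

    private static abstract class Condition {
        abstract boolean met() throws Exception;
    }

    private static void waitFor(String what, Condition c, long timeoutMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!c.met()) {
            if (System.currentTimeMillis() > deadline) {
                Assert.fail("Timed out waiting for " + what);
            }
            Thread.sleep(50); // poll in small steps instead of one long sleep
        }
    }

For example, in place of the 1 second sleep before checking the proposal:

    final Map finalOutstanding = outstanding; // final copy for the anonymous class
    waitFor("proposal to be logged at the leader", new Condition() {
        boolean met() {
            return finalOutstanding.size() == 1;
        }
    }, ClientBase.CONNECTION_TIMEOUT);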


> Data inconsistency issue due to retain database in 

[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168794042
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+mt[i].start();
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// we need to shutdown and start back up to make sure that the 
create session isn't the first transaction since
+// that is rather innocuous.
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].shutdown();
+}
+
+waitForAll(zk, States.CONNECTING);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i].start();
+// Recreate a client session since the previous session was 
not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+}
+
+waitForAll(zk, States.CONNECTED);
+
+// 2. kill all followers
+int leader = -1;
+Map outstanding = null;
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (mt[i].main.quorumPeer.leader != null) {
+leader = i;
+outstanding = 
mt[leader].main.quorumPeer.leader.outstandingProposals;
+// increase the tick time to delay the leader going to 
looking
+mt[leader].main.quorumPeer.tickTime = 1;
+}
+}
+LOG.warn("LEADER {}", leader);
+
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].shutdown();
+}
+}
+
+// 3. start up the followers to form a new quorum
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+mt[i].start();
+}
+}
+
+// 4. wait one of the follower to be the leader
+for (int i = 0; i < SERVER_COUNT; i++) {
+if (i != leader) {
+// Recreate a client session since the previous session 
was not persisted.
+zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i], 
ClientBase.CONNECTION_TIMEOUT, this);
+waitForOne(zk[i], States.CONNECTED);
+}
+}
+
+// 5. send a create request to leader and make sure it's synced to 
disk,
+//which means it acked from itself
+try {
+zk[leader].create("/zk" + leader, "zk".getBytes(), 
Ids.OPEN_ACL_UNSAFE,
+CreateMode.PERSISTENT);
+Assert.fail("create /zk" + leader + " should have failed");
+} catch (KeeperException e) {
+}
+
+// just make sure that we actually did get it in process at the
+// leader
+Assert.assertTrue(outstanding.size() == 1);
+Proposal p = (Proposal) outstanding.values().iterator().next();
+Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+// make sure it has a chance to write it to disk
+Thread.sleep(1000);
--- End diff --

I will see if I can make it work.  I agree I would love to kill as many of 
the sleeps as possible.


---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367495#comment-16367495
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793764
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

I am not super familiar with the test infrastructure.  If you have a 
suggestion I would love to hear it; otherwise I will look around and see what 
I can come up with.


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367492#comment-16367492
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793569
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

+1
As mentioned, `testElectionFraud()` is a good example of that.


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet. 
> If the snapshot is ahead of the txn file, then since the SyncRequestProcessor 
> queue is drained during shutdown, the snapshot and txn file stay consistent 
> until leader election happens, so this is not an issue.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all snapshots and txn files are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 is synced to disk on C, but just before the proposal reaches the 
> followers, A and B restart, so T1 doesn't exist on A and B
> 3. A and B form a new quorum after the restart; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it 
> will sync with leader B with last Zxid T0, which will be an empty diff sync
> 5. Before C takes a snapshot it restarts, replaying the txns on disk, which 
> include T1; now it has Node N, but A and B don't have it.
> I also included a test case that reproduces this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader election; 
> we will submit it for review.





[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793764
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

I am not super familiar with the test infrastructure. If you have a 
suggestion I would love to hear it; otherwise I will look around and see what 
I can come up with.


---


[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793569
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +888,127 @@ public void testWithOnlyMinSessionTimeout() throws 
Exception {
 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
 }
 
+@Test
+public void testTxnAheadSnapInRetainDB() throws Exception {
+// 1. start up server and wait for leader election to finish
+ClientBase.setupTestEnv();
+final int SERVER_COUNT = 3;
+final int clientPorts[] = new int[SERVER_COUNT];
+StringBuilder sb = new StringBuilder();
+for (int i = 0; i < SERVER_COUNT; i++) {
+clientPorts[i] = PortAssignment.unique();
+sb.append("server." + i + "=127.0.0.1:" + 
PortAssignment.unique() + ":" + PortAssignment.unique() + ";" + clientPorts[i] 
+ "\n");
+}
+String quorumCfgSection = sb.toString();
+
+MainThread mt[] = new MainThread[SERVER_COUNT];
+ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+for (int i = 0; i < SERVER_COUNT; i++) {
+mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
--- End diff --

+1
As mentioned testElectionFraud() is a good example for that.


---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367491#comment-16367491
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793211
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

Although I agree with you in general, I think this one is a good 
addition to the test output.
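
For context, a self-contained version of the helper under discussion, including
the more informative timeout message. ZooKeeper.getState() and ZooKeeper.States
are real client API; the timeout constant here is illustrative.

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.ZooKeeper.States;

    final class WaitHelper {
        private static final int CONNECTION_TIMEOUT_MS = 30000;

        static void waitForOne(ZooKeeper zk, States expected) throws InterruptedException {
            int iterations = CONNECTION_TIMEOUT_MS / 500;
            while (zk.getState() != expected) {
                if (iterations-- == 0) {
                    // Reporting both states makes a timeout in CI logs actionable.
                    throw new RuntimeException(
                            "Waiting too long " + zk.getState() + " != " + expected);
                }
                Thread.sleep(500);
            }
        }
    }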


> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), or that the 
> txn file is ahead of the snapshot because no commit message has been received yet. 
> If the snapshot is ahead of the txn file, this is not an issue: since the 
> SyncRequestProcessor queue is drained during shutdown, the snapshot and txn file 
> stay consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with a 
> data inconsistency issue; here is a simplified scenario that shows it:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reaches leader C to create Node N, and it's converted to 
> txn T1 
> 2. Txn T1 was synced to disk on C, but just before the proposal reached 
> the followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after the restart; let's say B is the leader
> 4. C changed to the looking state because it no longer had enough followers; it 
> will sync with leader B at last zxid T0, which results in an empty diff sync
> 5. Before C took a snapshot, it restarted and replayed the txns on disk, which 
> include T1; now it has Node N, but A and B don't.
> I also included a test case to reproduce this issue consistently. 
> We have a totally different RetainDB version which avoids this issue by 
> reconciling the snapshot and txn files before leader election; we will 
> submit it for review.





[GitHub] zookeeper pull request #453: ZOOKEEPER-2845: Apply commit log when restartin...

2018-02-16 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/453#discussion_r168793211
  
--- Diff: 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -435,7 +435,7 @@ private void waitForOne(ZooKeeper zk, States state) 
throws InterruptedException
 int iterations = ClientBase.CONNECTION_TIMEOUT / 500;
 while (zk.getState() != state) {
 if (iterations-- == 0) {
-throw new RuntimeException("Waiting too long");
+throw new RuntimeException("Waiting too long " + 
zk.getState() + " != " + state);
--- End diff --

Although I agree with you in general, I think this one is a good 
addition to the test output.


---


Failed: ZOOKEEPER- PreCommit Build #1494

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1494/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 6.56 KB...]
 [exec] 
 [exec] Pull request id: 451
 [exec] Pull request title: ZOOKEEPER-2184: Zookeeper Client should 
re-resolve hosts when connection attempts fail 
Dload  Upload   Total   SpentLeft  Speed
 [exec] 
 [exec] 
 [exec] Defect number: ZOOKEEPER-2184
 [exec] - Parsed args, going to checkout -
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Testing patch for pull request 451.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec]   0 00 00 0  0  0 --:--:-- --:--:-- 
--:--:-- 0100   1410   1410 0832  0 --:--:-- --:--:-- 
--:--:--   834
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec]  Pre-build trunk to verify trunk stability and javac warnings
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] /home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant  
-Djavac.args=-Xlint -Xmaxwarns 1000 
-Djava5.home=/home/jenkins/tools/java5/latest 
-Dforrest.home=/home/jenkins/tools/forrest/latest -DZookeeperPatchProcess= 
clean tar > 
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess/trunkJavacWarnings.txt
 2>&1
 [exec] Trunk compilation is broken?
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec]   0 00 197410 0  39846  0 --:--:-- --:--:-- 
--:--:-- 39846mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1778:
 exec returned: 1

Total time: 22 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2184
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-16 Thread Enrico Olivelli
2018-02-16 14:20 GMT+01:00 Andor Molnar :

> +1 for setting the Java8 requirement on server side.
>
> *Client side.*
> I like the idea of setting the requirement on the client side too without
> introducing anything Java8-specific. I'm not planning to use Java8 features
> right away; I just think that opening the gates would be useful in the long
> run.
>
> Additionally, I don't see heavy development on the client side. Users who
> are tightly coupled to Java7 are still able to use existing clients unless
> we introduce something breaking that forces them to upgrade, for whatever
> reason. I'm not sure what the odds are of that happening.
>


My two cents:
ZooKeeper is currently distributed as a single JAR which contains both
server- and client-side code, so requiring Java 7 for the client and Java 8
for the server would require a new way of packaging the artifacts and
building the project (and that in turn would require separating the
client-side and server-side code bases).
Maybe I am missing something.


Enrico


>
> Andor
>
>
>
> On Fri, Feb 16, 2018 at 12:31 PM, Flavio Junqueira  wrote:
>
> > We have this section in the admin doc that talks about the system
> > requirements:
> >
> > https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperAdmin.html#sc_requiredSoftware
> >
> > If we change, then we have to update that section. Specifically about
> > client and server, I'd think that there is no problem with requiring
> > Java 8 on the server. The potential concern is with the client as it
> > affects applications that build against it. It would be best to not force
> > applications to upgrade themselves. Looking at the compatibility guide
> > for Java 8:
> >
> > http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html
> >
> > The risk is that an application is strictly using Java 7 because of some
> > incompatibility listed in that guide, in which case it wouldn't be able
> > to compile the ZK client, assuming we get it to use some Java 8 construct.
> > One option is that we raise the requirement to Java 8, but we do not
> > really introduce anything that breaks compatibility for the next version.
> > Users should take this as a warning that they need to migrate to Java 8.
> > I'm not sure this makes the situation any better, though. Another option
> > is that we set a release to be the one in which we migrate and let
> > everyone know that they need to migrate.
> >
> > -Flavio
> >
> >
> > > On 16 Feb 2018, at 12:05, Andor Molnar  wrote:
> > >
> > > Hi all,
> > >
> > > I think it would be nice to draw a line at branch-3.5 and target Java
> > > version 8 onwards. It seems to be a good opportunity for the upgrade
> > > before we release a stable version of 3.5.
> > >
> > > The benefit would be the ability to use new features of Java 8 in the
> > > code:
> > >
> > > Do you think it's feasible?
> > >
> > > Regards,
> > > Andor
> >
> >
>


Success: ZOOKEEPER- PreCommit Build #1493

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1493/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 39.79 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1493//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1493//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1493//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 35 minutes 4 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2184
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: ZOOKEEPER- PreCommit Build #1492

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1492/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 79.04 MB...]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1492//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1492//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1492//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 21 minutes 31 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2967
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.StandaloneDisabledTest.startSingleServerTest

Error Message:
Timeout occurred. Please note the time in the report does not reflect the time 
until the timeout.

Stack Trace:
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in 
the report does not reflect the time until the timeout.
at java.lang.Thread.run(Thread.java:745)

Success: ZOOKEEPER- PreCommit Build #1490

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1490/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 39.32 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1490//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1490//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1490//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 36 minutes 51 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2967
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367332#comment-16367332
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r168764368
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java ---
@@ -119,6 +120,68 @@ public void testTwoInvalidHostAddresses() {
 new StaticHostProvider(list);
 }
 
+@Test
+public void testReResolvingSingle() {
+byte size = 1;
+ArrayList list = new 
ArrayList(size);
+
+// Test a hostname that resolves to a single address
+list.clear();
--- End diff --

Removed.


> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 
> 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: easyfix, patch
> Fix For: 3.5.4, 3.4.12
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.
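
A minimal sketch of the behaviour the report asks for (illustrative names, not
the actual StaticHostProvider code): re-resolve the hostname on every connection
attempt instead of caching the resolved address forever.

    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.UnknownHostException;

    final class ReResolvingProvider {
        private final String host;
        private final int port;

        ReResolvingProvider(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // Called before every (re)connection attempt.
        InetSocketAddress next() {
            try {
                // A fresh lookup picks up DNS changes, e.g. a restarted
                // container that came back with a new IP address.
                return new InetSocketAddress(InetAddress.getByName(host), port);
            } catch (UnknownHostException e) {
                // Fall back to an unresolved address so the caller can retry later.
                return InetSocketAddress.createUnresolved(host, port);
            }
        }
    }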





[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-02-16 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r168764368
  
--- Diff: 
src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java ---
@@ -119,6 +120,68 @@ public void testTwoInvalidHostAddresses() {
 new StaticHostProvider(list);
 }
 
+@Test
+public void testReResolvingSingle() {
+byte size = 1;
+ArrayList list = new 
ArrayList(size);
+
+// Test a hostname that resolves to a single address
+list.clear();
--- End diff --

Removed.


---


[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367329#comment-16367329
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r168763559
  
--- Diff: src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java ---
@@ -239,13 +243,13 @@ public void testSessionEstablishment() throws 
Exception {
 public void testSeekForRwServer() throws Exception {
 
 // setup the logger to capture all logs
-Layout layout = Logger.getRootLogger().getAppender("CONSOLE")
+Layout layout = 
org.apache.log4j.Logger.getRootLogger().getAppender("CONSOLE")
--- End diff --

Makes sense. I'll revert the changes.
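
For context, the log-capture pattern this test builds on looks roughly like the
following (log4j 1.x API; the helper name is illustrative): reuse the CONSOLE
appender's layout and tee the root logger's output into a buffer the test can
assert on.

    import java.io.ByteArrayOutputStream;
    import org.apache.log4j.Layout;
    import org.apache.log4j.Logger;
    import org.apache.log4j.WriterAppender;

    final class LogCapture {
        static ByteArrayOutputStream captureRootLogger() {
            Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout();
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            Logger.getRootLogger().addAppender(new WriterAppender(layout, os));
            return os;
        }
    }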


> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 
> 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: easyfix, patch
> Fix For: 3.5.4, 3.4.12
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.





[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-02-16 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r168763559
  
--- Diff: src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java ---
@@ -239,13 +243,13 @@ public void testSessionEstablishment() throws 
Exception {
 public void testSeekForRwServer() throws Exception {
 
 // setup the logger to capture all logs
-Layout layout = Logger.getRootLogger().getAppender("CONSOLE")
+Layout layout = 
org.apache.log4j.Logger.getRootLogger().getAppender("CONSOLE")
--- End diff --

Makes sense. I'll revert the changes.


---


Success: ZOOKEEPER- PreCommit Build #1491

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1491/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.59 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1491//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1491//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1491//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 2 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2940
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2967) Add check to validate dataDir and dataLogDir parameters at startup

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367304#comment-16367304
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2967:
---

Github user mfenes commented on the issue:

https://github.com/apache/zookeeper/pull/459
  
@afine I've removed the whitespace changes. Please have a look and check 
whether it's right now.


> Add check to validate dataDir and dataLogDir parameters at startup
> --
>
> Key: ZOOKEEPER-2967
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2967
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.11
>Reporter: Andor Molnar
>Assignee: Mark Fenes
>Priority: Major
>  Labels: startup, validation
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> According to ZOOKEEPER-2960, we should add a startup check to validate that 
> the dataDir and dataLogDir parameters are set correctly.
> Perhaps we should introduce a check of some kind? If dataLogDir is different 
> from dataDir and snapshots exist in dataLogDir, we throw an exception and quit.
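
A rough sketch of such a check, assuming snapshots live in a version-2
subdirectory and their file names start with "snapshot" (the names below are
illustrative, not the eventual implementation):

    import java.io.File;

    final class DataDirCheck {
        static void validate(File dataDir, File dataLogDir) {
            if (dataDir.equals(dataLogDir)) {
                return; // same directory, nothing to validate
            }
            File version2 = new File(dataLogDir, "version-2");
            File[] snaps = version2.listFiles((dir, name) -> name.startsWith("snapshot"));
            if (snaps != null && snaps.length > 0) {
                // Snapshots in the log dir usually mean the two parameters were
                // swapped in the config; failing fast avoids nasty surprises.
                throw new IllegalStateException("snapshot files found in dataLogDir "
                        + dataLogDir + "; check dataDir/dataLogDir configuration");
            }
        }
    }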





[GitHub] zookeeper issue #459: ZOOKEEPER-2967: Add check to validate dataDir and data...

2018-02-16 Thread mfenes
Github user mfenes commented on the issue:

https://github.com/apache/zookeeper/pull/459
  
@afine I've removed the whitespace changes. Please have a look and check 
whether it's right now.


---


Success: ZOOKEEPER- PreCommit Build #1489

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1489/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 40.29 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1489//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1489//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1489//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 35 minutes 34 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2955
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-16 Thread Andor Molnar
+1 for setting the Java8 requirement on server side.

*Client side.*
I like the idea of setting the requirement on the client side too without
introducing anything Java8-specific. I'm not planning to use Java8 features
right away; I just think that opening the gates would be useful in the long
run.

Additionally, I don't see heavy development on the client side. Users who
are tightly coupled to Java7 are still able to use existing clients unless
we introduce something breaking that forces them to upgrade, for whatever
reason. I'm not sure what the odds are of that happening.

Andor



On Fri, Feb 16, 2018 at 12:31 PM, Flavio Junqueira  wrote:

> We have this section in the admin doc that talks about the system
> requirements:
>
> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperAdmin.html#sc_requiredSoftware
>
> If we change, then we have to update that section. Specifically about
> client and server, I'd think that there is no problem with requiring Java 8
> on the server. The potential concern is with the client as it affects
> applications that build against it. It would be best to not force
> applications to upgrade themselves. Looking at the compatibility guide for
> Java 8:
>
> http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html
>
> The risk is that an application is strictly using Java 7 because of some
> incompatibility listed in that guide, in which case, it wouldn't be able to
> compile the ZK client assuming we get it to use some Java 8 construct. One
> option is that we raise the requirement to Java 8, but we do not really
> introduce anything that breaks compatibility for the next version. Users
> should take this as a warning that they need to migrate to Java 8. I'm not
> sure this makes the situation any better, though. Another option is that we
> set a release to be the one in which we migrate and let everyone know that
> they need to migrate.
>
> -Flavio
>
>
> > On 16 Feb 2018, at 12:05, Andor Molnar  wrote:
> >
> > Hi all,
> >
> > I think it would be nice to draw a line at branch-3.5 and target Java
> > version 8 onwards. It seems to be a good opportunity for the upgrade
> > before we release a stable version of 3.5.
> >
> > The benefit would be the ability to use new features of Java 8 in the
> > code:
> >
> > Do you think it's feasible?
> >
> > Regards,
> > Andor
>
>


Failed: ZOOKEEPER- PreCommit Build #1488

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1488/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.02 MB...]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1488//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1488//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1488//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 18 minutes 7 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2940
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Re: [SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-16 Thread Flavio Junqueira
We have this section in the admin doc that talks about the system requirements:

https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperAdmin.html#sc_requiredSoftware


If we change, then we have to update that section. Specifically about client 
and server, I'd think that there is no problem with requiring Java 8 on the 
server. The potential concern is with the client as it affects applications 
that build against it. It would be best to not force applications to upgrade 
themselves. Looking at the compatibility guide for Java 8:

http://www.oracle.com/technetwork/java/javase/8-compatibility-guide-2156366.html


The risk is that an application is strictly using Java 7 because of some 
incompatibility listed in that guide, in which case, it wouldn't be able to 
compile the ZK client assuming we get it to use some Java 8 construct. One 
option is that we raise the requirement to Java 8, but we do not really 
introduce anything that breaks compatibility for the next version. Users should 
take this as a warning that they need to migrate to Java 8. I'm not sure this 
makes the situation any better, though. Another option is that we set a release 
to be the one in which we migrate and let everyone know that they need to 
migrate.

-Flavio


> On 16 Feb 2018, at 12:05, Andor Molnar  wrote:
> 
> Hi all,
> 
> I think it would be nice to draw a line at branch-3.5 and target Java
> version 8 onwards. It seems to be a good opportunity for the upgrade before
> we release a stable version of 3.5.
> 
> The benefit would be the ability to use new features of Java 8 in the code:
> 
> Do you think it's feasible?
> 
> Regards,
> Andor



Success: ZOOKEEPER- PreCommit Build #1487

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1487/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.28 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1487//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1487//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1487//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 57 seconds
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2955
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[SUGGESTION] Target branches 3.5 and master (3.6) to Java 8

2018-02-16 Thread Andor Molnar
Hi all,

I think it would be nice to draw a line at branch-3.5 and target Java
version 8 onwards. It seems to be a good opportunity for the upgrade before
we release a stable version of 3.5.

The benefit would be the ability to use new features of Java 8 in the code:

Do you think it's feasible?

Regards,
Andor


Failed: ZOOKEEPER- PreCommit Build #1486

2018-02-16 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1486/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 111.87 MB...]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1486//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1486//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1486//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 13 minutes 11 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2940
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
4 tests failed.
FAILED:  org.apache.zookeeper.server.admin.CommandsTest.testMonitor

Error Message:
Result from command monitor contains extra fields: 
{last_client_response_size=-1, max_client_response_size=-1, 
min_client_response_size=-1}

Stack Trace:
junit.framework.AssertionFailedError: Result from command monitor contains 
extra fields: {last_client_response_size=-1, max_client_response_size=-1, 
min_client_response_size=-1}
at 
org.apache.zookeeper.server.admin.CommandsTest.testCommand(CommandsTest.java:75)
at 
org.apache.zookeeper.server.admin.CommandsTest.testCommand(CommandsTest.java:81)
at 
org.apache.zookeeper.server.admin.CommandsTest.testMonitor(CommandsTest.java:161)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)


FAILED:  org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput(FourLetterWordsTest.java:170)
at 

[jira] [Commented] (ZOOKEEPER-2940) Deal with maxbuffer as it relates to large requests from clients

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366832#comment-16366832
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2940:
---

GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/466

ZOOKEEPER-2940. Deal with maxbuffer as it relates to large requests from 
clients

This PR covers the other part of Jute buffer monitoring which relates to 
client response sizes.

`jute.maxbuffer` is often set too small, or the default (4MB) value is not 
enough to receive large responses from the server, which leaves the client 
unable to operate, e.g. `getChildren()` on a node which has lots of children, or 
execution of a large multi.

These new metrics let operators monitor the size of generated responses 
to get an idea of how to properly set the client's `jute.maxbuffer`.
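
For operators, a hedged sketch of raising `jute.maxbuffer` (a real ZooKeeper
system property) before the client classes load; the 8MB value, the connect
string and the znode path are only examples.

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class JuteBufferExample {
        public static void main(String[] args) throws Exception {
            // Must be set before the ZooKeeper client classes initialize,
            // since the limit is typically read once at class-load time.
            System.setProperty("jute.maxbuffer", String.valueOf(8 * 1024 * 1024));

            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
            // getChildren() on a node with very many children is the typical
            // call that overflows the default limit.
            List<String> children = zk.getChildren("/big-node", false);
            System.out.println("children: " + children.size());
            zk.close();
        }
    }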

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-2940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #466


commit 003ea900ea06e0aa763f7d8c71b6cd3859ac18b1
Author: Andor Molnar 
Date:   2018-02-13T17:04:12Z

ZOOKEEPER-2940. Track size of client responses




> Deal with maxbuffer as it relates to large requests from clients
> 
>
> Key: ZOOKEEPER-2940
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2940
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: jute, server
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0
>
>
> Monitor real-time Jute buffer usage as it relates to large requests from 
> clients.





[GitHub] zookeeper pull request #466: ZOOKEEPER-2940. Deal with maxbuffer as it relat...

2018-02-16 Thread anmolnar
GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/466

ZOOKEEPER-2940. Deal with maxbuffer as it relates to large requests from 
clients

This PR covers the other part of Jute buffer monitoring which relates to 
client response sizes.

`jute.maxbuffer` is often set too small, or the default (4MB) value is not 
enough to receive large responses from the server, which leaves the client 
unable to operate, e.g. `getChildren()` on a node which has lots of children, or 
execution of a large multi.

These new metrics let operators monitor the size of generated responses 
to get an idea of how to properly set the client's `jute.maxbuffer`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-2940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #466


commit 003ea900ea06e0aa763f7d8c71b6cd3859ac18b1
Author: Andor Molnar 
Date:   2018-02-13T17:04:12Z

ZOOKEEPER-2940. Track size of client responses




---


ZooKeeper_branch35_openjdk7 - Build # 848 - Failure

2018-02-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/848/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 60.77 KB...]
[junit] Running org.apache.zookeeper.test.SessionTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.424 sec, Thread: 5, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.099 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
71.781 sec, Thread: 6, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 6
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
78.144 sec, Thread: 4, Class: org.apache.zookeeper.test.QuorumTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.483 sec, Thread: 6, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.724 sec, Thread: 4, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 6
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.076 sec, Thread: 4, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.638 sec, Thread: 4, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 4
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.692 sec, Thread: 6, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 6
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.139 sec, Thread: 4, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.713 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.091 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 5
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 4
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.98 sec, Thread: 5, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 5
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.1 
sec, Thread: 5, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 5
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
33.325 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.703 sec, Thread: 8, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.541 sec, Thread: 6, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
14.658 sec, Thread: 5, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
27.978 sec, Thread: 4, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
352.403 sec, Thread: 7, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
356.981 sec, Thread: 3, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
266.616 sec, Thread: 1, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Running org.apache.zookeeper.server.quorum.StandaloneDisabledTest 
in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 
sec, Thread: 2, Class: