ZooKeeper_branch35_jdk7 - Build # 1043 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/1043/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 63.43 MB...]
[junit] 2017-07-14 08:51:36,148 [myid:] - WARN  [New I/O boss 
#5219:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0xb355bd36] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30196
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30196
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-07-14 08:51:36,149 [myid:] - INFO  [New I/O boss 
#5219:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-07-14 08:51:36,149 [myid:127.0.0.1:30196] - INFO  
[main-SendThread(127.0.0.1:30196):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x104b9b9dbbd is lost, closing socket connection and attempting 
reconnect
[junit] 2017-07-14 08:51:36,488 [myid:127.0.0.1:30199] - INFO  
[main-SendThread(127.0.0.1:30199):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30199. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 08:51:36,489 [myid:] - INFO  [New I/O boss 
#5236:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-07-14 08:51:36,489 [myid:] - WARN  [New I/O boss 
#5236:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x3660a4f0] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:30199
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 

[GitHub] zookeeper pull request #279: ZOOKEEPER-2804:Node creation fails with NPE if ...

2017-07-14 Thread jainbhupendra24
Github user jainbhupendra24 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/279#discussion_r127447190
  
--- Diff: src/java/main/org/apache/zookeeper/ZooKeeper.java ---
@@ -1436,7 +1436,7 @@ public String create(final String path, byte data[], List<ACL> acl,
         request.setData(data);
         request.setFlags(createMode.toFlag());
         request.setPath(serverPath);
-        if (acl != null && acl.size() == 0) {
+        if (acl == null || acl.isEmpty() || acl.contains(null)) {
--- End diff --

Throwing IllegalArgumentException is the correct way, but it may break 
existing behavior. Consider the case where the application code looks like:

   zk.create(callBackHandler)  // call to the create API, with a callback 
handler passed in
   doSomething();
   - - - -  // some more stuff

At present, if the client passes a null ACL parameter, the error is handled by 
callBackHandler, and the doSomething() method in the application code still 
executes.

But if we throw IllegalArgumentException, the doSomething() method will not 
execute, so it would change the existing behavior of application code.

Please suggest whether we really need to handle this case.
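
To make the trade-off concrete, here is a minimal sketch of the async path the 
comment describes (hypothetical application code, not from the PR; the connect 
string, znode path, and doSomething() are placeholders):

import org.apache.zookeeper.AsyncCallback.StringCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooKeeper;

public class NullAclBehavior {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; any reachable ensemble would do.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        StringCallback callBackHandler = (rc, path, ctx, name) -> {
            // Today the failure arrives here as a non-zero result code
            // rather than as an exception in the calling thread.
            System.out.println("create rc=" + rc + " path=" + path);
        };

        // Async create with a null ACL list: the error is delivered to the
        // callback, and the calling thread continues to doSomething().
        zk.create("/foo", new byte[0], null, CreateMode.PERSISTENT,
                callBackHandler, null);
        doSomething(); // still runs today

        // If create() instead threw IllegalArgumentException up front for a
        // null/empty ACL, the call above would abort and doSomething()
        // would never execute.
    }

    static void doSomething() {
        System.out.println("doSomething() executed");
    }
}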


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2804) Node creation fails with NPE if ACLs are null

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087254#comment-16087254
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2804:
---

Github user jainbhupendra24 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/279#discussion_r127447190
  
--- Diff: src/java/main/org/apache/zookeeper/ZooKeeper.java ---
@@ -1436,7 +1436,7 @@ public String create(final String path, byte data[], List<ACL> acl,
         request.setData(data);
         request.setFlags(createMode.toFlag());
         request.setPath(serverPath);
-        if (acl != null && acl.size() == 0) {
+        if (acl == null || acl.isEmpty() || acl.contains(null)) {
--- End diff --

Throwing IllegalArgumentException is the correct way, but it may break 
existing behavior. Consider the case where the application code looks like:

   zk.create(callBackHandler)  // call to the create API, with a callback 
handler passed in
   doSomething();
   - - - -  // some more stuff

At present, if the client passes a null ACL parameter, the error is handled by 
callBackHandler, and the doSomething() method in the application code still 
executes.

But if we throw IllegalArgumentException, the doSomething() method will not 
execute, so it would change the existing behavior of application code.

Please suggest whether we really need to handle this case.


> Node creation fails with NPE if ACLs are null
> -
>
> Key: ZOOKEEPER-2804
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2804
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Bhupendra Kumar Jain
>
> If null ACLs are passed, then zk node creation or setACL fails with an NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.removeDuplicates(PrepRequestProcessor.java:1301)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.fixupACL(PrepRequestProcessor.java:1341)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest2Txn(PrepRequestProcessor.java:519)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:1126)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:178)
> {code}
> Expected to handle null on the server and return a proper error code to the client
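
A minimal sketch of the kind of up-front server-side guard being requested 
(illustrative only: the method name and placement are assumptions, not the 
project's actual patch; it reuses the public KeeperException.InvalidACLException 
type):

{code}
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.data.ACL;

public class AclValidation {
    // Reject null/empty/null-containing ACL lists up front so the request
    // fails with InvalidACL instead of an NPE deep in removeDuplicates().
    static List<ACL> validateAcl(String path, List<ACL> acl)
            throws KeeperException.InvalidACLException {
        if (acl == null || acl.isEmpty() || acl.contains(null)) {
            throw new KeeperException.InvalidACLException(path);
        }
        return acl;
    }

    public static void main(String[] args) {
        try {
            validateAcl("/zk", null);
        } catch (KeeperException.InvalidACLException e) {
            System.out.println("rejected: " + e.getMessage()); // error code, no NPE
        }
    }
}
{code}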



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper_branch34_openjdk7 - Build # 1575 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1575/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on qnode1 (ubuntu) in workspace 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/branch-3.4^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/branch-3.4^{commit} # timeout=10
Checking out Revision 3872e49f6fa6835e0fca2229bec41d1b3cad4553 
(refs/remotes/origin/branch-3.4)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3872e49f6fa6835e0fca2229bec41d1b3cad4553
 > git rev-list 3872e49f6fa6835e0fca2229bec41d1b3cad4553 # timeout=10
No emails were triggered.
[ZooKeeper_branch34_openjdk7] $ 
/home/jenkins/tools/ant/apache-ant-1.9.9/bin/ant -Dtest.output=yes 
-Dtest.junit.threads=8 -Dtest.junit.output.format=xml -Djavac.target=1.7 clean 
test-core-java
Error: JAVA_HOME is not defined correctly.
  We cannot execute /usr/lib/jvm/java-7-openjdk-amd64//bin/java
Build step 'Invoke Ant' marked build as failure
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper-trunk-jdk8 - Build # 1128 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1128/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 65.86 MB...]
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:01:14,625 [myid:127.0.0.1:30073] - INFO  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30073. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:01:14,626 [myid:127.0.0.1:30073] - WARN  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1235] - Session 
0xbef47698 for server 127.0.0.1/127.0.0.1:30073, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:01:15,081 [myid:127.0.0.1:30076] - INFO  
[main-SendThread(127.0.0.1:30076):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30076. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:01:15,081 [myid:127.0.0.1:30076] - WARN  
[main-SendThread(127.0.0.1:30076):ClientCnxn$SendThread@1235] - Session 
0x100bef47953 for server 127.0.0.1/127.0.0.1:30076, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:01:15,222 [myid:127.0.0.1:30073] - INFO  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30073. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:01:15,222 [myid:127.0.0.1:30073] - WARN  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1235] - Session 
0xbef476980001 for server 127.0.0.1/127.0.0.1:30073, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:01:15,463 [myid:127.0.0.1:30076] - INFO  
[main-SendThread(127.0.0.1:30076):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30076. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:01:15,464 [myid:127.0.0.1:30076] - WARN  
[main-SendThread(127.0.0.1:30076):ClientCnxn$SendThread@1235] - Session 
0x100bef479530001 for server 127.0.0.1/127.0.0.1:30076, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:01:16,116 [myid:127.0.0.1:30073] - INFO  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30073. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:01:16,116 [myid:127.0.0.1:30073] - WARN  
[main-SendThread(127.0.0.1:30073):ClientCnxn$SendThread@1235] - Session 
0xbef47698 for server 127.0.0.1/127.0.0.1:30073, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: 

[GitHub] zookeeper pull request #310: [ZOOKEEPER-2845][Test] Test used to reproduce t...

2017-07-14 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/310#discussion_r127567852
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -784,4 +784,126 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
                 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
     }
 
+    @Test
+    public void testTxnAheadSnapInRetainDB() throws Exception {
+        // 1. start up servers and wait for leader election to finish
+        ClientBase.setupTestEnv();
+        final int SERVER_COUNT = 3;
+        final int clientPorts[] = new int[SERVER_COUNT];
+        StringBuilder sb = new StringBuilder();
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            clientPorts[i] = PortAssignment.unique();
+            sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique()
+                    + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+        }
+        String quorumCfgSection = sb.toString();
+
+        MainThread mt[] = new MainThread[SERVER_COUNT];
+        ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+            mt[i].start();
+            zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                    ClientBase.CONNECTION_TIMEOUT, this);
+        }
+
+        waitForAll(zk, States.CONNECTED);
+
+        // we need to shutdown and start back up to make sure that the create
+        // session isn't the first transaction, since that is rather innocuous
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i].shutdown();
+        }
+
+        waitForAll(zk, States.CONNECTING);
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i].start();
+            // Recreate a client session since the previous session was not persisted.
+            zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                    ClientBase.CONNECTION_TIMEOUT, this);
+        }
+
+        waitForAll(zk, States.CONNECTED);
+
+        // 2. kill all followers
+        int leader = -1;
+        Map<Long, Proposal> outstanding = null;
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (mt[i].main.quorumPeer.leader != null) {
+                leader = i;
+                outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+                // increase the tick time to delay the leader going to looking
+                mt[leader].main.quorumPeer.tickTime = 1;
+            }
+        }
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                mt[i].shutdown();
+            }
+        }
+
+        // 3. start up the followers to form a new quorum
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                mt[i].start();
+            }
+        }
+
+        // 4. wait for one of the followers to become the leader
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                // Recreate a client session since the previous session was not persisted.
+                zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                        ClientBase.CONNECTION_TIMEOUT, this);
+                waitForOne(zk[i], States.CONNECTED);
+            }
+        }
+
+        // 5. send a create request to the leader and make sure it's synced to
+        //    disk, which means it acked from itself
+        try {
+            zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE,
+                    CreateMode.PERSISTENT);
+            Assert.fail("create /zk" + leader + " should have failed");
+        } catch (KeeperException e) {}
+
+        // just make sure that we actually did get it in process at the leader
+        Assert.assertTrue(outstanding.size() == 1);
+        Proposal p = (Proposal) outstanding.values().iterator().next();
+        Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+        // make sure it has a chance to write it to disk
+        Thread.sleep(1000);
+        p.qvAcksetPairs.get(0).getAckset().contains(leader);
+
+        // 6. wait for the leader to quit due to not enough followers
+        waitForOne(zk[leader], States.CONNECTING);
+
+        int newLeader = -1;
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (mt[i].main.quorumPeer.leader != null) {
+                newLeader = i;
+            }
+        }
+        // make sure a different leader was elected

[GitHub] zookeeper pull request #310: [ZOOKEEPER-2845][Test] Test used to reproduce t...

2017-07-14 Thread lvfangmin
GitHub user lvfangmin opened a pull request:

https://github.com/apache/zookeeper/pull/310

[ZOOKEEPER-2845][Test] Test used to reproduce the data inconsistency issue 
due to retain database in leader election



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-2845-TEST

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #310


commit ff0bc49de51635da1d5bff0e4f260a61acc87db0
Author: Fangmin Lyu 
Date:   2017-07-14T23:02:20Z

reproduce the data inconsistency issue




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088277#comment-16088277
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

GitHub user lvfangmin opened a pull request:

https://github.com/apache/zookeeper/pull/310

[ZOOKEEPER-2845][Test] Test used to reproduce the data inconsistency issue 
due to retain database in leader election



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-2845-TEST

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/310.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #310


commit ff0bc49de51635da1d5bff0e4f260a61acc87db0
Author: Fangmin Lyu 
Date:   2017-07-14T23:02:20Z

reproduce the data inconsistency issue




> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088296#comment-16088296
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2845:
---

Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/310#discussion_r127567852
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -784,4 +784,126 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
                 maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
     }
 
+    @Test
+    public void testTxnAheadSnapInRetainDB() throws Exception {
+        // 1. start up servers and wait for leader election to finish
+        ClientBase.setupTestEnv();
+        final int SERVER_COUNT = 3;
+        final int clientPorts[] = new int[SERVER_COUNT];
+        StringBuilder sb = new StringBuilder();
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            clientPorts[i] = PortAssignment.unique();
+            sb.append("server." + i + "=127.0.0.1:" + PortAssignment.unique()
+                    + ":" + PortAssignment.unique() + ";" + clientPorts[i] + "\n");
+        }
+        String quorumCfgSection = sb.toString();
+
+        MainThread mt[] = new MainThread[SERVER_COUNT];
+        ZooKeeper zk[] = new ZooKeeper[SERVER_COUNT];
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i] = new MainThread(i, clientPorts[i], quorumCfgSection);
+            mt[i].start();
+            zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                    ClientBase.CONNECTION_TIMEOUT, this);
+        }
+
+        waitForAll(zk, States.CONNECTED);
+
+        // we need to shutdown and start back up to make sure that the create
+        // session isn't the first transaction, since that is rather innocuous
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i].shutdown();
+        }
+
+        waitForAll(zk, States.CONNECTING);
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            mt[i].start();
+            // Recreate a client session since the previous session was not persisted.
+            zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                    ClientBase.CONNECTION_TIMEOUT, this);
+        }
+
+        waitForAll(zk, States.CONNECTED);
+
+        // 2. kill all followers
+        int leader = -1;
+        Map<Long, Proposal> outstanding = null;
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (mt[i].main.quorumPeer.leader != null) {
+                leader = i;
+                outstanding = mt[leader].main.quorumPeer.leader.outstandingProposals;
+                // increase the tick time to delay the leader going to looking
+                mt[leader].main.quorumPeer.tickTime = 1;
+            }
+        }
+
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                mt[i].shutdown();
+            }
+        }
+
+        // 3. start up the followers to form a new quorum
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                mt[i].start();
+            }
+        }
+
+        // 4. wait for one of the followers to become the leader
+        for (int i = 0; i < SERVER_COUNT; i++) {
+            if (i != leader) {
+                // Recreate a client session since the previous session was not persisted.
+                zk[i] = new ZooKeeper("127.0.0.1:" + clientPorts[i],
+                        ClientBase.CONNECTION_TIMEOUT, this);
+                waitForOne(zk[i], States.CONNECTED);
+            }
+        }
+
+        // 5. send a create request to the leader and make sure it's synced to
+        //    disk, which means it acked from itself
+        try {
+            zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE,
+                    CreateMode.PERSISTENT);
+            Assert.fail("create /zk" + leader + " should have failed");
+        } catch (KeeperException e) {}
+
+        // just make sure that we actually did get it in process at the leader
+        Assert.assertTrue(outstanding.size() == 1);
+        Proposal p = (Proposal) outstanding.values().iterator().next();
+        Assert.assertTrue(p.request.getHdr().getType() == OpCode.create);
+
+        // make sure it has a chance to write it to disk
+        Thread.sleep(1000);
+        p.qvAcksetPairs.get(0).getAckset().contains(leader);
+
+        // 6. wait for the leader to quit due to not enough followers
+        waitForOne(zk[leader], States.CONNECTING);
+
+        int newLeader = -1;

[jira] [Updated] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Fangmin Lv (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv updated ZOOKEEPER-2845:
--
Affects Version/s: 3.6.0

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Fangmin Lv (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088306#comment-16088306
 ] 

Fangmin Lv commented on ZOOKEEPER-2845:
---

[~hanm] T1 only exists in the txn file and hasn't been applied to the data tree 
yet, so the lastProcessedZxid on follower C is T0 and no TRUNC message is sent 
when it syncs with the leader.
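
A toy sketch of that zxid comparison (simplified and assumed: the real decision 
lives in the leader's learner-sync code and also weighs epochs and snapshot 
thresholds; names here are illustrative):

{code}
public class SyncDecision {
    enum SyncType { EMPTY_DIFF, TRUNC, DIFF_OR_SNAP }

    // The leader compares the zxid a rejoining peer reports against its own.
    static SyncType decide(long leaderLastZxid, long peerReportedZxid) {
        if (peerReportedZxid == leaderLastZxid) {
            return SyncType.EMPTY_DIFF;   // peer looks up to date
        } else if (peerReportedZxid > leaderLastZxid) {
            return SyncType.TRUNC;        // peer is ahead: truncate its log
        } else {
            return SyncType.DIFF_OR_SNAP; // peer is behind: send missing txns
        }
    }

    public static void main(String[] args) {
        long T0 = 1L, T1 = 2L;
        // Retained DB: C's in-memory lastProcessedZxid is still T0, so B sends
        // an empty diff and the T1 sitting in C's txn log is never truncated.
        System.out.println(decide(T0, T0)); // EMPTY_DIFF
        // Old behavior (DB rebuilt from disk): C replays its txn log, reports
        // T1, and gets a TRUNC that drops T1.
        System.out.println(decide(T0, T1)); // TRUNC
    }
}
{code}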

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: ZOOKEEPER- PreCommit Build #883

2017-07-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 71.51 MB...]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] dea2761bab8c6e73897e813361b13e245383c891 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1642:
 exec returned: 1

Total time: 12 minutes 44 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2845
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
3 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testTxnAheadSnapInRetainDB

Error Message:
server 2 should not have /zk2

Stack Trace:
junit.framework.AssertionFailedError: server 2 should not have /zk2
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testTxnAheadSnapInRetainDB(QuorumPeerMainTest.java:900)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)


FAILED:  
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig

Error Message:
waiting for server 4 being up

Stack Trace:
junit.framework.AssertionFailedError: waiting for server 4 being up
at 
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig(ReconfigRecoveryTest.java:223)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)


FAILED:  
org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync

Error Message:
zoo.cfg.dynamic.next is not deleted.

Stack Trace:
junit.framework.AssertionFailedError: zoo.cfg.dynamic.next is not deleted.
at 
org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync(ReconfigDuringLeaderSyncTest.java:165)

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088293#comment-16088293
 ] 

Hadoop QA commented on ZOOKEEPER-2845:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/883//console

This message is automatically generated.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Fangmin Lv (JIRA)
Fangmin Lv created ZOOKEEPER-2845:
-

 Summary: Data inconsistency issue due to retain database in leader 
election
 Key: ZOOKEEPER-2845
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.5.3, 3.4.10
Reporter: Fangmin Lv
Assignee: Fangmin Lv
Priority: Critical


In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
during leader election. In a ZooKeeper ensemble, it's possible that the snapshot 
is ahead of the txn file (due to a slow disk on the server, etc.), or that the 
txn file is ahead of the snapshot because no commit message has been received yet.

If the snapshot is ahead of the txn file, the snapshot and txn file will stay 
consistent until leader election happens, since the SyncRequestProcessor queue 
is drained during shutdown, so this is not an issue.

But if the txn file is ahead of the snapshot, it's possible that the ensemble 
will end up with inconsistent data. Here is a simplified scenario showing the issue:

Let's say we have 3 servers in the ensemble, servers A and B are followers, 
and C is the leader, and all the snapshots and txns are up to T0:
1. A new request reached leader C to create Node N, and it was converted to txn T1
2. Txn T1 was synced to disk on C, but just before the proposal reached the 
followers, A and B restarted, so T1 didn't exist on A and B
3. A and B formed a new quorum after restart; let's say B is the leader
4. C changed to looking state because it no longer had enough followers; it 
synced with leader B with last Zxid T0, which resulted in an empty diff sync
5. Before C took a snapshot it restarted; it replayed the txns on disk, which 
include T1, so now it has Node N, but A and B don't have it.

Also, I included a test case to reproduce this issue consistently.

We have a totally different RetainDB version which avoids this issue by 
reaching consensus between the snapshot and txn files before leader election; 
we will submit it for review.
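
To make step 5 concrete, a toy model of the divergence (all names and 
structures here are invented for illustration; this is not ZooKeeper code):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetainDbScenario {
    public static void main(String[] args) {
        // Server C after the crash window: data tree applied up to T0, but
        // the txn log has already fsynced T1 ("create /N").
        Set<String> dataTreeC = new HashSet<>();
        List<String> txnLogC = new ArrayList<>(Arrays.asList("/N"));

        // Empty diff sync from new leader B: nothing is added or truncated,
        // because C reported lastProcessedZxid = T0. (A TRUNC here would
        // have cleared txnLogC instead.)

        // C restarts and recovers by replaying its txn log over the snapshot.
        for (String createdPath : txnLogC) {
            dataTreeC.add(createdPath);
        }

        Set<String> dataTreeB = new HashSet<>(); // B (and A) never saw T1
        System.out.println("C has /N: " + dataTreeC.contains("/N")); // true
        System.out.println("B has /N: " + dataTreeB.contains("/N")); // false -> inconsistent
    }
}
{code}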



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk - Build # 3466 - Failure

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/3466/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 64.15 MB...]
[junit] 2017-07-14 23:25:39,087 [myid:] - INFO  [ProcessThread(sid:0 
cport:24933)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-07-14 23:25:39,088 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-07-14 23:25:39,088 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-07-14 23:25:39,088 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port24933,name1=InMemoryDataTree]
[junit] 2017-07-14 23:25:39,089 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port24933]
[junit] 2017-07-14 23:25:39,089 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 24933
[junit] 2017-07-14 23:25:39,090 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-07-14 23:25:39,090 [myid:127.0.0.1:24810] - INFO  
[main-SendThread(127.0.0.1:24810):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:24810. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 23:25:39,090 [myid:127.0.0.1:24810] - WARN  
[main-SendThread(127.0.0.1:24810):ClientCnxn$SendThread@1235] - Session 
0x1056d1908e3 for server 127.0.0.1/127.0.0.1:24810, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 23:25:39,102 [myid:] - INFO  [main:ClientBase@626] - 
fdcount after test is: 7147 at start it was 7147
[junit] 2017-07-14 23:25:39,103 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-07-14 23:25:39,103 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
400.493 sec, Thread: 6, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] 2017-07-14 23:25:39,210 [myid:127.0.0.1:24816] - INFO  
[main-SendThread(127.0.0.1:24816):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:24816. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 23:25:39,210 [myid:127.0.0.1:24816] - WARN  
[main-SendThread(127.0.0.1:24816):ClientCnxn$SendThread@1235] - Session 
0x3056d1908e9 for server 127.0.0.1/127.0.0.1:24816, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 23:25:39,223 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
[junit] 2017-07-14 23:25:39,223 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
[junit] 2017-07-14 23:25:39,558 [myid:127.0.0.1:24736] - INFO  
[main-SendThread(127.0.0.1:24736):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:24736. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 23:25:39,558 [myid:127.0.0.1:24736] - WARN  
[main-SendThread(127.0.0.1:24736):ClientCnxn$SendThread@1235] - Session 
0x1056d1686840001 for server 127.0.0.1/127.0.0.1:24736, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1338: The 
following error occurred while executing this line:

[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088299#comment-16088299
 ] 

Michael Han commented on ZOOKEEPER-2845:


Thanks for reporting this issue [~lvfangmin].

bq. C changed to looking state because it no longer had enough followers; it 
synced with leader B with last Zxid T0, which resulted in an empty diff sync

Are you saying leader B is sending a DIFF to follower C in this case? Since B 
does not have T1, I think it should send a TRUNC and C should drop T1 in its 
txn log.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088464#comment-16088464
 ] 

Michael Han commented on ZOOKEEPER-2845:


Makes sense to me. I think we didn't have this issue previously because the 
{{zkDb}} was cleared across leader election, and if we restarted C it would 
recover from both the snap and the txn log, so it would find that its 
{{lastProcessedZxid}} is T1, rather than T0, which would yield a TRUNC instead 
of a DIFF from leader B.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (ZOOKEEPER-2845) Data inconsistency issue due to retain database in leader election

2017-07-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088464#comment-16088464
 ] 

Michael Han edited comment on ZOOKEEPER-2845 at 7/15/17 5:21 AM:
-

Makes sense to me. I think we didn't have this issue previously because the 
{{zkDb}} was cleared across leader election, and C would find that its 
{{lastProcessedZxid}} is T1, rather than T0, while reinitializing {{zkDb}} from 
the snap/txn log, which would yield a TRUNC instead of a DIFF from leader B.


was (Author: hanm):
Makes sense to me. I think we didn't have this issue previously because the 
{{zkDb}} was cleared across leader election, and if we restarted C it would 
recover from both the snap and the txn log, so it would find that its 
{{lastProcessedZxid}} is T1, rather than T0, which would yield a TRUNC instead 
of a DIFF from leader B.

> Data inconsistency issue due to retain database in leader election
> --
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time 
> during leader election. In a ZooKeeper ensemble, it's possible that the 
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), 
> or that the txn file is ahead of the snapshot because no commit message has 
> been received yet.
> If the snapshot is ahead of the txn file, the snapshot and txn file will 
> stay consistent until leader election happens, since the 
> SyncRequestProcessor queue is drained during shutdown, so this is not an 
> issue.
> But if the txn file is ahead of the snapshot, it's possible that the 
> ensemble will end up with inconsistent data. Here is a simplified scenario 
> showing the issue:
> Let's say we have 3 servers in the ensemble, servers A and B are followers, 
> and C is the leader, and all the snapshots and txns are up to T0:
> 1. A new request reached leader C to create Node N, and it was converted to 
> txn T1
> 2. Txn T1 was synced to disk on C, but just before the proposal reached the 
> followers, A and B restarted, so T1 didn't exist on A and B
> 3. A and B formed a new quorum after restart; let's say B is the leader
> 4. C changed to looking state because it no longer had enough followers; it 
> synced with leader B with last Zxid T0, which resulted in an empty diff sync
> 5. Before C took a snapshot it restarted; it replayed the txns on disk, 
> which include T1, so now it has Node N, but A and B don't have it.
> Also, I included a test case to reproduce this issue consistently.
> We have a totally different RetainDB version which avoids this issue by 
> reaching consensus between the snapshot and txn files before leader 
> election; we will submit it for review.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk-WinVS2008 - Build # 2471 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/2471/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by timer
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
[EnvInject] - Loading node environment variables.
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
Building remotely on windows-2012-2 (Windows) in workspace 
f:\jenkins\jenkins-slave\workspace\ZooKeeper-trunk-WinVS2008
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse "refs/remotes/origin/master^{commit}" # timeout=10
 > git rev-parse "refs/remotes/origin/origin/master^{commit}" # timeout=10
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
Checking out Revision ddf0364903bf7ac7cd25b2e1927f0d9d3c7203c4 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ddf0364903bf7ac7cd25b2e1927f0d9d3c7203c4
 > git rev-list ddf0364903bf7ac7cd25b2e1927f0d9d3c7203c4 # timeout=10
No emails were triggered.
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
[ZooKeeper-trunk-WinVS2008] $ cmd.exe /C 
"F:\jenkins\tools\ant\latest\bin\ant.bat -Dtest.output=yes 
-Dtest.junit.output.format=xml clean compile_jute && exit %%ERRORLEVEL%%"
java.lang.UnsupportedClassVersionError: org/apache/tools/ant/launch/Launcher : 
Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Exception in thread "main" Build step 'Invoke Ant' marked build as failure
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found
No JDK named ‘JDK 1.8 (unlimited security) 64-bit Windows only’ found



###
## FAILED TESTS (if any) 
##
No tests ran.

ZooKeeper_branch35_jdk8 - Build # 600 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/600/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 66.81 MB...]
[junit] 2017-07-14 12:27:13,045 [myid:] - INFO  
[SyncThread:0:MBeanRegistry@128] - Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port16854,name1=Connections,name2=127.0.0.1,name3=0x10686e04e06]
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x10686e04e06
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  [main:ZooKeeper@1331] - 
Session: 0x10686e04e06 closed
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 100505
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 471
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  [main:ClientBase@586] - 
tearDown starting
[junit] 2017-07-14 12:27:13,046 [myid:] - INFO  [main:ClientBase@556] - 
STOPPING server
[junit] 2017-07-14 12:27:13,047 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:16854
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  [main:ZooKeeperServer@541] 
- shutting down
[junit] 2017-07-14 12:27:13,048 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  
[main:PrepRequestProcessor@1005] - Shutting down
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  [ProcessThread(sid:0 
cport:16854)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-07-14 12:27:13,048 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port16854,name1=InMemoryDataTree]
[junit] 2017-07-14 12:27:13,049 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port16854]
[junit] 2017-07-14 12:27:13,049 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 16854
[junit] 2017-07-14 12:27:13,049 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-07-14 12:27:13,053 [myid:] - INFO  [main:ClientBase@611] - 
fdcount after test is: 1395 at start it was 1391
[junit] 2017-07-14 12:27:13,053 [myid:] - INFO  [main:ClientBase@613] - 
sleeping for 20 secs
[junit] 2017-07-14 12:27:13,053 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-07-14 12:27:13,053 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
427.355 sec, Thread: 3, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] 2017-07-14 12:27:13,293 [myid:127.0.0.1:16608] - INFO  
[main-SendThread(127.0.0.1:16608):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:16608. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:27:13,295 [myid:127.0.0.1:16608] - WARN  
[main-SendThread(127.0.0.1:16608):ClientCnxn$SendThread@1235] - Session 
0x10686d9e18e for server 127.0.0.1/127.0.0.1:16608, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-14 12:27:13,298 [myid:127.0.0.1:16731] - INFO  
[main-SendThread(127.0.0.1:16731):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:16731. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 12:27:13,299 [myid:127.0.0.1:16731] - WARN  
[main-SendThread(127.0.0.1:16731):ClientCnxn$SendThread@1235] - Session 
0x10686dd157e for 

ZooKeeper_branch35_openjdk7 - Build # 600 - Still Failing

2017-07-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/600/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 64.07 MB...]
[junit] 2017-07-14 10:11:08,692 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30022ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-07-14 10:11:10,054 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30079. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 10:11:10,056 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:30079:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:59589
[junit] 2017-07-14 10:11:10,056 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:59589, server: 
127.0.0.1/127.0.0.1:30079
[junit] 2017-07-14 10:11:28,766 [myid:127.0.0.1:30079] - WARN  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30031ms for sessionid 0x2022f857561
[junit] 2017-07-14 10:11:28,767 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30031ms for sessionid 
0x2022f857561, closing socket connection and attempting reconnect
[junit] 2017-07-14 10:11:30,408 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30079. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 10:11:30,409 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:59594, server: 
127.0.0.1/127.0.0.1:30079
[junit] 2017-07-14 10:11:30,410 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:30079:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:59594
[junit] 2017-07-14 10:11:40,073 [myid:127.0.0.1:30079] - WARN  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30017ms for sessionid 0x0
[junit] 2017-07-14 10:11:40,073 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30017ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-07-14 10:11:41,596 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30079. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 10:11:41,597 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:59599, server: 
127.0.0.1/127.0.0.1:30079
[junit] 2017-07-14 10:11:41,598 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:30079:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:59599
[junit] 2017-07-14 10:12:00,439 [myid:127.0.0.1:30079] - WARN  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30030ms for sessionid 0x2022f857561
[junit] 2017-07-14 10:12:00,440 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30030ms for sessionid 
0x2022f857561, closing socket connection and attempting reconnect
[junit] 2017-07-14 10:12:01,675 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:30079. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-14 10:12:01,676 [myid:127.0.0.1:30079] - INFO  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:59605, server: 
127.0.0.1/127.0.0.1:30079
[junit] 2017-07-14 10:12:01,677 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:30079:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:59605
[junit] 2017-07-14 10:12:11,616 [myid:127.0.0.1:30079] - WARN  
[main-SendThread(127.0.0.1:30079):ClientCnxn$SendThread@1181] -