[jira] [Commented] (ZOOKEEPER-2488) Unsynchronized access to shuttingDownLE in QuorumPeer

2018-11-25 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698585#comment-16698585
 ] 

Michael K. Edwards commented on ZOOKEEPER-2488:
---

I pulled that fix out as a separate PR (#724).

> Unsynchronized access to shuttingDownLE in QuorumPeer
> -
>
> Key: ZOOKEEPER-2488
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2488
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.2
>Reporter: Michael Han
>Assignee: gaoshu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Access to shuttingDownLE in QuorumPeer is not synchronized here:
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1066
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/
> The access should be synchronized as the same variable might be accessed
> in QuorumPeer::restartLeaderElection, which is synchronized.
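
A minimal sketch of the pattern the report asks for, assuming a simplified
stand-in class rather than the real QuorumPeer: every read and write of the
flag goes through methods that lock the same monitor as
restartLeaderElection. The class name, the consumeShutdownRequest helper and
the election counter are hypothetical and exist only for illustration; this is
not the code from the pull request.

```java
// Hypothetical stand-in for QuorumPeer; only the flag handling is modeled.
final class ShuttingDownFlagSketch {
    private boolean shuttingDownLE = false;
    private int electionsStarted = 0;

    // Writers take the same monitor that restartLeaderElection holds.
    synchronized void setShuttingDownLE(boolean value) {
        shuttingDownLE = value;
    }

    // Check-and-clear as one atomic step, so no update can slip in between.
    synchronized boolean consumeShutdownRequest() {
        if (!shuttingDownLE) {
            return false;
        }
        shuttingDownLE = false;
        electionsStarted++;
        return true;
    }

    // The report notes this method is synchronized; clearing the flag here
    // is an assumption made for the sketch.
    synchronized void restartLeaderElection() {
        shuttingDownLE = false;
        electionsStarted++;
    }

    synchronized int getElectionsStarted() {
        return electionsStarted;
    }

    public static void main(String[] args) throws InterruptedException {
        ShuttingDownFlagSketch peer = new ShuttingDownFlagSketch();
        Thread setter = new Thread(() -> peer.setShuttingDownLE(true));
        setter.start();
        setter.join();
        System.out.println(peer.consumeShutdownRequest()); // true
        System.out.println(peer.consumeShutdownRequest()); // false
    }
}
```

Keeping the check and the reset inside one synchronized method is the point
of the sketch: a separate unsynchronized read followed by a write could
interleave with restartLeaderElection on another thread.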



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2488) Unsynchronized access to shuttingDownLE in QuorumPeer

2018-11-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-2488:
--
Labels: pull-request-available  (was: )

> Unsynchronized access to shuttingDownLE in QuorumPeer
> -
>
> Key: ZOOKEEPER-2488
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2488
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.2
>Reporter: Michael Han
>Assignee: gaoshu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>
> Access to shuttingDownLE in QuorumPeer is not synchronized here:
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1066
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/
> The access should be synchronized as the same variable might be accessed
> in QuorumPeer::restartLeaderElection, which is synchronized.





[GitHub] zookeeper pull request #724: ZOOKEEPER-2488: Synchronized access to shutting...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/724

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

I think this can be reviewed separately from the (functionally somewhat 
related) changes in #707.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2488

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #724


commit 33c1f0bad6e76a4cc67ab4ee44e08fa8f1ac5449
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer




---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
retest this please


---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
Fixed that (and tested it locally before pushing this time).


---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
Except, er, that seems to make the tests fail.  😝  Investigating.


---


[jira] [Commented] (ZOOKEEPER-3201) Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart

2018-11-25 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698526#comment-16698526
 ] 

Michael K. Edwards commented on ZOOKEEPER-3201:
---

What seems to be happening here is that this "second chance" logic isn't 
sufficient to ensure that we don't hit {{ConnectionLossException}} again during 
the second attempt to test {{zk.exists()}}.

{noformat}
/**
 * Ensure the client is able to talk to the server.
 *
 * @param idx the idx of the server the client is talking to
 */
private void checkClientConnected(int idx) throws Exception {
    ZooKeeper zk = getClient(idx);
    if (zk == null) {
        return;
    }
    try {
        Assert.assertNull(zk.exists("/foofoofoo-connected", false));
    } catch (ConnectionLossException e) {
        // second chance...
        // in some cases, leader change in particular, the timing is
        // very tricky to get right in order to assure that the client has
        // disconnected and reconnected. In some cases the client will
        // disconnect, then attempt to reconnect before the server is
        // back, in which case we'll see another connloss on the operation
        // in the try, this catches that case and waits for the server
        // to come back
        PeerStruct peer = qu.getPeer(idx);
        Assert.assertTrue("Waiting for server down",
                ClientBase.waitForServerUp(
                        "127.0.0.1:" + peer.clientPort,
                        ClientBase.CONNECTION_TIMEOUT));

        Assert.assertNull(zk.exists("/foofoofoo-connected", false));
    }
}
{noformat}
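
One way to go beyond a single "second chance" is a bounded retry loop. The
following is only a sketch under stated assumptions, not the fix adopted for
this issue: ConnectionLossException here is a local stand-in for the
ZooKeeper exception of the same name, and the retry helper, its attempt
count and its delay are hypothetical.

```java
import java.util.concurrent.Callable;

final class RetryOnConnectionLossSketch {
    // Hypothetical stand-in for KeeperException.ConnectionLossException.
    static final class ConnectionLossException extends Exception {
        ConnectionLossException(String message) {
            super(message);
        }
    }

    // Runs the operation up to maxAttempts times, sleeping between failures.
    // Rethrows the last connection loss if every attempt fails.
    static <T> T retry(Callable<T> op, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that loses the connection twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("simulated");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls"); // connected after 3 calls
    }
}
```

In the test above, the zk.exists() check would be the operation passed to
such a helper; whether retrying or waiting for a reconnect event is the
better remedy is left open here.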

> Flaky test: 
> org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
> --
>
> Key: ZOOKEEPER-3201
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3201
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Michael K. Edwards
>Priority: Major
>
> Encountered when running tests locally:
> {noformat}
> 64429 [junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO
> [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening
> socket connection to server localhost/127.0.0.1:27389. Will not attempt
> to authenticate using SASL (unknown error)
> 64430 [junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO
> [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket
> connection established, initiating session, client: /127.0.0.1:47668,
> server: localhost/127.0.0.1:27389
> 64431 [junit] 2018-11-25 22:28:12,734 [myid:] - INFO
> [NIOWorkerThread-1:Learner@117] - Revalidating client: 0x1a9cccf
> 64432 [junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO
> [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session
> establishment complete on server localhost/127.0.0.1:27389, sessionid =
> 0x1a9cccf, negotiated timeout = 3
> 64433 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO
> [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening
> socket connection to server localhost/127.0.0.1:27392. Will not attempt
> to authenticate using SASL (unknown error)
> 64434 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO
> [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket
> connection established, initiating session, client: /127.0.0.1:52160,
> server: localhost/127.0.0.1:27392
> 64435 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO
> [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening
> socket connection to server localhost/127.0.0.1:27395. Will not attempt
> to authenticate using SASL (unknown error)
> 64436 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO
> [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket
> connection established, initiating session, client: /127.0.0.1:47256,
> server: localhost/127.0.0.1:27395
> 64437 [junit] 2018-11-25 22:28:13,017 [myid:] - INFO
> [NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for
> client /127.0.0.1:47256 as it has seen zxid 0x3 our last zxid
> is 0x2fffe client must try another server
> 64438 [junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO
> [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to
> read additional data from server sessionid 0x3a9ccd2, likely
> server has closed socket, closing socket connection and attempting reconnect
> 64439 [junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO
> [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session

[jira] [Updated] (ZOOKEEPER-3202) Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL

2018-11-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3202:
--
Labels: pull-request-available  (was: )

> Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL
> ---
>
> Key: ZOOKEEPER-3202
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3202
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Michael K. Edwards
>Priority: Major
>  Labels: pull-request-available
>
> Encountered while running tests locally:
> {noformat}
> 283208 [junit] 2018-11-25 22:35:31,581 [myid:2] - INFO
> [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):ZooKeeperServer@164]
>  - Created server with tickTime 4000 minSessionTimeout 8000
> maxSessionTimeout 8 datadir
> /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2
>  snapdir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2
> 283209 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO
> [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):ZooKeeperServer@164]
>  - Created server with tickTime 4000 minSessionTimeout 8000
> maxSessionTimeout 8 datadir
> /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2
>  snapdir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2
> 283210 [junit] 2018-11-25 22:35:31,581 [myid:0] - INFO
> [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):ZooKeeperServer@164]
>  - Created server with tickTime 4000 minSessionTimeout 8000
> maxSessionTimeout 8 datadir
> /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2
>  snapdir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2
> 283211 [junit] 2018-11-25 22:35:31,585 [myid:0] - INFO
> [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):Follower@69]
>  - FOLLOWING - LEADER ELECTION TOOK - 275 MS
> 283212 [junit] 2018-11-25 22:35:31,588 [myid:2] - INFO
> [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):Leader@457]
>  - LEADING - LEADER ELECTION TOOK - 160 MS
> 283213 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO
> [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):Follower@69]
>  - FOLLOWING - LEADER ELECTION TOOK - 155 MS
> 283214 [junit] 2018-11-25 22:35:31,633 [myid:2] - INFO
> [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):FileTxnSnapLog@372]
>  - Snapshotting: 0x0 to
> /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2/snapshot.0
> 283215 [junit] 2018-11-25 22:35:31,694 [myid:] - INFO
> [main:FourLetterWordMain@87] - connecting to 127.0.0.1 11222
> 283216 [junit] 2018-11-25 22:35:31,695 [myid:0] - INFO  [New I/O worker
> #11:NettyServerCnxn@288] - Processing stat command from /127.0.0.1:60484
> 283217 [junit] 2018-11-25 22:35:31,699 [myid:] - INFO
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED
> testClientServerSSL
> 283218 [junit] java.lang.AssertionError: waiting for server 0 being up
> 283219 [junit]     at org.junit.Assert.fail(Assert.java:88)
> 283220 [junit]     at org.junit.Assert.assertTrue(Assert.java:41)
> 283221 [junit]     at
> org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL(ClientSSLTest.java:98)
> {noformat}





[GitHub] zookeeper pull request #723: ZOOKEEPER-3202: Add timing margin to improve re...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/723

ZOOKEEPER-3202: Add timing margin to improve reliability of 
testClientServerSSL()

Allowing just 5 seconds for 3 quorum peers to start and elect a leader is a 
bit tight, at least when running 4 test processes in parallel inside a (Linux) 
Docker container on a (non-Linux) laptop.  Add up to 10 seconds of extra margin.
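
The idea of adding margin to a readiness wait can be sketched as a polling
loop with an explicit deadline. This is an illustration only, not the
contents of this pull request: the waitFor helper, the poll interval and the
simulated readiness check are assumptions, and only the 5 second base and
10 second margin come from the description above.

```java
import java.util.function.BooleanSupplier;

final class WaitWithMarginSketch {
    // Polls the condition until it holds or the deadline passes.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long baseTimeoutMs = 5_000L;   // original allowance from the description
        long extraMarginMs = 10_000L;  // added margin from the description
        long start = System.nanoTime();
        // Simulated server that becomes ready after roughly 200 ms.
        boolean up = waitFor(() -> System.nanoTime() - start > 200_000_000L,
                baseTimeoutMs + extraMarginMs, 50L);
        System.out.println("server up: " + up); // server up: true
    }
}
```

Because the loop returns as soon as the condition holds, a larger timeout
costs extra time only on runs that would otherwise have failed.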

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #723


commit dcedaf9ad7c756b1852a9bfca4cfbc3313f1a0fc
Author: Michael Edwards 
Date:   2018-11-26T06:31:23Z

ZOOKEEPER-3202: Add timing margin to improve reliability of 
testClientServerSSL()




---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
I think so too!  Done 😄 


---


[jira] [Updated] (ZOOKEEPER-3203) Tracking and exposing the non voting followers in ZK

2018-11-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3203:
--
Labels: pull-request-available  (was: )

> Tracking and exposing the non voting followers in ZK
> 
>
> Key: ZOOKEEPER-3203
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3203
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>
> The current synced_followers metric reports all the forwarding followers, 
> including non-voting ones.
> We have found it useful to track how many servers are following the leader
> in non-voting mode, so that we can identify issues such as servers that are
> following but not issuing reconfig. This JIRA adds a separate metric to
> report the number of non-voting members.





[GitHub] zookeeper pull request #722: [ZOOKEEPER-3203] Tracking the number of non vot...

2018-11-25 Thread lvfangmin
GitHub user lvfangmin opened a pull request:

https://github.com/apache/zookeeper/pull/722

[ZOOKEEPER-3203] Tracking the number of non voting followers in ZK



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-3203

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #722


commit f093c2b6306d86efc5da9ad0834553060f99ae63
Author: Fangmin Lyu 
Date:   2018-11-26T00:41:25Z

[ZOOKEEPER-3203] Track the number of non voting followers in ZK




---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-25 Thread tumativ
Github user tumativ commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236102347
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
 try {
 RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.wait(100);
+synchronized(processingCompletedEvent) {
+processingCompletedEvent.wait(100);
 }
 } catch (InterruptedException e) {
-LOG.info("Got interrupted while waiting for dead watches " +
+LOG.info("Got interrupted while waiting for dead watchers " +
--- End diff --

Done
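
The hunk above moves the timed wait from the counter object onto a dedicated
monitor. A minimal sketch of that wait/notify arrangement follows, under
stated assumptions: the field names are taken from the diff, but the class,
the processingCompleted method and the notifyAll call on the consumer side
are hypothetical and are not copied from WatcherCleaner.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class DeadWatcherThrottleSketch {
    private final Object processingCompletedEvent = new Object();
    private final AtomicInteger totalDeadWatchers = new AtomicInteger();
    private final int maxInProcessingDeadWatchers;

    DeadWatcherThrottleSketch(int maxInProcessingDeadWatchers) {
        this.maxInProcessingDeadWatchers = maxInProcessingDeadWatchers;
    }

    // Producer side: wait in short timed slices while the backlog is full.
    void addDeadWatcher() throws InterruptedException {
        while (totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            synchronized (processingCompletedEvent) {
                processingCompletedEvent.wait(100);
            }
        }
        totalDeadWatchers.incrementAndGet();
    }

    // Consumer side (assumed): shrink the backlog, then wake any waiters.
    void processingCompleted(int processed) {
        totalDeadWatchers.addAndGet(-processed);
        synchronized (processingCompletedEvent) {
            processingCompletedEvent.notifyAll();
        }
    }

    int backlog() {
        return totalDeadWatchers.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DeadWatcherThrottleSketch throttle = new DeadWatcherThrottleSketch(2);
        throttle.addDeadWatcher();
        throttle.addDeadWatcher();
        Thread cleaner = new Thread(() -> throttle.processingCompleted(2));
        cleaner.start();
        throttle.addDeadWatcher(); // blocks until the cleaner frees capacity
        cleaner.join();
        System.out.println("backlog: " + throttle.backlog()); // backlog: 1
    }
}
```

The condition is checked outside the monitor, so a notification can be
missed; the 100 ms timed wait bounds how long such a miss can delay the
producer.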


---


[jira] [Updated] (ZOOKEEPER-3203) Tracking and exposing the non voting followers in ZK

2018-11-25 Thread Fangmin Lv (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv updated ZOOKEEPER-3203:
--
Summary: Tracking and exposing the non voting followers in ZK  (was: Track 
the number of non voting followers in ZK)

> Tracking and exposing the non voting followers in ZK
> 
>
> Key: ZOOKEEPER-3203
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3203
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> The current synced_followers metric reports all the forwarding followers, 
> including non-voting ones.
> We have found it useful to track how many servers are following the leader
> in non-voting mode, so that we can identify issues such as servers that are
> following but not issuing reconfig. This JIRA adds a separate metric to
> report the number of non-voting members.





[jira] [Created] (ZOOKEEPER-3203) Track the number of non voting followers in ZK

2018-11-25 Thread Fangmin Lv (JIRA)
Fangmin Lv created ZOOKEEPER-3203:
-

 Summary: Track the number of non voting followers in ZK
 Key: ZOOKEEPER-3203
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3203
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Fangmin Lv
Assignee: Fangmin Lv
 Fix For: 3.6.0


The current synced_followers metric reports all the forwarding followers, 
including non-voting ones.

We have found it useful to track how many servers are following the leader in
non-voting mode, so that we can identify issues such as servers that are
following but not issuing reconfig. This JIRA adds a separate metric to report
the number of non-voting members.
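
As a rough illustration of the proposed metric, the sketch below counts the
synced followers that are not in the voting set. Everything in it is an
assumption made for the example: the class and method names, the use of
server ids, and the non_voting_followers label are hypothetical and do not
come from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NonVotingFollowerCountSketch {
    // Counts synced followers whose server id is not in the voting set.
    static long countNonVoting(List<Long> syncedFollowerIds, Set<Long> votingMemberIds) {
        return syncedFollowerIds.stream()
                .filter(id -> !votingMemberIds.contains(id))
                .count();
    }

    public static void main(String[] args) {
        List<Long> synced = Arrays.asList(2L, 3L, 4L, 5L);
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        long nonVoting = countNonVoting(synced, voters);
        // prints: synced_followers=4 non_voting_followers=2
        System.out.println("synced_followers=" + synced.size()
                + " non_voting_followers=" + nonVoting);
    }
}
```

Reporting the two numbers side by side is what makes a server that follows
without ever becoming a voter visible.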





[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-25 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236098795
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -161,10 +163,10 @@ public void doWork() throws Exception {
 long startTime = Time.currentElapsedTime();
 listener.processDeadWatchers(snapshot);
 long latency = Time.currentElapsedTime() - startTime;
-LOG.info("Takes {} to process {} watches", latency, total);
+LOG.info("Takes {} to process {} watchers", latency, total);
--- End diff --

"Watches" seems more reasonable here. A watcher maps to a single client
session and can watch multiple paths, so a single watcher can hold multiple
watches. The total value is the count of watches, not of watchers.
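
The distinction can be shown with a small data-structure sketch. The class
below is hypothetical and is not how the server stores watches; it assumes a
plain map from a watcher (session) id to the set of paths it watches, purely
to show why the two counts differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WatchCountSketch {
    // watcher (session) id -> paths that watcher is watching
    private final Map<Long, Set<String>> watchesByWatcher = new HashMap<>();

    void addWatch(long watcherId, String path) {
        watchesByWatcher.computeIfAbsent(watcherId, k -> new HashSet<>()).add(path);
    }

    // Number of distinct watchers, one per client session.
    int watcherCount() {
        return watchesByWatcher.size();
    }

    // Number of watches, summed over every watcher's paths.
    int watchCount() {
        int total = 0;
        for (Set<String> paths : watchesByWatcher.values()) {
            total += paths.size();
        }
        return total;
    }

    public static void main(String[] args) {
        WatchCountSketch table = new WatchCountSketch();
        table.addWatch(1L, "/a");
        table.addWatch(1L, "/b");
        table.addWatch(2L, "/a");
        // prints: watchers=2 watches=3
        System.out.println("watchers=" + table.watcherCount()
                + " watches=" + table.watchCount());
    }
}
```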


---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-25 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236098818
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
 try {
 RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.wait(100);
+synchronized(processingCompletedEvent) {
+processingCompletedEvent.wait(100);
 }
 } catch (InterruptedException e) {
-LOG.info("Got interrupted while waiting for dead watches " +
+LOG.info("Got interrupted while waiting for dead watchers " +
--- End diff --

Ditto, watches are more accurate.


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/721#discussion_r236097804
  
--- Diff: 
zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java
 ---
@@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws Exception {
 watcher.waitForDisconnected(3);
 startServer();
 watcher.waitForConnected(3);
+watcher1.waitForConnected(3);
--- End diff --

That example is due to an unrelated class of failure, in which the
ZooKeeper server is permanently unreachable (notice that the entire test timed
out). I think many failures of that kind result from a failure to bind() the
dynamically assigned port to the server socket (probably due to collisions
with other tests running concurrently on the same build host). I've updated
the description of this PR to explain what kind of failure it's intended to
fix.
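
The bind() collision described above can be reproduced in isolation. This is
a sketch of the failure class only, using plain java.net.ServerSocket; the
class and helper names are hypothetical, and it does not reflect how the
ZooKeeper test harness assigns ports.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

final class PortCollisionSketch {
    // Returns true if a second listener on the same port is rejected.
    static boolean secondBindFails(int port) throws IOException {
        try (ServerSocket second = new ServerSocket(port)) {
            return false;
        } catch (BindException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any currently free port.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            System.out.println("second bind rejected: " + secondBindFails(port));
        }
    }
}
```

A test that picks a port number first and binds it later leaves a window in
which another process on the same host can take that port, which would match
the "permanently unreachable" symptom.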


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread lavacat
Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/721#discussion_r236095252
  
--- Diff: 
zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java
 ---
@@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws Exception {
 watcher.waitForDisconnected(3);
 startServer();
 watcher.waitForConnected(3);
+watcher1.waitForConnected(3);
--- End diff --

If this fixes the test, that's great.
I'm trying to understand why, though. If zk1 isn't connected, shouldn't we get
CONNECTIONLOSS on line 237 (zk1.create)?
Here is an example, and I don't see it there:

https://builds.apache.org/job/ZooKeeper_branch35_java10/263/testReport/junit/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset/

Good improvement anyway; I think it should be merged.


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards 
Date:   2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server




---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/721


---


[jira] [Created] (ZOOKEEPER-3202) Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL

2018-11-25 Thread Michael K. Edwards (JIRA)
Michael K. Edwards created ZOOKEEPER-3202:
-

 Summary: Flaky test: 
org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL
 Key: ZOOKEEPER-3202
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3202
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Michael K. Edwards


Encountered while running tests locally:
{noformat}
283208 [junit] 2018-11-25 22:35:31,581 [myid:2] - INFO
[QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):ZooKeeperServer@164]
 - Created server with tickTime 4000 minSessionTimeout 8000
maxSessionTimeout 8 datadir
/usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2
 snapdir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2

283209 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO
[QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):ZooKeeperServer@164]
 - Created server with tickTime 4000 minSessionTimeout 8000
maxSessionTimeout 8 datadir
/usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2
 snapdir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2

283210 [junit] 2018-11-25 22:35:31,581 [myid:0] - INFO
[QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):ZooKeeperServer@164]
 - Created server with tickTime 4000 minSessionTimeout 8000
maxSessionTimeout 8 datadir
/usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2
 snapdir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2

283211 [junit] 2018-11-25 22:35:31,585 [myid:0] - INFO
[QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):Follower@69]
 - FOLLOWING - LEADER ELECTION TOOK - 275 MS

283212 [junit] 2018-11-25 22:35:31,588 [myid:2] - INFO
[QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):Leader@457]
 - LEADING - LEADER ELECTION TOOK - 160 MS

283213 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO
[QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):Follower@69]
 - FOLLOWING - LEADER ELECTION TOOK - 155 MS

283214 [junit] 2018-11-25 22:35:31,633 [myid:2] - INFO
[QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):FileTxnSnapLog@372]
 - Snapshotting: 0x0 to
/usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2/snapshot.0

283215 [junit] 2018-11-25 22:35:31,694 [myid:] - INFO
[main:FourLetterWordMain@87] - connecting to 127.0.0.1 11222

283216 [junit] 2018-11-25 22:35:31,695 [myid:0] - INFO  [New I/O worker
#11:NettyServerCnxn@288] - Processing stat command from /127.0.0.1:60484

283217 [junit] 2018-11-25 22:35:31,699 [myid:] - INFO
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED
testClientServerSSL

283218 [junit] java.lang.AssertionError: waiting for server 0 being up

283219 [junit]     at org.junit.Assert.fail(Assert.java:88)

283220 [junit]     at org.junit.Assert.assertTrue(Assert.java:41)

283221 [junit]     at
org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL(ClientSSLTest.java:98)
{noformat}





[jira] [Created] (ZOOKEEPER-3201) Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart

2018-11-25 Thread Michael K. Edwards (JIRA)
Michael K. Edwards created ZOOKEEPER-3201:
-

 Summary: Flaky test: 
org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
 Key: ZOOKEEPER-3201
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3201
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Michael K. Edwards


Encountered when running tests locally:
{noformat}
64429     [junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO  
[main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening socket 
connection to server localhost/127.0.0      .1:27389. Will not attempt to 
authenticate using SASL (unknown error)

64430     [junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO  
[main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket 
connection established, initiating session, cli      ent: /127.0.0.1:47668, 
server: localhost/127.0.0.1:27389

64431     [junit] 2018-11-25 22:28:12,734 [myid:] - INFO  
[NIOWorkerThread-1:Learner@117] - Revalidating client: 0x1a9cccf

64432     [junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO  
[main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session 
establishment complete on server localhost/12      7.0.0.1:27389, sessionid = 
0x1a9cccf, negotiated timeout = 3

64433     [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  
[main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening socket 
connection to server localhost/127.0.0      .1:27392. Will not attempt to 
authenticate using SASL (unknown error)

64434     [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO  
[main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket 
connection established, initiating session, cli      ent: /127.0.0.1:52160, 
server: localhost/127.0.0.1:27392

64435     [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket 
connection to server localhost/127.0.0      .1:27395. Will not attempt to 
authenticate using SASL (unknown error)

64436     [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket 
connection established, initiating session, cli      ent: /127.0.0.1:47256, 
server: localhost/127.0.0.1:27395

64437     [junit] 2018-11-25 22:28:13,017 [myid:] - INFO  
[NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for client 
/127.0.0.1:47256 as it has seen zxid 0x3 our       last zxid is 
0x2fffe client must try another server

64438     [junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read 
additional data from server sessionid       0x3a9ccd2, likely server 
has closed socket, closing socket connection and attempting reconnect

64439     [junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO  
[main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session 
establishment complete on server localhost/12      7.0.0.1:27392, sessionid = 
0x2a9d094, negotiated timeout = 3

64440     [junit] 2018-11-25 22:28:13,119 [myid:] - INFO  
[main:FourLetterWordMain@87] - connecting to 127.0.0.1 27395

64441     [junit] 2018-11-25 22:28:13,120 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@518] - Processing stat command from 
/127.0.0.1:47258

64442     [junit] 2018-11-25 22:28:13,121 [myid:] - INFO  
[NIOWorkerThread-1:StatCommand@53] - Stat command output

64443     [junit] 2018-11-25 22:28:14,134 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket 
connection to server localhost/127.0.0      .1:27395. Will not attempt to 
authenticate using SASL (unknown error)

6     [junit] 2018-11-25 22:28:14,135 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket 
connection established, initiating session, cli      ent: /127.0.0.1:47312, 
server: localhost/127.0.0.1:27395

64445     [junit] 2018-11-25 22:28:14,135 [myid:] - INFO  
[NIOWorkerThread-2:ZooKeeperServer@1030] - Refusing session request for client 
/127.0.0.1:47312 as it has seen zxid 0x3 our       last zxid is 
0x2fffe client must try another server

64446     [junit] 2018-11-25 22:28:14,137 [myid:127.0.0.1:27395] - INFO  
[main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read 
additional data from server sessionid 0x3a9ccd2, likely server 
has closed socket, closing socket connection and attempting reconnect

64447     [junit] 2018-11-25 22:28:14,240 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED 
testRolloverThenLeaderRestart

64448     [junit] org.apache.zookeeper.KeeperException$ConnectionLossException: 

[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards 
Date:   2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server




---


[jira] [Comment Edited] (ZOOKEEPER-3046) testManyChildWatchersAutoReset is flaky

2018-11-25 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698341#comment-16698341
 ] 

Michael K. Edwards edited comment on ZOOKEEPER-3046 at 11/25/18 10:21 PM:
--

Still seeing test failures; basically a variant of ZOOKEEPER-2508.  (After 
stopping/starting the server, we have to wait for all clients to reconnect 
before continuing the test.)

{noformat}
422005 [junit] 2018-11-25 21:25:50,228 [myid:127.0.0.1:16611] - INFO  
[Time-limited test-SendThread(127.0.0.1:16611):ClientCnxn$SendThread@1390] - 
Session establishment complete on server localhost/127.0.0.1:16611, 
sessionid = 0x17077c50001, negotiated timeout = 3
422006 [junit] 2018-11-25 21:25:50,286 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED 
testManyChildWatchersAutoReset
422007 [junit] 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/long-path-0-1-2-3-4-5-6-7-8-9/ch-00/ch
422008 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
422009 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
422010 [junit] at 
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1459)
422011 [junit] at 
org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:229)
{noformat}
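
A minimal sketch of the "wait for all clients to reconnect" step described above. ReconnectBarrier, markConnected and awaitAll are hypothetical names used only for illustration; they are not part of ZooKeeper or its test utilities. The assumption is that each client's connection-state callback calls markConnected once it sees a connected state again after the restart.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper: blocks until every expected client has reported a reconnect. */
final class ReconnectBarrier {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    ReconnectBarrier(Collection<String> clientIds) {
        pending.addAll(clientIds);
        // One count per distinct client that still has to reconnect.
        latch = new CountDownLatch(pending.size());
    }

    /** Assumed to be called from a client's connection callback once it is connected again. */
    void markConnected(String clientId) {
        // Count each client only once, even if it reports "connected" several times.
        if (pending.remove(clientId)) {
            latch.countDown();
        }
    }

    /** Returns true if all clients reconnected within the timeout, false otherwise. */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

In a test, the idea would be to restart the server, call awaitAll with a generous timeout, and only then issue the next create() call, so that the request is not sent on a connection that is still being re-established.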


was (Author: mkedwards):
Still seeing test failures; basically a variant of ZOOKEEPER-2508.  (After 
stopping/starting the server, we have to wait for all clients to reconnect 
before continuing the test.)

{{
422005 [junit] 2018-11-25 21:25:50,228 [myid:127.0.0.1:16611] - INFO  
[Time-limited test-SendThread(127.0.0.1:16611):ClientCnxn$SendThread@1390] - 
Session establishment complete on server localhost/127.0.0.1:16611, 
sessionid = 0x17077c50001, negotiated timeout = 3
422006 [junit] 2018-11-25 21:25:50,286 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED 
testManyChildWatchersAutoReset
422007 [junit] 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/long-path-0-1-2-3-4-5-6-7-8-9/ch-00/ch
422008 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
422009 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
422010 [junit] at 
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1459)
422011 [junit] at 
org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:229)
}}

> testManyChildWatchersAutoReset is flaky
> ---
>
> Key: ZOOKEEPER-3046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3046
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: tests
>Affects Versions: 3.5.3, 3.4.12
>Reporter: Bogdan Kanivets
>Assignee: Bogdan Kanivets
>Priority: Minor
>  Labels: flaky, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html]
>  testManyChildWatchersAutoReset is flaky in 3.4 and 3.5
> [ZooKeeper_branch34_java10|https://builds.apache.org/job/ZooKeeper_branch34_java10//13]
> [ZooKeeper_branch35_java9|https://builds.apache.org/job/ZooKeeper_branch35_java9/253]
> Test times out and because of that ant doesn't capture any output.





[jira] [Commented] (ZOOKEEPER-3046) testManyChildWatchersAutoReset is flaky

2018-11-25 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698341#comment-16698341
 ] 

Michael K. Edwards commented on ZOOKEEPER-3046:
---

Still seeing test failures; basically a variant of ZOOKEEPER-2508.  (After 
stopping/starting the server, we have to wait for all clients to reconnect 
before continuing the test.)

{{
422005 [junit] 2018-11-25 21:25:50,228 [myid:127.0.0.1:16611] - INFO  
[Time-limited test-SendThread(127.0.0.1:16611):ClientCnxn$SendThread@1390] - 
Session establishment complete on server localhost/127.0.0.1:16611, 
sessionid = 0x17077c50001, negotiated timeout = 3
422006 [junit] 2018-11-25 21:25:50,286 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED 
testManyChildWatchersAutoReset
422007 [junit] 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/long-path-0-1-2-3-4-5-6-7-8-9/ch-00/ch
422008 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
422009 [junit] at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
422010 [junit] at 
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1459)
422011 [junit] at 
org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:229)
}}

> testManyChildWatchersAutoReset is flaky
> ---
>
> Key: ZOOKEEPER-3046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3046
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: tests
>Affects Versions: 3.5.3, 3.4.12
>Reporter: Bogdan Kanivets
>Assignee: Bogdan Kanivets
>Priority: Minor
>  Labels: flaky, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> According to the 
> [dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html]
>  testManyChildWatchersAutoReset is flaky in 3.4 and 3.5
> [ZooKeeper_branch34_java10|https://builds.apache.org/job/ZooKeeper_branch34_java10//13]
> [ZooKeeper_branch35_java9|https://builds.apache.org/job/ZooKeeper_branch35_java9/253]
> Test times out and because of that ant doesn't capture any output.





[jira] [Created] (ZOOKEEPER-3200) Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testInconsistentDueToNewLeaderOrder

2018-11-25 Thread Michael K. Edwards (JIRA)
Michael K. Edwards created ZOOKEEPER-3200:
-

 Summary: Flaky test: 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testInconsistentDueToNewLeaderOrder
 Key: ZOOKEEPER-3200
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3200
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Michael K. Edwards


https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1206/

I've seen this locally as well, in a branch where ZOOKEEPER-2778, 
ZOOKEEPER-1818, and ZOOKEEPER-2488 have all been addressed.





ZooKeeper_branch35_jdk8 - Build # 1206 - Failure

2018-11-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1206/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 66.03 KB...]
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 4
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
446.365 sec, Thread: 3, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.917 sec, Thread: 4, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
3
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.596 sec, Thread: 1, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.493 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 1
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.504 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.537 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 1
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
15.088 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.589 sec, Thread: 1, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 4
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.465 sec, Thread: 4, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.278 sec, Thread: 1, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.202 sec, Thread: 4, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 1
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.688 sec, Thread: 1, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.9 
sec, Thread: 4, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
30.027 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.3 
sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 4
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.369 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.363 sec, Thread: 3, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
25.505 sec, Thread: 1, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.806 sec, Thread: 1, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Running org.apache.zookeeper.util.PemReaderTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
26.321 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.jute.BinaryInputArchiveTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 

[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-25 Thread TyqITstudent
Github user TyqITstudent commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/720#discussion_r236065557
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java ---
@@ -616,6 +616,14 @@ private void processEvent(Object event) {
   } else {
   cb.processResult(rc, clientPath, p.ctx, null);
   }
+  } else if (p.response instanceof 
GetAllChildrenNumberResponse) {
--- End diff --

> Also need to import GetAllChildrenNumberResponse

OK, thanks for the reminder.


---


[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-25 Thread TyqITstudent
Github user TyqITstudent commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
> It seems you are not renaming classes/fields in every point of code

OK, thanks for the reminder.


---


[GitHub] zookeeper issue #703: [ZOOKEEPER-1818] Correctly handle potential inconsiste...

2018-11-25 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/703
  
@anmolnar the following is my understanding of acceptedEpoch, 
currentEpoch and electionEpoch:

* acceptedEpoch : the last epoch we have accepted so far; usually it is the 
epoch of the highest zxid on that server.
* currentEpoch  : the current epoch after syncing with the new leader. It is 
based on the maximum acceptedEpoch in the quorum, and usually it is 
max(acceptedEpoch) + 1. The currentEpoch is used as the peerEpoch in leader 
election; as we know, (sid, zxid, peerEpoch) is the set used to decide a leader.
* electionEpoch : not one of the factors that decide the leader, but it is used 
as a logical clock to avoid considering a vote delayed from a while ago.
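
To make the roles above concrete, here is a rough sketch of how a vote over (sid, zxid, peerEpoch) can be ordered. VoteSketch is a hypothetical class written for this comment, not ZooKeeper's own Vote. The comparison order (peerEpoch first, then zxid, then sid) reflects my understanding of FastLeaderElection and leaves out details such as quorum weights. electionEpoch is intentionally absent, since it is not one of the deciding factors.

```java
/** Hypothetical, simplified model of a leader-election vote; not ZooKeeper's own class. */
final class VoteSketch {
    final long sid;        // server id of the proposed leader
    final long zxid;       // last transaction id seen by that server
    final long peerEpoch;  // the proposed leader's currentEpoch

    VoteSketch(long sid, long zxid, long peerEpoch) {
        this.sid = sid;
        this.zxid = zxid;
        this.peerEpoch = peerEpoch;
    }

    /** The epoch is carried in the high 32 bits of a zxid. */
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    /** True if candidate should replace current: compare peerEpoch, then zxid, then sid. */
    static boolean supersedes(VoteSketch candidate, VoteSketch current) {
        if (candidate.peerEpoch != current.peerEpoch) {
            return candidate.peerEpoch > current.peerEpoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.sid > current.sid;
    }
}
```

Under this ordering a stale zxid or peerEpoch in a learner's vote changes the outcome directly, while a stale electionEpoch only affects whether the vote is considered at all.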

Basically, we know there is a corner case where the learner may not update 
its zxid, peerEpoch, and electionEpoch after leader election (see the new 
comment I added on Leader.updateElectionVote). peerEpoch is fixed with a hack, 
but we cannot easily update the zxid and electionEpoch, so we try to ignore 
them. However, the IGNOREVALUE we introduced causes a compatibility issue 
during a rolling upgrade of the ensemble; that is why we introduced a version 
in the notification and only compare id or peerEpoch based on that version.


---