[jira] [Commented] (ZOOKEEPER-2488) Unsynchronized access to shuttingDownLE in QuorumPeer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698585#comment-16698585 ] Michael K. Edwards commented on ZOOKEEPER-2488:
---
I pulled that fix out as a separate PR (#724).

> Unsynchronized access to shuttingDownLE in QuorumPeer
>
> Key: ZOOKEEPER-2488
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2488
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.2
> Reporter: Michael Han
> Assignee: gaoshu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Access to shuttingDownLE in QuorumPeer is not synchronized here:
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1066
> https://github.com/apache/zookeeper/blob/3c37184e83a3e68b73544cebccf9388eea26f523/src/java/main/org/
> The access should be synchronized, as the same variable might be accessed
> in QuorumPeer::restartLeaderElection, which is synchronized.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
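A minimal sketch of what "synchronized access" to the flag means, assuming the flag is a plain boolean guarded by the peer's own monitor (the same lock a synchronized restartLeaderElection() holds). PeerSketch, setShuttingDownLE, isShuttingDownLE and checkAndClearShuttingDownLE are hypothetical names for illustration, not the actual QuorumPeer API or the code in PR #724.

```java
/**
 * Hypothetical stand-in for QuorumPeer (not the real class). Every read and
 * write of shuttingDownLE happens while holding this object's monitor, the
 * same lock a synchronized restartLeaderElection() would hold.
 */
class PeerSketch {
    // Guarded by "this".
    private boolean shuttingDownLE = false;

    synchronized void setShuttingDownLE(boolean value) {
        shuttingDownLE = value;
    }

    // The lock also guarantees this read sees the latest write.
    synchronized boolean isShuttingDownLE() {
        return shuttingDownLE;
    }

    /**
     * Check-then-act on the flag as one atomic step. Without the lock, two
     * threads could both observe "true" and both act on it.
     */
    synchronized boolean checkAndClearShuttingDownLE() {
        if (shuttingDownLE) {
            shuttingDownLE = false;
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        PeerSketch peer = new PeerSketch();
        peer.setShuttingDownLE(true);
        // Two threads race to consume the flag; exactly one can win.
        int[] wins = new int[1];
        Runnable consumer = () -> {
            if (peer.checkAndClearShuttingDownLE()) {
                synchronized (wins) {
                    wins[0]++;
                }
            }
        };
        Thread a = new Thread(consumer);
        Thread b = new Thread(consumer);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("restarts: " + wins[0]); // prints "restarts: 1"
    }
}
```

The point of the sketch is that both the plain read and the check-then-clear must use the same monitor as the already synchronized method; synchronizing only one side leaves the race in place.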
[jira] [Updated] (ZOOKEEPER-2488) Unsynchronized access to shuttingDownLE in QuorumPeer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ZOOKEEPER-2488:
--
Labels: pull-request-available (was: )
[GitHub] zookeeper pull request #724: ZOOKEEPER-2488: Synchronized access to shutting...
GitHub user mkedwards opened a pull request:
https://github.com/apache/zookeeper/pull/724

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

I think this can be reviewed separately from the (functionally somewhat related) changes in #707.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2488

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/724.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #724

commit 33c1f0bad6e76a4cc67ab4ee44e08fa8f1ac5449
Author: Michael Edwards
Date: 2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer
---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 retest this please ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 Fixed that (and tested it locally before pushing this time). ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 Except, er, that seems to make the tests fail. Investigating. ---
[jira] [Commented] (ZOOKEEPER-3201) Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698526#comment-16698526 ] Michael K. Edwards commented on ZOOKEEPER-3201:
---
What seems to be happening here is that this "second chance" logic isn't sufficient to ensure that we don't hit {{ConnectionLossException}} again during the second attempt to test {{zk.exists()}}.
{noformat}
/**
 * Ensure the client is able to talk to the server.
 *
 * @param idx the idx of the server the client is talking to
 */
private void checkClientConnected(int idx) throws Exception {
    ZooKeeper zk = getClient(idx);
    if (zk == null) {
        return;
    }
    try {
        Assert.assertNull(zk.exists("/foofoofoo-connected", false));
    } catch (ConnectionLossException e) {
        // second chance...
        // in some cases, leader change in particular, the timing is
        // very tricky to get right in order to assure that the client has
        // disconnected and reconnected. In some cases the client will
        // disconnect, then attempt to reconnect before the server is
        // back, in which case we'll see another connloss on the operation
        // in the try, this catches that case and waits for the server
        // to come back
        PeerStruct peer = qu.getPeer(idx);
        Assert.assertTrue("Waiting for server down", ClientBase.waitForServerUp(
                "127.0.0.1:" + peer.clientPort, ClientBase.CONNECTION_TIMEOUT));
        Assert.assertNull(zk.exists("/foofoofoo-connected", false));
    }
}
{noformat}

> Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
>
> Key: ZOOKEEPER-3201
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3201
> Project: ZooKeeper
> Issue Type: Sub-task
> Reporter: Michael K. Edwards
> Priority: Major
>
> Encountered when running tests locally:
> {noformat}
> 64429 [junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27389. Will not attempt to authenticate using SASL (unknown error)
> 64430 [junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47668, server: localhost/127.0.0.1:27389
> 64431 [junit] 2018-11-25 22:28:12,734 [myid:] - INFO [NIOWorkerThread-1:Learner@117] - Revalidating client: 0x1a9cccf
> 64432 [junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:27389, sessionid = 0x1a9cccf, negotiated timeout = 3
> 64433 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27392. Will not attempt to authenticate using SASL (unknown error)
> 64434 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:52160, server: localhost/127.0.0.1:27392
> 64435 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27395. Will not attempt to authenticate using SASL (unknown error)
> 64436 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47256, server: localhost/127.0.0.1:27395
> 64437 [junit] 2018-11-25 22:28:13,017 [myid:] - INFO [NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47256 as it has seen zxid 0x3 our last zxid is 0x2fffe client must try another server
> 64438 [junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid 0x3a9ccd2, likely server has closed socket, closing socket connection and attempting reconnect
> 64439 [junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session
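Because a single "second chance" can itself hit another connection loss, one alternative (a sketch under stated assumptions, not the fix adopted for this issue) is to keep retrying until a deadline. ConnectionLossSketchException is a hypothetical stand-in for KeeperException.ConnectionLossException so the block compiles without ZooKeeper on the classpath; in the real test the retried operation would be the zk.exists(...) call, and retry, timeoutMs and pollMs are illustrative names.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for KeeperException.ConnectionLossException. */
class ConnectionLossSketchException extends Exception {
    ConnectionLossSketchException(String msg) {
        super(msg);
    }
}

class RetryOnConnectionLoss {
    /**
     * Retries op while it fails with a connection loss, until timeoutMs has
     * elapsed, sleeping pollMs between attempts. Any other exception, or a
     * connection loss after the deadline, propagates to the caller.
     */
    static <T> T retry(Callable<T> op, long timeoutMs, long pollMs) throws Exception {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossSketchException e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last connection loss
                }
                Thread.sleep(pollMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulates a client that loses the connection twice (it tried to
        // reconnect before the restarted server was back) and then succeeds.
        String result = retry(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectionLossSketchException("connloss");
            }
            return "ok";
        }, 5_000, 10);
        System.out.println(result + " after " + attempts[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Compared with a fixed two-attempt structure, the deadline keeps the total wait bounded while tolerating any number of transient losses inside that window.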
[jira] [Updated] (ZOOKEEPER-3202) Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ZOOKEEPER-3202:
--
Labels: pull-request-available (was: )
[GitHub] zookeeper pull request #723: ZOOKEEPER-3202: Add timing margin to improve re...
GitHub user mkedwards opened a pull request:
https://github.com/apache/zookeeper/pull/723

ZOOKEEPER-3202: Add timing margin to improve reliability of testClientServerSSL()

Allowing just 5 seconds for 3 quorum peers to start and elect a leader is a bit tight, at least when running 4 test processes in parallel inside a (Linux) Docker container on a (non-Linux) laptop. Add up to 10 seconds of extra margin.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3202

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/723.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #723

commit dcedaf9ad7c756b1852a9bfca4cfbc3313f1a0fc
Author: Michael Edwards
Date: 2018-11-26T06:31:23Z

ZOOKEEPER-3202: Add timing margin to improve reliability of testClientServerSSL()
---
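The PR widens the window the test allows for the quorum to come up. A sketch of a polling wait with a base timeout plus an extra margin, assuming the test polls a boolean "server is up" check; WaitWithMargin, waitFor and its parameters are hypothetical, and the figures in the PR description would correspond roughly to baseMs = 5_000 and marginMs of up to 10_000.

```java
import java.util.function.BooleanSupplier;

class WaitWithMargin {
    /**
     * Polls condition every pollMs until it holds or baseMs + marginMs has
     * elapsed. It returns as soon as the condition is true, so the extra
     * margin only costs time on a slow or heavily loaded host.
     */
    static boolean waitFor(BooleanSupplier condition, long baseMs, long marginMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + (baseMs + marginMs) * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulated "server up" check that needs ~300 ms, longer than the 200 ms base.
        BooleanSupplier serverUp = () -> System.currentTimeMillis() - start >= 300;
        System.out.println("without margin: " + waitFor(serverUp, 200, 0, 10));
        System.out.println("with margin: " + waitFor(serverUp, 200, 1_000, 10));
    }
}
```

Because the wait exits early on success, adding margin does not slow down the common case; it only changes how long a genuinely slow startup is tolerated before the assertion fails.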
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 I think so too! Done ---
[jira] [Updated] (ZOOKEEPER-3203) Tracking and exposing the non voting followers in ZK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ZOOKEEPER-3203:
--
Labels: pull-request-available (was: )

> Tracking and exposing the non voting followers in ZK
>
> Key: ZOOKEEPER-3203
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3203
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.6.0
>
> The current synced_followers metric reports all the forwarding followers,
> including non-voting ones.
> We found it useful to track how many servers are following the leader in
> non-voting mode, so that we can identify issues such as servers that are following
> but not issuing reconfig. This JIRA will add a separate metric to report
> the number of non-voting members.
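To make the distinction concrete: the existing count includes every synced follower, while the proposed metric counts only the synced followers that are not voting members. The sketch below assumes a hypothetical FollowerInfo record with synced and voting flags; the metric name synced_non_voting_followers is an illustrative assumption here, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical view of one follower as tracked by the leader (not a ZooKeeper class). */
final class FollowerInfo {
    final long sid;
    final boolean synced; // finished syncing and is now forwarding
    final boolean voting; // true if the server is a voting member of the current config

    FollowerInfo(long sid, boolean synced, boolean voting) {
        this.sid = sid;
        this.synced = synced;
        this.voting = voting;
    }
}

class FollowerMetrics {
    /** Every synced follower, voting or not (what synced_followers is described as reporting). */
    static long syncedFollowers(List<FollowerInfo> followers) {
        return followers.stream().filter(f -> f.synced).count();
    }

    /** Only synced followers that are not voting members (the proposed separate metric). */
    static long syncedNonVotingFollowers(List<FollowerInfo> followers) {
        return followers.stream().filter(f -> f.synced && !f.voting).count();
    }

    public static void main(String[] args) {
        List<FollowerInfo> followers = Arrays.asList(
                new FollowerInfo(1, true, true),
                new FollowerInfo(2, true, true),
                new FollowerInfo(4, true, false),   // following, but not yet added by reconfig
                new FollowerInfo(5, false, false)); // still syncing
        System.out.println("synced_followers=" + syncedFollowers(followers)
                + " synced_non_voting_followers=" + syncedNonVotingFollowers(followers));
        // prints "synced_followers=3 synced_non_voting_followers=1"
    }
}
```

A non-zero second count that persists would flag exactly the case the issue describes: a server that is following but was never made a voting member.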
[GitHub] zookeeper pull request #722: [ZOOKEEPER-3203] Tracking the number of non vot...
GitHub user lvfangmin opened a pull request:
https://github.com/apache/zookeeper/pull/722

[ZOOKEEPER-3203] Tracking the number of non voting followers in ZK

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-3203

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/722.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #722

commit f093c2b6306d86efc5da9ad0834553060f99ae63
Author: Fangmin Lyu
Date: 2018-11-26T00:41:25Z

[ZOOKEEPER-3203] Track the number of non voting followers in ZK
---
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user tumativ commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/689#discussion_r236102347

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
                 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
             try {
                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-                synchronized(totalDeadWatchers) {
-                    totalDeadWatchers.wait(100);
+                synchronized(processingCompletedEvent) {
+                    processingCompletedEvent.wait(100);
                 }
             } catch (InterruptedException e) {
-                LOG.info("Got interrupted while waiting for dead watches " +
+                LOG.info("Got interrupted while waiting for dead watchers " +
--- End diff --

Done
---
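One plausible reading of this diff is a backpressure loop: callers of addDeadWatcher wait in 100 ms slices on a dedicated lock object while the backlog is at its limit, and the cleaner thread notifies that object once a batch is processed. The sketch below reuses the names totalDeadWatchers, maxInProcessingDeadWatchers and processingCompletedEvent from the diff; the class, batchProcessed and pending are hypothetical, and this is not the actual WatcherCleaner code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of the wait/notify backpressure suggested by the diff, using a
 * dedicated lock object instead of waiting on the AtomicInteger itself.
 */
class DeadWatcherBackpressure {
    private final AtomicInteger totalDeadWatchers = new AtomicInteger();
    private final Object processingCompletedEvent = new Object();
    private final int maxInProcessingDeadWatchers;

    DeadWatcherBackpressure(int maxInProcessingDeadWatchers) {
        this.maxInProcessingDeadWatchers = maxInProcessingDeadWatchers;
    }

    /** Producer: block in 100 ms slices while the backlog is at its limit. */
    void addDeadWatcher() throws InterruptedException {
        while (totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            synchronized (processingCompletedEvent) {
                // The timed wait bounds the cost of a notification that fired
                // between the check above and entering wait().
                processingCompletedEvent.wait(100);
            }
        }
        // Soft limit: concurrent producers can briefly overshoot the maximum.
        totalDeadWatchers.incrementAndGet();
    }

    /** Cleaner thread: after processing a batch, shrink the backlog and wake producers. */
    void batchProcessed(int count) {
        totalDeadWatchers.addAndGet(-count);
        synchronized (processingCompletedEvent) {
            processingCompletedEvent.notifyAll();
        }
    }

    int pending() {
        return totalDeadWatchers.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DeadWatcherBackpressure bp = new DeadWatcherBackpressure(2);
        bp.addDeadWatcher();
        bp.addDeadWatcher();
        Thread producer = new Thread(() -> {
            try {
                bp.addDeadWatcher(); // blocks: the backlog is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        Thread.sleep(50);
        System.out.println("pending while blocked: " + bp.pending()); // prints "pending while blocked: 2"
        bp.batchProcessed(2);
        producer.join();
        System.out.println("pending after batch: " + bp.pending()); // prints "pending after batch: 1"
    }
}
```

Using a separate lock object makes the signalling explicit: the counter only tracks the backlog, while the monitor exists purely so the cleaner can wake blocked producers.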
[jira] [Updated] (ZOOKEEPER-3203) Tracking and exposing the non voting followers in ZK
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fangmin Lv updated ZOOKEEPER-3203:
--
Summary: Tracking and exposing the non voting followers in ZK (was: Track the number of non voting followers in ZK)
[jira] [Created] (ZOOKEEPER-3203) Track the number of non voting followers in ZK
Fangmin Lv created ZOOKEEPER-3203:
-
Summary: Track the number of non voting followers in ZK
Key: ZOOKEEPER-3203
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3203
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Fangmin Lv
Assignee: Fangmin Lv
Fix For: 3.6.0

The current synced_followers metric reports all the forwarding followers, including non-voting ones.
We found it useful to track how many servers are following the leader in non-voting mode, so that we can identify issues such as servers that are following but not issuing reconfig. This JIRA will add a separate metric to report the number of non-voting members.
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user lvfangmin commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/689#discussion_r236098795

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -161,10 +163,10 @@ public void doWork() throws Exception {
                 long startTime = Time.currentElapsedTime();
                 listener.processDeadWatchers(snapshot);
                 long latency = Time.currentElapsedTime() - startTime;
-                LOG.info("Takes {} to process {} watches", latency, total);
+                LOG.info("Takes {} to process {} watchers", latency, total);
--- End diff --

"Watches" seems more accurate here. A watcher maps to a single client session and can watch multiple paths, so a single watcher can have multiple watches; the total value is the count of watches, not watchers.
---
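The reviewer's distinction can be shown with a small table: one watcher per client session, each watching any number of paths, so the number of watches ((watcher, path) pairs) can exceed the number of watchers. WatchTable and its methods are hypothetical and do not reflect WatchManager's real internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical watch table: watcher (keyed by session id) -> watched paths. */
class WatchTable {
    private final Map<Long, Set<String>> pathsByWatcher = new HashMap<>();

    void addWatch(long sessionId, String path) {
        pathsByWatcher.computeIfAbsent(sessionId, id -> new HashSet<>()).add(path);
    }

    /** Number of distinct watchers (sessions). */
    int watcherCount() {
        return pathsByWatcher.size();
    }

    /** Number of watches, i.e. (watcher, path) pairs. */
    int watchCount() {
        int total = 0;
        for (Set<String> paths : pathsByWatcher.values()) {
            total += paths.size();
        }
        return total;
    }

    public static void main(String[] args) {
        WatchTable table = new WatchTable();
        table.addWatch(1L, "/a");
        table.addWatch(1L, "/b");
        table.addWatch(2L, "/a");
        System.out.println(table.watcherCount() + " watchers, " + table.watchCount() + " watches");
        // prints "2 watchers, 3 watches"
    }
}
```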
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user lvfangmin commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/689#discussion_r236098818

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
                 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
             try {
                 RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-                synchronized(totalDeadWatchers) {
-                    totalDeadWatchers.wait(100);
+                synchronized(processingCompletedEvent) {
+                    processingCompletedEvent.wait(100);
                 }
             } catch (InterruptedException e) {
-                LOG.info("Got interrupted while waiting for dead watches " +
+                LOG.info("Got interrupted while waiting for dead watchers " +
--- End diff --

Ditto, watches are more accurate.
---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
Github user mkedwards commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/721#discussion_r236097804

--- Diff: zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java ---
@@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws Exception {
         watcher.waitForDisconnected(3);
         startServer();
         watcher.waitForConnected(3);
+        watcher1.waitForConnected(3);
--- End diff --

That example is due to an unrelated class of failure, in which the ZooKeeper server is permanently unreachable (notice that the entire test timed out). I think many failures of that kind are caused by a failure to bind() the dynamically assigned port to the server socket (probably due to collisions with other tests running concurrently on the same build host). I've updated the description of this PR to explain what kind of failure it's intended to fix.
---
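The comment attributes many timeouts to bind() failures on a dynamically assigned port. One common way to sidestep port collisions between concurrently running tests, shown here as an assumption and not as what ZooKeeper's test harness actually does, is to bind to port 0 so the operating system picks a free port, and then keep that bound socket instead of passing the bare number along. EphemeralPortSketch and bindEphemeral are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class EphemeralPortSketch {
    /**
     * Binds a server socket to port 0 on the loopback interface so the OS
     * chooses a free port. Keeping the returned socket open, rather than
     * closing it and reusing the number later, avoids the window in which a
     * concurrent test could grab the same port.
     */
    static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the unbound socket
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket a = bindEphemeral(); ServerSocket b = bindEphemeral()) {
            System.out.println("ports: " + a.getLocalPort() + ", " + b.getLocalPort());
        }
    }
}
```

The design trade-off is that the port number is only known after binding, so a harness built around pre-computed ports would need to be restructured to hand the bound socket (or its port) to the server under test.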
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
Github user lavacat commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/721#discussion_r236095252

--- Diff: zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java ---
@@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws Exception {
         watcher.waitForDisconnected(3);
         startServer();
         watcher.waitForConnected(3);
+        watcher1.waitForConnected(3);
--- End diff --

If this is fixing this test, that's great. I'm trying to understand why. If zk1 isn't connected, should we get CONNECTIONLOSS on line 237 (zk1.create)? Here is an example, and I don't see it: https://builds.apache.org/job/ZooKeeper_branch35_java10/263/testReport/junit/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset/
Good improvement anyway; I think it should be merged.
---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
GitHub user mkedwards reopened a pull request:
https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #721

commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards
Date: 2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server
---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/721 ---
[jira] [Created] (ZOOKEEPER-3202) Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL
Michael K. Edwards created ZOOKEEPER-3202:
-
Summary: Flaky test: org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL
Key: ZOOKEEPER-3202
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3202
Project: ZooKeeper
Issue Type: Sub-task
Reporter: Michael K. Edwards

Encountered while running tests locally:
{noformat}
283208 [junit] 2018-11-25 22:35:31,581 [myid:2] - INFO [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 8 datadir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2
283209 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 8 datadir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test9169467659375976724.junit.dir/data/version-2
283210 [junit] 2018-11-25 22:35:31,581 [myid:0] - INFO [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):ZooKeeperServer@164] - Created server with tickTime 4000 minSessionTimeout 8000 maxSessionTimeout 8 datadir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2 snapdir /usr/src/zookeeper/build/test/tmp/test8933570428019756122.junit.dir/data/version-2
283211 [junit] 2018-11-25 22:35:31,585 [myid:0] - INFO [QuorumPeer[myid=0](plain=localhost/127.0.0.1:11222)(secure=0.0.0.0/0.0.0.0:11223):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 275 MS
283212 [junit] 2018-11-25 22:35:31,588 [myid:2] - INFO [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):Leader@457] - LEADING - LEADER ELECTION TOOK - 160 MS
283213 [junit] 2018-11-25 22:35:31,582 [myid:1] - INFO [QuorumPeer[myid=1](plain=localhost/127.0.0.1:11226)(secure=0.0.0.0/0.0.0.0:11227):Follower@69] - FOLLOWING - LEADER ELECTION TOOK - 155 MS
283214 [junit] 2018-11-25 22:35:31,633 [myid:2] - INFO [QuorumPeer[myid=2](plain=localhost/127.0.0.1:11230)(secure=0.0.0.0/0.0.0.0:11231):FileTxnSnapLog@372] - Snapshotting: 0x0 to /usr/src/zookeeper/build/test/tmp/test6909783885989201471.junit.dir/data/version-2/snapshot.0
283215 [junit] 2018-11-25 22:35:31,694 [myid:] - INFO [main:FourLetterWordMain@87] - connecting to 127.0.0.1 11222
283216 [junit] 2018-11-25 22:35:31,695 [myid:0] - INFO [New I/O worker #11:NettyServerCnxn@288] - Processing stat command from /127.0.0.1:60484
283217 [junit] 2018-11-25 22:35:31,699 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testClientServerSSL
283218 [junit] java.lang.AssertionError: waiting for server 0 being up
283219 [junit] at org.junit.Assert.fail(Assert.java:88)
283220 [junit] at org.junit.Assert.assertTrue(Assert.java:41)
283221 [junit] at org.apache.zookeeper.test.ClientSSLTest.testClientServerSSL(ClientSSLTest.java:98)
{noformat}
[jira] [Created] (ZOOKEEPER-3201) Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
Michael K. Edwards created ZOOKEEPER-3201:
-
Summary: Flaky test: org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenLeaderRestart
Key: ZOOKEEPER-3201
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3201
Project: ZooKeeper
Issue Type: Sub-task
Reporter: Michael K. Edwards

Encountered when running tests locally:
{noformat}
64429 [junit] 2018-11-25 22:28:12,729 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27389. Will not attempt to authenticate using SASL (unknown error)
64430 [junit] 2018-11-25 22:28:12,730 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47668, server: localhost/127.0.0.1:27389
64431 [junit] 2018-11-25 22:28:12,734 [myid:] - INFO [NIOWorkerThread-1:Learner@117] - Revalidating client: 0x1a9cccf
64432 [junit] 2018-11-25 22:28:12,743 [myid:127.0.0.1:27389] - INFO [main-SendThread(127.0.0.1:27389):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:27389, sessionid = 0x1a9cccf, negotiated timeout = 3
64433 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27392. Will not attempt to authenticate using SASL (unknown error)
64434 [junit] 2018-11-25 22:28:13,009 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:52160, server: localhost/127.0.0.1:27392
64435 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27395. Will not attempt to authenticate using SASL (unknown error)
64436 [junit] 2018-11-25 22:28:13,016 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47256, server: localhost/127.0.0.1:27395
64437 [junit] 2018-11-25 22:28:13,017 [myid:] - INFO [NIOWorkerThread-4:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47256 as it has seen zxid 0x3 our last zxid is 0x2fffe client must try another server
64438 [junit] 2018-11-25 22:28:13,018 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid 0x3a9ccd2, likely server has closed socket, closing socket connection and attempting reconnect
64439 [junit] 2018-11-25 22:28:13,023 [myid:127.0.0.1:27392] - INFO [main-SendThread(127.0.0.1:27392):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:27392, sessionid = 0x2a9d094, negotiated timeout = 3
64440 [junit] 2018-11-25 22:28:13,119 [myid:] - INFO [main:FourLetterWordMain@87] - connecting to 127.0.0.1 27395
64441 [junit] 2018-11-25 22:28:13,120 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@518] - Processing stat command from /127.0.0.1:47258
64442 [junit] 2018-11-25 22:28:13,121 [myid:] - INFO [NIOWorkerThread-1:StatCommand@53] - Stat command output
64443 [junit] 2018-11-25 22:28:14,134 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1108] - Opening socket connection to server localhost/127.0.0.1:27395. Will not attempt to authenticate using SASL (unknown error)
6 [junit] 2018-11-25 22:28:14,135 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@955] - Socket connection established, initiating session, client: /127.0.0.1:47312, server: localhost/127.0.0.1:27395
64445 [junit] 2018-11-25 22:28:14,135 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@1030] - Refusing session request for client /127.0.0.1:47312 as it has seen zxid 0x3 our last zxid is 0x2fffe client must try another server
64446 [junit] 2018-11-25 22:28:14,137 [myid:127.0.0.1:27395] - INFO [main-SendThread(127.0.0.1:27395):ClientCnxn$SendThread@1236] - Unable to read additional data from server sessionid 0x3a9ccd2, likely server has closed socket, closing socket connection and attempting reconnect
64447 [junit] 2018-11-25 22:28:14,240 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testRolloverThenLeaderRestart
64448 [junit] org.apache.zookeeper.KeeperException$ConnectionLossException:
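The "Refusing session request ... as it has seen zxid ... our last zxid is ..." lines reflect a server that is behind the client: the client has already seen a newer zxid than the server's last one, so the server turns it away and the client must try another server. The sketch below illustrates that comparison together with the usual zxid layout (leader epoch in the high 32 bits, per-epoch counter in the low 32 bits). ZxidSketch and its methods are hypothetical helpers, and the values in main are made up for illustration; they are not the truncated values from the log.

```java
/** Hypothetical helpers illustrating the zxid layout and the session-refusal check. */
class ZxidSketch {
    // High 32 bits: leader epoch. Low 32 bits: counter within that epoch.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /**
     * A server that has not yet seen the client's last zxid refuses the
     * session, so the client does not observe its view of the data going
     * backwards.
     */
    static boolean refuseSession(long clientLastZxidSeen, long serverLastZxid) {
        return clientLastZxidSeen > serverLastZxid;
    }

    public static void main(String[] args) {
        long server = makeZxid(1, 0x10); // illustrative: server still in epoch 1
        long client = makeZxid(2, 0x1);  // illustrative: client already saw a write in epoch 2
        System.out.println("client 0x" + Long.toHexString(client)
                + " server 0x" + Long.toHexString(server)
                + " refuse=" + refuseSession(client, server));
        // prints "client 0x200000001 server 0x100000010 refuse=true"
    }
}
```

In a scenario like the one in this log, a client that reconnected to an up-to-date peer and then hops to a lagging one keeps being refused until that peer catches up, which would be consistent with the repeated refusals before the ConnectionLossException.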
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
GitHub user mkedwards opened a pull request:
https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #721

commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards
Date: 2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server
---
[jira] [Comment Edited] (ZOOKEEPER-3046) testManyChildWatchersAutoReset is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698341#comment-16698341 ] Michael K. Edwards edited comment on ZOOKEEPER-3046 at 11/25/18 10:21 PM: --

Still seeing test failures; basically a variant of ZOOKEEPER-2508. (After stopping/starting the server, we have to wait for all clients to reconnect before continuing the test.)

{noformat}
422005 [junit] 2018-11-25 21:25:50,228 [myid:127.0.0.1:16611] - INFO [Time-limited test-SendThread(127.0.0.1:16611):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:16611, sessionid = 0x17077c50001, negotiated timeout = 3
422006 [junit] 2018-11-25 21:25:50,286 [myid:] - INFO [Time-limited test:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testManyChildWatchersAutoReset
422007 [junit] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /long-path-0-1-2-3-4-5-6-7-8-9/ch-00/ch
422008 [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
422009 [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
422010 [junit]     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1459)
422011 [junit]     at org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:229)
{noformat}

> testManyChildWatchersAutoReset is flaky
> ---
>
> Key: ZOOKEEPER-3046
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3046
> Project: ZooKeeper
> Issue Type: Sub-task
> Components: tests
> Affects Versions: 3.5.3, 3.4.12
> Reporter: Bogdan Kanivets
> Assignee: Bogdan Kanivets
> Priority: Minor
> Labels: flaky, pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> According to the [dashboard|https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html] testManyChildWatchersAutoReset is flaky in 3.4 and 3.5
> [ZooKeeper_branch34_java10|https://builds.apache.org/job/ZooKeeper_branch34_java10//13]
> [ZooKeeper_branch35_java9|https://builds.apache.org/job/ZooKeeper_branch35_java9/253]
> Test times out and because of that ant doesn't capture any output.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
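The fix direction in the comment (wait until every client has re-established its session after the server restart before issuing the next operation) can be sketched in plain Java. `ReconnectBarrier` and its methods are hypothetical names, not part of ZooKeeper or its test harness; in a real test, `onConnected` would presumably be called from each client's Watcher when it sees `KeeperState.SyncConnected`, and `onDisconnected` when it sees `KeeperState.Disconnected`.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper (not a ZooKeeper API): blocks until every expected
// client has reported that its session is connected again.
final class ReconnectBarrier {
    private final Set<String> expected;
    private final Set<String> connected = new HashSet<>(); // guarded by "this"

    ReconnectBarrier(Collection<String> expectedClients) {
        this.expected = new HashSet<>(expectedClients);
    }

    // Call from a client's watcher when it observes a (re)connected state.
    synchronized void onConnected(String clientId) {
        connected.add(clientId);
        notifyAll(); // wake any thread blocked in awaitAll
    }

    // Call when a client observes a disconnect, so a connection from
    // before the server restart is not counted.
    synchronized void onDisconnected(String clientId) {
        connected.remove(clientId);
    }

    // Returns true once all expected clients are connected, or false if the
    // timeout expires first. The loop also guards against spurious wakeups.
    synchronized boolean awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!connected.containsAll(expected)) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
        }
        return true;
    }
}
```

The barrier only helps if connections from before the restart are not counted, so the sketch assumes the test marks each client disconnected (or builds a fresh barrier) once the server has been stopped, and only then starts waiting.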
[jira] [Commented] (ZOOKEEPER-3046) testManyChildWatchersAutoReset is flaky
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698341#comment-16698341 ] Michael K. Edwards commented on ZOOKEEPER-3046: ---

Still seeing test failures; basically a variant of ZOOKEEPER-2508. (After stopping/starting the server, we have to wait for all clients to reconnect before continuing the test.)

{{
422005 [junit] 2018-11-25 21:25:50,228 [myid:127.0.0.1:16611] - INFO [Time-limited test-SendThread(127.0.0.1:16611):ClientCnxn$SendThread@1390] - Session establishment complete on server localhost/127.0.0.1:16611, sessionid = 0x17077c50001, negotiated timeout = 3
422006 [junit] 2018-11-25 21:25:50,286 [myid:] - INFO [Time-limited test:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED testManyChildWatchersAutoReset
422007 [junit] org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /long-path-0-1-2-3-4-5-6-7-8-9/ch-00/ch
422008 [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
422009 [junit]     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
422010 [junit]     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1459)
422011 [junit]     at org.apache.zookeeper.test.DisconnectedWatcherTest.testManyChildWatchersAutoReset(DisconnectedWatcherTest.java:229)
}}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3200) Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testInconsistentDueToNewLeaderOrder
Michael K. Edwards created ZOOKEEPER-3200:

Summary: Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testInconsistentDueToNewLeaderOrder
Key: ZOOKEEPER-3200
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3200
Project: ZooKeeper
Issue Type: Sub-task
Reporter: Michael K. Edwards

https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1206/

I've seen this locally as well, in a branch where ZOOKEEPER-2778, ZOOKEEPER-1818, and ZOOKEEPER-2488 have all been addressed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
ZooKeeper_branch35_jdk8 - Build # 1206 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1206/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 66.03 KB...]
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 4
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 446.365 sec, Thread: 3, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.917 sec, Thread: 4, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 3
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.596 sec, Thread: 1, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.493 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 1
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.504 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.537 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 1
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.088 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.589 sec, Thread: 1, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 4
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.465 sec, Thread: 4, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.278 sec, Thread: 1, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.202 sec, Thread: 4, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 1
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.688 sec, Thread: 1, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in thread 1
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.9 sec, Thread: 4, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.027 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.3 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 4
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.369 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.363 sec, Thread: 3, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.505 sec, Thread: 1, Class: org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.806 sec, Thread: 1, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Running org.apache.zookeeper.util.PemReaderTest in thread 1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.321 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.jute.BinaryInputArchiveTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
Github user TyqITstudent commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/720#discussion_r236065557

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java ---
@@ -616,6 +616,14 @@ private void processEvent(Object event) {
 } else {
 cb.processResult(rc, clientPath, p.ctx, null);
 }
+ } else if (p.response instanceof GetAllChildrenNumberResponse) {
--- End diff --

> Also need to import GetAllChildrenNumberResponse

OK, thanks for the reminder.

---
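The diff adds one more `instanceof` branch to `ClientCnxn.processEvent`, which chooses how to invoke the callback from the runtime type of the response; that is also why the new response class has to be imported there. The sketch below shows that dispatch pattern in a self-contained form. All types are simplified, hypothetical stand-ins (the real `processEvent` handles many more response types and reads its values from a packet object), and passing `-1` as the count on error is an assumption of this sketch, not the PR's behavior.

```java
import java.util.List;

// Simplified, hypothetical stand-ins; not the real ZooKeeper classes.
interface Response {}

final class ChildrenResponse implements Response {
    final List<String> children;
    ChildrenResponse(List<String> children) { this.children = children; }
}

final class GetAllChildrenNumberResponse implements Response {
    final int totalNumber;
    GetAllChildrenNumberResponse(int totalNumber) { this.totalNumber = totalNumber; }
}

interface ChildrenCallback {
    void processResult(int rc, String path, Object ctx, List<String> children);
}

interface AllChildrenNumberCallback {
    void processResult(int rc, String path, Object ctx, int number);
}

final class EventDispatcher {
    static final int OK = 0; // success return code (KeeperException.Code.OK is 0)

    // Chooses the callback invocation from the runtime type of the response,
    // mirroring the if / else-if chain shown in the diff.
    static void dispatch(Object cb, int rc, String path, Object ctx, Response response) {
        if (response instanceof ChildrenResponse) {
            List<String> children = rc == OK ? ((ChildrenResponse) response).children : null;
            ((ChildrenCallback) cb).processResult(rc, path, ctx, children);
        } else if (response instanceof GetAllChildrenNumberResponse) {
            // Assumption of this sketch: report -1 when the request failed.
            int number = rc == OK ? ((GetAllChildrenNumberResponse) response).totalNumber : -1;
            ((AllChildrenNumberCallback) cb).processResult(rc, path, ctx, number);
        } else {
            throw new IllegalArgumentException("unhandled response type: " + response);
        }
    }
}
```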
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user TyqITstudent commented on the issue:

https://github.com/apache/zookeeper/pull/720

> It seems you are not renaming classes/fields in every point of code

OK, thanks for the reminder.

---
[GitHub] zookeeper issue #703: [ZOOKEEPER-1818] Correctly handle potential inconsiste...
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/703

@anmolnar the following is my understanding of acceptedEpoch, currentEpoch and electionEpoch:

* acceptedEpoch: the most recent epoch we have accepted so far; usually this is the epoch of the highest zxid on that server.
* currentEpoch: the current epoch after syncing with the new leader. It is based on the maximum acceptedEpoch in the quorum, and is usually max(acceptedEpoch) + 1. The currentEpoch is used as the peerEpoch in leader election; as we know, (sid, zxid, peerEpoch) is the set of values used to decide the leader.
* electionEpoch: not one of the factors used to decide the leader, but it is used as a logical clock so that a vote delayed from a while ago is not considered.

Basically, we know there is a corner case where the learner may not update its zxid, peerEpoch, and electionEpoch after leader election (see the new comment I added in Leader.updateElectionVote). peerEpoch is fixed with a hacky workaround, but we cannot easily update the zxid and electionEpoch, so we try to ignore them. However, the IGNOREVALUE introduced for this would cause compatibility issues during a rolling upgrade of the ensemble; that's why we introduced a version in the notification, and only compare id or peerEpoch based on that version.

---
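The roles described in the comment can be made concrete with a small sketch. It models a vote as the (sid, zxid, peerEpoch) triple plus electionEpoch. As we understand ZooKeeper's FastLeaderElection, a proposal wins on higher peerEpoch, then higher zxid, then higher sid; this sketch deliberately omits quorum weights, the notification version field and IGNOREVALUE handling. `Vote` and `VoteOrder` are hypothetical names, not ZooKeeper classes.

```java
// Simplified, hypothetical model of a leader-election notification.
final class Vote {
    final long sid;           // server id of the proposed leader
    final long zxid;          // last zxid of the proposed leader
    final long peerEpoch;     // epoch of the proposed leader (its currentEpoch)
    final long electionEpoch; // logical clock of the election round

    Vote(long sid, long zxid, long peerEpoch, long electionEpoch) {
        this.sid = sid;
        this.zxid = zxid;
        this.peerEpoch = peerEpoch;
        this.electionEpoch = electionEpoch;
    }
}

final class VoteOrder {
    // True if 'candidate' should replace 'current' as this server's vote:
    // higher peerEpoch wins, then higher zxid, then higher sid.
    static boolean beats(Vote candidate, Vote current) {
        if (candidate.peerEpoch != current.peerEpoch) {
            return candidate.peerEpoch > current.peerEpoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.sid > current.sid;
    }

    // electionEpoch is not part of the ordering; it only filters out
    // notifications left over from an earlier election round.
    static boolean isStale(Vote received, long logicalClock) {
        return received.electionEpoch < logicalClock;
    }
}
```

Keeping electionEpoch out of `beats` reflects the distinction drawn in the comment: it only screens out late notifications, while the leader choice depends on the (sid, zxid, peerEpoch) triple alone.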