[GitHub] zookeeper issue #714: Zookeeper 1818 (on branch-3.5)
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/714 I updated this PR to match the changes made to #703 during review. Anything else I can do to help? ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/707 ---
[GitHub] zookeeper issue #715: Rollup of blocker/critical fixes for 3.5 (to trigger C...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/715 This is the branch where my colleague Eric Schwartz and I are trying to get to a "zero flaky tests" standard. (It started as the branch where I worked on ZOOKEEPER-2778, and grew from there.) The pull request is not really a pull request any more; I was just still interested in what Jenkins makes of the state of the branch. If anybody wants to go over with me what we've done in this branch and why, I'd be *delighted* to discuss it. (We're down to one test that needs a bit of work to restore the functional equivalent of the log-grepping verification we ripped out so we could switch to the log4j2 implementation of slf4j.) On Tue, Dec 4, 2018, 6:54 AM Norbert Kalmar I'm not sure what this is. Do you have a corresponding jira? > > â > You are receiving this because you modified the open/close state. > Reply to this email directly, view it on GitHub > <https://github.com/apache/zookeeper/pull/715#issuecomment-444127695>, or mute > the thread > <https://github.com/notifications/unsubscribe-auth/AAzdsgFEKqCiQzKnr-K5s-2m6qTG7M2fks5u1oyzgaJpZM4YvOj3> > . > ---
[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/719 I believe #707 has @eolivelli's LGTM now. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 That makes perfect sense. Thanks for keeping an eye on this. ---
[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/719 Does this look fit to merge? I'm not a committer, so I'll need help getting it merged. (The version targeted to branch-3.5 is #707.) ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 @eolivelli , have I addressed all your concerns? Does this look to you like a complete fix to ZOOKEEPER-2778? ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 retest this please ---
[GitHub] zookeeper issue #724: ZOOKEEPER-2488: Synchronized access to shuttingDownLE ...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/724 retest this please ---
[GitHub] zookeeper pull request #724: ZOOKEEPER-2488: Synchronized access to shutting...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/724 ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer I think this can be reviewed separately from the (functionally somewhat related) changes in #707. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2488 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/724.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #724 commit 33c1f0bad6e76a4cc67ab4ee44e08fa8f1ac5449 Author: Michael Edwards Date: 2018-11-21T23:09:55Z ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 retest this please ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 Fixed that (and tested it locally before pushing this time). ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 Except, er, that seems to make the tests fail. ð Investigating. ---
[GitHub] zookeeper pull request #723: ZOOKEEPER-3202: Add timing margin to improve re...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/723 ZOOKEEPER-3202: Add timing margin to improve reliability of testClientServerSSL() Allowing just 5 seconds for 3 quorum peers to start and elect a leader is a bit tight, at least when running 4 test processes in parallel inside a (Linux) Docker container on a (non-Linux) laptop. Add up to 10 seconds of extra margin. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3202 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/723.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #723 commit dcedaf9ad7c756b1852a9bfca4cfbc3313f1a0fc Author: Michael Edwards Date: 2018-11-26T06:31:23Z ZOOKEEPER-3202: Add timing margin to improve reliability of testClientServerSSL() ---
[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/721 I think so too! Done ð ---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/721#discussion_r236097804 --- Diff: zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java --- @@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws Exception { watcher.waitForDisconnected(3); startServer(); watcher.waitForConnected(3); +watcher1.waitForConnected(3); --- End diff -- That example is due to an unrelated class of failure, in which the Zookeeper server is permanently unreachable (notice that the entire test timed out); I think many failure cases of that kind are the product of failure to bind() the dynamically-assigned port to the server socket (probably due to collisions with other tests running concurrently on the same build host). I've updated the description of this PR to explain what kind of failure it's intended to fix. ---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/721 ZOOKEEPER-3046: wait for clients to reconnect after restarting server You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/721.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #721 commit 62e6bca2460b261e7cad204566f7b88a41ab0972 Author: Michael Edwards Date: 2018-11-25T22:23:29Z ZOOKEEPER-3046: wait for clients to reconnect after restarting server ---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/721 ---
[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/721 ZOOKEEPER-3046: wait for clients to reconnect after restarting server You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/721.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #721 commit 62e6bca2460b261e7cad204566f7b88a41ab0972 Author: Michael Edwards Date: 2018-11-25T22:23:29Z ZOOKEEPER-3046: wait for clients to reconnect after restarting server ---
[GitHub] zookeeper issue #528: ZOOKEEPER-3034 Facing issues while building from sourc...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/528 I'd like to see this brought back. Not cool to drop autoconf when the CMake setup can't build shared libraries. From `zookeeper-client/zookeeper-client-c/README`: ``` Current limitations of the CMake build system include lack of Solaris support, no shared library option, no explicitly exported symbols (all are exported by default), no versions on the libraries, and no documentation generation. ``` ---
[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/719 I don't understand the test failures. Are these unrelated problems in master? ---
[GitHub] zookeeper pull request #719: [ZOOKEEPER-2778] QuorumPeer: address potential ...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/719 [ZOOKEEPER-2778] QuorumPeer: address potential reconfiguration deadlocks * QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference * QuorumPeer/QuorumCnxManager: address deadlock and visibility issues * QuorumPeer: add fast path for already-non-null quorum/election address * QuorumPeer: fix access to newly private data members from ReconfigTest * LeaderBeanTest: set up mock QuorumVerifier so that addresses get set * QuorumPeer: warn when clobbering existing election algorithm * QuorumPeer: halt old QCM when clobbering existing election algorithm You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2778-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/719.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #719 commit 53ab259c38ee4d33c8f3ab26cde82992e2074537 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: address potential deadlocks related to reconfiguration * QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference * QuorumPeer/QuorumCnxManager: address deadlock and visibility issues * QuorumPeer: add fast path for already-non-null quorum/election address * QuorumPeer: fix access to newly private data members from ReconfigTest * LeaderBeanTest: set up mock QuorumVerifier so that addresses get set * QuorumPeer: warn when clobbering existing election algorithm * QuorumPeer: halt old QCM when clobbering existing election algorithm ---
[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/715 Rollup of blocker/critical fixes for 3.5 (to trigger CI) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper rollup-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/715.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #715 commit 3694a4e31eef9b85de59112c22ab163452610743 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference commit 4cd10c86519b75521f89e451033dca4869d8d0d1 Author: Michael Edwards Date: 2018-11-21T08:53:54Z [ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and visibility issues commit 03d259bae3b744dc494022698fa843f6cf35e7ed Author: Michael Edwards Date: 2018-11-21T09:01:45Z [ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null quorum/election address commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e Author: Michael Edwards Date: 2018-11-21T17:13:14Z [ZOOKEEPER-2778] QuorumPeer: fixes from code review commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1 Author: Michael Edwards Date: 2018-11-21T17:19:44Z [ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from ReconfigTest commit bbeeebf87391ef642059c4b3b65592c361a2ab4e Author: Michael Edwards Date: 2018-11-21T19:48:49Z [ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that addresses get set commit 5038179e217a0b80805fbb6780a6fec024f9e29d Author: Michael Edwards Date: 2018-11-22T19:21:19Z [ZOOKEEPER-2778] QuorumPeer: warn when clobbering existing election algorithm commit 78df674c4413561336e3435eaa692a9ec2ede0ca Author: Michael Edwards Date: 2018-11-22T19:29:27Z [ZOOKEEPER-2778] QuorumPeer: halt old QCM when clobbering existing election algorithm commit d6898072947cc908ca802fa542e636d615ac29f0 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch and peerEpoch during leader election commit ab30d5c8fb64b9fc93bd5e691855f6c17abd3d17 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) commit e05ae82437e8a7124c5501772ce188d29ac9bfc2 Author: Michael Edwards Date: 2018-11-21T23:09:55Z ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer commit bf685a609c991ac4642da42ab41c7d43d6e35246 Author: Andor Molnar Date: 2018-11-19T16:25:52Z ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use CountDownLatch to sync with watcher. commit b964efbaa363fc873205acba4aefc76940a358f1 Author: Michael Edwards Date: 2018-11-21T21:33:01Z Bump library versions, fix 'ant package-native tar' targets commit bba8aebae6c153d21913f7391f2201c148f24f1e Author: Michael Edwards Date: 2018-11-22T08:38:28Z Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr commit 984189537df99a6c55b058362047d272a5c59145 Author: Michael Edwards Date: 2018-11-22T12:07:38Z ZOOKEEPER-3198: plumb BindException up to Leader and its peers * Normalize bind failures to java.net.BindException in both implementation of ServerCnxnFactory * Throw BindException out as far as the caller of QuorumPeer.processReconfig * Catch BindException in tryToCommit(); the transaction needs to be committed ---
[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/715 ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/718 ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch⦠⦠and peerEpoch during leader election. (This is Fangmin's patch, I'm just firing off a CI build against current master.) See #714 for the backport to branch-3.5. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/718.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #718 commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch during leader election ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/718 ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/718 ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch⦠⦠and peerEpoch during leader election. (This is Fangmin's patch, I'm just firing off a CI build against current master.) See #714 for the backport to branch-3.5. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/718.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #718 commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch during leader election ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/718 ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/718 ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch⦠⦠and peerEpoch during leader election. (This is Fangmin's patch, I'm just firing off a CI build against current master.) See #714 for the backport to branch-3.5. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/718.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #718 commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch during leader election ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/718 ---
[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/717 ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/717.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #717 commit 40eb35efb56ddf4ef06243ec433d57200e9029f5 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) ---
[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/717 ---
[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...
GitHub user mkedwards reopened a pull request: https://github.com/apache/zookeeper/pull/717 ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/717.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #717 commit 40eb35efb56ddf4ef06243ec433d57200e9029f5 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) ---
[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...
Github user mkedwards closed the pull request at: https://github.com/apache/zookeeper/pull/717 ---
[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/718 ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch⦠⦠and peerEpoch during leader election. (This is Fangmin's patch, I'm just firing off a CI build against current master.) See #714 for the backport to branch-3.5. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/718.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #718 commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch during leader election ---
[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/717 ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/717.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #717 commit 40eb35efb56ddf4ef06243ec433d57200e9029f5 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235808196 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -108,7 +109,11 @@ LocalPeerBean jmxLocalPeerBean; private Map jmxRemotePeerBean; LeaderElectionBean jmxLeaderElectionBean; -private QuorumCnxManager qcm; + +// The QuorumCnxManager is held through an AtomicReference to ensure cross-thread visibility +// of updates; see the implementation comment at setLastSeenQuorumVerifier(). +private AtomicReference qcmRef = new AtomicReference<>(); --- End diff -- I am hoping to reduce the need for synchronized blocks in follow-up changes. For now, I added a simple use of getAndSet() to detect multiple calls to createElectionAlgorithm() and ensure that the QCM that's being dropped on the floor gets halted first. ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235807260 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -121,6 +126,18 @@ */ private ZKDatabase zkDb; +public static class AddressTuple { --- End diff -- Good idea. Done. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 This PR now has the extraneous changes removed and is green in CI. Please re-review at your convenience. ---
[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/715 Rollup of blocker/critical fixes for 3.5 (to trigger CI) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper rollup-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/715.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #715 commit 3694a4e31eef9b85de59112c22ab163452610743 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference commit 4cd10c86519b75521f89e451033dca4869d8d0d1 Author: Michael Edwards Date: 2018-11-21T08:53:54Z [ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and visibility issues commit 03d259bae3b744dc494022698fa843f6cf35e7ed Author: Michael Edwards Date: 2018-11-21T09:01:45Z [ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null quorum/election address commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e Author: Michael Edwards Date: 2018-11-21T17:13:14Z [ZOOKEEPER-2778] QuorumPeer: fixes from code review commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1 Author: Michael Edwards Date: 2018-11-21T17:19:44Z [ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from ReconfigTest commit bbeeebf87391ef642059c4b3b65592c361a2ab4e Author: Michael Edwards Date: 2018-11-21T19:48:49Z [ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that addresses get set commit 0e2b571d452306ae151b106e05d0511b09a237d3 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) commit 3683a1b451fa334ba51227db327557966d319a4d Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z ZOOKEEPER-1818: Correctly handle potentially inconsistent zxid/electionEpoch and peerEpoch during leader election commit a1b56505671b756448f7c0126de764cd25633e2f Author: Michael Edwards Date: 2018-11-21T21:33:01Z Bump library versions, fix 'ant package-native tar' targets commit 3dfd49f6bfea357c838e21d5a2e4f1486ed753e9 Author: Michael Edwards Date: 2018-11-21T23:09:55Z ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer commit 1cbaec427037b8ac10004e8198821da524949843 Author: Andor Molnar Date: 2018-11-19T16:25:52Z ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use CountDownLatch to sync with watcher. commit 0d4e7839eaab8b3222be31a705f9edded1ad98a5 Author: Michael Edwards Date: 2018-11-22T08:38:28Z Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr commit 3b19067c2a5b213e53f6e2ce7638b76250076fe6 Author: Michael Edwards Date: 2018-11-22T12:07:38Z Throw BindException out as far as the caller of QuorumPeer.processReconfig ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 After groveling through https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/consoleText, I think this may be a contributing factor to flaky tests: ``` [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind port: localhost/127.0.0.1:19459 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493] - binding to port localhost/127.0.0.1:19466 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR [QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497] - Error while reconfiguring [exec] [junit] org.jboss.netty.channel.ChannelException: Failed to bind to: localhost/127.0.0.1:19466 [exec] [junit] at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) [exec] [junit] at org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494) [exec] [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947) [exec] [junit] at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154) [exec] [junit] at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93) [exec] [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263) [exec] [junit] Caused by: java.net.BindException: Address already in use [exec] [junit] at sun.nio.ch.Net.bind0(Native Method) [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433) [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425) [exec] [junit] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) [exec] [junit] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [exec] [junit] at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) [exec] [junit] at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) [exec] [junit] at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) [exec] [junit] at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) [exec] [junit] at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [exec] [junit] at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [exec] [junit] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [exec] [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [exec] [junit] at java.lang.Thread.run(Thread.java:748) ``` ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 Thanks maoling; I'm familiar with the procedure, just hadn't gotten around to juggling branches. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 Whoops, just pushed some unrelated stuff to this branch. The code state that fixes just ZOOKEEPER-2778 is https://github.com/apache/zookeeper/pull/707/commits/bbeeebf87391ef642059c4b3b65592c361a2ab4e ---
[GitHub] zookeeper pull request #714: Zookeeper 1818 (on top of #713 to fix ZOOKEEPER...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/714 Zookeeper 1818 (on top of #713 to fix ZOOKEEPER-2778 and ZOOKEEPER-1636 for a green build) Rebasing Fangmin Lyu's patch from https://github.com/lvfangmin/zookeeper/tree/ZOOKEEPER-1818 onto the 3.5 branch (cumulative with other PRs in flight). You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/714.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #714 commit 3694a4e31eef9b85de59112c22ab163452610743 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference commit 4cd10c86519b75521f89e451033dca4869d8d0d1 Author: Michael Edwards Date: 2018-11-21T08:53:54Z [ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and visibility issues commit 03d259bae3b744dc494022698fa843f6cf35e7ed Author: Michael Edwards Date: 2018-11-21T09:01:45Z [ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null quorum/election address commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e Author: Michael Edwards Date: 2018-11-21T17:13:14Z [ZOOKEEPER-2778] QuorumPeer: fixes from code review commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1 Author: Michael Edwards Date: 2018-11-21T17:19:44Z [ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from ReconfigTest commit bbeeebf87391ef642059c4b3b65592c361a2ab4e Author: Michael Edwards Date: 2018-11-21T19:48:49Z [ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that addresses get set commit 0e2b571d452306ae151b106e05d0511b09a237d3 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) commit 7f286c46d505c3300a15de9d1f4849a4da84e6d3 Author: Fangmin Lyu Date: 2018-11-15T17:46:51Z ZOOKEEPER-1818: Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch during leader election ---
[GitHub] zookeeper pull request #713: Zookeeper 1636 (on top of #707 to fix ZOOKEEPER...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/713 Zookeeper 1636 (on top of #707 to fix ZOOKEEPER-2778 for a green build) (Rebased Thawan Kooburat's patch on current layout of 3.5 branch) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/713.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #713 commit 3694a4e31eef9b85de59112c22ab163452610743 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference commit 4cd10c86519b75521f89e451033dca4869d8d0d1 Author: Michael Edwards Date: 2018-11-21T08:53:54Z [ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and visibility issues commit 03d259bae3b744dc494022698fa843f6cf35e7ed Author: Michael Edwards Date: 2018-11-21T09:01:45Z [ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null quorum/election address commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e Author: Michael Edwards Date: 2018-11-21T17:13:14Z [ZOOKEEPER-2778] QuorumPeer: fixes from code review commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1 Author: Michael Edwards Date: 2018-11-21T17:19:44Z [ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from ReconfigTest commit bbeeebf87391ef642059c4b3b65592c361a2ab4e Author: Michael Edwards Date: 2018-11-21T19:48:49Z [ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that addresses get set commit 0e2b571d452306ae151b106e05d0511b09a237d3 Author: Michael Edwards Date: 2018-11-21T20:31:16Z ZOOKEEPER-1636: cleanup completion list of a failed multi request (from Thawan Kooburat) ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 The `LeaderBeanTest` failure was real-ish; the mock `QuorumVerifier` wasn't set up properly for address resolution, so it would NPE (without the new condvar-style wait for addresses to be set) or hang (with it). I just pushed a fix to the test. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 @afine , does this approach address your concerns with #247? ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 Review comments addressed. It's a little bit weird that `ant test` doesn't recompile the tests when the implementation has changed; an `ant clean`/`ant test` cycle caught another test failure-to-compile due to more restrictive access on data members. ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235470421 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() { } public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){ -synchronized (QV_LOCK) { -if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { -LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + -". Current version: " + quorumVerifier.getVersion()); +// If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm +// and then take QV_LOCK. Take the locks in the same order to ensure that we don't +// deadlock against other callers of connectOne(). If qcmRef gets set in another +// thread while we're inside the synchronized block, that does no harm; if we didn't +// take a lock on qcm (because it was null when we sampled it), we won't call +// connectOne() on it. (Use of an AtomicReference is enough to guarantee visibility +// of updates that provably happen in another thread before entering this method.) +QuorumCnxManager qcm = qcmRef.get(); +Object outerLockObject = (qcm != null) ? qcm : QV_LOCK; +synchronized (outerLockObject) { +synchronized (QV_LOCK) { +if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { +LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + +". Current version: " + quorumVerifier.getVersion()); +} +// assuming that a version uniquely identifies a configuration, so if +// version is the same, nothing to do here. +if (lastSeenQuorumVerifier != null && +lastSeenQuorumVerifier.getVersion() == qv.getVersion()) { +return; +} +lastSeenQuorumVerifier = qv; -} -// assuming that a version uniquely identifies a configuration, so if -// version is the same, nothing to do here. -if (lastSeenQuorumVerifier != null && -lastSeenQuorumVerifier.getVersion() == qv.getVersion()) { -return; -} -lastSeenQuorumVerifier = qv; -connectNewPeers(); --- End diff -- I had folded it back in so I could see the whole flow with the nested calls to `qcm.connectOne()`. I'll factor it back out with `qcm` as a method parameter (it's important that the `AtomicReference` `qcmRef` not be sampled again inside the `synchronized` block, since we could race against another thread changing the election algorithm). ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235467449 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() { } public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){ -synchronized (QV_LOCK) { -if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { -LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + -". Current version: " + quorumVerifier.getVersion()); +// If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm +// and then take QV_LOCK. Take the locks in the same order to ensure that we don't +// deadlock against other callers of connectOne(). If qcmRef gets set in another +// thread while we're inside the synchronized block, that does no harm; if we didn't +// take a lock on qcm (because it was null when we sampled it), we won't call +// connectOne() on it. (Use of an AtomicReference is enough to guarantee visibility +// of updates that provably happen in another thread before entering this method.) +QuorumCnxManager qcm = qcmRef.get(); +Object outerLockObject = (qcm != null) ? qcm : QV_LOCK; --- End diff -- It does. I'm fairly sure that this guarantees that the `QuorumCnxManager` lock can only be taken inside `QV_LOCK` if it's already held outside it. Perhaps we can get confirmation from @castuardo and his team's Distributed System Model Checking tool? ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235463520 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -755,39 +769,55 @@ public void recreateSocketAddresses(long id) { } } -public InetSocketAddress getQuorumAddress(){ -synchronized (QV_LOCK) { -return myQuorumAddr; +InetSocketAddress getQuorumAddress(){ --- End diff -- Good suggestion. Done. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 > `private` should be changed back to `public ` > Jenkins complains about [this](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2685/artifact/patchprocess/trunkJavacWarnings.txt) Okay. It might be better for the tests to be in the same package as the implementation, so these methods can be package-private; but that's out of scope for this PR. I'll change them back to `public`. ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235460740 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() { } public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){ -synchronized (QV_LOCK) { -if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { -LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + -". Current version: " + quorumVerifier.getVersion()); +// If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm +// and then take QV_LOCK. Take the locks in the same order to ensure that we don't +// deadlock against other callers of connectOne(). If qcmRef gets set in another +// thread while we're inside the synchronized block, that does no harm; if we didn't +// take a lock on qcm (because it was null when we sampled it), we won't call +// connectOne() on it. (Use of an AtomicReference is enough to guarantee visibility +// of updates that provably happen in another thread before entering this method.) +QuorumCnxManager qcm = qcmRef.get(); +Object outerLockObject = (qcm != null) ? qcm : QV_LOCK; --- End diff -- Yes; I'd like to get confirmation from @castuardo and his team that this fully solves the lock nesting problem. But if you look at the flow of the code, I think you'll see that the only way _this_ code can take the lock on a {{QuorumCnxManager}} is via {{qcm.connectOne()}}. Since Java locks are reentrant, we're safe ensuring that the nesting is always qcm -> QV_LOCK -> qcm -> QV_LOCK if qcm is non-null and QV_LOCK -> QV_LOCK if qcm is null. ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
Github user mkedwards commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/707#discussion_r235458260 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java --- @@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() { } public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){ -synchronized (QV_LOCK) { -if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { -LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + -". Current version: " + quorumVerifier.getVersion()); +// If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm +// and then take QV_LOCK. Take the locks in the same order to ensure that we don't +// deadlock against other callers of connectOne(). If qcmRef gets set in another +// thread while we're inside the synchronized block, that does no harm; if we didn't +// take a lock on qcm (because it was null when we sampled it), we won't call +// connectOne() on it. (Use of an AtomicReference is enough to guarantee visibility +// of updates that provably happen in another thread before entering this method.) +QuorumCnxManager qcm = qcmRef.get(); +Object outerLockObject = (qcm != null) ? qcm : QV_LOCK; +synchronized (outerLockObject) { +synchronized (QV_LOCK) { +if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) { +LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() + +". Current version: " + quorumVerifier.getVersion()); +} +// assuming that a version uniquely identifies a configuration, so if +// version is the same, nothing to do here. +if (lastSeenQuorumVerifier != null && +lastSeenQuorumVerifier.getVersion() == qv.getVersion()) { +return; +} +lastSeenQuorumVerifier = qv; -} -// assuming that a version uniquely identifies a configuration, so if -// version is the same, nothing to do here. -if (lastSeenQuorumVerifier != null && -lastSeenQuorumVerifier.getVersion() == qv.getVersion()) { -return; -} -lastSeenQuorumVerifier = qv; -connectNewPeers(); --- End diff -- The {{qcm}} that it references is now a local variable rather than an instance data member. If you'd like, I could factor it back out, with {{qcm}} as an argument to the function. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 Note that this patch is targeted at the 3.5 release branch, which I'm trying to help whip into shape for testing in our environment (where 3.4.13 is already in production use). What else can I do to help https://builds.apache.org/job/ZooKeeper_branch35_jdk8/ get to green? ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 I have pushed an incremental change which should address the lock inversion scenario more completely, along with condition-variable-style waiting for the quorum/election address to become non-null. (This proved a great deal simpler than trying to use a ReentrantReadWriteLock to solve that problem.) Please re-review. ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 The deadlock, as I understand it, comes from an attempt to take `QV_LOCK` in order to access `myElectionAddress` in thread A, while holding the lock on the `QuorumCnxManager` object; meanwhile, thread B, which holds the `QV_LOCK` while `connectOne()` is running, tries to take the lock on the `QuorumCnxManager` object, producing a deadly embrace. From the Jira: ``` [junit] java.lang.Thread.State: BLOCKED [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686) [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265) [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445) [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369) [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642) [junit] [junit] java.lang.Thread.State: BLOCKED [junit] at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472) [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438) [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471) [junit] at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520) [junit] at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88) [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133) ``` This patch removes the synchronization on `QV_LOCK` from the address accessor paths, where its current effect seems primarily to ensure cross-thread visibility; the memory barriers involved in setting/fetching through an `AtomicReference` are equally effective at this. Wrapping a single `AtomicReference` around all three addresses ensures that you don't get interleaved sets when two threads each try to update the address fields -- although in the code as I read it, callers are holding the `QV_LOCK` anyway. I can't yet say that I've analyzed all other paths that take `QV_LOCK`, but I'm fairly sure that if https://github.com/apache/zookeeper/pull/247 would work, this would work at least as well, while at least partially addressing concerns about getting "outdated" state in read paths. (If this is a serious concern, then presumably we need a separate reader/writer lock that is taken during creation of the QuorumPeer and not released until it reaches a "fully configu red" state for the first time.) ---
[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/707 There appear to be test failures in LeaderBeanTest: ``` [exec] [junit] Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.619 sec, Thread: 8, Class: org.apache.zookeeper.server.quorum.LeaderBeanTest [exec] [junit] Test org.apache.zookeeper.server.quorum.LeaderBeanTest FAILED ``` Can anyone help me read the tea leaves? ---
[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...
GitHub user mkedwards opened a pull request: https://github.com/apache/zookeeper/pull/707 [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference You can merge this pull request into a Git repository by running: $ git pull https://github.com/mkedwards/zookeeper branch-3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/707.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #707 commit 3694a4e31eef9b85de59112c22ab163452610743 Author: Michael Edwards Date: 2018-11-20T13:33:09Z [ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses in an AddressTuple held through an AtomicReference ---