[GitHub] zookeeper issue #714: Zookeeper 1818 (on branch-3.5)

2018-12-12 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/714
  
I updated this PR to match the changes made to #703 during review.  
Anything else I can do to help?


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-12-07 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/707


---


[GitHub] zookeeper issue #715: Rollup of blocker/critical fixes for 3.5 (to trigger C...

2018-12-04 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/715
  
This is the branch where my colleague Eric Schwartz and I are trying to get
to a "zero flaky tests" standard.  (It started as the branch where I worked
on ZOOKEEPER-2778, and grew from there.)  The pull request is not really a
pull request any more; I was just still interested in what Jenkins makes of
the state of the branch.  If anybody wants to go over with me what we've
done in this branch and why, I'd be *delighted* to discuss it.  (We're down
to one test that needs a bit of work to restore the functional equivalent
of the log-grepping verification we ripped out so we could switch to the
log4j2 implementation of slf4j.)

On Tue, Dec 4, 2018, 6:54 AM Norbert Kalmar  I'm not sure what this is. Do you have a corresponding jira?
>
> —
> You are receiving this because you modified the open/close state.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/zookeeper/pull/715#issuecomment-444127695>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/AAzdsgFEKqCiQzKnr-K5s-2m6qTG7M2fks5u1oyzgaJpZM4YvOj3>
> .
>



---


[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...

2018-11-28 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/719
  
I believe #707 has @eolivelli's LGTM now.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-27 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
That makes perfect sense.  Thanks for keeping an eye on this.


---


[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...

2018-11-27 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/719
  
Does this look fit to merge?  I'm not a committer, so I'll need help 
getting it merged.  (The version targeted to branch-3.5 is #707.)


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-27 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
@eolivelli , have I addressed all your concerns?  Does this look to you 
like a complete fix to ZOOKEEPER-2778?


---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-27 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
retest this please


---


[GitHub] zookeeper issue #724: ZOOKEEPER-2488: Synchronized access to shuttingDownLE ...

2018-11-26 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/724
  
retest this please


---


[GitHub] zookeeper pull request #724: ZOOKEEPER-2488: Synchronized access to shutting...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/724

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

I think this can be reviewed separately from the (functionally somewhat 
related) changes in #707.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2488

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #724


commit 33c1f0bad6e76a4cc67ab4ee44e08fa8f1ac5449
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer




---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
retest this please


---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
Fixed that (and tested it locally before pushing this time).


---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
Except, er, that seems to make the tests fail.  😝  Investigating.


---


[GitHub] zookeeper pull request #723: ZOOKEEPER-3202: Add timing margin to improve re...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/723

ZOOKEEPER-3202: Add timing margin to improve reliability of 
testClientServerSSL()

Allowing just 5 seconds for 3 quorum peers to start and elect a leader is a 
bit tight, at least when running 4 test processes in parallel inside a (Linux) 
Docker container on a (non-Linux) laptop.  Add up to 10 seconds of extra margin.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #723


commit dcedaf9ad7c756b1852a9bfca4cfbc3313f1a0fc
Author: Michael Edwards 
Date:   2018-11-26T06:31:23Z

ZOOKEEPER-3202: Add timing margin to improve reliability of 
testClientServerSSL()




---


[GitHub] zookeeper issue #721: ZOOKEEPER-3046: wait for clients to reconnect after re...

2018-11-25 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/721
  
I think so too!  Done 😄 


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/721#discussion_r236097804
  
--- Diff: 
zookeeper-server/src/test/java/org/apache/zookeeper/test/DisconnectedWatcherTest.java
 ---
@@ -221,6 +228,7 @@ public void testManyChildWatchersAutoReset() throws 
Exception {
 watcher.waitForDisconnected(3);
 startServer();
 watcher.waitForConnected(3);
+watcher1.waitForConnected(3);
--- End diff --

That example is due to an unrelated class of failure, in which the 
Zookeeper server is permanently unreachable (notice that the entire test timed 
out); I think many failure cases of that kind are the product of failure to 
bind() the dynamically-assigned port to the server socket (probably due to 
collisions with other tests running concurrently on the same build host).  I've 
updated the description of this PR to explain what kind of failure it's 
intended to fix.


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards 
Date:   2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server




---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/721


---


[GitHub] zookeeper pull request #721: ZOOKEEPER-3046: wait for clients to reconnect a...

2018-11-25 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/721

ZOOKEEPER-3046: wait for clients to reconnect after restarting server



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-3046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 62e6bca2460b261e7cad204566f7b88a41ab0972
Author: Michael Edwards 
Date:   2018-11-25T22:23:29Z

ZOOKEEPER-3046: wait for clients to reconnect after restarting server




---


[GitHub] zookeeper issue #528: ZOOKEEPER-3034 Facing issues while building from sourc...

2018-11-24 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/528
  
I'd like to see this brought back.  Not cool to drop autoconf when the 
CMake setup can't build shared libraries.  From 
`zookeeper-client/zookeeper-client-c/README`:
```
Current limitations of the CMake build system include lack of Solaris 
support,
no shared library option, no explicitly exported symbols (all are exported 
by
default), no versions on the libraries, and no documentation generation.
```


---


[GitHub] zookeeper issue #719: [ZOOKEEPER-2778] QuorumPeer: address potential reconfi...

2018-11-23 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/719
  
I don't understand the test failures.  Are these unrelated problems in 
master?


---


[GitHub] zookeeper pull request #719: [ZOOKEEPER-2778] QuorumPeer: address potential ...

2018-11-23 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/719

[ZOOKEEPER-2778] QuorumPeer: address potential reconfiguration deadlocks

* QuorumPeer: encapsulate quorum/election/client addresses in an 
AddressTuple held through an AtomicReference
* QuorumPeer/QuorumCnxManager: address deadlock and visibility issues
* QuorumPeer: add fast path for already-non-null quorum/election address
* QuorumPeer: fix access to newly private data members from ReconfigTest
* LeaderBeanTest: set up mock QuorumVerifier so that addresses get set
* QuorumPeer: warn when clobbering existing election algorithm
* QuorumPeer: halt old QCM when clobbering existing election algorithm

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-2778-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #719


commit 53ab259c38ee4d33c8f3ab26cde82992e2074537
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: address potential deadlocks related to 
reconfiguration

* QuorumPeer: encapsulate quorum/election/client addresses in an 
AddressTuple held through an AtomicReference
* QuorumPeer/QuorumCnxManager: address deadlock and visibility issues
* QuorumPeer: add fast path for already-non-null quorum/election address
* QuorumPeer: fix access to newly private data members from ReconfigTest
* LeaderBeanTest: set up mock QuorumVerifier so that addresses get set
* QuorumPeer: warn when clobbering existing election algorithm
* QuorumPeer: halt old QCM when clobbering existing election algorithm




---


[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/715

Rollup of blocker/critical fixes for 3.5 (to trigger CI)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper rollup-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #715


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 5038179e217a0b80805fbb6780a6fec024f9e29d
Author: Michael Edwards 
Date:   2018-11-22T19:21:19Z

[ZOOKEEPER-2778] QuorumPeer: warn when clobbering existing election 
algorithm

commit 78df674c4413561336e3435eaa692a9ec2ede0ca
Author: Michael Edwards 
Date:   2018-11-22T19:29:27Z

[ZOOKEEPER-2778] QuorumPeer: halt old QCM when clobbering existing election 
algorithm

commit d6898072947cc908ca802fa542e636d615ac29f0
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

ZOOKEEPER-1818: Correctly handle potentially inconsistent 
zxid/electionEpoch and peerEpoch during leader election

commit ab30d5c8fb64b9fc93bd5e691855f6c17abd3d17
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)

commit e05ae82437e8a7124c5501772ce188d29ac9bfc2
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

commit bf685a609c991ac4642da42ab41c7d43d6e35246
Author: Andor Molnar 
Date:   2018-11-19T16:25:52Z

ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use 
CountDownLatch to sync with watcher.

commit b964efbaa363fc873205acba4aefc76940a358f1
Author: Michael Edwards 
Date:   2018-11-21T21:33:01Z

Bump library versions, fix 'ant package-native tar' targets

commit bba8aebae6c153d21913f7391f2201c148f24f1e
Author: Michael Edwards 
Date:   2018-11-22T08:38:28Z

Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr

commit 984189537df99a6c55b058362047d272a5c59145
Author: Michael Edwards 
Date:   2018-11-22T12:07:38Z

ZOOKEEPER-3198: plumb BindException up to Leader and its peers

* Normalize bind failures to java.net.BindException in both implementation 
of ServerCnxnFactory
* Throw BindException out as far as the caller of QuorumPeer.processReconfig
* Catch BindException in tryToCommit(); the transaction needs to be 
committed




---


[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/715


---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch 
during leader election




---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/718


---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch 
during leader election




---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/718


---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch 
during leader election




---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/718


---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/717

ZOOKEEPER-1636: cleanup completion list of a failed multi request

(from Thawan Kooburat)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #717


commit 40eb35efb56ddf4ef06243ec433d57200e9029f5
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/717


---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards reopened a pull request:

https://github.com/apache/zookeeper/pull/717

ZOOKEEPER-1636: cleanup completion list of a failed multi request

(from Thawan Kooburat)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #717


commit 40eb35efb56ddf4ef06243ec433d57200e9029f5
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
Github user mkedwards closed the pull request at:

https://github.com/apache/zookeeper/pull/717


---


[GitHub] zookeeper pull request #718: ZOOKEEPER-1818: Correctly handle potentially in...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/718

ZOOKEEPER-1818:  Correctly handle potentially inconsistent 
zxid/electionEpoch…

… and peerEpoch during leader election.  (This is Fangmin's patch, I'm 
just firing off a CI build against current master.)  See #714 for the backport 
to branch-3.5.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #718


commit 73250fad37d5c1f461a0a60e7ed88c710b1b3c96
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

Correctly handle potential inconsitent zxid/electionEpoch and peerEpoch 
during leader election




---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/717

ZOOKEEPER-1636: cleanup completion list of a failed multi request

(from Thawan Kooburat)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636-for-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #717


commit 40eb35efb56ddf4ef06243ec433d57200e9029f5
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235808196
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -108,7 +109,11 @@
 LocalPeerBean jmxLocalPeerBean;
 private Map jmxRemotePeerBean;
 LeaderElectionBean jmxLeaderElectionBean;
-private QuorumCnxManager qcm;
+
+// The QuorumCnxManager is held through an AtomicReference to ensure 
cross-thread visibility
+// of updates; see the implementation comment at 
setLastSeenQuorumVerifier().
+private AtomicReference qcmRef = new 
AtomicReference<>();
--- End diff --

I am hoping to reduce the need for synchronized blocks in follow-up 
changes.  For now, I added a simple use of getAndSet() to detect multiple calls 
to createElectionAlgorithm() and ensure that the QCM that's being dropped on 
the floor gets halted first.


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-22 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235807260
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -121,6 +126,18 @@
  */
 private ZKDatabase zkDb;
 
+public static class AddressTuple {
--- End diff --

Good idea.  Done.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
This PR now has the extraneous changes removed and is green in CI.  Please 
re-review at your convenience.


---


[GitHub] zookeeper pull request #715: Rollup of blocker/critical fixes for 3.5 (to tr...

2018-11-22 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/715

Rollup of blocker/critical fixes for 3.5 (to trigger CI)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper rollup-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #715


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 0e2b571d452306ae151b106e05d0511b09a237d3
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)

commit 3683a1b451fa334ba51227db327557966d319a4d
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

ZOOKEEPER-1818: Correctly handle potentially inconsistent 
zxid/electionEpoch and peerEpoch during leader election

commit a1b56505671b756448f7c0126de764cd25633e2f
Author: Michael Edwards 
Date:   2018-11-21T21:33:01Z

Bump library versions, fix 'ant package-native tar' targets

commit 3dfd49f6bfea357c838e21d5a2e4f1486ed753e9
Author: Michael Edwards 
Date:   2018-11-21T23:09:55Z

ZOOKEEPER-2488: Synchronized access to shuttingDownLE in QuorumPeer

commit 1cbaec427037b8ac10004e8198821da524949843
Author: Andor Molnar 
Date:   2018-11-19T16:25:52Z

ZOOKEEPER-3193. Refactor SaslAuthFail test to use single class. Use 
CountDownLatch to sync with watcher.

commit 0d4e7839eaab8b3222be31a705f9edded1ad98a5
Author: Michael Edwards 
Date:   2018-11-22T08:38:28Z

Add OneLinerFormatter to get semi-verbose logs with captured stdout/stderr

commit 3b19067c2a5b213e53f6e2ce7638b76250076fe6
Author: Michael Edwards 
Date:   2018-11-22T12:07:38Z

Throw BindException out as far as the caller of QuorumPeer.processReconfig




---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
After groveling through 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2708/consoleText,
 I think this may be a contributing factor to flaky tests:
```
 [exec] [junit] 2018-11-22 00:18:30,336 [myid:] - INFO  
[QuorumPeerListener:QuorumCnxManager$Listener@884] - My election bind port: 
localhost/127.0.0.1:19459
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - INFO  
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@493]
 - binding to port localhost/127.0.0.1:19466
 [exec] [junit] 2018-11-22 00:18:30,337 [myid:] - ERROR 
[QuorumPeer[myid=1](plain=/127.0.0.1:19457)(secure=disabled):NettyServerCnxnFactory@497]
 - Error while reconfiguring
 [exec] [junit] org.jboss.netty.channel.ChannelException: Failed to 
bind to: localhost/127.0.0.1:19466
 [exec] [junit] at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
 [exec] [junit] at 
org.apache.zookeeper.server.NettyServerCnxnFactory.reconfigure(NettyServerCnxnFactory.java:494)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.QuorumPeer.processReconfig(QuorumPeer.java:1947)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:154)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
 [exec] [junit] at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1263)
 [exec] [junit] Caused by: java.net.BindException: Address already 
in use
 [exec] [junit] at sun.nio.ch.Net.bind0(Native Method)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:433)
 [exec] [junit] at sun.nio.ch.Net.bind(Net.java:425)
 [exec] [junit] at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 [exec] [junit] at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
 [exec] [junit] at 
org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
 [exec] [junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 [exec] [junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 [exec] [junit] at java.lang.Thread.run(Thread.java:748)
```


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-22 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
Thanks maoling; I'm familiar with the procedure, just hadn't gotten around 
to juggling branches.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
Whoops, just pushed some unrelated stuff to this branch.  The code state 
that fixes just ZOOKEEPER-2778 is 
https://github.com/apache/zookeeper/pull/707/commits/bbeeebf87391ef642059c4b3b65592c361a2ab4e


---


[GitHub] zookeeper pull request #714: Zookeeper 1818 (on top of #713 to fix ZOOKEEPER...

2018-11-21 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/714

Zookeeper 1818 (on top of #713 to fix ZOOKEEPER-2778 and ZOOKEEPER-1636 for 
a green build)

Rebasing Fangmin Lyu's patch from 
https://github.com/lvfangmin/zookeeper/tree/ZOOKEEPER-1818 onto the 3.5 branch 
(cumulative with other PRs in flight).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1818

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/714.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #714


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 0e2b571d452306ae151b106e05d0511b09a237d3
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)

commit 7f286c46d505c3300a15de9d1f4849a4da84e6d3
Author: Fangmin Lyu 
Date:   2018-11-15T17:46:51Z

ZOOKEEPER-1818: Correctly handle potential inconsitent zxid/electionEpoch 
and peerEpoch during leader election




---


[GitHub] zookeeper pull request #713: Zookeeper 1636 (on top of #707 to fix ZOOKEEPER...

2018-11-21 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/713

Zookeeper 1636 (on top of #707 to fix ZOOKEEPER-2778 for a green build)

(Rebased Thawan Kooburat's patch on current layout of 3.5 branch)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper ZOOKEEPER-1636

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #713


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

commit 4cd10c86519b75521f89e451033dca4869d8d0d1
Author: Michael Edwards 
Date:   2018-11-21T08:53:54Z

[ZOOKEEPER-2778] QuorumPeer/QuorumCnxManager: address deadlock and 
visibility issues

commit 03d259bae3b744dc494022698fa843f6cf35e7ed
Author: Michael Edwards 
Date:   2018-11-21T09:01:45Z

[ZOOKEEPER-2778] QuorumPeer: add fast path for already-non-null 
quorum/election address

commit 0531d9c8e6a44ec531a4d8ad667307d9859bef7e
Author: Michael Edwards 
Date:   2018-11-21T17:13:14Z

[ZOOKEEPER-2778] QuorumPeer: fixes from code review

commit 9701f0576f53d1859d3584d0bb9730c89eb57ac1
Author: Michael Edwards 
Date:   2018-11-21T17:19:44Z

[ZOOKEEPER-2778] QuorumPeer: fix access to newly private data members from 
ReconfigTest

commit bbeeebf87391ef642059c4b3b65592c361a2ab4e
Author: Michael Edwards 
Date:   2018-11-21T19:48:49Z

[ZOOKEEPER-2778] LeaderBeanTest: set up mock QuorumVerifier so that 
addresses get set

commit 0e2b571d452306ae151b106e05d0511b09a237d3
Author: Michael Edwards 
Date:   2018-11-21T20:31:16Z

ZOOKEEPER-1636: cleanup completion list of a failed multi request (from 
Thawan Kooburat)




---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
The `LeaderBeanTest` failure was real-ish; the mock `QuorumVerifier` wasn't 
set up properly for address resolution, so it would NPE (without the new 
condvar-style wait for addresses to be set) or hang (with it).  I just pushed a 
fix to the test.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
@afine , does this approach address your concerns with #247?


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
Review comments addressed.  It's a little bit weird that `ant test` doesn't 
recompile the tests when the implementation has changed; an `ant clean`/`ant 
test` cycle caught another test failure-to-compile due to more restrictive 
access on data members.


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-21 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235470421
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
 }
 
 public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean 
writeToDisk){
-synchronized (QV_LOCK) {
-if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
-LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
-". Current version: " + 
quorumVerifier.getVersion());
+// If qcm is non-null, we may call qcm.connectOne(), which will 
take the lock on qcm
+// and then take QV_LOCK.  Take the locks in the same order to 
ensure that we don't
+// deadlock against other callers of connectOne().  If qcmRef gets 
set in another
+// thread while we're inside the synchronized block, that does no 
harm; if we didn't
+// take a lock on qcm (because it was null when we sampled it), we 
won't call
+// connectOne() on it.  (Use of an AtomicReference is enough to 
guarantee visibility
+// of updates that provably happen in another thread before 
entering this method.)
+QuorumCnxManager qcm = qcmRef.get();
+Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
+synchronized (outerLockObject) {
+synchronized (QV_LOCK) {
+if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
+LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
+". Current version: " + 
quorumVerifier.getVersion());
+}
+// assuming that a version uniquely identifies a 
configuration, so if
+// version is the same, nothing to do here.
+if (lastSeenQuorumVerifier != null &&
+lastSeenQuorumVerifier.getVersion() == 
qv.getVersion()) {
+return;
+}
+lastSeenQuorumVerifier = qv;
 
-}
-// assuming that a version uniquely identifies a 
configuration, so if
-// version is the same, nothing to do here.
-if (lastSeenQuorumVerifier != null &&
-lastSeenQuorumVerifier.getVersion() == 
qv.getVersion()) {
-return;
-}
-lastSeenQuorumVerifier = qv;
-connectNewPeers();
--- End diff --

I had folded it back in so I could see the whole flow with the nested calls 
to `qcm.connectOne()`.  I'll factor it back out with `qcm` as a method 
parameter (it's important that the `AtomicReference` `qcmRef` not be sampled 
again inside the `synchronized` block, since we could race against another 
thread changing the election algorithm).


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-21 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235467449
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
 }
 
 public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean 
writeToDisk){
-synchronized (QV_LOCK) {
-if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
-LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
-". Current version: " + 
quorumVerifier.getVersion());
+// If qcm is non-null, we may call qcm.connectOne(), which will 
take the lock on qcm
+// and then take QV_LOCK.  Take the locks in the same order to 
ensure that we don't
+// deadlock against other callers of connectOne().  If qcmRef gets 
set in another
+// thread while we're inside the synchronized block, that does no 
harm; if we didn't
+// take a lock on qcm (because it was null when we sampled it), we 
won't call
+// connectOne() on it.  (Use of an AtomicReference is enough to 
guarantee visibility
+// of updates that provably happen in another thread before 
entering this method.)
+QuorumCnxManager qcm = qcmRef.get();
+Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
--- End diff --

It does.  I'm fairly sure that this guarantees that the `QuorumCnxManager` 
lock can only be taken inside `QV_LOCK` if it's already held outside it.  
Perhaps we can get confirmation from @castuardo and his team's Distributed 
System Model Checking tool?


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-21 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235463520
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -755,39 +769,55 @@ public void recreateSocketAddresses(long id) {
 }
 }
 
-public InetSocketAddress getQuorumAddress(){
-synchronized (QV_LOCK) {
-return myQuorumAddr;
+InetSocketAddress getQuorumAddress(){
--- End diff --

Good suggestion.  Done.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
> `private` should be changed back to `public `
> Jenkins complains about 
[this](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2685/artifact/patchprocess/trunkJavacWarnings.txt)

Okay.  It might be better for the tests to be in the same package as the 
implementation, so these methods can be package-private; but that's out of 
scope for this PR.  I'll change them back to `public`.


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-21 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235460740
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -1556,32 +1573,50 @@ public String getNextDynamicConfigFilename() {
 }
 
 public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean 
writeToDisk){
-synchronized (QV_LOCK) {
-if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
-LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
-". Current version: " + 
quorumVerifier.getVersion());
+// If qcm is non-null, we may call qcm.connectOne(), which will 
take the lock on qcm
+// and then take QV_LOCK.  Take the locks in the same order to 
ensure that we don't
+// deadlock against other callers of connectOne().  If qcmRef gets 
set in another
+// thread while we're inside the synchronized block, that does no 
harm; if we didn't
+// take a lock on qcm (because it was null when we sampled it), we 
won't call
+// connectOne() on it.  (Use of an AtomicReference is enough to 
guarantee visibility
+// of updates that provably happen in another thread before 
entering this method.)
+QuorumCnxManager qcm = qcmRef.get();
+Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
--- End diff --

Yes; I'd like to get confirmation from @castuardo and his team that this 
fully solves the lock nesting problem.  But if you look at the flow of the 
code, I think you'll see that the only way _this_ code can take the lock on a 
{{QuorumCnxManager}} is via {{qcm.connectOne()}}.  Since Java locks are 
reentrant, we're safe ensuring that the nesting is always qcm -> QV_LOCK -> qcm 
-> QV_LOCK if qcm is non-null and QV_LOCK -> QV_LOCK if qcm is null.


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-21 Thread mkedwards
Github user mkedwards commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/707#discussion_r235458260
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
 ---
@@ -1566,32 +1585,50 @@ public String getNextDynamicConfigFilename() {
 }
 
 public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean 
writeToDisk){
-synchronized (QV_LOCK) {
-if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
-LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
-". Current version: " + 
quorumVerifier.getVersion());
+// If qcm is non-null, we may call qcm.connectOne(), which will 
take the lock on qcm
+// and then take QV_LOCK.  Take the locks in the same order to 
ensure that we don't
+// deadlock against other callers of connectOne().  If qcmRef gets 
set in another
+// thread while we're inside the synchronized block, that does no 
harm; if we didn't
+// take a lock on qcm (because it was null when we sampled it), we 
won't call
+// connectOne() on it.  (Use of an AtomicReference is enough to 
guarantee visibility
+// of updates that provably happen in another thread before 
entering this method.)
+QuorumCnxManager qcm = qcmRef.get();
+Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
+synchronized (outerLockObject) {
+synchronized (QV_LOCK) {
+if (lastSeenQuorumVerifier != null && 
lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
+LOG.error("setLastSeenQuorumVerifier called with stale 
config " + qv.getVersion() +
+". Current version: " + 
quorumVerifier.getVersion());
+}
+// assuming that a version uniquely identifies a 
configuration, so if
+// version is the same, nothing to do here.
+if (lastSeenQuorumVerifier != null &&
+lastSeenQuorumVerifier.getVersion() == 
qv.getVersion()) {
+return;
+}
+lastSeenQuorumVerifier = qv;
 
-}
-// assuming that a version uniquely identifies a 
configuration, so if
-// version is the same, nothing to do here.
-if (lastSeenQuorumVerifier != null &&
-lastSeenQuorumVerifier.getVersion() == 
qv.getVersion()) {
-return;
-}
-lastSeenQuorumVerifier = qv;
-connectNewPeers();
--- End diff --

The {{qcm}} that it references is now a local variable rather than an 
instance data member.  If you'd like, I could factor it back out, with {{qcm}} 
as an argument to the function.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
Note that this patch is targeted at the 3.5 release branch, which I'm 
trying to help whip into shape for testing in our environment (where 3.4.13 is 
already in production use).  What else can I do to help 
https://builds.apache.org/job/ZooKeeper_branch35_jdk8/ get to green?


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-21 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
I have pushed an incremental change which should address the lock inversion 
scenario more completely, along with condition-variable-style waiting for the 
quorum/election address to become non-null.  (This proved a great deal simpler 
than trying to use a ReentrantReadWriteLock to solve that problem.)  Please 
re-review.


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-20 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
The deadlock, as I understand it, comes from an attempt to take `QV_LOCK` 
in order to access `myElectionAddress` in thread A, while holding the lock on 
the `QuorumCnxManager` object; meanwhile, thread B, which holds the `QV_LOCK` 
while `connectOne()` is running, tries to take the lock on the 
`QuorumCnxManager` object, producing a deadly embrace.  From the Jira:

```
[junit]  java.lang.Thread.State: BLOCKED
[junit] at  
org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
[junit] 


[junit]  java.lang.Thread.State: BLOCKED
[junit] at  
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
[junit] at  
org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
[junit] at  
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
[junit] at  
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
```

This patch removes the synchronization on `QV_LOCK` from the address 
accessor paths, where its current effect seems primarily to ensure cross-thread 
visibility; the memory barriers involved in setting/fetching through an 
`AtomicReference` are equally effective at this.  Wrapping a single 
`AtomicReference` around all three addresses ensures that you 
don't get interleaved sets when two threads each try to update the address 
fields -- although in the code as I read it, callers are holding the `QV_LOCK` 
anyway.  I can't yet say that I've analyzed all other paths that take 
`QV_LOCK`, but I'm fairly sure that if 
https://github.com/apache/zookeeper/pull/247 would work, this would work at 
least as well, while at least partially addressing concerns about getting 
"outdated" state in read paths.  (If this is a serious concern, then presumably 
we need a separate reader/writer lock that is taken during creation of the 
QuorumPeer and not released until it reaches a "fully configu
 red" state for the first time.)


---


[GitHub] zookeeper issue #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

2018-11-20 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/707
  
There appear to be test failures in LeaderBeanTest:
```
 [exec] [junit] Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, 
Time elapsed: 0.619 sec, Thread: 8, Class: 
org.apache.zookeeper.server.quorum.LeaderBeanTest
 [exec] [junit] Test 
org.apache.zookeeper.server.quorum.LeaderBeanTest FAILED
```
Can anyone help me read the tea leaves?


---


[GitHub] zookeeper pull request #707: [ZOOKEEPER-2778] QuorumPeer: encapsulate addres...

2018-11-20 Thread mkedwards
GitHub user mkedwards opened a pull request:

https://github.com/apache/zookeeper/pull/707

[ZOOKEEPER-2778] QuorumPeer: encapsulate addresses

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mkedwards/zookeeper branch-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/707.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #707


commit 3694a4e31eef9b85de59112c22ab163452610743
Author: Michael Edwards 
Date:   2018-11-20T13:33:09Z

[ZOOKEEPER-2778] QuorumPeer: encapsulate quorum/election/client addresses 
in an AddressTuple held through an AtomicReference




---