Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-11-03 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review104946
---

Ship it!


Thanks! LGTM!


src/log/consensus.cpp (line 46)


`static bool isRejectedPromise(...)`

This is a file local helper. Let's make it explicit.



src/log/consensus.cpp (line 57)


Ditto here.



src/tests/log_tests.cpp (lines 718 - 719)


We try to avoid jagness like this. The following might be better:

```
Future recovering1 =
  recover(2, Owned(replica1), network, true);
```


- Jie Yu


On Nov. 2, 2015, 7:42 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Nov. 2, 2015, 7:42 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> REVIEWER NOTES:
> 
> * GMock code is ugly, but see notes below.
> * Should we add exponential backoff when retrying coordinator election?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
>   src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
>   src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
>   src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff. On a more realistic LAN configuration, many fewer retries will 
> likely be needed, but we could also use a backoff instead. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-11-03 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Nov. 3, 2015, 7:13 p.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Changes
---

Address code review comments.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

REVIEWER NOTES:

* GMock code is ugly, but see notes below.
* Should we add exponential backoff when retrying coordinator election?


Diffs (updated)
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff. On a 
more realistic LAN configuration, many fewer retries will likely be needed, but 
we could also use a backoff instead. But the important point is that election 
eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-11-02 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Nov. 2, 2015, 5:41 p.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

REVIEWER NOTES:

* GMock code is ugly, but see notes below.
* Should we add exponential backoff when retrying coordinator election?


Diffs (updated)
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-11-02 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Nov. 2, 2015, 7:42 p.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

REVIEWER NOTES:

* GMock code is ugly, but see notes below.
* Should we add exponential backoff when retrying coordinator election?


Diffs
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing (updated)
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff. On a 
more realistic LAN configuration, many fewer retries will likely be needed, but 
we could also use a backoff instead. But the important point is that election 
eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-11-02 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review104801
---


Patch looks great!

Reviews applied: [39463, 39325]

All tests passed.

- Mesos ReviewBot


On Nov. 2, 2015, 7:42 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Nov. 2, 2015, 7:42 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> REVIEWER NOTES:
> 
> * GMock code is ugly, but see notes below.
> * Should we add exponential backoff when retrying coordinator election?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
>   src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
>   src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
>   src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff. On a more realistic LAN configuration, many fewer retries will 
> likely be needed, but we could also use a backoff instead. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-28 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review104389
---


Latest revision includes a new approach to setting up the unit test using GMock 
(big thanks to Tim Chen for helping me figure out the appropriate GMock 
magic!). It is ugly in a different way than the previous approach :) But I 
think overall it is better, and can be refactored shortly.

Before refactoring it, I'd like to explore removing the STARTING -> RECOVERING 
-> VOTING transition. But playing around with doing that, I think it will take 
a bit of thought to make sure we're doing it correctly. So I'd like to leave 
that to a separate review.

- Neil Conway


On Oct. 29, 2015, 4:50 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 29, 2015, 4:50 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
>   src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
>   src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
>   src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-28 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 29, 2015, 4:50 a.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Changes
---

Address review comments, new test mock code.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

CODE REVIEW TO DISCUSS / FIX:

* Test mock is incredibly ugly: it works, but we clearly need a better approach
  before committing this. I've been chatting with @tnachen to find a better
  approach but haven't got anything that works yet.

* Should we add a backoff when retrying after a failed coordinator election?

* Should we also send back an "ignored" response if an I/O error occurs?


Diffs (updated)
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-28 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 29, 2015, 4:55 a.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description (updated)
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

REVIEWER NOTES:

* GMock code is ugly, but see notes below.
* Should we add exponential backoff when retrying coordinator election?


Diffs
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-28 Thread Neil Conway


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/tests/log_tests.cpp, line 708
> > 
> >
> > No snake_case please. Also, we try not to use tailing underscore for 
> > member fields if possible.

Re: underscores, the style guide says "Prefer trailing underscores for use as 
member fields (but not required)." -- is that incorrect?


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/log/consensus.cpp, line 334
> > 
> >
> > 'type' is optional. Do you need to check `has_type()`?

As written, I think it is actually safe (type() should not equal 
PromiseResponse::IGNORE if it isn't set), but you're right that checking for it 
explicitly is probably safer and more readable.


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/log/consensus.cpp, line 356
> > 
> >
> > This is a little hard to read. Can you introduce a small helper:
> > ```
> > bool isReject(const Response& response)
> > {
> >   if (response.has_type()) {
> > // new replica
> > return response.type() == PromiseResponse::REJECT;
> >   } else {
> > // old replica
> > return !response.okay();
> >   }
> > }
> > ```
> > 
> > To be consistent, you might wanna introduce `isIgnore` as well.

I didn't introduce isIgnore(): since we already have 2 utility functions (for 
WriteResponse and PromiseResponse), requiring 4 functions seemed like it would 
be a bit much.


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/messages/log.proto, line 142
> > 
> >
> > s/old master/old coordinator/

Isn't "master" accurate, since a non-coordinator might try to pass a promise 
(e.g., during catchup)?


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review103873
---


On Oct. 29, 2015, 4:50 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 29, 2015, 4:50 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
>   src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
>   src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
>   src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> 

Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-28 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 29, 2015, 4:50 a.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description (updated)
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.


Diffs
-

  src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
  src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
  src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
  src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-23 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review103873
---


Looks good to me except the last test. Looks quite ugly to me:( Still thinking 
about better solutions.

The code part looks good to me, we can commit that first (with one more unit 
test to test the coordinator election logic with IGNORE responses).


src/log/consensus.cpp (line 334)


'type' is optional. Do you need to check `has_type()`?



src/log/consensus.cpp (line 356)


This is a little hard to read. Can you introduce a small helper:
```
bool isReject(const Response& response)
{
  if (response.has_type()) {
// new replica
return response.type() == PromiseResponse::REJECT;
  } else {
// old replica
return !response.okay();
  }
}
```

To be consistent, you might wanna introduce `isIgnore` as well.



src/log/coordinator.cpp (line 197)


Please add a CHECK(response.has_type())?



src/messages/log.proto (line 142)


s/old master/old coordinator/



src/messages/log.proto (line 163)


Could you please put 'type' field right below 'okay' field?



src/messages/log.proto (line 213)


Ditto. Please put it right below 'okay' field.



src/tests/log_tests.cpp (line 643)


2 lines apart.



src/tests/log_tests.cpp (line 647)


I think we put `:` in the next line.



src/tests/log_tests.cpp (line 703)


No snake_case please. Also, we try not to use tailing underscore for member 
fields if possible.



src/tests/log_tests.cpp (line 733)


Please add a brief descrption about what this test is doing (instead of 
just a JIRA number).


- Jie Yu


On Oct. 20, 2015, 8:45 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 20, 2015, 8:45 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   

Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-23 Thread Neil Conway


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > Looks good to me except the last test. Looks quite ugly to me:( Still 
> > thinking about better solutions.
> > 
> > The code part looks good to me, we can commit that first (with one more 
> > unit test to test the coordinator election logic with IGNORE responses).

Thanks for the review!

Re test mock, yes it is awful as written (not intended to be committed). I 
played around with using some GMock magic here: 
https://gist.github.com/neilconway/96a94661e485dcade2ad -- it _should_ work but 
it doesn't compile for some reason (see comment on gist). I'll try again soon, 
although help from someone who knows FutureResult and friends would be helpful.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review103873
---


On Oct. 20, 2015, 8:45 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 20, 2015, 8:45 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-20 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 20, 2015, 8:45 p.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

CODE REVIEW TO DISCUSS / FIX:

* Test mock is incredibly ugly: it works, but we clearly need a better approach
  before committing this. I've been chatting with @tnachen to find a better
  approach but haven't got anything that works yet.

* Should we add a backoff when retrying after a failed coordinator election?

* Should we also send back an "ignored" response if an I/O error occurs?


Diffs (updated)
-

  src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
  src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
  src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
  src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-16 Thread Neil Conway


> On Oct. 15, 2015, 12:31 a.m., Jie Yu wrote:
> > src/messages/log.proto, line 148
> > 
> >
> > I am just thinking about the rolling upgrade case. What happens if the 
> > old coordinator receives a response from a new replica. According to the 
> > current semantics, it'll be treated as a NACK. Is that OK? Will that 
> > unnecessarily demote the old coordinator?
> > 
> > If you think that's OK, please add a comment about why it is OK (e.g., 
> > it'll eventually succeed when all replicas/coordinator are updated). If 
> > that's the case, we definitely need to call out this in upgrades.md.
> > 
> > Ditto for the 'ignored' field in WriteResponse.

As I see it, we have three options when the candidate coordinator is old (<= 
0.25) but one or more replicas are new:

1. New replica sends "ignored = true, okay = false", which means that the 
coordinator will view this as a NACK and potentially get demoted.
2. New replica sends "ignored = true, okay = true": clearly this is wrong
3. New replica only sends an "ignored" response if the master is sufficiently 
new; otherwise, it keeps the old behavior of silently ignoring the 
promise/write request

Obviously #2 is a non-starter; I think #1 should be fine, and is preferrable to 
the complexity of having replicas track the coordinator's version.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review102710
---


On Oct. 15, 2015, 1:21 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 15, 2015, 1:21 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
>   src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
>   src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry 

Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

CODE REVIEW TO DISCUSS / FIX:

* Test mock is incredibly ugly: it works, but we clearly need a better approach
  before committing this. I've been chatting with @tnachen to find a better
  approach but haven't got anything that works yet.

* Should we add a backoff when retrying after a failed coordinator election?

* Should we also send back an "ignored" response if an I/O error occurs?


Diffs
-

  src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
  src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
  src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
  src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
  src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
  src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 14, 2015, 8:29 p.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Changes
---

Address code review nit.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

CODE REVIEW TO DISCUSS / FIX:

* Test mock is incredibly ugly: it works, but we clearly need a better approach
  before committing this. I've been chatting with @tnachen to find a better
  approach but haven't got anything that works yet.

* Should we add a backoff when retrying after a failed coordinator election?

* Should we also send back an "ignored" response if an I/O error occurs?


Diffs (updated)
-

  src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
  src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
  src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
  src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
  src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
  src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Artem Harutyunyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review102693
---



src/tests/log_tests.cpp (line 730)


micro-nit: period is missing.


- Artem Harutyunyan


On Oct. 14, 2015, 11:33 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 14, 2015, 11:33 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
>   src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
>   src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review102710
---


Haven't looked into details yet. Some high level questions below.


src/log/replica.cpp (line 209)


We typically do the cleanup in a separate patch to reduce the size for the 
main patch (easiser for reviewers to review).



src/log/replica.cpp (lines 363 - 364)


Given that we want to handle this in the future, does it make sense to use 
a 'type' field in the response message and deprecate 'okay' field?

Having two boolean field 'okay' and 'ignore' becomes semantically wiered 
when okay = true and ignore = true.

```
message PromiseResponse {
  // To be deprecated.
  required bool okay;
  
  message Type {
ACCEPT,
REJECT,
IGNORE,
ERROR,
  }
  optional Type type;
}
```



src/messages/log.proto (line 148)


I am just thinking about the rolling upgrade case. What happens if the old 
coordinator receives a response from a new replica. According to the current 
semantics, it'll be treated as a NACK. Is that OK? Will that unnecessarily 
demote the old coordinator?

If you think that's OK, please add a comment about why it is OK (e.g., 
it'll eventually succeed when all replicas/coordinator are updated). If that's 
the case, we definitely need to call out this in upgrades.md.

Ditto for the 'ignored' field in WriteResponse.


- Jie Yu


On Oct. 14, 2015, 8:29 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 14, 2015, 8:29 p.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
>   src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
>   src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, 

Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/
---

(Updated Oct. 15, 2015, 1:21 a.m.)


Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.


Changes
---

Fixed Linux compile errors in test code.


Bugs: MESOS-3280
https://issues.apache.org/jira/browse/MESOS-3280


Repository: mesos


Description
---

MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
and Write requests if they have not finished the recovery protocol yet (because
they can't safely vote on such requests). Hence, if we try to do a Paxos round
while a quorum of nodes have not finished recovering, the Paxos round will never
complete. In particular, this might happen during coordinator election:
coordinator election (which is implemented as performing a full Paxos round)
starts as soon as the candidate coordinator replica has finished the recovery
protocol. If several nodes start concurrently, a quorum of those nodes might
still be executing the recovery protocol, and hence the coordinator will never
be elected.

To address this, add "ignored" responses to the Promise and Write sub-protocols:
if a proposer sees a quorum of "ignored" responses to a promise or write request
it has issued, it knows the request will never succeed.  When used for
coordinator election, the current coding will retry immediately (without a
backoff).

Note that replicas will still silently drop promise/write requests if another
kind of problem occurs (e.g., an I/O error prevents reading/writing log
data). We might consider changing this, although it will require some thought:
e.g., if a replica's disk is broken, sending an "ignored" message on every
request might flood the network.

CODE REVIEW TO DISCUSS / FIX:

* Test mock is incredibly ugly: it works, but we clearly need a better approach
  before committing this. I've been chatting with @tnachen to find a better
  approach but haven't got anything that works yet.

* Should we add a backoff when retrying after a failed coordinator election?

* Should we also send back an "ignored" response if an I/O error occurs?


Diffs (updated)
-

  src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
  src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
  src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
  src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
  src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
  src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
  src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
  src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 

Diff: https://reviews.apache.org/r/39325/diff/


Testing
---

"make check" passes, including a new test that uses a newly constructed mock to 
ensure we're testing the message schedule described above.

I also wrote a script stops and starts mesos-master in a loop, removing the 
replicated log each time. Without the patch, this occasionally fails with a 
"registry fetch" timeout; with the patch, you can observe several scenarios 
where coordinator election is reborted and retried because a quorum of ignored 
responses is seen. Note that in some cases, we need to retry coordinator 
election up to ~70 times (!), because we don't currently use a backoff; that 
should probably be fixed, per comments above. But the important point is that 
election eventually succeeds and we don't hang.


Thanks,

Neil Conway



Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

2015-10-14 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review102749
---


Patch looks great!

Reviews applied: [39325]

All tests passed.

- Mesos ReviewBot


On Oct. 15, 2015, 1:21 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> ---
> 
> (Updated Oct. 15, 2015, 1:21 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
> https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> CODE REVIEW TO DISCUSS / FIX:
> 
> * Test mock is incredibly ugly: it works, but we clearly need a better 
> approach
>   before committing this. I've been chatting with @tnachen to find a better
>   approach but haven't got anything that works yet.
> 
> * Should we add a backoff when retrying after a failed coordinator election?
> 
> * Should we also send back an "ignored" response if an I/O error occurs?
> 
> 
> Diffs
> -
> 
>   src/log/consensus.cpp 59f80d02d1d3c11683631f3fc5f6e923b5ebdf96 
>   src/log/coordinator.cpp 5500bca77f3020e0051010c5c178a20a3c7ad44a 
>   src/log/replica.hpp 33d3f1d9e89035936c67739898e73a06b391fcd0 
>   src/log/replica.cpp 75d39ff56822bf00fce9daf5c1e3befb75f2e039 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/state/log.cpp a75a605a4b0edb8863a3378e2133df7d6eb1cc3d 
>   src/tests/log_tests.cpp f2dd47cfbe73fb18c360a637db009b7d391a782e 
>   src/tests/slave_tests.cpp 10a4fa7eaa8e868ccc6d60ac220d66a4f0a523b4 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> ---
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>