Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

Neil Conway Wed, 28 Oct 2015 21:51:07 -0700


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/tests/log_tests.cpp, line 708
> > <https://reviews.apache.org/r/39325/diff/5/?file=1102092#file1102092line708>
> >
> >     No snake_case please. Also, we try not to use tailing underscore for 
> > member fields if possible.


Re: underscores, the style guide says "Prefer trailing underscores for use as 
member fields (but not required)." -- is that incorrect?


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/log/consensus.cpp, line 334
> > <https://reviews.apache.org/r/39325/diff/5/?file=1102087#file1102087line334>
> >
> >     'type' is optional. Do you need to check `has_type()`?

As written, I think it is actually safe (type() should not equal 
PromiseResponse::IGNORE if it isn't set), but you're right that checking for it 
explicitly is probably safer and more readable.


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/log/consensus.cpp, line 356
> > <https://reviews.apache.org/r/39325/diff/5/?file=1102087#file1102087line356>
> >
> >     This is a little hard to read. Can you introduce a small helper:
> >     ```
> >     bool isReject(const Response& response)
> >     {
> >       if (response.has_type()) {
> >         // new replica
> >         return response.type() == PromiseResponse::REJECT;
> >       } else {
> >         // old replica
> >         return !response.okay();
> >       }
> >     }
> >     ```
> >     
> >     To be consistent, you might wanna introduce `isIgnore` as well.

I didn't introduce isIgnore(): since we already have 2 utility functions (for 
WriteResponse and PromiseResponse), requiring 4 functions seemed like it would 
be a bit much.


> On Oct. 23, 2015, 11:52 p.m., Jie Yu wrote:
> > src/messages/log.proto, line 142
> > <https://reviews.apache.org/r/39325/diff/5/?file=1102091#file1102091line142>
> >
> >     s/old master/old coordinator/

Isn't "master" accurate, since a non-coordinator might try to pass a promise 
(e.g., during catchup)?


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39325/#review103873
-----------------------------------------------------------


On Oct. 29, 2015, 4:50 a.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39325/
> -----------------------------------------------------------
> 
> (Updated Oct. 29, 2015, 4:50 a.m.)
> 
> 
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> 
> 
> Bugs: MESOS-3280
>     https://issues.apache.org/jira/browse/MESOS-3280
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet 
> (because
> they can't safely vote on such requests). Hence, if we try to do a Paxos round
> while a quorum of nodes have not finished recovering, the Paxos round will 
> never
> complete. In particular, this might happen during coordinator election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> 
> To address this, add "ignored" responses to the Promise and Write 
> sub-protocols:
> if a proposer sees a quorum of "ignored" responses to a promise or write 
> request
> it has issued, it knows the request will never succeed.  When used for
> coordinator election, the current coding will retry immediately (without a
> backoff).
> 
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> 
> 
> Diffs
> -----
> 
>   src/log/consensus.cpp 71cd5f39c02f583c1ea53d1ab4569115b0cee2a3 
>   src/log/coordinator.cpp e1df8b01f28447955e1e8bbd764fce3e1a948d88 
>   src/log/replica.hpp 70f415f4f81465dbb7d1e4d4f65c7965cd432415 
>   src/log/replica.cpp 414e1165ae969c1f9492e7b8361c72df2c4fbc16 
>   src/messages/log.proto d73b33f865963292af580945659ad0e800f2a204 
>   src/tests/log_tests.cpp 222a12e10032082e6315ca7b43d3393738edd312 
> 
> Diff: https://reviews.apache.org/r/39325/diff/
> 
> 
> Testing
> -------
> 
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> 
> I also wrote a script stops and starts mesos-master in a loop, removing the 
> replicated log each time. Without the patch, this occasionally fails with a 
> "registry fetch" timeout; with the patch, you can observe several scenarios 
> where coordinator election is reborted and retried because a quorum of 
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>

Re: Review Request 39325: Fixed race between coordinator election and recovery in replicated log.

Reply via email to