Haven't looked into details yet. Some high level questions below.

src/log/replica.cpp (line 209)

    We typically do cleanup in a separate patch to keep the main patch small 
(easier for reviewers to review).

src/log/replica.cpp (lines 363 - 364)

    Given that we want to handle this in the future, does it make sense to add 
a 'type' field to the response message and deprecate the 'okay' field?
    Having two boolean fields, 'okay' and 'ignore', becomes semantically weird 
when okay = true and ignore = true.
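    message PromiseResponse {
      // To be deprecated.
      required bool okay = 1;
      // Enum values and the field number below are illustrative only.
      enum Type {
        ACCEPT = 1;
        REJECT = 2;
        IGNORE = 3;
      }
      optional Type type = 5;
    }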

src/messages/log.proto (line 148)

    I am just thinking about the rolling upgrade case. What happens if an old 
coordinator receives a response from a new replica? According to the current 
semantics, it'll be treated as a NACK. Is that OK? Will that unnecessarily 
demote the old coordinator?
    If you think that's OK, please add a comment explaining why (e.g., 
it'll eventually succeed once all replicas/coordinators are updated). If that's 
the case, we definitely need to call this out in upgrades.md.
    Ditto for the 'ignored' field in WriteResponse.
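
    To make the 'type' suggestion concrete, here is a rough sketch of how a 
new coordinator might interpret a response during a rolling upgrade. Everything 
in it is hypothetical: PromiseResponseSketch, ResponseType and classify() are 
stand-ins I made up for illustration, not the generated protobuf API or 
anything in the patch. The idea is to trust 'type' when the sender set it and 
to fall back to the legacy 'okay' field otherwise.

```cpp
// Hypothetical stand-ins for illustration; not the real Mesos types.
enum class ResponseType { ACCEPT, REJECT, IGNORE };

struct PromiseResponseSketch {
  bool okay;          // Legacy field, always set by old and new replicas.
  bool has_type;      // False when the sender predates the 'type' field.
  ResponseType type;  // Only meaningful when has_type is true.
};

// Prefer 'type' when present; otherwise derive the result from 'okay',
// which is all an old replica knows how to send.
ResponseType classify(const PromiseResponseSketch& response) {
  if (response.has_type) {
    return response.type;
  }
  return response.okay ? ResponseType::ACCEPT : ResponseType::REJECT;
}
```

    Under this scheme a new replica would presumably send okay = false along 
with type = IGNORE, so an old coordinator (which only reads 'okay') still sees 
a NACK, which is the case I am asking about above.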

- Jie Yu

On Oct. 14, 2015, 8:29 p.m., Neil Conway wrote:
> (Updated Oct. 14, 2015, 8:29 p.m.)
> Review request for mesos, Jie Yu, Joris Van Remoortere, and Timothy Chen.
> Bugs: MESOS-3280
>     https://issues.apache.org/jira/browse/MESOS-3280
> Repository: mesos
> Description
> -------
> MESOS-3280. The basic problem is that replicas silently ignore inbound Promise
> and Write requests if they have not finished the recovery protocol yet
> (because they can't safely vote on such requests). Hence, if we try to do a
> Paxos round while a quorum of nodes have not finished recovering, the Paxos
> round will never complete. In particular, this might happen during coordinator
> election:
> coordinator election (which is implemented as performing a full Paxos round)
> starts as soon as the candidate coordinator replica has finished the recovery
> protocol. If several nodes start concurrently, a quorum of those nodes might
> still be executing the recovery protocol, and hence the coordinator will never
> be elected.
> To address this, add "ignored" responses to the Promise and Write
> sub-protocols: if a proposer sees a quorum of "ignored" responses to a promise
> or write request it has issued, it knows the request will never succeed. When
> used for coordinator election, the current code will retry immediately
> (without a backoff).
> Note that replicas will still silently drop promise/write requests if another
> kind of problem occurs (e.g., an I/O error prevents reading/writing log
> data). We might consider changing this, although it will require some thought:
> e.g., if a replica's disk is broken, sending an "ignored" message on every
> request might flood the network.
> * Test mock is incredibly ugly: it works, but we clearly need a better
>   approach before committing this. I've been chatting with @tnachen to find a
>   better approach but haven't got anything that works yet.
> * Should we add a backoff when retrying after a failed coordinator election?
> * Should we also send back an "ignored" response if an I/O error occurs?
> Testing
> -------
> "make check" passes, including a new test that uses a newly constructed mock 
> to ensure we're testing the message schedule described above.
> I also wrote a script that stops and starts mesos-master in a loop, removing the
> replicated log each time. Without the patch, this occasionally fails with a
> "registry fetch" timeout; with the patch, you can observe several scenarios
> where coordinator election is aborted and retried because a quorum of
> ignored responses is seen. Note that in some cases, we need to retry 
> coordinator election up to ~70 times (!), because we don't currently use a 
> backoff; that should probably be fixed, per comments above. But the important 
> point is that election eventually succeeds and we don't hang.
> Thanks,
> Neil Conway
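
For reference, the quorum check and the open backoff question in the quoted 
description can be sketched roughly as below. This is only an illustration 
under my own assumptions: Vote, ignoredByQuorum() and backoff() are 
hypothetical names, and the capped exponential backoff is one possible policy, 
not what the patch implements (the patch currently retries immediately).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical classification of a replica's reply to a promise/write request.
enum class Vote { ACCEPT, REJECT, IGNORED };

// True once at least 'quorum' replicas have replied IGNORED. At that point
// fewer than a quorum of replicas remain that could still accept, so the
// request cannot succeed and the proposer can stop waiting.
bool ignoredByQuorum(const std::vector<Vote>& responses, std::size_t quorum) {
  const std::size_t ignored = static_cast<std::size_t>(
      std::count(responses.begin(), responses.end(), Vote::IGNORED));
  return ignored >= quorum;
}

// One possible retry policy (an assumption, not the patch's behavior):
// double the delay on each failed attempt, never exceeding 'cap'.
std::chrono::milliseconds backoff(
    unsigned attempt,
    std::chrono::milliseconds base,
    std::chrono::milliseconds cap) {
  std::chrono::milliseconds delay = base;
  for (unsigned i = 0; i < attempt && delay < cap; ++i) {
    delay *= 2;
  }
  return std::min(delay, cap);
}
```

With a policy along these lines, the ~70 immediate retries mentioned in the 
testing notes would instead be spaced out while the other replicas finish 
recovery.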

