Todd Lipcon has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4133/5//COMMIT_MSG
Commit Message:

Line 45: The new design makes the PeerMessageQueue itself fully responsible for
> Can you please provide the equivalent diagram for the new design, or add it
Done


Line 47: invasive surgery because the PeerMessageQueue itself doesn't remember
> hmm, it is a bug that the peer message queue doesn't "remember" the terms o
nah, they aren't important for the logic of Raft (note that the "volatile state
on leaders" described in the Raft paper is all index-based)
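
For reference, a minimal sketch of that leader-side volatile state as listed in Figure 2 of the Raft paper (nextIndex[] and matchIndex[]). The struct and function names here are made up for illustration and are not taken from the Kudu code; the point is only that nothing in it carries a term.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names; models the "volatile state on leaders" from Raft Figure 2.
// Both fields are plain log indexes, one slot per peer. No terms are stored.
struct LeaderVolatileState {
  std::vector<int64_t> next_index;   // next log index to send to each peer
  std::vector<int64_t> match_index;  // highest index known replicated on each peer
};

// Reinitialized after each election, per the Raft paper: next_index starts at
// the leader's last log index + 1, match_index starts at 0.
LeaderVolatileState InitLeaderState(int num_peers, int64_t last_log_index) {
  assert(num_peers >= 0);
  LeaderVolatileState s;
  s.next_index.assign(num_peers, last_log_index + 1);
  s.match_index.assign(num_peers, 0);
  return s;
}
```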


Line 51: This is itself also a simplification -- we previously were very messy
> TBH I would rather have this change split out from the rest of the patch, s
yea it's quite difficult to separate the two, sorry


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 447:   // following:
> would have into the history to remind myself why this is here, but I seem t
OK, I'll leave this TODO here and maybe we can try removing the if() later. It 
seems to me removing it might be beneficial in a case like:

- local peer has replicated opid 100
- remote peer A has replicated opid 100
- remote peer B has replicated opid 10 and is catching up
- then remote peer 'A' goes down

Here we'd start getting 'is_last_exchange_successful == false' for peer 'A'. In 
that case, the 'all_replicated_watermark', which requires 3 peers, would not be 
updateable, even once we've replicated peer 'B' up to opid 100. It would get 
"stuck" at 10.

I added this scenario to the TODO comment.
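
To make the scenario above concrete, here is a small standalone model of the watermark calculation. It is a sketch, not the code in consensus_queue.cc: PeerState, ComputeWatermark, and the require_success flag are hypothetical, and the flag stands in for the if() under discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the per-peer state tracked by the queue.
struct PeerState {
  int64_t last_received_index;
  bool is_last_exchange_successful;
};

// Returns the highest index that at least 'num_peers_required' eligible peers
// have replicated, or 'current' unchanged if too few peers are eligible.
// 'require_success' models the if(): when true, peers whose last exchange
// failed are left out of the calculation.
int64_t ComputeWatermark(const std::vector<PeerState>& peers,
                         int num_peers_required,
                         bool require_success,
                         int64_t current) {
  assert(num_peers_required > 0);
  std::vector<int64_t> indexes;
  for (const PeerState& p : peers) {
    if (!require_success || p.is_last_exchange_successful) {
      indexes.push_back(p.last_received_index);
    }
  }
  // Not enough eligible peers: the watermark cannot be updated.
  if (static_cast<int>(indexes.size()) < num_peers_required) {
    return current;
  }
  std::sort(indexes.begin(), indexes.end());
  return indexes[indexes.size() - num_peers_required];
}
```

With peers {local: 100, ok}, {A: 100, failed}, {B: 100, ok}, 3 peers required and a current watermark of 10, the call with require_success == true returns 10 (the "stuck" case), while the call with require_success == false returns 100.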


Line 636:     }
> queue is decoupled from state so isn't it possible that we lost an election
added a comment, hopefully sorts it out


http://gerrit.cloudera.org:8080/#/c/4133/5/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 261:       // TODO: this code seems wrong. When we are leader, we've already bumped
> seems wrong to me too...
planning on reworking this a bit, so that the leader term change is set more 
explicitly


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes
