Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4133 to look at the new patch set (#4). Change subject: Cleanup/refactor tracking of consensus watermarks ...................................................................... Cleanup/refactor tracking of consensus watermarks This is a fairly invasive cleanup/refactor to consensus in preparation for propagating replication watermarks from the leader to the followers. The key item here is to address a long-standing TODO that the PeerMessageQueue should itself calculate the commit index, rather than delegate that job to ReplicaState. The flow used to be something like this on the leader upon receiving a response from a peer that it has replicated some new operations: - PeerMessageQueue::ResponseFromPeer - updates peer last_received - calls PeerMessageQueue::AdvanceQueueWatermark - updates PeerMessageQueue::majority_replicated to the new majority watermark - calls NotifyObserversOfMajorityReplOp --> submits an async task to the raft threadpool - for each observer (best I can tell there is currently always only one, but that's a separate issue) -- observer->UpdateMajorityReplicated() to grab committed index (the observer is always the RaftConsensus instance) --- RaftConsensus::UpdateMajorityReplicated ---- ReplicaState::UpdateMajorityReplicatedUnlocked ----- this checks the condition that the majority-replicated watermark is within the leader's term before advancing the commit index. ----- calls AdvanceCommittedIndexUnlocked ------- commits stuff, etc ----- sets *committed_index out-param -- PeerMessageQueue picks up this out-param and sets the new value within the PeerMessageQueue This was very difficult to follow, given that the "observer" in fact was passing data back to the "notifier" through an out-parameter. Moreover, it wasn't obvious to see which class was "authoritative" for the tracking and advancement of watermarks. The new design makes the PeerMessageQueue itself fully responsible for calculating when the commit index can advance. This required a fairly invasive surgery because the PeerMessageQueue itself doesn't remember the terms of pending operations, and thus the watermarks became indexes instead of OpIds. This is itself also a simplification -- we previously were very messy about using the word "index" when in fact we had an OpId type. In almost every case, we only used the index of those OpIds. In the Raft paper itself, these watermarks are just indexes and not OpIds, so this change brings us closer to the implementation described in the original design. Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8 --- M docs/release_notes.adoc M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/raft_consensus-itest.cc 15 files changed, 400 insertions(+), 569 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/4 -- To view, visit http://gerrit.cloudera.org:8080/4133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>