Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#5).

Change subject: Cleanup/refactor tracking of consensus watermarks
......................................................................

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
     - updates PeerMessageQueue::majority_replicated to the new majority
       watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
        - for each observer (best I can tell there is currently always
          only one, but that's a separate issue)
        -- observer->UpdateMajorityReplicated() to grab committed index
           (the observer is always the RaftConsensus instance)
        --- RaftConsensus::UpdateMajorityReplicated
        ---- ReplicaState::UpdateMajorityReplicatedUnlocked
        ----- this checks the condition that the majority-replicated
              watermark is within the leader's term before advancing the
              commit index.
        ----- calls AdvanceCommittedIndexUnlocked
        ------- commits stuff, etc
        ----- sets *committed_index out-param
        -- PeerMessageQueue picks up this out-param and sets the new
           value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. This required a fairly
invasive surgery because the PeerMessageQueue itself doesn't remember
the terms of pending operations, and thus the watermarks became indexes
instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 422 insertions(+), 591 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to