[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 8: -Verified

Build Started http://104.196.14.100/job/kudu-gerrit/3267/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#8).

Change subject: Cleanup/refactor tracking of consensus watermarks
..

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
- for each observer (best I can tell there is currently always
  only one, but that's a separate issue)
-- observer->UpdateMajorityReplicated() to grab committed index
   (the observer is always the RaftConsensus instance)
--- RaftConsensus::UpdateMajorityReplicated
 ReplicaState::UpdateMajorityReplicatedUnlocked
- this checks the condition that the majority-replicated
  watermark is within the leader's term before advancing the
  commit index.
- calls AdvanceCommittedIndexUnlocked
--- commits stuff, etc
- sets *committed_index out-param
-- PeerMessageQueue picks up this out-param and sets the new
   value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. The flow is now the
following, with '!' at the beginning of the line to denote where it is
changed from before:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark.
!  - If the majority-replicated watermark is within the current leader
!term, advances the commit index (stored in QueueState)
!- notifies observers of the new commit index
!  - calls observer->NotifyCommitIndex() with the new commit index
!- RaftConsensus::NotifyCommitIndex()
!  - ReplicaState::AdvanceCommittedIndexUnlocked
 - commits stuff, etc.

This required a fairly invasive surgery because the PeerMessageQueue
itself doesn't remember the terms of pending operations, and thus the
watermarks became indexes instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

I looped raft-consensus-itest.MultiThreadedInsertWithFailovers 800
times and the whole suite 500 times and got no failures[1].

[1] http://dist-test.cloudera.org//job?job_id=todd.1473280738.28972

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 464 insertions(+), 592 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 

[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 8:

Build Started http://104.196.14.100/job/kudu-gerrit/3266/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 7: Code-Review+1

(1 comment)

lgtm, just a syntax nit

http://gerrit.cloudera.org:8080/#/c/4133/7/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 703:   && queue_state_.majority_replicated_index > 
queue_state_.committed_index) {
nit: ampersands not consistent (put them at the end of the line)


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4133/7/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 267:   // changes, so the refactor isn't trivial.
> seems like premature optimization -- will make the code longer, and this ha
k, fair enough


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4133/7/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 267:   // changes, so the refactor isn't trivial.
> final q: do you think it might be worth it to just look at the last message
seems like premature optimization -- will make the code longer, and this has 
never shown up as a hot spot


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4133/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 459:   for (const PeersMap::value_type& peer : peers_map_) {
> yea, agreed that it's not a super impactful optimization, but more about co
well the all replicated is always supposed to go back when we add a new node, 
so maybe we can change the name, but its nature seems to be necessary :) agreed 
on the follow on.


http://gerrit.cloudera.org:8080/#/c/4133/7/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 267:   // changes, so the refactor isn't trivial.
final q: do you think it might be worth it to just look at the last message, 
like we did before and then reverse iterate only if the term actually changed?


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/3257/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#7).

Change subject: Cleanup/refactor tracking of consensus watermarks
..

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
- for each observer (best I can tell there is currently always
  only one, but that's a separate issue)
-- observer->UpdateMajorityReplicated() to grab committed index
   (the observer is always the RaftConsensus instance)
--- RaftConsensus::UpdateMajorityReplicated
 ReplicaState::UpdateMajorityReplicatedUnlocked
- this checks the condition that the majority-replicated
  watermark is within the leader's term before advancing the
  commit index.
- calls AdvanceCommittedIndexUnlocked
--- commits stuff, etc
- sets *committed_index out-param
-- PeerMessageQueue picks up this out-param and sets the new
   value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. The flow is now the
following, with '!' at the beginning of the line to denote where it is
changed from before:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark.
!  - If the majority-replicated watermark is within the current leader
!term, advances the commit index (stored in QueueState)
!- notifies observers of the new commit index
!  - calls observer->NotifyCommitIndex() with the new commit index
!- RaftConsensus::NotifyCommitIndex()
!  - ReplicaState::AdvanceCommittedIndexUnlocked
 - commits stuff, etc.

This required a fairly invasive surgery because the PeerMessageQueue
itself doesn't remember the terms of pending operations, and thus the
watermarks became indexes instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

I looped raft-consensus-itest.MultiThreadedInsertWithFailovers 800
times and the whole suite 500 times and got no failures[1].

[1] http://dist-test.cloudera.org//job?job_id=todd.1473280738.28972

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 464 insertions(+), 592 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 

[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4133/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 137:   if (current_term != queue_state_.current_term) {
> are we verifying that the term doesn't go back? should we?
sure, I'll give that a try. I think we are, elsewhere, but maybe not here.


Line 263:   // TODO: this code seems wrong. When we are leader, we've 
already bumped
> yeah, this seems like it's making things a bit messier (not that they were 
oops, I actually meant to remove this TODO in the current revision.

I agree it would be nice to do the SetLeaderAndAppendNoOp thing, but we're 
currently abusing SetLeaderMode to handle refreshing the tracked peers on a 
configuration change, and I didn't want to open the can of worms of trying to 
add more refactoring on top. So, I just kept this as is. I'll update the TODO, 
though, to suggest this refactor.


Line 459: // TODO: The fact that we only consider peers whose last exchange 
was
> this is technically true, but not sure how important/impactful it is. If A 
yea, agreed that it's not a super impactful optimization, but more about code 
cleanliness and the fact that our "watermarks" actually move backwards, not 
just forwards (and thus aren't technically watermarks!)

But, would like to address it as a follow-on.


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-07 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4133/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 137:   if (current_term != queue_state_.current_term) {
are we verifying that the term doesn't go back? should we?


Line 263:   // TODO: this code seems wrong. When we are leader, we've 
already bumped
yeah, this seems like it's making things a bit messier (not that they were much 
more nice and neat before). Would it be possible to create a new SetLeaderMode 
like method that also took the first round to be appended and that would set 
both the first index and do the SetLeaderMode stuff? Something like 
SetLeaderAndAppendNoOp then we would always snoop for both index and term (even 
f if you don't like snooping, at least we'd be consistent). This may not work 
great with the replicating peers though (right now we set leader mode, update 
the peers config and only then append the no op)


Line 459: // TODO: The fact that we only consider peers whose last exchange 
was
this is technically true, but not sure how important/impactful it is. If A goes 
down we wouldn't be able to advance the index anyway, and we'll be able to 
again only once we've removed A and added a new node. So yeah, once we've 
stopped talking to a peer we stop moving the all_replicated and maybe we could 
advance it a little more, but IMO we should address it only if it becomes 
pathological. (not saying anything about the last_received, which I agree we 
should make match raft proper. iirc we actually tried to do that and hit some 
roadblock)


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-06 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/3252/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-06 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#6).

Change subject: Cleanup/refactor tracking of consensus watermarks
..

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
- for each observer (best I can tell there is currently always
  only one, but that's a separate issue)
-- observer->UpdateMajorityReplicated() to grab committed index
   (the observer is always the RaftConsensus instance)
--- RaftConsensus::UpdateMajorityReplicated
 ReplicaState::UpdateMajorityReplicatedUnlocked
- this checks the condition that the majority-replicated
  watermark is within the leader's term before advancing the
  commit index.
- calls AdvanceCommittedIndexUnlocked
--- commits stuff, etc
- sets *committed_index out-param
-- PeerMessageQueue picks up this out-param and sets the new
   value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. The flow is now the
following, with '!' at the beginning of the line to denote where it is
changed from before:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark.
!  - If the majority-replicated watermark is within the current leader
!term, advances the commit index (stored in QueueState)
!- notifies observers of the new commit index
!  - calls observer->NotifyCommitIndex() with the new commit index
!- RaftConsensus::NotifyCommitIndex()
!  - ReplicaState::AdvanceCommittedIndexUnlocked
 - commits stuff, etc.

This required a fairly invasive surgery because the PeerMessageQueue
itself doesn't remember the terms of pending operations, and thus the
watermarks became indexes instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

I looped raft-consensus-itest.MultiThreadedInsertWithFailovers 800
times and the whole suite 500 times and got no failures.

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 463 insertions(+), 592 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-01 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 311:   ASSERT_EQ(queue_->GetMajorityReplicatedIndexForTests(), 0);
> I disagree in this case, since it's obvious what the numeric constant is re
ok


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 137:   // TODO: reset first_index_in_term? make it optional, reset here 
and set on the
> hrm, having some trouble with this TODO actually. Trying to clean up the wa
sounds reasonable


Line 447:   // TODO: why is this only if successful? shouldn't we still use
> added this TODO in this patch but I think it's a pre-existing issue. Any th
would have into the history to remind myself why this is here, but I seem to 
recall this being an important thing


Line 499:   boost::optional updated_commit_index;
> given we already use boost::optional pretty often elsewhere, I dont think t
ok


Line 636: // If we're the leader, we are in charge of computing the new
> hrm, why do you say we might not be the leader? We're still inside the lock
queue is decoupled from state so isn't it possible that we lost an election but 
the queue doesn't know about it yet (i.e. that the call to change the queue's 
mode is queued behind this thread competing for the lock?


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1267:   // TODO: interrogate queue rather than state?
> trying to reduce the amount of duplication of responsibility (ideally could
that makes sense in general, though I'm not totally sure what it looks like


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-01 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#5).

Change subject: Cleanup/refactor tracking of consensus watermarks
..

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
- for each observer (best I can tell there is currently always
  only one, but that's a separate issue)
-- observer->UpdateMajorityReplicated() to grab committed index
   (the observer is always the RaftConsensus instance)
--- RaftConsensus::UpdateMajorityReplicated
 ReplicaState::UpdateMajorityReplicatedUnlocked
- this checks the condition that the majority-replicated
  watermark is within the leader's term before advancing the
  commit index.
- calls AdvanceCommittedIndexUnlocked
--- commits stuff, etc
- sets *committed_index out-param
-- PeerMessageQueue picks up this out-param and sets the new
   value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. This required a fairly
invasive surgery because the PeerMessageQueue itself doesn't remember
the terms of pending operations, and thus the watermarks became indexes
instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 422 insertions(+), 591 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-01 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/3183/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-09-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 4:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 333:   // The id of the last committed operation in the configuration. 
This is the
> s/id/index/
Done


Line 334:   // id of the last operation the leader deemed committed from a 
consensus
> s/id/index/
Done


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_peers-test.cc
File src/kudu/consensus/consensus_peers-test.cc:

Line 142:   message_queue_->SetLeaderMode(0, 0, BuildRaftConfigPBForTests(3));
> Maybe better to use kMinimumTerm and kMinimumOpIdIndex from opid_util.h her
Done


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 202:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(2));
> use MinimumOpId.index() of the equivalent constant
switched to using kMinimumOpIdIndex and kMinimumTerm here and elsewhere per 
Mike's suggestion


Line 242:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(2));
> same
Done


Line 305:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(3));
> same
Done


Line 311:   ASSERT_EQ(queue_->GetMajorityReplicatedIndexForTests(), 0);
> same here and below
I disagree in this case, since it's obvious what the numeric constant is 
representing (given the other side of the equality assertion). In the 
"SetLeaderMode" case it's nice because you don't necessarily remember the order 
of the two integer arguments.

Also I think it's important to realize that it's 0 as an integer, rather than 
maybe thinking it's 1 (which is actually the first term and index that any peer 
claims, right?)


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 112:   queue_state_.current_term = 0;
> use constants here and elsewhere?
actually per my reply elsewhere I think 0's clearer


Line 137:   // TODO: reset first_index_in_term? make it optional, reset here 
and set on the
hrm, having some trouble with this TODO actually. Trying to clean up the way 
the queue tracks current_term as leader, and I think some of our tests may be 
abusing the queue. In particular:

static inline void AppendReplicateMessagesToQueue(
PeerMessageQueue* queue,
const scoped_refptr& clock,
int first,
int count,
int payload_size = 0) {

  for (int i = first; i < first + count; i++) {
int term = i / 7;
int index = i;
CHECK_OK(queue->AppendOperation(make_scoped_refptr_replicate(
CreateDummyReplicate(term, index, clock->Now(), 
payload_size).release(;
  }
}

this code is appending ops from different terms to a leader-mode queue, which I 
don't think would ever happen in real life, right? Any time the term advances, 
the leader would step down and then have to call SetLeaderMode again for the 
next term?

Mike/David: If this seems right to you guys, I'll make the change so that the 
queue doesn't track current term by "snooping" on ops anymore, but instead 
takes a 'first_opid_in_term' parameter to SetLeaderMode, which will be the opid 
of the NO_OP.


Line 447:   // TODO: why is this only if successful? shouldn't we still use
added this TODO in this patch but I think it's a pre-existing issue. Any 
thoughts on this issue? otherwise I"m inclined to leave this as a TODO here


Line 499:   boost::optional updated_commit_index;
> do we really need to introduce a boost dependency here? since this is just 
given we already use boost::optional pretty often elsewhere, I dont think the 
"introduce a dependency" argument carries that much weight. I like optional 
because it's impossible to accidentally miss the "special" value (eg 
accidentally compare vs the -1 value and decide something is above this commit 
index, or whatever)


Line 636: // If we're the leader, we are in charge of computing the new
> I think it would make sense to document somewhere that calculating the comm
hrm, why do you say we might not be the leader? We're still inside the lock 
here, so we know we're in leader mode.


Line 658:   if (queue_state_.majority_replicated_index >= 
queue_state_.first_index_in_term) {
> maybe rename this var to "first_index_in_current_term"
Done


Line 669: VLOG_WITH_PREFIX_UNLOCKED(2) << "Commit index advanced to " 
<< *updated_commit_index;
> add the "before" ci
Done


Line 676: (peer->last_known_committed_idx < 
queue_state_.committed_index);
> nit: rename the suffix of this var to _index for consistency
Done


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

Line 139:   // of the commit 

[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-08-31 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue-test.cc
File src/kudu/consensus/consensus_queue-test.cc:

Line 202:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(2));
use MinimumOpId.index() of the equivalent constant


Line 242:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(2));
same


Line 305:   queue_->SetLeaderMode(0, MinimumOpId().term(), 
BuildRaftConfigPBForTests(3));
same


Line 311:   ASSERT_EQ(queue_->GetMajorityReplicatedIndexForTests(), 0);
same here and below


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 112:   queue_state_.current_term = 0;
use constants here and elsewhere?


Line 499:   boost::optional updated_commit_index;
do we really need to introduce a boost dependency here? since this is just an 
int why not just have a NOT_UPDATED = -1 constant?


Line 636: // If we're the leader, we are in charge of computing the new
I think it would make sense to document somewhere that calculating the commit 
index here is safe, independently of whether we're leader or not (we might not 
be by the time we reach here) since the commit index is a property of the 
shared state machine and not something that is "decided" by the leader.


Line 658:   if (queue_state_.majority_replicated_index >= 
queue_state_.first_index_in_term) {
maybe rename this var to "first_index_in_current_term"


Line 669: VLOG_WITH_PREFIX_UNLOCKED(2) << "Commit index advanced to " 
<< *updated_commit_index;
add the "before" ci


Line 676: (peer->last_known_committed_idx < 
queue_state_.committed_index);
nit: rename the suffix of this var to _index for consistency


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

Line 139:   // of the commit index from the follower code.
having some trouble parsing this sentence: "updating the queue of the commit 
index"


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/raft_consensus-test.cc
File src/kudu/consensus/raft_consensus-test.cc:

Line 388:   LOG(WARNING) << 
"";
??


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 1267:   // TODO: interrogate queue rather than state?
what would be the difference/advantage?


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

Line 171: 
nit: extra line


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/raft_consensus_state.cc
File src/kudu/consensus/raft_consensus_state.cc:

Line 575: LOG(FATAL) << "TODO: remove me once convinced that this 
doesnt happen in tests";
add an actual error message that people can parse also add prefix


Line 583: LOG_WITH_PREFIX_UNLOCKED(WARNING) << "given start C_O=" << 
committed_op
more info in the log statement or remove


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

Line 2543:   req.set_committed_index(0);
use constant


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-08-30 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 4:

(4 comments)

I am still working my way through this patch. There are several TODOs, not sure 
if you know the answers to those yet. But I will investigate when I pick up 
again on reviewing the logic changes in this patch.

http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 333:   // The id of the last committed operation in the configuration. 
This is the
s/id/index/


Line 334:   // id of the last operation the leader deemed committed from a 
consensus
s/id/index/


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_peers-test.cc
File src/kudu/consensus/consensus_peers-test.cc:

Line 142:   message_queue_->SetLeaderMode(0, 0, BuildRaftConfigPBForTests(3));
Maybe better to use kMinimumTerm and kMinimumOpIdIndex from opid_util.h here 
and elsewhere


http://gerrit.cloudera.org:8080/#/c/4133/4/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

Line 415:   // TODO: doc me
todo


-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-08-29 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4133

to look at the new patch set (#4).

Change subject: Cleanup/refactor tracking of consensus watermarks
..

Cleanup/refactor tracking of consensus watermarks

This is a fairly invasive cleanup/refactor to consensus in preparation
for propagating replication watermarks from the leader to the followers.

The key item here is to address a long-standing TODO that the
PeerMessageQueue should itself calculate the commit index, rather than
delegate that job to ReplicaState. The flow used to be something like
this on the leader upon receiving a response from a peer that it has
replicated some new operations:

 - PeerMessageQueue::ResponseFromPeer
   - updates peer last_received
   - calls PeerMessageQueue::AdvanceQueueWatermark
 - updates PeerMessageQueue::majority_replicated to the new majority
   watermark
   - calls NotifyObserversOfMajorityReplOp
   --> submits an async task to the raft threadpool
- for each observer (best I can tell there is currently always
  only one, but that's a separate issue)
-- observer->UpdateMajorityReplicated() to grab committed index
   (the observer is always the RaftConsensus instance)
--- RaftConsensus::UpdateMajorityReplicated
 ReplicaState::UpdateMajorityReplicatedUnlocked
- this checks the condition that the majority-replicated
  watermark is within the leader's term before advancing the
  commit index.
- calls AdvanceCommittedIndexUnlocked
--- commits stuff, etc
- sets *committed_index out-param
-- PeerMessageQueue picks up this out-param and sets the new
   value within the PeerMessageQueue

This was very difficult to follow, given that the "observer" in fact was
passing data back to the "notifier" through an out-parameter. Moreover,
it wasn't obvious to see which class was "authoritative" for the
tracking and advancement of watermarks.

The new design makes the PeerMessageQueue itself fully responsible for
calculating when the commit index can advance. This required a fairly
invasive surgery because the PeerMessageQueue itself doesn't remember
the terms of pending operations, and thus the watermarks became indexes
instead of OpIds.

This is itself also a simplification -- we previously were very messy
about using the word "index" when in fact we had an OpId type. In almost
every case, we only used the index of those OpIds. In the Raft paper
itself, these watermarks are just indexes and not OpIds, so this change
brings us closer to the implementation described in the original design.

Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
---
M docs/release_notes.adoc
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/integration-tests/raft_consensus-itest.cc
15 files changed, 400 insertions(+), 569 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Cleanup/refactor tracking of consensus watermarks

2016-08-29 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Cleanup/refactor tracking of consensus watermarks
..


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/3145/

-- 
To view, visit http://gerrit.cloudera.org:8080/4133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No