Todd Lipcon has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting

Patch Set 6:

File src/kudu/consensus/

PS6, Line 1485: && state_ == kRunning
why is this condition necesary? if we've just been tombstoned it still makes 
sense that we should withhold votes for a period of time, right?

PS6, Line 1509: CheckSafeToVoteUnlocked
the duplicate logic here with CheckSafeToVoteUnlocked is a little unnerving. 
What do you think about essentially inlining its logic here and merging with 
this if condition? eg something of this nature:

switch (state_) {
  case kShutdown:
    return IllegalState...
  case kRunning:
    local_last_logged = GetLatestFromLog()
    if (FLAGS_allow_tombstone_voting && tombstone_last_logged_opid) {
      local_last_logged = tombstone_last_logged;
    } else {
      return IllegalState

I think that might be easier to follow and net less lines of code

PS6, Line 1514: DCHECK(tombstone_last_logged_opid) << LogPrefixUnlocked() << 
"expected last-logged opid";
              :     local_last_logged_opid = *tombstone_last_logged_opid;
if this DCHECK fails, would we crash anyway trying to dereference the 
boost::none? if so, I think we should either use CHECK (so we get a nice crash 
log) or we should try to handle this case by just denying the vote (if we 
aren't 100% sure that this would always be the case, we can always vote 
File src/kudu/consensus/raft_consensus.h:

Line 130:   RaftConsensus(ConsensusOptions options,
can the ctor be made private? (may need ALLOW_MAKE_SHARED)

I don't see any external usage of it but maybe my grepping was bad.

PS6, Line 271: term
> nit: current_term() would be more explicit
I think CurrentTerm or Term would be better since this takes lock_ and that 
lock is often held for long periods of time (thus the call may be quite slow 
compared to your typical getter)

PS6, Line 292: Tombstoned voting
> not super clear what "tombstoned voting" is maybe explain it somewhere?
agreed. Maybe better here to just say that it synchronously transitions to 
kStopped state, and refer to the kStopped definition in the state enum for what 
behavior that entails?

Line 297:   void Shutdown();

Line 339:   enum State {
it would be nice to have a state transition diagram of sorts here. Or 
alternatively, for each state, list out the potential subsequent states and 
what triggers the transition? eg I can't remember now whether we can transition 
back from kStopped to kRunning or not.

PS6, Line 340:     // RaftConsensus has been freshly constructed.
             :     kNew,
is this state ever externally exposed, considering we now have the factory 
method that returns a post-init object? if not, worth noting as much

PS6, Line 343:     // RaftConsensus has been initialized.
             :     kInitialized,
in terms of behavior, how is this state different than Stopped? eg let's say I 
start up a server which has some tablets in tombstoned state, do their 
RaftConsensus instances end up in kInitialized state or kStopped state? Is it 
important to make a distinction?

PS6, Line 391:   // Initializes the RaftConsensus object. This should be called 
             :   // publishing this object to any thread other than the thread 
that invoked
             :   // the constructor.
             :   Status Init();
since the constructor is only invoked by the factory method, maybe not 
necessary to be so explicit?

PS6, Line 647:   // existence of a known last logged opid.
             :   Status CheckSafeToVoteUnlocked(const boost::optional<OpId>& 
similar to David's comment on the implementation: I think it would make more 
sense to pass a 'bool tombstoned_op_id_known' or somesuch, and in the 
implementation add a comment explaining the scenario(s) in which we would _not_ 
know the tombstoned op id and why it's not safe to vote in those scenarios.

Use of a bool instead of the opid makes it more clear that we just care about 
whether we know a valid opid, and aren't deciding here based on its value.

Line 783:   AtomicBool stopped_;
it seems it was already like this, but what's the distinction between this 
'stopped_' member and just checking 'state_ == kStopped'? mind adding a comment 
explaining and perhaps a TODO to get rid of it if possible? (not going to make 
you get rid of it in this patch since it's pre-existing and I dont think you 
made it worse)
File src/kudu/integration-tests/

PS6, Line 80: call Stop() on
you mean 'manually tombstone' right? I was confused by reading this comment 
into thinking you were SIGSTOPping TS1 and then expecting it to vote or 
something, which didn't make much sense.
File src/kudu/integration-tests/

Line 60:  protected:
this seems unused?

Line 182:   // We want to control leader election manually and we only want 2 
would be nice to have another more stressful test that doesn't attempt to 
verify any kind of correctness or coordinate the lifecycle changes with the 
vote requests. I think the coordination here has some effect that reduces the 
probability of races
File src/kudu/tablet/

PS6, Line 142:   consensus_ = std::move(consensus);
             :   set_state(INITIALIZED);
i'm surprised these lines don't need to take lock_? is this method always 
called before publishing a TabletReplica instance?

Line 315:     case STOPPED:
hrm, no case for STOPPING?
File src/kudu/tablet/tablet_replica.h:

PS6, Line 81:   // Initializes RaftConsensus.
            :   Status Init(ThreadPool* raft_pool);
left a similar question in the implementation: is it expected that this be 
called before publishing an instance? might also be nice to add a note that, 
even if this fails, it leaves the instance in a self-consistent state (which is 
in contrast to some other Init methods) so that it can be left registered in 
the replica map, etc?

PS6, Line 96: tombstoned voting
I think it's better to generalize this a bit. Is this _only_ used in the case 
of tombstoning? or can all stopped replicas vote? is it legal to call this 
without first putting the tablet into tombstoned state?

Line 102:   // Shut down this tablet replica permanently.
does this require that Stop be called first? again would be nice to have some 
diagram of the state transitions to refer to for these lifecycle methods, 
perhaps in the class doc
File src/kudu/tserver/

Line 947:   // from the TabletMetadata into RaftConsensus::RequestVote().
> Yeah, it's racy, but it's safe because we're checking after we have a refer
I guess my concern is less with the safety of the race but more with whether 
the state checks are even proper, given they are race.

I did some hacking on the stress test locally to inject some sleeps after 
grabbing the state and managed to catch the case where the tablet transitioned 
from READY to TOMBSTONED. This resulted in a call into 
RaftConsensus::RequestVote() with no last_logged_opid, so it returned:

> Illegal state: RaftConsensus is not running: State = Stopped

given we already need to handle this racy check within the RaftConsensus vote 
code itself, I'm not sure what extra value this is bringing to check it 
additionally here?

BTW, it would be really nice to include some class-level lifecycle comments 
somewhere. I keep forgetting what happens to the TabletReplica, Tablet, 
TabletMetadata, and RaftConsensus instances in the case of a tombstone and a 
re-copy -- which ones are reused, how we ensure that the old ones aren't used 
anymore after the new ones are created, etc. I remember drawing such a diagram 
with you a couple weeks ago, and would be nice to have it in the code for 

PS4, Line 966:   LOG(INFO) << "Received RequestConsensusVote() RPC: " << Secu
> > when would this equal MinimumOpId? if it's never been tombstoned?
> Additionally, just using max() seems wrong in case there was a log truncation 
> involved. We want to use our latest opid state.

If there is truncation then the truncating OpId would be 'higher' than the 
other one, because it has a higher term, right? i.e comparison is lexicographic 
comparison of <term, index> tuples
File src/kudu/tserver/

PS6, Line 979: replica->tablet_metadata()->tombstone_last_logged_opid
maybe cleaner for it to just return an optional<OpId> itself rather than the 
special uninitialized value?
File src/kudu/tserver/

Line 663:   Status s = DeleteTabletData(replica->tablet_metadata(), 
cmeta_manager_, delete_type,
per discussion on Slack, we may need to modify this so that, if there was no 
valid log prior to the deletion, we explicitly _clear_ the last_logged_opid, or 
set to some sentinel value or something to indicate that it's not safe to vote 
due to potentially lost log state

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <>
Gerrit-Reviewer: Alexey Serbin <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to