Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet

Patch Set 3:

File src/kudu/tools/

Line 218:       tablet_id_
> sure, removed the comparison check, relying on master to report leader info
But why not get the leader info from the consensus info too? Why go to two 
different places to get the combined set of information? That just opens the 
door for discrepancies between the two.

PS1, Line 230:             "leader_failure_max_missed_heartbeat_periods",
             :             &max_missed_hb_periods_str));
> Sorry for not being clear before. Lemme try to explain although I haven't r
Now I see what you mean; the number of attempts in AssertEventually is never 
actually incremented. That's a pretty serious problem; can you fix that, and 
rebase this patch on top of the fix? No unit test needed.

As for the correct use of AssertEventually, let's get to the bottom of it as it 
should work. I can imagine something like this working:

  string current_leader = ...
  int64_t current_term = ...
  AssertEventually([&]() {
    <do leader step down>
    GetInfoFromConsensus(&new_leader, &new_term)
    ASSERT_NE(new_leader, current_leader)
    ASSERT_GT(new_term, current_term)

It's possible that the broken backoff behavior in AssertEventually is 
responsible, which is why I suggested you fix the bug and rebase this patch on 
top of it. If not, use gdb or temporary logging to figure out what's going on.
File src/kudu/tools/

PS3, Line 221:     // Grab the default settings for heartbeat timeouts.
             :     string hb_timeout_str;
             :                                              &hb_timeout_str));
             :     int32 hb_timeout;
             :     ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout));
             :     string max_missed_hb_periods_str;
             :     ASSERT_TRUE(
             :         google::GetCommandLineOption(
             :             "leader_failure_max_missed_heartbeat_periods",
             :             &max_missed_hb_periods_str));
             :     double max_missed_hb_periods;
             :     ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str,
             :                             &max_missed_hb_periods));
You don't need to go through all this trouble (and you certainly don't need to 
do it in a loop; it won't change during the loop). Just access 
FLAGS_raft_heartbeat_interval_ms and 
FLAGS_leader_failure_max_missed_heartbeat_periods directly, and  DECLARE_* them 
at the top of the file.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Dinesh Bhat <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to