Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet ......................................................................
Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 218: tablet_id_ > sure, removed the comparison check, relying on master to report leader info But why not get the leader info from the consensus info too? Why go to two different places to get the combined set of information? That just opens the door for discrepancies between the two. PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Sorry for not being clear before. Lemme try to explain although I haven't r Now I see what you mean; the number of attempts in AssertEventually is never actually incremented. That's a pretty serious problem; can you fix that, and rebase this patch on top of the fix? No unit test needed. As for the correct use of AssertEventually, let's get to the bottom of it as it should work. I can imagine something like this working: string current_leader = ... int64_t current_term = ... AssertEventually([&]() { <do leader step down> GetInfoFromConsensus(&new_leader, &new_term) ASSERT_NE(new_leader, current_leader) ASSERT_GT(new_term, current_term) }); It's possible that the broken backoff behavior in AssertEventually is responsible, which is why I suggested you fix the bug and rebase this patch on top of it. If not, use gdb or temporary logging to figure out what's going on. http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS3, Line 221: // Grab the default settings for heartbeat timeouts. : string hb_timeout_str; : ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms", : &hb_timeout_str)); : int32 hb_timeout; : ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout)); : string max_missed_hb_periods_str; : ASSERT_TRUE( : google::GetCommandLineOption( : "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); : double max_missed_hb_periods; : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str, : &max_missed_hb_periods)); You don't need to go through all this trouble (and you certainly don't need to do it in a loop; it won't change during the loop). Just access FLAGS_raft_heartbeat_interval_ms and FLAGS_leader_failure_max_missed_heartbeat_periods directly, and DECLARE_* them at the top of the file. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
