Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet ......................................................................
Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 218: tablet_id_ > But why not get the leader info from the consensus info too? Why go to two One reason I was hesitant to use the same routine for leadership info was because GetConsensus collects the details from all the tservers, and I wasn't sure whether one of them could end up returning a stale info depending on when the RPC hits them. In the code below at L234, probability of that RPC hitting when the leader failover is in transition is very likely. If anything, this was mainly due to not knowing the codebase well and staying on safer side(go to master for leader info). PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Now I see what you mean; the number of attempts in AssertEventually is neve Well we don't want to include the <do leader step down> under AssertEventually, because it's perfectly possible that even after specified timeout, we never elected a leader which is different than current_leader. ASSERT_NE(new_leader, current_leader) could yield us incorrect results. I will still debug what I observed is just conincidence or there is more to it than just meet the eye. However, the ++attempts kinda ensures that eventually new_leader emerges on the other side. http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS3, Line 221: // Grab the default settings for heartbeat timeouts. : string hb_timeout_str; : ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms", : &hb_timeout_str)); : int32 hb_timeout; : ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout)); : string max_missed_hb_periods_str; : ASSERT_TRUE( : google::GetCommandLineOption( : "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); : double max_missed_hb_periods; : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str, : &max_missed_hb_periods)); > You don't need to go through all this trouble (and you certainly don't need Good point, forgot about DECLARE, done. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes