Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet ......................................................................
Patch Set 3: (9 comments) TFTR Adar, pls see responses inline, also pls see why my jenkins are still failing. I see that they are still using ...llvm-3.8.0.src/ as per the logs. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 204: string c > The CHECK_* macros come from glog, and they all do the same thing regardles Thank you for that awesome clarity on when to use them. This has gone in my favorite notes :) Line 218: tablet_id_ > Unfortunately a test environment with leader failure detection isn't determ sure, removed the comparison check, relying on master to report leader info now. I have kept the getconsensus intact to grab the new term info. PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Couple things: Sorry for not being clear before. Lemme try to explain although I haven't root caused the issue myself yet. When I used AssertEventually the GetConsensus comes out around 1.5 secs or so with a new_term. But what is not clear to me yet is, if I sleep for ~1.5 secs to allow the consensus to settle down, I see eventually a new_leader emerging which won't happen if I keep using GetConsensus via AssertEventually(despite 20 retries). Logs are confimring that GetConsensus indeed is detecting the new leader correctly, it's just that new_leader happens to be the current_leader everytime. I want to see why I see same leader gets elected again and again unless I allow some more time window before calling GetConsensus in which case I see new_leader coming up after few retries. With the fix of '++attempts' however, I eventually see another leader coming up so thats kinda hiding the problem. I think that ++attempts fix should have been there to keep the sleep time monotonically increasing. PS1, Line 241: ASSERT_OK(GetTermFromConsensus(tservers, tablet_id_, : &new_ter > Again, the leader can change whenever, so this doesn't seem safe. Done Line 264: > The actual check (not in the CHECK sense of the word, but the comparison an yeah, missed out removing here i think. done. http://gerrit.cloudera.org:8080/#/c/4533/2/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS2, Line 251: AGS_num_ > Still got some leftover CHECK_* statements that should be converted to thei Done, thanks for explaining the distinctions earlier. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 274: : > You missed these comments down here. Ugh... sorry, for some reason I totally missed out scrolling down thinking that only change I had introduced was in the routine above :) PS1, Line 278: : > I admit, this one was my bad, but can you consolidate all of these descript Done PS1, Line 280: > Likewise for this, it's copied extensively. Done -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Dinesh Bhat <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
