Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet

Patch Set 3:


TFTR Adar, pls see responses inline, also pls see why my jenkins are still 
failing. I see that they are still using ...llvm-3.8.0.src/ as per the logs.

File src/kudu/tools/kudu-admin-test.cc:

PS1, Line 204: string c
> The CHECK_* macros come from glog, and they all do the same thing regardles
Thank you for that awesome clarity on when to use them. This has gone in my 
favorite notes :)

Line 218:       tablet_id_
> Unfortunately a test environment with leader failure detection isn't determ
sure, removed the comparison check, relying on master to report leader info 
now. I have kept the getconsensus intact to grab the new term info.

PS1, Line 230:             "leader_failure_max_missed_heartbeat_periods",
             :             &max_missed_hb_periods_str));
> Couple things:
Sorry for not being clear before. Lemme try to explain although I haven't root 
caused the issue myself yet. When I used AssertEventually the GetConsensus 
comes out around 1.5 secs or so with a new_term. But what is not clear to me 
yet is, if I sleep for ~1.5 secs to allow the consensus to settle down, I see 
eventually a new_leader emerging which won't happen if I keep using 
GetConsensus via AssertEventually(despite 20 retries). Logs are confimring that 
GetConsensus indeed is detecting the new leader correctly, it's just that 
new_leader happens to be the current_leader everytime. 

I want to see why I see same leader gets elected again and again unless I allow 
some more time window before calling GetConsensus in which case I see 
new_leader coming up after few retries. With the fix of '++attempts' however, I 
eventually see another leader coming up so thats kinda hiding the problem. I 
think that ++attempts fix should have been there to keep the sleep time 
monotonically increasing.

PS1, Line 241:         ASSERT_OK(GetTermFromConsensus(tservers, tablet_id_,
             :                                        &new_ter
> Again, the leader can change whenever, so this doesn't seem safe.

Line 264: 
> The actual check (not in the CHECK sense of the word, but the comparison an
yeah, missed out removing here i think. done.

File src/kudu/tools/kudu-admin-test.cc:

PS2, Line 251: AGS_num_
> Still got some leftover CHECK_* statements that should be converted to thei
Done, thanks for explaining the distinctions earlier.

File src/kudu/tools/tool_action_tablet.cc:

PS1, Line 274: 
> You missed these comments down here.
Ugh... sorry, for some reason I totally missed out scrolling down thinking that 
only change I had introduced was in the routine above :)

PS1, Line 278: 
> I admit, this one was my bad, but can you consolidate all of these descript

PS1, Line 280: 
> Likewise for this, it's copied extensively.

To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to