[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has submitted this change and it was merged. Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Reviewed-on: http://gerrit.cloudera.org:8080/4533 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_remote_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc M src/kudu/tools/tool_action_test.cc 10 files changed, 190 insertions(+), 66 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/4533/12/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS12, Line 245: to step down > What if that '(if present)' is dropped? If it do not provide essential inf It kinda tells this command is effective only with leader present. However initial inclination was not to compromise on the actual help description to satisfy a test case. But command returns gracefully with a string if leader not present, so I guess we can drop it.. Updated. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#13). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_remote_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc M src/kudu/tools/tool_action_test.cc 10 files changed, 190 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/13 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 12: (1 comment) > > > Patch Set 11: Patch Set 10 was rebased > > > > Could you also add coverage for the new sub-command into > > ToolTest.TestModeHelp in kudu-tool-test.cc? > > Good idea, added. Please note the "*leader*step*" since I had to > workaround a small issue in how we validate the usage output when > lines are split by '\n' , because it doesn't go well with all the > output. For eg, > $ ./bin/kudu tablet > Usage: ./bin/kudu tablet [] > change_config Change a tablet's Raft configuration > leader_step_down Force the tablet's leader replica (if present) > to step > down > We really want to split the output such that one sub-command > becomes one vector element, instead of splitting them by '\n'. I > still don't know how to fix this, but will try to tackle this > problem in a different change. For now, we could just add another > regex with just "*down", but that doesn't look good, hence doing > away with 'down' in the kTabletModeRegexes at the moment. As an easier workaround, I would consider updating the help string for the new command. I added corresponding comment regarding that. Also, consider addressing Tidy Bot's warning about unused 'using' directive. Otherwise, looks good. http://gerrit.cloudera.org:8080/#/c/4533/12/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS12, Line 245: (if present) What if that '(if present)' is dropped? If it do not provide essential information you want to emphasize here, it would help with the test code command help coverage and would look better in 80-line terminal since no wrapping of lines. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 12: > > Patch Set 11: Patch Set 10 was rebased > > Could you also add coverage for the new sub-command into > ToolTest.TestModeHelp in kudu-tool-test.cc? Good idea, added. Please note the "*leader*step*" since I had to workaround a small issue in how we validate the usage output when lines are split by '\n' , because it doesn't go well with all the output. For eg, $ ./bin/kudu tablet Usage: ./bin/kudu tablet [] change_config Change a tablet's Raft configuration leader_step_down Force the tablet's leader replica (if present) to step down We really want to split the output such that one sub-command becomes one vector element, instead of splitting them by '\n'. I still don't know how to fix this, but will try to tackle this problem in a different change. For now, we could just add another regex with just "*down", but that doesn't look good, hence doing away with 'down' in the kTabletModeRegexes at the moment. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#12). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_remote_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc M src/kudu/tools/tool_action_test.cc 10 files changed, 191 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/12 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 11: > Patch Set 11: Patch Set 10 was rebased Could you also add coverage for the new sub-command into ToolTest.TestModeHelp in kudu-tool-test.cc? -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#10). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_remote_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc M src/kudu/tools/tool_action_test.cc 9 files changed, 189 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/10 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 9: (5 comments) > (5 comments) > > There is one extra instance of "master_addresses" string literal > and corresponding description in tool_action_test.cc. Could you > updated that to use the common constant as well? TFTR, that instance was missing since I hadn't rebased on top of your latest tool change I think, rebased and updated now. http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: PS9, Line 102: hostname:port > Nit: do you want to add that the 'port' part could be omitted if the master I guess so, although I would like to punt this into another patch after making sure all the RPCs used by the toolset comply to that theory. PS9, Line 104: I > Nit: Is capitalization of 'i' intentional? Sort of. I think we could be bit more consistent here; this was more in the spirit of 'Tablet Server, Kudu Master'.. and so on although comparison isn't really apple-to-apple. We have also used 'block identifier' to counter my argument here. http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: PS9, Line 81: extern > Does 'extern' have to be present due to some compilation issues? I would e Fixed the ordering. About 'extern', I remember looking up cpp guideline for this, since I couldn't find anything, I followed the existing code's policy. Besides, explicit keyword provides a really strong readability. http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS9, Line 175: string tablet_id > nit: const string& tablet_id ? Done Line 246: "to step down") > nit: parameter/line continuation shift is 4 spaces. I am not clear about this. Guideline seems to say : "start the arguments on a new line indented by four spaces and continue at that 4 space indent". This isn't necessarily starting the argument on a newline. I remember following what other files have followed here, for eg take a look at tool_action_*.cc. If needed correction, I can fix them all in this change or a different change. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 9: (5 comments) There is one extra instance of "master_addresses" string literal and corresponding description in tool_action_test.cc. Could you updated that to use the common constant as well? http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: PS9, Line 102: hostname:port Nit: do you want to add that the 'port' part could be omitted if the master listens at the default port? PS9, Line 104: I Nit: Is capitalization of 'i' intentional? http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: PS9, Line 81: extern Does 'extern' have to be present due to some compilation issues? I would expect it to compile without 'extern'. Also, consider moving those up to precede the first function in the kudu::tools namespace. This is to be more conformant with the declaration order specified by the style guide: https://google.github.io/styleguide/cppguide.html#Declaration_Order http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS9, Line 175: string tablet_id nit: const string& tablet_id ? Line 246: "to step down") nit: parameter/line continuation shift is 4 spaces. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 9: (6 comments) http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS6, Line 222: ctor tservers; > Note that BuildAndStart() wait for all tablets to reach the state of RUNNIN Hmmm, AFAICT the only way we could whitelist that string is by comparing with stderr output. Let me know if I misunderstood your comment. Done that change. http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 227: tablet_id_, > Nit: since the variable 's' is not used anywhere else down the code, consid Done http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: PS8, Line 121: > Is capitalization of 'i' intentional? Yeah, wanted to be consistent with rest of the places. http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: PS8, Line 53: const char* const kTableNameArg = "table_ > It seems this is used across different set of tools. Is it possible to def Done http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS8, Line 67: > I saw this string is defined in different file as well. Does it make sense Done PS8, Line 177: client > const string& here and further when using FindOrDie() ? Done -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#9). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_remote_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 8 files changed, 186 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/9 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Alexey Serbin has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 227: Status s = GetTermFromConsensus(tservers, tablet_id_, Nit: since the variable 's' is not used anywhere else down the code, consider ASSERT_TRUE(GetTermFromConsensus(...).IsNotFound()); http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: PS8, Line 121: I Is capitalization of 'i' intentional? http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: PS8, Line 53: const char* const kMasterAddressesArgDesc It seems this is used across different set of tools. Is it possible to define it only once, not repeating it over and over again across all those files which require this parameter? http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS8, Line 67: I I saw this string is defined in different file as well. Does it make sense to have it defined only in a single place? E.g., putting it into kudu::tools::util namespace or alike. PS8, Line 177: string const string& here and further when using FindOrDie() ? The idea is that FindOrDie() returns const reference, and the result is not modified in the code below. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS6, Line 222: while (current_leader == new_leader && --num_retries > 0); > Bunch of things I narrowed down here this morning: Note that BuildAndStart() wait for all tablets to reach the state of RUNNING in the master. For that to happen, each tablet has to have elected a leader. So just before the call to leader_step_down we can guarantee that at least one leader election happened, though we can't guarantee that there's a leader _right now_. I think your pseudocode is a little dangerous because if for some reason there's another hidden bug that causes leader_step_down to fail, we won't see it. What if you whitelisted the "Illegal state: not currently a leader" case and allowed the ::Call() result to be either that, or Status::OK()? Wouldn't that give you the best of both worlds? -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#8). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 5 files changed, 168 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/8 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS6, Line 219: --enable_leader_failure_detection=false > Why do you need to override the default timeout value of 30s? timeout_ms is Yeah, I am reverting to default timeout after bunch of tests I executed on dist_test this morning. Initial thought was to allow just enough room for consensus to change, but turned out to be a bad idea. PS6, Line 222: pendValuesFromMap(tablet_servers_, &tservers); > There's no difference in the outcome of the test if we retry 10 times witho Bunch of things I narrowed down here this morning: - The same leader getting elected was just an observation I made on local testbed, when moved to dist_test I was able to see different leader at times(and same leader as well bunch of times). - Also spoke to David on few things, it turns out a leader may never get elected even after terms are advanced multiple times which makes this retry loop even more flaky-prone. This is consistent with my test on dist_test too where I observed couple of times an election resulting into no leader emerging on other side and subsequently failed with an error code from the actual kudu command L208 "Illegal state: not currently a leader". - I also kept just one routine grabbing just the consensus term instead of going to master for leader_uuid, as it turns out master could have a stale info as well. This is inline with what you suggested in one of your earlier comments. Given your above suggestions and my test results, I am inclined to keep the code as following now: if (SubProcess::Call('kudu leader_step_down')) { AssertEventually([&]() { GetINfoFromConsensus(leader_uuid, new_term) ASSERT_GT(new_term, current_term); } } - The reason I am hesitant to ASSERT on leader_step_down command is because as we know leader might have changed in the background not resulting in a new leader, and also I don't happen to be waiting for a leader election after cluster is setup, in which case it's perfectly possible that the command will fail. We can expect the term to have advanced in 30 secs however if the command succeeds, hence safe to assert on that. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#7). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Test: Ran 8000 iterations of leader_step_down test on dist_test. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 5 files changed, 172 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/7 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS6, Line 219: MonoDelta::FromMilliseconds(timeout_ms) Why do you need to override the default timeout value of 30s? timeout_ms is almost assuredly going to be less. PS6, Line 222: while (current_leader == new_leader && --num_retries > 0); There's no difference in the outcome of the test if we retry 10 times without a change in leader, or if the leader does change. Either way, the test will pass. Put another way, if there was a bug in leader_step_down that prevented the leader from actually changing, this test would not catch that. As such, I don't see the point of the retry loop. Just try the outer loop logic once, looping until the term changes, and then end the test. The alternative would be to run the outer loop extensively and enforce that the leader really has changed (either by asserting that new_leader!=current_leader, or by asserting that there's no leader at all), but it didn't sound like you were comfortable doing that. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#6). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 5 files changed, 187 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/6 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#5). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 5 files changed, 187 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/5 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dan Burkert has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 4: Please rebase this patch before pushing another revision. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#4). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 5 files changed, 186 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/4 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 218: tablet_id_ > But why not get the leader info from the consensus info too? Why go to two One reason I was hesitant to use the same routine for leadership info was because GetConsensus collects the details from all the tservers, and I wasn't sure whether one of them could end up returning a stale info depending on when the RPC hits them. In the code below at L234, probability of that RPC hitting when the leader failover is in transition is very likely. If anything, this was mainly due to not knowing the codebase well and staying on safer side(go to master for leader info). PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Now I see what you mean; the number of attempts in AssertEventually is neve Well we don't want to include the under AssertEventually, because it's perfectly possible that even after specified timeout, we never elected a leader which is different than current_leader. ASSERT_NE(new_leader, current_leader) could yield us incorrect results. I will still debug what I observed is just conincidence or there is more to it than just meet the eye. However, the ++attempts kinda ensures that eventually new_leader emerges on the other side. http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS3, Line 221: // Grab the default settings for heartbeat timeouts. : string hb_timeout_str; : ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms", : &hb_timeout_str)); : int32 hb_timeout; : ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout)); : string max_missed_hb_periods_str; : ASSERT_TRUE( : google::GetCommandLineOption( : "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); : double max_missed_hb_periods; : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str, : &max_missed_hb_periods)); > You don't need to go through all this trouble (and you certainly don't need Good point, forgot about DECLARE, done. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: Line 218: tablet_id_ > sure, removed the comparison check, relying on master to report leader info But why not get the leader info from the consensus info too? Why go to two different places to get the combined set of information? That just opens the door for discrepancies between the two. PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Sorry for not being clear before. Lemme try to explain although I haven't r Now I see what you mean; the number of attempts in AssertEventually is never actually incremented. That's a pretty serious problem; can you fix that, and rebase this patch on top of the fix? No unit test needed. As for the correct use of AssertEventually, let's get to the bottom of it as it should work. I can imagine something like this working: string current_leader = ... int64_t current_term = ... AssertEventually([&]() { GetInfoFromConsensus(&new_leader, &new_term) ASSERT_NE(new_leader, current_leader) ASSERT_GT(new_term, current_term) }); It's possible that the broken backoff behavior in AssertEventually is responsible, which is why I suggested you fix the bug and rebase this patch on top of it. If not, use gdb or temporary logging to figure out what's going on. http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS3, Line 221: // Grab the default settings for heartbeat timeouts. : string hb_timeout_str; : ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms", : &hb_timeout_str)); : int32 hb_timeout; : ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout)); : string max_missed_hb_periods_str; : ASSERT_TRUE( : google::GetCommandLineOption( : "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); : double max_missed_hb_periods; : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str, : &max_missed_hb_periods)); You don't need to go through all this trouble (and you certainly don't need to do it in a loop; it won't change during the loop). Just access FLAGS_raft_heartbeat_interval_ms and FLAGS_leader_failure_max_missed_heartbeat_periods directly, and DECLARE_* them at the top of the file. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 3: (9 comments) TFTR Adar, pls see responses inline, also pls see why my jenkins are still failing. I see that they are still using ...llvm-3.8.0.src/ as per the logs. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 204: string c > The CHECK_* macros come from glog, and they all do the same thing regardles Thank you for that awesome clarity on when to use them. This has gone in my favorite notes :) Line 218: tablet_id_ > Unfortunately a test environment with leader failure detection isn't determ sure, removed the comparison check, relying on master to report leader info now. I have kept the getconsensus intact to grab the new term info. PS1, Line 230: "leader_failure_max_missed_heartbeat_periods", : &max_missed_hb_periods_str)); > Couple things: Sorry for not being clear before. Lemme try to explain although I haven't root caused the issue myself yet. When I used AssertEventually the GetConsensus comes out around 1.5 secs or so with a new_term. But what is not clear to me yet is, if I sleep for ~1.5 secs to allow the consensus to settle down, I see eventually a new_leader emerging which won't happen if I keep using GetConsensus via AssertEventually(despite 20 retries). Logs are confimring that GetConsensus indeed is detecting the new leader correctly, it's just that new_leader happens to be the current_leader everytime. I want to see why I see same leader gets elected again and again unless I allow some more time window before calling GetConsensus in which case I see new_leader coming up after few retries. With the fix of '++attempts' however, I eventually see another leader coming up so thats kinda hiding the problem. I think that ++attempts fix should have been there to keep the sleep time monotonically increasing. PS1, Line 241: ASSERT_OK(GetTermFromConsensus(tservers, tablet_id_, :&new_ter > Again, the leader can change whenever, so this doesn't seem safe. Done Line 264: > The actual check (not in the CHECK sense of the word, but the comparison an yeah, missed out removing here i think. done. http://gerrit.cloudera.org:8080/#/c/4533/2/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS2, Line 251: AGS_num_ > Still got some leftover CHECK_* statements that should be converted to thei Done, thanks for explaining the distinctions earlier. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 274: : > You missed these comments down here. Ugh... sorry, for some reason I totally missed out scrolling down thinking that only change I had introduced was in the routine above :) PS1, Line 278: : > I admit, this one was my bad, but can you consolidate all of these descript Done PS1, Line 280: > Likewise for this, it's copied extensively. Done -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#3). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_cluster.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc M src/kudu/util/test_util.cc 6 files changed, 198 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/3 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 204: } > Sure, could you tell me what's the relationship between this being a main t The CHECK_* macros come from glog, and they all do the same thing regardless of the context in which they're called: 1. Evaluate an expression. 2. If evaluation fails, log something. 3. Dump a stack trace. 4. Call abort() to exit the process. The DCHECK_* macros (also from glog) are debug-only variants that are otherwise equivalent, but compiled out in non-debug builds. The ASSERT_* macros come from gtest, so they're specific to how gtest works. They too evaluate an expression, but on failure, they do some gtest stuff to capture the failure, end the run of the current test, and move to the next test. They have some foibles, and one of them is that ASSERT failures in the thread that isn't the main process thread don't seem to get noticed by gtest. Maybe there's something we need to do for that to happen; if there is, I don't know what it is. Oh, and EXPECT_* is like ASSERT_* except that the test keeps running. So ASSERT_* will only ever reveal the "first" failure, while EXPECT_* will reveal all of them. So to summarize: - Use CHECK_*/DCHECK_* macros in production code. Use DCHECK_* if you expect to get sufficient test coverage of the callsite, CHECK_* otherwise. - Use ASSERT_*/EXPECT_* macros in test-only code, unless that code is expected to run on a separate test thread, in which case use CHECK_* too. Default to EXPECT_*, unless the operation in question must succeed for the rest of the test to function properly, in which case use ASSERT_*. Hope this helps. Line 218: ASSERT_OK(Subprocess::Call({ > In reality yes, but given that we are operating in a deterministic test env Unfortunately a test environment with leader failure detection isn't deterministic. The test can run on a machine with arbitrary load (perhaps from other tests), in which case one of the tserver processes can pause for arbitrary periods of time, causing a new leader to be elected. This is one of the leading causes of (low-grade) test flakiness for us. PS1, Line 230: CHECK_OK(GetInfoFromConsensus(tservers, tablet_id_, : &new_leader, &new_term)); > I tried this with an unsuccessful result, later realized AssertEventually() Couple things: 1. I don't understand why AssertEventually() doesn't work here. AFAICT the goal of this snippet is "tell the leader to step down, wait for an election to happen, then assert that the new term is higher than the one before". If you wrap L229-L233 in AssertEventually(), doesn't that mean you don't need the 3 second sleep (which could be flaky to begin with, since a cluster running with ASAN/TSAN could take a lot longer to run an election)? I don't see what the issue is; can you help me understand? Look at some of the existing usages of AssertEventually() if it's not clear. 2. I'm not seeing the bug in the code you pasted. What is it? Line 264: s = GetInfoFromConsensus(tservers, tablet_id_, > Done The actual check (not in the CHECK sense of the word, but the comparison and assertion) is still here, though. http://gerrit.cloudera.org:8080/#/c/4533/2/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS2, Line 251: CHECK_OK Still got some leftover CHECK_* statements that should be converted to their ASSERT_* equivalents. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 194: > Done, although not sure if we need a toggle given that the routine anyways On second thought, the ResolveAddresses output isn't even used, is it? So let's drop it altogether. Looks like address resolution happens within BuildProxy() so that extra call was redundant from day one (in ConfigChange() as well). PS1, Line 274: : > Nit: reformat as "Force the tablet's leader replica (if present) to step do You missed these comments down here. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Dinesh Bhat has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/4533/1//COMMIT_MSG Commit Message: PS1, Line 12: > Nit: got an extra space here. Done PS1, Line 18: tablet > You mean 'table' here, right? yeah, thanks for catching. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 178: Status s = itest::GetConsensusState(ts, : tablet_id, : consensus::CONSENSUS_CONFIG_COMMITTED, : MonoDelta::FromSeconds(10), &cstate); : if (!s.ok()) { : return s; : } > Just use RETURN_NOT_OK(...). Done Line 198: int num_retries = 0; > Why is this defined way up here instead of closer to where it's used? Done, was thinking of keeping the *retries* variables together initially, but I agree about moving it down just before the use. Also just made it one variable now. PS1, Line 204: CHECK_EQ > Use ASSERT_* instead of CHECK_* since all of these are on the main test thr Sure, could you tell me what's the relationship between this being a main thread and these macros here ? I remember asking this to you earlier in one of earlier rev comments but forgot to follow up in-person. I am just curious if there are debug-only versions of these macros ? Line 218: CHECK_EQ(master_reported_leader, current_leader); > The leader can change between the GetInfoFromConsensus calls and L213, righ In reality yes, but given that we are operating in a deterministic test environment where the server failures are manually triggered by code, I thought this check is safe to have. However, I am not entirely sure what this check is buying us, I just happen to notice that there are more than one way we could find the leader replica, and I thought it would be good to compare them when we know that they don't change on the fly during the test. I can remove this if you think this might be a room for any flakiness. PS1, Line 230: // Wait for re-election to happen again. : // 3 seconds picked to accommodate consensus heartbeat timeout(1.5s) > Use AssertEventually() to make this more robust, and to minimize the amount I tried this with an unsuccessful result, later realized AssertEventually() kinda doesn't seem to fit this scenario here; Reason being: a) We really want to check the consensus state after the heartbeat timeout as opposed to executing f() and then sleeping; AssertEventually() aggressively(every ms) keeps executing the f() until the timeout hits. b) I won't be able to use assert from inside the lambda argument when we know that Consensus state is still under flux. Side note: I found one minor bug in AssertEventually() here which I can fix in another patch: int attempts = 0; int sleep_ms = (attempts < 10) ? (1 << attempts) : 1000; SleepFor(sleep_ms); Perhaps we can devise another routine which sleeps for N secs and then executes the function and returns the status ? Line 264: CHECK_EQ(master_reported_leader, ""); > Let's not check this. It's a common calling convention that if a function f Done PS1, Line 271: CHECK_EQ(current_leader, ""); : CHECK_EQ(current_term, 0); > Likewise here, which means you needn't initialize current_term. Done http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 194: Status s = GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp); > Consider consolidating from here to ResolveAddresses in a helper function, Done, although not sure if we need a toggle given that the routine anyways returns the Status and caller can decide if NotFound is FATAL or not ? PS1, Line 198: else if (!s.ok()) { : return s; : } > To pile on with Tidy Bot, just do RETURN_NOT_OK(s) after the not found chec Done PS1, Line 209: ServerStatusPB status; : RETURN_NOT_OK(GetServerStatus(leader_hp.host(), leader_hp.port(), &status)); > What's this for? I don't see status used anywhere. I kinda misunderstood the combination of RETURN_NOT_OK(GetServerStatus... thinking that this will return failure if leader is not up; My goal here was to check if leader is up & running so that we need not send the RPC if the server is already down. It looks like this doesn't buy us anything since RPC will fail anyways if the leader is down. Hence removed. PS1, Line 224: return Status::RemoteError(resp.ShortDebugString()); > Why not return StatusFromPB(resp.error().status()) like ConfigChange? Sounds good, just didn't know about this routine, thanks.
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4533 to look at the new patch set (#2). Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu table list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 3 files changed, 180 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/2 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Adar Dembo has posted comments on this change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. Patch Set 1: (17 comments) Please also update kudu-tool-test's help tests with the new action. http://gerrit.cloudera.org:8080/#/c/4533/1//COMMIT_MSG Commit Message: PS1, Line 12: Nit: got an extra space here. PS1, Line 18: tablet You mean 'table' here, right? http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: PS1, Line 178: Status s = itest::GetConsensusState(ts, : tablet_id, : consensus::CONSENSUS_CONFIG_COMMITTED, : MonoDelta::FromSeconds(10), &cstate); : if (!s.ok()) { : return s; : } Just use RETURN_NOT_OK(...). Line 198: int num_retries = 0; Why is this defined way up here instead of closer to where it's used? PS1, Line 204: CHECK_EQ Use ASSERT_* instead of CHECK_* since all of these are on the main test thread. Line 218: CHECK_EQ(master_reported_leader, current_leader); The leader can change between the GetInfoFromConsensus calls and L213, right? If so, isn't this unsafe? PS1, Line 230: // Wait for re-election to happen again. : // 3 seconds picked to accommodate consensus heartbeat timeout(1.5s) Use AssertEventually() to make this more robust, and to minimize the amount of sleeping actually needed. PS1, Line 241: CHECK_OK(GetTabletLeaderUUIDFromMaster(tablet_id_, &master_reported_leader)); : CHECK_EQ(master_reported_leader, new_leader); Again, the leader can change whenever, so this doesn't seem safe. Line 264: CHECK_EQ(master_reported_leader, ""); Let's not check this. It's a common calling convention that if a function fails, it doesn't set any OUT parameters, and we don't need to verify that ourselves. PS1, Line 271: CHECK_EQ(current_leader, ""); : CHECK_EQ(current_term, 0); Likewise here, which means you needn't initialize current_term. http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: PS1, Line 194: Status s = GetTabletLeader(client, tablet_id, &leader_uuid, &leader_hp); Consider consolidating from here to ResolveAddresses in a helper function, used from ChangeConfig too. It'd need a toggle to specify whether NotFound is fatal or not. PS1, Line 198: else if (!s.ok()) { : return s; : } To pile on with Tidy Bot, just do RETURN_NOT_OK(s) after the not found check. PS1, Line 209: ServerStatusPB status; : RETURN_NOT_OK(GetServerStatus(leader_hp.host(), leader_hp.port(), &status)); What's this for? I don't see status used anywhere. PS1, Line 224: return Status::RemoteError(resp.ShortDebugString()); Why not return StatusFromPB(resp.error().status()) like ConfigChange? PS1, Line 274: "Request the current leader of the " : "tablet(if present) to step down" Nit: reformat as "Force the tablet's leader replica (if present) to step down". PS1, Line 278: "Comma-separated list of Kudu Master addresses where each address is " : "of form 'hostname:port'" I admit, this one was my bad, but can you consolidate all of these descriptions in a global string constant like for kMasterAddressesArg? PS1, Line 280: "Tablet Identifier" Likewise for this, it's copied extensively. -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tools] Implement a manual leader step down for a tablet
Hello David Ribeiro Alves, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/4533 to review the following change. Change subject: [tools] Implement a manual leader_step_down for a tablet .. [tools] Implement a manual leader_step_down for a tablet This change introduces a leader_step_down functionality under 'kudu tablet'. This tool may be handy to recover from situations when a single tablet server is overloaded and we want to kick off a new election to balance the load across the clusters. Although it is not guaranteed that a different replica will be elected as the leader, this is an optimistic effort to elect a new tablet server as the leader for the given tablet in the cluster. Also snuck in a small change to display host:port details with 'kudu tablet list --list_tablets' command. Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 --- M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/tool_action_table.cc M src/kudu/tools/tool_action_tablet.cc 3 files changed, 185 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/1 -- To view, visit http://gerrit.cloudera.org:8080/4533 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Todd Lipcon