[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Reviewed-on: http://gerrit.cloudera.org:8080/4533
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_action_test.cc
10 files changed, 190 insertions(+), 66 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4533/12/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS12, Line 245: to step down
> What if that '(if present)' is dropped?  If it do not provide essential inf
It kinda tells this command is effective only with leader present.  However 
initial inclination was not to compromise on the actual help description to 
satisfy a test case. But command returns gracefully with a string if leader not 
present, so I guess we can drop it.. Updated.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Dinesh Bhat (Code Review)
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#13).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_action_test.cc
10 files changed, 190 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 12:

(1 comment)

> > > Patch Set 11: Patch Set 10 was rebased
 > >
 > > Could you also add coverage for the new sub-command into
 > > ToolTest.TestModeHelp in kudu-tool-test.cc?
 > 
 > Good idea, added. Please note the "*leader*step*" since I had to
 > workaround a small issue in how we validate the usage output when
 > lines are split by '\n' , because it doesn't go well with all the
 > output. For eg,
 > $ ./bin/kudu tablet
 > Usage: ./bin/kudu tablet  []
 > change_config Change a tablet's Raft configuration
 > leader_step_down   Force the tablet's leader replica (if present)
 > to step
 > down
 > We really want to split the output such that one sub-command
 > becomes one vector element, instead of splitting them by '\n'. I
 > still don't know how to fix this, but will try to tackle this
 > problem in a different change. For now, we could just add another
 > regex with just "*down", but that doesn't look good, hence doing
 > away with 'down' in the kTabletModeRegexes at the moment.

As an easier workaround, I would consider updating the help string for the new 
command.  I added corresponding comment regarding that.

Also, consider addressing Tidy Bot's warning about unused 'using' directive.

Otherwise, looks good.

http://gerrit.cloudera.org:8080/#/c/4533/12/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS12, Line 245: (if present)
What if that '(if present)' is dropped?  If it do not provide essential 
information you want to emphasize here, it would help with the test code 
command help coverage and would look better in 80-line terminal since no 
wrapping of lines.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Dinesh Bhat (Code Review)
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#12).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_action_test.cc
10 files changed, 191 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-07 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 11:

> Patch Set 11: Patch Set 10 was rebased

Could you also add coverage for the new sub-command into ToolTest.TestModeHelp 
in kudu-tool-test.cc?

-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-03 Thread Dinesh Bhat (Code Review)
Hello Adar Dembo, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#10).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tools/tool_action_test.cc
9 files changed, 189 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-03 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 9:

(5 comments)

> (5 comments)
 > 
 > There is one extra instance of "master_addresses" string literal
 > and corresponding description in tool_action_test.cc.  Could you
 > updated that to use the common constant as well?

TFTR, that instance was missing since I hadn't rebased on top of your latest 
tool change I think, rebased and updated now.

http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

PS9, Line 102: hostname:port
> Nit: do you want to add that the 'port' part could be omitted if the master
I guess so, although I would like to punt this into another patch after making 
sure all the RPCs used by the toolset comply to that theory.


PS9, Line 104: I
> Nit: Is capitalization of 'i' intentional?
Sort of. I think we could be bit more consistent here; this was more in the 
spirit of 'Tablet Server, Kudu Master'.. and so on although comparison isn't 
really apple-to-apple. We have also used 'block identifier' to counter my 
argument here.


http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

PS9, Line 81: extern
> Does 'extern' have to be present due to some compilation issues?  I would e
Fixed the ordering. About 'extern', I remember looking up cpp guideline for 
this, since I couldn't find anything, I followed the existing code's policy. 
Besides, explicit keyword provides a really strong readability.


http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS9, Line 175: string tablet_id
> nit: const string& tablet_id ?
Done


Line 246: "to step down")
> nit: parameter/line continuation shift is 4 spaces.
I am not clear about this. Guideline seems to say : "start the arguments on a 
new line indented by four spaces and continue at that 4 space indent". This 
isn't necessarily starting the argument on a newline. I remember following what 
other files have followed here, for eg take a look at tool_action_*.cc. If 
needed correction, I can fix them all in this change or a different change.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-10-03 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 9:

(5 comments)

There is one extra instance of "master_addresses" string literal and 
corresponding description in tool_action_test.cc.  Could you updated that to 
use the common constant as well?

http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.cc
File src/kudu/tools/tool_action_common.cc:

PS9, Line 102: hostname:port
Nit: do you want to add that the 'port' part could be omitted if the master 
listens at the default port?


PS9, Line 104: I
Nit: Is capitalization of 'i' intentional?


http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

PS9, Line 81: extern
Does 'extern' have to be present due to some compilation issues?  I would 
expect it to compile without 'extern'.

Also, consider moving those up to precede the first function in the kudu::tools 
namespace.  This is to be more conformant with the declaration order specified 
by the style guide: 
https://google.github.io/styleguide/cppguide.html#Declaration_Order


http://gerrit.cloudera.org:8080/#/c/4533/9/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS9, Line 175: string tablet_id
nit: const string& tablet_id ?


Line 246: "to step down")
nit: parameter/line continuation shift is 4 spaces.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS6, Line 222: ctor tservers;
> Note that BuildAndStart() wait for all tablets to reach the state of RUNNIN
Hmmm, AFAICT the only way we could whitelist that string is by comparing with 
stderr output. Let me know if I misunderstood your comment. Done that change.


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 227: tablet_id_,
> Nit: since the variable 's' is not used anywhere else down the code, consid
Done


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_local_replica.cc
File src/kudu/tools/tool_action_local_replica.cc:

PS8, Line 121: 
> Is capitalization of 'i' intentional?
Yeah, wanted to be consistent with rest of the places.


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

PS8, Line 53: const char* const kTableNameArg = "table_
> It seems this is used across different set of tools.  Is it possible to def
Done


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS8, Line 67: 
> I saw this string is defined in different file as well.  Does it make sense
Done


PS8, Line 177: client
> const string&  here and further when using FindOrDie() ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#9).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
8 files changed, 186 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 227:   Status s = GetTermFromConsensus(tservers, tablet_id_,
Nit: since the variable 's' is not used anywhere else down the code, consider

ASSERT_TRUE(GetTermFromConsensus(...).IsNotFound());


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_local_replica.cc
File src/kudu/tools/tool_action_local_replica.cc:

PS8, Line 121: I
Is capitalization of 'i' intentional?


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

PS8, Line 53: const char* const kMasterAddressesArgDesc
It seems this is used across different set of tools.  Is it possible to define 
it only once, not repeating it over and over again across all those files which 
require this parameter?


http://gerrit.cloudera.org:8080/#/c/4533/8/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS8, Line 67: I
I saw this string is defined in different file as well.  Does it make sense to 
have it defined only in a single place?  E.g., putting it into 
kudu::tools::util namespace or alike.


PS8, Line 177: string
const string&  here and further when using FindOrDie() ?

The idea is that FindOrDie() returns const reference, and the result is not 
modified in the code below.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS6, Line 222: while (current_leader == new_leader && --num_retries > 0);
> Bunch of things I narrowed down here this morning:
Note that BuildAndStart() wait for all tablets to reach the state of RUNNING in 
the master. For that to happen, each tablet has to have elected a leader. So 
just before the call to leader_step_down we can guarantee that at least one 
leader election happened, though we can't guarantee that there's a leader 
_right now_.

I think your pseudocode is a little dangerous because if for some reason 
there's another hidden bug that causes leader_step_down to fail, we won't see 
it. What if you whitelisted the "Illegal state: not currently a leader" case 
and allowed the ::Call() result to be either that, or Status::OK()? Wouldn't 
that give you the best of both worlds?


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#8).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
5 files changed, 168 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS6, Line 219: --enable_leader_failure_detection=false
> Why do you need to override the default timeout value of 30s? timeout_ms is
Yeah, I am reverting to default timeout after bunch of tests I executed on 
dist_test this morning. Initial thought was to allow just enough room for 
consensus to change, but turned out to be a bad idea.


PS6, Line 222: pendValuesFromMap(tablet_servers_, );
> There's no difference in the outcome of the test if we retry 10 times witho
Bunch of things I narrowed down here this morning:
- The same leader getting elected was just an observation I made on local 
testbed, when moved to dist_test I was able to see different leader at 
times(and same leader as well bunch of times).

- Also spoke to David on few things, it turns out a leader may never get 
elected even after terms are advanced multiple times which makes this retry 
loop even more flaky-prone. This is consistent with my test on dist_test too 
where I observed couple of times an election resulting into no leader emerging 
on other side and subsequently failed with an error code from the actual kudu 
command L208 "Illegal state: not currently a leader". 

- I also kept just one routine grabbing just the consensus term instead of 
going to master for leader_uuid, as it turns out master could have a stale info 
as well. This is inline with what you suggested in one of your earlier comments.

Given your above suggestions and my test results, I am inclined to keep the 
code as following now:

if (SubProcess::Call('kudu leader_step_down')) {
  AssertEventually([&]() {
GetINfoFromConsensus(leader_uuid, new_term)
ASSERT_GT(new_term, current_term);
  }
}

- The reason I am hesitant to ASSERT on leader_step_down command is because as 
we know leader might have changed in the background not resulting in a new 
leader, and also I don't happen to be waiting for a leader election after 
cluster is setup, in which case it's perfectly possible that the command will 
fail. We can expect the term to have advanced in 30 secs however if the command 
succeeds, hence safe to assert on that.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#7).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Test: Ran 8000 iterations of leader_step_down test on dist_test.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
5 files changed, 172 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4533/6/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS6, Line 219: MonoDelta::FromMilliseconds(timeout_ms)
Why do you need to override the default timeout value of 30s? timeout_ms is 
almost assuredly going to be less.


PS6, Line 222: while (current_leader == new_leader && --num_retries > 0);
There's no difference in the outcome of the test if we retry 10 times without a 
change in leader, or if the leader does change. Either way, the test will pass. 
Put another way, if there was a bug in leader_step_down that prevented the 
leader from actually  changing, this test would not catch that.

As such, I don't see the point of the retry loop. Just try the outer loop logic 
once, looping until the term changes, and then end the test.

The alternative would be to run the outer loop extensively and enforce that the 
leader really has changed (either by asserting that new_leader!=current_leader, 
or by asserting that there's no leader at all), but it didn't sound like you 
were comfortable doing that.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-30 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#6).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
5 files changed, 187 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#5).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
5 files changed, 187 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 4:

Please rebase this patch before pushing another revision.

-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#4).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
5 files changed, 186 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 218:   tablet_id_
> But why not get the leader info from the consensus info too? Why go to two 
One reason I was hesitant to use the same routine for leadership info was 
because GetConsensus collects the details from all the tservers, and I wasn't 
sure whether one of them could end up returning a stale info depending on when 
the RPC hits them. In the code below at L234, probability of that RPC hitting 
when the leader failover is in transition is very likely. If anything, this was 
mainly due to not knowing the codebase well and staying on safer side(go to 
master for leader info).


PS1, Line 230: "leader_failure_max_missed_heartbeat_periods",
 : _missed_hb_periods_str));
> Now I see what you mean; the number of attempts in AssertEventually is neve
Well we don't want to include the  under AssertEventually, 
because it's perfectly possible that even after specified timeout, we never 
elected a leader which is different than current_leader. ASSERT_NE(new_leader, 
current_leader) could yield us incorrect results. I will still debug what I 
observed is just conincidence or there is more to it than just meet the eye. 
However, the ++attempts kinda ensures that eventually new_leader emerges on the 
other side.


http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS3, Line 221: // Grab the default settings for heartbeat timeouts.
 : string hb_timeout_str;
 : 
ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms",
 :  _timeout_str));
 : int32 hb_timeout;
 : ASSERT_TRUE(safe_strto32(hb_timeout_str, _timeout));
 : string max_missed_hb_periods_str;
 : ASSERT_TRUE(
 : google::GetCommandLineOption(
 : "leader_failure_max_missed_heartbeat_periods",
 : _missed_hb_periods_str));
 : double max_missed_hb_periods;
 : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str,
 : _missed_hb_periods));
> You don't need to go through all this trouble (and you certainly don't need
Good point, forgot about DECLARE, done.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 218:   tablet_id_
> sure, removed the comparison check, relying on master to report leader info
But why not get the leader info from the consensus info too? Why go to two 
different places to get the combined set of information? That just opens the 
door for discrepancies between the two.


PS1, Line 230: "leader_failure_max_missed_heartbeat_periods",
 : _missed_hb_periods_str));
> Sorry for not being clear before. Lemme try to explain although I haven't r
Now I see what you mean; the number of attempts in AssertEventually is never 
actually incremented. That's a pretty serious problem; can you fix that, and 
rebase this patch on top of the fix? No unit test needed.

As for the correct use of AssertEventually, let's get to the bottom of it as it 
should work. I can imagine something like this working:

  string current_leader = ...
  int64_t current_term = ...
  AssertEventually([&]() {

GetInfoFromConsensus(_leader, _term)
ASSERT_NE(new_leader, current_leader)
ASSERT_GT(new_term, current_term)
  });

It's possible that the broken backoff behavior in AssertEventually is 
responsible, which is why I suggested you fix the bug and rebase this patch on 
top of it. If not, use gdb or temporary logging to figure out what's going on.


http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS3, Line 221: // Grab the default settings for heartbeat timeouts.
 : string hb_timeout_str;
 : 
ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms",
 :  _timeout_str));
 : int32 hb_timeout;
 : ASSERT_TRUE(safe_strto32(hb_timeout_str, _timeout));
 : string max_missed_hb_periods_str;
 : ASSERT_TRUE(
 : google::GetCommandLineOption(
 : "leader_failure_max_missed_heartbeat_periods",
 : _missed_hb_periods_str));
 : double max_missed_hb_periods;
 : ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str,
 : _missed_hb_periods));
You don't need to go through all this trouble (and you certainly don't need to 
do it in a loop; it won't change during the loop). Just access 
FLAGS_raft_heartbeat_interval_ms and 
FLAGS_leader_failure_max_missed_heartbeat_periods directly, and  DECLARE_* them 
at the top of the file.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 3:

(9 comments)

TFTR Adar, pls see responses inline, also pls see why my jenkins are still 
failing. I see that they are still using ...llvm-3.8.0.src/ as per the logs.

http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS1, Line 204: string c
> The CHECK_* macros come from glog, and they all do the same thing regardles
Thank you for that awesome clarity on when to use them. This has gone in my 
favorite notes :)


Line 218:   tablet_id_
> Unfortunately a test environment with leader failure detection isn't determ
sure, removed the comparison check, relying on master to report leader info 
now. I have kept the getconsensus intact to grab the new term info.


PS1, Line 230: "leader_failure_max_missed_heartbeat_periods",
 : _missed_hb_periods_str));
> Couple things:
Sorry for not being clear before. Lemme try to explain although I haven't root 
caused the issue myself yet. When I used AssertEventually the GetConsensus 
comes out around 1.5 secs or so with a new_term. But what is not clear to me 
yet is, if I sleep for ~1.5 secs to allow the consensus to settle down, I see 
eventually a new_leader emerging which won't happen if I keep using 
GetConsensus via AssertEventually(despite 20 retries). Logs are confimring that 
GetConsensus indeed is detecting the new leader correctly, it's just that 
new_leader happens to be the current_leader everytime. 

I want to see why I see same leader gets elected again and again unless I allow 
some more time window before calling GetConsensus in which case I see 
new_leader coming up after few retries. With the fix of '++attempts' however, I 
eventually see another leader coming up so thats kinda hiding the problem. I 
think that ++attempts fix should have been there to keep the sleep time 
monotonically increasing.


PS1, Line 241: ASSERT_OK(GetTermFromConsensus(tservers, tablet_id_,
 :_ter
> Again, the leader can change whenever, so this doesn't seem safe.
Done


Line 264: 
> The actual check (not in the CHECK sense of the word, but the comparison an
yeah, missed out removing here i think. done.


http://gerrit.cloudera.org:8080/#/c/4533/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS2, Line 251: AGS_num_
> Still got some leftover CHECK_* statements that should be converted to thei
Done, thanks for explaining the distinctions earlier.


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS1, Line 274: 
 : 
> You missed these comments down here.
Ugh... sorry, for some reason I totally missed out scrolling down thinking that 
only change I had introduced was in the routine above :)


PS1, Line 278: 
 : 
> I admit, this one was my bad, but can you consolidate all of these descript
Done


PS1, Line 280: 
> Likewise for this, it's copied extensively.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-29 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#3).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/util/test_util.cc
6 files changed, 198 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS1, Line 204: }
> Sure, could you tell me what's the relationship between this being a main t
The CHECK_* macros come from glog, and they all do the same thing regardless of 
the context in which they're called:
1. Evaluate an expression.
2. If evaluation fails, log something.
3. Dump a stack trace.
4. Call abort() to exit the process.

The DCHECK_* macros (also from glog) are debug-only variants that are otherwise 
equivalent, but compiled out in non-debug builds.

The ASSERT_* macros come from gtest, so they're specific to how gtest works. 
They too evaluate an expression, but on failure, they do some gtest stuff to 
capture the failure, end the run of the current test, and move to the next 
test. They have some foibles, and one of them is that ASSERT failures in the 
thread that isn't the main process thread don't seem to get noticed  by gtest. 
Maybe there's something we need to do for that to happen; if there is, I don't 
know what it is.

Oh, and EXPECT_* is like ASSERT_* except that the test keeps running. So 
ASSERT_* will only ever reveal the "first" failure, while EXPECT_* will reveal 
all of them.

So to summarize:
- Use CHECK_*/DCHECK_* macros in production code. Use DCHECK_* if you expect to 
get sufficient test coverage of the callsite, CHECK_* otherwise.
- Use ASSERT_*/EXPECT_* macros in test-only code, unless that code is expected 
to run on a separate test thread, in which case use CHECK_* too. Default to 
EXPECT_*, unless the operation in question must succeed for the rest of the 
test to function properly, in which case use ASSERT_*.

Hope this helps.


Line 218: ASSERT_OK(Subprocess::Call({
> In reality yes, but given that we are operating in a deterministic test env
Unfortunately a test environment with leader failure detection isn't 
deterministic. The test can run on a machine with arbitrary load (perhaps from 
other tests), in which case one of the tserver processes can pause for 
arbitrary periods of time, causing a new leader to be elected. This is one of 
the leading causes of (low-grade) test flakiness for us.


PS1, Line 230: CHECK_OK(GetInfoFromConsensus(tservers, tablet_id_,
 :   _leader, _term));
> I tried this with an unsuccessful result, later realized AssertEventually()
Couple things:

1. I don't understand why AssertEventually() doesn't work here. AFAICT the goal 
of this snippet is "tell the leader to step down, wait for an election to 
happen, then assert that the new term is higher than the one before". If you 
wrap L229-L233 in AssertEventually(), doesn't that mean you don't need the 3 
second sleep (which could be flaky to begin with, since a cluster running with 
ASAN/TSAN could take a lot longer to run an election)? I don't see what the 
issue is; can you help me understand? Look at some of the existing usages of 
AssertEventually() if it's not clear.
2. I'm not seeing the bug in the code you pasted. What is it?


Line 264:   s = GetInfoFromConsensus(tservers, tablet_id_,
> Done
The actual check (not in the CHECK sense of the word, but the comparison and 
assertion) is still here, though.


http://gerrit.cloudera.org:8080/#/c/4533/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS2, Line 251: CHECK_OK
Still got some leftover CHECK_* statements that should be converted to their 
ASSERT_* equivalents.


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS1, Line 194: 
> Done, although not sure if we need a toggle given that the routine anyways 
On second thought, the ResolveAddresses output isn't even used, is it? So let's 
drop it altogether. Looks like address resolution happens within BuildProxy() 
so that extra call was redundant from day one (in ConfigChange() as well).


PS1, Line 274: 
 : 
> Nit: reformat as "Force the tablet's leader replica (if present) to step do
You missed these comments down here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-28 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4533/1//COMMIT_MSG
Commit Message:

PS1, Line 12:   
> Nit: got an extra space here.
Done


PS1, Line 18: tablet
> You mean 'table' here, right?
yeah, thanks for catching.


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS1, Line 178: Status s = itest::GetConsensusState(ts,
 : tablet_id,
 : 
consensus::CONSENSUS_CONFIG_COMMITTED,
 : 
MonoDelta::FromSeconds(10), );
 : if (!s.ok()) {
 :   return s;
 : }
> Just use RETURN_NOT_OK(...).
Done


Line 198:   int num_retries = 0;
> Why is this defined way up here instead of closer to where it's used?
Done, was thinking of keeping the *retries* variables together initially, but I 
agree about moving it down just before the use. Also just made it one variable 
now.


PS1, Line 204: CHECK_EQ
> Use ASSERT_* instead of CHECK_* since all of these are on the main test thr
Sure, could you tell me what's the relationship between this being a main 
thread and these macros here ? I remember asking this to you earlier in one of 
earlier rev comments but forgot to follow up in-person. 
I am just curious if there are debug-only versions of these macros ?


Line 218:   CHECK_EQ(master_reported_leader, current_leader);
> The leader can change between the GetInfoFromConsensus calls and L213, righ
In reality yes, but given that we are operating in a deterministic test 
environment where the server failures are manually triggered by code, I thought 
this check is safe to have. However, I am not entirely sure what this check is 
buying us, I just happen to notice that there are more than one way we could 
find the leader replica, and I thought it would be good to compare them when we 
know that they don't change on the fly during the test. I can remove this if 
you think this might be a room for any flakiness.


PS1, Line 230: // Wait for re-election to happen again.
 : // 3 seconds picked to accommodate consensus heartbeat 
timeout(1.5s)
> Use AssertEventually() to make this more robust, and to minimize the amount
I tried this with an unsuccessful result, later realized AssertEventually() 
kinda doesn't seem to fit this scenario here; Reason being: a) We really want 
to check the consensus state after the heartbeat timeout as opposed to 
executing f() and then sleeping;  AssertEventually() aggressively(every ms) 
keeps executing the f() until the timeout hits. b) I won't be able to use 
assert from inside the lambda argument when we know that Consensus state is 
still under flux.
Side note: I found one minor bug in AssertEventually() here which I can fix in 
another patch:
int attempts = 0;
int sleep_ms = (attempts < 10) ? (1 << attempts) : 1000;
SleepFor(sleep_ms);

Perhaps we can devise another routine which sleeps for N secs and then executes 
the function and returns the status ?


Line 264:   CHECK_EQ(master_reported_leader, "");
> Let's not check this. It's a common calling convention that if a function f
Done


PS1, Line 271:   CHECK_EQ(current_leader, "");
 :   CHECK_EQ(current_term, 0);
> Likewise here, which means you needn't initialize current_term.
Done


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS1, Line 194:   Status s = GetTabletLeader(client, tablet_id, _uuid, 
_hp);
> Consider consolidating from here to ResolveAddresses in a helper function, 
Done, although not sure if we need a toggle given that the routine anyways 
returns the Status and caller can decide if NotFound is FATAL or not ?


PS1, Line 198: else if (!s.ok()) {
 : return s;
 :   }
> To pile on with Tidy Bot, just do RETURN_NOT_OK(s) after the not found chec
Done


PS1, Line 209:   ServerStatusPB status;
 :   RETURN_NOT_OK(GetServerStatus(leader_hp.host(), 
leader_hp.port(), ));
> What's this for? I don't see status used anywhere.
I kinda misunderstood the combination of RETURN_NOT_OK(GetServerStatus... 
thinking that this will return failure if leader is not up; My goal here was to 
check if leader is up & running so that we need not send the RPC if the server 
is already down. It looks like this doesn't buy us anything since RPC will fail 
anyways if the leader is down. Hence removed.


PS1, Line 224:  return Status::RemoteError(resp.ShortDebugString());
> Why not return StatusFromPB(resp.error().status()) like ConfigChange?
Sounds good, just didn't know about this routine, thanks.


-- 
To view, visit 

[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-28 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4533

to look at the new patch set (#2).

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu table list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
3 files changed, 180 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-26 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..


Patch Set 1:

(17 comments)

Please also update kudu-tool-test's help tests with the new action.

http://gerrit.cloudera.org:8080/#/c/4533/1//COMMIT_MSG
Commit Message:

PS1, Line 12:   
Nit: got an extra space here.


PS1, Line 18: tablet
You mean 'table' here, right?


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS1, Line 178: Status s = itest::GetConsensusState(ts,
 : tablet_id,
 : 
consensus::CONSENSUS_CONFIG_COMMITTED,
 : 
MonoDelta::FromSeconds(10), );
 : if (!s.ok()) {
 :   return s;
 : }
Just use RETURN_NOT_OK(...).


Line 198:   int num_retries = 0;
Why is this defined way up here instead of closer to where it's used?


PS1, Line 204: CHECK_EQ
Use ASSERT_* instead of CHECK_* since all of these are on the main test thread.


Line 218:   CHECK_EQ(master_reported_leader, current_leader);
The leader can change between the GetInfoFromConsensus calls and L213, right? 
If so, isn't this unsafe?


PS1, Line 230: // Wait for re-election to happen again.
 : // 3 seconds picked to accommodate consensus heartbeat 
timeout(1.5s)
Use AssertEventually() to make this more robust, and to minimize the amount of 
sleeping actually needed.


PS1, Line 241:   CHECK_OK(GetTabletLeaderUUIDFromMaster(tablet_id_, 
_reported_leader));
 :   CHECK_EQ(master_reported_leader, new_leader);
Again, the leader can change whenever, so this doesn't seem safe.


Line 264:   CHECK_EQ(master_reported_leader, "");
Let's not check this. It's a common calling convention that if a function 
fails, it doesn't set any OUT parameters, and we don't need to verify that 
ourselves.


PS1, Line 271:   CHECK_EQ(current_leader, "");
 :   CHECK_EQ(current_term, 0);
Likewise here, which means you needn't initialize current_term.


http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/tool_action_tablet.cc
File src/kudu/tools/tool_action_tablet.cc:

PS1, Line 194:   Status s = GetTabletLeader(client, tablet_id, _uuid, 
_hp);
Consider consolidating from here to ResolveAddresses in a helper function, used 
from ChangeConfig too. It'd need a toggle to specify whether NotFound is fatal 
or not.


PS1, Line 198: else if (!s.ok()) {
 : return s;
 :   }
To pile on with Tidy Bot, just do RETURN_NOT_OK(s) after the not found check.


PS1, Line 209:   ServerStatusPB status;
 :   RETURN_NOT_OK(GetServerStatus(leader_hp.host(), 
leader_hp.port(), ));
What's this for? I don't see status used anywhere.


PS1, Line 224:  return Status::RemoteError(resp.ShortDebugString());
Why not return StatusFromPB(resp.error().status()) like ConfigChange?


PS1, Line 274: "Request the current leader of the "
 : "tablet(if present) to step down"
Nit: reformat as "Force the tablet's leader replica (if present) to step down".


PS1, Line 278: "Comma-separated list of Kudu Master addresses where each 
address is "
 : "of form 'hostname:port'"
I admit, this one was my bad, but can you consolidate all of these descriptions 
in a global string constant like for kMasterAddressesArg?


PS1, Line 280: "Tablet Identifier"
Likewise for this, it's copied extensively.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] [tools] Implement a manual leader step down for a tablet

2016-09-23 Thread Dinesh Bhat (Code Review)
Hello David Ribeiro Alves, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/4533

to review the following change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
..

[tools] Implement a manual leader_step_down for a tablet

This change introduces a leader_step_down functionality
under 'kudu tablet'. This tool may be handy to recover from
situations when a single tablet server is overloaded and we want
to kick off  a new election to balance the load across the clusters.
Although it is not guaranteed that a different replica will be elected
as the leader, this is an optimistic effort to elect a new tablet
server as the leader for the given tablet in the cluster.

Also snuck in a small change to display host:port details with
'kudu tablet list  --list_tablets' command.

Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
---
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
3 files changed, 185 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4533/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon