[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-10-01 Thread Will Berkeley (Code Review)
Will Berkeley has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
..

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local 
UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Reviewed-on: http://gerrit.cloudera.org:8080/11508
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-10-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 01 Oct 2018 18:49:51 +
Gerrit-HasComments: No


[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-10-01 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287
PS1, Line 287: nd the
 :   // replica-to-be-removed has stepped down as leader
> That's not the only case we need to handle here right?  The replica-to-be-r
Done


http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291
PS1, Line 291: to_ts_uuid_is_a_voter
> I think additional !from_ts_uuid_in_config condition is missing here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 01 Oct 2018 18:34:28 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-10-01 Thread Will Berkeley (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11508

to look at the new patch set (#2).

Change subject: [tools] Fix bug in CheckCompleteMove
..

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local 
UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/2
--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-09-25 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11508 )

Change subject: [tools] Fix bug in CheckCompleteMove
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287
PS1, Line 287: nd the
 :   // replica-to-be-removed has stepped down as leader
That's not the only case we need to handle here right?  The 
replica-to-be-removed might be a follower replica at this point, no?


http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291
PS1, Line 291: to_ts_uuid_is_a_voter
I think additional !from_ts_uuid_in_config condition is missing here.


If from the procedural perspective there is nothing to do in case of the 3-4-3 
replica management scheme, this function should still report on successful 
completion only when the the source replica is gone from the tablet 
configuration.



--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Tue, 25 Sep 2018 21:29:45 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] Fix bug in CheckCompleteMove

2018-09-25 Thread Will Berkeley (Code Review)
Will Berkeley has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11508


Change subject: [tools] Fix bug in CheckCompleteMove
..

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we move to remove X from A. To prepare
   we gather consensus state using proxy, thinking we are talking to Z,
   but the proxy is pointed at X, causing a bad status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local 
UUID: X. Requested UUID: Z

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch, since #3 was unlikely with abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed
it would not hit the same problem.

This also reorganizes and re-comments CheckCompleteMove a bit, to try
and make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/1
--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley