[kudu-CR] [tools] Fix bug in CheckCompleteMove
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11508 ) Change subject: [tools] Fix bug in CheckCompleteMove .. [tools] Fix bug in CheckCompleteMove It was possible for the following sequence to happen: 0. We are moving a replica from TS X to TS Y for tablet A. TS X is presently the leader. 1. We find the tablet leader (X) and build a proxy to it. 2. To remove X from A, we ask it to step down. 3. Leadership changes quickly and Z != X becomes the leader. 4. Since leadership has changed, we move to remove X from A. To prepare we gather consensus state using proxy, thinking we are talking to Z, but the proxy is pointed at X, causing a bad status like Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z This bug has always been present but was exposed by the follow-up graceful leadership transfer patch, since #3 was unlikely with abrupt stepdown, and if CheckCompleteMove was retried after leadership changed it would not hit the same problem. This also reorganizes and re-comments CheckCompleteMove a bit, to try and make it easier to understand. Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Reviewed-on: http://gerrit.cloudera.org:8080/11508 Reviewed-by: Alexey Serbin Tested-by: Kudu Jenkins --- M src/kudu/tools/tool_replica_util.cc 1 file changed, 69 insertions(+), 37 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 3 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] Fix bug in CheckCompleteMove
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 ) Change subject: [tools] Fix bug in CheckCompleteMove .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 01 Oct 2018 18:49:51 + Gerrit-HasComments: No
[kudu-CR] [tools] Fix bug in CheckCompleteMove
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 ) Change subject: [tools] Fix bug in CheckCompleteMove .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc File src/kudu/tools/tool_replica_util.cc: http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287 PS1, Line 287: nd the : // replica-to-be-removed has stepped down as leader > That's not the only case we need to handle here right? The replica-to-be-r Done http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291 PS1, Line 291: to_ts_uuid_is_a_voter > I think additional !from_ts_uuid_in_config condition is missing here. Done -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 01 Oct 2018 18:34:28 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Fix bug in CheckCompleteMove
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11508 to look at the new patch set (#2). Change subject: [tools] Fix bug in CheckCompleteMove .. [tools] Fix bug in CheckCompleteMove It was possible for the following sequence to happen: 0. We are moving a replica from TS X to TS Y for tablet A. TS X is presently the leader. 1. We find the tablet leader (X) and build a proxy to it. 2. To remove X from A, we ask it to step down. 3. Leadership changes quickly and Z != X becomes the leader. 4. Since leadership has changed, we move to remove X from A. To prepare we gather consensus state using proxy, thinking we are talking to Z, but the proxy is pointed at X, causing a bad status like Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z This bug has always been present but was exposed by the follow-up graceful leadership transfer patch, since #3 was unlikely with abrupt stepdown, and if CheckCompleteMove was retried after leadership changed it would not hit the same problem. This also reorganizes and re-comments CheckCompleteMove a bit, to try and make it easier to understand. Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 --- M src/kudu/tools/tool_replica_util.cc 1 file changed, 69 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/2 -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tools] Fix bug in CheckCompleteMove
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11508 ) Change subject: [tools] Fix bug in CheckCompleteMove .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc File src/kudu/tools/tool_replica_util.cc: http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@287 PS1, Line 287: nd the : // replica-to-be-removed has stepped down as leader That's not the only case we need to handle here right? The replica-to-be-removed might be a follower replica at this point, no? http://gerrit.cloudera.org:8080/#/c/11508/1/src/kudu/tools/tool_replica_util.cc@291 PS1, Line 291: to_ts_uuid_is_a_voter I think additional !from_ts_uuid_in_config condition is missing here. If from the procedural perspective there is nothing to do in case of the 3-4-3 replica management scheme, this function should still report on successful completion only when the the source replica is gone from the tablet configuration. -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Tue, 25 Sep 2018 21:29:45 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Fix bug in CheckCompleteMove
Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11508 Change subject: [tools] Fix bug in CheckCompleteMove .. [tools] Fix bug in CheckCompleteMove It was possible for the following sequence to happen: 0. We are moving a replica from TS X to TS Y for tablet A. TS X is presently the leader. 1. We find the tablet leader (X) and build a proxy to it. 2. To remove X from A, we ask it to step down. 3. Leadership changes quickly and Z != X becomes the leader. 4. Since leadership has changed, we move to remove X from A. To prepare we gather consensus state using proxy, thinking we are talking to Z, but the proxy is pointed at X, causing a bad status like Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z This bug has always been present but was exposed by the follow-up graceful leadership transfer patch, since #3 was unlikely with abrupt stepdown, and if CheckCompleteMove was retried after leadership changed it would not hit the same problem. This also reorganizes and re-comments CheckCompleteMove a bit, to try and make it easier to understand. Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 --- M src/kudu/tools/tool_replica_util.cc 1 file changed, 69 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/1 -- To view, visit http://gerrit.cloudera.org:8080/11508 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16 Gerrit-Change-Number: 11508 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley