Hello Alexey Serbin, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11508
to look at the new patch set (#2).
Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................
[tools] Fix bug in CheckCompleteMove
It was possible for the following sequence to happen:
0. We are moving a replica of tablet A from TS X to TS Y. TS X is
presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly, and some other tablet server Z (Z != X)
becomes the leader.
4. Since leadership has changed, we move on to removing X from A. To
prepare, we gather the consensus state using the proxy, thinking we are
talking to Z, but the proxy still points at X, which causes a bad status
like
    Invalid argument: GetConsensusState: Wrong destination UUID requested. Local
    UUID: X. Requested UUID: Z
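
The sequence above can be reproduced with a small self-contained sketch. Everything in it (Status, FakeProxy, FakeCluster, BuggyFlow, FixedFlow) is a hypothetical stand-in written for illustration, not Kudu code, and re-resolving the leader before building a fresh proxy is only one plausible way to avoid the stale proxy; it is not necessarily what this patch does.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not kudu::Status.
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status InvalidArgument(std::string m) {
    return {false, "Invalid argument: " + std::move(m)};
  }
};

// Hypothetical proxy: bound to exactly one tablet server for its lifetime.
class FakeProxy {
 public:
  explicit FakeProxy(std::string local_uuid)
      : local_uuid_(std::move(local_uuid)) {}

  // The server behind the proxy rejects requests addressed to another UUID.
  Status GetConsensusState(const std::string& dest_uuid) const {
    if (dest_uuid != local_uuid_) {
      return Status::InvalidArgument(
          "GetConsensusState: Wrong destination UUID requested. Local UUID: " +
          local_uuid_ + ". Requested UUID: " + dest_uuid);
    }
    return Status::OK();
  }

 private:
  std::string local_uuid_;
};

// Hypothetical cluster that only tracks which server leads the tablet.
struct FakeCluster {
  std::string leader_uuid;
  FakeProxy BuildProxy(const std::string& uuid) const { return FakeProxy(uuid); }
  // Models steps 2-3: the old leader steps down and new_leader takes over.
  void StepDown(const std::string& new_leader) { leader_uuid = new_leader; }
};

// Buggy flow: keeps using the proxy that was built before the stepdown.
Status BuggyFlow(FakeCluster* cluster) {
  FakeProxy proxy = cluster->BuildProxy(cluster->leader_uuid);  // step 1: X
  cluster->StepDown("Z");                                       // steps 2-3
  // Step 4: the request names Z, but the proxy still delivers it to X.
  return proxy.GetConsensusState(cluster->leader_uuid);
}

// Possible fix: look up the leader again and rebuild the proxy.
Status FixedFlow(FakeCluster* cluster) {
  FakeProxy proxy = cluster->BuildProxy(cluster->leader_uuid);
  cluster->StepDown("Z");
  proxy = cluster->BuildProxy(cluster->leader_uuid);  // now points at Z
  return proxy.GetConsensusState(cluster->leader_uuid);
}

// Small self-check of both flows, starting with X as the leader each time.
void CheckFlows() {
  FakeCluster buggy{"X"};
  assert(!BuggyFlow(&buggy).ok);
  FakeCluster fixed{"X"};
  assert(FixedFlow(&fixed).ok);
}
```

In this toy model, BuggyFlow returns the same "Wrong destination UUID" status quoted above, while FixedFlow succeeds because the proxy and the requested UUID agree.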
This bug has always been present but was only exposed by the follow-up
graceful leadership transfer patch: step 3 was unlikely with an abrupt
stepdown, and if CheckCompleteMove was retried after leadership had
changed, it would not hit the same problem.
This patch also reorganizes and re-comments CheckCompleteMove to make it
easier to understand.
Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins