Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11508 )


Change subject: [tools] Fix bug in CheckCompleteMove
......................................................................

[tools] Fix bug in CheckCompleteMove

It was possible for the following sequence to happen:

0. We are moving a replica from TS X to TS Y for tablet A. TS X is
   presently the leader.
1. We find the tablet leader (X) and build a proxy to it.
2. To remove X from A, we ask it to step down.
3. Leadership changes quickly and Z != X becomes the leader.
4. Since leadership has changed, we proceed to remove X from A. To
   prepare, we gather the consensus state using the proxy, thinking we
   are talking to Z. But the proxy still points at X, causing a bad
   status like

   Invalid argument: GetConsensusState: Wrong destination UUID requested. Local UUID: X. Requested UUID: Z
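
The sequence above can be reproduced in miniature. The sketch below is
only an illustration under stated assumptions: FakeTabletServer,
FakeProxy, GatherStateWithStaleProxy, and GatherStateWithFreshProxy are
hypothetical names that do not exist in Kudu, and rebuilding the proxy
for the new leader is just one plausible remedy, not necessarily what
this patch does.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a tablet server; not a real Kudu class.
struct FakeTabletServer {
  std::string uuid;
};

// Hypothetical stand-in for an RPC proxy that is bound to one server.
struct FakeProxy {
  const FakeTabletServer* server;

  // Mimics a destination-UUID check: the request names the server it is
  // meant for, and the receiving server rejects a mismatch.
  std::string GetConsensusState(const std::string& dest_uuid) const {
    if (dest_uuid != server->uuid) {
      return "Invalid argument: GetConsensusState: Wrong destination UUID "
             "requested. Local UUID: " + server->uuid +
             ". Requested UUID: " + dest_uuid;
    }
    return "OK";
  }
};

// Buggy flow (step 4): the proxy built for the old leader is reused,
// but the request is addressed to the new leader.
std::string GatherStateWithStaleProxy(const FakeProxy& old_leader_proxy,
                                      const std::string& new_leader_uuid) {
  return old_leader_proxy.GetConsensusState(new_leader_uuid);
}

// Assumed remedy: build a proxy to whichever server is the leader now,
// so the proxy target and the requested UUID always agree.
std::string GatherStateWithFreshProxy(const FakeTabletServer& new_leader) {
  FakeProxy proxy{&new_leader};
  return proxy.GetConsensusState(new_leader.uuid);
}
```

With a proxy bound to X and the requested UUID set to Z,
GatherStateWithStaleProxy returns the bad status quoted above, while
GatherStateWithFreshProxy for Z returns "OK".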

This bug has always been present but was exposed by the follow-up
graceful leadership transfer patch: step 3 was unlikely with an abrupt
stepdown, and if CheckCompleteMove was retried after leadership changed,
it would not hit the same problem.

This patch also reorganizes and re-comments CheckCompleteMove a bit to
make it easier to understand.

Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
---
M src/kudu/tools/tool_replica_util.cc
1 file changed, 69 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/11508/1
--
To view, visit http://gerrit.cloudera.org:8080/11508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I227b8f833e8904dd1ac18fbe17345bea13c96c16
Gerrit-Change-Number: 11508
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <[email protected]>
