Hello Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11122
to look at the new patch set (#2).
Change subject: [consensus] KUDU-2335 increment term on explicit step down
......................................................................
[consensus] KUDU-2335 increment term on explicit step down
Prior to this patch, the catalog manager could get a tablet report
from a former leader replica with an empty leader UUID and an old term.
Dedicated logic in the catalog manager's code (see section 7d(i))
would replace the empty leader UUID with the previous leader's UUID.
As a result of those shenanigans, the catalog manager would interpret
the incoming report as a report from a leader replica that reports
its own health status as UNKNOWN.
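To illustrate the interaction described above, here is a minimal toy model, not Kudu's actual code. Every type and function name in it (ToyReplica, ToyCatalogEntry, TreatedAsLeaderReport) is hypothetical. It also assumes that the amendment applies only when the reported term equals the term the catalog manager already knows about.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for the consensus state carried in a tablet report.
struct ConsensusReport {
  int64_t term;
  std::string leader_uuid;  // empty when the replica knows of no leader
};

// Hypothetical replica that starts out as the leader of the given term.
class ToyReplica {
 public:
  ToyReplica(std::string uuid, int64_t term)
      : uuid_(std::move(uuid)), term_(term), is_leader_(true) {}

  // Explicit step down. bump_term == false models the behavior before the
  // patch (the term is left unchanged); bump_term == true models the behavior
  // after it (the term is incremented).
  void StepDown(bool bump_term) {
    assert(is_leader_);  // only a leader can step down in this toy model
    is_leader_ = false;
    if (bump_term) {
      ++term_;
    }
  }

  // What the replica would report about itself right now.
  ConsensusReport Report() const {
    return ConsensusReport{term_, is_leader_ ? uuid_ : std::string()};
  }

 private:
  std::string uuid_;
  int64_t term_;
  bool is_leader_;
};

// Hypothetical record of what the catalog manager last knew about the tablet.
struct ToyCatalogEntry {
  int64_t term;
  std::string leader_uuid;
};

// Returns true if the report ends up being treated as coming from the leader.
// Assumption: an empty leader UUID is filled in from the previously known
// leader only when the reported term matches the known term.
bool TreatedAsLeaderReport(const ToyCatalogEntry& known,
                           const ConsensusReport& report,
                           const std::string& reporter_uuid) {
  std::string leader = report.leader_uuid;
  if (leader.empty() && report.term == known.term) {
    leader = known.leader_uuid;  // the amendment described above
  }
  return !leader.empty() && leader == reporter_uuid;
}
```

In this model, a replica "A" that steps down without changing its term is still treated as the leader of term 5 by a catalog entry {5, "A"}. Once the step down increments the term, the report carries term 6, the amendment no longer applies, and the report is not attributed to a leader.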
The TwoConcurrentRebalancers scenario of the recently introduced
ConcurrentRebalancersTest reproduces the issue fairly often
(about 1 in 100 runs failed), so it was easy to pinpoint the problem.
Mike sketched the fix and I ran the new code via dist-test about 1K
times and verified the problem is gone.
As for the test coverage, in addition to the already mentioned
would-be-flaky TwoConcurrentRebalancers scenario, I modified
RaftConsensusElectionITest.LeaderStepDown to reliably catch regressions.
Change-Id: I4e1f1446176a78ba04e74dd1153f9048a32d8d5f
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/raft_consensus_election-itest.cc
2 files changed, 31 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/11122/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e1f1446176a78ba04e74dd1153f9048a32d8d5f
Gerrit-Change-Number: 11122
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy