Hello Adar Dembo, Will Berkeley,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7808
to look at the new patch set (#2).
Change subject: Fix flakiness of kudu-admin-test TestMoveTablet
......................................................................
Fix flakiness of kudu-admin-test TestMoveTablet
This fixes an issue in the admin tool for moving a tablet that would occur if
the tablet changed terms multiple times before settling during an election.
Previously, we would wait for a term change, and as soon as we saw any term
change, we'd wait for an operation in exactly that term to be committed. There
were two issues with the code:
1) It used cstate.has_leader_uuid() to check if there was a leader in the new
term. However, the server side actually sets this field to an empty string
rather than leaving it unset if there is no leader. So this check always
evaluated to true, meaning that we would proceed to looking for a committed op
in that term as soon as the term advanced, rather than waiting for an actual
leader to be elected.
2) If the leader changed a second time, the term could advance again while we
were waiting for a committed operation. In that case we'd keep waiting for a
committed op in exactly the term of the first leader change we saw, rather
than refreshing our notion of the "current term".
The patch fixes both issues by restructuring the loop a bit.
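For illustration only, here is a minimal self-contained sketch of the two
checks described above. It is not the code from this patch:
ConsensusSnapshot, HasElectedLeader, and FirstSettledPoll are hypothetical
names, and std::optional stands in for the protobuf field so that a "set but
empty" leader UUID can be modeled without generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the consensus state the tool polls.
// This is NOT Kudu's real type; it only models the fields discussed above.
struct ConsensusSnapshot {
  int64_t term;
  // A present-but-empty value models a server reporting "no leader yet".
  std::optional<std::string> leader_uuid;
  // Term of the most recently committed operation.
  int64_t last_committed_term;

  // Mirrors a presence check: true even when the value is an empty string.
  bool has_leader_uuid() const { return leader_uuid.has_value(); }
};

// Issue 1: presence alone is not enough, the UUID must also be non-empty.
bool HasElectedLeader(const ConsensusSnapshot& s) {
  return s.has_leader_uuid() && !s.leader_uuid->empty();
}

// Issue 2: compare the committed op's term against the term of the snapshot
// being examined, so a second term change is picked up on the next poll.
// Returns the index of the first "settled" poll, or -1 if none qualifies.
int FirstSettledPoll(int64_t orig_term,
                     const std::vector<ConsensusSnapshot>& polls) {
  for (std::size_t i = 0; i < polls.size(); ++i) {
    const ConsensusSnapshot& s = polls[i];
    if (s.term <= orig_term) continue;     // no term change yet
    if (!HasElectedLeader(s)) continue;    // term advanced, leader unknown
    if (s.last_committed_term == s.term) { // op committed in the current term
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

In this sketch each element of the vector represents one poll of the tablet's
consensus state; a real tool would fetch a fresh snapshot on every iteration
and bound the loop with a timeout.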
I additionally improved some logging in the consensus service and
implementation which I found helpful while debugging the issue.
This test became significantly more flaky (~20% failure rate in ASAN) after
commit 21b0f3d5e255760535e281efe5879fe657df1f1c, which adjusted the election
timeout behavior. Apparently the new election behavior made it more likely
for elections to conflict before settling on a steady leader, which
exposed the above bug in the tool.
With the patch, 500 out of 500 runs passed in an ASAN build (vs a 20%
failure rate before).
Change-Id: I475d4a44c52f2da7fc42c93e9cf2f38e01735177
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 54 insertions(+), 48 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/7808/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I475d4a44c52f2da7fc42c93e9cf2f38e01735177
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>