Todd Lipcon has submitted this change and it was merged.

Change subject: Fix flakiness of kudu-admin-test TestMoveTablet
......................................................................
Fix flakiness of kudu-admin-test TestMoveTablet

This fixes an issue in the admin tool's move-tablet action. The issue occurred if the tablet changed terms multiple times during an election before settling on a leader. Previously, we would wait for a term change. As soon as we saw any term change, we would wait for an operation in exactly that term to be committed. There were two issues with the code:

1) It used cstate.has_leader_uuid() to check whether there was a leader in the new term. However, the server side sets this field to an empty string when there is no leader, rather than leaving it unset. So this check always evaluated to true. As a result, we would start looking for a committed op in that term as soon as the term advanced, rather than waiting for an actual leader to be elected.

2) If the leader somehow changed a second time, the term could increase again while we were waiting for a committed operation. In that case we would keep waiting for a committed op in exactly the term of the first leader change we saw. We never refreshed our notion of the "current term".

The patch fixes both issues by restructuring the loop. I also improved some logging in the consensus service and implementation, which I found helpful while debugging the issue.

This test became significantly more flaky (~20% failure rate in ASAN builds) after commit 21b0f3d5e255760535e281efe5879fe657df1f1c, which adjusted the election timeout behavior. Apparently the new election behavior made it more likely that elections would conflict before a steady leader emerged, and that exposed the above bug in the tool.

With the patch, 500 of 500 runs succeeded in an ASAN build (vs. a 20% failure rate before).
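The restructured wait loop described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code from tool_action_tablet.cc. `ConsensusSnapshot`, `WaitForStableLeader`, and the `fetch` callback are hypothetical stand-ins for the consensus state the tool polls. The snapshot models the server-side behavior described in issue 1: there is no leader when `leader_uuid` is empty. The loop re-reads the term on every poll, which addresses issue 2.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the consensus-state fields the tool inspects.
struct ConsensusSnapshot {
  int64_t current_term;
  std::string leader_uuid;      // Empty string (not "unset") when there is no leader.
  int64_t last_committed_term;  // Term of the most recently committed op.
};

// Hypothetical helper: polls `fetch` until a leader exists in a term newer than
// `initial_term` and an op from that same, most recently observed term has been
// committed. Returns the settled term, or std::nullopt after `max_polls` polls.
// A real tool would sleep between polls; that is omitted here.
std::optional<int64_t> WaitForStableLeader(
    int64_t initial_term,
    const std::function<ConsensusSnapshot()>& fetch,
    int max_polls) {
  for (int i = 0; i < max_polls; ++i) {
    const ConsensusSnapshot cs = fetch();
    // No term change yet.
    if (cs.current_term <= initial_term) continue;
    // Issue 1: test for a non-empty UUID, not just field presence.
    if (cs.leader_uuid.empty()) continue;
    // Issue 2: compare against the term from *this* snapshot, so a second
    // election that bumps the term again is picked up automatically.
    if (cs.last_committed_term == cs.current_term) return cs.current_term;
  }
  return std::nullopt;
}
```

Suppose the term moves 1 → 2 → 3 → 4 with leaderless intervals in between. The old approach would latch onto term 2 and wait for a commit that never arrives. This loop only returns once term 4 has both a leader and a committed op.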
Change-Id: I475d4a44c52f2da7fc42c93e9cf2f38e01735177
Reviewed-on: http://gerrit.cloudera.org:8080/7808
Reviewed-by: Adar Dembo
Tested-by: Todd Lipcon
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/tool_action_tablet.cc
M src/kudu/tserver/tablet_service.cc
4 files changed, 54 insertions(+), 48 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified

--
Gerrit-MessageType: merged
Gerrit-Change-Id: I475d4a44c52f2da7fc42c93e9cf2f38e01735177
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Todd Lipcon
Gerrit-Reviewer: Will Berkeley
