Adar Dembo has uploaded a new patch set (#4).

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

The included test is pretty ugly, especially the "feature flag" part. It
failed 1 of 1000 runs in DEBUG mode with the number of votes being equal, so
it's probably not robust enough to be merged as is.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 152 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/4
--
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
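
For readers unfamiliar with the approach, the short-circuiting described in the commit message (ignoring failures detected while a thread is in StartElection()) might look roughly like the sketch below. This is not the code from the patch: ElectionGuard, ReportFailure, and the counters are hypothetical names made up for illustration, and the real change lives in raft_consensus.cc.

```cpp
#include <atomic>
#include <functional>

// Hypothetical illustration only; not Kudu's RaftConsensus class.
class ElectionGuard {
 public:
  // Called when a failure detector reports a suspected leader failure.
  // Runs 'start_election' unless another call is already running it, in
  // which case the failure is ignored. Returns true if an election was
  // started by this call.
  bool ReportFailure(const std::function<void()>& start_election) {
    bool expected = false;
    // Atomically claim the "election start in progress" flag.
    if (!in_start_election_.compare_exchange_strong(expected, true)) {
      ++ignored_;  // Short-circuit: an election start is already underway.
      return false;
    }
    // Clear the flag on every exit path, including exceptions.
    struct Reset {
      std::atomic<bool>& flag;
      ~Reset() { flag.store(false); }
    } reset{in_start_election_};
    start_election();
    ++started_;
    return true;
  }

  int started() const { return started_.load(); }
  int ignored() const { return ignored_.load(); }

 private:
  std::atomic<bool> in_start_election_{false};
  std::atomic<int> started_{0};
  std::atomic<int> ignored_{0};
};
```

A failure reported while the callback is still running is dropped rather than queued, which is the "no stacking" behavior the commit message aims for. As the message notes, this only covers the window spent inside the election start itself, not the whole election.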
