Hello Mike Percy, Todd Lipcon,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/8107
to review the following change.
Change subject: KUDU-2149: avoid election stacking by restoring failure monitor
semantics
......................................................................
KUDU-2149: avoid election stacking by restoring failure monitor semantics
Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().
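For reviewers, the short-circuit idea can be sketched in isolation as below. This is a minimal illustration only, not the code in this patch: ElectionGate, ReportFailure(), and the run_election callback are hypothetical names, and an atomic flag is assumed to be enough to mark "a thread is in StartElection()".

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical stand-in for the consensus object; NOT Kudu's real class.
class ElectionGate {
 public:
  // run_election stands in for the slow election path (assumption).
  explicit ElectionGate(std::function<void()> run_election)
      : run_election_(std::move(run_election)) {}

  // Called whenever a failure is detected. Returns true if this call ran
  // an election, false if it was ignored because one was already running.
  bool ReportFailure() {
    bool expected = false;
    // Only one caller at a time may flip the flag from false to true.
    if (!in_election_.compare_exchange_strong(expected, true)) {
      ++ignored_;  // Short-circuit: drop the failure instead of stacking.
      return false;
    }
    // Clear the flag on every exit path, including exceptions.
    struct Reset {
      std::atomic<bool>& flag;
      ~Reset() { flag.store(false); }
    } reset{in_election_};
    run_election_();
    return true;
  }

  // Number of failures dropped so far.
  int ignored() const { return ignored_.load(); }

 private:
  std::function<void()> run_election_;
  std::atomic<bool> in_election_{false};
  std::atomic<int> ignored_{0};
};
```

A failure reported while run_election is still executing is counted and dropped, so at most one election is in flight per gate; later failures are handled normally once the flag is cleared.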
I spent some time trying to write a test for this but couldn't come up with
a reliable one, even after adding latency injection to
ConsensusMetadata::Flush().
This is a narrowly targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that will require more work.
Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 22 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/1
--
To view, visit http://gerrit.cloudera.org:8080/8107
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>