Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/8107

to review the following change.

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor 
semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

I spent some time trying to test this and couldn't come up with anything
reliable, even after adding latency injection to ConsensusMetadata::Flush().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 22 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/8107/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to