Adar Dembo has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10987


Change subject: KUDU-2149: avoid election stacking by restoring failure monitor 
semantics
......................................................................

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked
RaftConsensus::StartElection() synchronously, thus preventing it from
surfacing additional failures during that time. This patch attempts to
restore these semantics by short-circuiting and ignoring any failures
detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct
fix would be to completely disable failure detection while an election is
running, but that'll require more work.

Originally I had written a test that injects latency into
ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote
request RPCs. I couldn't get it to be totally robust, and the "feature flag"
used in the toggle is likely to become obselete quickly. So in the end I
decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Reviewed-on: http://gerrit.cloudera.org:8080/8107
Reviewed-by: Mike Percy <mpe...@apache.org>
Tested-by: Kudu Jenkins
(cherry picked from commit edd41cb40fbad206e2c356983baba8fbc57199b5)
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10987/1
--
To view, visit http://gerrit.cloudera.org:8080/10987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-Change-Number: 10987
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <a...@cloudera.com>

Reply via email to