[kudu-CR](branch-1.5.x) KUDU-2149: avoid election stacking by restoring failure monitor semantics
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10987 )

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
..

Patch Set 1: Verified+1 Code-Review+2

Overriding Jenkins: the Python build failed due to a versioning issue, but the C++ tests all passed.

--
To view, visit http://gerrit.cloudera.org:8080/10987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-Change-Number: 10987
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 19 Jul 2018 00:28:45 +
Gerrit-HasComments: No
[kudu-CR](branch-1.5.x) KUDU-2149: avoid election stacking by restoring failure monitor semantics
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/10987 )

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
..

Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)

--
To view, visit http://gerrit.cloudera.org:8080/10987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-Change-Number: 10987
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
[kudu-CR](branch-1.5.x) KUDU-2149: avoid election stacking by restoring failure monitor semantics
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10987 )

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
..

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked RaftConsensus::StartElection() synchronously, thus preventing it from surfacing additional failures during that time. This patch attempts to restore these semantics by short-circuiting and ignoring any failures detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct fix would be to completely disable failure detection while an election is running, but that'll require more work.

Originally I had written a test that injects latency into ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote request RPCs. I couldn't get it to be totally robust, and the "feature flag" used in the toggle is likely to become obsolete quickly. So in the end I decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Reviewed-on: http://gerrit.cloudera.org:8080/8107
Reviewed-by: Mike Percy
Tested-by: Kudu Jenkins
(cherry picked from commit edd41cb40fbad206e2c356983baba8fbc57199b5)
Reviewed-on: http://gerrit.cloudera.org:8080/10987
Reviewed-by: Adar Dembo
Tested-by: Adar Dembo
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/10987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-Change-Number: 10987
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
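
The short-circuit described in the commit message can be sketched roughly as follows. This is a minimal standalone illustration and not the code in raft_consensus.cc: the class ElectionSketch, its method ReportFailureDetected(), the injected election body, and the counters are all hypothetical names invented for this example. It only assumes that "a Raft thread is in StartElection()" can be tracked with an atomic flag that the failure-detection path checks before doing anything else.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical stand-in for a consensus instance; NOT Kudu's RaftConsensus.
class ElectionSketch {
 public:
  // 'election_body' simulates the work done while an election is starting
  // (for example, a slow metadata flush). It is an assumption of this sketch.
  explicit ElectionSketch(std::function<void(ElectionSketch&)> election_body)
      : election_body_(std::move(election_body)) {}

  // Called whenever the failure detector fires. Returns true if this call
  // started an election, false if the failure was ignored because another
  // election start was already in progress.
  bool ReportFailureDetected() {
    // exchange() returns the previous value: 'true' means some thread is
    // already inside the election path, so short-circuit and drop this one.
    if (in_election_.exchange(true)) {
      ++ignored_failures_;
      return false;
    }
    // Clear the flag on every exit path, including exceptions.
    struct ClearFlag {
      std::atomic<bool>& flag;
      ~ClearFlag() { flag.store(false); }
    } clear_on_exit{in_election_};

    ++elections_started_;
    election_body_(*this);
    return true;
  }

  int elections_started() const { return elections_started_.load(); }
  int ignored_failures() const { return ignored_failures_.load(); }

 private:
  std::function<void(ElectionSketch&)> election_body_;
  std::atomic<bool> in_election_{false};
  std::atomic<int> elections_started_{0};
  std::atomic<int> ignored_failures_{0};
};
```

In this sketch, failures reported while the election body is running are counted and dropped rather than queued, so elections cannot stack up behind a slow one. As the commit message notes, this only approximates the old synchronous behaviour; it does not disable failure detection itself while an election is running.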
[kudu-CR](branch-1.5.x) KUDU-2149: avoid election stacking by restoring failure monitor semantics
Adar Dembo has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10987 )

Change subject: KUDU-2149: avoid election stacking by restoring failure monitor semantics
..

KUDU-2149: avoid election stacking by restoring failure monitor semantics

Prior to commit 21b0f3d, the dedicated failure monitor thread invoked RaftConsensus::StartElection() synchronously, thus preventing it from surfacing additional failures during that time. This patch attempts to restore these semantics by short-circuiting and ignoring any failures detected while a Raft thread is in StartElection().

This is a super targeted fix geared towards a point release; a more correct fix would be to completely disable failure detection while an election is running, but that'll require more work.

Originally I had written a test that injects latency into ConsensusMetadata::Flush(), toggles the fix, and compares the number of vote request RPCs. I couldn't get it to be totally robust, and the "feature flag" used in the toggle is likely to become obsolete quickly. So in the end I decided to drop the test from the patch.

Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Reviewed-on: http://gerrit.cloudera.org:8080/8107
Reviewed-by: Mike Percy
Tested-by: Kudu Jenkins
(cherry picked from commit edd41cb40fbad206e2c356983baba8fbc57199b5)
---
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
2 files changed, 23 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/10987/1

--
To view, visit http://gerrit.cloudera.org:8080/10987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifeaf99ce57f7d5cd01a6c786c178567a98438ced
Gerrit-Change-Number: 10987
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo