Hello Kudu Jenkins, Todd Lipcon,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/10237
to look at the new patch set (#2).
Change subject: [tests] fixed flake in consensus_peer_health_status
......................................................................
[tests] fixed flake in consensus_peer_health_status
Fixed a flake in the TestPeerHealthStatusTransitions scenario of the
ConsensusPeerHealthStatusITest test. Prior to the fix, the flakiness
happened when the target tablet server was shut down during an ongoing
tablet copy. That tablet copy had been initiated by an AddServer() call
while preparing the mini-cluster for the peer health sequence
"HEALTHY -> UNKNOWN -> FAILED -> FAILED_UNRECOVERABLE".
In that situation, the source tablet server kept the corresponding WAL
segments anchored for the tablet copy session, so they could not be GCed.
As a result, the tablet replica would not reach the FAILED_UNRECOVERABLE
health status within 30 seconds. The anchors stay in place until the idle
tablet copy session expires, and --tablet_copy_idle_timeout_sec is set
to 600 seconds by default.
Test results from dist-test (ASAN build), before and after the fix:
before:
http://dist-test.cloudera.org/job?job_id=aserbin.1525111507.120645
after:
http://dist-test.cloudera.org/job?job_id=aserbin.1525099526.63998
Change-Id: I8eb640604e98361029aa3342ffa3050e922b6629
---
M src/kudu/integration-tests/consensus_peer_health_status-itest.cc
1 file changed, 6 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/10237/2
--
To view, visit http://gerrit.cloudera.org:8080/10237
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8eb640604e98361029aa3342ffa3050e922b6629
Gerrit-Change-Number: 10237
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>