Alexey Serbin has uploaded this change for review.
( http://gerrit.cloudera.org:8080/18967 )
Change subject: [tests] fix flakiness in TestFailDuringScanWorkload
......................................................................
[tests] fix flakiness in TestFailDuringScanWorkload
This patch fixes flakiness in the
TabletServerDiskErrorITest.TestFailDuringScanWorkload scenario.
There was a prior attempt to make the scenario more stable [1],
but it did not rule out sporadic test failures caused by
* various scheduler anomalies
* the random distribution of replicas that the client chooses to
  read data from
As a result, the scenario still failed in about 1 out of 10 runs for
RELEASE and ASAN builds [2].
To eliminate the flakiness, it's necessary to make sure that
* the dedicated tablet server hosts at least one replica that the
  client tries to read data from
* scan requests arrive at tablet replicas hosted by the dedicated
  tablet server only after IO failures have been injected
This patch does so by
* having more control over the selection of tablet replicas that
  the client sends scan requests to
* starting the scan operation only after injecting IO errors
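The ordering requirement above can be illustrated with a toy model. This
is only a sketch and not the code of the patch: FakeTabletServer,
ScanAfterInjection, and ScanBeforeInjection are hypothetical names that
do not exist in Kudu, and a real scan involves the Kudu client and RPCs
rather than a boolean flag.

```cpp
// Hypothetical stand-in for the dedicated tablet server; not a Kudu class.
struct FakeTabletServer {
  bool io_errors_injected = false;

  // Returns true if this scan request runs into the injected IO error.
  bool ScanHitsInjectedError() const { return io_errors_injected; }
};

// Deterministic order: inject the IO errors first, then start the scan,
// so the scan is guaranteed to reach the server after the injection.
bool ScanAfterInjection(FakeTabletServer* ts) {
  ts->io_errors_injected = true;
  return ts->ScanHitsInjectedError();
}

// Unlucky order: the scan reaches the server before the errors are
// injected, so it never observes the failure the scenario relies on.
bool ScanBeforeInjection(FakeTabletServer* ts) {
  const bool hit = ts->ScanHitsInjectedError();
  ts->io_errors_injected = true;
  return hit;
}
```

In this model ScanAfterInjection() always observes the injected error,
while ScanBeforeInjection() never does. When the order is left to the
scheduler, either outcome is possible, which is the kind of flakiness
described above.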
To verify the fix, I ran the test scenario built in the ASAN configuration
with and without this patch. Without this patch, 96 out of 1024 runs
failed [3]. With the patch applied, 0 out of 1024 runs failed [4].
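As a rough sanity check of these numbers, the sketch below computes the
observed failure rate and the probability of seeing no failures at all
in a batch of runs if that rate were unchanged. It assumes the runs are
independent, which is a simplification; FailureRate and ProbAllPass are
illustrative helper names, not part of the patch.

```cpp
#include <cmath>

// Observed failure rate: failed runs divided by total runs.
double FailureRate(int failed, int total) {
  return static_cast<double>(failed) / total;
}

// Probability that all `runs` runs pass when each run fails independently
// with probability `rate` (the independence is an assumption).
double ProbAllPass(double rate, int runs) {
  return std::pow(1.0 - rate, runs);
}
```

FailureRate(96, 1024) is 0.09375, which matches the "about 1 out of 10"
figure. Under the independence assumption, ProbAllPass(0.09375, 1024)
is far below 1e-40, so 1024 clean runs are very unlikely to be a lucky
streak at the old failure rate.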
[1] https://github.com/apache/kudu/commit/ccbbfb3006314f2c37f3a40bfec355db9fc90e02
[2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=disk_failure-itest
[3] http://dist-test.cloudera.org/job?job_id=aserbin.1662847551.105230
[4] http://dist-test.cloudera.org/job?job_id=aserbin.1662873124.94488
Change-Id: Ia29bfdc9761139426344532bab3e5d0b3c1b12ad
---
M src/kudu/integration-tests/disk_failure-itest.cc
1 file changed, 34 insertions(+), 10 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18967/1
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia29bfdc9761139426344532bab3e5d0b3c1b12ad
Gerrit-Change-Number: 18967
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>