Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18967 )


Change subject: [tests] fix flakiness in TestFailDuringScanWorkload
......................................................................

[tests] fix flakiness in TestFailDuringScanWorkload

This patch fixes flakiness in the
TabletServerDiskErrorITest.TestFailDuringScanWorkload scenario.
There was a prior attempt to make the scenario more stable [1],
but it did not rule out sporadic test failures due to
  * various scheduler anomalies
  * the random distribution of replicas the client chooses to read
    data from

As a result, the scenario was still failing in about 1 out of 10 runs
for RELEASE and ASAN builds [2].

To eliminate the flakiness, it's necessary to make sure that
  * the dedicated tablet server ends up with at least one replica
    from which the client tries to fetch the data
  * scan requests arrive at tablet replicas hosted by the dedicated
    tablet server only after IO failures have been injected
This patch does so by
  * having more control over the selection of tablet replicas that
    the client sends scan requests to
  * starting the scan operation only after injecting IO errors

To verify the fix, I ran the test scenario built in the ASAN configuration
with and without this patch.  Without this patch, 96 out of 1024 runs
failed [3].  With the patch applied, 0 out of 1024 runs failed [4].

[1] https://github.com/apache/kudu/commit/ccbbfb3006314f2c37f3a40bfec355db9fc90e02
[2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=disk_failure-itest
[3] http://dist-test.cloudera.org/job?job_id=aserbin.1662847551.105230
[4] http://dist-test.cloudera.org/job?job_id=aserbin.1662873124.94488

Change-Id: Ia29bfdc9761139426344532bab3e5d0b3c1b12ad
---
M src/kudu/integration-tests/disk_failure-itest.cc
1 file changed, 34 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/18967/1
--
To view, visit http://gerrit.cloudera.org:8080/18967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia29bfdc9761139426344532bab3e5d0b3c1b12ad
Gerrit-Change-Number: 18967
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
