Hello Todd Lipcon,
I'd like you to do a code review. Please visit
to review the following change.
Change subject: test_util: include shard number in test scratch directory name
test_util: include shard number in test scratch directory name
Amongst other things, build-and-test.sh checks that every test cleans up its
scratch directory. It only performs this check if all tests pass, because if
a test fails, it'll leave behind its scratch directory for inspection.
This gets a little murky when considering flaky tests, because if one fails,
it is retried a few times. As such, it's important that when run-test.sh
prepares to rerun a flaky test, it cleans up the scratch directory left
behind by the failed run. Unfortunately, run-test.sh isn't really aware of
what it needs to clean up, because each test in the test executable creates
its own test-specific scratch directory within a common root. To account for
this, run-test.sh figures out what to clean using the following algorithm:
1. Before the test, capture the list of all directories within TEST_TMPDIR.
2. After the test, repeat the capture.
3. Diff the two captures to extract a list of directories that exist after
but not before.
4. Search for the test name as a prefix in the extracted list.
While complex, the algorithm does match and delete all scratch directories
that were left behind by a particular test executable, even if TEST_TMPDIR
is in use by other tests running in parallel. However, sharded tests break
the algorithm, because the test name now includes a shard index, and that
index isn't included in the scratch directory's name.
I experimented with various fixes that included giving run-test.sh more
ownership over TEST_TMPDIR, but eventually settled on a smaller and more
targeted fix: if the test is being sharded, include the shard index in the
scratch directory name.
I tested this patch by running:
KUDU_FLAKY_TEST_LIST=<(echo raft_consensus-itest) \
ctest -j8 -R raft_consensus-itest
In the background, I also induced stress on the machine. This caused
RaftConsensusParamReplicationModesITest.Test_KUDU_1735/1 to flake and fail
from time to time.
Without the patch, the flaky test's scratch directory was always left behind
in TEST_TMPDIR. With it, the directory was always gone, unless the test
failed every time it was run, which is expected behavior.
1 file changed, 10 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/10069/1
To view, visit http://gerrit.cloudera.org:8080/10069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>