Todd Lipcon has submitted this change and it was merged. ( )

Change subject: test_util: include shard number in test scratch directory name

test_util: include shard number in test scratch directory name

Amongst other things, checks that every test cleans up its
scratch directory. It only performs this check if all tests pass, because if
a test fails, it'll leave behind its scratch directory for inspection.

This gets a little murky when considering flaky tests, because if one fails,
it is retried a few times. As such, it's important that when
prepares to rerun a flaky test, it cleans up the scratch directory left
behind by the failed run. Unfortunately, isn't really aware of
what it needs to clean up, because each test in the test executable creates
its own test-specific scratch directory within a common root. To account for
this, figures out what to clean using the following algorithm:
1. Before the test, capture the list of all directories within TEST_TMPDIR.
2. After the test, repeat the capture.
3. Diff the two captures to extract a list of directories that exist after
   but not before.
4. Search for the test name as a prefix in the extracted list.

While complex, the algorithm does match and delete all scratch directories
that were left behind by a particular test executable, even if TEST_TMPDIR
is in use by other tests running in parallel. However, sharded tests break
the algorithm, because the test name now includes a shard index, and that
index isn't included in the scratch directory's name.

I experimented with various fixes that included giving more
ownership over TEST_TMPDIR, but eventually settled on a smaller and more
targeted fix: if the test is being sharded, include the shard index in the
scratch directory name.

I tested this patch by running:

  TEST_TMPDIR=/tmp/kudutest-1000 \
  KUDU_FLAKY_TEST_LIST=<(echo raft_consensus-itest) \
    ctest -j8 -R raft_consensus-itest

In the background, I also induced stress on the machine. This caused
RaftConsensusParamReplicationModesITest.Test_KUDU_1735/1 to flake and fail
from time to time.

Without the patch, the flaky test's scratch directory was always left behind
in TEST_TMPDIR. With it, the directory was always gone, unless the test
failed every time it was run, which is expected behavior.

Change-Id: Ie5686eb7c3a011a94b6c8c4eadbedb23acbf6e60
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <>
M src/kudu/util/
1 file changed, 10 insertions(+), 1 deletion(-)

  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

To view, visit
To unsubscribe, visit

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie5686eb7c3a011a94b6c8c4eadbedb23acbf6e60
Gerrit-Change-Number: 10069
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>

Reply via email to