Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11374 )
Change subject: build: retry failed tests by default ...................................................................... Patch Set 1: (5 comments) Summarizing/reposting some discussion we had on Slack. I'm aware of a few places we currently run all tests, configured in the following ways: 1. during pre-commit, we run all tests with dist test and retry Java tests up to 3 times 2. periodic runs of all tests, we run all tests with dist test and retry Java tests up to 3 times 3. flaky-test exploration, we run all tests without dist test, and retry flaky tests (C++ only) up to 3 times, and Java tests up to 3 times 4. flaky-test drilldown, we run only the flaky tests without dist test, and retry flaky tests (C++ only) up to 3 times (no Java tests get run, at least until we start reporting Java flakes) Regarding each point and how they would change if the default for dist-test vs non-dist-test runs were to be bumped. 1. it's important be mindful of introducing new flakies. There have been cases where a Java flaky test was introduced, and wasn't caught in pre-commit because we retry Java tests. If the default for dist test were bumped, there would be less friction in pre-commit flakes, but our flaky-catching would be only as good as our vigilance of the flaky dashboard. 2. similar to the pre-commit case, but by default we'd expect lower failure rates. 3. bumping the non-dist-test retry default, would go unchanged since flaky tests are already retried up to 3 times. 4. bumping the non-dist-test retry default, non-flaky tests would now be retried up to 3 times. The current PS2 does _not_ update the dist-test default because: - Developers actively use dist-test to test potentially broken tests. Retrying in these cases by default doesn't seem helpful. - The concerns about pre-commit made in the above bullet 1. - The bonus in bullet 2 isn't always a bonus. If we want to see this, it should be an active decision by setting KUDU_FLAKY_TEST_ATTEMPTS. http://gerrit.cloudera.org:8080/#/c/11374/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11374/1//COMMIT_MSG@13 PS1, Line 13: retry tests all tests > Nit: retry all failed tests Done http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py File build-support/dist_test.py: http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py@61 PS1, Line 61: # Whether to retry all failed C++ tests, rather than just known flaky tests. : # Since Java flaky tests are not reported by the test server, Java tests are : # always retried, regardless of this value. : RETRY_ALL_TESTS = int(os.environ.get('KUDU_RETRY_ALL_FAILED_TESTS', 0)) > This no longer exists, right? I mean, we no longer read this env var in run Yeah, good catch. http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py@376 PS1, Line 376: flaky_test_set=set()): > Doesn't seem like we're using this anymore either. Done http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/jenkins/build-and-test.sh@40 PS1, Line 40: # KUDU_FLAKY_TEST_ATTEMPTS Default: 1 > Shouldn't we default this to 3 to honor the intent of this change? Done. I didn't update the default in dist_test.py, since that's widely used for testing things that may well be broken/very flaky. http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/run-test.sh File build-support/run-test.sh: http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/run-test.sh@27 PS1, Line 27: This can be used in the gerrit workflow : # to prevent annoying false-negatives. > This script feels a little too far removed from gerrit to mention it. Done -- To view, visit http://gerrit.cloudera.org:8080/11374 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6fc069527d6f2553a4e04e14c821090f826cbc0a Gerrit-Change-Number: 11374 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Sat, 01 Sep 2018 00:54:32 +0000 Gerrit-HasComments: Yes
