Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11374 )

Change subject: build: retry failed tests by default
......................................................................


Patch Set 1:

(5 comments)

Summarizing/reposting some discussion we had on Slack. I'm aware of a few 
places we currently run all tests, configured in the following ways:
1. during pre-commit, we run all tests with dist test and retry Java tests up 
to 3 times
2. periodic runs of all tests, we run all tests with dist test and retry Java 
tests up to 3 times
3. flaky-test exploration, we run all tests without dist test, and retry flaky 
tests (C++ only) up to 3 times, and Java tests up to 3 times
4. flaky-test drilldown, we run only the flaky tests without dist test, and 
retry flaky tests (C++ only) up to 3 times (no Java tests get run, at least 
until we start reporting Java flakes)

Regarding each point and how they would change if the default for dist-test vs 
non-dist-test runs were to be bumped.
1. it's important be mindful of introducing new flakies. There have been cases 
where a Java flaky test was introduced, and wasn't caught in pre-commit because 
we retry Java tests. If the default for dist test were bumped, there would be 
less friction in pre-commit flakes, but our flaky-catching would be only as 
good as our vigilance of the flaky dashboard.
2. similar to the pre-commit case, but by default we'd expect lower failure 
rates.
3. bumping the non-dist-test retry default, would go unchanged since flaky 
tests are already retried up to 3 times.
4. bumping the non-dist-test retry default, non-flaky tests would now be 
retried up to 3 times.

The current PS2 does _not_ update the dist-test default because:
- Developers actively use dist-test to test potentially broken tests. Retrying 
in these cases by default doesn't seem helpful.
- The concerns about pre-commit made in the above bullet 1.
- The bonus in bullet 2 isn't always a bonus. If we want to see this, it should 
be an active decision by setting KUDU_FLAKY_TEST_ATTEMPTS.

http://gerrit.cloudera.org:8080/#/c/11374/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11374/1//COMMIT_MSG@13
PS1, Line 13: retry tests all tests
> Nit: retry all failed tests
Done


http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py@61
PS1, Line 61: # Whether to retry all failed C++ tests, rather than just known 
flaky tests.
            : # Since Java flaky tests are not reported by the test server, 
Java tests are
            : # always retried, regardless of this value.
            : RETRY_ALL_TESTS = 
int(os.environ.get('KUDU_RETRY_ALL_FAILED_TESTS', 0))
> This no longer exists, right? I mean, we no longer read this env var in run
Yeah, good catch.


http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/dist_test.py@376
PS1, Line 376:                      flaky_test_set=set()):
> Doesn't seem like we're using this anymore either.
Done


http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/jenkins/build-and-test.sh@40
PS1, Line 40: #   KUDU_FLAKY_TEST_ATTEMPTS  Default: 1
> Shouldn't we default this to 3 to honor the intent of this change?
Done. I didn't update the default in dist_test.py, since that's widely used for 
testing things that may well be broken/very flaky.


http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/run-test.sh
File build-support/run-test.sh:

http://gerrit.cloudera.org:8080/#/c/11374/1/build-support/run-test.sh@27
PS1, Line 27: This can be used in the gerrit workflow
            : # to prevent annoying false-negatives.
> This script feels a little too far removed from gerrit to mention it.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11374
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6fc069527d6f2553a4e04e14c821090f826cbc0a
Gerrit-Change-Number: 11374
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Sat, 01 Sep 2018 00:54:32 +0000
Gerrit-HasComments: Yes

Reply via email to