Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11374 )
Change subject: build: retry all failed tests ...................................................................... Patch Set 4: (3 comments) > Patch Set 3: > > (3 comments) > > > Ah, actually pre-commit uses build-and-test too, just with enabled > > dist-test. So changing the default value in build-and-test.sh does affect > > the pre-commit case. > > We have to choose between two options: 1) the status quo, wherein only tests > from the flaky test list can be retried, or 2) this patch, wherein all tests > can be retried. Neither are particularly good for the reasons you've > outlined: #1 means we don't let ourselves retry tests that fail once per > week, and #2 means we can potentially introduce new flaky tests without > noticing at precommit time. > > If you're open to modifying test_result_server.py, we can explore other > options. For example, we could use the flaky database to figure out if a > particular test is "new" (i.e. has no runs) and not retry it, but retry the > rest. I'm still for #2, at the very least for C++ tests, we do have visibility into the flakiness of tests. Empirically, it seems like a majority of failed pre-commits come from flaky tests that aren't retried. I'm open to modifying test_result_server.py, maybe by extending the duration of failed test tracking (i.e. making the flaky list longer). Also empirically, it doesn't seem like "new" flaky tests aren't actually new tests, but tests that happen to be flaky less frequently than once a week. http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py File build-support/dist_test.py: http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py@59 PS3, Line 59: NUM_TEST_RETRIES = int(os.environ.get('KUDU_FLAKY_TEST_ATTEMPTS', 1)) - 1 > Might want to rename/recomment this so it reflects the fact that _any_ fail Agreed re KUDU_FLAKY_TEST_ATTEMPTS, done http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py@395 PS3, Line 395: }] * replicate_tasks > Can just put FLAKY_TEST_RETRIES directly in here. Done http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/run-test.sh File build-support/run-test.sh: http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/run-test.sh@71 PS3, Line 71: TEST_EXECUTION_ATTEMPTS=${KUDU_FLAKY_TEST_ATTEMPTS:-1} > dist-test uses run-test.sh, so won't this cause any dist-test triggered tes Right, since dist-test doesn't pass in this environment variable, it would've been retried 3 times in dist test. -- To view, visit http://gerrit.cloudera.org:8080/11374 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6fc069527d6f2553a4e04e14c821090f826cbc0a Gerrit-Change-Number: 11374 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 04 Sep 2018 18:46:26 +0000 Gerrit-HasComments: Yes
