Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11374 )

Change subject: build: retry all failed tests
......................................................................


Patch Set 4:

(3 comments)

> Patch Set 3:
>
> (3 comments)
>
> > Ah, actually pre-commit uses build-and-test too, just with enabled 
> > dist-test. So changing the default value in build-and-test.sh does affect 
> > the pre-commit case.
>
> We have to choose between two options: 1) the status quo, wherein only tests 
> from the flaky test list can be retried, or 2) this patch, wherein all tests 
> can be retried. Neither are particularly good for the reasons you've 
> outlined: #1 means we don't let ourselves retry tests that fail once per 
> week, and #2 means we can potentially introduce new flaky tests without 
> noticing at precommit time.
>
> If you're open to modifying test_result_server.py, we can explore other 
> options. For example, we could use the flaky database to figure out if a 
> particular test is "new" (i.e. has no runs) and not retry it, but retry the 
> rest.

I'm still for #2, at the very least for C++ tests, we do have visibility into 
the flakiness of tests. Empirically, it seems like a majority of failed 
pre-commits come from flaky tests that aren't retried.

I'm open to modifying test_result_server.py, maybe by extending the duration of 
failed test tracking (i.e. making the flaky list longer). Also empirically, it 
doesn't seem like "new" flaky tests aren't actually new tests, but tests that 
happen to be flaky less frequently than once a week.

http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py@59
PS3, Line 59: NUM_TEST_RETRIES = int(os.environ.get('KUDU_FLAKY_TEST_ATTEMPTS', 
1)) - 1
> Might want to rename/recomment this so it reflects the fact that _any_ fail
Agreed re KUDU_FLAKY_TEST_ATTEMPTS, done


http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/dist_test.py@395
PS3, Line 395:                }] * replicate_tasks
> Can just put FLAKY_TEST_RETRIES directly in here.
Done


http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/run-test.sh
File build-support/run-test.sh:

http://gerrit.cloudera.org:8080/#/c/11374/3/build-support/run-test.sh@71
PS3, Line 71: TEST_EXECUTION_ATTEMPTS=${KUDU_FLAKY_TEST_ATTEMPTS:-1}
> dist-test uses run-test.sh, so won't this cause any dist-test triggered tes
Right, since dist-test doesn't pass in this environment variable, it would've 
been retried 3 times in dist test.



--
To view, visit http://gerrit.cloudera.org:8080/11374
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6fc069527d6f2553a4e04e14c821090f826cbc0a
Gerrit-Change-Number: 11374
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 04 Sep 2018 18:46:26 +0000
Gerrit-HasComments: Yes

Reply via email to