Yes, I think that this makes more sense than retrying. What is the current behavior when a patch introduces substantial flakiness? E.g. this scenario:
- First test run produces 5 failures. - Second test run produces 5 different failures. - Clean re-run produces no failures. This looks like decent evidence to make the EWS bubble red. I don't know what exactly the threshold should be, but that should certainly be below 30. - Alexey > 3 дек. 2019 г., в 8:40 AM, Aakash Jain <aakash_j...@apple.com> написал(а): > > Hi Everyone, > > We have various layout-tests which are flaky (which sometimes pass and > sometimes fail/crash/timeout). EWS needs to work despite these flaky tests, > and need to be able to tell whether the patch being tested introduced any > test failure or not. > > In EWS, we have logic (same logic in both old and new EWS) on how to deal > with flaky tests. The logic is roughly following: > > - We apply the patch and build. > > - Run layout-tests. During the test-run, we retry the failed tests. If those > tests pass in retry, the run is considered to have no failing test (this > retry part is same for EWS / build.webkit.org / manual-run and part of > run-webkit-test script itself). > > - If a test-run has some failures, EWS retry the test-run one more time (to > check if those test failures were flaky). > > - Then we do one more test-run on clean-tree (without the patch), and compare > the results of first run, second run, and clean tree run. > > - After that we analyze the results from all three test-runs (which we call > first_run, second_run and clean_tree_run). > > > If there are different test failures in first and second runs (i.e.: flaky > test failures), and the patch being tested did not introduce any new test > failure, we retry the entire build (probably hoping that next time we will > not see the flakiness). This retry can result in almost infinite loop, if > someone commits a very flaky test (which is not rare), and would cause EWS to > be almost stuck until the flakiness is fixed. > > From > https://trac.webkit.org/browser/webkit/trunk/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py#L352 > ('Defer' means retrying the build). > ''' > 350 # At this point we know that at least one test flaked, but no > consistent failures > 351 # were introduced. This is a bit of a grey-zone. > 352 return False # Defer patch > ''' > > > Retrying the entire build, just because of few flaky tests seems excessive > and inefficient. This doesn't help identify flaky tests, and only delays the > EWS result. Instead, we should mark the build as SUCCESS (since the patch did > not introduce any new consistent test failure). > > In other EWS test-suites like API tests and JSC tests, we do not retry the > entire build on flaky test failures, we ignore the flaky tests, and only > focus on tests which failed consistently. We should do the similar thing for > layout-tests as well. > > I am going to make that change for layout tests in > https://bugs.webkit.org/show_bug.cgi?id=204769. Please let me know if anyone > has a different opinion. > > Thanks > Aakash
_______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev