[webkit-dev] Handling flaky layout-test failures in EWS

Aakash Jain Tue, 03 Dec 2019 08:44:46 -0800

Hi Everyone,

We have various layout-tests which are flaky (which sometimes pass and 
sometimes fail/crash/timeout). EWS needs to work despite these flaky tests, and 
need to be able to tell whether the patch being tested introduced any test 
failure or not.

In EWS, we have logic (same logic in both old and new EWS) on how to deal with
flaky tests. The logic is roughly following:

- We apply the patch and build.

- Run layout-tests. During the test-run, we retry the failed tests. If those
tests pass in retry, the run is considered to have no failing test (this retry
part is same for EWS / build.webkit.org / manual-run and part of
run-webkit-test script itself).

- If a test-run has some failures, EWS retry the test-run one more time (to
check if those test failures were flaky).

- Then we do one more test-run on clean-tree (without the patch), and compare
the results of first run, second run, and clean tree run.

- After that we analyze the results from all three test-runs (which we call
first_run, second_run and clean_tree_run).

If there are different test failures in first and second runs (i.e.: flaky test
failures), and the patch being tested did not introduce any new test failure,
we retry the entire build (probably hoping that next time we will not see the
flakiness). This retry can result in almost infinite loop, if someone commits a
very flaky test (which is not rare), and would cause EWS to be almost stuck
until the flakiness is fixed.

>From
>https://trac.webkit.org/browser/webkit/trunk/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py#L352
> ('Defer' means retrying the build).
'''
350 # At this point we know that at least one test flaked, but no consistent
failures
351 # were introduced. This is a bit of a grey-zone.
352 return False # Defer patch
'''

Retrying the entire build, just because of few flaky tests seems excessive and
inefficient. This doesn't help identify flaky tests, and only delays the EWS
result. Instead, we should mark the build as SUCCESS (since the patch did not
introduce any new consistent test failure).

In other EWS test-suites like API tests and JSC tests, we do not retry the
entire build on flaky test failures, we ignore the flaky tests, and only focus
on tests which failed consistently. We should do the similar thing for
layout-tests as well.

I am going to make that change for layout tests in
https://bugs.webkit.org/show_bug.cgi?id=204769. Please let me know if anyone
has a different opinion.

Thanks
Aakash
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev

[webkit-dev] Handling flaky layout-test failures in EWS

Reply via email to