Yes, I think that this makes more sense than retrying.

What is the current behavior when a patch introduces substantial flakiness? 
E.g. this scenario:

- First test run produces 5 failures.
- Second test run produces 5 different failures.
- Clean re-run produces no failures.

This looks like decent evidence to make the EWS bubble red. I don't know what 
exactly the threshold should be, but that should certainly be below 30.

- Alexey

> 3 дек. 2019 г., в 8:40 AM, Aakash Jain <> написал(а):
> Hi Everyone,
> We have various layout-tests which are flaky (which sometimes pass and 
> sometimes fail/crash/timeout). EWS needs to work despite these flaky tests, 
> and need to be able to tell whether the patch being tested introduced any 
> test failure or not.
> In EWS, we have logic (same logic in both old and new EWS) on how to deal 
> with flaky tests. The logic is roughly following:
> - We apply the patch and build.
> - Run layout-tests. During the test-run, we retry the failed tests. If those 
> tests pass in retry, the run is considered to have no failing test (this 
> retry part is same for EWS / / manual-run and part of 
> run-webkit-test script itself).
> - If a test-run has some failures, EWS retry the test-run one more time (to 
> check if those test failures were flaky). 
> - Then we do one more test-run on clean-tree (without the patch), and compare 
> the results of first run, second run, and clean tree run.
> - After that we analyze the results from all three test-runs (which we call 
> first_run, second_run and clean_tree_run). 
> If there are different test failures in first and second runs (i.e.: flaky 
> test failures), and the patch being tested did not introduce any new test 
> failure, we retry the entire build (probably hoping that next time we will 
> not see the flakiness). This retry can result in almost infinite loop, if 
> someone commits a very flaky test (which is not rare), and would cause EWS to 
> be almost stuck until the flakiness is fixed.
> From 
>   ('Defer' means retrying the build).
> '''
> 350    # At this point we know that at least one test flaked, but no 
> consistent failures
> 351    # were introduced. This is a bit of a grey-zone.
> 352    return False  # Defer patch
> '''
> Retrying the entire build, just because of few flaky tests seems excessive 
> and inefficient. This doesn't help identify flaky tests, and only delays the 
> EWS result. Instead, we should mark the build as SUCCESS (since the patch did 
> not introduce any new consistent test failure).
> In other EWS test-suites like API tests and JSC tests, we do not retry the 
> entire build on flaky test failures, we ignore the flaky tests, and only 
> focus on tests which failed consistently. We should do the similar thing for 
> layout-tests as well.
> I am going to make that change for layout tests in 
> Please let me know if anyone 
> has a different opinion.
> Thanks
> Aakash

webkit-dev mailing list

Reply via email to