Yes, I think that this makes more sense than retrying.

What is the current behavior when a patch introduces substantial flakiness? 
E.g. this scenario:

- First test run produces 5 failures.
- Second test run produces 5 different failures.
- Clean re-run produces no failures.

This looks like decent evidence to make the EWS bubble red. I don't know what 
exactly the threshold should be, but that should certainly be below 30.

- Alexey


> 3 дек. 2019 г., в 8:40 AM, Aakash Jain <aakash_j...@apple.com> написал(а):
> 
> Hi Everyone,
> 
> We have various layout-tests which are flaky (which sometimes pass and 
> sometimes fail/crash/timeout). EWS needs to work despite these flaky tests, 
> and need to be able to tell whether the patch being tested introduced any 
> test failure or not.
> 
> In EWS, we have logic (same logic in both old and new EWS) on how to deal 
> with flaky tests. The logic is roughly following:
> 
> - We apply the patch and build.
> 
> - Run layout-tests. During the test-run, we retry the failed tests. If those 
> tests pass in retry, the run is considered to have no failing test (this 
> retry part is same for EWS / build.webkit.org / manual-run and part of 
> run-webkit-test script itself).
> 
> - If a test-run has some failures, EWS retry the test-run one more time (to 
> check if those test failures were flaky). 
> 
> - Then we do one more test-run on clean-tree (without the patch), and compare 
> the results of first run, second run, and clean tree run.
> 
> - After that we analyze the results from all three test-runs (which we call 
> first_run, second_run and clean_tree_run). 
> 
> 
> If there are different test failures in first and second runs (i.e.: flaky 
> test failures), and the patch being tested did not introduce any new test 
> failure, we retry the entire build (probably hoping that next time we will 
> not see the flakiness). This retry can result in almost infinite loop, if 
> someone commits a very flaky test (which is not rare), and would cause EWS to 
> be almost stuck until the flakiness is fixed.
> 
> From 
> https://trac.webkit.org/browser/webkit/trunk/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py#L352
>   ('Defer' means retrying the build).
> '''
> 350    # At this point we know that at least one test flaked, but no 
> consistent failures
> 351    # were introduced. This is a bit of a grey-zone.
> 352    return False  # Defer patch
> '''
> 
> 
> Retrying the entire build, just because of few flaky tests seems excessive 
> and inefficient. This doesn't help identify flaky tests, and only delays the 
> EWS result. Instead, we should mark the build as SUCCESS (since the patch did 
> not introduce any new consistent test failure).
> 
> In other EWS test-suites like API tests and JSC tests, we do not retry the 
> entire build on flaky test failures, we ignore the flaky tests, and only 
> focus on tests which failed consistently. We should do the similar thing for 
> layout-tests as well.
> 
> I am going to make that change for layout tests in 
> https://bugs.webkit.org/show_bug.cgi?id=204769. Please let me know if anyone 
> has a different opinion.
> 
> Thanks
> Aakash


_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev

Reply via email to