On 24/12/2021 15:00, Michael Catanzaro via webkit-dev wrote:
> On Fri, Dec 24 2021 at 12:44:49 AM +0000, Carlos Alberto Lopez Perez via
> webkit-dev <webkit-dev@lists.webkit.org> wrote:
>> So we ended deploying a different version of the EWS that has a much
>> higher tolerance to pre-existent failures (up to 500 before exiting
>> early) and also that tries hard to discard pre-existent failures and
>> flakies by repeating each failure 10 times with patch and 10 times
>> without it. [1]
>
> Mixed thoughts on this:
>
> (1) Good job. Having layout tests on EWS is a great improvement. We've
> been talking about this for a long time, and you finally made it happen!
>
> (2) That you needed to use such a big hammer to make the EWS work
> reliably suggests either that either WebKitGTK quality or WebKit test
> quality is quite low. I'm sure it's a mix of both, but mostly the
> former, because test flakiness is not this severe for Apple ports. This
> is not encouraging.
>

Sorry, but I don't agree with your conclusion about quality. Let me explain in more detail the factors that contribute to this issue with the tests:

1) Number of unexpected failures on the clean tree

The higher number of unexpected failures on the clean tree is caused mainly by the following:

1.1) Until now we didn't have an EWS, so it was pretty hard (if not impossible) for a developer to notice that a patch was going to break GTK tests. This didn't help to keep breaking patches from landing.

1.2) We don't have a rule to roll back patches that break GTK tests. If a patch lands adding unexpected failures for GTK, those usually stay there until one of our gardeners has time to fix the issue or mark the new failure as expected. Also, having such a rule wouldn't have made sense before having an EWS that developers can use.

1.3) We don't have anyone working full-time on gardening. We try to share the effort between us on a best-effort basis, so unexpected failures, once landed, can remain there for days until they are gardened.

1.4) Patches landing via commit-queue run layout tests on Mac before landing. So a patch won't land if it breaks layout tests on Mac, but it will land anyway if it breaks tests on GTK.

2) Number of unexpected flaky tests

2.1) It is true that we have a higher number of flaky tests than the Apple ports. But flakiness is also a problem there: it is not unusual to see the standard EWS giving false positives because some test is flaky.

2.2) I'm not sure whether our higher number of flaky tests is caused by some issue in the code of the port, or whether we just don't have enough manpower to stay on top of flaky tests on a daily basis and mark each flaky test as soon as it is detected.

And regarding quality or test quality:

3) Having the results of the layout tests "green" is not a synonym of quality.
Layout tests giving a "green" or "red" result is not about passing or failing the tests; it is just about giving the "expected" result (which can be a failure). A port can have a lot of failures marked as "expected failure" or a lot of flaky tests marked as "expected flaky" and be greener than another port that has fewer failures or fewer flaky tests, but has not marked them.

If you want to compare the quality of the ports, then maybe something like wpt.fyi [1] can be more useful than WebKit layout tests, because tests there can't be "expected failures". A test there is green only if it actually passes.

And looking ahead to improve things:

4) I expect the number of unexpected failures in the clean tree to become more controllable now that we have this EWS working and developers can be notified of a breaking change before it lands.

5) The EWS now also has code to detect flaky tests when it does all those runs and repeats, and it sends mails to the bot watchers with the names of all the flaky tests that it detects. We will be gardening those with the idea of reducing the number of unexpected flakies.

> (3) Any plans for WPE?
>

Yes. We look forward to adding WPE testers as soon as possible. Hopefully it will happen in 2022-Q1.

Best regards and happy holidays!

--------------------------------

[1] https://wpt.fyi/results/?label=master&label=experimental&product=safari&product=webkitgtk&aligned
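
P.S. To make points 3 and 5 a bit more concrete, here is a minimal Python sketch. It is only an illustration and not the actual EWS code: the test names, the EXPECTATIONS table and the helpers is_green() and classify() are all made up for this mail, and the real logic in the EWS may well differ. It assumes each test was run a few times with the patch and a few times without it, and that each run reports either "PASS" or "FAIL".

```python
from typing import Dict, List, Set

# Hypothetical expectations table (NOT the real TestExpectations format):
# test name -> set of outcomes that count as "expected" for the port.
EXPECTATIONS: Dict[str, Set[str]] = {
    "fast/a.html": {"PASS"},
    "fast/b.html": {"FAIL"},          # known failure, marked as expected
    "fast/c.html": {"PASS", "FAIL"},  # known flaky, marked as expected
}


def is_green(test: str, actual: str) -> bool:
    """A result is "green" when it matches the expectation,
    even if the expected outcome is itself a failure.
    Tests without an entry are assumed to be expected to pass."""
    return actual in EXPECTATIONS.get(test, {"PASS"})


def classify(with_patch: List[str], without_patch: List[str]) -> str:
    """Classify one test from repeated runs with and without the patch.

    Assumption for this sketch: mixed outcomes mean "flaky", and a test
    that always fails on the clean tree is not blamed on the patch.
    """
    clean = set(without_patch)
    patched = set(with_patch)
    if len(clean) > 1:
        return "flaky on clean tree"
    if clean == {"FAIL"}:
        return "pre-existing failure"
    if len(patched) > 1:
        return "flaky with patch"
    if patched == {"FAIL"}:
        return "new failure"
    return "pass"


if __name__ == "__main__":
    # An expected failure is green; an unexpected one is red.
    print(is_green("fast/b.html", "FAIL"))
    print(is_green("fast/a.html", "FAIL"))
    # Fails every time with the patch, never without it.
    print(classify(["FAIL"] * 10, ["PASS"] * 10))
```

With a table like this, a port that marks many failures as expected looks greener than a port with fewer, unmarked failures, which is the reason I don't think "green" says much about quality on its own.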