Re: [webkit-dev] [PSA] WebKitGTK layout testers available on the Bugzilla EWS bubbles

2021-12-24 Thread Carlos Alberto Lopez Perez via webkit-dev
On 24/12/2021 15:00, Michael Catanzaro via webkit-dev wrote:
> On Fri, Dec 24 2021 at 12:44:49 AM +, Carlos Alberto Lopez Perez via
> webkit-dev  wrote:
>> So we ended up deploying a different version of the EWS that has a much
>> higher tolerance to pre-existent failures (up to 500 before exiting
>> early) and also that tries hard to discard pre-existent failures and
>> flakies by repeating each failure 10 times with patch and 10 times
>> without it. [1]
> 
> Mixed thoughts on this:
> 
> (1) Good job. Having layout tests on EWS is a great improvement. We've
> been talking about this for a long time, and you finally made it happen!
> 
> (2) That you needed to use such a big hammer to make the EWS work
> reliably suggests that either WebKitGTK quality or WebKit test
> quality is quite low. I'm sure it's a mix of both, but mostly the
> former, because test flakiness is not this severe for Apple ports. This
> is not encouraging.
> 


Sorry, but I don't agree with your conclusion about quality.

So, let me explain in more detail the factors that contribute to this
issue with the tests:

  1) Number of unexpected failures on the clean tree

The higher number of unexpected failures on the clean tree is mainly
due to the following reasons:

  1.1) Until now we didn't have an EWS. So it was pretty hard (if not
  impossible) for any developer to notice that a patch was going to
  break GTK tests. This didn't help prevent breaking patches from landing.

  1.2) We don't have a rule to roll back patches that break GTK tests.
  If a patch lands and adds unexpected failures for GTK, those usually
  stay there until one of our gardeners has time to fix the issue
  or mark the new failure as expected. Also, such a rule wouldn't
  have made sense before having an EWS that developers can use.

  1.3) We don't have anyone working full-time on gardening. We try
  to share the effort between us on a best-effort basis. So unexpected
  failures, once landed, can remain there for days until they are gardened.

  1.4) Patches landing via commit-queue run layout tests on Mac before
  landing. So a patch won't land if it breaks layout tests on Mac.
  But it will land anyway if it breaks tests on GTK.

  2) Number of unexpected flaky tests

  2.1) It is true that we do have a higher number of flaky tests compared
  to Apple ports. But the flakiness issue is also a problem there.
  It is not unusual to see the standard EWS giving false positives due
  to some test being flaky.

  2.2) I'm not sure whether our higher number of flaky tests is caused by
  some issue in the code of the port, or whether it is just that we don't
  have enough manpower to stay on top of flaky tests on a daily basis and
  mark each flaky test as soon as it is detected.


And regarding quality or test quality:

  3) Having the results of the layout tests "green" is not a synonym for quality.
  Layout tests giving a "green" or "red" result is not about passing or failing
  the tests; it is just about giving the "expected" result (which can be a
  failure).
  A port can have a lot of failures marked as "expected failure" or a lot
  of flaky tests marked as "expected flaky" and be greener than
  another port that has fewer failures or fewer flaky tests that are not marked.

  If you want to compare the quality of the ports, then maybe something like
  wpt.fyi [1] can be more useful than WebKit layout tests, because tests there
  can't be "expected failures". So a result will only be green if the test passes.
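
  To make the point about "expected" results concrete, here is a minimal
  sketch in Python. It is not the actual WebKit test runner code; the
  function names, outcome strings and test names are all hypothetical and
  only illustrate why a port with more real failures can still look greener.

```python
# Hypothetical outcome labels, chosen for illustration only.
PASS, FAIL = "Pass", "Failure"

def unexpected_results(actual, expectations):
    """Return the tests whose actual outcome is not among the expected ones.

    Tests without an entry in `expectations` are assumed to be expected
    to pass (an assumption of this sketch).
    """
    return sorted(
        test for test, outcome in actual.items()
        if outcome not in expectations.get(test, {PASS})
    )

def is_green(actual, expectations):
    """A run is "green" when every test gave an expected result."""
    return not unexpected_results(actual, expectations)

# Port A: two real failures, both marked as expected, so the run is green.
port_a_actual = {"a.html": FAIL, "b.html": FAIL, "c.html": PASS}
port_a_expectations = {"a.html": {FAIL}, "b.html": {FAIL}}

# Port B: a single real failure, not marked, so the run is red.
port_b_actual = {"a.html": PASS, "b.html": FAIL, "c.html": PASS}
port_b_expectations = {}

print(is_green(port_a_actual, port_a_expectations))
print(is_green(port_b_actual, port_b_expectations))
```

  In this sketch port A fails more tests than port B, yet only port B is
  reported as red.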


And looking ahead to improve things:

  4) I expect the number of unexpected failures in the clean tree to start
  to be more controllable now that we have this EWS working and developers
  can be notified in advance of a breaking change before landing.

  5) The EWS now also has code to detect flaky tests when it does all those
  runs and repeats, and it sends mails to the bot watchers with the names
  of all the flaky tests that it detects. We will be gardening those with
  the idea of reducing the number of unexpected flakies.
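
  As a rough illustration of the "repeat each failure 10 times with patch
  and 10 times without it" idea, here is a minimal sketch in Python. It is
  not the real EWS code: the function name, the callbacks, the category
  names and the exact classification rules are assumptions made for this
  example only.

```python
def classify_failure(test, run_with_patch, run_without_patch, repeats=10):
    """Classify a failing test by re-running it with and without the patch.

    `run_with_patch` and `run_without_patch` are hypothetical callbacks that
    run the test once and return True when it gives the expected result.
    """
    with_patch = [run_with_patch(test) for _ in range(repeats)]
    without_patch = [run_without_patch(test) for _ in range(repeats)]
    fails_with = with_patch.count(False)
    fails_without = without_patch.count(False)

    # Mixed results on either side: treat the test as flaky (assumed rule).
    if 0 < fails_with < repeats or 0 < fails_without < repeats:
        return "flaky"
    # Never fails with the patch applied: nothing to report.
    if fails_with == 0:
        return "no failure with patch"
    # Always fails even on the clean tree: the patch is not to blame.
    if fails_without == repeats:
        return "pre-existing failure"
    # Always fails with the patch and never without it.
    return "new failure"

# Stand-in runners for the example; a real setup would run the test itself.
always_pass = lambda test: True
always_fail = lambda test: False

# "fast/example.html" is a made-up test name.
print(classify_failure("fast/example.html", always_fail, always_pass))
```

  Only the "new failure" case would be blamed on the patch; the "flaky"
  names are the kind of thing that would be mailed to the bot watchers.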


[1] 
https://wpt.fyi/results/?label=master=experimental=safari=webkitgtk

> (3) Any plans for WPE?
> 

Yes. We look forward to adding WPE testers as soon as possible. Hopefully it
will happen in 2022-Q1.


Best regards and happy holidays!


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] [PSA] WebKitGTK layout testers available on the Bugzilla EWS bubbles

2021-12-24 Thread Michael Catanzaro via webkit-dev
On Fri, Dec 24 2021 at 12:44:49 AM +, Carlos Alberto Lopez Perez 
via webkit-dev  wrote:

> So we ended up deploying a different version of the EWS that has a much
> higher tolerance to pre-existent failures (up to 500 before exiting
> early) and also that tries hard to discard pre-existent failures and
> flakies by repeating each failure 10 times with patch and 10 times
> without it. [1]


Mixed thoughts on this:

(1) Good job. Having layout tests on EWS is a great improvement. We've 
been talking about this for a long time, and you finally made it happen!


(2) That you needed to use such a big hammer to make the EWS work 
reliably suggests that either WebKitGTK quality or WebKit test 
quality is quite low. I'm sure it's a mix of both, but mostly the 
former, because test flakiness is not this severe for Apple ports. This 
is not encouraging.


(3) Any plans for WPE?

Anyway, I agree this was the best approach given the current situation.

Happy holidays,

Michael

