Re: Visibility of disabled tests

2020-01-15 Thread Botond Ballo
On Sat, Jan 11, 2020 at 10:38 AM James Graham  wrote:
> So in addition to the specific changes for intermittent handling, can we
> document how one nominates a bug for retriage in general (or point me at
> those docs if they already exist)

My understanding is that clearing the "priority" field is a way to
nominate a bug for retriage, since triage dashboards like [1] use the
presence of a specified priority as an indication that a bug has been
triaged.

Cheers,
Botond

[1] https://mozilla.github.io/triage-center/
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-15 Thread Botond Ballo
On Sat, Jan 11, 2020 at 10:38 AM Geoffrey Brown  wrote:
> It might be helpful if we explicitly consider some special cases. If the
> sheriffs have needinfo'd for "needswork" and that needinfo has been
> cleared, do we  want to set needinfo again when disabling? Always? If the
> triage owner has a huge needinfo queue, still needinfo? ...

The test's author, if still active on the project, may be a good
alternative person to needinfo.

Cheers,
Botond
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-11 Thread Geoffrey Brown
Thanks Johann. I agree it is important that we try to fix tests that have
been disabled. I think the sheriffs usually needinfo the triage owner
before/when disabling a test; I'm disappointed to hear that isn't happening
consistently.

However, I'd prefer not to change the review process for the disabling
patch. Currently sheriffs normally request review from
#intermittent-reviewers and that has been working well:
 - we strive for very low latency so that frequently failing tests can be
addressed right away
 - we watch for common errors in test manifests
 - we can help ensure consistency in the test disabling procedure.

Keep in mind that sheriffs also needinfo (typically the triage owner) when
a test is identified as "needswork", failing frequently but not yet at the
disabling threshold. Often those needinfo requests go unanswered or fail to
resolve the issue (no shaming here: we are all busy and have priorities). I
think that requesting review from test author/triage owner/component peer
risks adding another 2 day delay to the overall process -- more time where
those tests are failing.

Instead of changing the reviewers, how about:
 - we remind the sheriffs to needinfo
 - #intermittent-reviewers check that needinfo is in place when reviewing
disabling patches.

It might be helpful if we explicitly consider some special cases. If the
sheriffs have needinfo'd for "needswork" and that needinfo has been
cleared, do we  want to set needinfo again when disabling? Always? If the
triage owner has a huge needinfo queue, still needinfo? ...


Regarding regression finding, as I understand it, sheriffs currently look
for regression ranges for bugs where:
 - the test is failing frequently: since these are easier to verify
pass/fail on any push
 - the test was running reliably in the near past.
In my experience, a comment on the bug requesting a regression range can be
effective. I don't know if the sheriffs have much time for additional
regression searches.

 - Geoff

On Tue, Jan 7, 2020 at 6:29 AM Johann Hofmann  wrote:

> Hi folks,
>
> in the past I and other triage owners have experienced some frequently
> failing tests being disabled without a clear notice to the triage owner,
> component owner or test author. I've seen this specific pattern a few times:
>
> - An intermittent test starts failing very frequently very suddenly.
> - The Stockwell team reacts quickly (which is good) and disables the test,
> getting review from another sheriff or member of their team.
> - No analysis is done on the possible cause or regressing bug
> - The intermittent bug is left open without needinfo to anyone who could
> fix the test (some even with a P5 priority).
>
> This is problematic, since a) we're losing test coverage that way and b)
> these tests might be failing frequently because there's actually something
> wrong with the feature, not just a test issue.
>
> In most cases these get discovered sooner or later so I don't want to make
> this issue bigger than it is, but it's still suboptimal for some of us. It
> seems like we could easily remedy this by introducing a policy like:
>
> *For disabling tests, review from the test author, triage owner or a
> component peer is required. If they do not respond within 2? business days
> or if the frequency is higher than x, the test may be disabled without
> their consent, but the triage owner *must* be needinfo'd on such a bug in
> this case.*
>
> It would also be extremely helpful if Sheriffs could post a possible
> regression range for the frequent intermittent when disabling, where
> possible (because I assume that's also the best time to do a regression
> range).
>
> Any thoughts?
>
> Cheers.
>
> Johann
>
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-11 Thread Andrew Sutherland

On 1/8/20 12:50 PM, Geoffrey Brown wrote:

Instead of changing the reviewers, how about:
 - we remind the sheriffs to needinfo
 - #intermittent-reviewers check that needinfo is in place when 
reviewing disabling patches.


To try and help address the visibility issue, we could also make 
searchfox consume the test-info-disabled-by-os taskcluster task[1] and:


- put banners at the top of test files that say "Hey!  This is 
(sometimes) disabled on [android/linux/mac/windows]"


- put collapsible sections at the top of the directory listings that 
explicitly call out the disabled tests in that directory. The idea would 
be to avoid people needing to scroll through the potentially long list 
of files to know which are disabled and provide some ambient awareness 
of disabled tests.


If there's a way to get a similarly pre-built[2] mapping that would 
provide information about the orange factor of tests[3] or that it's 
been marked as needswork, that could also potentially be surfaced.


Andrew

1: 
https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/taskcluster/ci/source-test/file-metadata.yml#62


2: Emphasis on pre-built in the sense that searchfox's processing 
pipeline really doesn't want to be issuing a bunch of dynamic REST 
queries that would add to its processing latency.  It would want a 
taskcluster job that runs as part of the nightly build process so it can 
fetch a JSON blob at full network speed.


3: I guess test-info-all at 
https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/taskcluster/ci/source-test/file-metadata.yml#91 
does provide the "failed runs" count and "total runs" which could 
provide the orange factor?  The "total run time, seconds" scaled by the 
"total runs" would definitely be interesting to surface in searchfox.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-11 Thread James Graham

On 09/01/2020 10:46, David Burns wrote:
I think a lot of the problem is not necessarily a technical issue, 
meaning I am not sure that tooling will solve the problem, but it is 
more of a social problem.


To exapnd a little on this; we've had various attempts at making 
disabled tests more visible. ahal had "Test Informant" which for a while 
was giving weekly reports on how many tests were being newly disabled 
and links to full details on all disabled tests. For wpt the interop 
dashboard currently shows the number of disabled tests per component 
(e.g. [1]). So far I don't think we've seen great success from any of 
these efforts; I don't have precise data but the general pattern is that 
almost all tests that are disabled remain disabled indefinitely.


Adding data to searchfox is an interesting alternative that I hadn't 
previously considered; it would make the data ambiently available to 
people looking at the tests/code rather than requiring specific action 
to look at a dashboard or read a recurring email. So it definitely seems 
like it could be worth experimenting with that. But as David says, a lot 
of the problem is in the disconnect between knowing that an issue exists 
and giving priority to actually fixing the issue.


[1] 
https://jgraham.github.io/wptdash/?tab=Gecko+Data=core%3A%3Adom%3A+core+%26+html

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-11 Thread James Graham




On 07/01/2020 13:29, Johann Hofmann wrote:

/For disabling tests, review from the test author, triage owner or a 
component peer is required. If they do not respond within 2? business 
days or if the frequency is higher than x, the test may be disabled 
without their consent, but the triage owner *must* be needinfo'd on such 
a bug in this case./


This seems like a specific case of a more general problem.

Sometimes additional information comes up which means that a bug needs 
to be retriaged. For example a bug that's now observed to affect many 
users, or one that has a previously unknown web-compat impact. An 
intermittent becoming problematically frequent seems to clearly fit into 
this general category. So the process should be whatever the normal 
process is for the case where there's additional information that needs 
to be assessed by the triage owner. It's possible that "needinfo the 
triage owner" is indeed what one is supposed to do in such a case, but I 
can't find where that's documented; afaict the triage document at [1] 
doesn't mention the possibility of bugs returning to triage.


So in addition to the specific changes for intermittent handling, can we 
document how one nominates a bug for retriage in general (or point me at 
those docs if they already exist) and document some of the cases where 
retriage is appropriate.


[1] https://firefox-bug-handling.mozilla.org/triage-bugzilla

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visibility of disabled tests

2020-01-11 Thread David Burns
I think a lot of the problem is not necessarily a technical issue, meaning
I am not sure that tooling will solve the problem, but it is more of a
social problem.

I think the problem is making sure that items are triaged and placed into
peoples workflow sooner would solve this problem but I also appreciate the
difficulty when there are competing priorities within teams.

This was a problem I started working on last quarter, mostly for web
platform tests, and would like to continue it this quarter. I am going to
be reaching out to more people over the quarter to see if we can solve
this. If you would like to be part of the process please let me know and I
will schedule an interview with you.

David

On Thu, 9 Jan 2020 at 00:28, Andrew Sutherland 
wrote:

> On 1/8/20 12:50 PM, Geoffrey Brown wrote:
> > Instead of changing the reviewers, how about:
> >  - we remind the sheriffs to needinfo
> >  - #intermittent-reviewers check that needinfo is in place when
> > reviewing disabling patches.
>
> To try and help address the visibility issue, we could also make
> searchfox consume the test-info-disabled-by-os taskcluster task[1] and:
>
> - put banners at the top of test files that say "Hey!  This is
> (sometimes) disabled on [android/linux/mac/windows]"
>
> - put collapsible sections at the top of the directory listings that
> explicitly call out the disabled tests in that directory. The idea would
> be to avoid people needing to scroll through the potentially long list
> of files to know which are disabled and provide some ambient awareness
> of disabled tests.
>
> If there's a way to get a similarly pre-built[2] mapping that would
> provide information about the orange factor of tests[3] or that it's
> been marked as needswork, that could also potentially be surfaced.
>
> Andrew
>
> 1:
>
> https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/taskcluster/ci/source-test/file-metadata.yml#62
>
> 2: Emphasis on pre-built in the sense that searchfox's processing
> pipeline really doesn't want to be issuing a bunch of dynamic REST
> queries that would add to its processing latency.  It would want a
> taskcluster job that runs as part of the nightly build process so it can
> fetch a JSON blob at full network speed.
>
> 3: I guess test-info-all at
>
> https://searchfox.org/mozilla-central/rev/be7d1f2d52dd9474ca2df145190a817614c924e4/taskcluster/ci/source-test/file-metadata.yml#91
> does provide the "failed runs" count and "total runs" which could
> provide the orange factor?  The "total run time, seconds" scaled by the
> "total runs" would definitely be interesting to surface in searchfox.
>
> ___
> firefox-dev mailing list
> firefox-...@mozilla.org
> https://mail.mozilla.org/listinfo/firefox-dev
>
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Visibility of disabled tests

2020-01-08 Thread Johann Hofmann
Hi folks,

in the past I and other triage owners have experienced some frequently
failing tests being disabled without a clear notice to the triage owner,
component owner or test author. I've seen this specific pattern a few times:

- An intermittent test starts failing very frequently very suddenly.
- The Stockwell team reacts quickly (which is good) and disables the test,
getting review from another sheriff or member of their team.
- No analysis is done on the possible cause or regressing bug
- The intermittent bug is left open without needinfo to anyone who could
fix the test (some even with a P5 priority).

This is problematic, since a) we're losing test coverage that way and b)
these tests might be failing frequently because there's actually something
wrong with the feature, not just a test issue.

In most cases these get discovered sooner or later so I don't want to make
this issue bigger than it is, but it's still suboptimal for some of us. It
seems like we could easily remedy this by introducing a policy like:

*For disabling tests, review from the test author, triage owner or a
component peer is required. If they do not respond within 2? business days
or if the frequency is higher than x, the test may be disabled without
their consent, but the triage owner *must* be needinfo'd on such a bug in
this case.*

It would also be extremely helpful if Sheriffs could post a possible
regression range for the frequent intermittent when disabling, where
possible (because I assume that's also the best time to do a regression
range).

Any thoughts?

Cheers.

Johann
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform