Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-12 Thread Steve Fink

On 9/12/17 7:02 AM, James Graham wrote:
> On 12/09/17 14:55, Andrew Halberstadt wrote:
>> On Mon, Sep 11, 2017 at 10:33 PM Robert O'Callahan wrote:
>>> On Tue, Sep 12, 2017 at 11:38 AM, Andrew Halberstadt <
>>> ahalberst...@mozilla.com> wrote:
>>>> I don't think so, that data already exists and is query-able from
>>>> ActiveData:
>>>> https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni
>>>
>>> That query tells you about disabled tests, but doesn't know about *why* a
>>> test was disabled. E.g. you can't distinguish tests disabled because
>>> they're not expected to work on some (or all) platforms from tests that
>>> were disabled for intermittent failures that should, in principle, be fixed.
>>>
>>> Rob
>>
>> True, though I don't know that gps' proposal would solve that either.
>>
>> But this is a good idea, and is easy to solve from a technical standpoint.
>> We'd just need to agree on some standard manifest keys:
>
> I'm pretty sure that the problem isn't technical, but actually getting
> people to do that consistently (plus retrofitting the data onto
> thousands of currently disabled tests). You would at least have to add
> a lint and a free pass for all the existing tests.


If you don't need to get it perfect, you can always do an hg blame and 
look at the commit message that disabled a test, then check it for 
whatever signal makes sense -- a mention of "intermittent", the author 
being a sheriff, or something similar. Or at least gather them all 
together so you can skim through them and mark which ones sound like 
they were disabled for being intermittent.
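
A rough sketch of that heuristic, in Python. It assumes the commit author and message for each disabling change have already been pulled out of hg blame / hg log; the DisablingCommit record, the SHERIFFS set and the example address are all hypothetical, and the keyword match is only a first guess at what "makes sense":

```python
import re
from dataclasses import dataclass

# Assumption: a hand-maintained set of sheriff accounts (placeholder value).
SHERIFFS = {"sheriff@example.com"}

# Assumption: the word "intermittent" in the commit message is the main signal.
INTERMITTENT_RE = re.compile(r"intermittent", re.IGNORECASE)


@dataclass
class DisablingCommit:
    """Hypothetical record for the commit that disabled a test."""
    test: str
    author: str
    message: str


def looks_intermittent(commit: DisablingCommit) -> bool:
    """Guess whether the test was disabled for being intermittent."""
    mentions_it = INTERMITTENT_RE.search(commit.message) is not None
    return mentions_it or commit.author in SHERIFFS


def triage(commits):
    """Split commits into (likely intermittent, needs a human skim)."""
    likely = [c for c in commits if looks_intermittent(c)]
    unsure = [c for c in commits if not looks_intermittent(c)]
    return likely, unsure
```

The "unsure" list is the pile you would still skim by hand, so a false negative costs reading time rather than a wrong classification.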


I'm not just handwaving here, or at least not completely -- I have 
something like this set up for semi-automatically generating 
SpiderMonkey release notes on changed API functions, where it looks at 
the revision where each function was added or removed and adds the bug 
number to the appropriate list, with some attempt made to avoid 
whitespace changes and reordering within the header files. (It doesn't 
work very well, mind.)


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-12 Thread James Graham

On 12/09/17 14:55, Andrew Halberstadt wrote:
> On Mon, Sep 11, 2017 at 10:33 PM Robert O'Callahan wrote:
>> On Tue, Sep 12, 2017 at 11:38 AM, Andrew Halberstadt <
>> ahalberst...@mozilla.com> wrote:
>>> I don't think so, that data already exists and is query-able from
>>> ActiveData:
>>> https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni
>>
>> That query tells you about disabled tests, but doesn't know about *why* a
>> test was disabled. E.g. you can't distinguish tests disabled because
>> they're not expected to work on some (or all) platforms from tests that
>> were disabled for intermittent failures that should, in principle, be fixed.
>>
>> Rob
>
> True, though I don't know that gps' proposal would solve that either.
>
> But this is a good idea, and is easy to solve from a technical standpoint.
> We'd just need to agree on some standard manifest keys:

I'm pretty sure that the problem isn't technical, but actually getting 
people to do that consistently (plus retrofitting the data onto 
thousands of currently disabled tests). You would at least have to add a 
lint and a free pass for all the existing tests.



Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-12 Thread Andrew Halberstadt
On Mon, Sep 11, 2017 at 10:33 PM Robert O'Callahan 
wrote:

> On Tue, Sep 12, 2017 at 11:38 AM, Andrew Halberstadt <
> ahalberst...@mozilla.com> wrote:
>
>> I don't think so, that data already exists and is query-able from
>> ActiveData:
>> https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni
>
>
> That query tells you about disabled tests, but doesn't know about *why* a
> test was disabled. E.g. you can't distinguish tests disabled because
> they're not expected to work on some (or all) platforms from tests that
> were disabled for intermittent failures that should, in principle, be fixed.
>
> Rob
>

True, though I don't know that gps' proposal would solve that either.

But this is a good idea, and is easy to solve from a technical standpoint.
We'd just need to agree on some standard manifest keys:

[test_foo.html]
skip-if = 
reason = {'intermittent', 'fail', ... }
bugs = 1234567, 1234576

When we log the skip, we make sure this metadata makes it into the
structured log. Then tools like ActiveData or a manifest parsing mach
command can query it. Reftest and wpt complicate matters a bit, but we
could figure out something similar for them.
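
As a sketch of what a manifest-parsing tool could do with those keys, here is a small Python example. It uses the standard-library configparser as a stand-in for the real manifest parser, and the sample manifest (the skip-if conditions, the second and third tests, the single reason value per test) is made up for illustration:

```python
import configparser
from collections import defaultdict

# Hypothetical manifest using the proposed 'reason' and 'bugs' keys.
SAMPLE = """
[test_foo.html]
skip-if = os == "win"
reason = intermittent
bugs = 1234567, 1234576

[test_bar.html]
skip-if = os == "android"
reason = fail
bugs = 1234999

[test_baz.html]
"""


def skipped_by_reason(text):
    """Group skipped tests by their 'reason' key.

    Returns {reason: [(test_name, [bug_numbers]), ...]}.
    Tests without a skip-if key are ignored.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = defaultdict(list)
    for test in parser.sections():
        section = parser[test]
        if "skip-if" not in section:
            continue
        reason = section.get("reason", "unknown")
        bugs = [int(b) for b in section.get("bugs", "").split(",") if b.strip()]
        result[reason].append((test, bugs))
    return dict(result)


if __name__ == "__main__":
    for reason, tests in skipped_by_reason(SAMPLE).items():
        print(reason, tests)
```

Skipped tests that lack a reason key fall into an "unknown" bucket, which is roughly where all currently disabled tests would start out.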

-Andrew


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-11 Thread Robert O'Callahan
On Tue, Sep 12, 2017 at 11:38 AM, Andrew Halberstadt <
ahalberst...@mozilla.com> wrote:

> I don't think so, that data already exists and is query-able from
> ActiveData:
> https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni


That query tells you about disabled tests, but doesn't know about *why* a
test was disabled. E.g. you can't distinguish tests disabled because
they're not expected to work on some (or all) platforms from tests that
were disabled for intermittent failures that should, in principle, be fixed.

Rob
-- 
lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf
toD
selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t
rdsme,aoreseoouoto
o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea
lurpr
.a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt sstvr
esn


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-11 Thread Andrew Halberstadt
On Fri, Sep 8, 2017 at 7:10 PM Gregory Szorc  wrote:

> I know we've touched on this topic in the past but I can't recall the outcomes.
> Is it worthwhile to define and use a richer test manifest "schema" that
> will facilitate querying and building dashboards so we have better
> visibility into disabled tests?
>

I don't think so, that data already exists and is query-able from
ActiveData:
https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni

We just need to build a dashboard around something like that query. I had
previously written a pure JS tool that did this [1], but it was clunky and
poorly written. But someone with some actual frontend experience could
probably whip something up pretty fast.

Another thing we can (and should) do is write a mach command to trigger the
build system manifest parsing and return the results. This could possibly
be a mode on |mach test|.

[1] https://github.com/ahal/test-informant


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-08 Thread Robert O'Callahan
On Sat, Sep 9, 2017 at 11:09 AM, Gregory Szorc  wrote:

> Is it worthwhile to define and use a richer test manifest "schema" that
> will facilitate querying and building dashboards so we have better
> visibility into disabled tests?
>

It would be great if there was a way to run all tests that were disabled
due to intermittent failures. You could then use rr, and better tools we're
building based on rr, to catch some subset of those failures and resolve
them.

But given that information doesn't exist for all the tests disabled so far
(many tests over many years), it hardly seems worth starting now. Hopefully
it can be deduced by looking up bug metadata associated with commits that
disabled tests.

Rob
-- 
lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf
toD
selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t
rdsme,aoreseoouoto
o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea
lurpr
.a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt sstvr
esn


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-08 Thread Gregory Szorc
On Wed, Sep 6, 2017 at 2:10 PM,  wrote:

> Over the last 9 months a few of us have really watched intermittent test
> failures almost daily and done a lot to pester people as well as fix many.
> While there are over 420 bugs that have been fixed since the beginning of
> the year, there are half that many (211+) which have been disabled in some
> form (including turning off the jobs).
>
> We don't like to disable and have been pretty relaxed in recommending
> disabling a test.  Overall we have tried to adhere to a policy of:
> * >=30 failures/week- ask for owner to look at failure and fix it, if this
> persists for a few weeks with no real traction we would go ahead [and
> recommend] disabling it.
> * >= 75 failures/week- ask for people to fix this in a shorter time frame
> and recommend disabling the test in a week or so
> * >= 150 failures/week- often just disable the test
>
> This is confusing and hard to manage.  Since then we have started
> adjusting triage queries and some teams are doing their own triage and we
> are ignoring those bugs (while they are getting prioritized properly).
>
> What we are looking to start doing this month is adopting a simpler policy:
> * any bug that has >=200 instances in the last 30 days will be disabled
> ** this will be a manual process, so it will happen a couple times/week
>
> We expect the outcome of this to be a similar amount of disabling, just an
> easier method for doing so.  It is very possible we might recommend
> disabling a test before it hits the threshold- keep in mind a disabled test
> is easy to re-enable (so feel free to disable for that one platform until
> you have time to look at fixing it)
>
> To be clear we (and some component owners) will continue triaging bugs and
> trying to get fixes in place as often as possible and prefer a fix, not a
> disabled test!
>
> Please raise any concerns, otherwise we will move forward with this in the
> coming weeks.
>

A few replies on this thread revolve around how to determine if a test is
disabled.

Our canonical source of which tests run where is the in-tree test
manifests. e.g. a browser.ini file. These are consumed by moz.build files
and the build system produces a master list of all tests. The scheduling
logic (Taskgraph) turns these into tasks in CI. There is also the
out-of-tree SETA database that can influence test scheduling.

Our mechanism for disabling tests tends to be something like adding
`skip-if = os == "win" # Bug 1393121` to a test manifest. This is useful as a
means to accomplish filtering at the test scheduling/execution layer. And
humans can read the metadata ("# Bug XXX") and other comments and commit
history to get a feel for history. But the metadata isn't rich enough to
build useful queries or dashboards. We can find all tests being skipped.
But the manifests themselves are lacking rich metadata around things like
why tests were disabled. This limits our ability to answer various
questions.
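
To make the limitation concrete, here is a small Python sketch of what can be scraped from such a line today. The regular expressions are my own assumption about the comment convention, not anything the manifest format enforces:

```python
import re

# Assumed shape: 'skip-if = <condition> # free-form comment'.
SKIP_LINE = re.compile(
    r"^\s*skip-if\s*=\s*(?P<cond>[^#]*?)\s*(?:#\s*(?P<comment>.*))?$"
)
# Assumed convention: bug references look like 'Bug 1393121'.
BUG = re.compile(r"[Bb]ug\s+(\d+)")


def parse_skip(line):
    """Return the condition, bug numbers and raw comment of a skip-if line.

    Returns None if the line is not a skip-if annotation.
    """
    match = SKIP_LINE.match(line)
    if match is None:
        return None
    comment = match.group("comment") or ""
    return {
        "condition": match.group("cond"),
        "bugs": [int(n) for n in BUG.findall(comment)],
        "comment": comment,
    }


if __name__ == "__main__":
    print(parse_skip('skip-if = os == "win" # Bug 1393121'))
```

The condition and bug number come out cleanly, but the reason for the skip is only recoverable if someone happened to write it in the comment.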

I know we've touched on this topic in the past but I can't recall the outcomes.
Is it worthwhile to define and use a richer test manifest "schema" that
will facilitate querying and building dashboards so we have better
visibility into disabled tests?


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Daniel Veditz
On Wed, Sep 6, 2017 at 4:53 PM, Emma Humphries  wrote:

> This begs the question, why was that whiteboard tag being used that way?
>

Surely there are other reasons to disable tests, and people might want to
track those too. If you want to restrict your new keyword to just "disabled
because intermittent" you should put intermittent in the keyword name
(e.g. intermittent-test-disabled). Personally I'd go ahead and accept that
other people want to flag disabled tests; it's simple enough to query on
both test-disabled and intermittent-failure to get just that subset.

-
Dan Veditz


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Emma Humphries
Andrew Swan noticed that the whiteboard tag was showing up on bugs not filed by
the intermittent-test-failure bot (thanks!)

Here are the queries restricted to just bot-filed bugs:

https://mzl.la/2eMnMRz (add ons)
https://mzl.la/2eMaPqQ (all core/ffx/toolkit)

This begs the question, why was that whiteboard tag being used that way?


On Wed, Sep 6, 2017 at 4:32 PM, Emma Humphries  wrote:

> Andy
>
> To start:
>
> https://bugzilla.mozilla.org/buglist.cgi?product=Core;
> product=Firefox=Firefox%20for%20Android&
> product=Firefox%20for%20iOS=Toolkit_format=
> advanced=---_whiteboard=test%20disabled%2Ctest-disabled%
> 2Ctestdisabled_whiteboard_type=anywordssubstr=bug_id=0
>
> Add-ons, Web Extensions, and Plugins related:
>
> https://bugzilla.mozilla.org/buglist.cgi?resolution=---;
> status_whiteboard_type=anywordssubstr_format=
> advanced_whiteboard=test%20disabled%2Ctest-disabled%2Ctestdisabled&
> component=Add-on%20Manager=Add-ons%20Manager&
> component=Plug-ins=WebExtensions%3A%20Android&
> component=WebExtensions%3A%20Compatibility=
> WebExtensions%3A%20Developer%20Tools=WebExtensions%3A%
> 20Experiments=WebExtensions%3A%20Frontend&
> component=WebExtensions%3A%20General=WebExtensions%3A%20Request%
> 20Handling=WebExtensions%3A%20Untriaged&
> product=Core=Firefox=Firefox%20for%
> 20Android=Firefox%20for%20iOS=Toolkit


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Emma Humphries
In order to be more consistent, marking a failing test as disabled should
be done with a keyword instead of a whiteboard tag.

Proposed keyword: test-disabled
Description: Used by test automation team to indicate the failing test
associated with this bug has a failure rate above a threshold level and has
been disabled. To check the consistency of this keyword, bugs with this
keyword should be filed by the intermittent-bug-filer user.
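
A minimal sketch of that consistency check in Python, assuming the bugs have already been fetched into plain dictionaries. The "id", "keywords" and "creator" field names and the sample data are assumptions for illustration, not a statement about any particular Bugzilla API:

```python
KEYWORD = "test-disabled"
BOT = "intermittent-bug-filer"  # user name as given in the proposal


def inconsistent_bugs(bugs):
    """Return ids of bugs carrying the keyword that the bot did not file."""
    return [
        bug["id"]
        for bug in bugs
        if KEYWORD in bug["keywords"] and bug["creator"] != BOT
    ]


if __name__ == "__main__":
    sample = [
        {"id": 1, "keywords": ["test-disabled"], "creator": BOT},
        {"id": 2, "keywords": ["test-disabled"], "creator": "someone-else"},
        {"id": 3, "keywords": [], "creator": "someone-else"},
    ]
    print(inconsistent_bugs(sample))
```

Anything this returns is a bug where the keyword was applied by hand, which is the case the description is trying to catch.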



Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Andrew McKay
Cool, thanks.

On 6 September 2017 at 16:32, Emma Humphries  wrote:
> Andy
>
> To start:
>
> https://bugzilla.mozilla.org/buglist.cgi?product=Core=Firefox=Firefox%20for%20Android=Firefox%20for%20iOS=Toolkit_format=advanced=---_whiteboard=test%20disabled%2Ctest-disabled%2Ctestdisabled_whiteboard_type=anywordssubstr=bug_id=0
>
> Add-ons, Web Extensions, and Plugins related:
>
> https://bugzilla.mozilla.org/buglist.cgi?resolution=---_whiteboard_type=anywordssubstr_format=advanced_whiteboard=test%20disabled%2Ctest-disabled%2Ctestdisabled=Add-on%20Manager=Add-ons%20Manager=Plug-ins=WebExtensions%3A%20Android=WebExtensions%3A%20Compatibility=WebExtensions%3A%20Developer%20Tools=WebExtensions%3A%20Experiments=WebExtensions%3A%20Frontend=WebExtensions%3A%20General=WebExtensions%3A%20Request%20Handling=WebExtensions%3A%20Untriaged=Core=Firefox=Firefox%20for%20Android=Firefox%20for%20iOS=Toolkit


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Emma Humphries
Andy

To start:

https://bugzilla.mozilla.org/buglist.cgi?product=Core=Firefox=Firefox%20for%20Android=Firefox%20for%20iOS=Toolkit_format=advanced=---_whiteboard=test%20disabled%2Ctest-disabled%2Ctestdisabled_whiteboard_type=anywordssubstr=bug_id=0

Add-ons, Web Extensions, and Plugins related:

https://bugzilla.mozilla.org/buglist.cgi?resolution=---_whiteboard_type=anywordssubstr_format=advanced_whiteboard=test%20disabled%2Ctest-disabled%2Ctestdisabled=Add-on%20Manager=Add-ons%20Manager=Plug-ins=WebExtensions%3A%20Android=WebExtensions%3A%20Compatibility=WebExtensions%3A%20Developer%20Tools=WebExtensions%3A%20Experiments=WebExtensions%3A%20Frontend=WebExtensions%3A%20General=WebExtensions%3A%20Request%20Handling=WebExtensions%3A%20Untriaged=Core=Firefox=Firefox%20for%20Android=Firefox%20for%20iOS=Toolkit

On Wed, Sep 6, 2017 at 3:46 PM, Andrew McKay  wrote:

> Is there an easy way for me to track what tests have been disabled as
> result of intermittent issues we haven't been able to fix?


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread Andrew McKay
Is there an easy way for me to track what tests have been disabled as
result of intermittent issues we haven't been able to fix?



Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread jmaher
Over the last 9 months a few of us have really watched intermittent test 
failures almost daily and done a lot to pester people as well as fix many.  
While there are over 420 bugs that have been fixed since the beginning of the 
year, there are half that many (211+) which have been disabled in some form 
(including turning off the jobs).

We don't like to disable tests and have been fairly lenient about recommending 
it.  Overall we have tried to adhere to a policy of:
* >= 30 failures/week: ask the owner to look at the failure and fix it; if 
this persists for a few weeks with no real traction we would go ahead [and 
recommend] disabling it.
* >= 75 failures/week: ask for people to fix this in a shorter time frame and 
recommend disabling the test in a week or so
* >= 150 failures/week: often just disable the test

This is confusing and hard to manage.  We have since started adjusting triage 
queries; some teams are now doing their own triage, and we are ignoring 
those bugs (while they are getting prioritized properly). 

What we are looking to start doing this month is adopting a simpler policy:
* any bug that has >=200 instances in the last 30 days will be disabled
** this will be a manual process, so it will happen a couple times/week
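
For anyone comparing the two policies, here is a rough Python sketch. The thresholds are the ones listed above; the function names and the returned action strings are just labels I made up:

```python
# Threshold for the new policy: failure instances in the last 30 days.
NEW_THRESHOLD = 200


def old_policy(failures_per_week: int) -> str:
    """Tiered weekly policy we have been trying to follow so far."""
    if failures_per_week >= 150:
        return "disable"
    if failures_per_week >= 75:
        return "fix soon, recommend disabling in about a week"
    if failures_per_week >= 30:
        return "ask owner to fix, disable after a few weeks without traction"
    return "no action"


def new_policy(instances_last_30_days: int) -> str:
    """Single-threshold policy: one number, one window."""
    if instances_last_30_days >= NEW_THRESHOLD:
        return "disable"
    return "keep triaging"


if __name__ == "__main__":
    for count in (29, 30, 75, 150):
        print(count, "per week ->", old_policy(count))
    for count in (199, 200):
        print(count, "in 30 days ->", new_policy(count))
```

The actual disabling stays manual; this only shows which bucket a bug would land in.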

We expect the outcome of this to be a similar amount of disabling, just with 
an easier method for doing so.  We may well recommend disabling a test before 
it hits the threshold. Keep in mind that a disabled test is easy to 
re-enable, so feel free to disable it for that one platform until you have 
time to look at fixing it.

To be clear we (and some component owners) will continue triaging bugs and 
trying to get fixes in place as often as possible and prefer a fix, not a 
disabled test!

Please raise any concerns, otherwise we will move forward with this in the 
coming weeks.