I want to express my thanks to everyone who contributed to this thread. We
have a lot of passionate and smart people who care about this topic; thanks
again for weighing in so far.
Below is a slightly updated policy from the original, and following that is an
attempt to summarize the thread.
On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote:
This policy will define an escalation path for when a single test case is
identified to be leaking or failing and is causing enough disruption on the
trees. Disruption is defined as:
1) Test case is on the list of top 20
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote:
On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote:
This policy will define an escalation path for when a single test case is
identified to be leaking or failing and is causing enough disruption on the
trees.
Thank you for putting this together. It is important.
jmaher writes:
This policy will define an escalation path for when a single test case is
identified to be leaking or failing and is causing enough disruption on the
trees.
Exceptions:
1) If this test has landed (or been modified) in
On 2014-04-08, 6:10 PM, Karl Tomlinson wrote:
I wonder whether the real problem here is that we have too many
bad tests that report false negatives, and these bad tests are
reducing the value of our testsuite in general. Tests also need
to be well documented so that people can understand what a
On 4/8/14, 6:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote:
The simple solution is to have a separate in-tree manifest
annotation for intermittents. Put another way, we can describe
exactly why we are not running a test. This is kinda/sorta the realm
of bug 922581.
The harder solution is to
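The "separate in-tree manifest annotation" idea above, in the territory of bug 922581, could look something like the sketch below. This is a hypothetical manifestparser-style fragment: the `intermittent` key and the test filename are illustrative assumptions, not existing syntax; only `skip-if` is a real manifestparser key.

```ini
[test_example_feature.html]
# Hypothetical key (illustrative; see bug 922581): keep running the test,
# but track and/or re-run it as a known intermittent instead of skipping it.
intermittent = true
# Real manifestparser syntax, shown for contrast: disables the test outright.
skip-if = os == 'android'
```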
On 4/9/14, 11:29 AM, L. David Baron wrote:
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote:
The simple solution is to have a separate in-tree manifest
annotation for intermittents. Put another way, we can describe
exactly why we are not running a test. This is kinda/sorta the realm
of
Gregory Szorc writes:
2) Run marked intermittent tests multiple times. If it works all
25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always pass.
We can't rely on statistics to tell us
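Karl's caution about statistics can be made concrete with a bit of arithmetic (my illustration, not from the thread): re-running a marked-intermittent test 25 times only gives probabilistic evidence about its metadata.

```python
def consecutive_pass_probability(failure_rate, runs):
    """Probability that a test which fails independently at `failure_rate`
    still passes `runs` consecutive re-runs."""
    return (1 - failure_rate) ** runs

# A test that fails 10% of the time still passes 25 runs in a row
# roughly 7% of the time, so 25 green runs can't *prove* it is fixed.
p = consecutive_pass_probability(0.10, 25)
print(round(p, 4))  # 0.0718
```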
On 4/9/14, 2:07 PM, Karl Tomlinson wrote:
Gregory Szorc writes:
2) Run marked intermittent tests multiple times. If it works all
25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent failures
(that have been successfully swept
On 2014-04-09, 6:46 PM, Chris Peterson wrote:
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
What you're saying above is true *if* someone investigates the intermittent
test failure and determines that the bug is not important. But in my
experience, that's not what happens at all. I think many people treat
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote:
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
What you're saying above is true *if* someone investigates the intermittent
test failure and determines that the bug is not important. But in my
experience, that's not what
On 08/04/14 15:06, Ehsan Akhgari wrote:
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org
wrote:
If a bug is causing a test to fail
I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to
any disable test decisions before they take effect
- set an expectation that intermittent orange failures are dealt with
promptly (dealt with first involves investigation, usually
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote:
I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to
any disable test decisions before they take effect
- set an expectation that intermittent orange failures are dealt with
On 2014-04-08, 3:15 PM, Chris Peterson wrote:
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know chromium has some nice ways of dealing with them, for example.
But
Aryeh Gregor writes:
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
What you're saying above is true *if* someone investigates the
intermittent test failure and determines that the bug is not
important. But in my experience, that's not what happens at
all. I
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari
ehsan.akhg...@gmail.com wrote:
Note that is only accurate to a certain point. There are other
things which
we can do to guesswork our way out of the situation
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt
ahalberst...@mozilla.com wrote:
Many of our test runners have that ability. But doing this implies that
intermittents are always the fault of the test. We'd be missing whole
classes of regressions (notably race conditions).
We already are,
On 07/04/14 05:10 AM, James Graham wrote:
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass
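The rerun idea Aryeh raises could be sketched like this. This is a minimal illustration under my own assumptions (a boolean-returning `test_fn` and a fixed retry count), not the behavior of any Mozilla harness:

```python
def run_with_retries(test_fn, retries=3):
    """Run a test once; on failure, rerun up to `retries` times and report
    a pass (flagged as intermittent) if any rerun succeeds."""
    if test_fn():
        return "PASS"
    for _ in range(retries):
        if test_fn():
            return "PASS (intermittent)"
    return "FAIL"

# Deterministic stand-in for a flaky test: fails its first run, then passes.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return calls["n"] > 1

print(run_with_retries(flaky))  # PASS (intermittent)
```

The trade-off Andrew points out applies directly: a wrapper like this hides real races as readily as test bugs, since both kinds of failure disappear on retry.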
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
ahalberst...@mozilla.com wrote:
I would guess the former is true in most cases. But at least there we have a
*chance* at tracking down and fixing the failure, even if it takes awhile
before it becomes annoying enough to prioritize. If we made
On 4/7/2014 9:02 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
ahalberst...@mozilla.com wrote:
I would guess the former is true in most cases. But at least there we have a
*chance* at tracking down and fixing the failure, even if it takes awhile
before it becomes
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote:
It's difficult to say whether bugs we find via tests are more or less
important than bugs we find via users. It's entirely possible that
lots of the bugs that cause intermittent test failures cause
intermittent weird behavior for our users, we
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
Note that is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basically running orange
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
Note that is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're
On 06 April 2014 14:58:24, Ehsan Akhgari wrote:
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row
On Fri, 4 Apr 2014 12:49:45 -0700 (PDT), jmaher wrote:
overburdened in other ways (e.g., reviews). the burden
needs to be placed on the regressing change rather than the original
author of the test.
I am open to ideas to help figure out the offending changes. My
understanding is many of
On Fri, 4 Apr 2014 11:58:28 -0700 (PDT), jmaher wrote:
Two exceptions:
2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many
tests will need to be disabled prior to getting the tests on tbpl.
It makes sense to disable some tests so that others can run.
I assume bugs
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
Note that is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're
On 04/04/14 03:44 PM, Ehsan Akhgari wrote:
On 2014-04-04, 3:12 PM, L. David Baron wrote:
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the regressing patch,
and the regressing
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
the time, it is time to escalate.
Escalation path:
1) Ensure we have a bug on file, with the
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis but are intermittent.
When a single test case is identified to be leaking or failing at least 10%
of the time, it is time to escalate.
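The 10% threshold in the proposed policy could be checked mechanically over a window of recent runs. A minimal sketch, assuming results arrive as a list of booleans (True = failed or leaked); the data shape and window are my assumptions, not part of the proposal:

```python
def should_escalate(results, threshold=0.10):
    """Return True when the observed failure rate over recent runs
    meets or exceeds the escalation threshold (default 10%)."""
    if not results:
        return False
    failure_rate = sum(results) / len(results)
    return failure_rate >= threshold

# 3 failures in 20 runs is a 15% failure rate, so this test escalates.
print(should_escalate([True] * 3 + [False] * 17))  # True
```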
On 2014-04-04, 3:12 PM, L. David Baron wrote:
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
4) If another 2 days pass with no response from a module owner, we will
disable the test.
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of the major blockers for Autoland. If TBPL
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
Jonathan
On 4/4/2014 1:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time
On 2014-04-04, 4:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of
On 2014-04-04, 4:58 PM, Jonathan Griffin wrote:
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
That's not true, we were in that state once, before I stopped working on
this
On 2014-04-04, at 14:02, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
That's not true, we were in that state once, before I stopped working on this
issue. We can get there again if we wanted to. It's just a lot of hard work
which won't scale if we only have one person doing it.
It’s
On Friday 2014-04-04 12:49 -0700, jmaher wrote:
If this plan is applied to existing tests, then it will lead to
style system mochitests being turned off due to other regressions
because I'm the person who wrote them and the module owner, and I
don't always have time to deal with