Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
I want to express my thanks to everyone who contributed to this thread. We have a lot of passionate and smart people who care about this topic; thanks again for weighing in so far. Below is a slightly updated policy from the original, and following that is an attempt to summarize the thread

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread Kyle Huey
On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote: This policy will define an escalation path for when a single test case is identified to be leaking or failing and is causing enough disruption on the trees. Disruption is defined as: 1) Test case is on the list of top 20

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote: On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote: This policy will define an escalation path for when a single test case is identified to be leaking or failing and is causing enough disruption on the trees.

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread Karl Tomlinson
Thank you for putting this together. It is important. jmaher writes: This policy will define an escalation path for when a single test case is identified to be leaking or failing and is causing enough disruption on the trees. Exceptions: 1) If this test has landed (or been modified) in

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Ehsan Akhgari
On 2014-04-08, 6:10 PM, Karl Tomlinson wrote: I wonder whether the real problem here is that we have too many bad tests that report false negatives, and these bad tests are reducing the value of our testsuite in general. Tests also need to be well documented so that people can understand what a

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
On 4/8/14, 6:51 AM, James Graham wrote: On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread L. David Baron
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote: The simple solution is to have a separate in-tree manifest annotation for intermittents. Put another way, we can describe exactly why we are not running a test. This is kinda/sorta the realm of bug 922581. The harder solution is to

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
On 4/9/14, 11:29 AM, L. David Baron wrote: On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote: The simple solution is to have a separate in-tree manifest annotation for intermittents. Put another way, we can describe exactly why we are not running a test. This is kinda/sorta the realm of

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Karl Tomlinson
Gregory Szorc writes: 2) Run marked intermittent tests multiple times. If it works all 25 times, fail the test run for inconsistent metadata. We need to consider intermittently failing tests as failed, and we need to only test things that always pass. We can't rely on statistics to tell us
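The check Gregory Szorc proposes (run a test annotated as intermittent many times, and treat an all-pass result as inconsistent metadata) could be sketched roughly as follows. This is only an illustration: the function name, the `run_test` callable, and the two result labels other than the all-pass case are hypothetical and are not taken from any Mozilla test harness. Only the count of 25 and the all-pass rule come from the proposal.

```python
from typing import Callable


def check_intermittent_annotation(run_test: Callable[[], bool], runs: int = 25) -> str:
    """Run a test annotated as intermittent `runs` times and classify the outcome.

    `run_test` is a hypothetical callable returning True on pass, False on failure.
    """
    results = [run_test() for _ in range(runs)]
    if all(results):
        # Per the proposal: the test never failed, so the annotation looks stale.
        return "inconsistent-metadata"
    if not any(results):
        # Assumption (not in the proposal): always failing is not "intermittent".
        return "permanent-failure"
    # Assumption (not in the proposal): a mix of passes and failures matches the annotation.
    return "intermittent-confirmed"


# Deterministic example: 24 passes followed by one failure.
outcomes = iter([True] * 24 + [False])
print(check_intermittent_annotation(lambda: next(outcomes)))  # intermittent-confirmed
```

Karl Tomlinson's objection in the same message still applies to any such check: a finite number of runs cannot prove that a test always passes.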

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Gregory Szorc
On 4/9/14, 2:07 PM, Karl Tomlinson wrote: Gregory Szorc writes: 2) Run marked intermittent tests multiple times. If it works all 25 times, fail the test run for inconsistent metadata. We need to consider intermittently failing tests as failed, and we need to only test things that always

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Chris Peterson
On 4/9/14, 11:48 AM, Gregory Szorc wrote: I feel a lot of people just shrug their shoulders and allow the test to be disabled (I'm guilty of it as much as anyone). From my perspective, it's difficult to convince the powers that be that fixing intermittent failures (that have been successfully swept

Re: Policy for disabling tests which run on TBPL

2014-04-09 Thread Ehsan Akhgari
On 2014-04-09, 6:46 PM, Chris Peterson wrote: On 4/9/14, 11:48 AM, Gregory Szorc wrote: I feel a lot of people just shrug their shoulders and allow the test to be disabled (I'm guilty of it as much as anyone). From my perspective, it's difficult to convince the powers that be that fixing intermittent

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Aryeh Gregor
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what happens at all. I think many people treat

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Andrew Halberstadt
On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham
On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 9:51 AM, James Graham wrote: On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote: On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham
On 08/04/14 15:06, Ehsan Akhgari wrote: On 2014-04-08, 9:51 AM, James Graham wrote: On 08/04/14 14:43, Andrew Halberstadt wrote: On 07/04/14 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Gavin Sharp
I see only two real goals for the proposed policy: - ensure that module owners/peers have the opportunity to object to any test-disabling decisions before they take effect - set an expectation that intermittent orange failures are dealt with promptly ("dealt with" first involves investigation, usually

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread L. David Baron
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote: I see only two real goals for the proposed policy: - ensure that module owners/peers have the opportunity to object to any test-disabling decisions before they take effect - set an expectation that intermittent orange failures are dealt with

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Ehsan Akhgari
On 2014-04-08, 3:15 PM, Chris Peterson wrote: On 4/8/14, 11:41 AM, Gavin Sharp wrote: Separately from all of that, we could definitely invest in better tools for dealing with intermittent failures in general. Anecdotally, I know chromium has some nice ways of dealing with them, for example. But

Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread Karl Tomlinson
Aryeh Gregor writes: On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what happens at all. I

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread James Graham
On 07/04/14 04:33, Andrew Halberstadt wrote: On 06/04/14 08:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt ahalberst...@mozilla.com wrote: Many of our test runners have that ability. But doing this implies that intermittents are always the fault of the test. We'd be missing whole classes of regressions (notably race conditions). We already are,

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Andrew Halberstadt
On 07/04/14 05:10 AM, James Graham wrote: On 07/04/14 04:33, Andrew Halberstadt wrote: On 06/04/14 08:59 AM, Aryeh Gregor wrote: Is there any reason in principle that we couldn't have the test runner automatically rerun tests with known intermittent failures a few times, and let the test pass

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt ahalberst...@mozilla.com wrote: I would guess the former is true in most cases. But at least there we have a *chance* at tracking down and fixing the failure, even if it takes a while before it becomes annoying enough to prioritize. If we made

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ted Mielczarek
On 4/7/2014 9:02 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt ahalberst...@mozilla.com wrote: I would guess the former is true in most cases. But at least there we have a *chance* at tracking down and fixing the failure, even if it takes a while before it becomes

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we would not be able to catch a regression

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Mike Hoye
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote: It's difficult to say whether bugs we find via tests are more or less important than bugs we find via users. It's entirely possible that lots of the bugs that cause intermittent test failures cause intermittent weird behavior for our users, we

Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ehsan Akhgari
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Aryeh Gregor
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of course they're resource/time intensive (basically running orange

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Ehsan Akhgari
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of course they're

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Ed Morley
On 06 April 2014 14:58:24, Ehsan Akhgari wrote: On 2014-04-06, 8:59 AM, Aryeh Gregor wrote: Is there any reason in principle that we couldn't have the test runner automatically rerun tests with known intermittent failures a few times, and let the test pass if it passes a few times in a row
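Aryeh Gregor's suggestion quoted above (rerun a known-intermittent test and accept it once it passes a few times in a row) might look something like the sketch below. The function name, the `run_test` callable, and the defaults for `required_passes` and `max_attempts` are all illustrative assumptions; the thread does not specify numbers, and this is not how any existing test runner is known to behave.

```python
from typing import Callable


def passes_with_retries(
    run_test: Callable[[], bool],
    required_passes: int = 3,   # assumed value for "a few times in a row"
    max_attempts: int = 10,     # assumed cap so a broken test cannot loop forever
) -> bool:
    """Return True once the test passes `required_passes` times consecutively."""
    streak = 0
    for _ in range(max_attempts):
        if run_test():
            streak += 1
            if streak >= required_passes:
                return True
        else:
            # Any failure resets the run of consecutive passes.
            streak = 0
    return False


# Deterministic example: one failure, then three passes in a row.
outcomes = iter([False, True, True, True])
print(passes_with_retries(lambda: next(outcomes)))  # True
```

As Andrew Halberstadt points out elsewhere in the thread, a retry scheme like this assumes the intermittent failure is the test's fault, so it can hide real regressions such as race conditions.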

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Karl Tomlinson
On Fri, 4 Apr 2014 12:49:45 -0700 (PDT), jmaher wrote: overburdened in other ways (e.g., reviews). The burden needs to be placed on the regressing change rather than the original author of the test. I am open to ideas to help figure out the offending changes. My understanding is many of

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Karl Tomlinson
On Fri, 4 Apr 2014 11:58:28 -0700 (PDT), jmaher wrote: Two exceptions: 2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many tests will need to be disabled prior to getting the tests on TBPL. It makes sense to disable some tests so that others can run. I assume bugs

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Andrew Halberstadt
On 06/04/14 08:59 AM, Aryeh Gregor wrote: On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: Note that is only accurate to a certain point. There are other things which we can do to guesswork our way out of the situation for Autoland, but of course they're

Re: Policy for disabling tests which run on TBPL

2014-04-06 Thread Andrew Halberstadt
On 04/04/14 03:44 PM, Ehsan Akhgari wrote: On 2014-04-04, 3:12 PM, L. David Baron wrote: Are you talking about newly-added tests, or tests that have been passing for a long time and recently started failing? In the latter case, the burden should fall on the regressing patch, and the regressing

Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
As the sheriffs know, it is frustrating to deal with hundreds of tests that fail on a daily basis, but are intermittent. When a single test case is identified to be leaking or failing at least 10% of the time, it is time to escalate. Escalation path: 1) Ensure we have a bug on file, with the
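The 10% trigger in the original proposal comes down to a simple failure-rate comparison. A minimal sketch, assuming failure and run counts are already available from somewhere (the function and parameter names here are hypothetical, and the proposal does not say over what window the rate is measured):

```python
def should_escalate(failures: int, total_runs: int, threshold: float = 0.10) -> bool:
    """Return True when a test fails at least `threshold` of the time.

    The 0.10 default mirrors the "at least 10% of the time" wording in the
    proposal; how the counts are gathered is left open here.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= failures <= total_runs:
        raise ValueError("failures must be between 0 and total_runs")
    return failures / total_runs >= threshold


print(should_escalate(failures=12, total_runs=100))  # True
print(should_escalate(failures=9, total_runs=100))   # False
```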

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread L. David Baron
On Friday 2014-04-04 11:58 -0700, jmaher wrote: As the sheriffs know, it is frustrating to deal with hundreds of tests that fail on a daily basis, but are intermittent. When a single test case is identified to be leaking or failing at least 10% of the time, it is time to escalate.

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 3:12 PM, L. David Baron wrote: On Friday 2014-04-04 11:58 -0700, jmaher wrote: As the sheriffs know, it is frustrating to deal with hundreds of tests that fail on a daily basis, but are intermittent. When a single test case is identified to be leaking or failing at least 10% of

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
4) In the case we go another 2 days with no response from a module owner, we will disable the test. Are you talking about newly-added tests, or tests that have been passing for a long time and recently started failing? In the latter case, the burden should fall on the

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Chris Peterson
On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time identifying the regressing patch is difficult Identifying the regressing patch is only difficult because we have so many intermittently failing tests. Intermittent oranges are one of the major blockers for Autoland. If TBPL

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Jonathan Griffin
With respect to Autoland, I think we'll need to figure out how to make it take intermittents into account. I don't think we'll ever be in a state with 0 intermittents. Jonathan On 4/4/2014 1:30 PM, Chris Peterson wrote: On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 4:30 PM, Chris Peterson wrote: On 4/4/14, 1:19 PM, Gavin Sharp wrote: The majority of the time identifying the regressing patch is difficult Identifying the regressing patch is only difficult because we have so many intermittently failing tests. Intermittent oranges are one of

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Ehsan Akhgari
On 2014-04-04, 4:58 PM, Jonathan Griffin wrote: With respect to Autoland, I think we'll need to figure out how to make it take intermittents into account. I don't think we'll ever be in a state with 0 intermittents. That's not true, we were in that state once, before I stopped working on this

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread Martin Thomson
On 2014-04-04, at 14:02, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: That's not true, we were in that state once, before I stopped working on this issue. We can get there again if we wanted to. It's just a lot of hard work which won't scale if we only have one person doing it. It’s

Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread L. David Baron
On Friday 2014-04-04 12:49 -0700, jmaher wrote: If this plan is applied to existing tests, then it will lead to style system mochitests being turned off due to other regressions because I'm the person who wrote them and the module owner, and I don't always have time to deal with