Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-10-01 Thread Chris AtLee

On 17:26, Tue, 23 Sep, Kyle Huey wrote:

On Tue, Aug 26, 2014 at 8:23 AM, Chris AtLee cat...@mozilla.com wrote:

Just a short note to say that this experiment is now live on
mozilla-inbound.

___
dev-tree-management mailing list
dev-tree-managem...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tree-management



What was the outcome?


Thanks for the reminder.

The outcome of this experiment was inconclusive.

On the one hand, we know we didn't make anything worse. The skipping 
behaved as expected, and wasn't a burden on sheriffs. We didn't make 
wait times any worse.


On the other hand, it appears as though we improved wait times for the 
target platforms, but the signal there isn't clear due to other 
variables changing (e.g. overall load wasn't directly comparable between 
the two time windows).


We've left the skipping behaviour enabled for the moment, and are 
considering some tweaks to the amount of skipping that happens, and 
which branches/platforms it's enabled for.


Cheers,
Chris


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-09-23 Thread Kyle Huey
On Tue, Aug 26, 2014 at 8:23 AM, Chris AtLee cat...@mozilla.com wrote:
 Just a short note to say that this experiment is now live on
 mozilla-inbound.



What was the outcome?

- Kyle


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-26 Thread Chris AtLee
Just a short note to say that this experiment is now live on 
mozilla-inbound.




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Chris AtLee

On 17:37, Wed, 20 Aug, Jonas Sicking wrote:

On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert jgilb...@mozilla.com wrote:

I have been asked in the past if we really need to run WebGL tests on Android, 
if they have coverage on Desktop platforms.
And then again later, why B2G if we have Android.

There seems to be enough belief in test-once-run-everywhere that I feel the 
need to *firmly* establish that this is not acceptable, at least for the code I 
work with.
I'm happy I'm not alone in this.


I'm a firm believer that we ultimately need to run basically all
combinations of tests and platforms before allowing code to reach
mozilla-central. There's lots of platform specific code paths, and
it's hard to track which tests trigger them, and which don't.


I think we can agree on this. However, not running all tests on all 
platforms per push on mozilla-inbound (or other branch) doesn't mean 
that they won't be run on mozilla-central, or even on mozilla-inbound 
prior to merging.


I'm a firm believer that running all tests for all platforms for all 
pushes is a waste of our infrastructure and human resources.


I think the gap we need to figure out how to fill is between getting 
per-push efficiency and full test coverage prior to merging.



It would however be really cool if we were able to pull data on which
tests tend to fail in a way that affects all platforms, and which ones
tend to fail on one platform only. If we combine this with the ability
of having tbpl (or treeherder) fill in the blanks whenever a test
fails, it seems like we could run many of our tests only on one
platform for most checkins to mozilla-inbound.


There are dozens of really interesting approaches we could take here.
Skipping every nth debug test run is one of the simplest, and I hope we 
can learn a lot from the experiment.
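
To make "skipping every nth debug test run" concrete, here is a minimal sketch of such a policy. The function name, the `skip_every` parameter and the platform strings are made up for illustration, and keeping linux64 debug on every push is an assumption of this sketch; it is not the actual buildbot configuration.

```python
def should_run(push_index: int, platform: str, build_type: str,
               skip_every: int = 2) -> bool:
    """Decide whether to schedule a test job for a push (hypothetical policy)."""
    # Opt tests always run. linux64 debug is assumed to run on every push too.
    if build_type != "debug" or platform == "linux64":
        return True
    # Any other debug job is skipped on every `skip_every`-th push.
    return push_index % skip_every != 0
```

With `skip_every=2` this runs the non-linux64 debug jobs on every other push; a larger value skips less often.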


Cheers,
Chris




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Milan Sreckovic

--
- Milan

On Aug 21, 2014, at 10:12 , Chris AtLee cat...@mozilla.com wrote:

 On 17:37, Wed, 20 Aug, Jonas Sicking wrote:
 On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert jgilb...@mozilla.com wrote:
 I have been asked in the past if we really need to run WebGL tests on 
 Android, if they have coverage on Desktop platforms.
 And then again later, why B2G if we have Android.
 
 There seems to be enough belief in test-once-run-everywhere that I feel the 
 need to *firmly* establish that this is not acceptable, at least for the 
 code I work with.
 I'm happy I'm not alone in this.
 
 I'm a firm believer that we ultimately need to run basically all
 combinations of tests and platforms before allowing code to reach
 mozilla-central. There's lots of platform specific code paths, and
 it's hard to track which tests trigger them, and which don't.
 
 I think we can agree on this. However, not running all tests on all platforms 
 per push on mozilla-inbound (or other branch) doesn't mean that they won't be 
 run on mozilla-central, or even on mozilla-inbound prior to merging.
 
 I'm a firm believer that running all tests for all platforms for all pushes 
 is a waste of our infrastructure and human resources.
 
 I think the gap we need to figure out how to fill is between getting per-push 
 efficiency and full test coverage prior to merging.

The cost of not catching a problem with a test and letting the code land is 
huge.  I only know this for the graphics team, but to Ehsan’s and Jonas’ point, 
I’m sure it’s not specific to graphics.  Now, one is preventative cost (tests), 
one is treatment cost (fixing issues that snuck through), so it’s sometimes 
difficult to compare, and we are not alone in first going after the 
preventative costs, but it’s a big mistake to do so.

Now, if we need to save some electricity or cash, I understand that as well, 
and it eventually translates to the cost to the company the same as people’s 
time.  If we can do something by skipping every n-th debug run, sure, let’s try 
it.  We have to make sure that a failure on a debug test run triggers us going 
back and re-running the skipped ones, so that we don’t have any gaps in the 
tests where something may have gone wrong.
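
A rough sketch of that "go back and re-run the skipped ones" step, assuming the results for one job are kept as a simple per-push list. The data layout and the function name are assumptions for illustration only.

```python
from typing import List, Optional


def pushes_to_backfill(results: List[Optional[str]], failed_index: int) -> List[int]:
    """Return indices of skipped runs between the last known pass and a failure.

    `results` holds one entry per push for a single job, oldest first:
    "pass", "fail", or None when the run was skipped.
    """
    to_run = []
    # Walk backwards from the failing push until we reach a run that passed.
    for i in range(failed_index - 1, -1, -1):
        if results[i] == "pass":
            break
        if results[i] is None:
            to_run.append(i)
    return sorted(to_run)
```

Everything older than the last known pass is left alone, so only the gap that could hide the regression gets re-run.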


 
 It would however be really cool if we were able to pull data on which
 tests tend to fail in a way that affects all platforms, and which ones
 tend to fail on one platform only. If we combine this with the ability
 of having tbpl (or treeherder) fill in the blanks whenever a test
 fails, it seems like we could run many of our tests only on one
 platform for most checkins to mozilla-inbound.
 
 There are dozens of really interesting approaches we could take here.
 Skipping every nth debug test run is one of the simplest, and I hope we can 
 learn a lot from the experiment.
 
 Cheers,
 Chris


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Ed Morley
I think much of the pushback in this thread is due to a misunderstanding 
of some combination of:

* our current buildbot scheduling
* the proposal
* how trees are sheriffed and merged

To clarify:

1) We already have coalescing [*] of jobs on all trees apart from try.

2) This coalescing means that all jobs are still run at some point, but 
just may not run on every push.


3) When failures are detected, coalescing means that regression ranges 
are larger and so sometimes result in longer tree integration repo 
closures, whilst the sheriffs force trigger jobs on the revisions that 
did not originally run them.


4) When merging into mozilla-central, sheriffs ensure that all jobs are 
green - including those that got coalesced and those that are only 
scheduled periodically (eg non-unified and PGO builds are only run every 3
hours). (This is a fairly manual process currently, but better tooling 
should be possible with treeherder).


5) This proposal does not mean debug-only issues are somehow not worth 
acting on or that they'll end up shipped/on mozilla-central, thanks to #4.


6) This proposal is purely trying to make existing coalescing (#1/#2) 
more intelligent, to ensure that we expend the finite amount of machine 
time we have at present on the most appropriate jobs at each point, in 
order to reduce the impact of #3.


Fwiw I'm on the fence as to whether the algorithm suggested in this 
proposal is the most effective way to aid with #3 - however it's worth 
trying to find out.


Best wishes,

Ed

[*] Collapsing of pending jobs of the same type, when the queue size is 
greater than 1.
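
As an illustration of the footnote, a minimal sketch of coalescing. It assumes that the newest pending push is the one that actually runs the job; the data layout and names are hypothetical and are not taken from buildbot itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def coalesce(pending: List[Tuple[int, str]]) -> Dict[str, Tuple[int, List[int]]]:
    """Collapse pending jobs of the same type onto the newest push.

    `pending` is a list of (push_id, job_type) pairs, oldest first.
    Returns job_type -> (push that runs the job, pushes whose job was skipped).
    """
    by_type = defaultdict(list)
    for push_id, job_type in pending:
        by_type[job_type].append(push_id)

    plan = {}
    for job_type, pushes in by_type.items():
        # With a queue size of 1 nothing is collapsed; otherwise only the
        # newest push runs and the older ones are recorded as coalesced.
        plan[job_type] = (pushes[-1], pushes[:-1])
    return plan
```

The second element of each tuple is exactly the set of pushes a sheriff would have to force trigger if the job later fails (#3).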



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Chris Peterson

On 8/21/14 9:35 AM, Ed Morley wrote:

4) When merging into mozilla-central, sheriffs ensure that all jobs are
green - including those that got coalesced and those that are only
scheduled periodically (eg non-unified and PGO builds are only run every 3
hours). (This is a fairly manual process currently, but better tooling
should be possible with treeherder).


To ensure that all code landing in mozilla-central has passed debug 
tests, sheriffs could merge only from the mozilla-inbound changesets 
that ran the debug tests.
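
A small sketch of that selection rule, assuming per-changeset job results are available as a dictionary. The names and the data shape are illustrative assumptions, not an existing sheriffing tool.

```python
from typing import Dict, List, Optional


def latest_mergeable(changesets: List[str],
                     results: Dict[str, Dict[str, str]],
                     required_jobs: List[str]) -> Optional[str]:
    """Return the newest changeset on which every required job ran and passed.

    `changesets` is ordered oldest first; `results` maps
    changeset -> {job name: "pass" or "fail"}; jobs that did not run are absent.
    """
    for cset in reversed(changesets):
        jobs = results.get(cset, {})
        if all(jobs.get(job) == "pass" for job in required_jobs):
            return cset
    return None
```

A changeset whose debug tests were skipped is never picked, so the merge point always has a full set of green results.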



chris


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Jonathan Griffin
Thanks Ed.  To paraphrase, no test coverage is being lost here, we're 
just being a little more deliberate with job coalescing.  All tests will 
be run on all platforms (including debug tests) on a commit before a 
merge to m-c.


Jonathan

On 8/21/2014 9:35 AM, Ed Morley wrote:
I think much of the pushback in this thread is due to a 
misunderstanding of some combination of:

* our current buildbot scheduling
* the proposal
* how trees are sheriffed and merged

To clarify:

1) We already have coalescing [*] of jobs on all trees apart from try.

2) This coalescing means that all jobs are still run at some point, 
but just may not run on every push.


3) When failures are detected, coalescing means that regression ranges 
are larger and so sometimes result in longer tree integration repo 
closures, whilst the sheriffs force trigger jobs on the revisions that 
did not originally run them.


4) When merging into mozilla-central, sheriffs ensure that all jobs 
are green - including those that got coalesced and those that are only 
scheduled periodically (eg non-unified and PGO builds are only run every
3 hours). (This is a fairly manual process currently, but better 
tooling should be possible with treeherder).


5) This proposal does not mean debug-only issues are somehow not worth 
acting on or that they'll end up shipped/on mozilla-central, thanks to 
#4.


6) This proposal is purely trying to make existing coalescing (#1/#2) 
more intelligent, to ensure that we expend the finite amount of 
machine time we have at present on the most appropriate jobs at each 
point, in order to reduce the impact of #3.


Fwiw I'm on the fence as to whether the algorithm suggested in this 
proposal is the most effective way to aid with #3 - however it's worth 
trying to find out.


Best wishes,

Ed

[*] Collapsing of pending jobs of the same type, when the queue size 
is greater than 1.




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Jonathan Griffin

Hey Martin,

This is a good idea, and we've been thinking about approaches like 
this.  Basically, the idea is to run tests that (nearly) always pass 
less often.  There are currently some tests that fit into this category, 
like dom level0,1,2 tests in mochitest-plain, and those are 
time-consuming to run.  Your idea takes this a step further, by 
identifying tests that sometimes fail, correlating those with code 
changes, and ensuring those get run.


Both of these require some tooling to implement, so we're experimenting 
initially with approaches that we can get nearly for free, like 
running some tests only every other commit, and letting sheriffs trigger 
the missing tests in case a failure occurs.


The ultimate solution may blend a bit of both approaches, and will have 
to balance implementation cost with the gain we get from the related 
reduction in slave load.


Jonathan


On 8/21/2014 10:07 AM, Martin Thomson wrote:

On 20/08/14 17:37, Jonas Sicking wrote:

It would however be really cool if we were able to pull data on which
tests tend to fail in a way that affects all platforms, and which ones
tend to fail on one platform only.


Here's a potential project that might help.  For all of the trees 
(probably try especially), look at the checkins and for each directory 
affected build up a probability of failure for each of the tests.


You would have to find which commits were on m-c at the time of the 
run to set the baseline for the checkin; and intermittent failures 
would add a certain noise floor.


The basic idea though is that the information would be very simple to 
use: For each directory touched in a commit, find all the tests that 
cross a certain failure threshold across the assembled dataset and 
ensure that those test groups are run.


And this would need to include prerequisites, like builds for the 
given runs.  You would, of course, include builds as tests.


Setting the threshold might take some tuning, because failure rates 
will vary across different test groups.  I keep hearing bad things 
about certain ones, for instance, and build failures are far less
common than test failures on the whole, naturally.
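
A minimal sketch of the per-directory idea described above. It ignores the m-c baseline and the intermittent-failure noise floor, and the input format, function names and threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Checkin = Tuple[List[str], Set[str], Set[str]]  # (dirs touched, groups run, groups failed)


def failure_rates(history: Iterable[Checkin]) -> Dict[str, Dict[str, float]]:
    """Estimate, per directory, how often each test group failed when it ran."""
    runs = defaultdict(lambda: defaultdict(int))
    fails = defaultdict(lambda: defaultdict(int))
    for dirs, ran, failed in history:
        for d in dirs:
            for group in ran:
                runs[d][group] += 1
                if group in failed:
                    fails[d][group] += 1
    return {d: {g: fails[d][g] / n for g, n in groups.items()}
            for d, groups in runs.items()}


def groups_to_run(dirs: List[str], rates: Dict[str, Dict[str, float]],
                  threshold: float) -> Set[str]:
    """Pick every test group whose failure rate crosses the threshold."""
    selected = set()
    for d in dirs:
        for group, rate in rates.get(d, {}).items():
            if rate >= threshold:
                selected.add(group)
    return selected
```

A single global threshold is the simplest choice; per-group thresholds would be one way to handle the differing failure rates mentioned above.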



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Jonas Sicking
What will be the policy if a test fails and it's unclear which push
caused the regression? Is it the sheriff's job, or the job of the people
who pushed, to figure out which push was the culprit and make sure
that that push gets backed out?

I.e. if 4 pushes land between two test runs, and we see a regression,
will the 4 pushes be backed out? Or will sheriffs run the missing
tests and only back out the offending push?

/ Jonas

On Thu, Aug 21, 2014 at 10:50 AM, Jonathan Griffin jgrif...@mozilla.com wrote:
 Thanks Ed.  To paraphrase, no test coverage is being lost here, we're just
 being a little more deliberate with job coalescing.  All tests will be run
 on all platforms (including debug tests) on a commit before a merge to m-c.

 Jonathan


 On 8/21/2014 9:35 AM, Ed Morley wrote:

 I think much of the pushback in this thread is due to a misunderstanding
 of some combination of:
 * our current buildbot scheduling
 * the proposal
 * how trees are sheriffed and merged

 To clarify:

 1) We already have coalescing [*] of jobs on all trees apart from try.

 2) This coalescing means that all jobs are still run at some point, but
 just may not run on every push.

 3) When failures are detected, coalescing means that regression ranges are
 larger and so sometimes result in longer tree integration repo closures,
 whilst the sheriffs force trigger jobs on the revisions that did not
 originally run them.

 4) When merging into mozilla-central, sheriffs ensure that all jobs are
 green - including those that got coalesced and those that are only scheduled
 periodically (eg non-unified and PGO builds are only run every 3 hours). (This
 is a fairly manual process currently, but better tooling should be possible
 with treeherder).

 5) This proposal does not mean debug-only issues are somehow not worth
 acting on or that they'll end up shipped/on mozilla-central, thanks to #4.

 6) This proposal is purely trying to make existing coalescing (#1/#2) more
 intelligent, to ensure that we expend the finite amount of machine time we
 have at present on the most appropriate jobs at each point, in order to
 reduce the impact of #3.

 Fwiw I'm on the fence as to whether the algorithm suggested in this
 proposal is the most effective way to aid with #3 - however it's worth
 trying to find out.

 Best wishes,

 Ed

 [*] Collapsing of pending jobs of the same type, when the queue size is
 greater than 1.




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Mike Hommey
On Thu, Aug 21, 2014 at 03:03:30PM -0700, Jonas Sicking wrote:
 What will be the policy if a test fails and it's unclear which push
 caused the regression?

You may have missed the main point that it's not "What will", but "What
is". It *is* already the case that tests are skipped.

Mike


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Jonathan Griffin
It will be handled just like coalesced jobs today:  sheriffs will 
backfill the missing data, and backout only the offender.


An illustration might help.  Today we might have something like this, 
for a given job:


          linux64-debug  win7-debug  osx8-debug
commit 1  pass           pass        pass
commit 2  pass           pass        pass
commit 3  pass           fail        pass
commit 4  pass           fail        pass

In this case (assuming the two failures are the same), it's easy for 
sheriffs to see that commit 3 is the culprit and the one that needs to 
be backed out.


During the experiment, we might see something like this:

          linux64-debug  win7-debug  osx8-debug
commit 1  pass           pass        pass
commit 2  pass           not run     not run
commit 3  pass           fail        pass
commit 4  pass           not run     not run

Here, it isn't obvious whether the problem is caused by commit 2 or 
commit 3.  (This situation already occurs today because of random 
coalescing.)


In this case, the sheriffs will backfill missing test data, so we might see:

          linux64-debug  win7-debug  osx8-debug
commit 1  pass           pass        pass
commit 2  pass           pass        not run
commit 3  pass           fail        pass
commit 4  pass           fail        not run

...and then they have enough data to determine that commit 3 (and not 
commit 2) is to blame, and can take the appropriate action.
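
The decision the sheriffs make here can be sketched for a single job column such as win7-debug. The list encoding and the function name are assumptions for illustration; this is not existing sheriffing tooling.

```python
from typing import List, Optional


def find_culprit(results: List[Optional[str]]) -> Optional[int]:
    """Return the index of the first failing commit if it is unambiguous.

    `results` is one job's column, oldest commit first: "pass", "fail",
    or None for "not run". Returns None when more data is needed.
    """
    first_fail = next((i for i, r in enumerate(results) if r == "fail"), None)
    if first_fail is None:
        return None
    # The culprit is only certain if the commit just before it is a known pass.
    if first_fail > 0 and results[first_fail - 1] == "pass":
        return first_fail
    return None
```

With the win7-debug column from the experiment table (`["pass", None, "fail", None]`) this returns None, meaning a backfill is needed; after the backfill (`["pass", "pass", "fail", "fail"]`) it returns index 2, i.e. commit 3.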


In summary, the sheriffs won't be backing out extra commits because of 
the coalescing, and it remains the sheriffs' job to backfill tests when 
they determine they need to do so in order to bisect a failure.   We 
aren't placing any extra burden on developers with this experiment, and 
part of the reason for this experiment is to determine how much of an 
extra burden this is for the sheriffs.


Jonathan

On 8/21/2014 3:03 PM, Jonas Sicking wrote:

What will be the policy if a test fails and it's unclear which push
caused the regression? Is it the sheriff's job, or the people who
pushed's job to figure out which push was the culprit and make sure
that that push gets backed out?

I.e. if 4 pushes land between two testruns, and we see a regression,
will the 4 pushes be backed out? Or will sheriffs run the missing
tests and only back out the offending push?

/ Jonas

On Thu, Aug 21, 2014 at 10:50 AM, Jonathan Griffin jgrif...@mozilla.com wrote:

Thanks Ed.  To paraphrase, no test coverage is being lost here, we're just
being a little more deliberate with job coalescing.  All tests will be run
on all platforms (including debug tests) on a commit before a merge to m-c.

Jonathan


On 8/21/2014 9:35 AM, Ed Morley wrote:

I think much of the pushback in this thread is due to a misunderstanding
of some combination of:
* our current buildbot scheduling
* the proposal
* how trees are sheriffed and merged

To clarify:

1) We already have coalescing [*] of jobs on all trees apart from try.

2) This coalescing means that all jobs are still run at some point, but
just may not run on every push.

3) When failures are detected, coalescing means that regression ranges are
larger and so sometimes result in longer tree integration repo closures,
whilst the sheriffs force trigger jobs on the revisions that did not
originally run them.

4) When merging into mozilla-central, sheriffs ensure that all jobs are
green - including those that got coalesced and those that are only scheduled
periodically (eg non-unified and PGO builds are only run every 3 hours). (This
is a fairly manual process currently, but better tooling should be possible
with treeherder).

5) This proposal does not mean debug-only issues are somehow not worth
acting on or that they'll end up shipped/on mozilla-central, thanks to #4.

6) This proposal is purely trying to make existing coalescing (#1/#2) more
intelligent, to ensure that we expend the finite amount of machine time we
have at present on the most appropriate jobs at each point, in order to
reduce the impact of #3.

Fwiw I'm on the fence as to whether the algorithm suggested in this
proposal is the most effective way to aid with #3 - however it's worth
trying to find out.

Best wishes,

Ed

[*] Collapsing of pending jobs of the same type, when the queue size is
greater than 1.




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-21 Thread Jonas Sicking
On Thu, Aug 21, 2014 at 3:21 PM, Jonathan Griffin jgrif...@mozilla.com wrote:
 In summary, the sheriffs won't be backing out extra commits because of the
 coalescing, and it remains the sheriffs' job to backfill tests when they
 determine they need to do so in order to bisect a failure.   We aren't
 placing any extra burden on developers with this experiment, and part of the
 reason for this experiment is to determine how much of an extra burden this
 is for the sheriffs.

As long as sheriffs are in support of this (which it sounds like is
the case), then this sounds awesome to me.

/ Jonas


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Jeff Gilbert
I was just going to ask about this. I glanced through the mozconfigs in the 
tree for at least Linux debug, but it looks like it only has --enable-debug, 
not even -O1. Maybe it's buried somewhere in there, but I didn't find it with a 
quick look.

I took a look at the build log for WinXP debug, and --enable-opt is only 
present on the configure line for nspr, whereas --enable-debug is in a number 
of other places.

Can we get confirmation for whether debug builds are (partially?) optimized? If 
not, we should do that. (Unless I'm missing a reason not to, especially if we 
only care about pass/fail, and not crash stacks/debugability)

-Jeff

- Original Message -
From: Kyle Huey m...@kylehuey.com
To: Joshua Cranmer  pidgeo...@gmail.com
Cc: dev-platform dev-platform@lists.mozilla.org
Sent: Tuesday, August 19, 2014 3:56:27 PM
Subject: Re: Experiment with running debug tests less often on mozilla-inbound  
the week of August 25

I'm pretty sure the debug builds on our CI infrastructure are built
with optimization.

- Kyle

On Tue, Aug 19, 2014 at 3:42 PM, Joshua Cranmer  pidgeo...@gmail.com wrote:
 On 8/19/2014 5:25 PM, Ehsan Akhgari wrote:

 Yep, the debug tests indeed take more time, mostly because they run more
 checks.


 Actually, the bigger cause in the slowdown is probably that debug tests
 don't have any optimizations, not more checks. An atomic increment on a
 debug build invokes something like a hundred instructions (including several
 call instructions) whereas the equivalent operation on an opt build is just
 one.

 --
 Joshua Cranmer
 Thunderbird and DXR developer
 Source code archæologist




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Mike Hommey
On Tue, Aug 19, 2014 at 11:26:42PM -0700, Jeff Gilbert wrote:
 I was just going to ask about this. I glanced through the mozconfigs
 in the tree for at least Linux debug, but it looks like it only has
 --enable-debug, not even -O1. Maybe it's buried somewhere in there,
 but I didn't find it with a quick look.
 
 I took a look at the build log for WinXP debug, and --enable-opt is
 only present on the configure line for nspr, whereas --enable-debug is
 in a number of other places.

Optimized builds have been the default for a while, if not ever[1]. So
unless you add an explicit --disable-optimize, you still get an
optimized build, whether you use --enable-debug or not.

As a matter of fact, we *did* have --disable-optimize in the debug build
mozconfigs, but that was removed 3 years ago, in bug 669953.

Mike

1. At least, it was the case in the oldest tree we have in mercurial.


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Benjamin Smedberg

On 8/20/2014 3:07 AM, Mike Hommey wrote:


Optimized builds have been the default for a while, if not ever[1].


Bug 54828 made optimized builds the default in 2004 right before we 
released Firefox 1.0. It only took four years to make that decision ;-)


--BDS



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Ed Morley

On 19/08/2014 21:55, Benoit Girard wrote:

I completely agree with Jeff Gilbert on this one.

I think we should try to coalesce -better-. I just checked the current
state of mozilla-inbound and it doesn't feel like any of the current patches
really need their own set of tests, because they are not time
sensitive or sufficiently complex. Right now developers are asked to
create bugs for their own change with their own patch. This leads to a
lot of little patches being landed by individual developers which
seems to reflect the current state of mozilla-inbound.

Perhaps we should instead promote checkin-needed (or something similarly simple)
to coalesce simple changes together. Opting into this means that your
patch may take significantly longer to get merged if it's landed with
another bad patch and should only be used when that's acceptable.
Right now developers with commit access are not encouraged to make use
of checkin-needed AFAIK. If we started recommending against individual
landings for simple changes, and improved the process, we could
probably significantly cut the number of tests jobs by cutting the
number of pushes.


I agree we should try to coalesce better - however doing this via a
manual "let's get someone to push a bunch of checkin-needed patches in
one go" is suboptimal:
1) By tweaking coalescing in buildbot and pushing patches individually, we
could get the same build+test job per commit ratio as doing
checkin-neededs, but with the bonus of being able to backfill jobs where
needed. This isn't possible when say 10-20 checkin-neededs are landed in
one push, since our tooling can only trigger (and more importantly
display the results of) jobs on a per push level.
2) Tooling can help make these decisions much more effectively and
quickly than someone picking through bugs - ie we should expand the
current "only schedule job X if directory Y changed" buildbotcustom
logic further.
3) Adding a human in the workflow increases r+-to-committed cycle times,
uses up scarce sheriff time, and also means the person who wrote the
patch is not the one landing it, and so someone unfamiliar with the code
often ends up being the one to resolve conflicts. We should be using
tooling, not human cycles, to land patches in a repo (ie the
long-promised autoland).
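
As an illustration of point 2, a minimal sketch of "only schedule job X if directory Y changed". The job names, path patterns and function name are hypothetical; this is not the actual buildbotcustom logic.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set

# Hypothetical mapping from job name to the path patterns that should trigger it.
JOB_PATTERNS: Dict[str, List[str]] = {
    "webgl-mochitest": ["dom/canvas/*", "gfx/*"],
    "android-build": ["mobile/android/*"],
}


def jobs_for_push(changed_files: List[str],
                  job_patterns: Dict[str, List[str]] = JOB_PATTERNS) -> Set[str]:
    """Return the jobs whose patterns match at least one file changed in the push."""
    return {
        job
        for job, patterns in job_patterns.items()
        if any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns)
    }
```

Because the decision is made per push from the changed file list, each patch can still land individually and be backfilled on its own.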


Best wishes,

Ed


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Chris AtLee

On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote:

On 2014-08-19, 5:49 PM, Jonathan Griffin wrote:

On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:

On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:

I would actually say that debug tests are more important for
continuous integration than opt tests. At least in code I deal with,
we have a ton of asserts to guarantee behavior, and we really want
test coverage with these via CI. If a test passes on debug, it should
almost certainly pass on opt, just faster. The opposite is not true.

"They take a long time and then break" is part of what I believe
caused us to not bother with debug testing on much of Android and
B2G, which we still haven't completely fixed. It should be
unacceptable to ship without CI on debug tests, but here we are
anyways. (This is finally nearly fixed, though there is still some
work to do)

I'm not saying running debug tests less often is on the same scale of
bad, but I would like to express my concerns about heading in that
direction.


I second this.  I'm curious to know why you picked debug tests for
this experiment.  Would it not make more sense to run opt tests on
desktop platforms on every other run?


Just based on the fact that they take longer and thus running them less
frequently would have a larger impact.  If there's a broad consensus
that debug runs are more valuable, we could switch to running opt tests
less frequently instead.


Yep, the debug tests indeed take more time, mostly because they run 
more checks.  :-)  The checks in opt builds are not exactly a subset 
of the ones in debug builds, but they are close.  Based on that, I 
think running opt tests on every other push is a more conservative 
one, and I support it more.  That being said, for this one week 
limited trial, given that the sheriffs will help backfill the skipped 
tests, I don't care very strongly about this, as long as it doesn't 
set the precedent that we can ignore debug tests!


I'd like to highlight that we're still planning on running debug 
linux64 tests for every build. This is based on the assumption that 
debug-specific failures are generally cross-platform failures as well.


Does this help alleviate some concern? Or is that assumption just plain 
wrong?


Cheers,
Chris




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Ehsan Akhgari

On 2014-08-20, 12:02 PM, Chris AtLee wrote:

On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote:

On 2014-08-19, 5:49 PM, Jonathan Griffin wrote:

On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:

On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:

I would actually say that debug tests are more important for
continuous integration than opt tests. At least in code I deal with,
we have a ton of asserts to guarantee behavior, and we really want
test coverage with these via CI. If a test passes on debug, it should
almost certainly pass on opt, just faster. The opposite is not true.

"They take a long time and then break" is part of what I believe
caused us to not bother with debug testing on much of Android and
B2G, which we still haven't completely fixed. It should be
unacceptable to ship without CI on debug tests, but here we are
anyways. (This is finally nearly fixed, though there is still some
work to do)

I'm not saying running debug tests less often is on the same scale of
bad, but I would like to express my concerns about heading in that
direction.


I second this.  I'm curious to know why you picked debug tests for
this experiment.  Would it not make more sense to run opt tests on
desktop platforms on every other run?


Just based on the fact that they take longer and thus running them less
frequently would have a larger impact.  If there's a broad consensus
that debug runs are more valuable, we could switch to running opt tests
less frequently instead.


Yep, the debug tests indeed take more time, mostly because they run
more checks.  :-)  The checks in opt builds are not exactly a subset
of the ones in debug builds, but they are close.  Based on that, I
think running opt tests on every other push is the more conservative
option, and I support it more.  That being said, for this one-week
limited trial, given that the sheriffs will help backfill the skipped
tests, I don't care very strongly about this, as long as it doesn't
set the precedent that we can ignore debug tests!


I'd like to highlight that we're still planning on running debug linux64
tests for every build. This is based on the assumption that
debug-specific failures are generally cross-platform failures as well.

Does this help alleviate some concern? Or is that assumption just plain
wrong?


Well, yes, most of our code is cross-platform, but there are debug-only
checks in our platform-specific code as well, so if we're talking about
something more permanent than that week-long experiment, then running
debug tests on Linux64 doesn't alleviate all concerns.  But it's fine
for this short experiment.


Cheers,
Ehsan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Mike Hommey
On Wed, Aug 20, 2014 at 03:58:55PM +0100, Ed Morley wrote:
 On 19/08/2014 21:55, Benoit Girard wrote:
 I completely agree with Jeff Gilbert on this one.
 
 I think we should try to coalesce -better-. I just checked the current
 state of mozilla-inbound and it doesn't feel like any of the current patches
 really need their own set of tests because they are not time
 sensitive or sufficiently complex. Right now developers are asked to
 create bugs for their own change with their own patch. This leads to a
 lot of little patches being landed by individual developers which
 seems to reflect the current state of mozilla-inbound.
 
 Perhaps we should instead promote checkin-needed (or something similarly simple)
 to coalesce simple changes together. Opting into this means that your
 patch may take significantly longer to get merged if it's landed with
 another bad patch and should only be used when that's acceptable.
 Right now developers with commit access are not encouraged to make use
 of checkin-needed AFAIK. If we started recommending against individual
 landings for simple changes, and improved the process, we could
 probably significantly cut the number of test jobs by cutting the
 number of pushes.
 
 I agree we should try to coalesce better - however doing this via a manual
 "let's get someone to push a bunch of checkin-needed patches in one go" is
 suboptimal:
 1) By tweaking coalescing in buildbot and pushing patches individually, we
 could get the same build+test job per commit ratio as doing checkin-neededs,
 but with the bonus of being able to backfill jobs where needed. This isn't
 possible when say 10-20 checkin-neededs are landed in one push, since our
 tooling can only trigger (and more importantly display the results of) jobs
 on a per push level.

It would have been useful on several occasions to be able to trigger
builds at changeset level instead of push level, independently of
checkin-needed.

Mike


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Jeff Gilbert
Graphics in particular is plagued by non-cross-platform code. Debug coverage on 
Linux gives us no practical coverage for our windows, mac, android, or b2g 
code. Maybe this is better solved with reviving the Graphics branch, however.

-Jeff

- Original Message -
From: Chris AtLee cat...@mozilla.com
To: Ehsan Akhgari ehsan.akhg...@gmail.com
Cc: Jonathan Griffin jgrif...@mozilla.com, Jeff Gilbert 
jgilb...@mozilla.com, dev-platform@lists.mozilla.org
Sent: Wednesday, August 20, 2014 9:02:14 AM
Subject: Re: Experiment with running debug tests less often on mozilla-inbound 
the week of August 25

On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote:
On 2014-08-19, 5:49 PM, Jonathan Griffin wrote:
On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:
On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:
I would actually say that debug tests are more important for
continuous integration than opt tests. At least in code I deal with,
we have a ton of asserts to guarantee behavior, and we really want
test coverage with these via CI. If a test passes on debug, it should
almost certainly pass on opt, just faster. The opposite is not true.

"They take a long time and then break" is part of what I believe
caused us to not bother with debug testing on much of Android and
B2G, which we still haven't completely fixed. It should be
unacceptable to ship without CI on debug tests, but here we are
anyways. (This is finally nearly fixed, though there is still some
work to do)

I'm not saying running debug tests less often is on the same scale of
bad, but I would like to express my concerns about heading in that
direction.

I second this.  I'm curious to know why you picked debug tests for
this experiment.  Would it not make more sense to run opt tests on
desktop platforms on every other run?

Just based on the fact that they take longer and thus running them less
frequently would have a larger impact.  If there's a broad consensus
that debug runs are more valuable, we could switch to running opt tests
less frequently instead.

Yep, the debug tests indeed take more time, mostly because they run 
more checks.  :-)  The checks in opt builds are not exactly a subset 
of the ones in debug builds, but they are close.  Based on that, I 
think running opt tests on every other push is the more conservative
option, and I support it more.  That being said, for this one-week
limited trial, given that the sheriffs will help backfill the skipped
tests, I don't care very strongly about this, as long as it doesn't
set the precedent that we can ignore debug tests!

I'd like to highlight that we're still planning on running debug 
linux64 tests for every build. This is based on the assumption that 
debug-specific failures are generally cross-platform failures as well.

Does this help alleviate some concern? Or is that assumption just plain 
wrong?

Cheers,
Chris


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Ehsan Akhgari

On 2014-08-20, 5:46 PM, Jeff Gilbert wrote:

Graphics in particular is plagued by non-cross-platform code. Debug coverage on 
Linux gives us no practical coverage for our windows, mac, android, or b2g 
code. Maybe this is better solved with reviving the Graphics branch, however.


Having more branches doesn't necessarily help with consuming fewer infra
resources, unless the builds will be run at a lower frequency or
something.


Cheers,
Ehsan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Jeff Gilbert
If running debug tests on a single platform is generally sufficient for 
non-graphics bugs, it might be useful to have the Graphics branch run debug 
tests on all platforms, for use with graphics checkins. (while running a 
decreased number of debug tests on the main branches) It's still possible for 
non-graphics code to expose platform-specific bugs, but it's less likely, so 
maybe larger regression windows are acceptable for platform-specific bugs in 
non-graphics code.

-Jeff

- Original Message -
From: Ehsan Akhgari ehsan.akhg...@gmail.com
To: Jeff Gilbert jgilb...@mozilla.com, Chris AtLee cat...@mozilla.com
Cc: Jonathan Griffin jgrif...@mozilla.com, dev-platform@lists.mozilla.org
Sent: Wednesday, August 20, 2014 3:16:31 PM
Subject: Re: Experiment with running debug tests less often on mozilla-inbound 
the week of August 25

On 2014-08-20, 5:46 PM, Jeff Gilbert wrote:
 Graphics in particular is plagued by non-cross-platform code. Debug coverage 
 on Linux gives us no practical coverage for our windows, mac, android, or b2g 
 code. Maybe this is better solved with reviving the Graphics branch, however.

Having more branches doesn't necessarily help with consuming fewer infra
resources, unless the builds will be run at a lower frequency or
something.

Cheers,
Ehsan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Ehsan Akhgari

On 2014-08-20, 6:29 PM, Jeff Gilbert wrote:

If running debug tests on a single platform is generally sufficient for 
non-graphics bugs,


It is not.  That is the point I was trying to make.  :-)

 it might be useful to have the Graphics branch run debug tests on all 
platforms, for use with graphics checkins. (while running a decreased 
number of debug tests on the main branches) It's still possible for 
non-graphics code to expose platform-specific bugs, but it's less 
likely, so maybe larger regression windows are acceptable for 
platform-specific bugs in non-graphics code.


I don't really understand how graphics is special here.  We do have 
platform specific code outside of graphics as well, so we don't need to 
solve this problem for gfx specifically.




Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Jeff Gilbert
 From: Ehsan Akhgari ehsan.akhg...@gmail.com
 To: Jeff Gilbert jgilb...@mozilla.com
 Cc: Chris AtLee cat...@mozilla.com, Jonathan Griffin 
 jgrif...@mozilla.com, dev-platform@lists.mozilla.org
 Sent: Wednesday, August 20, 2014 4:00:15 PM
 Subject: Re: Experiment with running debug tests less often on 
 mozilla-inbound the week of August 25
 
 On 2014-08-20, 6:29 PM, Jeff Gilbert wrote:
  If running debug tests on a single platform is generally sufficient for
  non-graphics bugs,
 
 It is not.  That is the point I was trying to make.  :-)
 
   it might be useful to have the Graphics branch run debug tests on all
 platforms, for use with graphics checkins. (while running a decreased
 number of debug tests on the main branches) It's still possible for
 non-graphics code to expose platform-specific bugs, but it's less
 likely, so maybe larger regression windows are acceptable for
 platform-specific bugs in non-graphics code.
 
 I don't really understand how graphics is special here.  We do have
 platform specific code outside of graphics as well, so we don't need to
 solve this problem for gfx specifically.
 

Maybe Graphics isn't that special, but this stuff hits really close to home for 
us.

I have been asked in the past if we really need to run WebGL tests on Android, 
if they have coverage on Desktop platforms.
And then again later, why B2G if we have Android.

There seems to be enough belief in test-once-run-everywhere that I feel the 
need to *firmly* establish that this is not acceptable, at least for the code I 
work with.
I'm happy I'm not alone in this.

-Jeff


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-20 Thread Jonas Sicking
On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert jgilb...@mozilla.com wrote:
 I have been asked in the past if we really need to run WebGL tests on 
 Android, if they have coverage on Desktop platforms.
 And then again later, why B2G if we have Android.

 There seems to be enough belief in test-once-run-everywhere that I feel the 
 need to *firmly* establish that this is not acceptable, at least for the code 
 I work with.
 I'm happy I'm not alone in this.

I'm a firm believer that we ultimately need to run basically all
combinations of tests and platforms before allowing code to reach
mozilla-central. There's lots of platform specific code paths, and
it's hard to track which tests trigger them, and which don't.

It would however be really cool if we were able to pull data on which
tests tend to fail in a way that affects all platforms, and which ones
tend to fail on one platform only. If we combine this with the ability
of having tbpl (or treeherder) fill in the blanks whenever a test
fails, it seems like we could run many of our tests on only one
platform for most checkins to mozilla-inbound.
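
As a rough illustration of the kind of data pull described above, here is a
minimal sketch, assuming failure records are available as (test, push,
platform) tuples. The record format, the function name `classify_tests`, and
the 0.5 threshold are all hypothetical and not taken from tbpl or treeherder:

```python
from collections import defaultdict


def classify_tests(failures, threshold=0.5):
    """Split tests by how often a failing push broke more than one platform.

    `failures` is an iterable of (test, push, platform) tuples; this record
    shape is an assumption made for the sketch.
    """
    # Collect the set of platforms on which each test failed for each push.
    platforms_by_test_push = defaultdict(set)
    for test, push, platform in failures:
        platforms_by_test_push[(test, push)].add(platform)

    multi = defaultdict(int)  # failing pushes that hit several platforms
    total = defaultdict(int)  # all failing pushes, per test
    for (test, _push), platforms in platforms_by_test_push.items():
        total[test] += 1
        if len(platforms) > 1:
            multi[test] += 1

    cross_platform, platform_specific = set(), set()
    for test, count in total.items():
        if multi[test] / count >= threshold:
            cross_platform.add(test)
        else:
            platform_specific.add(test)
    return cross_platform, platform_specific
```

Tests that land in the cross-platform set would be the candidates for running
on a single platform on most pushes, with the remaining platforms filled in
only when a failure shows up.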

/ Jonas


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Jeff Gilbert
I would actually say that debug tests are more important for continuous 
integration than opt tests. At least in code I deal with, we have a ton of 
asserts to guarantee behavior, and we really want test coverage with these via 
CI. If a test passes on debug, it should almost certainly pass on opt, just 
faster. The opposite is not true.

"They take a long time and then break" is part of what I believe caused us to
not bother with debug testing on much of Android and B2G, which we still 
haven't completely fixed. It should be unacceptable to ship without CI on debug 
tests, but here we are anyways. (This is finally nearly fixed, though there is 
still some work to do)

I'm not saying running debug tests less often is on the same scale of bad, but 
I would like to express my concerns about heading in that direction.

-Jeff

- Original Message -
From: Jonathan Griffin jgrif...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Tuesday, August 19, 2014 12:22:21 PM
Subject: Experiment with running debug tests less often on mozilla-inbound  
the week of August 25

Our pools of test slaves are often at or over capacity, and this has the 
effect of increasing job coalescing and test wait times.  This, in turn, 
can lead to longer tree closures caused by test bustage, and can cause 
try runs to be very slow to complete.

One of the easiest ways to mitigate this is to run tests less often.

To assess the impact of doing this, we will be performing an experiment 
the week of August 25, in which we will run debug tests on 
mozilla-inbound on most desktop platforms every other run, instead of 
every run as we do now.  Debug tests on linux64 will continue to run 
every time.  Non-desktop platforms and trees other than mozilla-inbound 
will not be affected.

This approach is based on the premise that the number of debug-only 
platform-specific failures on desktop is low enough to be manageable, 
and that the extra burden this imposes on the sheriffs will be small 
enough compared to the improvement in test slave metrics to justify the 
cost.

While this experiment is in progress, we will be monitoring job 
coalescing and test wait times, as well as impacts on sheriffs and 
developers.  If the experiment causes sheriffs to be unable to perform 
their job effectively, it can be terminated prematurely.

We intend to use the data we collect during the experiment to inform 
decisions about additional tooling we need to make this or a similar 
plan permanent at some point in the future, as well as validating the 
premise on which this experiment is based.

After the conclusion of this experiment, a follow-up post will be made 
which will discuss our findings.  If you have any concerns, feel free to 
reach out to me.

Jonathan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Benoit Girard
I completely agree with Jeff Gilbert on this one.

I think we should try to coalesce -better-. I just checked the current
state of mozilla-inbound and it doesn't feel like any of the current patches
really need their own set of tests because they are not time
sensitive or sufficiently complex. Right now developers are asked to
create bugs for their own change with their own patch. This leads to a
lot of little patches being landed by individual developers which
seems to reflect the current state of mozilla-inbound.

Perhaps we should instead promote checkin-needed (or something similarly simple)
to coalesce simple changes together. Opting into this means that your
patch may take significantly longer to get merged if it's landed with
another bad patch and should only be used when that's acceptable.
Right now developers with commit access are not encouraged to make use
of checkin-needed AFAIK. If we started recommending against individual
landings for simple changes, and improved the process, we could
probably significantly cut the number of test jobs by cutting the
number of pushes.

On Tue, Aug 19, 2014 at 3:57 PM, Jeff Gilbert jgilb...@mozilla.com wrote:
 I would actually say that debug tests are more important for continuous 
 integration than opt tests. At least in code I deal with, we have a ton of 
 asserts to guarantee behavior, and we really want test coverage with these 
 via CI. If a test passes on debug, it should almost certainly pass on opt, 
 just faster. The opposite is not true.

 "They take a long time and then break" is part of what I believe caused us to
 not bother with debug testing on much of Android and B2G, which we still 
 haven't completely fixed. It should be unacceptable to ship without CI on 
 debug tests, but here we are anyways. (This is finally nearly fixed, though 
 there is still some work to do)

 I'm not saying running debug tests less often is on the same scale of bad, 
 but I would like to express my concerns about heading in that direction.

 -Jeff

 - Original Message -
 From: Jonathan Griffin jgrif...@mozilla.com
 To: dev-platform@lists.mozilla.org
 Sent: Tuesday, August 19, 2014 12:22:21 PM
 Subject: Experiment with running debug tests less often on mozilla-inbound
   the week of August 25

 Our pools of test slaves are often at or over capacity, and this has the
 effect of increasing job coalescing and test wait times.  This, in turn,
 can lead to longer tree closures caused by test bustage, and can cause
 try runs to be very slow to complete.

 One of the easiest ways to mitigate this is to run tests less often.

 To assess the impact of doing this, we will be performing an experiment
 the week of August 25, in which we will run debug tests on
 mozilla-inbound on most desktop platforms every other run, instead of
 every run as we do now.  Debug tests on linux64 will continue to run
 every time.  Non-desktop platforms and trees other than mozilla-inbound
 will not be affected.

 This approach is based on the premise that the number of debug-only
 platform-specific failures on desktop is low enough to be manageable,
 and that the extra burden this imposes on the sheriffs will be small
 enough compared to the improvement in test slave metrics to justify the
 cost.

 While this experiment is in progress, we will be monitoring job
 coalescing and test wait times, as well as impacts on sheriffs and
 developers.  If the experiment causes sheriffs to be unable to perform
 their job effectively, it can be terminated prematurely.

 We intend to use the data we collect during the experiment to inform
 decisions about additional tooling we need to make this or a similar
 plan permanent at some point in the future, as well as validating the
 premise on which this experiment is based.

 After the conclusion of this experiment, a follow-up post will be made
 which will discuss our findings.  If you have any concerns, feel free to
 reach out to me.

 Jonathan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Ralph Giles
On 2014-08-19 1:55 PM, Benoit Girard wrote:
 Perhaps we should instead promote checkin-needed (or something similarly simple)
 to coalesce simple changes together.

I would prefer to use 'checkin-needed' for more things, but am blocked
by the try-needed requirement. We need some way to bless small changes
for inbound without a try push. Look up the author's commit access maybe?

 -r


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Ehsan Akhgari

On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:

I would actually say that debug tests are more important for continuous 
integration than opt tests. At least in code I deal with, we have a ton of 
asserts to guarantee behavior, and we really want test coverage with these via 
CI. If a test passes on debug, it should almost certainly pass on opt, just 
faster. The opposite is not true.

"They take a long time and then break" is part of what I believe caused us to
not bother with debug testing on much of Android and B2G, which we still haven't 
completely fixed. It should be unacceptable to ship without CI on debug tests, but here 
we are anyways. (This is finally nearly fixed, though there is still some work to do)

I'm not saying running debug tests less often is on the same scale of bad, but 
I would like to express my concerns about heading in that direction.


I second this.  I'm curious to know why you picked debug tests for this 
experiment.  Would it not make more sense to run opt tests on desktop 
platforms on every other run?


Cheers,
Ehsan



Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Jonathan Griffin
I also agree about coalescing better.  We are looking at ways to do that 
in conjunction with 
https://wiki.mozilla.org/Auto-tools/Projects/Autoland, which we'll have 
a prototype of by the end of the quarter.  In this model, commits that 
are going through autoland could be coalesced when landing on inbound, 
which would reduce slave load on all platforms.


Until that's deployed and in widespread use, we have other options to 
decrease slave load, and this experiment is the simplest.  It won't 
result in reduced test coverage, since sheriffs will backfill in the 
case of a regression.  Essentially, we're not running tests that would 
have passed anyway.
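
To make the skip-and-backfill idea concrete, here is a minimal sketch. It is
not the actual buildbot configuration; the function names, the platform
names, and the push numbering are assumptions made for illustration:

```python
# Hypothetical sketch: skip debug tests on alternating pushes, except on
# linux64, and work out which skipped pushes need backfilling after a failure.

ALWAYS_RUN = {"linux64"}  # assumption: platforms whose debug tests never skip
SKIP_INTERVAL = 2         # assumption: run on every 2nd push elsewhere


def should_run_debug(platform, push_index):
    """Return True if debug tests run for this push on this platform."""
    if platform in ALWAYS_RUN:
        return True
    return push_index % SKIP_INTERVAL == 0


def pushes_to_backfill(platform, failing_push):
    """List the skipped pushes between the last tested push and the failure.

    A sheriff would retrigger debug tests on these pushes to find which
    one introduced the regression.
    """
    skipped = []
    push = failing_push - 1
    while push >= 0 and not should_run_debug(platform, push):
        skipped.append(push)
        push -= 1
    return sorted(skipped)
```

With an interval of 2, a failure first seen on push 4 leaves only push 3 to
retrigger, so the regression window is two pushes instead of one.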


Depending on feedback we receive after this experiment, we may opt to 
change our approach in the future:  i.e., run tests every Nth opt build 
instead of debug build, or try to identify sets of never failing tests 
and just run those less frequently, or always include at least one 
flavor of Windows, OSX and Linux on every commit, etc.


Regards,

Jonathan


On 8/19/2014 1:55 PM, Benoit Girard wrote:

I completely agree with Jeff Gilbert on this one.

I think we should try to coalesce -better-. I just checked the current
state of mozilla-inbound and it doesn't feel like any of the current patches
really need their own set of tests because they are not time
sensitive or sufficiently complex. Right now developers are asked to
create bugs for their own change with their own patch. This leads to a
lot of little patches being landed by individual developers which
seems to reflect the current state of mozilla-inbound.

Perhaps we should instead promote checkin-needed (or something similarly simple)
to coalesce simple changes together. Opting into this means that your
patch may take significantly longer to get merged if it's landed with
another bad patch and should only be used when that's acceptable.
Right now developers with commit access are not encouraged to make use
of checkin-needed AFAIK. If we started recommending against individual
landings for simple changes, and improved the process, we could
probably significantly cut the number of test jobs by cutting the
number of pushes.

On Tue, Aug 19, 2014 at 3:57 PM, Jeff Gilbert jgilb...@mozilla.com wrote:

I would actually say that debug tests are more important for continuous 
integration than opt tests. At least in code I deal with, we have a ton of 
asserts to guarantee behavior, and we really want test coverage with these via 
CI. If a test passes on debug, it should almost certainly pass on opt, just 
faster. The opposite is not true.

"They take a long time and then break" is part of what I believe caused us to
not bother with debug testing on much of Android and B2G, which we still haven't 
completely fixed. It should be unacceptable to ship without CI on debug tests, but here 
we are anyways. (This is finally nearly fixed, though there is still some work to do)

I'm not saying running debug tests less often is on the same scale of bad, but 
I would like to express my concerns about heading in that direction.

-Jeff

- Original Message -
From: Jonathan Griffin jgrif...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Tuesday, August 19, 2014 12:22:21 PM
Subject: Experiment with running debug tests less often on mozilla-inbound  
the week of August 25

Our pools of test slaves are often at or over capacity, and this has the
effect of increasing job coalescing and test wait times.  This, in turn,
can lead to longer tree closures caused by test bustage, and can cause
try runs to be very slow to complete.

One of the easiest ways to mitigate this is to run tests less often.

To assess the impact of doing this, we will be performing an experiment
the week of August 25, in which we will run debug tests on
mozilla-inbound on most desktop platforms every other run, instead of
every run as we do now.  Debug tests on linux64 will continue to run
every time.  Non-desktop platforms and trees other than mozilla-inbound
will not be affected.

This approach is based on the premise that the number of debug-only
platform-specific failures on desktop is low enough to be manageable,
and that the extra burden this imposes on the sheriffs will be small
enough compared to the improvement in test slave metrics to justify the
cost.

While this experiment is in progress, we will be monitoring job
coalescing and test wait times, as well as impacts on sheriffs and
developers.  If the experiment causes sheriffs to be unable to perform
their job effectively, it can be terminated prematurely.

We intend to use the data we collect during the experiment to inform
decisions about additional tooling we need to make this or a similar
plan permanent at some point in the future, as well as validating the
premise on which this experiment is based.

After the conclusion of this experiment, a follow-up post will be made
which will discuss our findings.  If you have any concerns, feel free to
reach out to me.

Jonathan


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Jonathan Griffin

On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:

On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:
I would actually say that debug tests are more important for 
continuous integration than opt tests. At least in code I deal with, 
we have a ton of asserts to guarantee behavior, and we really want 
test coverage with these via CI. If a test passes on debug, it should 
almost certainly pass on opt, just faster. The opposite is not true.


"They take a long time and then break" is part of what I believe
caused us to not bother with debug testing on much of Android and 
B2G, which we still haven't completely fixed. It should be 
unacceptable to ship without CI on debug tests, but here we are 
anyways. (This is finally nearly fixed, though there is still some 
work to do)


I'm not saying running debug tests less often is on the same scale of 
bad, but I would like to express my concerns about heading in that 
direction.


I second this.  I'm curious to know why you picked debug tests for 
this experiment.  Would it not make more sense to run opt tests on 
desktop platforms on every other run?


Just based on the fact that they take longer and thus running them less 
frequently would have a larger impact.  If there's a broad consensus 
that debug runs are more valuable, we could switch to running opt tests 
less frequently instead.


Jonathan


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Matthew N.

On 8/19/14 12:22 PM, Jonathan Griffin wrote:

To assess the impact of doing this, we will be performing an experiment
the week of August 25, in which we will run debug tests on
mozilla-inbound on most desktop platforms every other run, instead of
every run as we do now.  Debug tests on linux64 will continue to run
every time.  Non-desktop platforms and trees other than mozilla-inbound
will not be affected.


To clarify, is fx-team affected by this change? I ask because you 
mention desktop and that is where the desktop front-end team does 
landings. I suspect fx-team landings are less likely to hit debug-only 
issues than mozilla-inbound, as fx-team has far fewer C++ changes and,
anecdotally, JS-only changes seem to trigger debug-only failures less often.



This approach is based on the premise that the number of debug-only
platform-specific failures on desktop is low enough to be manageable,
and that the extra burden this imposes on the sheriffs will be small
enough compared to the improvement in test slave metrics to justify the
cost.


FWIW, I think fx-team is more desktop-specific (although Android 
front-end stuff also lands there and I'm not familiar with that).


MattN
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Ehsan Akhgari

On 2014-08-19, 5:49 PM, Jonathan Griffin wrote:

On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:

On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:

I would actually say that debug tests are more important for
continuous integration than opt tests. At least in code I deal with,
we have a ton of asserts to guarantee behavior, and we really want
test coverage with these via CI. If a test passes on debug, it should
almost certainly pass on opt, just faster. The opposite is not true.

That they take a long time and then break is part of what I believe
caused us to not bother with debug testing on much of Android and
B2G, which we still haven't completely fixed. It should be
unacceptable to ship without CI on debug tests, but here we are
anyways. (This is finally nearly fixed, though there is still some
work to do)

I'm not saying running debug tests less often is on the same scale of
bad, but I would like to express my concerns about heading in that
direction.


I second this.  I'm curious to know why you picked debug tests for
this experiment.  Would it not make more sense to run opt tests on
desktop platforms on every other run?


Just based on the fact that they take longer and thus running them less
frequently would have a larger impact.  If there's a broad consensus
that debug runs are more valuable, we could switch to running opt tests
less frequently instead.


Yep, the debug tests indeed take more time, mostly because they run more 
checks.  :-)  The checks in opt builds are not exactly a subset of the 
ones in debug builds, but they are close.  Based on that, I think 
running opt tests on every other push is the more conservative option, 
and I support it more.  That being said, for this one-week limited 
trial, given that the sheriffs will help backfill the skipped tests, I 
don't care very strongly about this, as long as it doesn't set the 
precedent that we can ignore debug tests!


Cheers,
Ehsan

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread David Burns
I know this is tangential, but in my experience small changes are the 
least tested changes. The try push requirement for checkin-needed has 
had a wonderful impact on the number of times the tree is closed[1]. 
The tree is less likely to be closed these days.


David

[1] http://futurama.theautomatedtester.co.uk/

On 19/08/2014 22:04, Ralph Giles wrote:

On 2014-08-19 1:55 PM, Benoit Girard wrote:

Perhaps we should instead promote checkin-needed (or a similar simple)
to coalesce simple changes together.

I would prefer to use 'checkin-needed' for more things, but am blocked
by the try-needed requirement. We need some way to bless small changes
for inbound without a try push. Look up the author's commit access maybe?

  -r

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Trevor Saunders
On Tue, Aug 19, 2014 at 02:49:48PM -0700, Jonathan Griffin wrote:
 On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:
 On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:
 I would actually say that debug tests are more important for continuous
 integration than opt tests. At least in code I deal with, we have a ton
 of asserts to guarantee behavior, and we really want test coverage with
 these via CI. If a test passes on debug, it should almost certainly pass
 on opt, just faster. The opposite is not true.
 
 That they take a long time and then break is part of what I believe caused
 us to not bother with debug testing on much of Android and B2G, which we
 still haven't completely fixed. It should be unacceptable to ship
 without CI on debug tests, but here we are anyways. (This is finally
 nearly fixed, though there is still some work to do)
 
 I'm not saying running debug tests less often is on the same scale of
 bad, but I would like to express my concerns about heading in that
 direction.
 
 I second this.  I'm curious to know why you picked debug tests for this
 experiment.  Would it not make more sense to run opt tests on desktop
 platforms on every other run?
 
 Just based on the fact that they take longer and thus running them less
 frequently would have a larger impact.  If there's a broad consensus that
 debug runs are more valuable, we could switch to running opt tests less
 frequently instead.

It seems to me our goal here is basically to pick an order so that the
expected time to detect bustage is minimized without increasing the
maximum time it can take to detect bustage.  That is, given that a push
is busted, take p(d) to be the probability that only debug tests will
fail, p(o) the probability that only opt tests will fail, and p(b) the
probability that both will fail, so p(d) + p(o) + p(b) = 1.  Then take
t(d) and t(o) to be the time for a debug and an opt test run
respectively.  Now you want to decide which to run first, debug or opt.
If you choose debug, you'd expect to detect bustage in
(p(d) + p(b)) * t(d) + p(o) * (t(o) + t(d))
which simplifies to
t(d) + p(o) * t(o)
On the other hand, if you choose to test opt first you get
t(o) + p(d) * t(d)

I suspect we all agree t(d) > t(o) and it seems likely p(d) > p(o), but
it should be clear that which is the better choice depends on the exact
values of those numbers (and this is not a good model of reality in
many ways).
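
As a rough illustration of the model above, here is a minimal Python
sketch that evaluates both orderings.  The function name and all of the
numbers are invented for illustration; they are not measurements from
the experiment.  It assumes the three probabilities are conditional on
the push being busted, so they sum to 1.

```python
def expected_detection_time(first_only, second_only, both, t_first, t_second):
    """Expected time to detect bustage when one suite runs before the other.

    first_only, second_only, both: probabilities that a busted push fails
    only the first suite, only the second suite, or both (must sum to 1).
    t_first, t_second: run times of the two suites, in the same unit.
    """
    total = first_only + second_only + both
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Bustage visible in the first suite is caught after t_first; bustage
    # visible only in the second suite needs both suites to finish.
    return (first_only + both) * t_first + second_only * (t_first + t_second)


# Made-up illustrative numbers (hypothetical, not real data).
p_d, p_o, p_b = 0.25, 0.05, 0.70   # only-debug, only-opt, both
t_d, t_o = 60.0, 30.0              # minutes per debug / opt run

debug_first = expected_detection_time(p_d, p_o, p_b, t_d, t_o)
opt_first = expected_detection_time(p_o, p_d, p_b, t_o, t_d)
print(f"debug first: {debug_first:.1f} min, opt first: {opt_first:.1f} min")
```

With these invented values, running opt first comes out ahead even
though p(d) > p(o), because t(d) is twice t(o); other values can flip
the result.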

Trev

 
 Jonathan
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Jonathan Griffin
No, fx-team is not affected by this experiment; we intend to target 
mozilla-inbound only for this 1-week trial.  The reason is that the 
number of commits on m-i seems larger than on fx-team, and therefore the 
impacts should be more visible.


Jonathan

On 8/19/2014 3:19 PM, Matthew N. wrote:

On 8/19/14 12:22 PM, Jonathan Griffin wrote:

To assess the impact of doing this, we will be performing an experiment
the week of August 25, in which we will run debug tests on
mozilla-inbound on most desktop platforms every other run, instead of
every run as we do now.  Debug tests on linux64 will continue to run
every time.  Non-desktop platforms and trees other than mozilla-inbound
will not be affected.


To clarify, is fx-team affected by this change? I ask because you 
mention desktop and that is where the desktop front-end team does 
landings. I suspect fx-team landings are less likely to hit debug-only 
issues than mozilla-inbound as fx-team has far fewer C++ changes and 
anecdotally JS-only changes seem to trigger debug-only failures less 
often.



This approach is based on the premise that the number of debug-only
platform-specific failures on desktop is low enough to be manageable,
and that the extra burden this imposes on the sheriffs will be small
enough compared to the improvement in test slave metrics to justify the
cost.


FWIW, I think fx-team is more desktop-specific (although Android 
front-end stuff also lands there and I'm not familiar with that).


MattN

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Joshua Cranmer 

On 8/19/2014 5:25 PM, Ehsan Akhgari wrote:
Yep, the debug tests indeed take more time, mostly because they run 
more checks.


Actually, the bigger cause of the slowdown is probably that debug tests 
don't have any optimizations, not more checks. An atomic increment on a 
debug build invokes something like a hundred instructions (including 
several call instructions) whereas the equivalent operation on an opt 
build is just one.


--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

2014-08-19 Thread Kyle Huey
I'm pretty sure the debug builds on our CI infrastructure are built
with optimization.

- Kyle

On Tue, Aug 19, 2014 at 3:42 PM, Joshua Cranmer  pidgeo...@gmail.com wrote:
 On 8/19/2014 5:25 PM, Ehsan Akhgari wrote:

 Yep, the debug tests indeed take more time, mostly because they run more
 checks.


 Actually, the bigger cause of the slowdown is probably that debug tests
 don't have any optimizations, not more checks. An atomic increment on a
 debug build invokes something like a hundred instructions (including several
 call instructions) whereas the equivalent operation on an opt build is just
 one.

 --
 Joshua Cranmer
 Thunderbird and DXR developer
 Source code archæologist


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform