Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-10-31 Thread Alexander Keybl
I think it makes a lot of sense to test the spread. +1

- Original Message -
From: Armen Zambrano G. arme...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Tue, 29 Oct 2013 13:31:33 -0700 (PDT)
Subject: Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

Hello all,
I would like to re-visit this.

I would like to look into no longer running tests and talos on 10.7 and
re-purposing those machines as 10.6 machines.
* We have many more users on 10.6 than on 10.7.
* No new updates have been given to 10.6 since July 2011 [1]
* No new updates have been given to 10.7 since October, 2012 [2]

This will improve our current Mac OS X testing wait times.

On another note, 10.9 has come out and I have already started seeing a
noticeable dip in 10.8 users (since 10.9 is a free update).

I would also like to consider no longer running jobs on 10.8 and running
them only on 10.9 once we have the infrastructure up and running.

cheers,
Armen

[1] https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard#Release_history
[2] https://en.wikipedia.org/wiki/Mac_OS_X_Lion#Release_history

On 2013-04-25 1:30 PM, Armen Zambrano G. wrote:
 (please follow up through mozilla.dev.planning)
 
 Hello all,
 I have recently been looking into our Mac OS X test wait times, which
 have been bad for many months and are getting progressively worse.
 Fewer than 80% of test jobs on OS X 10.6 and 10.7 are able to start
 within 15 minutes of being requested.
 This slows down getting test results for OS X and makes tree closures
 longer if we have Mac OS X test backlogs.
 Unfortunately, we can't buy any more revision 4 Mac minis (they're not
 sold anymore), as Apple discontinues old hardware as new models come out.
 
 In order to improve the turnaround time for Mac testing, we have to look
 into reducing our test load in one of these two OSes (both of them run
 on revision 4 minis).
 We have over a third of our OS X users running 10.6. Eventually, down
 the road, we could drop 10.6, but we still have a significant number of
 our users there, even though Apple has not served them major updates
 since July 2011 [1].
 
 Our current Mac OS X distribution looks like this:
 * 10.6 - 43%
 * 10.7 - 30%
 * 10.8 - 27%
 OS X 10.8 is the only version that is growing.
 
 In order to improve our wait times, I propose that we stop testing on
 tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as
 10.6 to increase our capacity.
 
 Please let us know if this plan is unacceptable and needs further
 discussion.
 
 best regards,
 Armen Zambrano - Mozilla's Release Engineering

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-10-31 Thread Ryan VanderMeulen

On 10/29/2013 4:31 PM, Armen Zambrano G. wrote:

In order to improve our wait times, I propose that we stop testing on
tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as
10.6 to increase our capacity.

Please let us know if this plan is unacceptable and needs further
discussion.

best regards,
Armen Zambrano - Mozilla's Release Engineering


+1 to repurposing all rev4s as 10.6 slaves and all rev5s as 10.9!

I guess the only question is how many people are stuck on 10.7 (my 
understanding is that some 10.7-supporting hardware configurations 
aren't supported on 10.9) and is that population large enough that we 
explicitly need to test for them?


My offhand recollection is that the main discrepancies between the 
different OSX versions we see in our test infrastructure largely have to 
do with what hardware they're running on and whether OMTC is enabled or 
not. So IMO, 10.6 on rev4 w/o OMTC and 10.9 on rev5 w/ OMTC is probably 
representative enough that we aren't likely to miss any major regressions.


-Ryan


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread jmaher
On Thursday, April 25, 2013 4:12:16 PM UTC-4, Ed Morley wrote:
 On 25 April 2013 20:14:10, Justin Lebar wrote:
  Is this what you're saying?
  * 10.6 opt tests - per-checkin (no change)
  * 10.6 debug tests - reduced
  * 10.7 opt tests - reduced
  * 10.7 debug tests - reduced

  * reduced -- m-c, m-a, m-b, m-r, esr17

  Yes.

  Now that I think about this more, maybe we should go big or go home:
  change 10.6 opt tests to reduced as well, and see how it goes.  We can
  always change it back.

  If it goes well, we can try to do the same thing with the Windows tests.

  We should get the sheriffs to sign off.

 Worth a shot, we can always revert :-) Only thing I might add, is that
 we'll need a way to opt into 10.6 test jobs on Try, in case someone has
 to debug issues found on mozilla-central (eg using sfink's undocumented
 OS version specific syntax).

 Ed

Only this Wednesday I had to revert a talos change on inbound because of
10.6-only failures.  They were caused by a different version of python on 10.6 :(

-Joel


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread jmaher
On Friday, April 26, 2013 9:49:18 AM UTC-4, Armen Zambrano G. wrote:
 
 Maybe we can keep one of the talos jobs around? (until releng fixes the
 various python versions' story)

 IIUC this was more of an infra issue rather than a Firefox testing issue.

It was infra related, but it was specific to the 10.6 platform.  Even knowing 
that, I fully support the proposed plan.  We could have easily determined the 
root cause of the 10.6 specific failure a day later on a different branch.


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Phil Ringnalda
On 4/25/13 1:12 PM, Ed Morley wrote:
 On 25 April 2013 20:14:10, Justin Lebar wrote:
 Is this what you're saying?
 * 10.6 opt tests - per-checkin (no change)
 * 10.6 debug tests - reduced
 * 10.7 opt tests - reduced
 * 10.7 debug tests - reduced

 * reduced -- m-c, m-a, m-b, m-r, esr17

 Yes.

 Now that I think about this more, maybe we should go big or go home:
 change 10.6 opt tests to reduced as well, and see how it goes.  We can
 always change it back.

 If it goes well, we can try to do the same thing with the Windows tests.

 We should get the sheriffs to sign off.
 
 Worth a shot, we can always revert :-) Only thing I might add, is that
 we'll need a way to opt into 10.6 test jobs on Try, in case someone has
 to debug issues found on mozilla-central (eg using sfink's undocumented
 OS version specific syntax).
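
For readers following along, the schedule quoted above can be sketched as a small lookup. This is only an illustration of the proposal as quoted, not how buildbot is actually configured; the names SCHEDULE, REDUCED and should_run are made up for the example, and the branch abbreviations are the ones used in this thread.

```python
# Hypothetical sketch of the quoted proposal; not real buildbot configuration.
# None is used as a sentinel meaning "per-checkin on every branch".
ALL_BRANCHES = None

# "reduced" as defined in the quoted message.
REDUCED = {"m-c", "m-a", "m-b", "m-r", "esr17"}

# (OS X version, build type) -> branches on which the tests would run.
SCHEDULE = {
    ("10.6", "opt"): ALL_BRANCHES,
    ("10.6", "debug"): REDUCED,
    ("10.7", "opt"): REDUCED,
    ("10.7", "debug"): REDUCED,
}


def should_run(os_version, build_type, branch):
    """Return True if tests for this platform would run on the given branch."""
    allowed = SCHEDULE[(os_version, build_type)]
    return allowed is ALL_BRANCHES or branch in allowed


# Under the quoted proposal, 10.7 debug tests would not run on m-i,
# while 10.6 opt tests would keep running everywhere.
print(should_run("10.7", "debug", "m-i"))
print(should_run("10.6", "opt", "m-i"))
```

The "go big or go home" variant would amount to changing the ("10.6", "opt") entry to REDUCED as well.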

So what we're saying is that we are going to completely reverse our
previous tree management policy?

Currently, m-c is supposed to be the tree that's safely unbroken, and we
know it's unbroken because the tests that we run on it have already been
run on the tree that merged into it, and you should almost never push
directly to it unless you're in a desperate hurry to hit a nightly.

This change would mean that we expect to have merges of hundreds of
csets from inbound sometimes break m-c with no idea which one broke it,
that we expect to sometimes have permaorange on it for days, and that
it's better to push your widget/cocoa/ pushes directly to m-c than to
inbound.


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Justin Lebar
 So what we're saying is that we are going to completely reverse our
 previous tree management policy?

Basically, yes.

Although, due to coalescing, do you always have a full run of tests on
the tip of m-i before merging to m-c?

A better solution would be to let you trigger a full set of tests (w/o
coalescing) on m-i before merging to m-c.  We've been asking for a
similar feature for tryserver (let us add new jobs to my push) for a
long time.  Perhaps if we made this change, we could get releng to
implement that feature sooner rather than later, particularly if this
change caused pain to other teams who pull from a broken m-c.

I am not above effecting a sense of urgency in order to get bugs fixed.  :)

 Currently, m-c is supposed to be the tree that's safely unbroken, and we
 know it's unbroken because the tests that we run on it have already been
 run on the tree that merged into it, and you should almost never push
 directly to it unless you're in a desperate hurry to hit a nightly.

 This change would mean that we expect to have merges of hundreds of
 csets from inbound sometimes break m-c with no idea which one broke it,
 that we expect to sometimes have permaorange on it for days, and that
 it's better to push your widget/cocoa/ pushes directly to m-c than to
 inbound.


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Ryan VanderMeulen

On 4/26/2013 11:11 AM, Justin Lebar wrote:

So what we're saying is that we are going to completely reverse our
previous tree management policy?


Basically, yes.

Although, due to coalescing, do you always have a full run of tests on
the tip of m-i before merging to m-c?



Yes. Note that we generally aren't merging inbound tip to m-c - we're 
taking a known-green cset (including PGO tests).



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Phil Ringnalda
On 4/26/13 8:11 AM, Justin Lebar wrote:
 So what we're saying is that we are going to completely reverse our
 previous tree management policy?
 
 Basically, yes.
 
 Although, due to coalescing, do you always have a full run of tests on
 the tip of m-i before merging to m-c?

It's not just coincidence that the tip of most m-i -> m-c merges is a
backout - for finding a mergeable cset in the daytime, you're usually
looking at the last backout during a tree closure, when we sat and
waited to get tests run on it. Otherwise, you pick one that looks
possible, and then figure out what got coalesced up and see how that did
where it got coalesced.
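
The second approach described above (pick a plausible cset, then check how each coalesced job did on the push it was coalesced into) could be sketched roughly as follows. This is a simplified model, not the sheriffs' actual tooling: the data layout, the "green"/"orange" result strings and the function names are assumptions made for the example.

```python
def effective_results(pushes, index, job_names):
    """Map each job to the result that covers the push at `index`.

    `pushes` is ordered oldest to newest; each push is a dict of
    job name -> result for the jobs that actually ran on it (assumed layout).
    A job missing from a push is assumed to have been coalesced into the
    next later push that ran it.
    """
    results = {}
    for job in job_names:
        results[job] = None  # None: no run on this or any later push yet
        for push in pushes[index:]:
            if job in push:
                results[job] = push[job]
                break
    return results


def is_mergeable(pushes, index, job_names):
    """A push is a merge candidate only if every job is covered by a green run."""
    covered = effective_results(pushes, index, job_names)
    return all(result == "green" for result in covered.values())


# Made-up example: the mochitest run for push 1 was coalesced into push 2.
jobs = ["build", "mochitest"]
pushes = [
    {"build": "green", "mochitest": "green"},
    {"build": "green"},
    {"build": "green", "mochitest": "orange"},
]
candidates = [i for i in range(len(pushes)) if is_mergeable(pushes, i, jobs)]
print(candidates)
```

In this toy history only push 0 qualifies: push 1 looks green on its own, but its mochitest run was coalesced into push 2, which went orange.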


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Armen Zambrano G.

Would we be able to go back to where we disabled 10.7 altogether?
Product (Asa in separate thread) and release drivers (Akeybl) were OK with
the compromise of removing version-specific test coverage completely.

Side note: adding Mac PGO would increase the build load (besides this, we
have to place a large PO, as we expect Mac wait times to start showing up
as general load increases).

Not all load-reducing approaches are easy to implement (due to the way
buildbot is designed), and they do not guarantee that we would reduce the
load enough. It's expensive enough to support 3 different versions of Mac
as is, without bringing 10.9 to the table. We have to cut things at times.


One compromise that would be easy to implement and *might* reduce the 
load is to disable all debug jobs for 10.7.


cheers,
Armen

On 2013-04-26 11:29 AM, Justin Lebar wrote:

As a compromise, how hard would it be to run the Mac 10.6 and 10.7
tests on m-i occasionally, like we run the PGO tests?  (Maybe we could
trigger them on the same csets as we run PGO; it seems like that would
be useful.)

On Fri, Apr 26, 2013 at 11:19 AM, Ryan VanderMeulen rya...@gmail.com wrote:

On 4/26/2013 11:11 AM, Justin Lebar wrote:


So what we're saying is that we are going to completely reverse our
previous tree management policy?



Basically, yes.

Although, due to coalescing, do you always have a full run of tests on
the tip of m-i before merging to m-c?



Yes. Note that we generally aren't merging inbound tip to m-c - we're taking
a known-green cset (including PGO tests).



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Armen Zambrano G.
Just disabling debug and talos jobs for 10.7 should reduce more than 50% 
of the load on 10.7. That might be sufficient for now.
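
As a rough way to check an estimate like this, one could sum machine time per job kind and see what share the disabled kinds account for. The sketch below assumes job records are available as (kind, minutes) pairs; the function name and the numbers are placeholders for illustration, not measurements from our 10.7 pool.

```python
def load_removed(jobs, disabled_kinds):
    """Return the fraction of total machine time spent on the disabled kinds.

    `jobs` is an iterable of (kind, minutes) pairs; this layout is an
    assumption made for the example.
    """
    total = 0.0
    removed = 0.0
    for kind, minutes in jobs:
        total += minutes
        if kind in disabled_kinds:
            removed += minutes
    return removed / total if total else 0.0


# Placeholder numbers only, NOT real 10.7 data.
placeholder_jobs = [("opt", 40), ("debug", 35), ("talos", 25)]
fraction = load_removed(placeholder_jobs, {"debug", "talos"})
print(f"{fraction:.0%} of machine time would be freed in this made-up example")
```

Running the same calculation over real job durations for a representative week would show whether the "more than 50%" figure holds.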


Any objections to this plan?
We can re-visit later on if we need more disabled.

cheers,
Armen



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Justin Lebar
 Would we be able to go back to where we disabled 10.7 altogether?

On m-i and try only, or everywhere?



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Armen Zambrano G.


On 2013-04-26 12:14 PM, Justin Lebar wrote:

Would we be able to go back to where we disabled 10.7 altogether?


On m-i and try only, or everywhere?


The initial proposal was for disabling everywhere.

We could leave 10.7 opt jobs running everywhere as a compromise and 
re-visit after I re-purpose the first batch of machines.


best regards,
Armen





Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Justin Lebar
I don't think I'm comfortable disabling this platform across the
board, or even disabling debug-only runs across the board.

As jmaher pointed out, there are platform differences here.  If we
disable this platform entirely, we lose visibility into rare but, we
seem to believe, possible events.

It seems like the only reason to disable everywhere instead of only on
m-i/try (or running less frequently on m-i, like we do with PGO) is
that the former is easier to implement.  It seems like we're proposing
taking a lot of risk here to work around our own failings...



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Armen Zambrano G.

On 2013-04-26 1:31 PM, Justin Lebar wrote:

I don't think I'm comfortable disabling this platform across the
board, or even disabling debug-only runs across the board.

As jmaher pointed out, there are platform differences here.  If we
disable this platform entirely, we lose visibility into rare but, we
seem to believe, possible events.


That was a python issue that was related to talos.
It was not a Firefox issue that would have only failed on a specific 
version of Mac.



It seems like the only reason to disable everywhere instead of only on
m-i/try (or running less frequently on m-i, like we do with PGO) is
that the former is easier to implement.  It seems like we're proposing
taking a lot of risk here to work around our own failings...

Yes, it is a lot of work to change the way buildbot works in order to
optimize a non-standard mode of operation.
Just running jobs on the PGO csets and not on every checkin would already
leave the 10.7 platform with less coverage than the other versions.


I could also have never started this thread to improve our wait times
for 10.6, and then, when someone one day complained about wait times on
rev4, simply said that we cannot buy more machines.


A little earlier in the thread you were saying "go big or go home" and
asked to disable even 10.6 debug tests. I'm confused by the mixed
messages.






Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Armen Zambrano G.

After re-reading, I'm happy to disable just m-i/try for now.

Modifying things to trigger only *some* jobs on m-i would be a decent
amount of work (adding Mac PGO builders), would still differ from normal
operations, and would increase the 10.6/10.8 test load.


On 2013-04-26 1:31 PM, Justin Lebar wrote:

I don't think I'm comfortable disabling this platform across the
board, or even disabling debug-only runs across the board.

As jmaher pointed out, there are platform differences here.  If we
disable this platform entirely, we lose visibility into rare but, we
seem to believe, possible events.

It seems like the only reason to disable everywhere instead of only on
m-i/try (or running less frequently on m-i, like we do with PGO) is
that the former is easier to implement.  It seems like we're proposing
taking a lot of risk here to work around our own failings...



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread Matt Brubeck

On 4/26/2013 9:10 AM, Armen Zambrano G. wrote:

Just disabling debug and talos jobs for 10.7 should reduce more than 50%
of the load on 10.7. That might be sufficient for now.


I'd be happy for us to disable all Talos jobs on 10.7, on all trees. 
I've been keeping track of Talos stuff recently and I have not seen any 
genuine regressions that are 10.7-specific, so I don't think it's 
providing us much benefit to run these benchmarks on three Mac platforms 
simultaneously.


In terms of tracking regressions, it would be better to have more
complete data on 10.6 alone than to have incomplete data (due to
coalescing) on both 10.6 and 10.7.



Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Armen Zambrano G.

(please follow up through mozilla.dev.planning)

Hello all,
I have recently been looking into our Mac OS X test wait times, which
have been bad for many months and are progressively getting worse.

Less than 80% of test jobs on OS X 10.6 and 10.7 are able to start
within 15 minutes of being requested.
This slows down getting test results for OS X and makes tree closures
longer if we have Mac OS X test backlogs.
Unfortunately, we can't buy any more revision 4 Mac minis (they're not
sold anymore), as Apple discontinues old hardware as new models come out.
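
For reference, the wait-time figure above can be computed along the
following lines. This is only a sketch: the function name, the input
format (a list of wait times in minutes) and the sample values are made
up for illustration and are not taken from our actual reporting tools.

```python
def percent_started_within(wait_minutes, threshold=15):
    """Return the percentage of jobs that started within `threshold` minutes."""
    if not wait_minutes:
        raise ValueError("no jobs to measure")
    # Count the jobs whose wait did not exceed the threshold.
    on_time = sum(1 for wait in wait_minutes if wait <= threshold)
    return 100.0 * on_time / len(wait_minutes)


# Hypothetical sample: minutes each test job waited before it started.
sample_waits = [2, 5, 14, 16, 30, 7, 45, 9, 12, 21]
print(percent_started_within(sample_waits))
```

A pool is in good shape when this number stays high for the 15-minute
threshold; for 10.6 and 10.7 it is currently below 80.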


In order to improve the turnaround time for Mac testing, we have to look
into reducing our test load on one of these two OSes (both of them run
on revision 4 minis).
Over a third of our OS X users are running 10.6. Eventually, down
the road, we could drop 10.6, but we still have a significant number of
our users there, even though Apple stopped serving it major updates
in July 2011 [1].


Our current Mac OS X distribution looks like this:
* 10.6 - 43%
* 10.7 - 30%
* 10.8 - 27%
OS X 10.8 is the only version that is growing.

In order to improve our wait times, I propose that we stop testing
per-checkin [2] on tbpl for OS X 10.7 and re-purpose the 10.7 machines
as 10.6 machines to increase our capacity.


Please let us know if this plan is unacceptable and needs further 
discussion.


best regards,
Armen Zambrano - Mozilla's Release Engineering


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Justin Lebar
It would be nice if we had data indicating how often tests fail on
just one version of MacOS, so we didn't have to guess how useful it is
to have 10.6, 10.7, and 10.8 tests.  That's bug 860870.  It's currently
blocked on treeherder, but maybe it should be re-prioritized, since we
keep running into cases where this data would be helpful.

Anyway, disabling the 10.7 tests sounds reasonable to me given no
data, but maybe we continue running these tests on m-c?  Maybe we also
deprecate the 10.7 tests on tryserver, so you only get the tests if
you really really want them?

On Thu, Apr 25, 2013 at 1:40 PM, Andreas Gal g...@mozilla.com wrote:

 How many 10.7 machines do we operate in that pool?

 Andreas



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Alex Keybl
 We could compromise by running the 10.7 tests only on m-c, m-a, m-b and m-r. 
 This alone would help a lot, since most of the load comes from m-i and try. We 
 could make it a non-by-default platform on try.

This strategy would prevent any holes in our coverage, but accomplish the goal 
of reducing load. Seems very reasonable, given how infrequently I've seen tests 
fail for one OS X version but not another.

-Alex

On Apr 25, 2013, at 11:02 AM, Armen Zambrano G. arme...@mozilla.com wrote:

 On 2013-04-25 1:40 PM, Andreas Gal wrote:
  How many 10.7 machines do we operate in that pool?
 
  Andreas
 84 of them are 10.6
 86 of them are 10.7
 
 Unfortunately, a lot of them (maybe a dozen) are down while we try to fix them 
 (broken hard drives, bad memory, NICs). They are not under warranty.
 
 On 2013-04-25 1:55 PM, Justin Lebar wrote:
 It would be nice if we had data indicating how often tests fail on
 just one version of MacOS, so we didn't have to guess how useful having
 10.6, 10.7, and 10.8 tests are.  That's bug 860870.  It's currently
 blocked on treeherder, but maybe it should be re-prioritized, since we
 keep running into cases where this data would be helpful.
 
 It would be nice indeed.
 
 Anyway, disabling the 10.7 tests sounds reasonable to me given no
 data, but maybe we continue running these tests on m-c?  Maybe we also
 deprecate the 10.7 tests on tryserver, so you only get the tests if
 you really really want them?
 
 We could compromise by running the 10.7 tests only on m-c, m-a, m-b and m-r. 
 This alone would help a lot, since most of the load comes from m-i and try. We 
 could make it a non-by-default platform on try.
 I assume that the wait times for 10.6 would then be good enough, but we should be 
 willing to revisit this down the road if they get bad again.
 
 We can start by decreasing the load and revisit later.
 
 Sounds good?
 
 cheers,
 Armen
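
As a rough back-of-the-envelope check on the pool numbers quoted above,
here is a small sketch. The machine counts come from this thread; the
count of broken machines is only the "maybe a dozen" estimate, how those
are split between the two pools is not known, and the helper name is
made up for illustration.

```python
# Machine counts quoted in this thread.
MACHINES_10_6 = 84
MACHINES_10_7 = 86
# "Maybe a dozen" are down; this is an approximation, not an exact count.
APPROX_DOWN = 12

total = MACHINES_10_6 + MACHINES_10_7
usable = total - APPROX_DOWN


def capacity_gain(current, added):
    """Ratio of the new pool size to the current one after adding machines."""
    return (current + added) / current


print(total)   # all revision 4 minis in the two pools
print(usable)  # rough number actually available for jobs
# Nominal gain for 10.6 if every 10.7 machine were re-purposed,
# as in the original proposal.
print(round(capacity_gain(MACHINES_10_6, MACHINES_10_7), 2))
```

Under the compromise, no machines change pools; the gain comes from the
10.7 pool receiving far fewer jobs from m-i and try.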


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Justin Lebar
 We could compromise by running the 10.7 tests only on m-c, m-a, m-b and m-r. 
 This alone would help a lot, since most of the load comes from m-i and try. We 
 could make it a non-by-default platform on try.

I wonder if we should do the same for debug 10.6 tests (and maybe builds).

The fact of the matter is that coalescing reduces our test coverage on
m-i as it is; so long as we run these tests on central and we're OK
with occasional bustage there, this seems pretty reasonable to me.



Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Justin Lebar
 Is this what you're saying?
 * 10.6 opt tests - per-checkin (no change)
 * 10.6 debug tests - reduced
 * 10.7 opt tests - reduced
 * 10.7 debug tests - reduced

 * reduced -- m-c, m-a, m-b, m-r, esr17

Yes.

Now that I think about this more, maybe we should go big or go home:
change 10.6 opt tests to reduced as well, and see how it goes.  We can
always change it back.

If it goes well, we can try to do the same thing with the Windows tests.

We should get the sheriffs to sign off.
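
To make the matrix above concrete, here is a minimal sketch of it as a
lookup table. This is not buildbot configuration: the names below are
hypothetical, and it assumes that "per-checkin" means the tests keep
running on every branch, including m-i and try.

```python
# Branches on which "reduced" tests would still run, as listed above.
REDUCED_BRANCHES = ["m-c", "m-a", "m-b", "m-r", "esr17"]

# (OS X version, build type) -> schedule, as listed above.
PROPOSAL = {
    ("10.6", "opt"): "per-checkin",
    ("10.6", "debug"): "reduced",
    ("10.7", "opt"): "reduced",
    ("10.7", "debug"): "reduced",
}


def runs_by_default(proposal, version, build_type, branch):
    """Return True if these tests would run on `branch` without opting in."""
    if proposal[(version, build_type)] == "per-checkin":
        # Assumption: per-checkin tests run on every branch.
        return True
    return branch in REDUCED_BRANCHES


def go_big(proposal):
    """The "go big or go home" variant: everything becomes reduced."""
    return {key: "reduced" for key in proposal}


print(runs_by_default(PROPOSAL, "10.7", "debug", "m-i"))
print(runs_by_default(go_big(PROPOSAL), "10.6", "opt", "m-i"))
```

Anything that returns False for try would need an explicit opt-in there.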

On Thu, Apr 25, 2013 at 2:47 PM, Armen Zambrano Gasparnian
arme...@mozilla.com wrote:
 On 2013-04-25 2:39 PM, Justin Lebar wrote:

 We could come to the compromise of running them on m-c, m-a, m-b and
 m-r. Only this would help a lot since most of the load comes from m-i and
 try. We could make it a non-by-default platform on try.

 I wonder if we should do the same for debug 10.6 tests (and maybe builds).

 Is this what you're saying?
 * 10.6 opt tests - per-checkin (no change)
 * 10.6 debug tests - reduced
 * 10.7 opt tests - reduced
 * 10.7 debug tests - reduced

 * reduced -- m-c, m-a, m-b, m-r, esr17



 The fact of the matter is that coalescing reduces our test coverage on
 m-i as it is; so long as we run these tests on central and we're OK
 with occasional bustage there, this seems pretty reasonable to me.

 Great!


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-25 Thread Ed Morley

On 25 April 2013 20:14:10, Justin Lebar wrote:

Is this what you're saying?
* 10.6 opt tests - per-checkin (no change)
* 10.6 debug tests - reduced
* 10.7 opt tests - reduced
* 10.7 debug tests - reduced

* reduced -- m-c, m-a, m-b, m-r, esr17


Yes.

Now that I think about this more, maybe we should go big or go home:
change 10.6 opt tests to reduced as well, and see how it goes.  We can
always change it back.

If it goes well, we can try to do the same thing with the Windows tests.

We should get the sheriffs to sign off.


Worth a shot; we can always revert :-) The only thing I might add is that
we'll need a way to opt into 10.6 test jobs on Try, in case someone has
to debug issues found on mozilla-central (e.g. using sfink's undocumented
OS-version-specific syntax).


Ed