Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
I think it makes a lot of sense to test the spread. +1

----- Original Message -----
From: Armen Zambrano G. arme...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Tue, 29 Oct 2013 13:31:33 -0700 (PDT)
Subject: Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

Hello all,

I would like to revisit this. I would like to look into no longer running tests and talos on 10.7 and re-purposing those machines as 10.6 machines.

* We have many more users on 10.6 than on 10.7.
* No new updates have been given to 10.6 since July 2011 [1].
* No new updates have been given to 10.7 since October 2012 [2].

This will improve our current Mac OS X testing wait times.

On another note, 10.9 has come out and I have already started seeing a decent dip in 10.8 users (since it is a free update). I would like to consider no longer running jobs on 10.8 and running them only on 10.9 once we have the infrastructure up and running.

cheers,
Armen

[1] https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard#Release_history
[2] https://en.wikipedia.org/wiki/Mac_OS_X_Lion#Release_history

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 10/29/2013 4:31 PM, Armen Zambrano G. wrote:
> In order to improve our wait times, I propose that we stop testing on tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as 10.6 to increase our capacity.
>
> Please let us know if this plan is unacceptable and needs further discussion.

+1 to repurposing all rev4s as 10.6 slaves and all rev5s as 10.9!

I guess the only question is how many people are stuck on 10.7 (my understanding is that some hardware configurations that support 10.7 aren't supported by 10.9), and whether that population is large enough that we need to test for it explicitly.

My offhand recollection is that the main discrepancies we see between OS X versions in our test infrastructure have largely to do with what hardware they run on and whether OMTC is enabled. So IMO, 10.6 on rev4 without OMTC and 10.9 on rev5 with OMTC is probably representative enough that we aren't likely to miss any major regressions.

-Ryan
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On Thursday, April 25, 2013 4:12:16 PM UTC-4, Ed Morley wrote:
> On 25 April 2013 20:14:10, Justin Lebar wrote:
> >> Is this what you're saying?
> >> * 10.6 opt tests - per-checkin (no change)
> >> * 10.6 debug tests - reduced
> >> * 10.7 opt tests - reduced
> >> * 10.7 debug tests - reduced
> >>
> >> * reduced -- m-c, m-a, m-b, m-r, esr17
> >
> > Yes.
> >
> > Now that I think about this more, maybe we should go big or go home: change 10.6 opt tests to reduced as well, and see how it goes. We can always change it back. If it goes well, we can try to do the same thing with the Windows tests.
> >
> > We should get the sheriffs to sign off.
>
> Worth a shot, we can always revert :-)
>
> The only thing I might add is that we'll need a way to opt into 10.6 test jobs on Try, in case someone has to debug issues found on mozilla-central (e.g. using sfink's undocumented OS-version-specific syntax).
>
> Ed

I had to revert a talos change on inbound just this Wednesday because of failures that occurred only on 10.6. They were caused by a different version of python on 10.6 :(

-Joel
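The "reduced" schedule in the quoted list can be read as a simple lookup: per-checkin jobs run on every push to every branch, while reduced jobs run only on the listed release branches. A minimal sketch, assuming that m-c, m-a, m-b, m-r, and esr17 stand for mozilla-central, mozilla-aurora, mozilla-beta, mozilla-release, and mozilla-esr17; the `SCHEDULE` table and `runs_on` helper are hypothetical and are not buildbot's configuration format.

```python
# Hypothetical encoding of the "reduced" schedule; this is not buildbot's
# real configuration format.

# Assumed expansions of m-c, m-a, m-b, m-r, and esr17.
REDUCED_BRANCHES = frozenset({
    "mozilla-central",
    "mozilla-aurora",
    "mozilla-beta",
    "mozilla-release",
    "mozilla-esr17",
})

# (OS X version, build type) -> scheduling policy
SCHEDULE = {
    ("10.6", "opt"): "per-checkin",
    ("10.6", "debug"): "reduced",
    ("10.7", "opt"): "reduced",
    ("10.7", "debug"): "reduced",
}


def runs_on(os_version: str, build_type: str, branch: str) -> bool:
    """Return True if this platform's tests run on pushes to `branch`."""
    policy = SCHEDULE[(os_version, build_type)]
    if policy == "per-checkin":
        return True  # every push, on every branch
    return branch in REDUCED_BRANCHES  # "reduced": listed branches only


print(runs_on("10.6", "opt", "mozilla-inbound"))    # True
print(runs_on("10.7", "debug", "mozilla-inbound"))  # False
```

Under the "go big or go home" variant, the `("10.6", "opt")` entry would also map to `"reduced"`, so no OS X 10.6 or 10.7 tests would run on inbound or try pushes.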
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On Friday, April 26, 2013 9:49:18 AM UTC-4, Armen Zambrano G. wrote:
> Maybe we can keep one of the talos jobs around? (until releng fixes the various python versions' story)
> IIUC this was more of an infra issue rather than a Firefox testing issue.

It was infra related, but it was specific to the 10.6 platform. Even knowing that, I fully support the proposed plan. We could easily have determined the root cause of the 10.6-specific failure a day later on a different branch.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 4/25/13 1:12 PM, Ed Morley wrote:
> On 25 April 2013 20:14:10, Justin Lebar wrote:
> > Now that I think about this more, maybe we should go big or go home: change 10.6 opt tests to reduced as well, and see how it goes. We can always change it back.
>
> Worth a shot, we can always revert :-)

So what we're saying is that we are going to completely reverse our previous tree management policy?

Currently, m-c is supposed to be the tree that's safely unbroken: we know it's unbroken because the tests we run on it have already been run on the tree that merged into it, and you should almost never push directly to it unless you're in a desperate hurry to make a nightly.

This change would mean that we expect merges of hundreds of csets from inbound to sometimes break m-c with no idea which one broke it, that we expect it to sometimes be permaorange for days, and that it's better to push your widget/cocoa/ changes directly to m-c than to inbound.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
> So what we're saying is that we are going to completely reverse our previous tree management policy?

Basically, yes.

Although, due to coalescing, do you always have a full run of tests on the tip of m-i before merging to m-c?

A better solution would be to let you trigger a full set of tests (without coalescing) on m-i before merging to m-c. We've been asking for a similar feature for tryserver ("let us add new jobs to my push") for a long time. Perhaps if we made this change, we could get releng to implement that feature sooner rather than later, particularly if this change caused pain to other teams who pull from a broken m-c. I am not above effecting a sense of urgency in order to get bugs fixed. :)
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 4/26/2013 11:11 AM, Justin Lebar wrote:
>> So what we're saying is that we are going to completely reverse our previous tree management policy?
>
> Basically, yes.
>
> Although, due to coalescing, do you always have a full run of tests on the tip of m-i before merging to m-c?

Yes. Note that we generally aren't merging inbound tip to m-c; we're taking a known-green cset (including PGO tests).
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 4/26/13 8:11 AM, Justin Lebar wrote:
>> So what we're saying is that we are going to completely reverse our previous tree management policy?
>
> Basically, yes. Although, due to coalescing, do you always have a full run of tests on the tip of m-i before merging to m-c?

It's not just coincidence that the tip of most m-i -> m-c merges is a backout: when looking for a mergeable cset during the daytime, you're usually looking at the last backout during a tree closure, when we sat and waited for tests to run on it. Otherwise, you pick one that looks plausible, then figure out which jobs got coalesced up and check how they did on the push they were coalesced into.
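Picking a merge candidate as described above amounts to scanning backwards for the newest push on which every required job actually ran and passed. A minimal sketch, assuming a hypothetical data model in which each push maps job names to "success", "failure", or None for a job that was coalesced away; `latest_fully_green` is a hypothetical helper, and it implements only the strict check, not the follow-up step of inspecting where coalesced jobs ended up.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical data model: one mapping per push, oldest first, from job name
# to "success", "failure", or None when the job never ran on that push
# (for example because it was coalesced into a later push).
Push = Mapping[str, Optional[str]]


def latest_fully_green(pushes: Sequence[Push],
                       required_jobs: Sequence[str]) -> Optional[int]:
    """Return the index of the newest push on which every required job ran
    and succeeded, or None if there is no such push."""
    for i in range(len(pushes) - 1, -1, -1):
        if all(pushes[i].get(job) == "success" for job in required_jobs):
            return i
    return None


pushes = [
    {"osx-10.6": "success", "osx-10.7": "success"},  # fully tested, green
    {"osx-10.6": None, "osx-10.7": "success"},       # 10.6 job coalesced
    {"osx-10.6": "success", "osx-10.7": "failure"},  # 10.7 orange
]
print(latest_fully_green(pushes, ["osx-10.6", "osx-10.7"]))  # 0
```

The more jobs get coalesced, the further back this scan has to go, which is why the last backout during a tree closure (where everything was allowed to finish) is so often the answer.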
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
Would we be able to go back to where we disabled 10.7 altogether? Product (Asa, in a separate thread) and release drivers (Akeybl) were OK with the compromise of removing version-specific test coverage completely.

Side note: adding Mac PGO would increase the build load. (Besides this, we have to place a large PO, since we expect Mac wait times to show up as general load increases.)

Not all load-reducing approaches are easy to implement (due to the way that buildbot is designed), and they don't guarantee that we would reduce the load enough.

It's expensive enough to support 3 different versions of Mac as is, without bringing 10.9 to the table. We have to cut things at times.

One compromise that would be easy to implement and *might* reduce the load is to disable all debug jobs for 10.7.

cheers,
Armen

On 2013-04-26 11:29 AM, Justin Lebar wrote:
> As a compromise, how hard would it be to run the Mac 10.6 and 10.7 tests on m-i occasionally, like we run the PGO tests? (Maybe we could trigger them on the same csets as we run PGO; it seems like that would be useful.)
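The compromise of running the 10.6 and 10.7 tests on m-i only occasionally could be expressed as a rule that picks a push only once enough time has passed since the last picked one. A minimal sketch; the `periodic_pushes` helper and the three-hour interval in the example are illustrative assumptions, not a description of how PGO runs are actually scheduled.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def periodic_pushes(push_times: Sequence[datetime],
                    interval: timedelta) -> List[int]:
    """Return indices of the pushes that would get the extra test jobs.

    push_times must be sorted oldest first. The first push is always picked;
    after that, a push is picked only once at least `interval` has elapsed
    since the previously picked push.
    """
    picked: List[int] = []
    last_picked: Optional[datetime] = None
    for i, pushed_at in enumerate(push_times):
        if last_picked is None or pushed_at - last_picked >= interval:
            picked.append(i)
            last_picked = pushed_at
    return picked


start = datetime(2013, 4, 26, 9, 0)
times = [start + timedelta(minutes=m) for m in (0, 30, 90, 200, 210, 400)]
print(periodic_pushes(times, timedelta(hours=3)))  # [0, 3, 5]
```

Failures on the skipped pushes would then surface on the next picked push, with a regression range covering everything in between.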
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
Just disabling debug and talos jobs for 10.7 should reduce the load on 10.7 by more than 50%. That might be sufficient for now. Any objections to this plan? We can revisit later on if we need to disable more.

cheers,
Armen

On 2013-04-26 11:50 AM, Armen Zambrano G. wrote:
> One compromise that would be easy to implement and *might* reduce the load is to disable all debug jobs for 10.7.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
> Would we be able to go back to where we disabled 10.7 altogether?

On m-i and try only, or everywhere?

On Fri, Apr 26, 2013 at 12:10 PM, Armen Zambrano G. arme...@mozilla.com wrote:
> Just disabling debug and talos jobs for 10.7 should reduce more than 50% of the load on 10.7. That might be sufficient for now. Any objections on this plan? We can re-visit later on if we need more disabled.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 2013-04-26 12:14 PM, Justin Lebar wrote:
>> Would we be able to go back to where we disabled 10.7 altogether?
>
> On m-i and try only, or everywhere?

The initial proposal was for disabling everywhere. As a compromise, we could leave 10.7 opt jobs running everywhere and revisit after I re-purpose the first batch of machines.

best regards,
Armen
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
I don't think I'm comfortable disabling this platform across the board, or even disabling debug-only runs across the board.

As jmaher pointed out, there are platform differences here. If we disable this platform entirely, we lose visibility into events that are rare but, we seem to believe, possible.

It seems like the only reason to disable everywhere instead of only on m-i/try (or running less frequently on m-i, like we do with PGO) is that the former is easier to implement. It seems like we're proposing taking a lot of risk here to work around our own failings...

On Fri, Apr 26, 2013 at 1:03 PM, Armen Zambrano G. arme...@mozilla.com wrote:
> The initial proposal was for disabling everywhere. We could leave 10.7 opt jobs running everywhere as a compromise and re-visit after I re-purpose the first batch of machines.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 2013-04-26 1:31 PM, Justin Lebar wrote:
> I don't think I'm comfortable disabling this platform across the board, or even disabling debug-only runs across the board.
>
> As jmaher pointed out, there are platform differences here. If we disable this platform entirely, we lose visibility into rare but, we seem to believe, possible events.

That was a python issue related to talos. It was not a Firefox issue that would have failed only on a specific version of Mac.

> It seems like the only reason to disable everywhere instead of only on m-i/try (or running less frequently on m-i, like we do with PGO) is that the former is easier to implement. It seems like we're proposing taking a lot of risk here to work around our own failings...

Yes, it is a lot of work to change the way buildbot works in order to optimize a non-standard mode of operation. Also, running the jobs only on PGO pushes rather than on every checkin would leave 10.7 with less coverage than the other versions.

I could also have never started this thread about improving our wait times for 10.6, and simply answered "we cannot buy more machines" the day someone complained about wait times on rev4.

Just a little earlier in the thread you were saying "go big or go home" and asking to disable even the 10.6 debug tests. I'm confused by the different messages.
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
After re-reading, I'm happy to disable 10.7 just on m-i/try for now.

Modifying things to trigger *some* jobs on m-i, though, would be a decent amount of work (adding Mac PGO builders), would still be different from normal operations, and would increase the 10.6/10.8 test load.

On 2013-04-26 1:31 PM, Justin Lebar wrote:
> It seems like the only reason to disable everywhere instead of only on m-i/try (or running less frequently on m-i, like we do with PGO) is that the former is easier to implement. It seems like we're proposing taking a lot of risk here to work around our own failings...
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 4/26/2013 9:10 AM, Armen Zambrano G. wrote:
> Just disabling debug and talos jobs for 10.7 should reduce more than 50% of the load on 10.7. That might be sufficient for now.

I'd be happy for us to disable all Talos jobs on 10.7, on all trees. I've been keeping track of Talos stuff recently and I have not seen any genuine regressions that are 10.7-specific, so I don't think running these benchmarks on three Mac platforms simultaneously is providing us much benefit. In terms of tracking regressions, it would be better to have more complete data on 10.6 alone than incomplete data (due to coalescing) on 10.6 and 10.7.
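Why coalescing makes the data less useful for regression tracking: when a job is coalesced, the pushes it skipped can only be blamed as a group. A minimal sketch, assuming a job's results are listed oldest first, with None for pushes where it was coalesced away; `regression_range` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def regression_range(results: Sequence[Optional[bool]]) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) push indices that could have caused
    the first observed failure, or None if no tested push failed.

    results[i] is True (passed), False (failed), or None (job not run on that
    push because it was coalesced); pushes are listed oldest first.
    """
    last_good = -1
    for i, result in enumerate(results):
        if result is True:
            last_good = i
        elif result is False:
            # Every push after the last green run, up to this one, is a suspect.
            return (last_good + 1, i)
    return None


print(regression_range([True, False]))              # (1, 1): one suspect
print(regression_range([True, None, None, False]))  # (1, 3): three suspects
```

With fewer machines per platform, more runs are coalesced, so each detected regression arrives with a wider range of suspect pushes.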
Improving Mac OS X 10.6 test wait times by reducing 10.7 load
(please follow up through mozilla.dev.planning)

Hello all,

I have recently been looking into our Mac OS X test wait times, which have been bad for many months and are progressively getting worse. Less than 80% of test jobs on OS X 10.6 and 10.7 are able to start within 15 minutes of being requested. This slows down getting test results for OS X and makes tree closures longer when we have Mac OS X test backlogs.

Unfortunately, we can't buy any more revision 4 Mac minis (they're not sold anymore), as Apple discontinues old hardware when new hardware comes out. In order to improve the turnaround time for Mac testing, we have to look into reducing our test load on one of these two OSes (both of them run on revision 4 minis).

We have over a third of our OS X users running 10.6. Eventually, down the road, we could drop 10.6, but we still have a significant share of our users there, even though Apple stopped shipping major updates to them in July 2011 [1]. Our current Mac OS X distribution looks like this:
* 10.6 - 43%
* 10.7 - 30%
* 10.8 - 27%

OS X 10.8 is the only version that is growing.

In order to improve our wait times, I propose that we stop testing on tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as 10.6 to increase our capacity.

Please let us know if this plan is unacceptable and needs further discussion.

best regards,
Armen Zambrano - Mozilla's Release Engineering
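The wait-time figure above (the share of test jobs that start within 15 minutes of being requested) can be computed from per-job request and start timestamps. A minimal sketch, assuming such timestamps are available; `fraction_started_within` is a hypothetical helper, not part of any releng tool, and the sample jobs are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def fraction_started_within(jobs: Sequence[Tuple[datetime, datetime]],
                            limit: timedelta = timedelta(minutes=15)) -> float:
    """Share of jobs that started no later than `limit` after being requested.

    Each job is a (requested_at, started_at) pair.
    """
    if not jobs:
        raise ValueError("no jobs to measure")
    on_time = sum(1 for requested, started in jobs if started - requested <= limit)
    return on_time / len(jobs)


t0 = datetime(2013, 4, 25, 10, 0)
jobs = [
    (t0, t0 + timedelta(minutes=5)),
    (t0, t0 + timedelta(minutes=15)),
    (t0, t0 + timedelta(minutes=16)),
    (t0, t0 + timedelta(minutes=40)),
]
print(fraction_started_within(jobs))  # 0.5
```

Tracking this number per OS version before and after re-purposing the 10.7 machines would show whether the extra 10.6 capacity moves it back above the 80% mark.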
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
It would be nice if we had data indicating how often tests fail on just one version of MacOS, so we didn't have to guess how useful having 10.6, 10.7, and 10.8 tests are. That's bug 860870. It's currently blocked on treeherder, but maybe it should be re-prioritized, since we keep running into cases where this data would be helpful.

Anyway, disabling the 10.7 tests sounds reasonable to me in the absence of data, but maybe we continue running these tests on m-c? Maybe we also deprecate the 10.7 tests on tryserver, so you only get the tests if you really really want them?

On Thu, Apr 25, 2013 at 1:40 PM, Andreas Gal g...@mozilla.com wrote:
> How many 10.7 machines do we operate in that pool?
>
> Andreas
>
> On Apr 25, 2013, at 10:30 AM, Armen Zambrano G. arme...@mozilla.com wrote:
>> (please follow up through mozilla.dev.planning)
>>
>> Hello all,
>>
>> I have recently been looking into our Mac OS X test wait times, which have been bad for many months and are progressively getting worse. Fewer than 80% of test jobs on OS X 10.6 and 10.7 are able to start within 15 minutes of being requested. This slows down getting test results for OS X and makes tree closures longer when we have Mac OS X test backlogs. Unfortunately, we can't buy any more revision 4 Mac minis (they're not sold anymore), since Apple discontinues old hardware as new models come out.
>>
>> In order to improve the turnaround time for Mac testing, we have to look into reducing our test load on one of these two OSes (both of them run on revision 4 minis).
>>
>> We have over a third of our OS X users running 10.6. Eventually, down the road, we could drop 10.6, but we still have a significant number of our users there, even though Apple stopped providing major updates for it in July 2011 [1].
>>
>> Our current Mac OS X distribution looks like this:
>> * 10.6 - 43%
>> * 10.7 - 30%
>> * 10.8 - 27%
>>
>> OS X 10.8 is the only version that is growing.
>>
>> In order to improve our wait times, I propose that we stop testing on tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as 10.6 to increase our capacity.
>>
>> Please let us know if this plan is unacceptable and needs further discussion.
>>
>> best regards,
>> Armen Zambrano - Mozilla's Release Engineering
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.

This strategy would prevent any holes in our coverage while still accomplishing the goal of reducing load. Seems very reasonable, given how infrequently I've seen tests fail for one OS X version but not another.

-Alex

On Apr 25, 2013, at 11:02 AM, Armen Zambrano G. arme...@mozilla.com wrote:
> On 2013-04-25 1:40 PM, Andreas Gal wrote:
>> How many 10.7 machines do we operate in that pool?
>> Andreas
>
> 84 of them are 10.6 and 86 of them are 10.7. Unfortunately, a lot of them (maybe a dozen) are down while we try to fix them (broken hard drives, bad memory, NICs). They are out of warranty.
>
> On 2013-04-25 1:55 PM, Justin Lebar wrote:
>> It would be nice if we had data indicating how often tests fail on just one version of MacOS, so we didn't have to guess how useful having 10.6, 10.7, and 10.8 tests are. That's bug 860870. It's currently blocked on treeherder, but maybe it should be re-prioritized, since we keep running into cases where this data would be helpful.
>
> It would be nice indeed.
>
>> Anyway, disabling the 10.7 tests sounds reasonable to me in the absence of data, but maybe we continue running these tests on m-c? Maybe we also deprecate the 10.7 tests on tryserver, so you only get the tests if you really really want them?
>
> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.
>
> I expect the wait times for 10.6 to be good enough, but we should be willing to revisit this if they get bad again. We can start by decreasing the load and revisit down the road.
>
> Sounds good?
>
> cheers,
> Armen
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.

I wonder if we should do the same for debug 10.6 tests (and maybe builds).

The fact of the matter is that coalescing reduces our test coverage on m-i as it is; so long as we run these tests on central and we're OK with occasional bustage there, this seems pretty reasonable to me.

On Thu, Apr 25, 2013 at 2:35 PM, Alex Keybl ake...@mozilla.com wrote:
>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.
>
> This strategy would prevent any holes in our coverage while still accomplishing the goal of reducing load. Seems very reasonable, given how infrequently I've seen tests fail for one OS X version but not another.
>
> -Alex
>
> On Apr 25, 2013, at 11:02 AM, Armen Zambrano G. arme...@mozilla.com wrote:
>> On 2013-04-25 1:40 PM, Andreas Gal wrote:
>>> How many 10.7 machines do we operate in that pool?
>>> Andreas
>>
>> 84 of them are 10.6 and 86 of them are 10.7. Unfortunately, a lot of them (maybe a dozen) are down while we try to fix them (broken hard drives, bad memory, NICs). They are out of warranty.
>>
>> On 2013-04-25 1:55 PM, Justin Lebar wrote:
>>> It would be nice if we had data indicating how often tests fail on just one version of MacOS, so we didn't have to guess how useful having 10.6, 10.7, and 10.8 tests are. That's bug 860870. It's currently blocked on treeherder, but maybe it should be re-prioritized, since we keep running into cases where this data would be helpful.
>>
>> It would be nice indeed.
>>
>>> Anyway, disabling the 10.7 tests sounds reasonable to me in the absence of data, but maybe we continue running these tests on m-c? Maybe we also deprecate the 10.7 tests on tryserver, so you only get the tests if you really really want them?
>>
>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.
>>
>> I expect the wait times for 10.6 to be good enough, but we should be willing to revisit this if they get bad again. We can start by decreasing the load and revisit down the road.
>>
>> Sounds good?
>>
>> cheers,
>> Armen
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
> Is this what you're saying?
> * 10.6 opt tests - per-checkin (no change)
> * 10.6 debug tests - reduced
> * 10.7 opt tests - reduced
> * 10.7 debug tests - reduced
> * reduced -- m-c, m-a, m-b, m-r, esr17

Yes.

Now that I think about this more, maybe we should go big or go home: change 10.6 opt tests to reduced as well, and see how it goes. We can always change it back. If it goes well, we can try to do the same thing with the Windows tests.

We should get the sheriffs to sign off.

On Thu, Apr 25, 2013 at 2:47 PM, Armen Zambrano Gasparnian arme...@mozilla.com wrote:
> On 2013-04-25 2:39 PM, Justin Lebar wrote:
>>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. This alone would help a lot, since most of the load comes from m-i and try. We could make it an opt-in (not-by-default) platform on try.
>> I wonder if we should do the same for debug 10.6 tests (and maybe builds).
>
> Is this what you're saying?
> * 10.6 opt tests - per-checkin (no change)
> * 10.6 debug tests - reduced
> * 10.7 opt tests - reduced
> * 10.7 debug tests - reduced
> * reduced -- m-c, m-a, m-b, m-r, esr17
>
>> The fact of the matter is that coalescing reduces our test coverage on m-i as it is; so long as we run these tests on central and we're OK with occasional bustage there, this seems pretty reasonable to me.
>
> Great!
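For illustration only, the reduced schedule summarized in the list above can be sketched as a small lookup table. This is a hypothetical Python sketch, not Release Engineering's actual buildbot configuration: the names `SCHEDULE`, `REDUCED_TREES`, and `runs_on` are invented here, the full tree names are assumed expansions of the m-c, m-a, m-b, m-r and esr17 abbreviations, and try (where 10.7 would become opt-in) is deliberately left out.

```python
# Hypothetical sketch of the proposed Mac OS X test schedule. This is NOT the
# real buildbot configuration; the tree names below are assumed expansions of
# the m-c, m-a, m-b, m-r and esr17 abbreviations used in the thread.

# Trees on which "reduced" jobs would still run.
REDUCED_TREES = frozenset({
    "mozilla-central",  # m-c
    "mozilla-aurora",   # m-a
    "mozilla-beta",     # m-b
    "mozilla-release",  # m-r
    "mozilla-esr17",    # esr17
})

# (OS X version, build type) -> scheduling policy, as summarized in the thread.
SCHEDULE = {
    ("10.6", "opt"): "per-checkin",
    ("10.6", "debug"): "reduced",
    ("10.7", "opt"): "reduced",
    ("10.7", "debug"): "reduced",
}


def runs_on(os_version: str, build_type: str, tree: str) -> bool:
    """Return True if this test job would be scheduled for a push to `tree`."""
    policy = SCHEDULE[(os_version, build_type)]
    if policy == "per-checkin":
        # Per-checkin jobs run for every push, whatever the tree.
        return True
    # "reduced" jobs run only on the listed release-track trees.
    return tree in REDUCED_TREES
```

With this table, a push to mozilla-inbound would still get 10.6 opt tests but no 10.7 tests, while a push to mozilla-central would get all four job types. Under the "go big or go home" variant, the only change would be setting the `("10.6", "opt")` entry to `"reduced"`.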
Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load
On 25 April 2013 20:14:10, Justin Lebar wrote:
>> Is this what you're saying?
>> * 10.6 opt tests - per-checkin (no change)
>> * 10.6 debug tests - reduced
>> * 10.7 opt tests - reduced
>> * 10.7 debug tests - reduced
>> * reduced -- m-c, m-a, m-b, m-r, esr17
> Yes.
>
> Now that I think about this more, maybe we should go big or go home: change 10.6 opt tests to reduced as well, and see how it goes. We can always change it back. If it goes well, we can try to do the same thing with the Windows tests.
>
> We should get the sheriffs to sign off.

Worth a shot; we can always revert :-)

The only thing I'd add is that we'll need a way to opt into 10.6 test jobs on Try, in case someone has to debug issues found on mozilla-central (e.g. using sfink's undocumented OS-version-specific syntax).

Ed