Re: PSA: Python min version bumped to 3.6 for building gecko
The pyenv[1] project is a great way to manage multiple versions of python on your system. I've found it easier than trying to compile directly from source.

Cheers,
Chris

[1] https://github.com/pyenv/pyenv

On Wed, 10 Jun 2020 at 16:52, Kartikaya Gupta wrote:
> For those of you who like me are still running Ubuntu 16.04 LTS: the
> minimum version of python required to build gecko got bumped from 3.5
> to 3.6. As Ubuntu 16.04 doesn't offer python3.6 out of the box, you
> may need to build it from source to get going again. See
> https://bugzilla.mozilla.org/show_bug.cgi?id=1644845#c10 for steps
> that worked for me.
>
> Cheers,
> kats
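For anyone going the pyenv route, the steps look roughly like this (a sketch assuming the git-clone install method and a then-current 3.6.x release; the Ubuntu package list is approximate, so check the pyenv README for the authoritative instructions):

    # Build prerequisites on Ubuntu 16.04 (approximate package list):
    sudo apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
        libreadline-dev libsqlite3-dev libffi-dev curl

    # Install pyenv itself:
    git clone https://github.com/pyenv/pyenv.git ~/.pyenv
    export PYENV_ROOT="$HOME/.pyenv"
    export PATH="$PYENV_ROOT/bin:$PATH"
    eval "$(pyenv init -)"

    # Compile and activate a Python 3.6 interpreter:
    pyenv install 3.6.10
    pyenv global 3.6.10
    python --version    # should now report Python 3.6.10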
Re: Proposal to adjust testing to run on PGO builds only and not test on OPT builds
Thank you Joel for writing up this proposal!

Are you also proposing that we stop the linux64-opt and win64-opt builds as well, except for leaving them as an available option on try? If we're not testing them on integration or release branches, there doesn't seem to be much purpose in doing the builds.

On Thu, 3 Jan 2019 at 11:20, jmaher wrote:
> I would like to propose that we do not run tests on linux64-opt,
> windows7-opt, and windows10-opt.
>
> Why am I proposing this:
> 1) Test regressions found on trunk are mostly on debug, and in fewer
> cases on PGO. There are no regressions found in the last 6 months (all
> the data I looked at) that are exclusive to OPT builds.
> 2) On mozilla-beta, mozilla-release, and ESR, we only build/test PGO
> builds; we do not run tests on plain OPT builds.
> 3) This will reduce the jobs we run by about 16%, which in turn reduces
> CPU time, money spent, turnaround time, intermittents, and the
> complexity of the taskgraph.
> 4) PGO builds are very similar to OPT builds, but we add flags to
> generate profile data and make small adjustments to build scripts behind
> the MOZ_PGO flag in-tree; then we launch the browser, collect data, and
> repack our binaries for faster performance.
> 5) We ship PGO builds, not OPT builds.
>
> What are the risks associated with this?
> 1) try server build times will increase, as we will be testing on PGO
> instead of OPT.
> 2) we could miss a regression that only shows up on OPT, but since we
> only ship PGO, and we do not build OPT once we leave central, this is a
> very low risk.
>
> I would like to hear any concerns you might have on this or other areas
> which I have overlooked. Assuming there are no risks which block this, I
> would like to have a decision by January 11th, and make the adjustments
> on January 28th when Firefox 67 is on trunk.
Re: Launch of Phabricator and Lando for mozilla-central
This is really great news; I'm really excited to start using it! Automated landing from code review is such a game changer for productivity and security. Congrats to everyone involved.

Cheers,
Chris

On Wed, 6 Jun 2018 at 11:01, Mark Côté wrote:
> The Engineering Workflow team is happy to announce the release of
> Phabricator and Lando for general use. Going forward, Phabricator will be
> the primary code-review tool for modifications to the mozilla-central
> repository, replacing both MozReview and Splinter. Lando is an all-new
> automatic-landing system that works with Phabricator. This represents
> about a year of work integrating Phabricator with our systems and
> building out Lando. Phabricator has been in use by a few teams since last
> year, and Lando has been used by the Engineering Workflow team for
> several weeks and has lately landed a few changesets to mozilla-central.
>
> Phabricator is a suite of applications, but we are primarily using the
> code-review tool, called Differential, which will be taking the place of
> MozReview and Splinter. Bug tracking will continue to be done with
> Bugzilla, which is integrated with Phabricator. You will log into
> Phabricator via Bugzilla. We will soon begin sunsetting MozReview, and
> Splinter will be made read-only (or replaced with another patch viewer).
> An upcoming post will outline the plans for the deprecation, archival,
> and decommissioning of MozReview, with Splinter to follow.
>
> I also want to thank Phacility, the company behind Phabricator, who
> provided both excellent support and work on Phabricator itself to meet
> our requirements in an exceptionally helpful and responsive way.
>
> User documentation on Phabricator catered specifically to Mozillians can
> be found at
> https://moz-conduit.readthedocs.io/en/latest/phabricator-user.html.
> It is also linked from within Phabricator, in the left-hand menu on the
> home page.
>
> User documentation on Lando can be found at
> https://moz-conduit.readthedocs.io/en/latest/lando-user.html.
>
> MDN documentation is currently being updated.
>
> At the moment, Phabricator can support confidential revisions when they
> are associated with a confidential bug, that is, a bug with one or more
> security groups applied. Lando, however, cannot currently land these
> revisions. This is a limitation we plan to fix in Q3. You can follow
> https://bugzilla.mozilla.org/show_bug.cgi?id=1443704 for developments.
> See
> http://moz-conduit.readthedocs.io/en/latest/phabricator-user.html#landing-patches
> for our recommendations on landing patches in Phabricator without Lando.
>
> Similarly, there are two other features which are not part of the
> initial launch but will follow in subsequent releases:
> * Stacked revisions. If you have a stack of revisions, that is, two or
>   more revisions with parent-child relationships, Lando cannot land them
>   all at once. You will need to land them individually. This is filed as
>   https://bugzilla.mozilla.org/show_bug.cgi?id=1457525.
> * Try support. Users will have to push to the Try server manually until
>   this is implemented. See
>   https://bugzilla.mozilla.org/show_bug.cgi?id=1466275.
>
> Finally, we realize there are a few oddities with the UI that we will
> also be fixing in parallel with the new features. See
> https://bugzilla.mozilla.org/show_bug.cgi?id=1466120.
>
> The documentation lists several ways of getting in touch with the
> Engineering Workflow team, but #phabricator and #lando on IRC are good
> starting points.
Re: Removing tinderbox-builds from archive.mozilla.org
On Tue, 29 May 2018 at 14:21, L. David Baron wrote:
> On Monday 2018-05-28 15:52 -0400, Chris AtLee wrote:
> > Here's a bit of a strawman proposal... What if we keep the
> > {mozilla-central,mozilla-inbound,autoland}-{linux,linux64,macosx64,win32,win64}{,-pgo}/
> > directories in tinderbox-builds for now, and delete all the others.
> > Does that cover the majority of the use cases for wanting to access
> > these old builds?
> >
> > I'm guessing the historical builds for old esr branches aren't useful
> > now. Nor are the mozilla-aurora, mozilla-beta, mozilla-release, or
> > b2g-inbound builds.
>
> This seems reasonable to me, with the one caveat that I think
> b2g-inbound belongs in the other bucket. It was essentially used as
> another peer to mozilla-inbound and autoland, and while many of the
> changes landed there were b2g-only, many of them weren't, and may
> have caused regressions that affect products that we still maintain.

Ok, we can do that.

For mobile, I haven't heard anybody express a desire to keep around old CI builds in https://archive.mozilla.org/pub/mobile/tinderbox-builds/, so I'm planning to have those deleted in July.
Re: Removing tinderbox-builds from archive.mozilla.org
On Sun, 20 May 2018 at 19:40, Karl Tomlinson wrote:
> On Fri, 18 May 2018 13:13:04 -0400, Chris AtLee wrote:
> > IMO, it's not reasonable to keep CI builds around forever, so the
> > question is then how long to keep them? 1 year doesn't quite cover a
> > full ESR cycle; would 18 months be sufficient for most cases?
> >
> > Alternatively, we could investigate having different expiration
> > policies for different types of artifacts. My assumption is that the
> > Firefox binaries for the opt builds are the most useful over the long
> > term, and that other build configurations and artifacts are less
> > useful. How accurate is that assumption?
>
> Having a subset of builds around for longer would be more useful
> to me than having all builds available for a shorter period.
> The nightly builds often include large numbers of changesets,
> sometimes collected over several days, and so it becomes hard to
> identify which code change modified a particular behavior.
> I always use opt builds for regression testing, and so your
> assumption is consistent with my experience.
> I assume there are more pgo builds than nightly builds, but fewer
> than all opt builds. If so, then having a long expiration policy
> on pgo builds could be a helpful way to reduce storage costs but
> maintain the most valuable builds.

Here's a bit of a strawman proposal... What if we keep the {mozilla-central,mozilla-inbound,autoland}-{linux,linux64,macosx64,win32,win64}{,-pgo}/ directories in tinderbox-builds for now, and delete all the others. Does that cover the majority of the use cases for wanting to access these old builds?

I'm guessing the historical builds for old esr branches aren't useful now. Nor are the mozilla-aurora, mozilla-beta, mozilla-release, or b2g-inbound builds.
Re: Removing tinderbox-builds from archive.mozilla.org
The discussion about what to do about these particular buildbot builds has naturally shifted into a discussion about what kind of retention policy is appropriate for CI builds.

I believe that right now we keep all CI build artifacts for 1 year. Nightly and release builds are kept forever. There's certainly an advantage to keeping the CI builds, as they assist in bisecting regressions. However, they become less useful over time.

IMO, it's not reasonable to keep CI builds around forever, so the question is then how long to keep them? 1 year doesn't quite cover a full ESR cycle; would 18 months be sufficient for most cases?

Alternatively, we could investigate having different expiration policies for different types of artifacts. My assumption is that the Firefox binaries for the opt builds are the most useful over the long term, and that other build configurations and artifacts are less useful. How accurate is that assumption?

Archiving these artifacts into Glacier would cut the cost of storing them significantly, but would also make them much harder to access. It can take 3-5 hours to retrieve objects from Glacier, and we would need to implement some API or process to request access to archived objects. (A rough sketch of what such a lifecycle rule could look like follows the quoted thread below.)

On Thu, 17 May 2018 at 10:33, Mike Kaply wrote:
> Can we move the builds temporarily and see if it affects workflows over
> a few months, and if not, then remove them?
>
> Mike
>
> On Thu, May 17, 2018 at 9:22 AM, Tom Ritter wrote:
> > I agree with ekr in general, but I would also be curious to discover
> > what failures we would experience in practice and how we could
> > overcome them.
> >
> > I think many of the issues experienced with local builds are
> > preventable by doing a TC-like build; just build in a docker container
> > (for Linux/Mac) and auto-build any toolchains needed. (Which would be
> > part of bisect-in-the-cloud automatically.) I've been doing this
> > locally lately, and it is not a friendly process right now though.
> >
> > Of course on Windows it's an entirely different story. But one more
> > reason to pursue clang-cl builds on Linux ;)
> >
> > -tom
> >
> > On Tue, May 15, 2018 at 12:53 PM, Randell Jesup wrote:
> > > > On 5/11/18 7:06 PM, Gregory Szorc wrote:
> > > > > Artifact retention and expiration boils down to a
> > > > > trade-off between the cost of storage and the convenience of
> > > > > accessing something immediately (as opposed to waiting several
> > > > > dozen minutes to populate the cache).
> > > >
> > > > Just to be clear, when doing a bisect, one _can_ just deal with
> > > > local builds. But the point is that then it takes tens of minutes
> > > > per build, as you point out. So a bisect task that might otherwise
> > > > take 10-15 minutes total (1 minute per downloaded build) ends up
> > > > taking hours...
> > >
> > > Also (as others have pointed out) going too far back (often not that
> > > far) may run you into tool differences that break re-building old
> > > revs. Hopefully you don't get variable behavior, just a
> > > failure-to-build at some point. I'm not sure how much Rust has made
> > > this worse.
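For what it's worth, the mechanics of the Glacier option are simple on the S3 side; the open questions above are really about retrieval latency and the access workflow. A lifecycle rule along these lines would do the archiving (entirely a sketch: the bucket name, prefix, and 180-day threshold here are made up for illustration):

    # Hypothetical S3 lifecycle rule transitioning old CI builds to Glacier:
    aws s3api put-bucket-lifecycle-configuration \
        --bucket example-delivery-bucket \
        --lifecycle-configuration '{
          "Rules": [{
            "ID": "archive-ci-builds",
            "Filter": {"Prefix": "pub/firefox/tinderbox-builds/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 180, "StorageClass": "GLACIER"}]
          }]
        }'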
“approval required” for changes affecting CI infrastructure
To ensure a successful Firefox 57 release, the teams responsible for Firefox CI and release infrastructure have adopted an “approval required” policy for changes that could impact Firefox development or release. This includes systems like buildbot, Taskcluster services, puppet, hg, product delivery, and in-tree changes that could impact task scheduling.

If you have a change you’d like to land that impacts one of the above systems, or that you think could impact the infrastructure, please let the firefox-ci@ list know. Most changes are fine to land; we just want to be aware of what’s changing in the overall system in the lead-up to 57. If you don’t hear a response back within 24h, you can assume that your proposal is fine, and proceed to land it.

Thanks in advance,
Chris
Firefox Nightly - now with twice as many builds a day!
Bug 1349227[1] landed a few days ago, which means we are now doing "nightly" builds twice a day, at 1000 and 2200 UTC. The purpose of doing multiple nightlies is to get fixes out to users in Europe, Africa and Asia sooner.

We have some concerns about possible impact to the build infrastructure, so for now we're keeping an eye on load. We may need to revert if this causes too much backlog.

In the meantime, enjoy a more up-to-date Nightly more often! Please comment on the bug if there are other issues with doing multiple nightlies a day.

Cheers,
Chris

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1349227
Re: Nightly updates disabled for bug 1364059
Updates are enabled again for all platforms. Not all locales have finished yet; they will receive updates once the repacks finish.

On 11 May 2017 at 09:30, Chris AtLee <cat...@mozilla.com> wrote:
> We've disabled updates for a bad crash:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1364059
>
> We're working on backing out the offending patches and will re-spin
> nightly builds shortly.
>
> Cheers,
> Chris
Nightly updates disabled for bug 1364059
We've disabled updates for a bad crash: https://bugzilla.mozilla.org/show_bug.cgi?id=1364059

We're working on backing out the offending patches and will re-spin nightly builds shortly.

Cheers,
Chris
Reminder - TCW tomorrow May 6th from 0500-1200 PT
As indicated on our status page: https://status.mozilla.org/incidents/cpnkqqb6b5kh

We will be closing trees tomorrow from 0500-1200 PT. The tracking bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1355897

Thank you for your patience.
Re: Reproducible builds
Regarding timestamps in tarballs, using tar's --mtime option to force timestamps to MOZ_BUILD_DATE (or a derivative thereof) could work.

On 19 July 2016 at 04:11, Kurt Roeckx wrote:
> On 2016-07-18 20:56, Gregory Szorc wrote:
> > Then of course there is build signing, which takes a private key
> > and cryptographically signs builds/installers. With these in play,
> > there is no way for anybody not Mozilla to do a bit-for-bit
> > reproduction of most (all?) of the Firefox distributions at
> > https://www.mozilla.org/en-US/firefox/all/.
>
> There is at least a section about this here:
> https://reproducible-builds.org/docs/embedded-signatures/
>
> Kurt
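To sketch what that --mtime approach might look like (assuming GNU tar; MOZ_BUILD_DATE is the usual YYYYMMDDhhmmss value from the build system, and the conversion and file names below are illustrative):

    # Normalize file timestamps so two packaging runs from the same source
    # and the same MOZ_BUILD_DATE produce bit-identical tarballs.
    MOZ_BUILD_DATE=20160719041100
    MTIME=$(echo "$MOZ_BUILD_DATE" | \
        sed -E 's/^(....)(..)(..)(..)(..)(..)$/\1-\2-\3 \4:\5:\6/')

    # --sort=name also removes filesystem-ordering nondeterminism
    # (requires GNU tar >= 1.28).
    tar --mtime="$MTIME" --sort=name --owner=0 --group=0 \
        -cf firefox.tar firefox/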
Win8 tests disabled by default on Try
We've been having a lot of problems with capacity for our Windows test pools, with Windows 8 being particularly bad. Today we disabled running Windows 8 64-bit tests by default on Try. If you really do need Windows 8 tests for your try pushes, you can add try syntax like this to enable them:

"try: -b o -p win64 -u mochitests[Windows 8]"

Treeherder's "Add new jobs" feature is also a great way to select additional tests for your try push. Please be mindful of our limited hardware capacity when choosing which tests you need. We do publish a report of top pushers to try: https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/highscores/highscores.html

We have been working on migrating as many tests as possible to AWS. So far we have migrated many Windows 7 test suites over, but none of the Windows 8 suites have been migrated yet. Our plan is to focus instead on providing Windows 10 testing in AWS, and then disable the Win8 tests once those are ready.

Cheers,
Chris
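For anyone who hasn't pushed with custom try syntax before, a minimal sketch of one way to do it (assuming the plain hg push-to-try workflow; Mercurial's ui.allowemptycommit option lets the syntax ride on an empty commit):

    # Carry the try syntax on an empty commit, then push to try:
    hg commit --config ui.allowemptycommit=true \
        -m 'try: -b o -p win64 -u mochitests[Windows 8]'
    hg push -f ssh://hg.mozilla.org/try

    # Optionally remove the syntax-only commit from your local repo afterwards:
    hg strip .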
Windows 7 tests in AWS
I'm very happy to let you know that we've recently started running some of our Windows 7 tests in AWS. Currently we're running these suites in Amazon for all branches of gecko 49 and higher:
* Web platform tests + reftests
* gtest
* cppunit
* jittest
* jsreftest
* crashtest

Since these are now working in AWS, it means we can scale up the number of machines with load. This should mean a big improvement in getting test results back for Windows 7!

Work is being tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1271355. If you find any issues, please reach out in #releng, or file a bug and link it to the one above.

Thanks in particular to jmaher and Q for helping to get this work done.

Cheers,
Chris
Re: To bump mochitest's timeout from 45 seconds to 90 seconds
On 9 February 2016 at 14:51, Marco Bonardo wrote:
> On Tue, Feb 9, 2016 at 6:54 PM, Ryan VanderMeulen wrote:
> > I'd have a much easier time accepting that argument if my experience
> > didn't tell me that nearly every single "Test took longer than
> > expected" or "Test timed out" intermittent ends with a
> > RequestLongerTimeout as the fix
>
> This sounds equivalent to saying "Since we don't have enough resources
> (or a plan) to investigate why some tests take so long, let's give
> up"... But then maybe we should have that explicit discussion, rather
> than assuming it's a truth.
> Since we are focused on quality, I don't think it's acceptable to say we
> are fine with a test taking an unexpected amount of time to run. The
> fact that those bugs end up being resolved by bumping the timeout
> without any kind of investigation (and it happens, I know) is worrisome.

I agree. However, this has traditionally been a very difficult area for Release Engineering and Engineering Productivity to make progress in. Who can we work with to understand these timing characteristics in more depth?
Re: Using the Taskcluster index to find builds
In this case, latest is just latest from wherever. I agree that l10n nightlies should be under 'nightly' as well.

On Wed, Dec 2, 2015 at 3:04 PM, Axel Hecht <l...@mozilla.com> wrote:
> On 12/1/15 3:48 PM, Chris AtLee wrote:
> > Localized builds should be at e.g.
> > gecko.v2.mozilla-central.latest.firefox-l10n.win32-opt
> >
> > And yes, once we've got the naming structure nailed down, wget-en-US
> > should change to use the index.
>
> I would expect l10n nightlies to be under nightly?
>
> How does one distinguish nightlies from non-nightlies under
> mozilla-central.latest? Assuming that nightlies might end up there on
> occasion?
>
> Axel
>
> > On Tue, Dec 1, 2015 at 5:22 AM, Axel Hecht <l...@mozilla.com> wrote:
> > > I haven't found localized builds and their assets by glancing at
> > > things. Are those to come?
> > >
> > > Also, I suspect we should rewrite wget-en-US? Or add an alternative
> > > that's index-bound?
> > >
> > > Axel
> > >
> > > On 11/30/15 9:43 PM, Chris AtLee wrote:
> > > > The RelEng, Cloud Services and Taskcluster teams have been doing a
> > > > lot of work behind the scenes over the past few months to migrate
> > > > the backend storage for builds from the old "FTP" host to S3.
> > > > [...]
Faster Windows builds everywhere!
A few weeks ago I posted about switching our Windows builds on Try over to EC2, resulting in a 30 minute speed improvement. Last week we made the same change to the rest of the Windows build infrastructure. All our Windows builds are now running in AWS.

We're seeing good performance gains there too. On mozilla-inbound, we've reduced opt build times by at least 45 minutes, and taken nearly two hours (!!) off of our PGO build times.

Big thanks again to Rob Thijssen (:grenade), Mark Cornmesser (:markco) and the rest of our Release Engineering and Operations team for getting this done. Please send your kudos and thanks to them on #releng, or in person at Orlando next week!

Cheers,
Chris
Re: Faster Windows builds everywhere!
Right now we've got debug OSX builds in the cloud on Try in parallel with the regular builds. There's a bunch more work to be done there to be able to switch over, but we're definitely making progress.

All Windows / OSX unit tests are currently done on our own infra. Q Fortier is working on getting Windows unittests stood up in AWS, and the results are very promising. I'm not sure when we'll be able to switch over yet though. There are no obvious solutions for OSX test infrastructure other than maintaining our own racks of minis.

Perf tests on all platforms will stay on hardware for now. Some people have done experiments on EC2 to see how talos performs, but I don't think we know enough about the impact of this to decide if we can move these off bare metal or not.

We also run some tests for mobile on panda boards, but those are going away eventually.

On Tue, Dec 1, 2015 at 4:52 PM, Justin Dolske <dol...@mozilla.com> wrote:
> On 12/1/15 12:41 PM, Chris AtLee wrote:
> > Last week we made the same change to the rest of the Windows build
> > infrastructure. All our Windows builds are now running in AWS. We're
> > seeing good performance gains there too. On mozilla-inbound, we've
> > reduced opt build times by at least 45 minutes, and nearly two hours
> > (!!) off of our PGO build times.
>
> Nice!
>
> What builds/tests have _not_ moved to the cloud? AIUI the two biggies
> are OS X (can't move because of OS licensing), and perf tests... How
> close are we to transitioning everything else off MoCo metal?
>
> Justin
Re: Faster Windows builds everywhere!
On Tue, Dec 1, 2015 at 5:27 PM, Gregory Szorc <g...@mozilla.com> wrote:
> On Tue, Dec 1, 2015 at 2:21 PM, Chris AtLee <cat...@mozilla.com> wrote:
> > Right now we've got debug OSX builds in the cloud on Try in parallel
> > with the regular builds. There's a bunch more work to be done there to
> > be able to switch over, but we're definitely making progress.
> >
> > All Windows / OSX unit tests are currently done on our own infra. Q
> > Fortier is working on getting Windows unittests stood up in AWS, and
> > the results are very promising. I'm not sure when we'll be able to
> > switch over yet though. There are no obvious solutions for OSX test
> > infrastructure other than maintaining our own racks of minis.
> >
> > Perf tests on all platforms will stay on hardware for now. Some people
> > have done experiments on EC2 to see how talos performs, but I don't
> > think we know enough about the impact of this to decide if we can move
> > these off bare metal or not.
>
> Amazon now supports dedicated instances, which means you fully control
> what runs on the machine and other random tenants aren't fighting you
> for CPU and I/O. Assuming the performance variance from other tenants is
> what was preventing us from moving Talos to AWS, that blocker may no
> longer exist.

I think you probably want Dedicated Hosts rather than dedicated instances; otherwise, I think multiple of your own workloads could still end up on the same physical box. https://aws.amazon.com/ec2/dedicated-hosts/

I have two main concerns with dedicated infra on AWS:
* You're still running under a hypervisor of some kind; will it introduce too much noise into the results? Seems like a worthwhile experiment!
* Dedicated host pricing is quite a bit more expensive than what we're paying now for test infrastructure.
Re: Using the Taskcluster index to find builds
One approach we've taken when considering changes to the routes used is to play in the 'garbage.' prefix. You can see the results of earlier experiments there: https://tools.taskcluster.net/index/#garbage/garbage

Regarding your proposal, I find the word 'nightly' overloaded, and it needs more context to make sense. There are lots of 'nightly' builds, but only one 'nightly' channel (for Firefox anyway). So "gecko.v2.firefox.win64-opt.nightly.latest" isn't clear to me at first glance. I'm curious what others think though.

On Tue, Dec 1, 2015 at 11:46 AM, Julien Wajsberg <jwajsb...@mozilla.com> wrote:
> hi,
>
> Because we have an index, it's now very easy to add new routes. I think
> it would be a lot more user-friendly to have an index that starts with
> the product name ("firefox" for example).
>
> For example: "gecko.v2.firefox.win64-opt.nightly.latest" instead of
> "gecko.v2.mozilla-central.nightly.latest.firefox.win64-opt". Going from
> the most general to the most specific.
> Using an index that starts with "mozilla-central" is really technical
> and does not make things easy to find :/ Even if it can be good for
> automated tools.
>
> Fortunately now it's not either one or the other; we can have both. But
> before filing a bug, I'd like to know if the general population thinks
> it's a good idea.
>
> --
> Julien
>
> Le 30/11/2015 21:43, Chris AtLee a écrit :
> > The RelEng, Cloud Services and Taskcluster teams have been doing a lot
> > of work behind the scenes over the past few months to migrate the
> > backend storage for builds from the old "FTP" host to S3. [...]
Re: Using the Taskcluster index to find builds
Localized builds should be at e.g. gecko.v2.mozilla-central.latest.firefox-l10n.win32-opt

And yes, once we've got the naming structure nailed down, wget-en-US should change to use the index.

On Tue, Dec 1, 2015 at 5:22 AM, Axel Hecht <l...@mozilla.com> wrote:
> I haven't found localized builds and their assets by glancing at things.
> Are those to come?
>
> Also, I suspect we should rewrite wget-en-US? Or add an alternative
> that's index-bound?
>
> Axel
>
> On 11/30/15 9:43 PM, Chris AtLee wrote:
> > The RelEng, Cloud Services and Taskcluster teams have been doing a lot
> > of work behind the scenes over the past few months to migrate the
> > backend storage for builds from the old "FTP" host to S3. [...]
Re: Using the Taskcluster index to find builds
The expiration is currently set to one year, but we can (and should!) change that for nightlies. That work is being tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1145300

On Mon, Nov 30, 2015 at 7:00 PM, Ryan VanderMeulen <rya...@gmail.com> wrote:
> On 11/30/2015 3:43 PM, Chris AtLee wrote:
> > The RelEng, Cloud Services and Taskcluster teams have been doing a lot
> > of work behind the scenes over the past few months to migrate the
> > backend storage for builds from the old "FTP" host to S3. [...]
>
> If I understand correctly, Taskcluster builds are only archived for one
> year, whereas we have nightly archives going back 10+ years now. What
> are our options for long-term archiving in this setup?
Re: Using the Taskcluster index to find builds
You're right in that we can't change the expiration after the fact, but we can copy all of the artifacts to new tasks with the longer expiration.

On Tue, Dec 1, 2015 at 9:53 AM, Ryan VanderMeulen <rya...@gmail.com> wrote:
> What does that mean for jobs that have already run? My understanding is
> that we can't change the expiration after the fact for them? Though I
> guess that it's not an issue as long as we fix bug 1145300 prior to
> shutting off publishing to archive.m.o?
>
> I just want to avoid any gaps in nightly build coverage, as the archived
> builds are critical for regression hunting.
>
> On 12/1/2015 9:49 AM, Chris AtLee wrote:
> > The expiration is currently set to one year, but we can (and should!)
> > change that for nightlies. That work is being tracked in
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1145300
> >
> > [...]
Using the Taskcluster index to find builds
The RelEng, Cloud Services and Taskcluster teams have been doing a lot of work behind the scenes over the past few months to migrate the backend storage for builds from the old "FTP" host to S3. While we've tried to make this as seamless as possible, the new system is not a 100% drop-in replacement for the old system, resulting in some confusion about where to find certain types of builds.

At the same time, we've been working on publishing builds to the Taskcluster Index [1]. This service provides a way to find a build given various different attributes, such as its revision or the date it was built. Our plan is to make the index the primary mechanism for discovering build artifacts. As part of the ongoing buildbot to Taskcluster migration project, builds happening on Taskcluster will no longer upload to https://archive.mozilla.org (aka https://ftp.mozilla.org). Once we shut off platforms in buildbot, the index will be the only mechanism for discovering new builds.

I posted to planet Mozilla last week [2] with some more examples and details. Please explore the index, and ask questions about how to find what you're looking for!

Cheers,
Chris

[1] http://docs.taskcluster.net/services/index/
[2] http://atlee.ca/blog/posts/firefox-builds-on-the-taskcluster-index.html
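To make the lookup concrete, here's the sort of thing the index enables (a sketch using a namespace from this thread; the hostname reflects the index API as deployed at the time, and the artifact path is illustrative):

    # Resolve an index namespace to the task that most recently indexed it:
    curl -sL https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.latest.firefox.win64-opt

    # Fetch a named artifact directly through the index (check the task's
    # artifact list for real names; this one is a made-up example):
    curl -sLO https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.latest.firefox.win64-opt/artifacts/public/build/firefox-45.0a1.en-US.win64.zip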
Faster Windows builds on Try
Over the past months we've been working on migrating our Windows builds from the legacy hardware machines into Amazon. I'm very happy to announce that we've wrapped up the initial work here, and all our Windows builds on Try are now happening in Amazon.

The biggest win from this is that our Windows builds are now nearly 30 minutes faster than they used to be. As of today, Windows builds on try generally take around 50 minutes to complete, down from 1h20 before.

Our next step is to migrate the non-try builds onto Amazon as well.

Big thanks to Rob, Mark, and the rest of our Release Engineering and Operations team for making this possible!

Cheers,
Chris

https://bugzilla.mozilla.org/show_bug.cgi?id=1199267
Re: Now measuring Firefox size per-commit. What else should we be tracking?
On Mon, Nov 9, 2015 at 6:39 PM, William Lachance wrote:
> On 2015-11-06 5:56 PM, Mark Finkle wrote:
> > I also think measuring build times, and other build related stats,
> > would be useful. I'd like to see Mozilla capturing those stats for
> > developer builds though. I'm less interested in build times for the
> > automation. That data is already looked at by the automation team.
>
> Chris Manchester has volunteered to look into submitting build times in
> automation to perfherder here:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1222549
>
> I actually do think that perfherder has some advantages over the
> existing grafana system, in that it can be sheriffed easily down to the
> per-commit level by anyone (not just releng/ateam).
>
> We'll see I guess! The proof will be in bugs filed and fixed when
> regressions occur. I really think developer build times are strongly
> correlated with build times in automation, so my hope is that there will
> be a trickle-down effect if this system proves useful.

Yes, I agree this is a better approach. Having the build system produce logs or artifacts that can be ingested by perfherder is a more flexible model than the one we're currently using.
Re: Now measuring Firefox size per-commit. What else should we be tracking?
This is really great, thanks for adding support for this!

I'd like to see the size of the complete updates measured as well, in addition to the installer sizes.

Do we have alerts for these set up yet?

Cheers,
Chris

On Wed, Nov 4, 2015 at 10:55 AM, William Lachance wrote:
> Hey, so as described here:
>
> http://wrla.ch/blog/2015/11/perfherder-onward/
>
> ... I recently added tracking for Firefox installer size inside
> Perfherder. This should let us track how bloated (or not) Firefox is on
> our various supported platforms, on a per-commit basis:
>
> https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,4eb0cde5431ee9aeb5eb14512ddb3da6d4702cf0,1]&series=[mozilla-inbound,80cac7ef44b76864458627c574af1a18a425f338,1]&series=[mozilla-inbound,0060252bdfb7632df5877b7594b4d16f1b5ca4c9,1]
>
> As I mentioned in the blog post, it's now *very* easy (maybe too easy?
> heh) to submit "performance" (read: quantitative) data for any job
> reporting to treeherder by outputting a line called "PERFHERDER_DATA" to
> the log.
>
> Is there anything we could be tracking as part of our build or test jobs
> that we should be? Build times are one thing that immediately comes to
> mind. Is there anything else?
>
> In order to be a good candidate for measurement in this kind of system,
> a metric should be:
>
> 1. Relatively deterministic.
> 2. Something people actually care about and are willing to act on, on a
>    per-commit basis. If you're only going to look at it once a quarter
>    or so, it doesn't need to be in Perfherder.
>
> Anyway, just thought I'd open the floor to brainstorming. I'd prefer to
> add stuff incrementally, to make sure Perfherder can handle the load,
> but I'd love to hear all your ideas.
>
> Will
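For reference, "submitting" a metric really is just printing a magic line from the job; something like the following (the JSON fields here are a hedged sketch based on the convention Will describes, not a verified copy of the schema):

    # Emit a datum for Perfherder to ingest from this job's log:
    echo 'PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "installer size", "value": 52346854}]}'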
Re: Partial updates temporarily disabled for Nightly and Dev-Edition
Partial updates should be functional again now. Sorry for the inconvenience!

On Thu, Oct 29, 2015 at 4:49 PM, Chris AtLee <cat...@mozilla.com> wrote:
> We've temporarily disabled generation of partial updates for Nightly and
> Dev-Edition (Aurora) versions of Firefox.
>
> Given that Dev-Edition updates are currently frozen as part of our
> uplift process, the main impact of this is on Nightly users.
>
> We hope to have partial update generation re-enabled in the next few
> days.
>
> Sorry for the inconvenience.
>
> Chris
Partial updates temporarily disabled for Nightly and Dev-Edition
We've temporarily disabled generation of partial updates for Nightly and Dev-Edition (Aurora) versions of Firefox.

Given that Dev-Edition updates are currently frozen as part of our uplift process, the main impact of this is on Nightly users.

We hope to have partial update generation re-enabled in the next few days.

Sorry for the inconvenience.

Chris
Re: Per-test chaos mode now available, use it to help win the war on orange!
Very interesting, thank you! Would there be a way to add an environment variable or harness flag to run all tests in chaos mode?

On Thu, Jun 4, 2015 at 5:31 PM, Chris Peterson <cpeter...@mozilla.com> wrote:
> On 6/4/15 11:32 AM, kgu...@mozilla.com wrote:
> > I just landed bug 1164218 on inbound, which adds the ability to run
> > individual mochitests and reftests in chaos mode. (For those unfamiliar
> > with chaos mode, it's a feature added by roc a while back that makes
> > already-random things more random; see [1] or bug 955888 for details.)
> >
> > The idea with making it available per-test is that new tests should be
> > written and tested locally/on try with chaos mode enabled, to flush out
> > possible intermittent failures faster. Ideally we should also land them
> > with chaos mode enabled. At this time we're still not certain if this
> > will provide a lot of value (i.e. if chaos-mode-triggered failures are
> > representative of real bugs), so it's not mandatory to make your tests
> > run in chaos mode, but please do let me know if you try enabling it on
> > your test and are either successful or not. We need to collect more
> > data on the usefulness of this to see where we should take it.
>
> Will chaos mode enabled tests run on Try and release branches? We don't
> know if chaos mode test failures are representative of real bugs, but
> could chaos mode hide bugs that only reveal themselves when users run
> without chaos mode?
>
> > See [2] for an example of how to enable chaos mode in your tests.
> > Basically you can add chaos-mode to the reftest.list file for reftests,
> > or call SimpleTest.testInChaosMode() for mochitests.
> >
> > If you do run into intermittent failures, the best way to debug them is
> > usually to grab a recording of the failure using rr [3] and then debug
> > the recording to see what was going on. This only works on Linux (and
> > has some hardware requirements as well) but it's a really great tool to
> > have.
> >
> > Cheers,
> > kats
> >
> > [1] http://robert.ocallahan.org/2014/03/introducing-chaos-mode.html
> > [2] https://hg.mozilla.org/integration/mozilla-inbound/rev/89ac61464a45
> > [3] http://rr-project.org/ or https://github.com/mozilla/rr/
Re: It is now possible to apply arbitrary tags to tests/manifests and run all tests with a given tag
Sounds great! I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1161282 for this.

According to https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/highscores/highscores.html, we still have a ton of people using '-p all -u all' on try.

On Mon, May 4, 2015 at 5:12 PM, Gregory Szorc <g...@mozilla.com> wrote:
> Wait - you're telling me that it is now possible to limit try pushes to
> not just jobs but tests within jobs?! Stop the presses: this is huge! If
> used by the masses, this could drastically reduce try turnaround times
> and decrease automation load and costs.
>
> Could we encourage use of --tag by having the automation scheduler
> up-weight jobs that opt in to reduced load?
>
> On Thu, Apr 30, 2015 at 4:21 PM, Christopher Manchester
> <chmanches...@gmail.com> wrote:
> > You can now add --tag arguments to try syntax and they will get passed
> > to test harnesses in your try push. Details of the implementation are
> > in bug 978846, but if you're interested in passing other arguments
> > from try syntax to a test harness, this can be done by adding those
> > arguments to testing/config/mozharness/try_arguments.py.
> >
> > Note this is still rather coarse, in the sense that arguments are
> > forwarded without regard for whether a harness supports a particular
> > argument, but I can imagine it being useful in a number of cases (for
> > instance, when testing the feature with xpcshell and --tag devtools, I
> > was able to get feedback in about ten minutes on whether things were
> > working, rather than waiting for every xpcshell test to run).
> >
> > Chris
> >
> > On Thu, Apr 2, 2015 at 2:22 PM, Andrew Halberstadt
> > <ahalberst...@mozilla.com> wrote:
> > > Minor update. It was pointed out that other list-like manifestparser
> > > attributes (like head and support-files) are whitespace delimited
> > > instead of comma delimited. To be consistent, I switched tags to
> > > whitespace delimitation as well. E.g., both these forms are ok:
> > >
> > > [test_foo.html]
> > > tags = foo bar baz
> > >
> > > [test_bar.html]
> > > tags =
> > >   foo
> > >   bar
> > >   baz
> > >
> > > -Andrew
> > >
> > > On 31/03/15 12:30 PM, Andrew Halberstadt wrote:
> > > > As of bug 987360, you can now run all tests with a given tag for
> > > > mochitest (and variants), xpcshell and marionette based harnesses.
> > > > Tags can be applied to either individual tests, or the DEFAULT
> > > > section in manifests. Tests can have multiple tags, in which case
> > > > they should be comma delimited. To run all tests with a given tag,
> > > > pass in --tag <tag name> to the mach command.
> > > >
> > > > For example, let's say we want to group all mochitest-plain tests
> > > > related to canvas together. First we'd add a 'canvas' tag to the
> > > > DEFAULT section in
> > > > https://dxr.mozilla.org/mozilla-central/source/dom/canvas/test/mochitest.ini
> > > >
> > > > [DEFAULT]
> > > > tags = canvas
> > > >
> > > > We notice there is also a canvas related test under dom/media,
> > > > namely:
> > > > https://dxr.mozilla.org/mozilla-central/source/dom/media/test/mochitest.ini#541
> > > >
> > > > Let's pretend it is already tagged with the 'media' tag, but that's
> > > > ok, we can add a second tag no problem:
> > > >
> > > > [test_video_to_canvas.html]
> > > > tags = media,canvas
> > > >
> > > > Repeat the above for any other tests or manifests scattered in the
> > > > tree that are related to canvas. Now we can run all those
> > > > mochitest-plain tests with:
> > > >
> > > > ./mach mochitest-plain --tag canvas
> > > >
> > > > You can also run the union of two tags by specifying --tag more
> > > > than once (though the intersection of two tags is not supported):
> > > >
> > > > ./mach mochitest-plain --tag canvas --tag media
> > > >
> > > > So far the xpcshell (./mach xpcshell-test --tag <name>) and
> > > > marionette (./mach marionette-test --tag <name>) commands are also
> > > > supported. Reftest is not supported, as it has its own special
> > > > manifest format.
> > > >
> > > > Applying tags to tests will not affect automation or other people's
> > > > tags. So each organization or team should feel free to use tags in
> > > > whatever creative ways they see fit. Eventually, we'll start using
> > > > tags as a foundation for some more advanced features and analysis.
> > > > For example, we may implement a way to run all tests with a given
> > > > tag across multiple different suites.
> > > >
> > > > If you have any questions or things aren't working, please let me
> > > > know!
> > > >
> > > > Cheers,
> > > > Andrew
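Combining the two features above, a try push that runs only the xpcshell tests tagged 'devtools' might look like this (a hedged sketch: the --tag forwarding is per bug 978846 as Chris describes above, and the commands assume the plain hg push-to-try workflow):

    # Carry the try syntax, including the forwarded --tag, on an empty commit:
    hg commit --config ui.allowemptycommit=true \
        -m 'try: -b o -p linux64 -u xpcshell --tag devtools'
    hg push -f ssh://hg.mozilla.org/try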
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 17:26, Tue, 23 Sep, Kyle Huey wrote:
> On Tue, Aug 26, 2014 at 8:23 AM, Chris AtLee <cat...@mozilla.com> wrote:
> > Just a short note to say that this experiment is now live on
> > mozilla-inbound.
>
> What was the outcome?

Thanks for the reminder. The outcome of this experiment was inconclusive.

On the one hand, we know we didn't make anything worse. The skipping behaved as expected and wasn't a burden on sheriffs. We didn't make wait times any worse.

On the other hand, it appears as though we improved wait times for the target platforms, but the signal there isn't clear due to other variables changing (e.g. overall load wasn't directly comparable between the two time windows).

We've left the skipping behaviour enabled for the moment, and are considering some tweaks to the amount of skipping that happens and to which branches/platforms it's enabled for.

Cheers,
Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
Just a short note to say that this experiment is now live on mozilla-inbound.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 17:37, Wed, 20 Aug, Jonas Sicking wrote:
> On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert <jgilb...@mozilla.com> wrote:
> > I have been asked in the past if we really need to run WebGL tests on
> > Android, if they have coverage on Desktop platforms. And then again
> > later, why B2G if we have Android. There seems to be enough belief in
> > test-once-run-everywhere that I feel the need to *firmly* establish
> > that this is not acceptable, at least for the code I work with.
>
> I'm happy I'm not alone in this. I'm a firm believer that we ultimately
> need to run basically all combinations of tests and platforms before
> allowing code to reach mozilla-central. There are lots of
> platform-specific code paths, and it's hard to track which tests trigger
> them and which don't.

I think we can agree on this. However, not running all tests on all platforms per push on mozilla-inbound (or another branch) doesn't mean that they won't be run on mozilla-central, or even on mozilla-inbound prior to merging.

I'm a firm believer that running all tests for all platforms for all pushes is a waste of our infrastructure and human resources. I think the gap we need to figure out how to fill is between getting per-push efficiency and full test coverage prior to merging.

> It would however be really cool if we were able to pull data on which
> tests tend to fail in a way that affects all platforms, and which ones
> tend to fail on one platform only. If we combine this with the ability
> of having tbpl (or treeherder) fill in the blanks whenever a test fails,
> it seems like we could run many of our tests on only one platform for
> most checkins to mozilla-inbound.

There are dozens of really interesting approaches we could take here. Skipping every nth debug test run is one of the simplest, and I hope we can learn a lot from the experiment.

Cheers,
Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote:
> On 2014-08-19, 5:49 PM, Jonathan Griffin wrote:
>> On 8/19/2014 2:41 PM, Ehsan Akhgari wrote:
>>> On 2014-08-19, 3:57 PM, Jeff Gilbert wrote:
>>>> I would actually say that debug tests are more important for
>>>> continuous integration than opt tests. At least in code I deal with,
>>>> we have a ton of asserts to guarantee behavior, and we really want
>>>> test coverage of these via CI. If a test passes on debug, it should
>>>> almost certainly pass on opt, just faster. The opposite is not true.
>>>> "They take a long time and then break" is part of what I believe
>>>> caused us to not bother with debug testing on much of Android and
>>>> B2G, which we still haven't completely fixed. It should be
>>>> unacceptable to ship without CI on debug tests, but here we are
>>>> anyways. (This is finally nearly fixed, though there is still some
>>>> work to do.) I'm not saying running debug tests less often is on the
>>>> same scale of bad, but I would like to express my concerns about
>>>> heading in that direction.
>>>
>>> I second this. I'm curious to know why you picked debug tests for
>>> this experiment. Would it not make more sense to run opt tests on
>>> desktop platforms on every other run?
>>
>> Just based on the fact that they take longer, and thus running them
>> less frequently would have a larger impact. If there's a broad
>> consensus that debug runs are more valuable, we could switch to
>> running opt tests less frequently instead.
>
> Yep, the debug tests indeed take more time, mostly because they run
> more checks. :-) The checks in opt builds are not exactly a subset of
> the ones in debug builds, but they are close. Based on that, I think
> running opt tests on every other push is the more conservative option,
> and I support it more. That being said, for this one-week limited
> trial, given that the sheriffs will help backfill the skipped tests, I
> don't care very strongly about this, as long as it doesn't set the
> precedent that we can ignore debug tests!

I'd like to highlight that we're still planning on running debug linux64 tests for every build. This is based on the assumption that debug-specific failures are generally cross-platform failures as well. Does this help alleviate some concern? Or is that assumption just plain wrong?

Cheers,
Chris
Re: Always brace your ifs
On 17:37, Sat, 22 Feb, L. David Baron wrote:
> On Saturday 2014-02-22 15:57 -0800, Gregory Szorc wrote:
>> On Feb 22, 2014, at 8:18, Kyle Huey m...@kylehuey.com wrote:
>>> If you needed another reason to follow the style guide:
>>> https://www.imperialviolet.org/2014/02/22/applebug.html
>>
>> Code coverage would have caught this as well. The time investment for
>> 100% line and branch coverage is debatable, but you can't deny that
>> code coverage has its place, especially for high-importance code such
>> as crypto. AFAIK, our automation currently does not collect code
>> coverage from any test suite. Should that change?
>
> There was some automation running code coverage reports (with gcov, I
> think) for at least reftests + mochitests for an extended period of
> time; I found it useful for improving style system test coverage while
> it was running. I'm not sure how strong our commitment is to keeping
> such tests running, though; I frequently have to defend the tests
> against people who want to disable them because they take a long time
> (i.e., a long time for a single test file, which sometimes leads the
> tests to approach the per-file timeouts on slow VMs) or because they
> happen to exhibit the latest JIT crash frequently because they run a
> lot of code. I'm worried we're moving to a model where tests need to
> have active defenders to keep them running (even though that isn't how
> features on the Web platform work), because we blame the old test
> rather than the new regression.

Tests need owners, just like any other piece of code. One of the big problems with the previous code coverage reports was that they were failing to run, and nobody was stepping up to fix them. When we're resource constrained, it's a big waste of resources to run things that are broken and that nobody is working on fixing.

Slow tests are a slightly different issue. If you're adding a test that takes 60s to run, you're saying you think it's important enough to make all other developers wait another minute for their test runs to complete locally. You're saying it's worthwhile to spend an extra minute per push per platform, delaying all future landings and merges by an extra minute per push. At ~249 pushes per day and at least 18 test platforms, you're adding at least 74 hours of additional machine time per day (the arithmetic is spelled out below).

I know the cost of *not* testing code is even higher than this! It's even more expensive and painful to track down regressions after the fact. But this doesn't mean we can't put more effort into writing efficient tests.

Cheers,
Chris
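Spelling out that back-of-the-envelope calculation, using the push and platform counts quoted above:

    # Cost of one extra 60-second test, per day, across all pushes.
    pushes_per_day = 249   # approximate pushes per day across trees
    test_platforms = 18    # lower bound on test platforms per push
    extra_seconds = 60     # added runtime per platform per push

    extra_hours = pushes_per_day * test_platforms * extra_seconds / 3600.0
    print("%.1f extra machine hours per day" % extra_hours)
    # -> 74.7 extra machine hours per day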
Non-unified builds now running periodically on all trees
Starting today [1], you'll see a new symbol on TBPL: Bn. These are builds running with unified sources disabled. We're now running these periodically on 64-bit Linux (opt and debug) on all trees, on the same cadence as the PGO builds.

The purpose of these builds is to catch build problems that are masked by unifying the source files. By doing regular builds with unified sources disabled, we'll have a smaller regression window to help pinpoint the changes which broke the non-unified configuration. Once we shake out all the issues with the linux64 non-unified builds, we'll look at enabling other platforms.

Testing non-unified builds on Try is simple: just ensure your mozconfig has 'ac_add_options --disable-unified-compilation' in it (see the example below).

For more details, please see bug 942167 [2].

Cheers,
Chris

[1] Once the new tbpl code is deployed - https://bugzilla.mozilla.org/show_bug.cgi?id=960173
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=942167
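For example, a try mozconfig for a non-unified debug build might contain the following. Only the --disable-unified-compilation line is required for this; the --enable-debug line is just an illustrative companion option:

    # Build a debug configuration with unified sources disabled.
    ac_add_options --enable-debug
    ac_add_options --disable-unified-compilation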
Re: Thinking about the merge with unified build
On 18:23, Mon, 02 Dec, Ehsan Akhgari wrote:
>> As for identifying broken non-unified builds, can we configure one of
>> our mozilla-inbound platforms to be non-unified (like 32-bit Linux
>> Debug)?
>
> I think the answer to that question depends on how soon bug 942167 can
> be fixed. Chris, any ideas?

We're trying to figure out the best way to implement it. It'll be a week or so at least.
Re: Pushes to Backouts on Mozilla Inbound
On 15:10, Tue, 05 Nov, James Graham wrote:
> On 05/11/13 14:57, Kyle Huey wrote:
>> On Tue, Nov 5, 2013 at 10:44 PM, David Burns dbu...@mozilla.com wrote:
>>> We appear to be doing 1 backout for every 15 pushes on a rough
>>> average[4]. This number I am sure you can all agree is far too high,
>>> especially if we think about the figures that John O'Duinn
>>> suggests[5] for the cost of each push for running and testing. With
>>> the offending patch + backout we are using 508 computing hours for
>>> essentially doing no changes to the tree, and then we do another 254
>>> computing hours for the fixed reland. Note that the 508 hours doesn't
>>> include retriggers done by the sheriffs to see if it is intermittent
>>> or not. This is a lot of wasted effort when we should be striving to
>>> get patches to stick the first time. Let's see if we can try to make
>>> this figure 1 in 30 patches getting backed out.
>>
>> What is your proposal for doing that? What are the costs involved? It
>> isn't very useful to say "X is bad, let's not do X" without looking at
>> what it costs to not do X. To give one hypothetical example, if it
>> requires just two additional full try pushes to avoid one backout, we
>> haven't actually saved any computing time.
>
> So, as far as I can tell, the heart of the problem is that the
> end-to-end time for the build+test infrastructure is unworkably slow. I
> understand that waiting half a dozen hours (a significant fraction of a
> work day) for a try run is considered normal. This has a huge knock-on
> effect: e.g. it requires people to context-switch away from one problem
> whilst they wait, and context-switch back into it once they have the
> results. Presumably it also encourages landing changes without proper
> testing, which increases the backout rate. It seems that this will cost
> a great deal, not just in terms of compute hours (which are easy to
> measure) but also in terms of developer productivity (which is harder
> to measure, but could be even more significant).
>
> What data do we currently have about why the wait time is so long? If
> this data doesn't exist, can we start to collect it? Are there easy
> wins to be had, or do we need to think about restructuring the way that
> we do builds and/or testing to achieve greater throughput?

We're publishing data in several places about total run time for jobs.

For overall build metrics, you can try http://brasstacks.mozilla.com/gofaster/

For specific revisions you can query self-serve, e.g.
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/5ff9d60c6803
or, in JSON,
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/5ff9d60c6803?format=json

For historical data, you can look at all our archived build data here:
http://builddata.pub.build.mozilla.org/buildjson/

Average times for builds/tests on m-c are published here:
https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/branch_times/output.txt

End-to-end times for try are here:
https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/end2end_try/end2end.html

I hope this helps!

Cheers,
Chris
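If you'd rather script against self-serve than click around, fetching the JSON for a revision is straightforward. This sketch just retrieves and pretty-prints the payload; it assumes the endpoint returns a JSON document but makes no assumptions about the fields inside:

    # Fetch self-serve data for a try revision and peek at the structure.
    import json
    from urllib.request import urlopen

    url = ("https://secure.pub.build.mozilla.org/buildapi/self-serve/"
           "try/rev/5ff9d60c6803?format=json")
    with urlopen(url) as resp:
        data = json.load(resp)
    print(json.dumps(data, indent=2)[:500])  # first 500 chars of the payload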
Shutting off leak tests?
Hi!

Leak tests on OSX have been failing intermittently for nearly a year now[1]. As yet, we don't have any ideas why they're failing, and nobody is working on fixing them.

Would anybody be very sad if we shut them off? Are these tests providing useful information any more? If they are still important to run, can we get some help fixing them?

Cheers,
Chris

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=774844
Re: Proposal for an inbound2 branch
On 02:54, Tue, 30 Apr, Justin Lebar wrote:
>> Is there sanity to this proposal or am I still crazy?
>
> If we had a lot more project branches, wouldn't that increase the load
> on infra dramatically, because we'd have less coalescing?

Yes, it would decrease coalescing. I wonder how many tree closures and backouts we'd have, though? It seems like a tree used by a smaller, more focused group of people could cope better with leaving some orange on the tree for short periods of time. Instead of backing out suspect revisions and closing the tree to wait for the results of the backout to come back, could the tree remain open to landings while the test failures are being investigated? I think this is easier to coordinate with a smaller group of people, and with a slower check-in cadence.

Cheers,
Chris
Re: Some data on mozilla-inbound
On 14:29, Fri, 26 Apr, Gregory Szorc wrote:
> On 4/26/2013 2:06 PM, Kartikaya Gupta wrote:
>> On 13-04-26 11:37, Phil Ringnalda wrote:
>>> Unfortunately, engineering is totally indifferent to things like
>>> having doubled the cycle time for Win debug browser-chrome since last
>>> November.
>>
>> Is there a bug filed for this? I just cranked some of the build.json
>> files through some scripts and got the average time (in seconds) for
>> all the jobs run on the
>> mozilla-central_xp-debug_test-mochitest-browser-chrome builders, and
>> there is in fact a significant increase since November. This makes me
>> think that we need a resource usage regression alarm of some sort too.
>>
>> builds-2012-11-01.js: 4063
>> builds-2012-11-15.js: 4785
>> builds-2012-12-01.js: 5311
>> builds-2012-12-15.js: 5563
>> builds-2013-01-01.js: 6326
>> builds-2013-01-15.js: 5706
>> builds-2013-02-01.js: 5823
>> builds-2013-02-15.js: 6103
>> builds-2013-03-01.js: 5642
>> builds-2013-03-15.js: 5187
>> builds-2013-04-01.js: 5643
>> builds-2013-04-15.js: 6207
>
> Well, wall time will [likely] increase as we write new tests. I'm
> guessing (OK, really hoping) the number of mochitest files has
> increased in rough proportion to the wall time?
>
> Also, aren't we executing some tests on virtual machines now? On any
> virtual machine (and especially on EC2), you don't know what else is
> happening on the physical machine, so CPU and I/O steal are expected to
> cause variations and slowness in execution time.

Those tests are still on exactly the same hardware.

philor points out in https://bugzilla.mozilla.org/show_bug.cgi?id=864085#c0 that the time increase is disproportionate for win7. It would be interesting to look at all the other suites too. Perhaps a regular report of how much our wall-clock times for builds and the different test suites have changed week-over-week would be useful?

That aside, how do we cope with an ever-increasing runtime requirement of tests? Keep adding more chunks?
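For anyone who wants to reproduce numbers like the ones above, here is roughly what such a script looks like. It is a sketch: the buildjson layout (a top-level "builds" list with "starttime"/"endtime" timestamps and a "buildername" property) is an assumption about the archive format, so double-check the field names against an actual file.

    # Average job duration (seconds) for one builder in a buildjson file.
    # The "builds"/"starttime"/"endtime"/"buildername" field names are
    # assumptions about the archive format, not a documented schema.
    import json

    BUILDER = "mozilla-central_xp-debug_test-mochitest-browser-chrome"

    def average_duration(path, builder=BUILDER):
        with open(path) as f:
            data = json.load(f)
        durations = [
            b["endtime"] - b["starttime"]
            for b in data["builds"]
            if b.get("properties", {}).get("buildername") == builder
            and b.get("starttime") and b.get("endtime")
        ]
        return sum(durations) / len(durations) if durations else None

    print(average_duration("builds-2012-11-01.js"))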
Re: Some data on mozilla-inbound
On 16:34, Tue, 23 Apr, Gervase Markham wrote:
> On 23/04/13 10:17, Ed Morley wrote:
>> Given that local machine time scales linearly with the rate at which
>> we hire devs (unlike our automation capacity), I think we need to work
>> out why (some) people aren't doing things like compiling locally and
>> running their team's directory of tests before pushing. I would hazard
>> a guess that if we improved incremental build times and created mach
>> commands to simplify the edit-compile-test loop, then we could cut out
>> many of these obvious inbound bustage cases.
>
> That would be the carrot. The stick would be finding some way of
> telling whether a changeset was pushed to try before it was pushed to
> m-i. If a developer failed to push to try and then broke m-i, we could
> (in a pre-commit hook) refuse to let them commit to m-i in future
> unless they'd already pushed to try. For a week on first offence, a
> month on subsequent offences :-)
>
> This, of course, is predicated on being able to detect in real time
> whether a changeset being pushed to m-i has previously been pushed to
> try.

We've considered enforcing this using some cryptographic token. After you push to try and get good results, the system gives you a token you need to include in your commit to m-i (a rough sketch of how that could work is below). Alternatively, you could indicate the try revision you pushed, and we could look up the results and refuse the commit based on your build/test results on try, or if your commit to m-i is too different from the push to try.

Cheers,
Chris
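To give a flavour of the token idea, here is a minimal sketch using an HMAC over the changeset ID. Everything here is hypothetical (the secret handling, the token format, the example changeset); it shows the shape of the scheme, not a concrete design.

    # Hypothetical try token: the server mints a token for a changeset
    # once its try results look good; an m-i pre-commit hook verifies it.
    import hashlib
    import hmac

    SECRET = b"server-side-secret"  # would live only on the server

    def mint_token(changeset_sha):
        """Issued by the server after a green-enough try run."""
        return hmac.new(SECRET, changeset_sha.encode(), hashlib.sha256).hexdigest()

    def verify_token(changeset_sha, token):
        """Checked by the m-i pre-commit hook before accepting the push."""
        return hmac.compare_digest(mint_token(changeset_sha), token)

    token = mint_token("ae6f597c4a09")  # example revision from this list
    assert verify_token("ae6f597c4a09", token)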
Re: New backout policy for Ts regressions on mozilla-inbound
On 18/10/12 06:44 PM, Justin Lebar wrote:
> Do we still have the bug where a test that finishes first, but is from
> a later cset (say a later cset IMPROVES Ts by 4% or more), would make
> us think we regressed it on an earlier cset if that earlier talos run
> finishes later? Such that we set graph points by the time the test
> finished, not the time of the push, etc.
> https://bugzilla.mozilla.org/show_bug.cgi?id=688534

That applies only to the rendering of the graphs on graphs.m.o. The regression detection uses the push time to order the results (see the sketch below).
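In other words, detection sorts datapoints by when the changeset was pushed, not by when the test job happened to finish. A toy illustration, with invented tuples and numbers:

    # Order Talos results by push time, not by job finish time.
    # Each result is (push_time, finish_time, value); all values invented.
    results = [
        (200, 300, 9.8),   # later push, fast test slave: finishes first
        (100, 500, 10.2),  # earlier push, slow test slave: finishes last
    ]
    by_push = sorted(results, key=lambda r: r[0])
    print([value for _, _, value in by_push])  # [10.2, 9.8]: push order

Sorting by finish time instead would put the improved 9.8 before the older 10.2 and make the earlier cset look like a regression, which is exactly the graph-rendering confusion described in the bug.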
Re: try: -p all considered harmful?
On 30/09/12 03:43 AM, Justin Lebar wrote:
>> We're all trying to build the best system we can here. We've been
>> publishing as much raw data as we can, as well as reports like wait
>> time data, for ages. We're not trying to hide this stuff away.
>
> I understand. My point is just that the data we currently have isn't
> what we actually want to measure. Wait times for individual parts of a
> try push don't tell the whole story. If Linux-64 wait times go down,
> what fraction of people get their full try results faster? (That is,
> how often is Linux-64 on the critical path for a try push?) I honestly
> don't know.
>
> [snip]
>
> I hope we all agree that by this metric, we're currently failing. The
> current infrastructure does not meet demand. (Indeed, demand is
> actually higher than the jobs we're currently running, because we would
> very much like to disable coalescing on m-i, but we can't do that for
> lack of capacity.) All I'm saying is that we currently don't have the
> right public data to determine, after X amount of time has passed,
> whether we've made any progress in this respect.

I just want to highlight that the data _is_ available publicly through several different mechanisms.

Raw build data going back to October 2009 is available here:
http://builddata.pub.build.mozilla.org/buildjson/

In addition, per-push information is available via self-serve, e.g.
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/ae6f597c4a09
https://secure.pub.build.mozilla.org/buildapi/self-serve/try/rev/ae6f597c4a09?format=json

The Try High Scores data is generated by pulling the hg pushlog, and then looking up each push in self-serve. I'm not relying on private data. You shouldn't feel blocked by RelEng to get the data that you want.

I'd most likely use these same APIs to look at end-to-end time (a sketch of that calculation is below).

Cheers,
Chris
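Here is the kind of end-to-end calculation I mean, as a rough sketch. It assumes the self-serve JSON is a list of job records with "requesttime" and "endtime" fields; that is a guess at the payload shape rather than a documented schema, so adjust to whatever the endpoint actually returns.

    # Sketch: end-to-end time for a try push from self-serve JSON.
    # The "requesttime"/"endtime" job fields are assumed, not documented.
    import json
    from urllib.request import urlopen

    url = ("https://secure.pub.build.mozilla.org/buildapi/self-serve/"
           "try/rev/ae6f597c4a09?format=json")
    with urlopen(url) as resp:
        jobs = json.load(resp)

    start = min(job["requesttime"] for job in jobs)
    end = max(job["endtime"] for job in jobs if job.get("endtime"))
    print("end-to-end: %.1f minutes" % ((end - start) / 60.0))

Run over many pushes, a script like this would give exactly the "did the full try round-trip get faster?" metric the thread is asking for, rather than per-platform wait times.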