Canonical's Ubuntu release engineering team, plus a couple of hangers-on like myself, held a sprint last week in London. It's been a long time since many of us have been in the same place, and it was tremendously useful. The essence of this kind of infrastructure work is normally that the less you notice it, the better a job we're doing; but we touched on quite a few interesting topics, so here are some notes on what we did.

== Attendees ==

 * Adam Conrad
 * Colin Watson
 * Steve Kowalik
 * Stéphane Graber
 * Tim Chavez
 * Ursula Junque
 * William Grant

== Image build pipeline ==

We reviewed the pipeline from developer upload to built images with an eye to finding and fixing inefficiencies, particularly in the Ubuntu Touch images (which have a final system-image phase). Our assessment at the start of the sprint was that the base overhead was a little under two hours, but with several sources of occasional extra latency.

A noticeable amount of this time should be recovered by pending hardware upgrades. Adam spent some sprint time working on the installation of new Calxeda systems; we don't know how much those will shave off the livefs build phase (currently 52m or so), but it wouldn't be a surprise if they removed 20m or so. The current Panda boards also occasionally corrupt data, which causes extra delays while people debug them. We've also requested a dedicated system for offloaded archive administration jobs such as proposed-migration, which should make several processes more predictable.

On the upload and publication side, we made upload processing run every minute rather than every five minutes; Adam worked on source package caching in apt-ftparchive, which I think he's now handed off to Marc Deslauriers; and Ursula moved translations processing in the archive publisher out to an asynchronous job, eliminating a source of publication latency which has been known to cause occasional multi-hour delays in the past.

In the system-image phase, Stéphane is working on converting the compression step to pxz, which will save about 15m.

I made a first stab at documenting the proposed-migration workflow (https://wiki.ubuntu.com/ProposedMigration).
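
As a rough back-of-the-envelope check, the estimates above can be added up as follows. This is only a sketch: the function names are made up for illustration, the 20m and 15m figures are the guesses quoted above rather than measurements, and the idea that an upload waits half a polling interval on average assumes that uploads arrive at a uniformly random point in the cycle.

```python
def mean_poll_wait(interval_minutes):
    """Average wait for the next poll, in minutes.

    Assumes uploads arrive uniformly at random within a polling cycle
    (an assumption, not something measured).
    """
    return interval_minutes / 2.0


def estimated_savings():
    """Return rough per-phase savings in minutes, using the quoted guesses."""
    return {
        # Upload processing now runs every minute rather than every five.
        "upload processing": mean_poll_wait(5) - mean_poll_wait(1),
        # Guess at what new hardware might remove from the ~52m livefs build.
        "livefs build": 20.0,
        # Converting the system-image compression step to pxz.
        "system-image compression": 15.0,
    }


if __name__ == "__main__":
    savings = estimated_savings()
    for phase, minutes in savings.items():
        print("%-26s ~%.0fm" % (phase, minutes))
    print("%-26s ~%.0fm" % ("total", sum(savings.values())))
```

Under those assumptions the polling change is worth only a couple of minutes on average; the bulk of the estimated saving comes from the livefs build and compression phases.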

We identified some other potential savings which we haven't yet had a chance to work on:

 * Push-trigger proposed-migration, with a 15-minute fail-safe
 * Review notifications of proposed-migration/autopkgtest failures
 * Selective base system caching in live builds (could save about 5m)

== Live filesystem builds in Launchpad ==

This is really part of the image build pipeline too, but it's been on our backlog for a long time and is an interesting project in its own right. The general plan is that, instead of having a separate set of machines dedicated to building live filesystems - typically only one per architecture, and if more then the scheduling is manual and cumbersome - we should have live filesystems be a new type of build job in Launchpad, thus simultaneously giving us much more flexibility for building live filesystems (especially around release time when we want to do lots of work in parallel) and giving us more package build resources during the majority of the time when no live filesystems are being built.

So far so good, but we took advantage of having almost everyone who knows anything about our build daemon infrastructure in one room to nail down a lot of the details and get moving on the implementation. We identified build cancellation on non-virtualised builders as a prerequisite (so that we don't end up in a situation where we can't do live filesystem builds because all the builders are occupied in parallel by long-running package builds). William, Adam, Steve, and I spent some time sorting out the detailed design for that, and I got nearly all the code written on both the slave and master sides; this should be ready to land in the next week or two.

Meanwhile, Adam wrote a good part of the slave side of live filesystem builds, and William wrote most of the master side. I don't have an ETA yet, but I hope that won't be too much longer either.
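
To illustrate why cancellation is a prerequisite, here is a toy model of a shared builder pool. It is not Launchpad code and does not reflect the design worked out at the sprint: the `Builder` class, the `dispatch_livefs` function, the job names, and the policy of cancelling whichever package build has run for the shortest time are all hypothetical, chosen only to show the situation being avoided.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Builder:
    name: str
    job: Optional[str] = None       # name of the running job, if any
    job_kind: Optional[str] = None  # "package" or "livefs" (hypothetical labels)
    elapsed: int = 0                # minutes the current job has been running


def dispatch_livefs(builders: List[Builder], livefs_job: str) -> Builder:
    """Place a live filesystem build, cancelling a package build if needed."""
    # Prefer an idle builder: nothing needs to be cancelled.
    for b in builders:
        if b.job is None:
            b.job, b.job_kind, b.elapsed = livefs_job, "livefs", 0
            return b
    # Otherwise cancel the package build that has done the least work so far
    # (an assumed policy, purely for illustration).
    candidates = [b for b in builders if b.job_kind == "package"]
    if not candidates:
        raise RuntimeError("all builders are busy with live filesystem builds")
    victim = min(candidates, key=lambda b: b.elapsed)
    print("cancelling %s on %s" % (victim.job, victim.name))
    victim.job, victim.job_kind, victim.elapsed = livefs_job, "livefs", 0
    return victim


if __name__ == "__main__":
    pool = [
        Builder("b1", "libreoffice", "package", 180),
        Builder("b2", "hello", "package", 3),
    ]
    chosen = dispatch_livefs(pool, "ubuntu-touch-armhf")
    print("livefs build placed on %s" % chosen.name)
```

Without the cancellation branch, the live filesystem build in this example would simply have to wait until one of the package builds finished.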

== Maintenance work ==

Adam handled the release engineering for 13.10 alpha 2, worked on preparing 12.04.3, and fixed a few miscellaneous launchpad-buildd bugs.

Ursula worked on generating an inter-image changelog in cdimage, similar to that currently available in ubuntu-touch-preview builds. This involved some work by Stéphane and me (still in progress) on the layout of changelogs.ubuntu.com so that the cdimage code can fetch changelogs reasonably efficiently. Ursula also started work on figuring out the bugs that cause us to occasionally lose binary publications when multiple override operations happen in a single publication window.

Steve fixed an OOPS in DistroSeries:+queue (https://bugs.launchpad.net/bugs/941926), worked on infrastructure for keeping an audit trail of various Launchpad operations, and worked on making package diff generation more responsive (https://bugs.launchpad.net/bugs/1170120).

I fixed https://bugs.launchpad.net/bugs/1205407, which has been plaguing our build farm with hung builds for a few months.

William discussed with IS the SAN upgrade plan to resolve ongoing librarian space issues. (Among other things, this is blocking improved handling of ddebs.)

Andy Whitcroft visited us one day and worked with Adam on some refactoring of kernel packaging.

== Other discussions ==

We talked with Tim about divergence in the PES build apparatus, and made some preliminary plans towards consolidation.

Adam and William went through our local sbuild changes and confirmed that the plan to upgrade away from a fork of a nine-year-old version of sbuild is still valid (among other things, this blocks some improvements to backports).

William and I thrashed out the remaining points of dispute on how to handle the development series alias with respect to PPAs (https://bugs.launchpad.net/bugs/1198279).

-- 
Colin Watson [[email protected]]

-- 
ubuntu-devel mailing list
