Dear Qubes Community,

A new article has just been published on the Qubes website:

"Reproducible builds for Debian: a big step forward" by Frédéric Pierret
https://www.qubes-os.org/news/2021/10/08/reproducible-builds-for-debian-a-big-step-forward/

For your convenience, the original Markdown text is reproduced below.

========================================================================

---
layout: post
title: "Reproducible builds for Debian: a big step forward"
categories: articles
author: Frédéric Pierret
---

_This is the second article in the "reproducible builds" series.
Previously: [Improvements in testing and building: GitLab CI and reproducible builds](https://www.qubes-os.org/news/2021/02/28/improvements-in-testing-and-building/)._

In the previous article, [Improvements in testing and building: GitLab CI and reproducible builds](https://www.qubes-os.org/news/2021/02/28/improvements-in-testing-and-building/#reproducible-builds), we discussed reproducible builds and our current short-term goals for them in Qubes OS. Notably, we aimed to start by building our Debian templates such that packages can be installed only when configured rebuilders confirm that they really came from the source code we publish. Today, we go beyond this expectation.

Reproducible builds: retrieve the past
--------------------------------------

The challenge in reproducible builds lies in rebuilding a package in the same environment in which it was officially published. This means that we need to retrieve every single package version that was used as a dependency in order to rebuild a given package. For Debian, some packages in the current release were built several releases ago, and not necessarily with the exact same dependencies. In order to retrieve them, there is only one solution: a Debian service called `snapshot.debian.org`, which is an archive acting as a [Wayback Machine](https://web.archive.org/) that allows access to old packages based on dates and version numbers. It contains all past and present packages that the Debian archive provides.

Unfortunately, this service is known to suffer from serious usability issues. For example, watch the DebConf 2021 talk [Making use of snapshot.debian.org for fun and profit](https://debconf21.debconf.org/talks/22-making-use-of-snapshotdebianorg-for-fun-and-profit/) and have a look at some related Debian issues like [#977653](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=977653), [#960304](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960304), [#969906](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=969906), [#969603](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=969603), and [#782857](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782857). To summarize: there are throttling limits and availability issues, such as connections being cut off repeatedly and partial content being returned.

As announced in our previous article, we developed our own rebuilder tool, [debrebuild](https://github.com/fepitre/debrebuild), which can rebuild a single Debian package, together with a rebuilder orchestrator, [PackageRebuilder](https://github.com/fepitre/package-rebuilder). We started to put it in production in order to actively rebuild Qubes OS and Debian packages, but it quickly ceased to function, as the `snapshot.debian.org` service was unable to sustain the load of rebuilding even a single Debian package. The question then was: how should we proceed in order to make it work? Clearly, those issues are critical and make the `snapshot.debian.org` service unusable in practice for reproducible builds.

Is rebuilding Debian really possible?
-------------------------------------

The `snapshot.debian.org` issues have still not been addressed even after several years. The service has existed for more than a decade, yet it still suffers from the aforementioned limitations. It's either a design problem or a lack of resources, but we still had to do something.

That's why we decided to create our own [snapshot](https://github.com/fepitre/debian-snapshot) service. Easy to say, but not to do. First, the original snapshot service from Debian holds roughly 90 TB of repository data. Second, we cannot download files easily because only HTTP(S) is available, and downloading multiple files means we are impeded by the availability issues described above.

In order to work around the huge volume of data, we decided to get repositories from 2017 to today (which corresponds approximately to the start of the Debian "Buster" development cycle) and only the relevant architectures: `amd64`, `source`, and `all`. (In Debian, `all` denotes architecture-independent packages.) For the download part itself, we needed to parse the metadata of each Debian repository in order to get the list of files to download for every timestamp for which a snapshot had been made. Then, we developed `resume` and `retry` download functions, which unfortunately amount to brute-force downloading.

For storing the data, we took a simple approach: files are stored under their SHA-256 names, and symlinks are created to reconstruct the repository layout. In order to get file information (package and repository metadata), we rely on simply reading a symlink. It took 3-4 months to get 4.2 TB of data, which covers 2017 to the present. Most of the information about the downloaded files and their source repository is stored in a database.

In parallel, as with the original `snapshot.debian.org`, we added an API, [snapshot-api](https://github.com/fepitre/debian-snapshot#API), to expose information about repositories. Unlike the original, ours exposes much more of the information that rebuilder software such as `debrebuild` needs when requesting package information, such as the exact location of a given package in terms of Debian archive, timestamp, suite, architecture, and component. The service is now publicly exposed at <https://snapshot.notset.fr> and the API endpoints at <https://snapshot.notset.fr/mr>. The service is home-hosted by the author.
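The storage scheme described above (files named by their SHA-256, plus symlinks that rebuild the repository layout) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code of the snapshot service: the function names `store_file` and `sha256_of`, the directory names, and the timestamp format in the usage note are all hypothetical.

```python
import hashlib
import os
from pathlib import Path


def store_file(pool: Path, repo_root: Path, repo_path: str, data: bytes) -> Path:
    """Store `data` once under its SHA-256 name, then symlink it into the repo layout.

    `pool` and `repo_root` are hypothetical directory names chosen for this sketch.
    """
    pool = pool.resolve()  # absolute target so the symlink works from any directory
    pool.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()
    blob = pool / digest
    if not blob.exists():
        # Identical content seen in several snapshots is written only once.
        blob.write_bytes(data)
    link = repo_root / repo_path
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()
    link.symlink_to(blob)
    return link


def sha256_of(link: Path) -> str:
    """Recover a file's SHA-256 by reading the symlink target, without rehashing."""
    return Path(os.readlink(link)).name
```

For instance, calling `store_file` twice with the same bytes for two different snapshot timestamps leaves a single file in the pool and two symlinks pointing at it, which is what keeps the disk footprint well below that of a naive copy per timestamp.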

This is exactly where the dream of **rebuilding Debian packages** in the same environment in which they were officially published became a **reality**. Thanks to our standalone orchestrator and rebuilder software `debrebuild`, results of the rebuilding process, links to reproducibility attestations called [in-toto metadata](https://in-toto.io/), and even the reasons why a package is not reproducible can all be found at <https://rebuild.notset.fr>. As of this writing, we have successfully rebuilt more than 80% of the latest Debian packages for the `unstable` release while doing tests. Since the rebuilds started, several adjustments have been made, and we have finally reached a stable rebuilding process. That is why, after a few late improvements made during this first, almost complete rebuild, we flushed it all and started again for the latest Debian stable release, Bullseye. We will rebuild `unstable` again after the full rebuild of Bullseye is complete. As time passes, we will have fewer and fewer pending tasks; a couple thousand package rebuilds currently remain.

Please note that rebuilding a package involves more than the build itself: it means querying the `snapshot.notset.fr` API multiple times to get package information and locations, setting up the same environment as the one used for the originally published package, and finally, actually building it. All of this is possible thanks to several servers, home-hosted by the author, that have been intensively building packages non-stop for more than a month.
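A rebuild counts as reproducible when the rebuilt artifacts are bit-for-bit identical to the published ones. The sketch below illustrates that final comparison, assuming the expected hashes are taken from the `Checksums-Sha256` field of a Debian `.buildinfo` file. It is not how `debrebuild` is implemented; the function names are hypothetical, and a real `.buildinfo` file is usually OpenPGP-signed, which this sketch does not handle.

```python
import hashlib


def parse_sha256_field(buildinfo_text: str) -> dict[str, str]:
    """Return {filename: sha256} from the Checksums-Sha256 field of a .buildinfo text."""
    checksums: dict[str, str] = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            if not line.startswith(" "):
                break  # a non-indented line starts the next field
            # Each entry has the form: " <sha256> <size> <filename>"
            digest, _size, filename = line.split()
            checksums[filename] = digest
    return checksums


def is_reproducible(buildinfo_text: str, rebuilt: dict[str, bytes]) -> bool:
    """True if every artifact listed in the .buildinfo was rebuilt with the same hash.

    `rebuilt` maps artifact filenames to the bytes produced by the rebuild
    (an assumption of this sketch; a real tool would read files from disk).
    """
    expected = parse_sha256_field(buildinfo_text)
    return bool(expected) and all(
        name in rebuilt and hashlib.sha256(rebuilt[name]).hexdigest() == digest
        for name, digest in expected.items()
    )
```

A single differing byte in any rebuilt artifact makes `is_reproducible` return `False`, which is the point at which a tool would start diagnosing why the package is not reproducible.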

What's next?
------------

For Qubes OS, we already track reproducibility status in our continuous integration (CI) tests (see the [previous article](https://www.qubes-os.org/news/2021/02/28/improvements-in-testing-and-building/) for details), and Qubes OS packages are also rebuilt independently, like Debian packages, in the same PackageRebuilder instance. We already have most of the reproducibility attestations for our specific Debian packages (see <https://rebuild.notset.fr/qubesos.html>), and we will soon have all the needed ones for Debian. As a result, we are happy to announce that we have already started the process of integrating the rebuild status check both at the build phase of our Debian templates and when later installing a package in the template itself. That's the reason we restarted the whole process of a full rebuild for Bullseye.
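The install-time check can be pictured as a simple policy: allow a package only if enough configured rebuilders attest to the same hash as the package being installed. The following is a minimal sketch of that idea, not the actual integration. The function name `may_install`, the shape of the `attestations` mapping, and the `threshold` parameter are all assumptions, and verifying the signatures on the in-toto metadata is left out entirely.

```python
def may_install(
    package_sha256: str,
    attestations: dict[str, str],
    trusted_rebuilders: set[str],
    threshold: int,
) -> bool:
    """Allow installation only if enough trusted rebuilders confirm the package hash.

    `attestations` maps a rebuilder name to the SHA-256 it obtained when
    rebuilding the package (hypothetical data shape for this sketch).
    """
    confirming = {
        rebuilder
        for rebuilder, digest in attestations.items()
        if rebuilder in trusted_rebuilders and digest == package_sha256
    }
    return len(confirming) >= threshold
```

Attestations from rebuilders that are not in the configured set are ignored, so publishing a matching hash is not enough on its own: the rebuilder must also be one the template is configured to trust.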

There is preliminary work on integrating Fedora into the orchestrator, but that deserves a separate effort. The rebuilder [rpmreproduce](https://github.com/fepitre/rpmreproduce) can be used to rebuild Fedora packages, but some discussions with RPM upstream are still needed (see <https://github.com/rpm-software-management/rpm/pull/1532>). Also, we plan to support input other than a `buildinfo` file for RPM, such as a build description from Koji (the build infrastructure used by Fedora and CentOS) or any other description that makes it clear how an RPM package was built. We also expect that adding other distributions, such as Arch Linux (which we are going to ship officially soon), will be fairly quick and easy.

Conclusion
----------

Improved documentation for the orchestrator is in progress to make it easier for others who want to rebuild Qubes OS or Debian in the same way that we are currently doing it. Having more independent rebuilders publishing reproducibility attestations would be especially good for the community.

In all of these efforts, we are really satisfied that the [Reproducible Builds Project](https://reproducible-builds.org/) has decided to use our work and results as an example of what it has been advocating for years, notably for Debian. The official website <https://beta.tests.reproducible-builds.org> currently mirrors our results website <https://rebuild.notset.fr>.

_The author warmly thanks Marta Marczykowska-Górecka and Marek Marczykowski-Górecki for their moral support and technical discussions throughout this rough and intensive journey while juggling other projects._
