Re: mitigating non-determinism
"Bernhard M. Wiedemann via rb-general" wrote: > ASLR: > Influences from address-space-layout-randomization(ASLR) can be avoided > with setarch -R COMMAND or globally with echo 0 > > /proc/sys/kernel/randomize_va_space . This also helps with some cases of > uninitialized memory. Anytime we find programs using uninitialized memory, we should debug them, not change the build environment to make them seem OK. (I found a couple of bugs like this in the GNU assembler back in the 1990s, that produced different instruction sequences based on reading an uninitialized variable. This hadn't been noticed before testing reproducibility, because all the sequences were valid instructions. I think the bug was in picking long or short offsets, perhaps in jump instructions.) John
Re: Reproducible Builds in May 2024: Missing paper link
Chris Lamb wrote:
> Secondly, Ludovic Courtès, Timothy Sample, Simon Tournier and Stefano
> Zacchiroli have collaborated to publish a paper on "Source Code
> Archiving to the Rescue of Reproducible Deployment" [42]. Their paper
> was motivated because:
>
> > The ability to verify research results and to experiment with
> > methodologies are core tenets of science. As research results are
> > increasingly the outcome of computational processes, software plays
> > a central role. GNU Guix [43] is a software deployment tool that
> > supports reproducible software deployment, making it a foundation
> > for computational research workflows. To achieve reproducibility, we
> > must first ensure the source code of software packages Guix deploys
> > remains available.
>
> A PDF of this article [44] is also available.
>
> [42] https://hal.science/hal-04586520
> [43] https://guix.gnu.org/
> [44] https://hal.science/hal-04582287/document

Those links 42 and 44 do not lead to the cited paper. They lead to the
first paper discussed (which apparently appears twice on hal.science).

John
[no subject]
FreeBSD 14.1-BETA2 Now Available
https://lists.freebsd.org/archives/freebsd-stable/2024-May/002133.html

"A summary of changes since BETA1 includes: ... Kernels are now built
reproducibly."

Yay!

John
Re: Arch Linux minimal container userland 100% reproducible - now what?
James Addison wrote that local storage can contain errors. I agree.

> My guess is that we could get into near-unsolvable philosophical territory
> along this path, but I think it's worth being skeptical of the notions that
> local-storage is always trustworthy and that the network should always be
> avoided.

For me, the distinction is that the local storage is under the direct
control of the person trying to rebuild, while the network and the servers
elsewhere in the network are not. If local storage is unreliable, you can
fix or replace it, and continue with your work.

I am looking for reproducibility that is completely doable by the person
trying to do it, at any time after they obtain a limited number of key
items by any means: the bootable binary of the OS release, and what the GPL
calls the "Corresponding Source".

And, I am very happy to be seeing lots of incremental progress along the
way!

John

PS: I have a local archive of the source ISO images and the binary ISO
images of many Ubuntu, Fedora, Debian, BSD, etc. releases. It all fits
easily on a single hard disk drive, and that drive has many backups from
different times. The images all have checksums that were checked when I
obtained the images. The checksums are in the backups, so I can see whether
my copies were tampered with or merely suffered from storage degradation
over time. And I can easily copy the whole thing and send you a copy, if
you want one; or put it on the Internet (some of the releases are available
from me now via BitTorrent).

If those distros were reproducible, I could verify that each of those
binary releases was untampered. Or YOU could, without my help, after you
got a copy from me or from anyone. And if you suspected a binary Ken
Thompson attack, you could use those releases locally at your site, as the
source material for an arbitrarily intense diverse double-compilation
check. Without my help, and without the help of anyone else on the
Internet.
In short, making a local archive of reproducible binaries and their corresponding sources, readily enables all the verifications that we are trying to make common in the world.
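The archive-verification step described above (comparing each stored image
against the digest recorded when it was downloaded) can be sketched roughly
as follows. The manifest format is the one `sha256sum` emits; the filenames
are hypothetical:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB ISO images don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check every 'digest  filename' line, sha256sum-style."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        target = manifest.parent / name
        results[name] = target.exists() and sha256_of(target) == digest
    return results
```

Run against a `SHA256SUMS` file stored alongside the images, this
distinguishes "tampered or degraded" from "intact" without any network
access, which is the property the post is after.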
Re: Arch Linux minimal container userland 100% reproducible - now what?
kpcyrd wrote:
> 1) There's currently no way to tell if a package can be built offline
> (without trying yourself).

Packages that can't be built offline are not reproducible, by definition.
They depend on outside events and circumstances in order for a third party
to reproduce them successfully. So, fixing that in each package would be a
prerequisite to making a reproducible Arch distro (in my opinion).

I don't understand why a "source tree" would store a checksum of a source
tarball or source file, rather than storing the actual source tarball or
source file. You can't compile a checksum.

kpcyrd wrote:
> Specifically Gentoo and OpenBSD Ports have solutions for this that I
> really like, they store a generated list of URLs along with a
> cryptographic checksum in a separate file, which includes crates
> referenced in e.g. a project's Cargo.lock.

I don't know what a crate or a Cargo.lock is, but rather than fix the
problem at its source (include the source files), you propose to add
another complex circumvention alongside the existing package building
infrastructure? What is the advantage of that over merely doing the "cargo
fetch" early rather than late and putting all the resulting source files
into the Arch source package?

> 3) All of this doesn't take BUILDINFO files into account

The BUILDINFO files are part of the source distribution needed to reproduce
the binary distribution. So they would go on the source ISO image.

> I did some digging and downloaded the buildinfo files for each package
> that is present in the archlinux-2024.03.01 iso

Thank you for doing that digging!

> Using plenty of different gcc versions looks
> annoying, but is only an issue for bootstrapping, not for reproducible
> builds (as long as everything is fully documented).

I agree that it's annoying. It compounds the complexity of reproducing the
build. Does Arch get some benefit from doing so? Ideally, a binary release
ISO would be built with a single set of compiler tools.
Why is Arch using a dozen compiler versions? Just to avoid rebuilding
binary packages once the binary release's engineers decide which compiler
is going to be this release's gold-standard compiler (e.g. the one that
gets installed when the user runs pacman to install gcc)? Or do the
release engineers never actually standardize on a compiler -- perhaps new
ones get thrown onto some server whenever someone likes, and suddenly all
the users who install a compiler just start using that one?

It currently seems that there is no guarantee that on day X, if you
install gcc on Arch (from the Internet) and on the same day you pull in
the source code of pacman package Y, package Y will even build with the
day-X version of gcc. Is that true?

John
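The survey kpcyrd describes (counting distinct gcc versions across the
BUILDINFO files of an ISO's packages) can be sketched as below. The
"installed = pkgname-pkgver-pkgrel-arch" line format follows Arch's
documented .BUILDINFO layout, but treat the parsing details as assumptions:

```python
from collections import Counter

def gcc_versions(buildinfo_texts):
    """Tally distinct gcc package versions across a set of .BUILDINFO
    file contents (one string per package's BUILDINFO)."""
    counts = Counter()
    for text in buildinfo_texts:
        for line in text.splitlines():
            if not line.startswith("installed = gcc-"):
                continue
            # pkgname-pkgver-pkgrel-arch, e.g. gcc-13.2.1-3-x86_64
            parts = line.split("= ", 1)[1].split("-")
            if not parts[1][:1].isdigit():
                continue  # skip packages like gcc-libs
            counts["-".join(parts[1:-1])] += 1  # keep pkgver-pkgrel
    return counts
```

A result with more than one key is exactly the "plenty of different gcc
versions" situation the thread is complaining about.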
Re: Arch Linux minimal container userland 100% reproducible - now what?
John Gilmore wrote:
> It seems to me that the next step in making the Arch release ISOs
> reproducible is to have the Arch release engineering team create a
> source-code release ISO that matches each binary release ISO. Then you
> (or anyone) could test the reproducibility of the release by having
> merely those two ISO images and a bare amd64 computer (without even an
> Internet connection).

kpcyrd wrote:
> I think this falls under "bootstrappable builds", a bare amd64 computer
> still needs something to boot into (a CD with only source code won't do
> the trick).

Bootstrappable builds are a different thing. Worthwhile, but not what I
was asking for. I just wanted provable reproducibility from two ISO images
and nothing more.

I was asking that a bare amd64 be able to boot from an Arch Linux *binary*
ISO image, then be fed a matching Arch Linux *source* ISO image, and that
the scripts in the source image would be able to reproduce the binary
image from its source code, running the binaries (like the kernel, shell,
and compiler) from the binary ISO image to do the rebuilds (without
Internet access). This should be much simpler than doing a bootstrap from
bare metal *without* a binary ISO image.

And if your source/binary ISO images can do that, it's not just an
academic exercise in reproducibility. It can also produce a new binary ISO
that is built from that source ISO plus a few patches (e.g. for fixing
security issues). Or, it can "recompile the world" after you (or any user)
makes a small change to a kernel, include file, library, or compiler --
and show exactly how many programs compile to something *different* as a
result.

Basically, that pair of ISOs becomes a seed that can carry forward, or
fork, the whole distribution. For anybody who receives them. That is the
promise of free software, but the complexity of modern distros plus the
convenience of ubiquitous Internet have inadvertently tended to undermine
that promise.
Until the reproducible builds effort!

If someday an electromagnetic-pulse weapon destroys all the running
computers, we'd like to bootstrap the whole industry up again, without
breadboarding 8-bit micros and manually toggling in programs. Instead, a
chip foundry can take these two ISOs and a bare laptop out of a locked
fire-safe, reboot the (Arch Linux) world from them, and then use that
Linux machine to control the chip-making and chip-testing machines that
can make more high-function chips.

(This would depend on the chip-makers keeping good offline fireproof
backups of their own application software -- but even if they had that,
they can't reboot and maintain the chip foundry without working source
code for their controller's OS.)

John
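The "show exactly how many programs compile to something different" check
described above reduces to hashing every artifact from two world-rebuilds
and diffing the manifests. A rough sketch (the directory layout is
hypothetical):

```python
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_outputs(before: Path, after: Path) -> list[str]:
    """List artifacts whose bytes differ between two full rebuilds."""
    old, new = tree_digests(before), tree_digests(after)
    return sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
```

On a reproducible distro, running this over the build trees from "before
the include-file change" and "after" yields precisely the set of binaries
affected by that change, with no noise from timestamps or build paths.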
Re: Arch Linux minimal container userland 100% reproducible - now what?
Congratulations on closing in toward Arch Linux reproducibility!!!

kpcyrd wrote:
> Specifically what I mean - given a line like this:
>
> FROM
> archlinux@sha256:2dbd72d1e5510e047db7f441bf9069e9c53391b87e04e5bee3f379cd03cec060
>
> I want to reproduce the artifact(s) that are pulled in by this, with
> the packages our Arch Linux rebuilders have reproduced from source
> code. From what I understand this hash points to a json manifest that
> is not contained in the container image itself and was generated by
> the registry (should we archive them?), and this manifest then points
> to the sha256 of the tar containing the filesystem (I'm possibly
> missing an indirection here).

I have no experience with Arch -- am just reading what's on their website.
From a quick glance at their docs, the Arch distribution *only*
distributes binary packages. They only offer URLs for source code,
requiring that users depend on a working Internet connection and what
could be a large, arbitrary set of HTTPS servers that in theory contain
the matching source code. See:

  https://wiki.archlinux.org/title/Arch_build_system

(I'm not sure how that even meets the requirements of the GPL for binary
distributors to make the matching source code available to recipients of
the binaries.)

It seems to me that the next step in making the Arch release ISOs
reproducible is to have the Arch release engineering team create a
source-code release ISO that matches each binary release ISO. Then you (or
anyone) could test the reproducibility of the release by having merely
those two ISO images and a bare amd64 computer (without even an Internet
connection).

(Someone other than their releng team could do this shortly after the
binary release, hoping that none of the URLs becomes inaccessible in the
meantime. But the right time to gather the full source code for
reproducibility is when they themselves pull in the source code to BUILD
those binary packages that they will put in their release ISO.)
Making users reproduce an ISO full of binary packages by downloading the
sources from all over the Internet seems highly prone to fail -- in the
first few months, let alone five or ten years later.

Even Arch's binary releases are only available from Arch for three
(monthly) release cycles. Then you're on your own if you want to find a
copy of what they released, like the one that was current last Christmas.
See:

  https://archlinux.org/releng/releases/

Arch may do great release engineering (I hope they do!), but it's
apparently not *archival* release engineering.

John
Re: Two questions about build-path reproducibility in Debian
Thanks, everyone, for your contributions to this discussion. A quick note:

Vagrant Cascadian wrote:
> It would be pretty impractical, at least for Debian tests, to test
> without SOURCE_DATE_EPOCH, as dpkg will set SOURCE_DATE_EPOCH from
> debian/changelog for quite a few years now.

Making a small patch to the local dpkg to alter or remove the value of
SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
using that version of dpkg, would tell you which of them (newly) fail to
reproduce because they depend on SOURCE_DATE_EPOCH.

> Sounds like an interesting project for someone with significant spare
> time and computing resources to take on!

It looks to me like the whole Ubuntu source code (that gets into the
standard release) fits in about 25 GB. The Debian 12.0.0 release sources
fit in 83 GB (19 DVD images). Both of these are under 1% of a 10 TB disk
drive that runs about $200. A recent Ryzen mini-desktop, with a 0.5 TB SSD
that could cache it all, costs about $300. Is this significant computing
resources? For another $40 we could add a better heat sink and a USB fan.
How many days would recompiling a whole release take on this $540 worth of
hardware?

(I agree that the "spare" time to set it up and configure the build would
be the hard part. This is why I advocate for writing and releasing,
directly in the source release DVDs, the tools that would automate the
recompilation and binary comparison. The end user should be able to boot
the matching binary release DVD, download or copy in the source DVD
images, and type "reproduce-release".)

John
Re: Two questions about build-path reproducibility in Debian
>> But today, if you're building an executable for others, it's common to
>> build using a container/chroot or similar that makes it easy to
>> implement "must compile with these paths", while *fixing* this is often
>> a lot of work.

I know that my opinion is not popular, but let me try again before we lay
this decision to rest.

In avoiding fixing directory dependencies, you can move the complexity
around, but in doing so you don't reduce the complexity. Our instructions
for reproducing any package would have to identify what
container/chroot/namespace/whatever the end user must set up to be able to
successfully reproduce a package. Will these be the same for every
package, for every distro, and for every other environment in which we
want to inspire reproducibility? Do we need to add those constraints to
the Linux Foundation's Filesystem Hierarchy Standard? Do we need to add
them to the buildinfo files?

Ideally the tools that ordinary people traditionally use to reproduce a
package, such as dpkg-buildpackage or rpmbuild, will have been improved to
do the container/chroot setup automatically. Otherwise, naive users will
have to figure out what a container is, and why it is necessary for them
to grok this obscure environmental thing in order to tell whether their
binary package was tampered with or not. Will they always have to build
software as root, because chroot doesn't and can't work for ordinary
users? If we punt this, there will be an ongoing flow of "my package
doesn't build to the same binary, somebody must be 0wning me" emails from
people who do the obvious thing, like type "make" and "cmp".

Do we want successful reproducibility to depend on setting up servers and
virtual machines and web servers and databases and build farms and
CI queues and such? Yes, to reproduce a whole distro, reproducibility has
to WORK there, but does it have to DEPEND on that complex infrastructure?

I'm an old Unix guy, and so are millions of end users and sysadmins.
Containers are a recent Linux thing. Namespaces ditto. I still have never
found a use for containers; I tried using Docker for something and was
bemused to discover that it could calculate all kinds of stuff, but none
of the output of the calculation could come back into my ordinary Linux
filesystem (without some kind of obscure per-invocation JCL-like
configuration setup), so I stopped trying to use it.

Another time, I tried booting an on-disk, installed copy of Ubuntu inside
a virtual machine, so I could keep running an older service that's hard to
port forward, while migrating the rest of my machine to a newer Ubuntu
release. VM/360 could do that decades ago, but I discovered that that
use-case is not well supported in the Linux VM tools and documentation, so
I gave up on that too. There are more things in heaven and earth, Horatio,
than spending all of your time doing sysadmin.

These newfangled tools are just not as well rounded as the stuff that's
been well understood in Unix since the 1970s and 1980s, like
"directories". If only seventeen experts in the world can figure out
whether a package has been tampered with, we will have labored mightily
but not done much to improve computer security. Also recall what pains the
full-source-bootstrap people are having to go through after some (imho)
foolish decisions were made about depending on modern C++ features inside
core tools like gcc and gdb. Reproducible builds should make the
underlying software LESS dependent on the particular configuration of the
build environment; that's kind of the point.

>>> ... it makes reproducibilty from around 80-85% of all
>>> packages to >95%, IOW with this shortcut we can have meaningful
>>> reproducibility *many years* sooner, than without.

If we move the goal posts in order to claim victory, who are we fooling
but ourselves?
I'd rather that we knew and documented that 57% of packages are absolutely reproducible, 23% require SOURCE_DATE_EPOCH, and 12% still require a standardized source code directory, than to claim all 95% are "meaningfully reproducible" today. John
Re: Two questions about build-path reproducibility in Debian
Vagrant Cascadian wrote:
> > > to make it easier to debug other issues, although deprioritizing them
> > > makes sense, given buildd.debian.org now normalizes them.

James Addison via rb-general wrote:
> Ok, thank you both. A number of these bugs are currently recorded at
> severity level 'normal'; unless told not to, I'll spend some time to
> double-check their details and - assuming all looks OK - will bulk
> downgrade them to 'wishlist' severity a week or so from now.

I may be confused about this. These bug reports are that a package cannot
be reproducibly built because its output binary depends on the directory
in which it was built?

Why would these become "wishlist" bugs, as opposed to actual
reproducibility bugs that deserve fixing, just because one server at
Debian no longer triggers the bug because it always uses the same build
directory?

If an end user can't download a source package (into any directory on any
machine) and build it into the same exact binary as the one that Debian
ships, this is not a "wishlist" idea for some future enhancement. This is
a real issue that prevents the code from being reproducible.

How am I confused?

John
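The build-path failure mode under discussion, and the usual fix of mapping
the build directory to a fixed string (the effect of GCC's
-ffile-prefix-map option), can be illustrated with a toy "build" in
Python. Everything here is a hypothetical stand-in for a real compiler:

```python
import tempfile
from pathlib import Path

def toy_build(source: str, build_dir: Path, prefix_map: bool = False) -> bytes:
    """A stand-in 'compiler' that embeds __FILE__-style paths in its output."""
    path = str(build_dir / "main.c")
    if prefix_map:
        # Analogous to -ffile-prefix-map=$PWD=.: normalize the embedded path.
        path = path.replace(str(build_dir), ".")
    return (source + "\n/* built from " + path + " */\n").encode()

# Two users build the same source in different directories.
src = "int main(void) { return 0; }"
dir_a, dir_b = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())

assert toy_build(src, dir_a) != toy_build(src, dir_b)  # path-dependent
assert toy_build(src, dir_a, prefix_map=True) == \
       toy_build(src, dir_b, prefix_map=True)          # reproducible
```

The first assertion is the class of bug these Debian reports describe; the
second shows why fixing the package (normalizing the path) works from any
build directory, with no container or chroot required.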
Re: Irregular status update about reproducible live-build ISO images
Roland, thank you for your ongoing work and reporting to make Debian
reproducible! One question:

> * Last month a question was raised, whether the distributed sources
> are sufficient to rebuild the images. The answer is: probably yes, but
> I haven't tried.
> The chain is: source code --compiler--> executable files --debian
> packaging--> .deb archives --live-build--> live images
> I've focused on the last section of this chain; the installation of
> the .deb archives into the live images.

Thank you for focusing on the last part of the chain. You are very, very
close there! I am wondering if there is any low-hanging fruit anywhere
else in the chain that you may have the expertise and time to address.

For example, how does the live-build process decide which binary .deb
archives are selected for inclusion in the live image? Are these lists or
criteria stored in the source code archives? If not, can they be put into
the source code archives?

Similarly, are there any other inputs to the live-build process? Perhaps a
template of a binary ISO image? Or a binary program that creates a
prototype ISO image, which is run during the live-build process. I note
that when running jigdo-lite to reproduce a live image, not only is there
a set of .debs that are copied in, but also a .template file which has the
portions of the image that don't directly come from a .deb file. Is there
an equivalent template in the live-build process, or where do these
nonzero and non-.deb parts of the resulting live image come from? Is there
full source code for those?

Also, is there an easy way to start from the set of binary .deb files to
be included in an image, and from each one, produce a list of the source
files (.tar.gz's, Debian control files, patches, etc.) that were used to
create it? If so, you could create a master list of all the source files
that were used to create a particular live image.
And an automated process could compare that list of source files to the
contents of the matching "Sources" DVD image, to ensure that all of the
required source files are actually included in the "matching source" DVD.

When a rebuilt image differs in some small way from the original, what
tools do you use to determine what files the differences are in, and why?
Are these tools to compare a live image with a rebuilt live image also in
the Debian source tree and in the Debian source DVDs?

Being able to do any of these things, and correct any lapses now, before
the next official Debian release, will enable you or anyone to complete
the ultimate job of proving that a source DVD plus a live DVD can fully
reproduce the official live DVD, without access to any network resources.
(And thus that a live DVD, a source DVD, plus a small set of patches can
verifiably produce a live DVD that includes only the changes made in that
set of patches, and no others.)

Thanks again!

John
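The first step in answering "what files are the differences in" is to
localize which byte ranges of a rebuilt image differ from the original
(what `cmp -l` reports, and what richer tools like diffoscope then map
back to files). A minimal sketch:

```python
def differing_ranges(a: bytes, b: bytes, max_ranges: int = 10):
    """Report (start, end) byte ranges where two images differ, as a
    starting point for mapping offsets back to the files inside them."""
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a new differing run begins
        elif x == y and start is not None:
            ranges.append((start, i))      # the differing run ended
            start = None
            if len(ranges) >= max_ranges:
                return ranges
    if start is not None:
        ranges.append((start, min(len(a), len(b))))
    if len(a) != len(b):                   # trailing length mismatch
        ranges.append((min(len(a), len(b)), max(len(a), len(b))))
    return ranges
```

For multi-GB ISOs one would stream from disk rather than hold both images
in memory, but the reported offsets are the same either way.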
Re: Please review the draft for December's report
https://reproducible-builds.org/reports/2023-12/

"Reproducible Builds in December 2023
Welcome to the November 2023 report..."

It seems better to NOT reproduce the previous month's header quite so
accurately. ;-/

John
Priority claim re bootstrapping
I congratulate the Guix bootstrap team on their continuing progress on
reproducibility. Yet, there is some controversy over one statement made in
their blog, claiming priority over building:

  a package graph of more than 22,000 nodes rooted in a 357-byte program

in the first paragraph of:

  https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/

Claims of priority in innovation have some importance to history. You can
see the level of controversy created by a latecomer who claimed that he
"invented email" here:

  https://www.emailonacid.com/blog/article/industry-news/who-really-invented-email/
  https://en.wikipedia.org/wiki/Shiva_Ayyadurai#EMAIL_invention_controversy

And there's the key role that priority has in whether inventions are
patentable and by whom, which has led to huge financial implications, such
as:

  https://en.wikipedia.org/wiki/Apple_Inc._v._Samsung_Electronics_Co.

Email and smartphones have become pervasive in the decades since their
invention in small niches, making such claims actually important to the
world. We hope and work toward our own area of invention, reproducibility
of software, becoming equally pervasive in the decades to follow.

Therefore, in a community of technical experts, when a difference of
opinion comes up on the priority of a useful invention, it is useful for
the issue to be examined in more detail, among people who "were there at
the time". Such an examination tends to bring out more facts, which become
very useful to later observers who weren't there and have to figure out
what may have happened from scanty documentation. In many such cases,
different smaller innovations were often made by a variety of people or
teams. Perhaps only one or none of the claimants can credibly support the
statement that they made the "key" invention in that area, but many may be
able to share some level of public credit for their work.
(As another example, when I claimed years ago on this list that Cygnus
made the GNU compiler tools reproducible in the 1990s, some people called
bullshit on me -- until I provided much more detail linked to the
published source trees and early releases involved, as well as copies of
internal company emails and marketing announcements on the topic.)

Personally I don't have a stake in which claims about "bootstrapping a
large collection of source code programs from a small binary seed" hold up
to detailed historical and technical examination or not. It seems to me
that cross-bootstrapping from a more mature software architecture has for
decades been the norm, even in the creation of the original UNIX. (As UNIX
was then used to cross-bootstrap the GNU software at FSF and at Cygnus
many years later.) Toggling in binary seed programs was unusual even in
the first (8-bit) microprocessors able to self-host in the early 1970s,
such as in building Bill Gates's first 8080 BASIC interpreter, Steve
Wozniak's ROMs for the 6502-based Apple II, let alone the 68000-based SUN
boards built at Stanford in the 1980s. Didn't IBM cross-build from the
7094 for the first IBM 360 tools even in the early 1960s?

But before dismissing the idea out of hand, let's see what facts we can
turn up, and what new innovations can be produced there to improve supply
chain integrity. I do think the topic is a suitable one for the
Reproducible Builds community to discuss. Politely conducted disputes
should not be dismissed as "nonsense" with a suggestion that the parties
unsubscribe from the list. Inflating the emotional tone of the discussion
is not constructive toward the community discovering whatever
contemporaneous truths may be findable behind the various claims.

Thank you for listening.

John Gilmore
Re: Reproducibility terminology/definitions
Pol Dellaiera wrote:
> To that end, I'm currently drafting a formal definition of
> reproducibility that I hope to contribute. However, before I proceed
> further, I would like to know whether any of you have already worked
> on formulating such a definition.

Here are a few emails (from prior R-B discussions) that go into how
"reproducibility" might be formally defined and verified. I'm sure that
many other inputs would also be useful; these are just two that I could
recall and easily find.

The final series of emails below describes how Cygnus made the GNU
cross-compiler tools "reproducible" in the early 1990s -- in the sense
that if you cross-compiled the same source code on any of 9 hosting
platforms, the tools would produce the exact bit-for-bit identical
binaries for the target platform. (E.g. a Mac, a Windows machine, and a
SPARC Solaris machine were verified to cross-compile the "GNU make" source
code into an identical "gmake" binary, targeted to run on a Motorola 68000
running SunOS.) This reproducibility also included being able to
cross-build identical binaries for all the GNU compiler tools themselves
(as well as for all of our hundreds of compiler test cases). The several
man-years of engineering required to stamp out all the bugs that made the
compilers not reproducible was a prerequisite to today's efforts to make
whole Linux distributions (that are compiled by those GNU tools)
reproducible.

John

To: General discussions about reproducible builds
Date: Mon, 28 Jan 2019 23:18:43 -0800
Message-ID: <15495.1548746...@hop.toad.com>
From: John Gilmore
Subject: Re: [rb-general] Definition of "reproducible build"

Ludovic Courtès wrote:
> I agree that insisting on provenance is crucial. Dockerfiles (and
> similar) are often viewed as “source”, but they really aren’t source:
> the actual source would come with the distros they refer to (Debian,
> pip, etc.)
> Those distros might in turn refer to external pre-built binaries,
> though, such as “bootstrap binaries” for compilers (Rust, OpenJDK, and
> so on.)

I propose a definition for whether a bootable OS distro is reproducible.
(If what you're building is not a whole distro that can self-compile, this
definition doesn't apply.)

Our initial goal would be to produce a bootable binary release (DVD or USB
stick) and a source release (ditto). The source release would include the
script that allows the binary release to recompile the source release to a
new binary release that ends up bit-for-bit identical. Such a
binary/source release pair would be called "reproducible".

That's useful: If you have to fix a bug in it, you can make the mods you
need in the source tree, rebuild the world, and out will come a release
with just that one change in the binaries, verifiably identical except
where it matters. And developers can use such a release to detect what
changes matter to whom, such as: when you alter a system include file,
which binaries change?

During development, the code would be built by some earlier release's
tools, built piecemeal, etc., like current build processes do.
Anytime before release, the developers can test whether a draft source
release builds into a binary release that itself can build the sources
into the same binary release. And fix any discrepancies, ideally long
before release. This is similar to what GCC does to test itself, or what
Cygnus did to test the whole toolchain for cross-compiling. But applied to
the entire OS release.

Such a paired source/binary release doesn't require a chain of provenance
of earlier binary software, particularly if people can demonstrate
bootstrapping it using several different earlier compiler toolchains,
still producing the same binaries. You can bootstrap it with itself.

The separate efforts to minimize the amount of binary code we have to
trust to do a rebuild are laudable and fascinating. Keep going! But we
shouldn't require whole distros to do that yet. We haven't even
accomplished a basic paired binary/source reproducible release yet, for
any major release -- or have we?

John

PS: For extra points, the binary release should be able to cross-compile
its source release into a binary release for each other supported
platform, reproducibly. And those other-platform binary releases should
cross-compile the source release back bit-for-bit into the original
binary release.
Re: Introducing: Semantically reproducible builds
David A. Wheeler wrote:
> Please don't view the text above as opposing reproducible builds.
> I think reproducible builds are the gold standard for countering
> subverted builds, and I will continue to encourage them.
> But when you can't get them (e.g., because you don't have time to patch
> every program in the universe or the builders won't make changes to
> their build process), it's useful to look for some *workable* backoff
> alternatives. The backoffs may not give you all you wanted, but they can
> at least help users focus on their biggest risks first.

To the extent that the text causes the public to be confused about what
reproducibility means, that text *will* oppose reproducible builds.

Can you call packages that aren't reproducible because the maintainers
insist on keeping timestamps or temp file names or etc. in the binaries
(or whose maintainers simply don't care) "irreproducible" rather than
"semantically reproducible"? That would be much clearer.

John
Re: Sphinx: localisation changes / reproducibility
James Addison wrote: > When the goal is to build the software as it was available to the > author at the time of code commit/check-in - and I think that that is > a valid use case - then that makes sense. I think of the goal as being less related to the author, and more related to the creator of a widespread binary release (such as a Linux distribution, or an app that goes into an app-store). The goal is then that the recipient of that binary release can verify that the source code they obtained from the same place is able to rebuild that exact widespread binary release. This proves that the source code can be trusted for some purposes, such as being used to read it to understand what the binary does. Or to make small bug-fixes to it. Or to become the base for further evolution of the project if the maintainer is suddenly "hit by a bus" and stops making further releases. James Addison wrote: > Inverting the question somewhat: if a single source-base is rebuilt > using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01 > and 2023-04-18), then what are expected/valid differences in the > resulting output? In the ideal circumstances, the resulting output would be identical, because the build process would have no dependencies on SOURCE_DATE_EPOCH. In these ideal circumstances, the code is "portable", in the same sense that people understand "portable" code will build and run the same on an ARM running MacOS as it does on an x86 running Windows. There are many ways to make code portable, but the most robust of them is to *eliminate* dependencies. A more fragile way would be to #ifdef your code to adjust for every supported build or run environment. That fragile way breaks as soon as it needs to build or run in a new environment, whereas the robust way has already made it likely to "just work" in a new environment that it has never encountered before (or to have only one or two minor things that need adjusting). 
Note that if it built fine in a Linux system version X, then a later Linux system version Y is a "new environment" and might break the code. The robust version is again less likely to break, because it inherently, by design, cares less about the nitty gritty details of its environment. Much code in Linux does not reach that ideal (yet!). Instead, builds of non-ideal code use SOURCE_DATE_EPOCH as a crutch to limit their dependencies on the local build environment, replacing those dependencies with a dependency on SOURCE_DATE_EPOCH. So, if you rebuild a non-ideal package with two different values of SOURCE_DATE_EPOCH, you will get two different binaries that differ in the areas of dependency. For example, if the documentation embeds a build-date in its page footer, you'd expect every page of the built documentation would differ. If the "--version" output of the program embeds the build date, then the code that produces that output would differ. Etc. In fact, "fuzzing" their code with different values of SOURCE_DATE_EPOCH can help a maintainer identify where those dependencies still remain. We try to talk package authors out of such dependencies, but ultimately it's their package and they make the architectural decisions. To some of them it's incredibly important that the build date appears in the man-page. Reproducibility usually features lower among their priorities than it does in ours. John
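The "fuzzing" idea above can be sketched in a few lines of shell. This is a toy illustration, not any real package's build: the `build` function here is a stand-in for a build step that leaks the build date into its output, and a maintainer would substitute the package's actual build command. GNU `date` syntax is assumed.

```shell
# Toy illustration of SOURCE_DATE_EPOCH "fuzzing": rebuild under two
# different epochs and diff the results. build() is a stand-in for a
# real build step that embeds a date into its output.
set -e
workdir=$(mktemp -d)
build() {
    # Embed the (overridable) build date; GNU date syntax assumed.
    date -u -d "@${SOURCE_DATE_EPOCH}" '+Built on %Y-%m-%d' > "$1"
}
SOURCE_DATE_EPOCH=0          build "$workdir/out-a"   # 1970-01-01
SOURCE_DATE_EPOCH=1681776000 build "$workdir/out-b"   # 2023-04-18
# Whatever diff reports pinpoints a remaining date dependency.
if diff "$workdir/out-a" "$workdir/out-b" > /dev/null; then
    echo "no date dependency found"
else
    echo "date dependency remains"
fi
```

Because the toy build step deliberately embeds the date, the two outputs differ and the sketch reports that a date dependency remains; a package with no such dependencies would produce identical outputs under any two epoch values.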
Re: Sphinx: localisation changes / reproducibility
James Addison via rb-general wrote: > In general, we should be able to > pick two times, "s" and "t", s <= t, where "s" is the > source-package-retrieval time, and "t" is the build-time, and using > those, any two people should be able to create exactly the same > (bit-for-bit) documentation. I think that SOURCE_DATE_EPOCH generally > refers to "t". I think that SOURCE_DATE_EPOCH generally refers to the check-IN time of each of the source package(s) being rebuilt. You can retrieve the packages anytime later than that, and you can do the build at any time later, and SOURCE_DATE_EPOCH should not change (and the built binaries and docs should also not change). John
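One common way to pin SOURCE_DATE_EPOCH to the check-in time, rather than to retrieval or build time, is to derive it from the version-control history. The sketch below creates a throwaway git repository purely so it is self-contained; in a real build only the `export` line matters, run inside the package's own tree.

```shell
# Derive SOURCE_DATE_EPOCH from the last check-in time, as described
# above. The throwaway repository exists only to make this sketch
# self-contained; a real build would run the export line in the
# package's own source tree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial check-in"
# %ct is the committer timestamp of the newest commit, in seconds
# since the Unix epoch.
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
echo "SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH"
```

Retrieval and rebuild can then happen at any later date without changing the value, so the built binaries and docs should not change either.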
Re: Three bytes in a zip file
Larry Doolittle wrote: > $ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u > fab-ea2bb52c-mb) > 22c22 > < -rw-r--r-- 1 redacted redacted 644661 2023-04-04 18:10:00.0 -0700 > marble-ipc-d-356.txt > --- > > -rw-r--r-- 1 redacted redacted 644661 2023-04-06 00:25:03.0 -0700 > > marble-ipc-d-356.txt So I'm guessing that even before the zip file is re-created, the rebuild process is leaking the rebuild timestamp into the last-modified metadata of the generated marble-ipc-d-356.txt file? That seems like it should be handled by the build process explicitly setting its timestamp to something related to the last-source-code-checkin time (with "touch --date=XXX") rather than to current time. Truncating the timestamps to DOS timestamps wouldn't work to eliminate this difference anyway, since the date in the two files is two days different; DOS timestamps are accurate to 2 seconds, as I recall. John
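The explicit-touch fix suggested above might look like the sketch below. The file name is taken from the diff in the post; the epoch value is an arbitrary stand-in for the real last-source-checkin time, and GNU `touch`/`stat` are assumed.

```shell
# Sketch of normalizing a generated file's mtime before it gets zipped,
# so the rebuild time never leaks into the archive metadata. The epoch
# value is an arbitrary stand-in for the last-source-checkin time.
set -e
workdir=$(mktemp -d)
echo "generated documentation" > "$workdir/marble-ipc-d-356.txt"
SOURCE_DATE_EPOCH=1680656400
touch --date="@${SOURCE_DATE_EPOCH}" "$workdir/marble-ipc-d-356.txt"
# GNU stat: %Y prints the mtime in seconds since the epoch.
stat -c %Y "$workdir/marble-ipc-d-356.txt"
```

With every rebuild forcing the same source-derived mtime, the last-modified metadata of the generated file (and of the zip entry made from it) agrees across rebuilds.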
Re: Does diffoscope compares disk partitions
>> So, overall, I actually don't think that diffoscope has the requested >> support, and it's not "just" a bug of failed identification. I have been surprised at how much effort has gone into "diffoscope" as a total fraction of the Reproducible Builds effort. Perhaps it is a case akin to the drunk looking for his keys under the streetlight where he can see, rather than in the dark where he dropped them. (It's easier to hack diffoscope than to hack thousands of irreproducible packages.) I for one am happy that diffoscope DOESN'T have support for umpteen disk partitioning schemes and file system formats. John PS: Has anyone on the list considered writing an article for the Journal of Irreproducible Results about our effort?
Re: Call for real-world scenarios prevented by RB practices
On 22/03/2022 13.46, Chris Lamb wrote: > Just wondering if anyone on this list is aware of any real-world > instances where RB practices have made a difference and flagged > something legitimately "bad"? The GNU compilers are already tested for complete reproducibility. We at Cygnus Support built that infrastructure back in the 1990s, when we made gcc into a cross-compiler (compiling on any architecture + OS, targeting any other). We built the DejaGnu test harness, and some compiler/assembler/linker test suites, that rebuilt not just our own tools, but also a test suite with hundreds or thousands of programs. We compared their binaries until they were bit-for-bit identical when built on many different host machines of different architectures. To make it work, we had to fix many bugs and misfeatures, including even some high-level design bugs, like object file formats that demanded a timestamp (we decided that 0 was a fine timestamp). A few of those bugs involved generating different but working instruction sequences -- I recall fixing one that depended on an uninitialized local variable. We never found any malicious code in the GNU tools during that process, just poorly debugged code and unportable code. I don't know whether that's because nobody malevolent actually knew what a lever they would have had by infesting our code, or whether we really weren't as important as we thought we were :-/. I was still manually making and reading the diff between the previous release and each new release, to make sure that no change that I didn't recognize would slip through. It was a pretty heady feeling to make a GNU tool release, send an email to info-gnu, and have thousands of people running it in the next few days. We took the responsibility seriously. (Caveat: We weren't shipping binaries, except to Cygnus customers. Maliciously patched binaries are what RB is designed to prevent.) John
Re: [rb-general] Debian buster, 54% reproducible in practice (Re: Core Debian reproducibility: 57% and rising!)
> Though without solving #894441 we cannot reach much higher than 80% > (because 93% is the current theoretic maximum, of which we need to > distract 12% binNMUs...) Even without solving the general binNMU problem, can't you make more packages reproducible by eliminating those packages' dependencies on SOURCE_DATE_EPOCH? (I always thought that removing the date dependencies from the source code was better than patching over them with this environment variable. The challenge is when individual maintainers refuse to make their packages date-independent, yet distros still want the packages to be reproducible. If you can convince a maintainer, you aren't stuck on the horns of this dilemma.) John ___ rb-general@lists.reproducible-builds.org mailing list To change your subscription options, visit https://lists.reproducible-builds.org/listinfo/rb-general. To unsubscribe, send an email to rb-general-unsubscr...@lists.reproducible-builds.org.
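A first step toward the dependency elimination described above is simply to search a source tree for the usual build-date culprits. The sketch below plants one offender in a temporary tree so it is self-contained; the pattern list is illustrative, not exhaustive.

```shell
# Rough sketch: list likely build-date dependencies in a source tree.
# A temporary tree with one known offender is created so the sketch is
# self-contained; the grep patterns are illustrative, not exhaustive.
set -e
tree=$(mktemp -d)
cat > "$tree/version.c" <<'EOF'
const char *build_date = __DATE__ " " __TIME__;  /* leaks build time */
EOF
grep -rn -e '__DATE__' -e '__TIME__' -e 'SOURCE_DATE_EPOCH' "$tree"
```

Each hit is a candidate for removal at the source level, rather than for papering over with the environment variable.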
Re: [rb-general] Definition of "reproducible build"
> I like the idea, however what you are proposing is basically a new > distro/fork, where you would remove all unreproducible packages, as > every distro still has some unreproducible bits. I suggest going the other way -- produce a distro that is "80% reproducible" from its source code USB stick and its binary boot USB stick. You'd already have the global reproducibility structure and scripts written and working, even before the last packages are individually reproducible. That global reproducibility tech would be immediately adoptable by any distro. The output of the reproduction scripts would be a bootable binary that does boot and run! It would still have differences from the "release master" bootable binary, but those differences would be irrelevant to the functioning of the binary, and would be clearly visible with "diff -r". (For one thing, this would cause the distros to actually produce a "source code USB stick image". Currently most of them don't. They instead require you to download thousands of separate source packages or tarballs, and have no scripts readily visible for building those into a bootable binary image.) After accomplishing that, then the focus could go on the 20% (or 10% or whatever) of packages that aren't yet reproducible. And, people making small distros could cut out such packages to make a 100% reproducible distro, as Holger suggested. John
Re: [rb-general] Definition of "reproducible build"
Ludovic Courtès wrote: > I agree that insisting on provenance is crucial. Dockerfiles (and similar) > are often viewed as "source", but they really aren't source: the actual > source would come with the distros they refer to (Debian, pip, etc.) > Those distros might in turn refer to external pre-built binaries, though, such > as "bootstrap binaries" for compilers (Rust, OpenJDK, and so on.) I propose a definition for whether a bootable OS distro is reproducible. (If what you're building is not a whole distro that can self-compile, this definition doesn't apply.) Our initial goal would be to produce a bootable binary release (DVD or USB stick) and a source release (ditto). The source release would include the script that allows the binary release to recompile the source release to a new binary release that ends up bit-for-bit identical. Such a binary/source release pair would be called "reproducible". That's useful: If you have to fix a bug in it, you can make the mods you need in the source tree, rebuild the world, and out will come a release with just that one change in the binaries, verifiably identical except where it matters. And developers can use such a release to detect what changes matter to whom, such as: when you alter a system include file, which binaries change? During development, the code would be built by some earlier release's tools, built piecemeal, etc, like current build processes do. Anytime before release, the developers can test whether a draft source release builds into a binary release that itself can build the sources into the same binary release. And fix any discrepancies, ideally long before release. This is similar to what GCC does to test itself, or what Cygnus did to test the whole toolchain for cross-compiling. But applied to the entire OS release.
Such a paired source/binary release doesn't require a chain of provenance of earlier binary software, particularly if people can demonstrate bootstrapping it using several different earlier compiler toolchains, still producing the same binaries. You can bootstrap it with itself. The separate efforts to minimize the amount of binary code we have to trust to do a rebuild are laudable and fascinating. Keep going! But we shouldn't require whole distros to do that yet. We haven't even accomplished a basic paired binary/source reproducible release yet, for any major release -- or have we? John PS: For extra points, the binary release should be able to cross-compile its source release into a binary release for each other supported platform, reproducibly. And those other-platform binary releases should cross-compile the source release back bit-for-bit into the same binary release you started with.
Re: [rb-general] Style Guide Updates
> We tend to write Markdown, not HTML, after all so having copy-pastable > snippets is less compelling to me, priority-wise. This also goes for > the non-Javascript "story" but this is less interesting as the > situation is somewhat-satisfactory right now. Here's a vote for making the reproducible builds site completely functional without Javascript. What an insane idea it is that people can't read or interact with a web site without granting permission to a random third party to run arbitrary code in their local machine! I browse with Javascript off all the time (using Firefox plugin NoScript), and with EFF's Privacy Badger. Result: Many (used to be "most") websites work, but I never see ads and I don't get tracked by the bezillion companies that are spying on me to sell my eyeballs for their own benefit. When the sites are arranged so I can't even click a simple link without enabling Javascript, I generally skip further interactions with them. The amazing thing is that so many modern "web designers" don't even know how HTML works and just assume that everybody has and needs Javascript. And ditto for the tool builders who make the tools these designers know instead of HTML. John