Re: Two questions about build-path reproducibility in Debian
On Tue, Mar 12, 2024 at 08:45:03AM -0700, Vagrant Cascadian wrote:
> >> Note: I confused myself when writing this; in fact Salsa-CI reprotest
> >> _does_ continue to test build-path variance, at least until we decide
> >> otherwise.
> >
> > this is in fact a bug and should be fixed with the next reprotest release.
>
> That is not a reprotest bug, but an infrastructure issue for the
> debian-specific salsa-ci configuration. Reprotest is not a
> debian-specific tool.

agreed.

> Reprotest should continue to vary build paths by default; reprotest
> historically and currently defaults to enabling all variations and
> making an exception does not seem worth the opinionated change of
> behavior. By design, reprotest is easy to configure which variations to
> enable and disable as needed.

agreed for the upstream release. for reprotest in Debian I'm still not
so sure. (and for reprotest running as part of salsaci I do think the
default should be not to vary path.)

-- 
cheers,
        Holger

holger@(debian|reproducible-builds|layer-acht).org
OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C

"The two hardest problems in computer science are: (i) people, (ii),
convincing computer scientists that the hardest problem in computer
science is people, and, (iii) off by one errors." - Jeffrey P. Bigham
Re: Two questions about build-path reproducibility in Debian
James Addison wrote:

> None of the remaining thirty-or-so (and in fact, none of the 66 updated so
> far) are usertagged both 'buildpath' and 'toolchain'.
>
> I would say that a few of them _are_ 'toolchain packages' -- mono,
> binutils-dev and a few others -- but for these bugs the buildpath issues
> are internal to each package at build-time and do not affect the
> construction of other packages in their ecosystem.

You are absolutely right to distinguish between a package that is itself
unreproducible and a package that is causing other packages to be
unreproducible. These are very much orthogonal concepts as you imply, and
a package can certainly be in both categories at once.

What might be confusing to folks is that our "toolchain" usertag in the
Debian BTS does not refer to a toolchain *package* in the usual, Debian
sense, i.e. Mono, libc, Bison, documentation generators and so on. Rather,
(loosely speaking) "if this usertag is applied to a bug, it's denoting
that that particular *bug* is affecting the reproducibility of other
packages."

Unfortunately, the tag is actually an excellent example of that general
trend in tech where something was badly named on the spur of the moment,
and then the name just sticks around forever due to some combination of
muscle memory, inertia and, frankly, priority: as in, this metadata is not
*all* that visible nor A++ important to begin with… outside of threads like
this. :)

Best wishes,

-- 
Chris Lamb
reproducible-builds.org
Re: Two questions about build-path reproducibility in Debian
Thanks, Chris,

On Sun, 31 Mar 2024 at 13:01, Chris Lamb wrote:
>
> Hi James,
>
> > Approximately thirty are still set to other severity levels, and I plan to
> > update those with the following adjusted messaging […]
>
> Looks good to me. :)
>
> Completely out of interest, are any of those 30 bugs tagged both
> "buildpath" and "toolchain"? It's written nowhere in Policy (and I
> can't remember if it's ever been discussed before), but if package X
> is causing package Y to be unreproducible, I feel that has some
> bearing on the severity of the bug for that issue filed against X…
> completely independent of whether package X is reproducible itself or
> not. :)

None of the remaining thirty-or-so (and in fact, none of the 66 updated so
far) are usertagged both 'buildpath' and 'toolchain'.

I would say that a few of them _are_ 'toolchain packages' -- mono,
binutils-dev and a few others -- but for these bugs the buildpath issues
are internal to each package at build-time and do not affect the
construction of other packages in their ecosystem.

> Just to underscore that this is simply my curiosity before you
> reassign: in the particular case of *buildpath* AND toolchain, these
> should almost certainly be wishlist anyway because, as discussed, we
> "aren't testing buildpath".

Mostly agree. Of the bugs in Debian that _are_ usertagged both buildpath
and also toolchain, a few of them appear to have possible known/tested
fixes, but in some cases are awaiting maintainer/upstream support.

Using a static buildpath seems like it should mitigate most concern there,
but if that were not the case, then the severity of those could perhaps be
re-argued based on the quantity, popularity and importance of affected
software (packaged or otherwise).

Regards,
James
Re: Two questions about build-path reproducibility in Debian
Hi James,

> Approximately thirty are still set to other severity levels, and I plan to
> update those with the following adjusted messaging […]

Looks good to me. :)

Completely out of interest, are any of those 30 bugs tagged both
"buildpath" and "toolchain"? It's written nowhere in Policy (and I
can't remember if it's ever been discussed before), but if package X
is causing package Y to be unreproducible, I feel that has some
bearing on the severity of the bug for that issue filed against X…
completely independent of whether package X is reproducible itself or
not. :)

Just to underscore that this is simply my curiosity before you
reassign: in the particular case of *buildpath* AND toolchain, these
should almost certainly be wishlist anyway because, as discussed, we
"aren't testing buildpath".

Best wishes,

-- 
Chris Lamb
reproducible-builds.org
Re: Two questions about build-path reproducibility in Debian
Hi again,

On Mon, 11 Mar 2024 at 18:24, James Addison wrote:
>
> Hi folks,
>
> On Wed, 6 Mar 2024 at 01:04, James Addison wrote:
> > [ ... snip ...]
> >
> > The Debian bug severity descriptions[1] provide some more nuance, and that
> > reassures me that wishlist should be appropriate for most of these bugs
> > (although I'll inspect their contents before making any changes).
>
> Please find below a draft of the message I'll send to each affected bugreport.
>
> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
> continue to test build-path variance, at least until we decide otherwise.
>
> --- BEGIN DRAFT ---
> Because Debian builds packages from a fixed build path, customized build paths
> are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or
> during package builds on the Reproducible Builds team's package test
> infrastructure for Debian[1].
>
> This means that this package will pass current reproducibility tests; however
> we still believe that source code and/or build steps embed the build path into
> binary package output, making it more difficult than necessary for independent
> consumers to confirm whether their local compilations produce identical binary
> artifacts.
>
> As a result, this bugreport will remain open and be assigned the 'wishlist'
> severity[2].
>
> ...
>
> [1] - https://tests.reproducible-builds.org/debian/reproducible.html
>
> [2] - https://www.debian.org/Bugs/Developer#severities
> --- END DRAFT ---

Most of the remaining buildpath bugs have been updated to severity 'wishlist'.

Approximately thirty are still set to other severity levels, and I plan to
update those with the following adjusted messaging:

--- BEGIN DRAFT ---
Control: severity -1 wishlist

Dear Maintainer,

Currently, Debian's buildd and also the Reproducible Builds team's testing
infrastructure[1] both use a fixed build path when building binary packages.

This means that your package will pass current reproducibility tests; however
we believe that varying the build path still produces undesirable changes in
the binary package output, making it more difficult than necessary for
independent consumers to check the integrity of those packages by rebuilding
them themselves.

As a result, this bugreport will remain open and be re-assigned the 'wishlist'
severity[2].

You can use the 'reprotest' package build utility - either locally, or as
provided in Debian's Salsa continuous integration pipelines - to assist in
uncovering reproducibility failures due to build-path variance.

For more information about build paths and how they can affect
reproducibility, please refer to:

  https://reproducible-builds.org/docs/build-path/

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html

[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---

Thanks for your feedback and suggestions,
James
Re: Two questions about build-path reproducibility in Debian
> On Mar 12, 2024, at 11:45 AM, Vagrant Cascadian wrote:
>
> On 2024-03-12, Holger Levsen wrote:
>> On Mon, Mar 11, 2024 at 06:24:22PM +, James Addison via rb-general wrote:
>>> Please find below a draft of the message I'll send to each affected
>>> bugreport.
>>
>> looks good to me, thank you for doing this!
>>
>>> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
>>> continue to test build-path variance, at least until we decide otherwise.
>>
>> this is in fact a bug and should be fixed with the next reprotest release.
>
> That is not a reprotest bug, but an infrastructure issue for the
> debian-specific salsa-ci configuration. Reprotest is not a
> debian-specific tool.
>
> Reprotest should continue to vary build paths by default; reprotest
> historically and currently defaults to enabling all variations and
> making an exception does not seem worth the opinionated change of
> behavior. By design, reprotest is easy to configure which variations to
> enable and disable as needed.

This makes sense. If programs can build reproducibly while varying the
build-path, reproducible builds are easier to create, and that's a good
thing even if not strictly required.

--- David A. Wheeler
Re: Two questions about build-path reproducibility in Debian
On 2024-03-12, Holger Levsen wrote:
> On Mon, Mar 11, 2024 at 06:24:22PM +, James Addison via rb-general wrote:
>> Please find below a draft of the message I'll send to each affected
>> bugreport.
>
> looks good to me, thank you for doing this!
>
>> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
>> continue to test build-path variance, at least until we decide otherwise.
>
> this is in fact a bug and should be fixed with the next reprotest release.

That is not a reprotest bug, but an infrastructure issue for the
debian-specific salsa-ci configuration. Reprotest is not a
debian-specific tool.

Reprotest should continue to vary build paths by default; reprotest
historically and currently defaults to enabling all variations and
making an exception does not seem worth the opinionated change of
behavior. By design, reprotest is easy to configure which variations to
enable and disable as needed.

live well,
  vagrant
Re: Two questions about build-path reproducibility in Debian
On Mon, Mar 11, 2024 at 06:24:22PM +, James Addison via rb-general wrote:
> Please find below a draft of the message I'll send to each affected bugreport.

looks good to me, thank you for doing this!

> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
> continue to test build-path variance, at least until we decide otherwise.

this is in fact a bug and should be fixed with the next reprotest release.

-- 
cheers,
        Holger

holger@(debian|reproducible-builds|layer-acht).org
OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C

Historians have a word for Germans who joined the Nazi party, not because
they hated Jews, but out of hope for restored patriotism, or a sense of
economic anxiety, or a hope to preserve their religious values, or dislike
of their opponents, or raw political opportunism, or convenience, or
ignorance, or greed. That word is "Nazi". Nobody cares about their motives
anymore.
Re: Two questions about build-path reproducibility in Debian
Hi folks,

On Wed, 6 Mar 2024 at 01:04, James Addison wrote:
> [ ... snip ...]
>
> The Debian bug severity descriptions[1] provide some more nuance, and that
> reassures me that wishlist should be appropriate for most of these bugs
> (although I'll inspect their contents before making any changes).

Please find below a draft of the message I'll send to each affected bugreport.

Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
continue to test build-path variance, at least until we decide otherwise.

--- BEGIN DRAFT ---
Because Debian builds packages from a fixed build path, customized build paths
are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or during
package builds on the Reproducible Builds team's package test infrastructure
for Debian[1].

This means that this package will pass current reproducibility tests; however
we still believe that source code and/or build steps embed the build path into
binary package output, making it more difficult than necessary for independent
consumers to confirm whether their local compilations produce identical binary
artifacts.

As a result, this bugreport will remain open and be assigned the 'wishlist'
severity[2].

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html

[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---
Re: Two questions about build-path reproducibility in Debian
On Wed, 2024-03-06 at 14:57 +, Holger Levsen wrote:
> On Tue, Mar 05, 2024 at 11:51:16PM +, Richard Purdie wrote:
> > FWIW Yocto Project is a strong believer in build reproducibility
> > independent of build path and we've been quietly chipping away at
> > those issues.
> [...]
> > OpenEmbedded-Core (around 1000 pieces of software) is 100%
> > reproducible and we have the tests to prove it running daily,
> > building in different build paths and comparing the output.
>
> that's awesome!
>
> btw, https://www.yoctoproject.org/reproducible-build-results/ (linked
> from https://reproducible-builds.org/who/projects/#Yocto%20Project)
> doesn't show any results?

We made changes to the website and that stopped working. We had noticed
and raised it with the website people but I've used your question to
encourage them to get it fixed :).

It now shows "36754 out of 36754 (100.00%) packages tested were
reproducible" :)

The tests were always running, the webpage was just broken.

> > We're working on our wider layers too, e.g. meta-openembedded has
> > another 2000+ pieces of software and less than 100 are not
> > reproducible.
>
> nice.
>
> we had 35000 pieces of software in Debian of which ~2000 were not
> reproducible with nondeterministic build paths. Now with build paths
> as part of the build environment it's less than half.

Very nice too! :)

FWIW we've made reproducibility an unconditional thing in our
configuration and processes now so everyone sees the common errors and
we're all using the same build command lines and so on.

The idea behind getting meta-openembedded tested was to ensure (and
demonstrate) our tools and tests could be used against arbitrary layers
which should encourage people to test their own software. Lots of small
steps which should help the overall ecosystem and goal.

Cheers,

Richard
Re: Two questions about build-path reproducibility in Debian
Thank you, Vagrant, for taking my concerns seriously. I realize you've
been working on this much longer than I have, so I appreciate your
perspective.

On 3/6/24 10:55 AM, Vagrant Cascadian wrote:
> That means that we do not always support each other in all things, but
> we can support each other in most things, and that seems more important
> to me, at least in this case.

That is the crux of the issue for me. Until now, I thought we were rather
united in our goal, and we did support one another. Now our goals are
fragmenting, and I think that makes us weaker.

I thought our common goal, no matter our project affiliation, was to
enable reproducible builds everywhere: not just Debian, not just Linux
distributions, but also F-Droid, Flatpak, AppImage, Snap packages,
different build farms -- anywhere open-source software is built. That
requires us to work together to eliminate all sources of nondeterminism.

My fear is that, by fragmenting our goals and losing the critical support
of Debian, the rest of us may never get to that common goal.

John
Re: Two questions about build-path reproducibility in Debian
On 2024-03-05, John Neffenger wrote:
> On 3/5/24 2:11 PM, Vagrant Cascadian wrote:
>>> I have no way to change these choices.
>>
>> Then clearly you have not been provided sufficient information,
>> configuration, software, etc. in order to reproduce the build!
>
> Rather, I really can't change it or configure it any differently.
>
> Three builds:
>
> (1) A build on Launchpad submitted from their webpage uses this path:
>
>    /build/openjfx/parts/jfx/build/
>
> (2) A remote build on Launchpad submitted locally with this command:
>
>    $ snapcraft remote-build
>
> uses this path:
>
>    /build/snapcraft-openjfx-64b793849f913c7228cd17db40a05187/parts/jfx/build/
>
> (3) And a build run entirely local with this command:
>
>    $ snapcraft
>
> uses this path:
>
>    /root/parts/jfx/build/
>
> What am I to do?

Well, to state the obvious in this case, yes, you either need to fix your
tooling to support some mechanism to provide a consistent build path,
switch to different tooling that already supports a consistent build
path, or fix this particular software to build reproducibly regardless of
build paths. Each approach has different advantages and disadvantages.

>> That was a fundamentally different issue about having builds not produce
>> bit-for-bit identical results still meeting some sort of reproducible
>> criterion, as opposed to this discussion, which is, as I see it, about
>> normalizing the path in which the build is performed in order to get
>> bit-for-bit identical results.
>
> I understand and recognize the difference you highlight between this
> discussion and the previous one. Yet I would hesitate to call it
> fundamental for the reasons below.
>
> The main reason people didn't want to relax any requirements back in
> October 2022 is because then the pressure is off -- it removes our
> leverage. If you lower our standards, we may never get the upstream
> projects to the goal we really want: fully reproducible builds
> independent of these random differences.

I guess we differ on the "main reason" ... both of us having participated
in that discussion. :)

I agree that higher standards are in general better, but I am more
concerned with the outcome than this particular issue regarding build
paths.

That said, I am very glad to hear there are projects actively working on
fixing build path issues!

I argued time and time again in favor of continuing to test build paths
in Debian, largely because some commonly used debian tooling still varies
build paths out of the box; I have filed dozens of build path related bugs
and marked hundreds of packages affected by build paths, pushed for
related changes in core packaging tooling in Debian (e.g. dpkg, debhelper)
to fix build path issues... but I also see the pragmatic reasons why it is
tolerable, if not ideal, to just use consistent build paths.

> It has sometimes taken me years(!) to get a single reproducible builds
> pull request accepted.

Likewise. Which...

> If they find out they can be "reproducible" without some of these
> bothersome changes, it just makes my job that much more difficult.

... is why some people might want to prioritize which issues they want to
spend their time on.

We always have to pick our battles, and allow others to pick their
battles. That means that we do not always support each other in all
things, but we can support each other in most things, and that seems more
important to me, at least in this case.

> I'll make the same argument I made over a year ago:
>
> Reproducible builds is about /blasting/ away all the useless,
> meaningless differences: the timestamps of files created during the
> build, the unsorted order of files in their directories, or the random
> build paths used in a transient container. When the useless differences
> are removed, the meaningful differences can be found.

That is certainly one angle on it, and a good one!

Yet, the Reproducible Builds Definition is more flexible. It gives room
for individual projects to focus on their own priorities, while requiring
sticking to bit-for-bit reproducibility.

live well,
  vagrant
Re: Two questions about build-path reproducibility in Debian
On 2024-03-05, John Gilmore wrote:
> A quick note:
> Vagrant Cascadian wrote:
>> It would be pretty impractical, at least for Debian tests, to test
>> without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
>> debian/changelog for quite a few years now.
>
> Making a small patch to the local dpkg to alter or remove the value of
> SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
> using that version of dpkg, would tell you which of them (newly) fail to
> reproduce because they depend on SOURCE_DATE_EPOCH.

Sure... which brings us to...

>> Sounds like an interesting project for someone with significant spare
>> time and computing resources to take on!
>
> It looks to me like the whole Ubuntu source code (that gets into the
> standard release) fits in about 25 GB. The Debian 12.0.0 release
> sources fit in 83GB (19 DVD images). Both of these are under 1% of a
> 10TB disk drive that runs about $200. A recent Ryzen mini-desktop,
> with a 0.5TB SSD that could cache it all, costs about $300. Is this
> significant computing resources? For another $40 we could add a better
> heat sink and a USB fan. How many days would recompiling a whole
> release take on this $540 worth of hardware?

You also notably left out RAM requirements, which is almost more
important than CPU, from what I've seen!

You were not talking about a single pass through the archive, you asked
for a combinatorially explosive comparison (e.g. with and without build
paths, with and without SOURCE_DATE_EPOCH, with and without locale
differences, with and without username variations, etc.) ... and for it
to continue to be useful, you'd have to keep doing it... indefinitely.

Debian currently tests over 25 variations (most of which have actually
resulted in differences in the wild):

  https://tests.reproducible-builds.org/debian/index_variations.html

To systematically identify these "simply" through building each possible
combination for any significant set of software... is a much larger
task. Obviously, you could narrow it to only the set of variations you
want to research, or for a limited package set.

At least for Debian, with what I would guess is significantly more
computing power than you've described, we usually did no better than
about 30 days from the oldest build, meaning some packages were always
behind. We also blacklist some packages that just take too much RAM, disk
or time, though that is considerably less than 1% of ~35k packages. More
importantly, that is with only two builds per package, not testing all
625 permutations of 25 interacting variations per package.

> (I agree that the "spare" time to set it up and configure the build
> would be the hard part. This is why I advocate for writing and
> releasing, directly in the source release DVDs, the tools that would
> automate the recompilation and binary comparison. The end user should
> be able to boot the matching binary release DVD, download or copy in the
> source DVD images, and type "reproduce-release".)

Automation can help significantly, although at some point you need to
write all that automation, write the code that processes the results
meaningfully, and verify that it is working correctly... and continue to
verify it as new package versions come in, and so on.

In short, easier said than done?

live well,
  vagrant
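To put rough numbers on the "combinatorially explosive" point in the
message above, here is a small arithmetic sketch. It assumes exactly 25
variations (the message says "over 25") and assumes each one is an
independent on/off toggle; both are simplifications made for illustration
and are not taken from the Debian test infrastructure. Under those
assumptions, 625 is the number of ordered pairs of variations, while the
number of full on/off combinations is far larger.

```python
from math import comb

# Assumed count, taken loosely from "over 25 variations" in the message.
VARIATIONS = 25

# Ordered pairs of variations (25 * 25): one way to arrive at 625.
ordered_pairs = VARIATIONS * VARIATIONS

# Unordered pairs of two distinct variations.
distinct_pairs = comb(VARIATIONS, 2)

# Every on/off combination, if each variation is an independent toggle.
on_off_combinations = 2 ** VARIATIONS

print("ordered pairs:      ", ordered_pairs)
print("distinct pairs:     ", distinct_pairs)
print("on/off combinations:", on_off_combinations)
```

Either way, the count is far beyond the two builds per package that the
message describes as the current practice.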
Re: Two questions about build-path reproducibility in Debian
On Tue, Mar 05, 2024 at 11:51:16PM +, Richard Purdie wrote:
> FWIW Yocto Project is a strong believer in build reproducibility
> independent of build path and we've been quietly chipping away at those
> issues.
[...]
> OpenEmbedded-Core (around 1000 pieces of software) is 100% reproducible
> and we have the tests to prove it running daily, building in different
> build paths and comparing the output.

that's awesome!

btw, https://www.yoctoproject.org/reproducible-build-results/ (linked
from https://reproducible-builds.org/who/projects/#Yocto%20Project)
doesn't show any results?

> We're working on our wider layers too, e.g. meta-openembedded has
> another 2000+ pieces of software and less than 100 are not
> reproducible.

nice.

we had 35000 pieces of software in Debian of which ~2000 were not
reproducible with nondeterministic build paths. Now with build paths
as part of the build environment it's less than half.

> So even if debian doesn't do this, there is interest elsewhere and I
> believe good progress is being made.

nice!

-- 
cheers,
        Holger

holger@(debian|reproducible-builds|layer-acht).org
OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C

Living in a privileged region and as a member of a generation that is
probably better off than any before or after it, and that has plundered
the resources of our earth to an unprecedented degree.
Re: Two questions about build-path reproducibility in Debian
Hi Vagrant,

Narrowing in on (or perhaps nitpicking) a detail:

On Mon, 4 Mar 2024 at 20:41, Vagrant Cascadian wrote:
>
> On 2024-03-04, John Gilmore wrote:
> > Vagrant Cascadian wrote:
> >> > > to make it easier to debug other issues, although deprioritizing them
> >> > > makes sense, given buildd.debian.org now normalizes them.
> >
> > James Addison via rb-general wrote:
> >> Ok, thank you both. A number of these bugs are currently recorded at
> >> severity level 'normal'; unless told not to, I'll spend some time to
> >> double-check their details and - assuming all looks OK - will bulk
> >> downgrade them to 'wishlist' severity a week or so from now.
>
> Well, I think we should change it to "minor" rather than "wishlist"
> severity, but that may be splitting hairs; I do not find a huge amount
> of difference between debian bug severities... they are pretty much
> either critical/serious/grave and thus must be fixed, or
> normal/minor/wishlist and fixed when someone feels like it.

The Debian bug severity descriptions[1] provide some more nuance, and that
reassures me that wishlist should be appropriate for most of these bugs
(although I'll inspect their contents before making any changes).

Regards,
James

[1] - https://www.debian.org/Bugs/Developer#severities
Re: Two questions about build-path reproducibility in Debian
Thanks, everyone, for your contributions to this discussion.

A quick note:

Vagrant Cascadian wrote:
> It would be pretty impractical, at least for Debian tests, to test
> without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
> debian/changelog for quite a few years now.

Making a small patch to the local dpkg to alter or remove the value of
SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
using that version of dpkg, would tell you which of them (newly) fail to
reproduce because they depend on SOURCE_DATE_EPOCH.

> Sounds like an interesting project for someone with significant spare
> time and computing resources to take on!

It looks to me like the whole Ubuntu source code (that gets into the
standard release) fits in about 25 GB. The Debian 12.0.0 release
sources fit in 83GB (19 DVD images). Both of these are under 1% of a
10TB disk drive that runs about $200. A recent Ryzen mini-desktop,
with a 0.5TB SSD that could cache it all, costs about $300. Is this
significant computing resources? For another $40 we could add a better
heat sink and a USB fan. How many days would recompiling a whole
release take on this $540 worth of hardware?

(I agree that the "spare" time to set it up and configure the build
would be the hard part. This is why I advocate for writing and
releasing, directly in the source release DVDs, the tools that would
automate the recompilation and binary comparison. The end user should
be able to boot the matching binary release DVD, download or copy in the
source DVD images, and type "reproduce-release".)

John
Re: Two questions about build-path reproducibility in Debian
On Tue, 2024-03-05 at 08:08 -0800, John Gilmore wrote:
> > > But today, if you're building an executable for others, it's common to
> > > build using a container/chroot or similar that makes it easy to
> > > implement "must compile with these paths", while *fixing* this is
> > > often a lot of work.
>
> I know that my opinion is not popular, but let me try again before we lay
> this decision to rest.
>
> In avoiding fixing directory dependencies, you can move the complexity
> around, but in doing so you didn't reduce the complexity.

FWIW Yocto Project is a strong believer in build reproducibility
independent of build path and we've been quietly chipping away at those
issues.

There are issues we resolve by using carefully selected compiler options
or environment variables like SOURCE_DATE_EPOCH but also things we do
highlight to upstreams and ask if they'd mind improving them. In general
once they're aware of the issues, they do try and help. We have
identified several regressions in rust in that regard in the last few
versions for example and also helped test fixes.

OpenEmbedded-Core (around 1000 pieces of software) is 100% reproducible
and we have the tests to prove it running daily, building in different
build paths and comparing the output.

We're working on our wider layers too, e.g. meta-openembedded has
another 2000+ pieces of software and less than 100 are not reproducible.

So even if debian doesn't do this, there is interest elsewhere and I
believe good progress is being made.

Cheers,

Richard
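A test of the kind described above (build in two different paths, then
compare the output) could be sketched roughly as follows. This is only an
illustration and not the Yocto Project's actual test code: the function
names tree_digests and differing_files are made up for this example, and
it assumes each build has left its output in its own directory tree.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """Return relative paths whose content differs or that exist in only one tree."""
    a = tree_digests(build_a)
    b = tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from differing_files(first_output, second_output) would mean
the two trees are bit-for-bit identical; any entries point at the files
worth inspecting with a tool such as diffoscope.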
Re: Two questions about build-path reproducibility in Debian
On 3/5/24 2:11 PM, Vagrant Cascadian wrote: I have no way to change these choices. Then clearly you have not been provided sufficient information, configuration, software, etc. in order to reproduce the build! Rather, I really can't change it or configure it any differently. Three builds: (1) A build on Launchpad submitted from their webpage uses this path: /build/openjfx/parts/jfx/build/ (2) A remote build on Launchpad submitted locally with this command: $ snapcraft remote-build uses this path: /build/snapcraft-openjfx-64b793849f913c7228cd17db40a05187/parts/jfx/build/ (3) And a build run entirely local with this command: $ snapcraft uses this path: /root/parts/jfx/build/ What am I to do? That was a fundamentally different issue about having builds not produce bit-for-bit identical results still meeting some sort of reproducible criterion, as opposed to this discussion is, as I see it, about normalizing the path in which the build is performed in order to get bit-for-bit identical results. I understand and recognize the difference you highlight between this discussion and the previous one. Yet I would hesitate to call it fundamental for the reasons below. The main reason people didn't want to relax any requirements back in October 2022 is because then the pressure is off -- it removes our leverage. If you lower our standards, we may never get the upstream projects to the goal we really want: fully reproducible builds independent of these random differences. It has sometimes taken me years(!) to get a single reproducible builds pull request accepted. If they find out they can be "reproducible" without some of these bothersome changes, it just makes my job that much more difficult. 
I'll make the same argument I made over a year ago: Reproducible builds is about /blasting/ away all the useless, meaningless differences: the timestamps of files created during the build, the unsorted order of files in their directories, or the random build paths used in a transient container. When the useless differences are removed, the meaningful differences can be found. John
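To make "blasting away" those meaningless differences concrete, here is a small Python sketch that packs a directory into a tar archive while normalizing file order, timestamps and ownership. It is an illustration only: the function name is hypothetical, it handles regular files only (empty directories are dropped, symlinks are not treated specially), and the mode normalization rule is an assumption chosen for simplicity.

```python
import io
import os
import tarfile


def deterministic_tar(src_dir, mtime=0):
    """Return the bytes of an uncompressed tar of src_dir with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            dirnames.sort()  # visit subdirectories in a fixed order
            for name in sorted(filenames):  # do not rely on directory order
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir so the build path is not recorded.
                arcname = os.path.relpath(full, src_dir)
                info = tar.gettarinfo(full, arcname=arcname)
                info.mtime = mtime  # drop the real modification time
                info.uid = info.gid = 0  # drop the builder's numeric identity
                info.uname = info.gname = ""  # drop the builder's user/group names
                # Keep only "executable or not"; ignore umask-dependent bits.
                info.mode = 0o755 if info.mode & 0o100 else 0o644
                with open(full, "rb") as f:
                    tar.addfile(info, f)
    return buf.getvalue()
```

Two directories with the same file contents should then yield identical archive bytes regardless of where they live, when the files were written, or who wrote them. In a real build, `mtime` would typically come from SOURCE_DATE_EPOCH rather than being fixed at zero.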
Re: Two questions about build-path reproducibility in Debian
On 2024-03-05, John Gilmore wrote: ... it makes reproducibilty from around 80-85% of all packages to >95%, IOW with this shortcut we can have meaningful reproducibility *many years* sooner, than without. ... > I'd rather that we knew and documented that 57% of > packages are absolutely reproducible, 23% require SOURCE_DATE_EPOCH, and > 12% still require a standardized source code directory, than to claim > all 95% are "meaningfully reproducible" today. Sounds like an interesting project for someone with significant spare time and computing resources to take on! I take "meaningfully reproducible" to mean it is documented how to produce bit-for-bit identical results. In some cases, this requires metadata (e.g. Debian .buildinfo file) that you need to reproduce the build environment, and in some cases, this means you use the standard build tool for the distribution (e.g. nix or guix). Those numbers Holger mentioned came about because we historically had a compromise where, on tests.reproducible-builds.org, our tests of Debian testing did not vary the build path and our tests of Debian unstable did vary the build path, and the difference mostly held at about 10-15% over the years. In Debian, the build path is usually included in the .buildinfo file (at least for builds produced by Debian), which describes the packages and dependencies and various things about the build environment necessary to reproduce the build. It would be pretty impractical, at least for Debian tests, to test without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from debian/changelog for quite a few years now. Unless you want to test reproducibility of antique Debian releases... live well, vagrant
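As a reminder of what "requiring SOURCE_DATE_EPOCH" means for a build tool, here is a minimal Python sketch of the consuming side: use the variable when it is set, and fall back to the current time otherwise. The function name is hypothetical; the pattern itself follows the SOURCE_DATE_EPOCH convention of an integer number of seconds since the Unix epoch, interpreted in UTC.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the build date as an aware UTC datetime, honoring SOURCE_DATE_EPOCH if set."""
    # Fall back to the wall clock only when no reproducible timestamp is provided.
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    # Always format in UTC so the builder's local timezone cannot leak into the output.
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

A tool that embeds `build_timestamp()` instead of the current time produces the same date string on every rebuild, as long as the environment supplies the same SOURCE_DATE_EPOCH value.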
Re: Two questions about build-path reproducibility in Debian
On 2024-03-05, John Neffenger wrote: > On 3/5/24 8:08 AM, John Gilmore wrote: >> Our instructions for reproducing any package would have to identify what >> container/chroot/namespace/whatever the end-user must set up to be able >> to successfully reproduce a package. The build instructions always have to identify what defines the build environment, and exactly what that includes may differ from project to project. > And even then, it won't always work. > > I need to verify the JavaFX builds done by Launchpad, for example, where > its LXD container uses a build path as follows: > > /build/openjfx/parts/jfx/build/ > > When I run the same build locally using the same command and a local LXD > container, it uses a build path as follows and fails to be reproducible: > > /root/parts/jfx/build/ > > I have no way to change these choices. Then clearly you have not been provided sufficient information, configuration, software, etc. in order to reproduce the build! > I intend to fix this reproducibility bug, and I shouldn't get away > with not fixing it! > JDK-8307082: Build path is recorded in JavaFX Controls module > https://bugs.openjdk.org/browse/JDK-8307082 Great, please do, we will all be better off for it having been fixed! >> If we move the goal posts in order to claim victory, who are we fooling >> but ourselves? There are no moving goalposts, as the goal has always been to be able to independently verify bit-for-bit results. Maybe we take the bike path to get there, maybe we go by train or hovercraft or jetpack. Some ways might be easier or more expensive or have other consequences or downsides, but as long as the destination is bit-for-bit reproducible... so be it, if it does the job. Normalized build environments have been a technique to achieve reproducible builds, even going back to the early work in Bitcoin and Tor over a decade ago, and are used by various projects today to achieve bit-for-bit identical reproducible builds. 
While making software immune to various forms of non-determinism in the build environment is certainly preferred, it is not the one and only true way to achieve bit-for-bit identical results that are able to be independently verified. https://reproducible-builds.org/docs/definition/ "A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. The relevant attributes of the build environment, the build instructions and the source code as well as the expected reproducible artifacts are defined by the authors or distributors. The artifacts of a build are the parts of the build results that are the desired primary output." The authors or distributors may choose to include build paths or various other things (e.g. LANG=C, LC_ALL=C.UTF-8) as part of their build instructions. So yes, the fewer rube-goldbergian contraptions you need to set up in order to produce the build environment, the better, surely! It is technically possible to get an exact matching clock across two builds for matching timestamps, but it is non-trivial and I would say unreasonable (but very interesting academic work!). Given current technologies, it would be unreasonable to expect setting the quark spin state of particles involved in the build process... some things are relatively easy to normalize. Builds that are independent of the build path are better for reproducible builds and even for other reasons (e.g. storing the build path takes a few extra "useless" bits). With most software, it is possible to get it to build independent of the build path. With some software, it is unfortunately more difficult. > I agree completely. 
In fact, almost all of us agreed completely on this > issue in October 2022: > > Give guidance on reproducible builds #1865 > https://github.com/coreinfrastructure/best-practices-badge/issues/1865 > > Why is this coming up again as if we've forgotten all those arguments > against it? That was a fundamentally different issue about having builds not produce bit-for-bit identical results still meeting some sort of reproducible criterion, as opposed to this discussion is, as I see it, about normalizing the path in which the build is performed in order to get bit-for-bit identical results. live well, vagrant
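One quick, rough way to see whether a build "stores the build path" is to search the produced artifacts for the literal path. The Python sketch below does that; the function name is hypothetical, and the check is an approximation: it only finds the path when it appears as plain UTF-8 bytes, so anything inside compressed containers (for example .jar, .gz or .deb files) would need to be unpacked first.

```python
from pathlib import Path


def files_embedding_path(artifact_dir, build_path):
    """Return sorted relative paths of files whose bytes contain build_path."""
    needle = str(build_path).encode("utf-8")
    root = Path(artifact_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and needle in p.read_bytes()
    )
```

For instance, calling `files_embedding_path("output", "/root/parts/jfx/build/")` on an unpacked build output (where "output" is a placeholder directory name) lists the files that would change if the same build were run under a different directory.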
Re: Two questions about build-path reproducibility in Debian
On 3/5/24 8:08 AM, John Gilmore wrote:
> Our instructions for reproducing any package would have to identify what container/chroot/namespace/whatever the end-user must set up to be able to successfully reproduce a package.

And even then, it won't always work.

I need to verify the JavaFX builds done by Launchpad, for example, where its LXD container uses a build path as follows:

/build/openjfx/parts/jfx/build/

When I run the same build locally using the same command and a local LXD container, it uses a build path as follows and fails to be reproducible:

/root/parts/jfx/build/

I have no way to change these choices. I intend to fix this reproducibility bug, and I shouldn't get away with not fixing it!

JDK-8307082: Build path is recorded in JavaFX Controls module
https://bugs.openjdk.org/browse/JDK-8307082

> If we move the goal posts in order to claim victory, who are we fooling but ourselves?

I agree completely. In fact, almost all of us agreed completely on this issue in October 2022:

Give guidance on reproducible builds #1865
https://github.com/coreinfrastructure/best-practices-badge/issues/1865

Why is this coming up again as if we've forgotten all those arguments against it?

John
Re: Two questions about build-path reproducibility in Debian
>> But today, if you're building an executable for others, it's common to build >> using a >> container/chroot or similar that makes it easy to implement "must compile >> with these paths", >> while *fixing* this is often a lot of work. I know that my opinion is not popular, but let me try again before we lay this decision to rest. In avoiding fixing directory dependencies, you can move the complexity around, but in doing so you didn't reduce the complexity. Our instructions for reproducing any package would have to identify what container/chroot/namespace/whatever the end-user must set up to be able to successfully reproduce a package. Will these be the same for every package, for every distro, and for every other environment in which we want to inspire reproducibility? Do we need to add those constraints to the Linux Foundation's Filesystem Hierarchy Standard? Do we need to add them to the buildinfo files? Ideally the tools that ordinary people traditionally use to reproduce one, such as dpkg-buildpackage or rpmbuild, will have been improved to do the container/chroot setup automatically. Otherwise, naive users will have to figure out what a container is or why it is necessary for them to grok this obscure environmental thing in order to tell if their binary package was tampered with or not. Will they always have to build software as root, because chroot doesn't and can't work for ordinary users? If we punt this, there will be an ongoing flow of "my package doesn't build to the same binary, somebody must be 0wning me" emails from people who do the obvious thing like type "make" and "cmp". Do we want successful reproducibility to depend on setting up servers and virtual machines and web-servers and databases and build farms and CI-queues and such? Yes, to reproduce a whole distro, reproducibility has to WORK there, but does it have to DEPEND on that complex infrastructure? I'm an old Unix guy and so are millions of end-users and sysadmins. 
Containers are a recent Linux thing. Namespaces ditto. I still have never found a use for containers; I tried using Docker for something and was bemused to discover that it could calculate all kinds of stuff, but none of the output of the calculation could come back into my ordinary Linux filesystem (without some kind of obscure per-invocation JCL-like configuration setup), so I stopped trying to use it. Another time, I tried booting an on-disk, installed copy of Ubuntu inside a virtual machine, so I could keep running an older service that's hard to port forward, while migrating the rest of my machine to a newer Ubuntu release. VM/360 could do that decades ago, but I discovered that that use-case is not well supported in the Linux vm tools and documentation, so I gave up on that too. There are more things in heaven and earth, Horatio, than spending all of your time doing sysadmin. These newfangled tools are just not as well rounded as the stuff that's been well understood in Unix since the 1970s or 1980s, like "directories". If only seventeen experts in the world can figure out if a package has been tampered with, we will have labored mightily but not done much to improve computer security. Also recall what pains the full-source bootstrap people are having to go through after some imho foolish decisions were made about depending on modern C++ features inside core tools like gcc and gdb. Reproducible builds should make the underlying software LESS dependent on the particular configuration of the build environment; that's kind of the point. >>> ... it makes reproducibilty from around 80-85% of all >>> packages to >95%, IOW with this shortcut we can have meaningful >>> reproducibility >>> *many years* sooner, than without. If we move the goal posts in order to claim victory, who are we fooling but ourselves? 
I'd rather that we knew and documented that 57% of packages are absolutely reproducible, 23% require SOURCE_DATE_EPOCH, and 12% still require a standardized source code directory, than to claim all 95% are "meaningfully reproducible" today. John
Re: Two questions about build-path reproducibility in Debian
On 3/4/24 22:25, David A. Wheeler via rb-general wrote:
> On Mar 4, 2024, at 3:37 PM, Holger Levsen wrote:
>> On Mon, Mar 04, 2024 at 11:52:07AM -0800, John Gilmore wrote:
>>> Why would these become "wishlist" bugs as opposed to actual reproducibility bugs that deserve fixing, just because one server at Debian no longer invokes this bug because it always uses the same build directory?
>> because it's "not one server at Debian" but what many ecosystems do: build in an deterministic path (eg /$pkg/$version or whatever) or record the path as part of the build environment, to have it deterministic as well.
>> in the distant past, before namespacing become popular, using a random path was a solution to allow parallel builds of the same software & version.
>> and yes, this is a shortcut and a tradeoff, similar to demanding to build in a certain locale. also it makes reproducibilty from around 80-85% of all packages to >95%, IOW with this shortcut we can have meaningful reproducibility *many years* sooner, than without.
>> and I'd really rather like to see Debian 100% reproducible in 2030, than in 2038. and some subsets today, or much sooner.
> I agree with Holger (and Vagrant). It'd be *nice* if a build was reproducible regardless of the directory used to build it. But today, if you're building an executable for others, it's common to build using a container/chroot or similar that makes it easy to implement "must compile with these paths", while *fixing* this is often a lot of work.
> I suggest focusing on ensuring everyone knows what the executable files contain, first. if people can add more flexibility to their build process, all the better, but that added flexibility comes at a cost of time and effort that is NOT as important.
> --- David A. Wheeler

Yet another +1 "hear, hear!" to this. Flexibility is desirable. Determinism even without maximal flexibility should still get the main thrust, and it is _not_ sufficiently solved yet in many situations and many pieces of software.
Re: Two questions about build-path reproducibility in Debian
> On Mar 4, 2024, at 3:37 PM, Holger Levsen wrote: > > On Mon, Mar 04, 2024 at 11:52:07AM -0800, John Gilmore wrote: >> Why would these become "wishlist" bugs as opposed to actual reproducibility >> bugs >> that deserve fixing, just because one server at Debian no longer invokes this >> bug because it always uses the same build directory? > > because it's "not one server at Debian" but what many ecosystems do: build in > an > deterministic path (eg /$pkg/$version or whatever) or record the path as part > of the build environment, to have it deterministic as well. > > in the distant past, before namespacing become popular, using a random path > was a solution to allow parallel builds of the same software & version. > > and yes, this is a shortcut and a tradeoff, similar to demanding to build > in a certain locale. also it makes reproducibilty from around 80-85% of all > packages to >95%, IOW with this shortcut we can have meaningful > reproducibility > *many years* sooner, than without. > > and I'd really rather like to see Debian 100% reproducible in 2030, than in > 2038. > and some subsets today, or much sooner. I agree with Holger (and Vagrant). It'd be *nice* if a build was reproducible regardless of the directory used to build it. But today, if you're building an executable for others, it's common to build using a container/chroot or similar that makes it easy to implement "must compile with these paths", while *fixing* this is often a lot of work. I suggest focusing on ensuring everyone knows what the executable files contain, first. if people can add more flexibility to their build process, all the better, but that added flexibility comes at a cost of time and effort that is NOT as important. --- David A. Wheeler
Re: Two questions about build-path reproducibility in Debian
On 2024-03-04, John Gilmore wrote: > Vagrant Cascadian wrote: >> > > to make it easier to debug other issues, although deprioritizing them >> > > makes sense, given buildd.debian.org now normalizes them. > > James Addison via rb-general wrote: >> Ok, thank you both. A number of these bugs are currently recorded at >> severity >> level 'normal'; unless told not to, I'll spend some time to double-check >> their >> details and - assuming all looks OK - will bulk downgrade them to 'wishlist' >> severity a week or so from now. Well, I think we should change it to "minor" rather than "wishlist" severity, but that may be splitting hairs; I do not find a huge amount of difference between Debian bug severities... they are pretty much either critical/serious/grave and thus must be fixed, or normal/minor/wishlist and fixed when someone feels like it. > I may be confused about this. These bug reports are that a package cannot > be reproducibly built because its output binary depends on the directory in > which > it was built? > > Why would these become "wishlist" bugs as opposed to actual reproducibility > bugs > that deserve fixing, just because one server at Debian no longer invokes this > bug because it always uses the same build directory? > > If an end user can't download a source package (into any directory on > any machine), and build it into the same exact binary as the one that Debian > ships, this is not a "wishlist" idea for some future enhancement. This > is a real issue that prevents the code from being reproducible. I agree it is a real issue, but admit it is fairly easy to work around; given that most package building tools use chroots or containers or similar, it seems acceptable to treat build paths as a lower priority. Compared to timestamps, where forcing builds to use the exact same clock moving at the exact same rate is non-trivial, I would say build path normalization is quite tolerable, if not ideal. 
You cannot just build on "any machine", the machine needs to have a sufficiently similar build environment (e.g. exactly matching compiler versions, same architecture, etc.) and whether the build path is part of that or not is simply a decision to make. Several (many?) other distros normalize the build path as part of their standard build tooling; Debian is arguably a latecomer to that practice. I have definitely argued in favor of addressing build path issues, and encourage people to fix them, and have personally spent more than a small amount of time working on it, and we have made huge progress on fixing (tens of?) thousands of them. There are only so many hours in the day and so many people actively working on fixing things... there may be bigger fires to put out at the moment. live well, vagrant
Re: Two questions about build-path reproducibility in Debian
On Mon, Mar 04, 2024 at 11:52:07AM -0800, John Gilmore wrote: > Why would these become "wishlist" bugs as opposed to actual reproducibility > bugs > that deserve fixing, just because one server at Debian no longer invokes this > bug because it always uses the same build directory? because it's "not one server at Debian" but what many ecosystems do: build in an deterministic path (eg /$pkg/$version or whatever) or record the path as part of the build environment, to have it deterministic as well. in the distant past, before namespacing become popular, using a random path was a solution to allow parallel builds of the same software & version. and yes, this is a shortcut and a tradeoff, similar to demanding to build in a certain locale. also it makes reproducibilty from around 80-85% of all packages to >95%, IOW with this shortcut we can have meaningful reproducibility *many years* sooner, than without. and I'd really rather like to see Debian 100% reproducible in 2030, than in 2038. and some subsets today, or much sooner. -- cheers, Holger ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C ⠈⠳⣄ Homophobia is a sin against god. signature.asc Description: PGP signature
Re: Two questions about build-path reproducibility in Debian
Vagrant Cascadian wrote: > > > to make it easier to debug other issues, although deprioritizing them > > > makes sense, given buildd.debian.org now normalizes them. James Addison via rb-general wrote: > Ok, thank you both. A number of these bugs are currently recorded at severity > level 'normal'; unless told not to, I'll spend some time to double-check their > details and - assuming all looks OK - will bulk downgrade them to 'wishlist' > severity a week or so from now. I may be confused about this. These bug reports are that a package cannot be reproducibly built because its output binary depends on the directory in which it was built? Why would these become "wishlist" bugs as opposed to actual reproducibility bugs that deserve fixing, just because one server at Debian no longer invokes this bug because it always uses the same build directory? If an end user can't download a source package (into any directory on any machine), and build it into the same exact binary as the one that Debian ships, this is not a "wishlist" idea for some future enhancement. This is a real issue that prevents the code from being reproducible. How am I confused? John
Re: Two questions about build-path reproducibility in Debian
On Wed, 28 Feb 2024 at 12:06, Chris Lamb wrote: > > Vagrant Cascadian wrote: > > > There are real-world build path issues, and while it is possible to work > > around them in various ways, I think they are still issues worth fixing > > to make it easier to debug other issues, although deprioritizing them > > makes sense, given buildd.debian.org now normalizes them. > > +1. > > And for this reason, I think we should keep the buildpath-related > bugs as well. They should all be 'wishlist' priority anyway, and I > wouldn't like to bet my hat that the usertag metadata is accurate and > comprehensive enough to blindly close them in the first place. (We > only really used the usertags to do some rough-and-ready statistics > on broad issue categories.) Ok, thank you both. A number of these bugs are currently recorded at severity level 'normal'; unless told not to, I'll spend some time to double-check their details and - assuming all looks OK - will bulk downgrade them to 'wishlist' severity a week or so from now.
Re: Two questions about build-path reproducibility in Debian
Vagrant Cascadian wrote: > There are real-world build path issues, and while it is possible to work > around them in various ways, I think they are still issues worth fixing > to make it easier to debug other issues, although deprioritizing them > makes sense, given buildd.debian.org now normalizes them. +1. And for this reason, I think we should keep the buildpath-related bugs as well. They should all be 'wishlist' priority anyway, and I wouldn't like to bet my hat that the usertag metadata is accurate and comprehensive enough to blindly close them in the first place. (We only really used the usertags to do some rough-and-ready statistics on broad issue categories.) Best wishes, -- o ⬋ ⬊ Chris Lamb o o reproducible-builds.org 💠 ⬊ ⬋ o
Re: Two questions about build-path reproducibility in Debian
On 2024-02-15, James Addison via rb-general wrote: > A quick recap: in July 2023, Debian's package build infrastructure > (buildd) intentionally began using a fixed directory path during > package builds (bug #1034424). Previously, some string randomness > existed within each source build directory path. > > I've two questions related to buildpaths - one relevant to the > Salsa-CI team, and the other a RB-team housekeeping question: > > 1. [Salsa] Recently Debian's CI pipeline was reconfigured[1] to > enable more variance in builds. However: I think that change also > (inadvertently?) enabled buildpath variation. Is that useful and/or > aligned with Debian package migration incentives[2] -- or should we > disable that buildpath variance? I think it might be worth disabling build path variations by default in salsa-ci, although making it possible for people to override. > 2. [RB] Housekeeping: we use Debian's bugtracker to record packages > with buildpath-related build problems[3]. Do we want to keep those > bugs open, or should we close them? I think the bugs should remain open, but perhaps downgraded to minor or wishlist? While buildd.debian.org does now use a predictable path, sbuild does not by default and requires slightly tricky manual intervention to get the right path; many people still may perform local builds in their home directory; I am not sure if pbuilder now defaults to matching buildd.debian.org, though it is possible to specify the build path (as seen on tests.reproducible-builds.org!); reprotest still uses randomized build paths, although a WIP branch exists: https://salsa.debian.org/reproducible-builds/reprotest/-/merge_requests/22 There are real-world build path issues, and while it is possible to work around them in various ways, I think they are still issues worth fixing to make it easier to debug other issues, although deprioritizing them makes sense, given buildd.debian.org now normalizes them. 
live well, vagrant
Two questions about build-path reproducibility in Debian
Hi folks, A quick recap: in July 2023, Debian's package build infrastructure (buildd) intentionally began using a fixed directory path during package builds (bug #1034424). Previously, some string randomness existed within each source build directory path. I've two questions related to buildpaths - one relevant to the Salsa-CI team, and the other a RB-team housekeeping question: 1. [Salsa] Recently Debian's CI pipeline was reconfigured[1] to enable more variance in builds. However: I think that change also (inadvertently?) enabled buildpath variation. Is that useful and/or aligned with Debian package migration incentives[2] -- or should we disable that buildpath variance? 2. [RB] Housekeeping: we use Debian's bugtracker to record packages with buildpath-related build problems[3]. Do we want to keep those bugs open, or should we close them? Thanks, James [1] - https://salsa.debian.org/salsa-ci-team/pipeline/-/merge_requests/468 [2] - "Reproducibility migration policy" @ https://lists.debian.org/debian-devel-announce/2023/12/msg3.html [3] - https://udd.debian.org/bugs/?release=any&pending=ign&merged=ign&done=ign&fnewerval=7&flastmodval=7&fusertag=only&fusertagtag=buildpath&fusertaguser=reproducible-builds%40lists.alioth.debian.org&reproducible=1&sortby=id&sorto=asc&format=html#results