Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
The discussion has moved to https://github.com/rpm-software-management/rpm/discussions/2934 and https://github.com/rpm-software-management/rpm/pull/2944, seems to me we can close this somewhat controversial draft. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-2003597396 You are receiving this because you are subscribed to this thread. Message ID: ___ Rpm-maint mailing list Rpm-maint@lists.rpm.org http://lists.rpm.org/mailman/listinfo/rpm-maint
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
Closed #2880. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#event-12149995726
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
> I don't think bumping the changelog for rebuilds is actually important, but I
> do think that this is still the wrong way to solve it, because we're
> presuming that _a rebuild is important_. When rebuilds happen every day for
> whatever reason due to dependency churn, they are no longer important.

We're now straying off the original subject, but I think that this is an interesting issue. The problem is that, especially if the rebuilds are frequent, most rebuilds have almost no effect, but **some** rebuilds are very important. So neither extreme solution is good for the user: if we describe the full history of rebuilds, the user is drowned in a flood of boring details, but if we make all rebuilds opaque, the user has no way to know that a certain rebuild fixes a very important bug.

For example, let's say that the compiler was generating crashing code, and then we rebuild all 800 packages which crash randomly at runtime. For the user, just knowing that foo-11-2 was updated to foo-11-3 is not good enough. They really want an annotation that the update fixes the crash, ideally with a link to a bug number, and metadata in the update saying that it's a high-priority bugfix that requires a full reboot.

When we do the rebuilds manually, we get better metadata. When the rebuilds happen automatically, I'm not aware of any system which would inject information about the reasons for the rebuild. But maybe there should be some mechanism that allows this to happen.

Also, maybe reproducible builds and smart analysis of differences between binary builds can help: e.g. if a rebuild has no effect on the payload bits, except for some version string differences, then the update is "boring" and doesn't need to be described. (And probably doesn't even need to be pushed to users…)

Anyway, this is all future stuff. I think we have more pressing problems to solve ;)

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1980300643
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
I don't think bumping the changelog for rebuilds is actually important, but I do think that this is still the wrong way to solve it, because we're presuming that _a rebuild is important_. When rebuilds happen every day for whatever reason due to dependency churn, they are no longer important. But that doesn't change anything about handling `$SOURCE_DATE_EPOCH`, because the presumption is that the date doesn't matter because you're fixing it to a fake time anyway. So this is solving the wrong problem anyway. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1966407685
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
Thanks @keszybz for the detailed and thoughtful comments, those are very much my sentiments too: sanity dictates there can be only one buildtime for any given build. Anything else is gaming the system. Like I said in my first comment, it is trying to eat and keep the cake. A test-build is a straightforward, reliable and easy to understand way to avoid unnecessary updates. It's not *free* of course, but nothing is. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1966195690
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
Thank you for the detail about pyc, that is important. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1966164234
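For readers unfamiliar with the .pyc detail referred to here, the following is a minimal sketch of how timestamp-based invalidation can be observed from Python. It assumes CPython 3.7 or later and its 16-byte .pyc header (magic, flags, source mtime, source size). The helper names `pyc_matches_source` and `compile_with_timestamp` are made up for illustration; this is not how rpm or any build system performs the check.

```python
import os
import py_compile
import struct


def pyc_matches_source(py_path, pyc_path):
    """Return True if a timestamp-based .pyc still matches its .py file.

    Assumes the CPython 3.7+ header: 4 bytes magic, 4 bytes flags,
    4 bytes source mtime, 4 bytes source size (all little-endian).
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, embedded_mtime, embedded_size = struct.unpack("<III", header[4:16])
    if flags & 0x1:
        # Hash-based .pyc: the mtime is not used for invalidation at all.
        raise ValueError("hash-based .pyc, no timestamp to compare")
    st = os.stat(py_path)
    return (embedded_mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and embedded_size == (st.st_size & 0xFFFFFFFF))


def compile_with_timestamp(py_path, pyc_path):
    """Byte-compile py_path, forcing timestamp-based invalidation."""
    # py_compile defaults to hash-based .pyc files when SOURCE_DATE_EPOCH
    # is set in the environment, so the mode is requested explicitly here.
    py_compile.compile(
        py_path,
        cfile=pyc_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
```

If the .py file is later delivered with a different mtime than the one recorded at compile time, `pyc_matches_source` returns False, which corresponds to the stale-cache case raised in this thread.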
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
I read this proposal when the ticket was initially created, and I didn't find it convincing, and now, a few weeks later, I still don't. This is a very complex solution to a very specific problem. It is very narrowly tailored to the specific details of the build system and package delivery system that you have, and makes very strong assumptions about which mtime should be used where.

Right now we have the concept of the $SOURCE_DATE_EPOCH, and it is a *single* timestamp that says "this is when the build happened", and both the packager and the user can understand it. This means that the same value is used in many places: embedded build metadata, headers and footers in build docs, the changelog, file mtimes of delivered files, etc. With the patch, we have two timestamps, with ad-hoc rules for which to use where. Without considering further technical details, I think this increased complexity would never be worth it.

At a more technical level, I think in particular this would break Python .pyc file caching: Python uses cache invalidation where it compares the mtime on the .py file with the timestamp-of-the-original-py-file embedded in the .pyc file. For packages, the .pyc file is created during build and stored in the package. IIUC, you want to use $OLD_SOURCE_DATE_EPOCH for "build scripts", i.e. .pyc files would embed $OLD_SOURCE_DATE_EPOCH, but actually deliver files with mtimes set to (new) $SOURCE_DATE_EPOCH. This means that Python would consider the .pyc file stale and always try to rebuild it. (If, OTOH, you use the mtimes with $OLD_SOURCE_DATE_EPOCH in delivered files, then the user loses an important property that the mtime of compiled files corresponds to the package build.)

> OpenSUSE does not change the changelog for the rebuild

I think this is a fundamental error. You're doing a rebuild because something *important* changed. When the user gets this newly-built package, they really should know that this is an updated version with the *important* thing. So if there's a %changelog at all, stuff like version bumps and rebuilds for important environment changes are the foremost things to put there. But if the changelog changes, the package is not the same, and in general, I think it doesn't make sense to twist everything to try to avoid updating the package.

I would suggest a different approach: if you have the environment change, do a *test rebuild* with no changelog change and the original $SOURCE_DATE_EPOCH. If the result is bit-for-bit identical, record this in a log somewhere and throw away the build, since the users don't need to update. If the result shows any changes, record a changelog entry and rebuild with the new $SOURCE_DATE_EPOCH and push that out to users.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1966099057
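The test-rebuild approach suggested at the end of this message could be sketched roughly as follows. This is only an illustration under stated assumptions: `build` is a hypothetical callable standing in for a real package build, the "package" is reduced to a byte string, and the function name `decide_rebuild` is not part of rpm or any build system.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 of a build result, used for the bit-for-bit comparison."""
    return hashlib.sha256(payload).hexdigest()


def decide_rebuild(published: bytes, build, original_epoch: int, new_epoch: int):
    """Decide whether an environment change warrants a new package.

    build(epoch, changelog_bumped) -> bytes is a hypothetical stand-in
    for running the real build in the changed environment.
    """
    # Step 1: test rebuild with the original SOURCE_DATE_EPOCH and an
    # unchanged changelog.
    test_result = build(original_epoch, False)
    if digest(test_result) == digest(published):
        # Bit-for-bit identical: log it and throw the build away.
        return ("discard", published)
    # Step 2: something really changed, so add a changelog entry and
    # rebuild with the new SOURCE_DATE_EPOCH for publication.
    return ("publish", build(new_epoch, True))
```

The cost of this scheme is one extra build whenever the output does change, which matches the remark elsewhere in the thread that a test-build is not free.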
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
Since I was tagged in here and for some reason people think I don't care about reproducibility, let me be clear: I do care about it. However, neither Fedora nor openSUSE suffers from the problems Debian has that necessitated reproducible builds, and the nature of the RPM format vs the Debian format means that we do not have the same problems they do with build data influencing the payload reproducibility. Fedora has been so far ahead of Debian on this, and the Koji build system provides guarantees (at the consequence of trade-offs like increased disk usage over time) that neither Debian's system nor OBS provides, so there is less urgency around the issue.

In general, rebuilds should not mutate or influence how reproducible builds behave. I'm confused by the problem you're saying you have: build-compare shouldn't have an issue with SOURCE_DATE_EPOCH being clamped to the changelog, since that's unchanging. The only issue I know of is that if you clamp the buildtime and don't change the Release, you wind up in a situation where it becomes difficult to sort for the newer package. Since OBS changes the Release for every rebuild, this isn't strictly an issue, but openSUSE should not be clamping the buildtime regardless.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1956556571
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
> Steps to reproduce:
>
> * package gcc: version 1.1
> * package hello: BuildDepends gcc; changelog epoch 1; but its build script includes SOURCE_DATE_EPOCH in the binary
> * build package hello
> * update package gcc to version 1.2
> [...]

I get that this is the real-world scenario, but we'd need something much, MUCH simpler for the test-suite.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1926944167
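Purely as an illustration of the quoted scenario, and not as a candidate rpm test case, the sequence can be modelled with a toy build function. Everything here is an assumption made for the sketch: `build_hello` is hypothetical, a "binary" is just a string, and gcc 1.2 and 1.2.1 are assumed to generate identical code because 1.2.1 only changes documentation.

```python
def build_hello(gcc_version, script_epoch, package_epoch):
    """Toy build of package hello, returning (content, mtime).

    script_epoch  is the timestamp the build script embeds in the binary.
    package_epoch is the timestamp used for the mtimes of packaged files.
    """
    # Assumption: gcc 1.2.1 is a documentation-only update, so it
    # generates the same code as gcc 1.2; gcc 1.1 generates different code.
    codegen = {"1.1": "codegen-A", "1.2": "codegen-B", "1.2.1": "codegen-B"}
    content = f"{codegen[gcc_version]}|built-at={script_epoch}"
    return content, package_epoch


# One timestamp for everything (the situation before the proposal).
first = build_hello("1.1", 1, 1)
silent = build_hello("1.2", 1, 1)    # content changed, mtime did not
bumped = build_hello("1.2", 2, 2)    # content and mtime changed
docs = build_hello("1.2.1", 3, 3)    # content changed only via the timestamp

# Two timestamps as proposed: the script sees epoch 1, the package does not.
old_a = build_hello("1.2", 1, 2)
old_b = build_hello("1.2.1", 1, 3)   # same content as old_a, so discardable
```

In this model `silent` differs from `first` in content but not in mtime, `docs` differs from `bumped` only because of the embedded timestamp, and `old_a` and `old_b` carry identical content.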
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
> > If a build e.g. embeds SOURCE_DATE_EPOCH in the output, then the output
> > changes every time such a rebuild happens, which can be very often.
>
> It only changes if you change SOURCE_DATE_EPOCH, and if you took the
> SOURCE_DATE_EPOCH from the changelog then it only changes if you change the
> changelog, and at that point it's no longer the same. By my logic anyhow. It's
> really hard to constructively comment on what you don't understand.

If you do not change SOURCE_DATE_EPOCH, but change some inputs that may change outputs, then tools that depend on the file modification time stamp always increasing when the content changes (like rsync without --checksum) fail. If you do change SOURCE_DATE_EPOCH and the upstream build script embeds it into the output, then the output also always changes.

So if Fedora did the manual changelog bump any time a build dependency changes, and set SOURCE_DATE_EPOCH (and thus the build time) from the changelog, then it would have the same problem. They could use the OLD_SOURCE_DATE_EPOCH mechanism in this PR to fix it. But the information to set it is currently not in the changes file, as the association of version and revision to dates is not machine readable in the changes file. However, it is available in Koji or the package source git repo (last commit date that did change something that is not a changes file) and in the revision, if one knows the details of how those get used in Fedora.

OpenSUSE does not change the changelog for the rebuild, but that doesn't make any difference here, as we set SOURCE_DATE_EPOCH (also from the changes file, just done outside of rpm) and OLD_SOURCE_DATE_EPOCH appropriately for rpm to use.

> A test-case outside any complicated build-system machineries would perhaps
> help understand this on a more concrete level.

Steps to reproduce:

* package gcc: version 1.1
* package hello: BuildDepends gcc; changelog epoch 1; but its build script includes SOURCE_DATE_EPOCH in the binary
* build package hello
* update package gcc to version 1.2
* rebuild package hello with new gcc: this is bad, as the mtime of the files in the package didn't change, but their content did change. On a system where such rpms get installed, rsync will create inconsistent copies of the filesystem.
* bump package hello changelog to epoch 2
* build package hello. mtimes changed, good. Content of files changed, which is good, as the new gcc creates a different build result.
* update package gcc to version 1.2.1, which only changes documentation.
* bump package hello changelog to epoch 3
* build package hello. Content changed, but only because SOURCE_DATE_EPOCH changed; gcc would otherwise have produced the same output. This is bad, as it would cause people to unnecessarily download and upgrade.
* build package hello with gcc 1.2 and OLD_SOURCE_DATE_EPOCH set to 1. Changed mtimes, changed content, good.
* build package hello with gcc 1.2.1 and OLD_SOURCE_DATE_EPOCH set to 1. Changed mtimes, otherwise unchanged content, good. Comparison to the previous build output says unchanged, so discard this build.

> Also, this lumps a whole lot of changes into one which is further bad for
> understanding.
> Split this up into per-change commits, adding docs and test-cases for each.
> That would be required for acceptance anyhow, and should help seeing the
> individual bits for what they are without needing to know how some
> buildsystem somewhere processes stuff. Like already said, for example the bit
> about erroring out on missing changelog is something that makes perfect sense
> on its own.

I will split it.

> And OTOH I see there's an added check for SOURCE_DATE_EPOCH in the past,
> which is also quite unrelated to this all (AFAICS), and which I disagree
> with: ability to set the time into future is useful for testing purposes.

I noticed it had the wrong condition. The problem it solves is also solved by using set_mtime_to_source_date_epoch, so I removed it for now. But it would be needed if one were to use clamp_mtime_to_source_date_epoch: if the system clock is before SOURCE_DATE_EPOCH, then the build will create mtimes that will not be clamped, as they are older, thus defeating the intent of using this setting.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1924084432
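The difference between clamping and setting described in the last paragraph of this message can be shown with two one-line functions. This is a sketch of the semantics only, assuming that clamping lowers mtimes newer than SOURCE_DATE_EPOCH and leaves older ones alone, and that the `set_mtime_to_source_date_epoch` macro proposed in this PR overwrites every mtime. The function names are illustrative and are not rpm APIs.

```python
def clamp_mtime(file_mtime: int, source_date_epoch: int) -> int:
    """Clamp semantics: only mtimes newer than the epoch are lowered."""
    return min(file_mtime, source_date_epoch)


def set_mtime(file_mtime: int, source_date_epoch: int) -> int:
    """Set semantics (as proposed in this PR): always use the epoch."""
    return source_date_epoch


SOURCE_DATE_EPOCH = 100

# Build host clock is after SOURCE_DATE_EPOCH: both approaches agree.
late_clamped = clamp_mtime(150, SOURCE_DATE_EPOCH)
late_set = set_mtime(150, SOURCE_DATE_EPOCH)

# Build host clock is before SOURCE_DATE_EPOCH: the file created during
# the build is already older than the epoch, so clamping leaves the
# clock-dependent value in place while setting does not.
early_clamped = clamp_mtime(90, SOURCE_DATE_EPOCH)
early_set = set_mtime(90, SOURCE_DATE_EPOCH)
```

With the clock behind the epoch, the clamped mtime still depends on when the build ran, which is the case the removed check was meant to catch.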
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
@JanZerebecki pushed 1 commit. e6c047aaba828aba1a0e40f01bf47fd9c05e1487 support reproducible automatic rebuilds -- View it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880/files/f539811e90825fb120e8592d80cee7b67ac26e1d..e6c047aaba828aba1a0e40f01bf47fd9c05e1487
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
The issues with this one start with the topic. Reproducible builds are reproducible whether manually or automatically, we don't need this patch for that.

None of this makes any sense without reading up a whole lot of additional context as to how some initially unmentioned buildsystem processes things. I can understand the basic idea of throwing away builds that didn't actually change, but I get lost in the details. For example:

> If a build e.g. embeds SOURCE_DATE_EPOCH in the output, then the output
> changes every time such a rebuild happens, which can be very often.

It only changes if you change SOURCE_DATE_EPOCH, and if you took the SOURCE_DATE_EPOCH from the changelog then it only changes if you change the changelog, and at that point it's no longer the same. By my logic anyhow. It's really hard to constructively comment on what you don't understand. A test-case outside any complicated build-system machineries would perhaps help understand this on a more concrete level.

Also, this lumps a whole lot of changes into one which is further bad for understanding. Split this up into per-change commits, adding docs and test-cases for each. That would be required for acceptance anyhow, and should help seeing the individual bits for what they are without needing to know how some buildsystem somewhere processes stuff. Like already said, for example the bit about erroring out on missing changelog is something that makes perfect sense on its own.

And OTOH I see there's an added check for SOURCE_DATE_EPOCH in the past, which is also quite unrelated to this all (AFAICS), and which I disagree with: ability to set the time into future is useful for testing purposes.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1921493020
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
> You've effectively created a situation where your builds are not reproducible
> outside of your build system with the build system circumstances that created
> it.

That is incorrect. Pass the same 2 environment variables and it is reproducible. Same as before, just one additional variable. That the same circumstances are needed was always the case with reproducible builds; the point is to define the circumstances in a machine readable way, so it can be automated.

> From my reading of this, this is a very bad idea, and I'm not sure we should
> have this. I also think that if this is something openSUSE is going to ship,
> it should stop saying it's doing reproducible builds.

I discussed this with people at reproducible-builds.org and it was agreed on: https://reproducible-builds.org/docs/source-date-epoch/#interaction-of-source_date_epoch-with-automatic-rebuilds

> (Yes, I'm aware of all the caveats of reproducible builds, please don't add
> more of them!)

This PR is removing caveats, not adding them. One of the points of it is to make it just work in more cases, even when upstreams are doing things they shouldn't.

@Conan-Kudo From our previous discussions I don't think you understand reproducible builds in the same way as reproducible-builds.org does. Do you do builds that are reproducible anywhere? My understanding is that Fedora doesn't have enough people with enough of their time working on it. So nobody reproduced even a synthetic package for Fedora for years. The last try I know of was https://reproducible-builds.org/events/hamburg2023/fedora-packages/ . Part of our communication difficulty is that you are talking about a fiction, which is important as a step forward, but not the same as a set of software one can run. Please suggest solutions, ways towards a concrete implementation or at least scientific arguments instead of unfalsifiable criticism.

> I will also point out that this is premised on some kind of "build counter"
> property that we don't have.

No, it is not. It depends on SOURCE_DATE_EPOCH increasing when the concretely resolved build dependencies change (or when any build input other than the package source changes), which was always the case. That OpenSUSE will use a build counter for that is just one way to do it. Debian does manual rebuilds and also uses a counter for that, see link above. One could use the build time of the distribution's rpm repository index instead. Or the git commit date from a git repo that defines all the exact packages and their versions of a distribution, which is probably what Nix and Guix are doing.

> It looks like an attempt to eat and keep the cake, and we all know how well
> that works.

In the digital world, especially with the cryptographic properties that reproducible builds can be used for, that does indeed work: just copy the cake or rebuild it reproducibly.

> Yes, this has the air of digging just deeper into the hole which you should
> be looking for a way out of instead. The last thing we need on this front is
> yet more fiddly knobs.

I welcome reducing the amount of fiddly knobs. I thought that was preferred and thus copied the style. Indeed I'd prefer reproducible builds to become the default for rpm, though that is a breaking change, so I think we might want to wait until it actually works.

@pmatilai Which knobs should I remove for this to be acceptable? Any other suggestions?

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918991360
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
> This seems to defeat the point. The point of this is to clamp the times to
> the date stamp in the changelog. If you're doing automatic rebuilds, you
> should not use that feature, full stop.

:+1: It looks like an attempt to eat and keep the cake, and we all know how well that works.

> (Yes, I'm aware of all the caveats of reproducible builds, please don't add
> more of them!)

Yes, this has the air of digging just deeper into the hole which you should be looking for a way out of instead. The last thing we need on this front is yet more fiddly knobs.

The part about erroring out instead of merely warning is something I could agree on as a separate patch though.

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918574779
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
FYI: @davide125 @michel-slm -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918382019
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
I will also point out that this is premised on some kind of "build counter" property that we don't have. @bookwar proposed [adding something like this to RPM and extending NVR to NVRB some time ago](https://discussion.fedoraproject.org/t/rfc-build-tag-in-rpms-nvr-nvrb/39954), but there has been no movement on that. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918381531
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
This seems to defeat the point. The point of this is to clamp the times to the date stamp in the changelog. If you're doing automatic rebuilds, you should not use that feature, full stop. You've effectively created a situation where your builds are not reproducible outside of your build system with the build system circumstances that created it.

From my reading of this, this is a very bad idea, and I'm not sure we should have this. I also think that if this is something openSUSE is going to ship, it should stop saying it's doing reproducible builds.

(Yes, I'm aware of all the caveats of reproducible builds, please don't add more of them!)

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918379446
Re: [Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
I'd appreciate knowing whether this would be merged as is. But perhaps we want to wait with the actual merge until it has shipped in OpenSUSE and nobody has complained for two weeks. I think it should work as is, but this stuff has so many parts that yet another problem could be found. -- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880#issuecomment-1918058948
[Rpm-maint] [rpm-software-management/rpm] support reproducible automatic rebuilds (PR #2880)
Normally automatic rebuilds work, but together with reproducible builds an undesirable situation may occur. If a build e.g. embeds SOURCE_DATE_EPOCH in the output, then the output changes every time such a rebuild happens, which can be very often. This is to be avoided, as updating packages without necessity is too expensive.

To avoid this, in addition to the settings already needed for reproducible builds, set the macro use_old_source_date_epoch introduced here to Y, set the environment variable OLD_SOURCE_DATE_EPOCH to the date of the first build (e.g. from the changelog or git commit), and put that date incremented by the build count into SOURCE_DATE_EPOCH. This makes rpm use the old SOURCE_DATE_EPOCH for build scripts, but still use the new incremented one itself.

As source files may have the mtime of their commit, if set_mtime_to_source_date_epoch is Y, then set the mtime in the rpm. Using this instead of clamping with clamp_mtime_to_source_date_epoch avoids problems when build scripts use the old SOURCE_DATE_EPOCH for mtime. This is not common, but it would otherwise be difficult to debug when such a problem happens.

Also, instead of only warning, error out on a missing changelog date for SOURCE_DATE_EPOCH.

For debugging, if the macro warn_on_mtime_lower_than_source_date_epoch is Y, log any mtime that is lower than the new incremented SOURCE_DATE_EPOCH.

You can view, comment on, or merge this pull request online at: https://github.com/rpm-software-management/rpm/pull/2880

-- Commit Summary --

  * support reproducible automatic rebuilds

-- File Changes --

    M build/build.c (39)
    M build/files.c (30)

-- Patch Links --

https://github.com/rpm-software-management/rpm/pull/2880.patch
https://github.com/rpm-software-management/rpm/pull/2880.diff

-- Reply to this email directly or view it on GitHub: https://github.com/rpm-software-management/rpm/pull/2880
Message ID: rpm-software-management/rpm/pull/2...@github.com
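How a build system might derive the two variables named in the PR description can be sketched as below. Note the assumptions: OLD_SOURCE_DATE_EPOCH and the use_old_source_date_epoch macro exist only in this proposed patch, which was closed without being merged, and the function `rebuild_environment` is a hypothetical helper, not part of rpm or OBS.

```python
def rebuild_environment(first_build_epoch: int, build_count: int) -> dict:
    """Compute the environment described in the PR text.

    first_build_epoch: date of the first build, e.g. taken from the
                       changelog or a git commit (seconds since 1970).
    build_count:       number of automatic rebuilds since that build.
    """
    if build_count < 0:
        raise ValueError("build_count must not be negative")
    return {
        # Seen by build scripts, so embedded timestamps stay stable.
        "OLD_SOURCE_DATE_EPOCH": str(first_build_epoch),
        # Used by rpm itself, so it still increases with every rebuild.
        "SOURCE_DATE_EPOCH": str(first_build_epoch + build_count),
    }
```

For the first build (`build_count` of 0) both values coincide, which reduces to the usual single-timestamp setup.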