Re: Bug#876055: Environment variable handling for reproducible builds
Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>> I personally lean towards 2, which is consistent with what's in Policy
>> right now, but I can see definite merits in 3. I believe the
>> reproducible builds project is currently sort of doing 1, but I have a
>> hard time seeing how to make that viable on the testing side.

> Thanks for raising this question, Russ!

> I'm not sure that we should let lack of exhaustive testing push us away
> from (1). (1) is in principle the right thing -- it's easy to make a
> build reproducible if we tell people that they have to do exactly one
> specific thing. But we generally want people to be able to run
> heterogenous systems, and not to force them into one particular
> environment.

Well... I would argue that the amount of time and effort that's gone into this project shows that it's not that easy to make a build reproducible even when telling people to do exactly one thing. :) But I get your point.

> Consider someone who wants to see more logging from a build, for
> example. There could be an environment variable that encourages the
> toolchain to log more, but doesn't affect the binary objects created by
> the build. By going with choices (2) or (3) we effectively dismiss even
> considering the reproducibility of those builds, which seems like a
> shame.

This is the case for (2), but not for (3). Indeed, this is exactly the distinction between (2) and (3). It does mean that discovery of any new such environment variable would require a change to our whitelist in approach (3), so there would be some lag and the whitelist would become long over time (with a corresponding testing load). But (3) does try to achieve that use case without trying to anticipate every possible environment variable setting. It lets us be reactive to newly-discovered environment variables across which we want to stay reproducible.

> Does everything in policy need to be rigorously testable? or is it ok
> to have Policy state the desired outcome even if we don't know how (or
> don't have the resources) to test it fully today.

I don't think everything has to be rigorously testable, but I do think it's a useful canary. If I can't test something, I start wondering whether that means I have problems with my underlying assumptions.

In particular, for (1), we have no comprehensive list of environment variables that affect the behavior of tools, and that list would be difficult to create. Many pieces of software add their own environment variables with little coordination, and many of those variables could possibly affect tool output.

I feel like the work for (1) and for (3) ends up being comparable: for (1) we have to maintain a blacklist, and for (3) we have to maintain a whitelist. But (3) is testable, whereas (1) is inherently aspirational and will always have to be aspirational. We're endlessly going to be discovering some other environment variable that changes tool output.

I'm also unsure that (1) is even what we want to claim. Do we really want to say that builds are always reproducible if you don't change this short list of environment variables, no matter what other environment variables you set? There's some appeal in this for the end user, but it feels very frustrating for the package maintainer. At first glance, as a package maintainer, I'd think I'd have to maintain a huge blacklist of environment variables that I've discovered affect my toolchain somewhere, and explicitly unset them all in debian/rules. This doesn't feel like a good use of anyone's time (and may actually *break* other, non-reproducibility-related things that people want to do with my package).

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
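To make the testability point concrete, here is a minimal Python sketch of how approach (3) from the original bug report could be exercised: vary one allowlisted variable at a time and compare output digests against a reference build. The allowlist contents, the sample values, and modelling a build as a function from an environment to bytes are all assumptions for illustration; a real harness would run complete package builds. Like any such test, it only covers the sample values it tries, so a problem triggered by one specific setting can still slip through.

```python
import hashlib
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical allowlist for approach (3): variables permitted to vary,
# each with a few sample values to try. Illustrative only.
ALLOWED_TO_VARY: Dict[str, List[str]] = {
    "USER": ["alice", "builder"],
    "HOME": ["/home/alice", "/nonexistent"],
}

# A build is modelled as a function from an environment to output bytes.
Build = Callable[[Mapping[str, str]], bytes]


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_unreproducible_vars(build: Build,
                             base_env: Mapping[str, str]) -> List[Tuple[str, str]]:
    """Return the (variable, value) settings that changed the build output."""
    reference = digest(build(base_env))
    failures = []
    for name, values in ALLOWED_TO_VARY.items():
        for value in values:
            env = dict(base_env)
            env[name] = value  # vary exactly one allowlisted variable
            if digest(build(env)) != reference:
                failures.append((name, value))
    return failures


if __name__ == "__main__":
    def clean_build(env: Mapping[str, str]) -> bytes:
        return b"binary"  # ignores the environment entirely

    def leaky_build(env: Mapping[str, str]) -> bytes:
        return b"built by " + env.get("USER", "").encode()  # embeds $USER

    base = {"USER": "root", "HOME": "/root"}
    print(find_unreproducible_vars(clean_build, base))  # []
    print(find_unreproducible_vars(leaky_build, base))  # [('USER', 'alice'), ('USER', 'builder')]
```

The loop is finite because the allowlist is finite, which is the property a blacklist under approach (1) cannot offer.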
Bug#876055: Environment variable handling for reproducible builds
Package: debian-policy
Version: 4.1.0.0
Severity: normal

Currently, Debian Policy requires that all environment variables be held the same across builds for the build to be expected to be reproducible. However, the current approach of some reproducible build tools is to instead enumerate a set of fixed environment variables and allow other variables to vary. We should ideally converge on a single approach to environment variables and build reproducibility and make it easy for tools to implement that approach.

I think the alternatives are:

1. Enumerate environment variables to hold fixed. This is better in the sense that it allows packages to be reproducible in more situations, but it's unstable in the sense that we'll never be able to enumerate all environment variables that might possibly affect the build. It's also not testable in the sense that we can't set every possible environment variable.

2. Set the entire environment to the environment specified in buildinfo when doing a reproducible build. I think this is conceptually the simplest, but it means that we should make every tool that builds official Debian packages use the same environment variable logic so that the buildinfo file completely captures the environment (without leaking random, inappropriate things into buildinfo). It also means effectively giving up on debian/rules build being a path for making a reproducible build, since we don't have control over that environment, but I think it will be hard to make that work anyway.

3. List a set of environment variables that are permitted to vary in the reproducible build policy, and then have reproducible builds clean the environment except for that set and then apply the buildinfo environment variable set. This is very similar to 2. I think the primary advantage is that it lets us require packages to build reproducibly in the presence of some settings that logically should not affect the build (USER, HOME, etc.), at the cost of making reproducible builds harder to achieve. It's mostly testable, in that one can try reproducible builds with various settings for those variables, although it would be hard to catch corner cases where only a specific setting causes issues.

I personally lean towards 2, which is consistent with what's in Policy right now, but I can see definite merits in 3. I believe the reproducible builds project is currently sort of doing 1, but I have a hard time seeing how to make that viable on the testing side.

-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (990, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.12.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages debian-policy depends on:
ii  libjs-sphinxdoc  1.6.3-2

debian-policy recommends no packages.

Versions of packages debian-policy suggests:
pn  doc-base

-- no debconf information
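The three alternatives differ mainly in how the environment for a reproducible build is assembled from the caller's environment and the one recorded in buildinfo. The following Python sketch is one possible reading of them, not how any existing tool works; the function names and the `fixed` and `allowed` variable lists are hypothetical, and the recorded environment is assumed to be available as a plain dict.

```python
from typing import Dict, Iterable, Mapping


def env_option1(current: Mapping[str, str], recorded: Mapping[str, str],
                fixed: Iterable[str]) -> Dict[str, str]:
    """(1) Keep the caller's environment but hold the enumerated variables
    to their recorded values, unsetting any that were not recorded."""
    env = dict(current)
    for name in fixed:
        if name in recorded:
            env[name] = recorded[name]
        else:
            env.pop(name, None)
    return env


def env_option2(recorded: Mapping[str, str]) -> Dict[str, str]:
    """(2) Use exactly the environment recorded in buildinfo."""
    return dict(recorded)


def env_option3(current: Mapping[str, str], recorded: Mapping[str, str],
                allowed: Iterable[str]) -> Dict[str, str]:
    """(3) Drop everything except the allowed variables, then apply the
    recorded environment on top."""
    keep = set(allowed)
    env = {name: value for name, value in current.items() if name in keep}
    env.update(recorded)
    return env


if __name__ == "__main__":
    current = {"HOME": "/home/alice", "GNUTARGET": "elf64-x86-64",
               "LANG": "de_DE.UTF-8"}
    recorded = {"LANG": "C.UTF-8", "DEB_BUILD_OPTIONS": "parallel=4"}
    print(env_option1(current, recorded, ["LANG", "DEB_BUILD_OPTIONS"]))
    print(env_option2(recorded))
    print(env_option3(current, recorded, ["HOME", "USER"]))
```

With this sample data, the stray GNUTARGET setting survives under (1), is dropped under (3) while HOME is kept, and under (2) only the recorded variables remain.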
Re: Bug#844431: Revised patch: Oppose
Bill Allombert <ballo...@debian.org> writes:
> On Wed, Aug 16, 2017 at 12:14:53PM -0700, Russ Allbery wrote:

>> If you have specific wording suggestions that you believe would bring
>> this Policy requirement closer in line with what we're already doing in
>> the project (and which has gotten us to 94% reproducible already),
>> please make them.

> This percentage was reached mostly by fixing software tools (compiler,
> doc generators, packaging tools) to be deterministic, rather than by
> fixing individual packages. This is a topic that is wholly absent from
> policy.

Indeed. There are many things that go into making Debian work that are wholly absent from Policy. Hopefully, over time, we can slowly reduce that, but there will always be new initiatives that aren't documented.

> For example policy could mandate that programs that set timestamps
> honour SOURCE_DATE_EPOCH.

Please propose language. (Ideally in a separate bug, since this one is already quite large and it's easier to address specific issues in specific bugs.) I'm not opposed to adding more advice and requirements that make sense, but there are lots of things in Policy that aren't as fully described as they possibly could be if people did more work. I'm not willing to block this on having the perfect language, but if you want to contribute, you're absolutely welcome to do so.

Most packages do not have to care about SOURCE_DATE_EPOCH because it's set by dpkg-buildpackage and consumed by the tools that are most frequently relevant, but I'd be very happy to see that documented in Policy for the packages that do care.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
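As a rough illustration of what honouring SOURCE_DATE_EPOCH means for a tool that embeds a date, here is a minimal Python sketch: use the variable when it is set, fall back to the current time otherwise, reject malformed values instead of silently using "now", and format in UTC. It assumes the value is a plain decimal count of seconds since the Unix epoch, as the reproducible-builds.org specification describes; the function names are hypothetical.

```python
import os
import re
import time
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_timestamp(env: Optional[Mapping[str, str]] = None) -> int:
    """Use SOURCE_DATE_EPOCH when set, otherwise fall back to the current time."""
    if env is None:
        env = os.environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    if not re.fullmatch(r"[0-9]+", value):
        # Fail loudly rather than silently falling back to "now".
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {value!r}")
    return int(value)


def build_date(env: Optional[Mapping[str, str]] = None) -> str:
    """Format the build timestamp in UTC, so the local time zone cannot leak in."""
    stamp = build_timestamp(env)
    return datetime.fromtimestamp(stamp, tz=timezone.utc).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(build_date({"SOURCE_DATE_EPOCH": "1500000000"}))  # 2017-07-14
```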
Re: Bug#844431: Revised patch: Oppose
Just to be completely, 100% clear: I will not be responding further to this line of argument in this bug. If you disagree with my decision as a project delegate, I've spelled out your possible next steps under Debian's governance process.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
Re: Bug#844431: Revised patch: seeking seconds
Bill Allombert <ballo...@debian.org> writes:
> On Wed, Aug 16, 2017 at 09:36:04AM -0700, Russ Allbery wrote:

>> Note that, for most developers, this is pretty much equivalent to the
>> current situation with FTBFS on, say, s390 architectures. Or even
>> issues with running under whichever init system is not the one the
>> maintainer personally uses.

> Debian provides porter box for that purpose. This means if your package
> FTBFS on s390 you can login to a s390 porter box, use sbuild to set up a
> build environment, fix the problem and then check the package now build
> correctly.

> Now compare with reproducible build. You get some error report you
> cannot reproduce, do some change following the help provided and hope
> for the best. Then some day later you get the same error report.

This hasn't been my experience with reproducible build bug reports. Once there's a bug report of unreproducibility under some specific situation, I've always been able to reproduce it by doing multiple builds with that specific variation and seeing how the output changes. I agree that this may not always be the case, but it's also not always the case that one can reproduce an s390 buildd failure on a porter box.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
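The workflow described in this reply, building twice with one specific variation and seeing how the output changes, usually starts by locating which files differ. Tools such as diffoscope do this in depth; the following is only a minimal Python sketch that compares two unpacked build output trees by per-file SHA-256. It assumes both builds have been unpacked into directories, and it ignores metadata such as permissions and mtimes, which also affect whether .deb files are bit-for-bit identical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file under root (as a relative POSIX path) to its SHA-256."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(first: Path, second: Path) -> List[str]:
    """Files present in only one tree, or present in both with different content."""
    a, b = tree_digests(first), tree_digests(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Narrowing the comparison to the handful of files that differ is usually what points at the responsible tool or build step.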
Re: Bug#844431: Revised patch: Oppose
ee; I feel like I have a pretty complete understanding of the issues here, and it's highly unlikely that further elaborations or rephrasings of your current arguments are going to change my mind.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
Re: Bug#844431: Revised patch: seeking seconds
Adrian Bunk <b...@debian.org> writes:

> This is not about experimenting for raising the bar in the future.
> This is about the reproducible builds team not using policy as a stick
> for claiming a bar higher than what policy actually defines.

> Is it really allowed to claim that a package is not reproducible,
> when it actually is reproducible according to policy?

Yes. Ideally one would distinguish between those various definitions of reproducible, though, and present all of them.

> Unless policy is supposed to be completely detached from reality, the
> criteria for claiming in various places that a package is unreproducible
> have to match the policy definition of reproducibility.

No, I don't agree. This is not how we do things in Debian. There is quite a bit of information that we give developers about possible flaws in their package, from Lintian and build log analysis and many other things, that is not required by Policy. This is no different.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
Re: Bug#844431: Revised patch: seeking seconds
Adrian Bunk <b...@debian.org> writes:

> Future policy versions might change this definition, but whatever latest
> policy states has to be the definition used by both packages and the
> reproducible builds team.

> Another example is that a package that is reproducible according to the
> policy definition must not show up as non-reproducible in tracker/DDPO
> based on results from the reproducible infrastructure.

This seems really inflexible and unnecessarily absolutist. I don't agree with taking this approach.

The point of adding this definition to Policy is that we're setting a new minimum bar for packages in Debian to meet. We're giving official blessing to this requirement for Debian packages (at the normal bug level, not RC bug, for now), meaning this is a goal that the project is working towards and something every packager should think about at this level.

This in absolutely no way constrains the reproducible build team from working on raising the bar in the future, just as the absence of this language from Policy did not prevent them from starting to work on this problem four years ago. They should continue to work on making package builds more reproducible and raising the bar for reproducibility as makes sense for their goals and judging the impact of that. Once any new requirements reach maturity and look feasible and have some project commitment, we'll change Policy to set a new baseline for the whole project. But the reproducible builds work should not *wait* for that, and should definitely push forward and experiment just as they have up until now.

I do think it might be worth considering distinguishing between packages that are minimally reproducible and packages that meet higher reproducibility bars (such as not caring about the location of the build tree) in reporting infrastructure like tracker. But I'm totally fine with surfacing failures on new, higher bars in places like tracker before we change Policy, just like we've been surfacing reproducibility failures before Policy said anything about it at all.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
Re: Bug#844431: Revised patch: seeking seconds
Adrian Bunk <b...@debian.org> writes:

> I would expect the reproducible builds team to not submit any bugs
> regarding varied environment variables as long as the official
> definition of reproducibility in policy states that this is not required
> for a package to be reproducible.

I believe the planned next step here is to publish the *.buildinfo files, which contain a specification of the environment variables the build cares about, and then Policy can be modified to include a description of *.buildinfo files and how to use them. As part of those changes, we'd certainly update the definition of reproducible to reference matching the environment specified in the corresponding *.buildinfo file.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
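Checking a build against a *.buildinfo file means reading its recorded environment back. A minimal Python sketch, assuming the layout that dpkg-genbuildinfo uses for the Environment field: a multi-line field whose continuation lines look like NAME="value". It does not handle escaped characters inside values; a real tool would be better served by a proper deb822 parser such as the one in python-debian.

```python
from typing import Dict


def buildinfo_environment(text: str) -> Dict[str, str]:
    """Extract the Environment field of a .buildinfo file as a dict."""
    env: Dict[str, str] = {}
    in_field = False
    for line in text.splitlines():
        if line.startswith("Environment:"):
            in_field = True
            continue
        if not in_field:
            continue
        if not line.startswith((" ", "\t")):
            break  # the next field, or a blank line, ends the multi-line field
        name, sep, value = line.strip().partition("=")
        if sep:
            # Values are double-quoted; escapes inside them are not handled.
            if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
                value = value[1:-1]
            env[name] = value
    return env


if __name__ == "__main__":
    sample = (
        "Format: 1.0\n"
        "Source: hello\n"
        "Environment:\n"
        ' DEB_BUILD_OPTIONS="parallel=4"\n'
        ' LANG="C.UTF-8"\n'
        ' SOURCE_DATE_EPOCH="1500000000"\n'
    )
    print(buildinfo_environment(sample))
```

The resulting dict is what a verifier would then apply (under whichever policy is chosen for the rest of the environment) before rebuilding.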
Re: Bug#844431: Revised patch: seeking seconds
Ximin Luo <infini...@debian.org> writes:

> To echo dkg and others' comments, it would be nice if we could add here:

> +Packages are encouraged to produce bit-for-bit identical binary packages even
> +if most environment variables and build paths are varied. This is technically
> +more difficult at the time of writing, but it is intended that this stricter
> +definition would replace the above one, when appropriate in the future.

> If this type of "intent" wording is not appropriate for Policy then
> disregard what I'm saying, I don't wish to block this patch for this
> reason.

Oh, that's a good way to capture that. This seems fine to me, and I have no objections to adding this advice.

Seconded the original with or without this addition.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
Re: Bug#844431: Reproducibility in Policy
Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
> On Fri 2017-08-11 16:08:47 -0700, Sean Whitton wrote:

>> - a version of a source package unpacked at a given path;

> I don't like the idea of hard-coding a fixed build path requirement into
> debian policy. We're over 80% with variable build paths in unstable
> already, and i want to keep the pressure up on this. The build location
> should not influence the binary output.

It shouldn't, but my understanding is that it currently does. If you can fix that, that's great, but until that's been fixed, I don't see the harm in documenting this as a prerequisite for a reproducible build. If we can relax that prerequisite later, great, but nothing about listing it here should reduce the pressure on making variable build paths work. It just documents the current state of the world.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
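One common way the build location leaks into binaries is through absolute source paths recorded in debug information or in strings such as assertion messages; GCC's -fdebug-prefix-map=OLD=NEW option exists to rewrite such paths. The following toy Python sketch models that idea: a hypothetical compiler records the source path, and an (old, new) prefix map makes the recorded path independent of the build directory. The matching rule used here (first pair whose prefix matches at a directory boundary) is this sketch's own choice, not a description of GCC's behaviour.

```python
from typing import Sequence, Tuple

# (old_prefix, new_prefix) pairs, in the spirit of -fdebug-prefix-map=OLD=NEW.
PrefixMap = Sequence[Tuple[str, str]]


def map_path(path: str, prefix_map: PrefixMap) -> str:
    """Rewrite path using the first (old, new) pair whose old prefix matches
    at a directory boundary; return path unchanged if none matches."""
    for old, new in prefix_map:
        prefix = old.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return new + path[len(prefix):]
    return path


def embedded_source_path(build_dir: str, relative: str,
                         prefix_map: PrefixMap = ()) -> str:
    """The source path a toy compiler would record in its output."""
    return map_path(build_dir.rstrip("/") + "/" + relative, prefix_map)


if __name__ == "__main__":
    # Without a map, the recorded path depends on where the build ran.
    print(embedded_source_path("/build/pkg-1a2b", "src/main.c"))
    print(embedded_source_path("/tmp/elsewhere", "src/main.c"))
    # With a map from each build directory to ".", both record ./src/main.c.
    print(embedded_source_path("/build/pkg-1a2b", "src/main.c", [("/build/pkg-1a2b", ".")]))
    print(embedded_source_path("/tmp/elsewhere", "src/main.c", [("/tmp/elsewhere", ".")]))
```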
Re: Bug#844431: Reproducibility in Policy
Sean Whitton <spwhit...@spwhitton.name> writes:

> Proposal:

> This is what Holger and I think we should add to Policy, after
> readability tweaks:

> Packages should build reproducibly, which for purposes of this
> document means that given
> - a version of a source package unpacked at a given path;
> - a set of versions of installed build-dependencies; and
> - a build architecture,
> repeatedly building the source package on the architecture with those
> versions of the build dependencies installed will produce bit-for-bit
> identical binary packages.

I think we need to add all environment variables starting with DEB_* to the prerequisites. If you set DEB_BUILD_OPTIONS=nostrip or DEB_BUILD_MAINT_OPTIONS=hardening=all, you'll definitely get a different package, for instance.

I feel like there are a bunch of other environment variables that have to be consistent, although I'm not sure how to specify that since other environment variables shouldn't matter. But, say, setting GNUTARGET is very likely to cause weirdness by changing how ld works. There are probably more interesting examples.

How does the current reproducible build testing work with the environment? Maybe we should just document that for right now and relax it later if needed?

> Explanation:

> The definition from the reproducible builds group[1] says:

> A build is reproducible if given the same source code, build
> environment and build instructions, any party can recreate
> bit-by-bit identical copies of all specified artifacts.

> The relevant attributes of the build environment, the build
> instructions and the source code as well as the expected
> reproducible artifacts are defined by ... distributors.

> i.e. Debian has to define the build environment, source code and build
> instructions. I think that my wording defines these as Debian currently
> understands them.

> Later, we could narrow the definition of build environment by adding
> more constraints, but we're not there yet.

> [1] https://reproducible-builds.org/docs/definition/

We should add a link to that page (maybe in a footnote).

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
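Read as data, the proposed definition plus the suggested DEB_* addition amounts to a set of build inputs that must match before two builds are expected to produce identical packages. A minimal Python sketch of that reading; `BuildInputs` and `build_inputs` are hypothetical names, and the dependency versions in the example are made up. As the reply notes, this is not the whole story: a variable such as GNUTARGET falls outside DEB_* and is ignored here, yet it can still change the output.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class BuildInputs:
    source_path: str                         # where the source package is unpacked
    build_deps: Tuple[Tuple[str, str], ...]  # sorted (package, version) pairs
    architecture: str
    deb_env: Tuple[Tuple[str, str], ...]     # sorted DEB_* environment variables


def build_inputs(source_path: str, build_deps: Mapping[str, str],
                 architecture: str, env: Mapping[str, str]) -> BuildInputs:
    """Collect the prerequisites; non-DEB_* environment variables are ignored."""
    return BuildInputs(
        source_path=source_path,
        build_deps=tuple(sorted(build_deps.items())),
        architecture=architecture,
        deb_env=tuple(sorted((k, v) for k, v in env.items() if k.startswith("DEB_"))),
    )


if __name__ == "__main__":
    deps = {"debhelper": "10", "gcc-7": "7.2.0-1"}  # made-up versions
    a = build_inputs("/build/pkg", deps, "amd64", {"DEB_BUILD_OPTIONS": "nostrip", "HOME": "/root"})
    b = build_inputs("/build/pkg", deps, "amd64", {"DEB_BUILD_OPTIONS": "nostrip", "HOME": "/home/alice"})
    c = build_inputs("/build/pkg", deps, "amd64", {"HOME": "/root"})
    print(a == b)  # True: HOME is not a prerequisite
    print(a == c)  # False: DEB_BUILD_OPTIONS differs
```

Under this reading, two builds whose `BuildInputs` compare equal are expected to produce bit-for-bit identical packages.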
Re: [Reproducible-builds] Bug#832099: lintian: please check for unnecessary SOURCE_DATE_EPOCH assignments
Mattia Rizzolo <mat...@debian.org> writes:
> On Fri, Jul 22, 2016 at 12:14:56PM -0700, Russ Allbery wrote:

>> I think that's fine in this case, since not setting that variable
>> doesn't break the build. It just means the build isn't reproducible,
>> which is an optional feature.

> 1/ it's an optional feature *for now*. I'd really love to see it being
>    mandated asap.

I'm not sure that's a good idea, although it's mostly an intuitive reaction and I can't think of a specific counter-example off the top of my head. I do think we should support reproducible builds for all packages, and that our default build should be reproducible. I'm just not sure that we should rule out allowing packages to be configured to use default upstream behavior for timestamps and whatnot, if it's not the default.

> 2/ it can break the build: I don't know if this is already present in
>    some package out there, but just think of calling a tool 'foo'
>    setting a cli flag 'bar' to SDE:
>      foo --bar="$(SOURCE_DATE_EPOCH)"
>    for example, using tar:
>      tar --mtime="@$(SOURCE_DATE_EPOCH)" -c -f out.tar in
>    this is a valid command when SDE is set, but it's going to fail as
>    soon as it's not set anymore, I fear; and it makes a lot of sense in
>    a d/rules to tar something up.

Good point -- in cases like that, the packager probably should set this variable directly in debian/rules since the rest of the build does depend on it. (I don't think this is the typical use case, but there are probably a few cases like this.)

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
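For comparison with the tar command quoted above, here is a minimal Python sketch of the same step using the standard tarfile module instead of GNU tar. It sets every member's mtime to SOURCE_DATE_EPOCH (much as --mtime does), adds members in sorted order, blanks owner information, and raises a clear error when the variable is unset rather than producing a broken command line. The function name and the decision to raise instead of falling back are assumptions; file modes are taken from disk unchanged, so they must already be consistent between builds, and the archive is left uncompressed because gzip headers carry their own timestamp.

```python
import os
import tarfile
from pathlib import Path
from typing import Mapping, Optional


def reproducible_tar(src_dir: Path, out_path: Path,
                     env: Optional[Mapping[str, str]] = None) -> None:
    """Create an uncompressed tarball that does not depend on file mtimes,
    file ownership, or directory listing order."""
    if env is None:
        env = os.environ
    if "SOURCE_DATE_EPOCH" not in env:
        raise RuntimeError("SOURCE_DATE_EPOCH is not set; "
                           "set it in debian/rules or build with dpkg-buildpackage")
    mtime = int(env["SOURCE_DATE_EPOCH"])

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = mtime         # like tar --mtime="@$SOURCE_DATE_EPOCH"
        info.uid = info.gid = 0    # like tar --owner=0 --group=0
        info.uname = info.gname = ""
        return info

    # GNU format avoids PAX extended headers; sorting gives a stable order.
    with tarfile.open(out_path, "w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(src_dir.rglob("*")):
            tar.add(path, arcname=path.relative_to(src_dir).as_posix(),
                    recursive=False, filter=normalize)
```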
Re: [Reproducible-builds] Bug#832099: lintian: please check for unnecessary SOURCE_DATE_EPOCH assignments
Mattia Rizzolo <mat...@debian.org> writes:
> On Fri, Jul 22, 2016 at 11:55:46AM +0200, Chris Lamb wrote:

>> Attached is the following:
>>
>> commit 3b10f7dbaecedb0a458c25cc0b8615b489d424af
>> Author: Chris Lamb <la...@debian.org>
>> Date:   Fri Jul 22 10:54:02 2016 +0100
>>
>>     c/rules: Check for unnecessary SOURCE_DATE_EPOCH assignments
>>
>>     As of dpkg 1.18.8, this is no longer necessary as dpkg exports this
>>     variable if it is not already set (#75). This should encourage
>>     removing some duplicated code from a lot of our rules files.

> though, using dpkg-buildpackage is not mandatory, a package should be
> able to build with just `debian/rules binary`. In such case SDE
> wouldn't be exported.
> This looks fairly similar to e.g. DEB_HOST_ARCH & friends variables,
> that even if they are exported by dpkg-buildpackage you should set them
> in d/rules nonetheless.
> At least, that's what my AM told me back then ^^

I think that's fine in this case, since not setting that variable doesn't break the build. It just means the build isn't reproducible, which is an optional feature. I think it's fine for optional features to only be implemented by the surrounding build wrapper or by environment variables explicitly set by the person building the package. In fact, it makes somewhat more sense to me than having debian/rules unconditionally force a reproducible build.

In other words, I think this is more akin to DEB_BUILD_OPTIONS than it is to DEB_HOST_ARCH.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
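The behaviour described in the quoted commit message, where dpkg exports SOURCE_DATE_EPOCH only when it is not already set (dpkg-buildpackage takes the value from the latest changelog entry), can be mimicked in a few lines, for instance for a build started as plain debian/rules binary. The Python sketch below assumes the standard changelog trailer line format (" -- Name <email>  date", with two spaces before the date); it is not how dpkg itself parses the changelog, and the function names are hypothetical.

```python
import os
import re
from email.utils import parsedate_to_datetime
from typing import MutableMapping, Optional

# Trailer line of a debian/changelog entry, e.g.
#  -- Jane Doe <jane@example.org>  Fri, 22 Jul 2016 10:54:02 +0100
TRAILER = re.compile(r"^ -- .+?  (?P<date>\S.*)$")


def latest_changelog_timestamp(changelog_text: str) -> int:
    """Seconds since the epoch for the first (that is, latest) trailer line."""
    for line in changelog_text.splitlines():
        match = TRAILER.match(line)
        if match:
            return int(parsedate_to_datetime(match.group("date")).timestamp())
    raise ValueError("no changelog trailer line found")


def ensure_source_date_epoch(changelog_text: str,
                             env: Optional[MutableMapping[str, str]] = None) -> str:
    """Set SOURCE_DATE_EPOCH from the changelog unless the caller already did."""
    if env is None:
        env = os.environ
    if "SOURCE_DATE_EPOCH" not in env:
        env["SOURCE_DATE_EPOCH"] = str(latest_changelog_timestamp(changelog_text))
    return env["SOURCE_DATE_EPOCH"]
```

A value supplied by the person building the package always wins, which keeps the variable an override in the spirit of DEB_BUILD_OPTIONS rather than something debian/rules forces unconditionally.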
Re: [Reproducible-builds] Bug#782878: [debhelper-devel] Bug#782879 + Bug#782878: lib{test-log4perl, scalar-defer}-perl: please make the build reproducible
Niko Tyni <nt...@debian.org> writes:
On Wed, May 20, 2015 at 10:34:20PM +0200, Niels Thykier wrote:
On 2015-04-19 14:35, gregor herrmann wrote:
On Sun, 19 Apr 2015 14:03:44 +0200, Axel Beckert wrote:
Jelmer Vernooij wrote:

+# Set man page timestamp to last package change time.
+BUILD_DATE = $(shell dpkg-parsechangelog -S Date)
+POD_MAN_DATE = $(shell date -u +%Y-%m-%d --date=$(BUILD_DATE))
+export POD_MAN_DATE

But isn't this something which should be done once and properly in the build system (e.g. in dh_auto_build), like setting all the file time stamps to that date?

It is not entirely clear to me what you are asking for. Is this change only supposed to go into a Perl specific build system, in all build systems supported by dh_auto_build or ...?

That's a good question. I suppose it should go in all the build systems, although most of the benefit is certainly for Perl module packages.

The context is that Pod::Man sets the date header based on the mtime of the file, but if the file is patched by the Debian packaging, the mtime will be set to the extraction time, breaking reproducibility. (See #759404 for some related discussion.) The mtime will also be unreproducible if the file is generated during the build.

Disabling the date in the generated man page is definitely the easiest fix. I personally like the idea of instead setting the date to the last modified time of the Debian package.

Actually, ideally, I wish that dpkg itself would set the timestamp of all files modified by patches to match the last modification date of the package, which would achieve the same thing but at what feels like the correct level.

My feeling is that the date in the man page serves a useful purpose for the end user by communicating some idea of the staleness of the documentation and the recentness of the last release of the software. While this isn't a huge deal, it does feel somewhat less than ideal to lose that data. Replacing it with the last modification date of the Debian package isn't perfect, but it's fairly reasonable.

--
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
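The idea of giving patched files the package's last modification date can be sketched in Python as follows. Which files count as patched, the helper names, and representing the package date as a SOURCE_DATE_EPOCH-style integer are assumptions; `man_page_date` only mimics a tool that derives a date from a file's mtime, formatted like the `date -u +%Y-%m-%d` call in the quoted Makefile fragment. (In that fragment, $(BUILD_DATE) is passed to `date --date=` without shell quotes; since a changelog date contains spaces, it would most likely need quoting to work.)

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Iterable


def normalize_mtimes(paths: Iterable[Path], epoch: int) -> None:
    """Set the access and modification times of each file to epoch."""
    for path in paths:
        os.utime(path, (epoch, epoch))


def man_page_date(path: Path) -> str:
    """Derive a YYYY-MM-DD date from a file's mtime, in UTC."""
    return time.strftime("%Y-%m-%d", time.gmtime(path.stat().st_mtime))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pm = Path(tmp) / "Example.pm"  # stands in for a file touched by a patch
        pm.write_text("=head1 NAME\n")
        normalize_mtimes([pm], 1500000000)
        print(man_page_date(pm))  # 2017-07-14
```

Because the date is derived from the normalized mtime rather than the extraction time, rebuilding on a different day yields the same man page header.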