Re: Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Russ Allbery
Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>> I personally lean towards 2, which is consistent with what's in Policy
>> right now, but I can see definite merits in 3.  I believe the
>> reproducible builds project is currently sort of doing 1, but I have a
>> hard time seeing how to make that viable on the testing side.

> Thanks for raising this question, Russ!

> I'm not sure that we should let lack of exhaustive testing push us away
> from (1).  (1) is in principle the right thing -- it's easy to make a
> build reproducible if we tell people that they have to do exactly one
> specific thing.  But we generally want people to be able to run
> heterogeneous systems, and not to force them into one particular
> environment.

Well... I would argue that the amount of time and effort that's gone into
this project shows that it's not that easy to make a build reproducible
even when telling people to do exactly one thing.  :)  But I get your
point.

> Consider someone who wants to see more logging from a build, for
> example.  There could be an environment variable that encourages the
> toolchain to log more, but doesn't affect the binary objects created by
> the build.  By going with choices (2) or (3) we effectively dismiss even
> considering the reproducibility of those builds, which seems like a
> shame.

This is the case for (2), but not for (3).  Indeed, this is exactly the
distinction between (2) and (3).  It does mean that discovery of any new
such environment variable would require a change to our whitelist in
approach (3), so there would be some lag and the whitelist would become
long over time (with a corresponding testing load).  But (3) does try to
achieve that use case without trying to anticipate any possible
environment variable setting.  It lets us be reactive to newly-discovered
environment variables across which we want to stay reproducible.

> Does everything in policy need to be rigorously testable?  or is it ok
> to have Policy state the desired outcome even if we don't know how (or
> don't have the resources) to test it fully today.

I don't think everything has to be rigorously testable, but I do think
it's a useful canary.  If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.

In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create.  Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.

I feel like the work for (1) and for (3) ends up being comparable; for (1)
we have to maintain a blacklist, and for (3) we have to maintain a
whitelist.  But (3) is testable, whereas (1) is inherently aspirational
and will always have to be aspirational.  We're endlessly going to be
discovering some other environment variable that changes tool output.
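To make the contrast concrete, here is a minimal Python sketch. It is not how any reproducible-builds tool works; both variable sets and both helper names are assumptions chosen only for illustration.

```python
# Hypothetical lists; neither is taken from Policy or from any real tool.
FIXED_BY_POLICY = {"LC_ALL", "TZ"}   # approach (1): variables that must not change
ALLOWED_TO_VARY = {"HOME", "USER"}   # approach (3): variables that may change


def untested_under_blacklist(env):
    """Approach (1): every variable outside the fixed list may vary, so each
    one is a reproducibility claim that cannot be tested exhaustively."""
    return sorted(name for name in env if name not in FIXED_BY_POLICY)


def dropped_under_whitelist(env):
    """Approach (3): every variable outside the allowed list is removed
    before the build, so it cannot influence the output."""
    return sorted(name for name in env if name not in ALLOWED_TO_VARY)


env = {"HOME": "/home/a", "TZ": "UTC", "GNUTARGET": "elf32-i386"}
print(untested_under_blacklist(env))
print(dropped_under_whitelist(env))
```

Under (1) an unlisted variable such as GNUTARGET silently becomes something the build must tolerate; under (3) it never reaches the build at all.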

I'm also unsure that (1) is even what we want to claim.  Do we really want
to say that builds are always reproducible if you don't change this short
list of environment variables, no matter whatever other environment
variables you set?  There's some appeal in this for the end user, but it
feels very frustrating for the package maintainer.  At first glance, as a
package maintainer, I'd think I'd have to maintain a huge blacklist of
environment variables that I've discovered affect my toolchain somewhere,
and explicitly unset them all in debian/rules.  This doesn't feel like a
good use of anyone's time (and may actually *break* other,
non-reproducibility-related things that people want to do with my
package).

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds


Bug#876055: Environment variable handling for reproducible builds

2017-09-17 Thread Russ Allbery
Package: debian-policy
Version: 4.1.0.0
Severity: normal

Currently, Debian Policy requires all environment variables be held the
same across builds for the build to be expected to be reproducible.
However, the current approach of some reproducible build tools is to
instead enumerate a set of fixed environment variables and allow other
variables to vary.

We should ideally converge on a single approach to environment variables
and build reproducibility and make it easy for tools to implement that
approach.

I think the alternatives are:

1. Enumerate environment variables to hold fixed.  This is better in
   the sense that it allows packages to be reproducible under more
   situations, but it's unstable in the sense that we'll never be able to
   enumerate all environment variables that might possibly affect the
   build.  It's also not testable in the sense that we can't set every
   possible environment variable.

2. Set the entire environment to the environment specified in buildinfo
   when doing a reproducible build.  I think this is conceptually the
   simplest, but it means that we should make every tool that builds
   official Debian packages use the same environment variable logic so
   that the buildinfo file completely captures the environment (without
   leaking random, inappropriate things into buildinfo).  It also means
   effectively giving up on debian/rules build being a path for making a
   reproducible build, since we don't have control over that environment,
   but I think it will be hard to make that work anyway.

3. List a set of environment variables that are permitted to vary in the
   reproducible build policy, and then have reproducible builds clean the
   environment except for that set and then apply the buildinfo environment
   variable set.  This is very similar to 2.  I think the primary advantage
   is that it lets us require packages build reproducibly in the presence
   of some settings that logically should not affect the build (USER, HOME,
   etc.), at the cost of making reproducible builds harder to achieve.
   It's mostly testable, in that one can try reproducible builds with
   various settings for those variables, although it would be hard to catch
   corner cases where only a specific setting causes issues.

I personally lean towards 2, which is consistent with what's in Policy
right now, but I can see definite merits in 3.  I believe the reproducible
builds project is currently sort of doing 1, but I have a hard time seeing
how to make that viable on the testing side.
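As a rough sketch of how alternatives 2 and 3 differ when a build tool constructs the environment, assuming the buildinfo environment is already available as a dictionary (the function names and the example variables are hypothetical):

```python
def env_for_approach_2(buildinfo_env):
    """Alternative 2: the build sees exactly the recorded environment."""
    return dict(buildinfo_env)


def env_for_approach_3(caller_env, buildinfo_env, allowed_to_vary):
    """Alternative 3: clean the caller's environment except for the
    permitted set, then apply the buildinfo environment on top."""
    env = {k: v for k, v in caller_env.items() if k in allowed_to_vary}
    env.update(buildinfo_env)
    return env


caller = {"HOME": "/home/alice", "USER": "alice", "GNUTARGET": "elf32-i386"}
recorded = {"DEB_BUILD_OPTIONS": "parallel=4", "LC_ALL": "C.UTF-8"}

env2 = env_for_approach_2(recorded)
env3 = env_for_approach_3(caller, recorded, allowed_to_vary={"HOME", "USER"})
```

In both cases GNUTARGET never reaches the build. The only difference is that alternative 3 lets HOME and USER through with whatever values the caller has, which is what makes them testable variations.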

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.12.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages debian-policy depends on:
ii  libjs-sphinxdoc  1.6.3-2

debian-policy recommends no packages.

Versions of packages debian-policy suggests:
pn  doc-base  

-- no debconf information



Re: Bug#844431: Revised patch: Oppose

2017-08-16 Thread Russ Allbery
Bill Allombert <ballo...@debian.org> writes:
> On Wed, Aug 16, 2017 at 12:14:53PM -0700, Russ Allbery wrote:

>> If you have specific wording suggestions that you believe would bring
>> this Policy requirement closer in line with what we're already doing in
>> the project (and which has gotten us to 94% reproducible already),
>> please make them.

> This percentage was reached mostly by fixing software tools (compiler,
> doc generators, packaging tools) to be deterministic, rather than by
> fixing individual packages. This is a topic that is wholly absent from
> policy.

Indeed.  There are many things that go into making Debian work that are
wholly absent from Policy.  Hopefully, over time, we can slowly reduce
that, but there will always be new initiatives that aren't documented.

> For example policy could mandate that programs that set timestamps
> honour SOURCE_DATE_EPOCH.

Please propose language.  (Ideally in a separate bug, since this one is
already quite large and it's easier to address specific issues in specific
bugs.)

I'm not opposed to adding more advice and requirements that make sense,
but there are lots of things in Policy that aren't as fully described as
they possibly could be if people did more work.  I'm not willing to block
this on having the perfect language, but if you want to contribute, you're
absolutely welcome to do so.

Most packages do not have to care about SOURCE_DATE_EPOCH because it's set
by dpkg-buildpackage and consumed by the tools that are most frequently
relevant, but I'd be very happy to see that documented in Policy for the
packages that do care.
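For a tool that does embed timestamps, honouring the variable could look roughly like the following Python sketch. The function name is hypothetical; the only thing assumed about SOURCE_DATE_EPOCH is that it holds an integer number of seconds since the Unix epoch.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output.

    Uses SOURCE_DATE_EPOCH when it is set and falls back to the current
    time otherwise, so builds without the variable still work.
    """
    value = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(value) if value else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(build_timestamp({"SOURCE_DATE_EPOCH": "1502841600"}).isoformat())
```

Formatting the result in UTC avoids a second source of variation, the builder's time zone.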

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: Oppose

2017-08-16 Thread Russ Allbery
Just to be completely, 100% clear: I will not be responding further to
this line of argument in this bug.  If you disagree with my decision as a
project delegate, I've spelled out your possible next steps under Debian's
governance process.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: seeking seconds

2017-08-16 Thread Russ Allbery
Bill Allombert <ballo...@debian.org> writes:
> On Wed, Aug 16, 2017 at 09:36:04AM -0700, Russ Allbery wrote:

>> Note that, for most developers, this is pretty much equivalent to the
>> current situation with FTBFS on, say, s390 architectures.  Or even
>> issues with running under whichever init system is not the one the
>> maintainer personally uses.

> Debian provides porter box for that purpose. This means if your package
> FTBFS on s390 you can login to a s390 porter box, use sbuild to set up a
> build environment, fix the problem and then check that the package now
> builds correctly.

> Now compare with reproducible build. You get some error report you
> cannot reproduce, do some change following the help provided and hope
> for the best. Then some day later you get the same error report.

This hasn't been my experience with reproducible build bug reports.  Once
there's a bug report of unreproducibility under some specific situation,
I've always been able to reproduce it by doing multiple builds with that
specific variation and seeing how the output changes.
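The procedure amounts to something like this Python sketch. The `build` callable is a stand-in for a real package build and `toy_build` is invented for the example; a real check would hash the resulting .deb files instead.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Hash a build artifact so two builds can be compared cheaply."""
    return hashlib.sha256(artifact).hexdigest()


def varies_with(build, base_env, name, values):
    """Run build(env) once per value of the variable `name` and report
    whether the resulting artifacts differ."""
    digests = set()
    for value in values:
        env = dict(base_env, **{name: value})
        digests.add(digest(build(env)))
    return len(digests) > 1


def toy_build(env):
    """Toy build: embeds BUILD_PATH in its output and ignores everything else."""
    return ("built in " + env.get("BUILD_PATH", "")).encode()


print(varies_with(toy_build, {}, "BUILD_PATH", ["/tmp/a", "/tmp/b"]))
print(varies_with(toy_build, {}, "HOME", ["/home/a", "/home/b"]))
```

Varying one thing at a time is what makes the report actionable: the variation that flips the digest is the one to chase.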

I agree that this may not always be the case, but it's also not always the
case that one can reproduce an s390 buildd failure on a porter box.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: Oppose

2017-08-16 Thread Russ Allbery
ee; I
feel like I have a pretty complete understanding of the issues here, and
it's highly unlikely that further elaborations or rephrasings of your
current arguments are going to change my mind.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: seeking seconds

2017-08-15 Thread Russ Allbery
Adrian Bunk <b...@debian.org> writes:

> This is not about experimenting for raising the bar in the future.

> This is about the reproducible builds team not using policy as a stick 
> for claiming a bar higher than what policy actually defines.

> Is it really allowed to claim that a package is not reproducible,
> when it actually is reproducible according to policy?

Yes.  Ideally one would distinguish between those various definitions of
reproducible, though, and present all of them.

> Unless policy is supposed to be completely detached from reality, the
> criteria for claiming in various places that a package is unreproducible
> have to match the policy definition of reproducibility.

No, I don't agree.  This is not how we do things in Debian.  There is
quite a bit of information that we give developers about possible flaws in
their package, from Lintian and build log analysis and many other things,
that is not required by Policy.  This is no different.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: seeking seconds

2017-08-15 Thread Russ Allbery
Adrian Bunk <b...@debian.org> writes:

> Future policy versions might change this definition, but whatever latest
> policy states has to be the definition used by both packages and the
> reproducible builds team.

> Another example is that a package that is reproducible according to the 
> policy definition must not show up as non-reproducible in tracker/DDPO 
> based on results from the reproducible infrastructure.

This seems really inflexible and unnecessarily absolutist.  I don't agree
with taking this approach.

The point of adding this definition to Policy is that we're setting a new
minimum bar for packages in Debian to meet.  We're giving official
blessing to this requirement for Debian packages (at the normal bug level,
not RC bug, for now), meaning this is a goal that the project is working
towards and something every packager should think about at this level.

This in absolutely no way constrains the reproducible build team from
working on raising the bar in the future, just as the absence of this
language from Policy did not prevent them from starting to work on this
problem four years ago.  They should continue to work on making package
builds more reproducible and raising the bar for reproducibility as makes
sense for their goals and judging the impact of that.  Once any new
requirements reach maturity and look feasible and have some project
commitment, we'll change Policy to set a new baseline for the whole
project.  But the reproducible builds work should not *wait* for that, and
should definitely push forward and experiment just as they have up until
now.

I do think it might be worth considering distinguishing between packages
that are minimally reproducible and packages that meet higher
reproducibility bars (such as not caring about the location of the build
tree) in reporting infrastructure like tracker.  But I'm totally fine with
surfacing failures on new, higher bars in places like tracker before we
change Policy, just like we've been surfacing reproducibility failures
before Policy said anything about it at all.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: seeking seconds

2017-08-15 Thread Russ Allbery
Adrian Bunk <b...@debian.org> writes:

> I would expect the reproducible builds team to not submit any bugs
> regarding varied environment variables as long as the official
> definition of reproducibility in policy states that this is not required
> for a package to be reproducible.

I believe the planned next step here is to publish the *.buildinfo files,
which contain a specification of the environment variables the build cares
about, and then Policy can be modified to include a description of
*.buildinfo files and how to use them.  As part of those changes, we'd
certainly update the definition of reproducible to reference matching the
environment specified in the corresponding *.buildinfo file.
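As an illustration of what a consumer of those files might do, here is a small Python sketch that pulls the recorded variables out of an Environment field. It assumes the field is a run of indented NAME="value" lines with backslash escapes; the sample text is made up, and a real tool should rely on a proper control-file parser.

```python
import re

# One indented continuation line of the form  NAME="value"
_LINE = re.compile(r'^\s+([A-Za-z_][A-Za-z0-9_]*)="(.*)"\s*$')


def parse_environment_field(buildinfo_text):
    """Return the variables listed under 'Environment:' as a dict."""
    env = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Environment:"):
            in_field = True
            continue
        if in_field:
            match = _LINE.match(line)
            if not match:
                break  # the first non-continuation line ends the field
            name, value = match.groups()
            env[name] = re.sub(r"\\(.)", r"\1", value)  # undo backslash escapes
    return env


# Made-up sample, not copied from a real .buildinfo file.
SAMPLE = (
    "Source: hello\n"
    "Environment:\n"
    ' DEB_BUILD_OPTIONS="parallel=4"\n'
    ' LANG="C.UTF-8"\n'
    "Architecture: amd64\n"
)
print(parse_environment_field(SAMPLE))
```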

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Revised patch: seeking seconds

2017-08-12 Thread Russ Allbery
Ximin Luo <infini...@debian.org> writes:

> To echo dkg and others' comments, it would be nice if we could add here:

> +Packages are encouraged to produce bit-for-bit identical binary packages even
> +if most environment variables and build paths are varied. This is technically
> +more difficult at the time of writing, but it is intended that this stricter
> +definition would replace the above one, when appropriate in the future.

> If this type of "intent" wording is not appropriate for Policy then
> disregard what I'm saying, I don't wish to block this patch for this
> reason.

Oh, that's a good way to capture that.  This seems fine to me, and I have
no objections to adding this advice.  Seconded the original with or
without this addition.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Reproducibility in Policy

2017-08-11 Thread Russ Allbery
Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
> On Fri 2017-08-11 16:08:47 -0700, Sean Whitton wrote:

>> - a version of a source package unpacked at a given path;

> I don't like the idea of hard-coding a fixed build path requirement into
> debian policy.  We're over 80% with variable build paths in unstable
> already, and i want to keep the pressure up on this.  The build location
> should not influence the binary output.

It shouldn't, but my understanding is that it currently does.  If you can
fix that, that's great, but until that's been fixed, I don't see the harm
in documenting this as a prerequisite for a reproducible build.  If we can
relax that prerequisite later, great, but nothing about listing it here
should reduce the pressure on making variable build paths work.  It just
documents the current state of the world.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: Bug#844431: Reproducibility in Policy

2017-08-11 Thread Russ Allbery
Sean Whitton <spwhit...@spwhitton.name> writes:

>  Proposal: 

> This is what Holger and I think we should add to Policy, after
> readability tweaks:

> Packages should build reproducibly, which for purposes of this
> document means that given

> - a version of a source package unpacked at a given path;
> - a set of versions of installed build-dependencies; and
> - a build architecture,

> repeatedly building the source package on the architecture with those
> versions of the build dependencies installed will produce bit-for-bit
> identical binary packages.

I think we need to add all environment variables starting with DEB_* to
the prerequisites.  If you set DEB_BUILD_OPTIONS=nostrip or
DEB_BUILD_MAINT_OPTIONS=hardening=all, you'll definitely get a different
package, for instance.
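In other words, two builds would only be expected to match when their DEB_* variables match. A minimal Python sketch of that rule, with hypothetical helper names:

```python
def deb_variables(environ):
    """Return only the DEB_* variables, which would count as prerequisites."""
    return {k: v for k, v in environ.items() if k.startswith("DEB_")}


def same_prerequisites(env_a, env_b):
    """True when two environments agree on every DEB_* variable."""
    return deb_variables(env_a) == deb_variables(env_b)


plain = {"HOME": "/home/a"}
nostrip = {"HOME": "/home/b", "DEB_BUILD_OPTIONS": "nostrip"}
print(same_prerequisites(plain, {"HOME": "/home/c"}))
print(same_prerequisites(plain, nostrip))
```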

I feel like there are a bunch of other environment variables that have to
be consistent, although I'm not sure how to specify that since other
environment variables shouldn't matter.  But, say, setting GNUTARGET is
very likely to cause weirdness by changing how ld works.  There are
probably more interesting examples.

How does the current reproducible build testing work with the environment?
Maybe we should just document that for right now and relax it later if
needed?

>  Explanation: 

> The definition from the reproducible builds group[1] says:

> A build is reproducible if given the same source code, build
> environment and build instructions, any party can recreate
> bit-by-bit identical copies of all specified artifacts.

> The relevant attributes of the build environment, the build
> instructions and the source code as well as the expected
> reproducible artifacts are defined by ... distributors.

> i.e. Debian has to define the build environment, source code and build
> instructions.  I think that my wording defines these as Debian currently
> understands them.

> Later, we could narrow the definition of build environment by adding
> more constraints, but we're not there yet.

> [1]  https://reproducible-builds.org/docs/definition/

We should add a link to that page (maybe in a footnote).

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: [Reproducible-builds] Bug#832099: lintian: please check for unnecessary SOURCE_DATE_EPOCH assignments

2016-07-22 Thread Russ Allbery
Mattia Rizzolo <mat...@debian.org> writes:
> On Fri, Jul 22, 2016 at 12:14:56PM -0700, Russ Allbery wrote:

>> I think that's fine in this case, since not setting that variable
>> doesn't break the build.  It just means the build isn't reproducible,
>> which is an optional feature.

> 1/ it's an optional feature *for now*.  I'd really love to see it being
>    mandated asap.

I'm not sure that's a good idea, although it's mostly an intuitive
reaction and I can't think of a specific counter-example off the top of my
head.

I do think we should support reproducible builds for all packages, and
that our default build should be reproducible.  I'm just not sure that we
should rule out allowing packages to be configured to use default upstream
behavior for timestamps and whatnot, if it's not the default.

> 2/ it can break the build: I don't know if this is already present in
>    some package out there, but just think of calling a tool 'foo'
>    setting a cli flag 'bar' to SDE:
>      foo --bar="$(SOURCE_DATE_EPOCH)"
>    for example, using tar:
>      tar --mtime="@$(SOURCE_DATE_EPOCH)" -c -f out.tar in
>    this is a valid command when SDE is set, but it's going to fail as
>    soon as it's not set anymore, I fear; and it makes a lot of sense in
>    a d/rules to tar something up.

Good point -- in cases like that, the packager probably should set this
variable directly in debian/rules since the rest of the build does depend
on it.  (I don't think this is the typical use case, but there are
probably a few cases like this.)
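The failure mode is that an unset variable leaves `--mtime=@` with no number after it. A Python sketch of building the same tar command defensively follows; `fallback_epoch` is a hypothetical value the packager would have to derive, for example from the changelog date.

```python
def tar_mtime_args(environ, fallback_epoch):
    """Build the tar command line, never emitting an empty --mtime=@ value."""
    value = environ.get("SOURCE_DATE_EPOCH") or str(fallback_epoch)
    return ["tar", "--mtime=@" + value, "-c", "-f", "out.tar", "in"]


print(tar_mtime_args({"SOURCE_DATE_EPOCH": "1469145600"}, fallback_epoch=0))
print(tar_mtime_args({}, fallback_epoch=1469145600))
```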

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: [Reproducible-builds] Bug#832099: lintian: please check for unnecessary SOURCE_DATE_EPOCH assignments

2016-07-22 Thread Russ Allbery
Mattia Rizzolo <mat...@debian.org> writes:
> On Fri, Jul 22, 2016 at 11:55:46AM +0200, Chris Lamb wrote:
>> Attached is the following:
>> 
>>   commit 3b10f7dbaecedb0a458c25cc0b8615b489d424af
>>   Author: Chris Lamb <la...@debian.org>
>>   Date:   Fri Jul 22 10:54:02 2016 +0100
>>   
>>   c/rules: Check for unnecessary SOURCE_DATE_EPOCH assignments
>>   
>>   As of dpkg 1.18.8, this is no longer necessary as dpkg exports this
>>   variable if it is not already set (#75). This should encourage
>>   removing some duplicated code from a lot of our rules files.

> though, using dpkg-buildpackage is not mandatory, a package should be
> able to build with just `debian/rules binary`.  In such case SDE
> wouldn't be exported.

> This looks fairly similar to e.g. DEB_HOST_ARCH & friends variables,
> that even if they are exported by dpkg-buildpackage you should set them
> in d/rules nonetheless.

> At least, that's what my AM told me back then ^^

I think that's fine in this case, since not setting that variable doesn't
break the build.  It just means the build isn't reproducible, which is an
optional feature.

I think it's fine for optional features to only be implemented by the
surrounding build wrapper or by environment variables explicitly set by
the person building the package.  In fact, it makes somewhat more sense to
me than having debian/rules unconditionally force a reproducible build.

In other words, I think this is more akin to DEB_BUILD_OPTIONS than it is
to DEB_HOST_ARCH.

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>



Re: [Reproducible-builds] Bug#782878: [debhelper-devel] Bug#782879 + Bug#782878: lib{test-log4perl, scalar-defer}-perl: please make the build reproducible

2015-05-24 Thread Russ Allbery
Niko Tyni <nt...@debian.org> writes:
> On Wed, May 20, 2015 at 10:34:20PM +0200, Niels Thykier wrote:
>> On 2015-04-19 14:35, gregor herrmann wrote:
>>> On Sun, 19 Apr 2015 14:03:44 +0200, Axel Beckert wrote:

>>>> Jelmer Vernooij wrote:

>>>>> +# Set man page timestamp to last package change time.
>>>>> +BUILD_DATE = $(shell dpkg-parsechangelog -S Date)
>>>>> +POD_MAN_DATE = $(shell date -u +%Y-%m-%d --date="$(BUILD_DATE)")
>>>>> +export POD_MAN_DATE

>>>> But isn't this something which should be done once and properly
>>>> in the build system (e.g. in dh_auto_build), like setting all the file
>>>> time stamps to that date?

>> It is not entirely clear to me what you are asking for.  Is this change
>> only supposed to go into a Perl specific build system, in all build
>> systems supported by dh_auto_build or ...?

> That's a good question. I suppose it should go in all the build systems,
> although most of the benefit is certainly for Perl module packages.

> The context is that Pod::Man sets the date header based on the mtime of
> the file, but if the file is patched by the Debian packaging, the mtime
> will be set to the extraction time, breaking reproducibility.  (See
> #759404 for some related discussion.)  The mtime will also be
> unreproducible if the file is generated during the build.

Disabling the date in the generated man page is definitely the easiest
fix.

I personally like the idea of instead setting the date to the last
modified time of the Debian package.  Actually, ideally, I wish that dpkg
itself would set the timestamp of all files modified by patches to match
the last modification date of the package, which would achieve the same
thing but at what feels like the correct level.

My feeling is that the date in the man page serves a useful purpose for
the end user by communicating some idea of the staleness of the
documentation and the recentness of the last release of the software.
While this isn't a huge deal, it does feel somewhat less than ideal to
lose that data.  Replacing it with the last modification date of the
Debian package isn't perfect, but it's fairly reasonable.
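For illustration, deriving that date from the changelog could look like the following Python sketch. It assumes the changelog date is available as an RFC 2822 style string, and the function name is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pod_man_date(changelog_date):
    """Convert a changelog date such as 'Sun, 24 May 2015 01:30:00 +0200'
    to YYYY-MM-DD in UTC, so the result does not depend on the builder's
    time zone."""
    parsed = parsedate_to_datetime(changelog_date)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d")


print(pod_man_date("Sun, 24 May 2015 01:30:00 +0200"))
```

Converting to UTC first matters near midnight: the example above lands on the previous calendar day once the +0200 offset is removed.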

-- 
Russ Allbery (r...@debian.org)   <http://www.eyrie.org/~eagle/>
