Re: Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Vagrant Cascadian
On 2017-09-18, Vagrant Cascadian wrote:
> On 2017-09-18, Russ Allbery wrote:
>> Daniel Kahn Gillmor  writes:
>>> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>>> Does everything in policy need to be rigorously testable?  or is it ok
>>> to have Policy state the desired outcome even if we don't know how (or
>>> don't have the resources) to test it fully today.
>>
>> I don't think everything has to be rigorously testable, but I do think
>> it's a useful canary.  If I can't test something, I start wondering
>> whether that means I have problems with my underlying assumptions.
>>
>> In particular, for (1), we have no comprehensive list of environment
>> variables that affect the behavior of tools, and that list would be
>> difficult to create.  Many pieces of software add their own environment
>> variables with little coordination, and many of those variables could
>> possibly affect tool output.
>
> There is a huge difference between variables that *might* affect the
> build as an unintended input that gets stored in a resulting packages in
> some manner, and variables that are designed to change the behavior of
> parts of the build toolchain.
>
> I consider unintended variables that affect the build output a bug, and
> variables designed and intended to change the behavior of the toolchain
> expected, reasonable behavior.

Ok, after discussing this on IRC, I figured it might be worth expanding
on that point a bit...


The environment variables (and other variations) used by the
reproducible builds test infrastructure are listed at:

  https://tests.reproducible-builds.org/debian/index_variations.html

I'll try and summarize the rationale for each of the variables used,
many of which have had actual impacts on the result of the builds:


CAPTURE_ENVIRONMENT, BUILDUSERID, BUILDUSERNAME

Some builds capture the entire environment, or most of the environment;
setting arbitrary environment variables can help detect this.

TZ

The timezone used can change the results of embedded timestamps.

LANG, LANGUAGE, LC_ALL

The locale and language settings definitely change the strings embedded
in some binaries, if tool output is translated.

PATH, USER, HOME

Some builds embed these.

DEB_BUILD_OPTIONS=parallel=N

The level of parallelism can change the build output. Other values in
DEB_BUILD_OPTIONS (e.g. noautodbgsym) might reasonably be expected to
change the output.


None of the above variables should change the resulting built package,
with the possible exception of some other values of DEB_BUILD_OPTIONS.
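
To make the idea concrete, here is a minimal sketch of a variation test,
not the actual test infrastructure: it runs a toy "build" under two
environments and compares a hash of the output. The environment values,
the toy build commands, and the names build_digest and is_reproducible
are all assumptions made up for illustration.

```python
import hashlib
import os
import subprocess
import sys

# Two hypothetical environments, loosely modelled on the variations listed
# above. The concrete values are illustrative assumptions only.
ENV_A = {"TZ": "UTC", "LANG": "C", "USER": "builder1", "HOME": "/nonexistent/first"}
ENV_B = {"TZ": "Pacific/Kiritimati", "LANG": "C.UTF-8", "USER": "builder2",
         "HOME": "/nonexistent/second"}

# Toy "builds": the first embeds USER in its output (an unintended input),
# the second ignores the environment entirely.
LEAKY_BUILD = "import os; print('built by', os.environ.get('USER'))"
CLEAN_BUILD = "print('built')"


def build_digest(build_code, extra_env):
    """Run the toy build with extra_env applied and hash what it prints."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", build_code],
        env=env, capture_output=True, check=True,
    )
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(build_code):
    """True if the output is identical under both environments."""
    return build_digest(build_code, ENV_A) == build_digest(build_code, ENV_B)


if __name__ == "__main__":
    print("clean build reproducible:", is_reproducible(CLEAN_BUILD))
    print("leaky build reproducible:", is_reproducible(LEAKY_BUILD))
```

A real test would compare the resulting .deb files rather than standard
output, but the principle is the same: vary the inputs that should not
matter and check that nothing changes.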

On the other hand, I would expect variables such as CC, MAKE,
CROSS_COMPILE, CFLAGS, etc. to reasonably and likely change the result
of the built package. They are, in a sense, part of the build toolchain
environment.


Without generating comprehensive blacklists and/or whitelists, is it
plausible to come up with a policy description of the above two classes
of variables? Given the above lists, it seems relatively obvious to me
that there are basically two classes of variables, but I'm at a loss for
how to really describe it in policy.

You could give a reasonable test of:

  Is this variable intended to change the results of the binary, or is
  it changing the build as an unintended side-effect?

That does require reasoned interpretation, though. I envision such tests
being used in bug reports relating to reproducibility issues, on a
case-by-case basis.


It doesn't solve the testability issue on a policy level, but that could
possibly be addressed outside of policy through best practices for
reproducibility documentation.


live well,
  vagrant


signature.asc
Description: PGP signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Russ Allbery
Daniel Kahn Gillmor  writes:
> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>> I personally lean towards 2, which is consistent with what's in Policy
>> right now, but I can see definite merits in 3.  I believe the
>> reproducible builds project is currently sort of doing 1, but I have a
>> hard time seeing how to make that viable on the testing side.

> Thanks for raising this question, Russ!

> I'm not sure that we should let lack of exhaustive testing push us away
> from (1).  (1) is in principle the right thing -- it's easy to make a
> build reproducible if we tell people that they have to do exactly one
> specific thing.  But we generally want people to be able to run
> heterogenous systems, and not to force them into one particular
> environment.

Well... I would argue that the amount of time and effort that's gone into
this project shows that it's not that easy to make a build reproducible
even when telling people to do exactly one thing.  :)  But I get your
point.

> Consider someone who wants to see more logging from a build, for
> example.  There could be an environment variable that encourages the
> toolchain to log more, but doesn't affect the binary objects created by
> the build.  By going with choices (2) or (3) we effectively dismiss even
> considering the reproducibility of those builds, which seems like a
> shame.

This is the case for (2), but not for (3).  Indeed, this is exactly the
distinction between (2) and (3).  It does mean that discovery of any new
such environment variable would require a change to our whitelist in
approach (3), so there would be some lag and the whitelist would become
long over time (with a corresponding testing load).  But (3) does try to
achieve that use case without trying to anticipate any possible
environment variable setting.  It lets us be reactive to newly-discovered
environment variables across which we want to stay reproducible.

> Does everything in policy need to be rigorously testable?  or is it ok
> to have Policy state the desired outcome even if we don't know how (or
> don't have the resources) to test it fully today.

I don't think everything has to be rigorously testable, but I do think
it's a useful canary.  If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.

In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create.  Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.

I feel like the work for (1) and for (3) ends up being comparable; for (1)
we have to maintain a blacklist, and for (3) we have to maintain a
whitelist.  But (3) is testable, whereas (1) is inherently aspirational
and will always have to be aspirational.  We're endlessly going to be
discovering some other environment variable that changes tool output.
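
A rough sketch of the difference, under my reading of (1) and (3) as
described above: with a blacklist, the reproducibility promise covers
every variable not on the list; with a whitelist, it covers only the
listed ones. Both lists below are hypothetical (neither exists in
Policy), with variable names borrowed from elsewhere in this thread, and
the function names are made up for illustration.

```python
# Approach (1): variables that are allowed to change the output.
# Hypothetical contents, for illustration only.
BLACKLIST = {"CC", "MAKE", "CROSS_COMPILE", "CFLAGS", "DEB_BUILD_OPTIONS"}

# Approach (3): variables across which the build must stay reproducible.
# Hypothetical contents, for illustration only.
WHITELIST = {"TZ", "LANG", "LANGUAGE", "LC_ALL", "PATH", "USER", "HOME"}


def differing_variables(env_a, env_b):
    """Names of variables that are set differently in the two environments."""
    return {
        name
        for name in env_a.keys() | env_b.keys()
        if env_a.get(name) != env_b.get(name)
    }


def promise_applies(env_a, env_b, approach):
    """Does the reproducibility promise cover this pair of environments?"""
    changed = differing_variables(env_a, env_b)
    if approach == 1:
        # Covered unless a blacklisted variable was changed.
        return not (changed & BLACKLIST)
    if approach == 3:
        # Covered only if every changed variable is whitelisted.
        return changed <= WHITELIST
    raise ValueError("only approaches 1 and 3 are sketched here")


if __name__ == "__main__":
    first = {"TZ": "UTC"}
    second = {"TZ": "UTC", "SOME_NEW_TOOL_VAR": "1"}  # hypothetical variable
    print(promise_applies(first, second, approach=1))
    print(promise_applies(first, second, approach=3))
```

A variable nobody has heard of yet falls inside the promise under (1)
and outside it under (3), which is why (3) can be tested against a
finite list and (1) cannot.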

I'm also unsure that (1) is even what we want to claim.  Do we really want
to say that builds are always reproducible if you don't change this short
list of environment variables, no matter whatever other environment
variables you set?  There's some appeal in this for the end user, but it
feels very frustrating for the package maintainer.  At first glance, as a
package maintainer, I'd think I'd have to maintain a huge blacklist of
environment variables that I've discovered affect my toolchain somewhere,
and explicitly unset them all in debian/rules.  This doesn't feel like a
good use of anyone's time (and may actually *break* other,
non-reproducibility-related things that people want to do with my
package).

-- 
Russ Allbery (r...@debian.org)   



Bug#876140: strip-nondeterminism: log which handlers "strip-nd"s a file

2017-09-18 Thread Mattia Rizzolo
Package: libfile-stripnondeterminism-perl
Version: 0.038-1
Severity: wishlist

Since 0.030 strip-nd prints a log when fixing a file, like
|   dh_strip_nondeterminism
|Using 1505769410 as canonical time
|Normalizing debian/libtse3-dev/usr/lib/x86_64-linux-gnu/libtse3.a


I'd find it handy (even if it might be obvious for most cases) to print
the name of the handler that does the normalization.

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
more about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-



Please review the draft for week 125's blog post

2017-09-18 Thread Ximin Luo
Hi all,

This week's blog post draft is now available for review:

https://reproducible.alioth.debian.org/blog/drafts/125/

Feel free to commit fixes directly to drafts/125.mdwn in

https://anonscm.debian.org/git/reproducible/blog.git/

I'll wait at least 24 hours from the time of this email for any comments, and 
if everything is good then I will publish it soon after that.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git



Re: [Pkg-zsh-devel] Bug#764650: zsh: FTBFS with noatime mounts

2017-09-18 Thread Vagrant Cascadian
On 2017-09-18, Axel Beckert wrote:
> Control: retitle -1 zsh: FTBFS with noatime mounts (e.g. on reproducible 
> builds armhf nodes)
...
> We still see this issue with reproducible builds on armhf in unstable as
> well as stretch:
> https://tests.reproducible-builds.org/debian/rb-pkg/stretch/armhf/zsh.html
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/armhf/zsh.html
> https://tests.reproducible-builds.org/debian/rbuild/stretch/armhf/zsh_5.3.1-4.rbuild.log
> https://tests.reproducible-builds.org/debian/rbuild/unstable/armhf/zsh_5.4.2-1.rbuild.log
> (Cc'ing the Reproducible Builds Folks for their information.)
>
> While I don't know for sure if those nodes use noatime, it explains
> well why exactly this tests only fails on the slowest architecture of
> reproducible builds.

I can confirm that the reproducible builds armhf builders use noatime on
the filesystem used for building packages.
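
For anyone wanting to check their own build machine, here is a small
sketch, assuming a Linux system where mount information is available in
the /proc/mounts format. The function name and the sample paths are
made up for illustration, and the sample data is not taken from the
armhf nodes.

```python
def mount_options_for(path, mounts_text):
    """Return the mount options of the filesystem that contains path.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    Mount points containing spaces (escaped as \\040) are not handled.
    """
    best_point, best_options = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer the longest matching mount point; on a tie, the later
        # entry wins because it shadows the earlier one.
        if inside and len(mount_point) >= len(best_point):
            best_point, best_options = mount_point, options
    return best_options


# Made-up sample data in /proc/mounts format.
SAMPLE_MOUNTS = (
    "/dev/root / ext4 rw,relatime 0 0\n"
    "/dev/sda1 /srv/build ext4 rw,noatime 0 0\n"
)

if __name__ == "__main__":
    options = mount_options_for("/srv/build/zsh", SAMPLE_MOUNTS)
    print("noatime" in options)
```

On a real system, pass the contents of /proc/mounts and the resolved
build directory instead of the sample data.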


live well,
  vagrant

