Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Vagrant Cascadian
On 2017-09-18, Vagrant Cascadian wrote:
> On 2017-09-18, Russ Allbery wrote:
>> Daniel Kahn Gillmor  writes:
>>> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>>> Does everything in policy need to be rigorously testable?  or is it ok
>>> to have Policy state the desired outcome even if we don't know how (or
>>> don't have the resources) to test it fully today.
>>
>> I don't think everything has to be rigorously testable, but I do think
>> it's a useful canary.  If I can't test something, I start wondering
>> whether that means I have problems with my underlying assumptions.
>>
>> In particular, for (1), we have no comprehensive list of environment
>> variables that affect the behavior of tools, and that list would be
>> difficult to create.  Many pieces of software add their own environment
>> variables with little coordination, and many of those variables could
>> possibly affect tool output.
>
> There is a huge difference between variables that *might* affect the
> build as an unintended input that gets stored in the resulting package in
> some manner, and variables that are designed to change the behavior of
> parts of the build toolchain.
>
> I consider unintended variables that affect the build output a bug, and
> variables designed and intended to change the behavior of the toolchain
> expected, reasonable behavior.

Ok, after discussing on IRC a bit, I figured it might be worth expanding
on that point a bit...


The environment variables (and other variations) used by the
reproducible builds test infrastructure:

  https://tests.reproducible-builds.org/debian/index_variations.html

I'll try and summarize the rationale for each of the variables used,
many of which have had actual impacts on the result of the builds:


CAPTURE_ENVIRONMENT, BUILDUSERID, BUILDUSERNAME

Some builds capture the entire environment, or most of the environment;
setting arbitrary environment variables can help detect this.

TZ

The timezone used can change the results of embedded timestamps.

LANG, LANGUAGE, LC_ALL

The locale and language settings definitely change the strings embedded
in some binaries, if tool output is translated.

PATH, USER, HOME

Some builds embed these.

DEB_BUILD_OPTIONS=parallel=N

The level of parallelism can change the build output, although other
values in DEB_BUILD_OPTIONS might reasonably be expected to change
output (e.g. noautodbgsym).


None of the above variables should change the resulting built package,
with the possible exception of some other values of DEB_BUILD_OPTIONS.

On the other hand, I would expect variables such as CC, MAKE,
CROSS_COMPILE, CFLAGS, etc. to reasonably and likely change the result
of the built package. They are, in a sense, part of the build toolchain
environment.
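
A minimal sketch of the variation idea, not the actual harness behind
tests.reproducible-builds.org: run the same build command under two
environments that differ only in variables that should not matter, and
compare digests of the output. The function names, the sample values and
the stand-in "build" commands are all assumptions made for illustration.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical variations, loosely modelled on the variables listed above.
VARIATIONS = {
    "TZ": ("UTC", "Asia/Tokyo"),
    "LANG": ("C", "fr_FR.UTF-8"),
    "CAPTURE_ENVIRONMENT": ("first build", "second build"),
}


def build_envs(base):
    """Return two copies of base that differ only in the VARIATIONS keys."""
    first, second = dict(base), dict(base)
    for name, (value_a, value_b) in VARIATIONS.items():
        first[name] = value_a
        second[name] = value_b
    return first, second


def output_digest(cmd, env):
    """Run cmd under env and return the SHA-256 of its standard output."""
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(cmd, base=None):
    """True if cmd produces identical output under both environments."""
    first, second = build_envs(os.environ if base is None else base)
    return output_digest(cmd, first) == output_digest(cmd, second)


# Stand-in "builds": one ignores the environment, one embeds a variable.
CLEAN_BUILD = [sys.executable, "-c", "print('hello')"]
LEAKY_BUILD = [sys.executable, "-c",
               "import os; print(os.environ['CAPTURE_ENVIRONMENT'])"]

if __name__ == "__main__":
    print("clean build reproducible:", is_reproducible(CLEAN_BUILD))
    print("leaky build reproducible:", is_reproducible(LEAKY_BUILD))
```

A variable such as CC would deliberately be left out of VARIATIONS here,
since changing it is expected to change the result.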


Without generating comprehensive blacklists and/or whitelists, is it
plausible to come up with a policy description of the above two classes
of variables? Given the above lists, it seems relatively obvious to me
that there are basically two classes of variables, but I'm at a loss for
how to really describe it in policy.

You could give a reasonable test of:

  Is this variable intended to change the results of the binary, or is
  it changing the build as an unintended side-effect?

That does require reasoned interpretation, though. I envision such tests
being used in bug reports relating to reproducibility issues, on a
case-by-case basis.


It doesn't solve the testability issue on a policy level, but that could
possibly be addressed outside of policy through best practices for
reproducibility documentation.


live well,
  vagrant




Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Vagrant Cascadian
On 2017-09-18, Russ Allbery wrote:
> Daniel Kahn Gillmor  writes:
>> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
>
>>> I personally lean towards 2, which is consistent with what's in Policy
>>> right now, but I can see definite merits in 3.  I believe the
>>> reproducible builds project is currently sort of doing 1, but I have a
>>> hard time seeing how to make that viable on the testing side.
>
>> Thanks for raising this question, Russ!

Indeed!


>> I'm not sure that we should let lack of exhaustive testing push us away
>> from (1).  (1) is in principle the right thing -- it's easy to make a
>> build reproducible if we tell people that they have to do exactly one
>> specific thing.  But we generally want people to be able to run
>> heterogenous systems, and not to force them into one particular
>> environment.
>
> Well... I would argue that the amount of time and effort that's gone into
> this project shows that it's not that easy to make a build reproducible
> even when telling people to do exactly one thing.  :)  But I get your
> point.

Much of the work has already been done by aspirational, principled
folks... :)


>> Does everything in policy need to be rigorously testable?  or is it ok
>> to have Policy state the desired outcome even if we don't know how (or
>> don't have the resources) to test it fully today.
>
> I don't think everything has to be rigorously testable, but I do think
> it's a useful canary.  If I can't test something, I start wondering
> whether that means I have problems with my underlying assumptions.
>
> In particular, for (1), we have no comprehensive list of environment
> variables that affect the behavior of tools, and that list would be
> difficult to create.  Many pieces of software add their own environment
> variables with little coordination, and many of those variables could
> possibly affect tool output.

There is a huge difference between variables that *might* affect the
build as an unintended input that gets stored in the resulting package in
some manner, and variables that are designed to change the behavior of
parts of the build toolchain.

I consider unintended variables that affect the build output a bug, and
variables designed and intended to change the behavior of the toolchain
expected, reasonable behavior.


> I feel like the work for (1) and for (3) ends up being comparable; for (1)
> we have to maintain a blacklist, and for (3) we have to maintain a
> whitelist.  But (3) is testable, whereas (1) is inherently aspirational
> and will always have to be aspirational.  We're endlessly going to be
> discovering some other environment variable that changes tool output.

Well, there can be a testable, automatable standard, and a higher,
aspirational standard in parallel.

Which largely seems consistent with what's already in policy... but I'm
not sure it's appropriate to codify these whitelists or blacklists in
policy.


> I'm also unsure that (1) is even what we want to claim.  Do we really want
> to say that builds are always reproducible if you don't change this short
> list of environment variables, no matter whatever other environment
> variables you set?

I don't think we want to make absolute claims; reproducible builds is
about having greater confidence that the binaries are produced from the
source, not absolute confidence.

The ideal is to have as many builds as possible corroborated from a
diverse group of build machines, developers, third-parties,
sophisticated end-users, legal jurisdictions, etc.


> There's some appeal in this for the end user, but it
> feels very frustrating for the package maintainer.  At first glance, as a
> package maintainer, I'd think I'd have to maintain a huge blacklist of
> environment variables that I've discovered affect my toolchain somewhere,
> and explicitly unset them all in debian/rules.  This doesn't feel like a
> good use of anyone's time (and may actually *break* other,
> non-reproducibility-related things that people want to do with my
> package).

In practice, for the vast majority of packages in Debian, it is a
relatively small number of environment variables to get fairly solid
reproducibility coverage... at least from what we've seen so far.

The hard part is actually continuing to tease them out...


live well,
  vagrant




Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Russ Allbery
Daniel Kahn Gillmor  writes:
> On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:

>> I personally lean towards 2, which is consistent with what's in Policy
>> right now, but I can see definite merits in 3.  I believe the
>> reproducible builds project is currently sort of doing 1, but I have a
>> hard time seeing how to make that viable on the testing side.

> Thanks for raising this question, Russ!

> I'm not sure that we should let lack of exhaustive testing push us away
> from (1).  (1) is in principle the right thing -- it's easy to make a
> build reproducible if we tell people that they have to do exactly one
> specific thing.  But we generally want people to be able to run
> heterogenous systems, and not to force them into one particular
> environment.

Well... I would argue that the amount of time and effort that's gone into
this project shows that it's not that easy to make a build reproducible
even when telling people to do exactly one thing.  :)  But I get your
point.

> Consider someone who wants to see more logging from a build, for
> example.  There could be an environment variable that encourages the
> toolchain to log more, but doesn't affect the binary objects created by
> the build.  By going with choices (2) or (3) we effectively dismiss even
> considering the reproducibility of those builds, which seems like a
> shame.

This is the case for (2), but not for (3).  Indeed, this is exactly the
distinction between (2) and (3).  It does mean that discovery of any new
such environment variable would require a change to our whitelist in
approach (3), so there would be some lag and the whitelist would become
long over time (with a corresponding testing load).  But (3) does try to
achieve that use case without trying to anticipate any possible
environment variable setting.  It lets us be reactive to newly-discovered
environment variables across which we want to stay reproducible.

> Does everything in policy need to be rigorously testable?  or is it ok
> to have Policy state the desired outcome even if we don't know how (or
> don't have the resources) to test it fully today.

I don't think everything has to be rigorously testable, but I do think
it's a useful canary.  If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.

In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create.  Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.

I feel like the work for (1) and for (3) ends up being comparable; for (1)
we have to maintain a blacklist, and for (3) we have to maintain a
whitelist.  But (3) is testable, whereas (1) is inherently aspirational
and will always have to be aspirational.  We're endlessly going to be
discovering some other environment variable that changes tool output.
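
To make the two list styles concrete, here is a small sketch. The full
definitions of options (1) and (3) are not quoted in this message, so this
only illustrates "blacklist" versus "whitelist" filtering of an
environment; the function names and the variable names in the example are
hypothetical.

```python
def blocklist_env(env, blocked):
    """Blacklist style: keep everything except names known to cause trouble."""
    return {name: value for name, value in env.items() if name not in blocked}


def allowlist_env(env, allowed):
    """Whitelist style: keep only names explicitly listed as acceptable."""
    return {name: value for name, value in env.items() if name in allowed}


if __name__ == "__main__":
    env = {
        "TZ": "UTC",
        "CFLAGS": "-O2",
        "SOME_NEW_TOOL_OPTION": "1",  # hypothetical, newly discovered variable
    }
    # The new variable slips through a blacklist that has not caught up yet...
    print(blocklist_env(env, blocked={"TZ"}))
    # ...but is dropped by a whitelist until someone decides to add it.
    print(allowlist_env(env, allowed={"TZ", "CFLAGS"}))
```

The difference shows up exactly at the point of discovery: with a
blacklist the unknown variable is an untested input until it is listed,
while with a whitelist it is simply outside the tested set.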

I'm also unsure that (1) is even what we want to claim.  Do we really want
to say that builds are always reproducible if you don't change this short
list of environment variables, no matter whatever other environment
variables you set?  There's some appeal in this for the end user, but it
feels very frustrating for the package maintainer.  At first glance, as a
package maintainer, I'd think I'd have to maintain a huge blacklist of
environment variables that I've discovered affect my toolchain somewhere,
and explicitly unset them all in debian/rules.  This doesn't feel like a
good use of anyone's time (and may actually *break* other,
non-reproducibility-related things that people want to do with my
package).

-- 
Russ Allbery (r...@debian.org)   



Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Daniel Kahn Gillmor
On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
> I personally lean towards 2, which is consistent with what's in Policy
> right now, but I can see definite merits in 3.  I believe the reproducible
> builds project is currently sort of doing 1, but I have a hard time seeing
> how to make that viable on the testing side.

Thanks for raising this question, Russ!

I'm not sure that we should let lack of exhaustive testing push us away
from (1).  (1) is in principle the right thing -- it's easy to make a
build reproducible if we tell people that they have to do exactly one
specific thing.  But we generally want people to be able to run
heterogenous systems, and not to force them into one particular
environment.

Consider someone who wants to see more logging from a build, for
example.  There could be an environment variable that encourages the
toolchain to log more, but doesn't affect the binary objects created by
the build.  By going with choices (2) or (3) we effectively dismiss even
considering the reproducibility of those builds, which seems like a
shame.

Does everything in policy need to be rigorously testable?  or is it ok
to have Policy state the desired outcome even if we don't know how (or
don't have the resources) to test it fully today.

I'd prefer for policy to be able to make strong advisory statements even
without us being able to test them mechanically.  This is already the
case for (for example) "preferred form of modification" -- it's partly
testable, but will never be 100% testable, and will always require
research and discussion and thinking for the corner cases.  Yet we
continue to aim for it.

Policy should be aiming high, not lowering the bar to meet what's
concretely testable.

  --dkg



Bug#737796: may be use the newly proposed License-grant field

2017-09-18 Thread Tobias Frost
On Mon, Sep 18, 2017 at 02:15:54PM +0200, Dominique Dumont wrote:
> Hi
> 
> Since the licence text shown in the original report mentions "At the
> discretion of the user of this library, this software may be licensed
> under the terms of ...", I'm wondering if this would better fit in the
> new License-Grant field [1].
> 
> Thoughts ?

I like this new feature and would be in favour making it real.

-- 
tobi

 
> All the best
> 
> [1] https://bugs.debian.org/786470
> -- 
>  https://github.com/dod38fr/   -o- http://search.cpan.org/~ddumont/
> http://ddumont.wordpress.com/  -o-   irc: dod at irc.debian.org
> 



Bug#737796: may be use the newly proposed License-grant field

2017-09-18 Thread Dominique Dumont
Hi

Since the licence text shown in the original report mentions "At the
discretion of the user of this library, this software may be licensed
under the terms of ...", I'm wondering if this would better fit in the
new License-Grant field [1].

Thoughts ?

All the best

[1] https://bugs.debian.org/786470
-- 
 https://github.com/dod38fr/   -o- http://search.cpan.org/~ddumont/
http://ddumont.wordpress.com/  -o-   irc: dod at irc.debian.org



Bug#515856: [debhelper-devel] Bug#515856: debhelper: please implement dh get-orig-source

2017-09-18 Thread Bill Allombert
On Mon, Sep 18, 2017 at 12:38:49PM +0200, Helmut Grohne wrote:
> On Mon, Sep 18, 2017 at 11:28:42AM +0200, Bill Allombert wrote:
> > get-orig-source and watch files serve a different purpose.
> > 
> > get-orig-source is used to build the .orig. tarball from the true
> > upstream one. Most packages do not need that.  Watch files could not do
> > that until recently.
> > 
> > So the comparison is unfair.
> > 
> > What needs to be checked is how many get-orig-source rules have been
> > reimplemented in terms of watch files.
> 
> Challenge accepted. ticharich.d.o has unpacked copies of debian/rules
> files. Most of them are world-readable. A small number (~30) are
> inaccessible, so my analysis will have an error of around 0.2%.
> 
> A simple method is to just look at which of them contain the string
> "get-orig-source" and which of them contain the string "uscan" assuming
> that when both show up, get-orig-source is implemented using uscan.

One would need to check whether get-orig-source uses uscan for repacking
or if it only uses uscan for downloading and then repacks manually.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



Bug#515856: [debhelper-devel] Bug#515856: debhelper: please implement dh get-orig-source

2017-09-18 Thread Helmut Grohne
On Mon, Sep 18, 2017 at 11:28:42AM +0200, Bill Allombert wrote:
> get-orig-source and watch files serve a different purpose.
> 
> get-orig-source is used to build the .orig. tarball from the true
> upstream one. Most packages do not need that.  Watch files could not do
> that until recently.
> 
> So the comparison is unfair.
> 
> What needs to be checked is how many get-orig-source rules have been
> reimplemented in terms of watch files.

Challenge accepted. ticharich.d.o has unpacked copies of debian/rules
files. Most of them are world-readable. A small number (~30) are
inaccessible, so my analysis will have an error of around 0.2%.

A simple method is to just look at which of them contain the string
"get-orig-source" and which of them contain the string "uscan" assuming
that when both show up, get-orig-source is implemented using uscan.

The following packages do not implement get-orig-source with uscan:

biojava4-live
boinc-app-seti
cjk
edk2
fasttree
freeorion
freerdp
gr-air-modes
gr-fcdproplus
gr-iqbal
gr-osmosdr
htmlunit
ioquake3
iortcw
josm
libb64
libreoffice
libtgvoip
neobio
nvidia-graphics-drivers
nvidia-graphics-drivers-legacy-304xx
pencil2d
pixelmed
qemu
r-cran-rniftilib
sagemath
west-chamber
zsh

So we have around 22500 source packages with watch files, we have 3000
packages with get-orig-source, of those 28 don't use uscan. The fair
comparison is 22500 vs. 28. That's almost three orders of magnitude. If
anything, policy should document debian/watch, not get-orig-source. The
perl policy, python policy, elpa policy, ... each affect more packages
than get-orig-source. Keeping it is uneconomic.
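
The string-matching heuristic and the ratio above can be sketched as
follows. This is an illustration, not the script actually used for the
analysis; the function names and the sample debian/rules snippets are
hypothetical.

```python
import math


def classify_rules(text):
    """Classify one debian/rules file by plain substring matching."""
    if "get-orig-source" not in text:
        return "no get-orig-source"
    if "uscan" in text:
        return "get-orig-source, assumed to use uscan"
    return "get-orig-source without uscan"


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio between two counts."""
    return math.log10(larger / smaller)


# Hypothetical debian/rules contents, for illustration only.
WITH_USCAN = "get-orig-source:\n\tuscan --download-current-version\n"
WITHOUT_USCAN = "get-orig-source:\n\twget https://example.org/foo.tar.gz\n"
NO_TARGET = "%:\n\tdh $@\n"

if __name__ == "__main__":
    for sample in (WITH_USCAN, WITHOUT_USCAN, NO_TARGET):
        print(classify_rules(sample))
    # 22500 watch files vs. 28 packages: about 2.9 orders of magnitude.
    print(round(orders_of_magnitude(22500, 28), 1))
```

Substring matching only shows that both strings occur in the same file;
it does not show whether uscan does the repacking or only the download.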

Helmut



Bug#515856: [debhelper-devel] Bug#515856: debhelper: please implement dh get-orig-source

2017-09-18 Thread Bill Allombert
On Fri, Sep 01, 2017 at 06:52:27AM +0200, Helmut Grohne wrote:
> According to codesearch.d.n, get-orig-source is implemented by less than
> 3000 source packages. This is not very low, but neither a high adoption
> rate. It certainly makes using get-orig-source somewhat useless on a
> distribution-scale. In contrast, we have some 22500 watch files, an
> order of magnitude more. I think it is obvious which mechanism has won.

get-orig-source and watch files serve a different purpose.

get-orig-source is used to build the .orig. tarball from the true
upstream one. Most packages do not need that.  Watch files could not do
that until recently.

So the comparison is unfair.

What needs to be checked is how many get-orig-source rules have been
reimplemented in terms of watch files.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



Bug#515856: [debhelper-devel] Bug#515856: debhelper: please implement dh get-orig-source

2017-09-18 Thread Holger Levsen
On Fri, Sep 01, 2017 at 06:52:27AM +0200, Helmut Grohne wrote:
> According to codesearch.d.n, get-orig-source is implemented by less than
> 3000 source packages. This is not very low, but neither a high adoption
> rate. It certainly makes using get-orig-source somewhat useless on a
> distribution-scale. In contrast, we have some 22500 watch files, an
> order of magnitude more. I think it is obvious which mechanism has won.
 
agreed.

[...]
> I believe that if debhelper is not going to support us in increasing
> get-orig-source adoption, then we should just stop doing it and move on
> to watch files.

agreed.

> I am attaching the removal patch and call for seconds.
> 
> Helmut

> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index f706a13..27c49b5 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -368,19 +368,6 @@ The targets are as follows:
>  Instead, the upstream source should be repacked to remove those
>  files.
>  
> -``get-orig-source`` (optional)
> -This target fetches the most recent version of the original source
> -package from a canonical archive site (via FTP or WWW, for example),
> -does any necessary rearrangement to turn it into the original source
> -tar file format described below, and leaves it in the current
> -directory.
> -
> -This target may be invoked in any directory, and should take care to
> -clean up any temporary files it may have left.
> -
> -This target is optional, but providing it if possible is a good
> -idea.
> -

seconded.


-- 
cheers,
Holger




Bug#876075: Anchors are non-unique in the single-HTML version

2017-09-18 Thread Andrey Rahmatullin
On Mon, Sep 18, 2017 at 01:45:10PM +0500, Andrey Rahmatullin wrote:
> Package: debian-policy
> Version: 4.1.0.0
> Severity: normal
> 
> https://www.debian.org/doc/debian-policy/#version
> https://www.debian.org/doc/debian-policy/index.html#introduction
> 
> etc.
> 
> This breaks the ToC.
It also makes it impossible to link to such sections.

-- 
WBR, wRAR




Bug#876075: Anchors are non-unique in the single-HTML version

2017-09-18 Thread Andrey Rahmatullin
Package: debian-policy
Version: 4.1.0.0
Severity: normal

https://www.debian.org/doc/debian-policy/#version
https://www.debian.org/doc/debian-policy/index.html#introduction

etc.

This breaks the ToC.



-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'unstable'), (500, 'testing'), (101, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-trunk-amd64 (SMP w/4 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8), LANGUAGE= (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages debian-policy depends on:
ii  libjs-sphinxdoc  1.6.3-2

debian-policy recommends no packages.

Versions of packages debian-policy suggests:
ii  doc-base  0.10.7

-- no debconf information



Bug#876055: Environment variable handling for reproducible builds

2017-09-18 Thread Paul Sherwood

On 2017-09-18 00:26, Russ Allbery wrote:

> 2. Set the entire environment to the environment specified in buildinfo
>    when doing a reproducible build.  I think this is conceptually the
>    simplest, but it means that we should make every tool that builds
>    official Debian packages use the same environment variable logic so
>    that the buildinfo file completely captures the environment (without
>    leaking random, inappropriate things into buildinfo).  It also means
>    effectively giving up on debian/rules build being a path for making a
>    reproducible build, since we don't have control over that environment,
>    but I think it will be hard to make that work anyway.


FWIW this is the approach we've taken on both of the Baserock build 
tools, and for BuildStream [1].


Given that it's trivially easy for a build script to try to call out to 
the internet (eg fetch tarball, git clone), or look for custom 
environment variables, we think it's clearly safest to put everything in 
a sandbox and be explicit about resources, network and environment 
variables.
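
As a rough illustration of being explicit about environment variables
(and only that; it does nothing about network or filesystem access), a
build step can be started with exactly the variables that were declared.
This is a sketch, not how Baserock or BuildStream implement it, and the
function and variable names are hypothetical.

```python
import os
import subprocess
import sys


def run_with_explicit_env(cmd, env):
    """Run cmd with exactly the variables in env; nothing is inherited."""
    result = subprocess.run(
        cmd,
        env=dict(env),            # replaces the caller's environment entirely
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # A variable set in the calling environment...
    os.environ["HYPOTHETICAL_LEAK"] = "should not be visible"
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('HYPOTHETICAL_LEAK')); "
             "print(os.environ.get('TZ'))"]
    # ...is not seen by the child, which only gets what was declared.
    print(run_with_explicit_env(child, {"TZ": "UTC"}))
```

Since PATH is not inherited either, the command has to be given by
absolute path (as sys.executable is here) or PATH has to be declared too.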


br
Paul

[1] https://wiki.gnome.org/Projects/BuildStream/