Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread David A. Wheeler via rb-general



> On Mar 4, 2024, at 3:37 PM, Holger Levsen  wrote:
> 
> On Mon, Mar 04, 2024 at 11:52:07AM -0800, John Gilmore wrote:
>> Why would these become "wishlist" bugs as opposed to actual reproducibility bugs
>> that deserve fixing, just because one server at Debian no longer invokes this
>> bug because it always uses the same build directory?
> 
> because it's "not one server at Debian" but what many ecosystems do: build in a
> deterministic path (eg /$pkg/$version or whatever) or record the path as part
> of the build environment, to have it deterministic as well.
> 
> in the distant past, before namespacing became popular, using a random path
> was a solution to allow parallel builds of the same software & version.
> 
> and yes, this is a shortcut and a tradeoff, similar to demanding to build
> in a certain locale. also it takes reproducibility from around 80-85% of all
> packages to >95%, IOW with this shortcut we can have meaningful reproducibility
> *many years* sooner, than without.
> 
> and I'd really rather like to see Debian 100% reproducible in 2030, than in 2038.
> and some subsets today, or much sooner.

I agree with Holger (and Vagrant).

It'd be *nice* if a build was reproducible regardless of the directory used
to build it. But today, if you're building an executable for others, it's
common to build using a container/chroot or similar that makes it easy to
implement "must compile with these paths", while *fixing* the path dependence
itself is often a lot of work.

I suggest focusing first on ensuring everyone knows what the executable files
contain. If people can add more flexibility to their build process, all the
better, but that added flexibility costs time and effort, and it is NOT as
important.

--- David A. Wheeler



Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread Vagrant Cascadian
On 2024-03-04, John Gilmore wrote:
> Vagrant Cascadian wrote:
>> > > to make it easier to debug other issues, although deprioritizing them
>> > > makes sense, given buildd.debian.org now normalizes them.
>
> James Addison via rb-general  wrote:
>> Ok, thank you both.  A number of these bugs are currently recorded at severity
>> level 'normal'; unless told not to, I'll spend some time to double-check their
>> details and - assuming all looks OK - will bulk downgrade them to 'wishlist'
>> severity a week or so from now.

Well, I think we should change it to "minor" rather than "wishlist"
severity, but that may be splitting hairs; I do not find a huge amount
of difference between debian bug severities... they are pretty much
either critical/serious/grave and thus must be fixed, or
normal/minor/wishlist and fixed when someone feels like it.


> I may be confused about this.  These bug reports are that a package cannot
> be reproducibly built because its output binary depends on the directory in which
> it was built?
>
> Why would these become "wishlist" bugs as opposed to actual reproducibility bugs
> that deserve fixing, just because one server at Debian no longer invokes this
> bug because it always uses the same build directory?
>
> If an end user can't download a source package (into any directory on
> any machine), and build it into the same exact binary as the one that Debian
> ships, this is not a "wishlist" idea for some future enhancement.  This
> is a real issue that prevents the code from being reproducible.

I agree it is a real issue, but admit it is fairly easy to work around:
given most package building tools use chroots or containers or similar,
it seems acceptable to treat build paths as a lower priority. Compare
that to timestamps, where it is non-trivial to force builds to use the exact
same clock moving at the exact same rate; I would say build path
normalization is quite tolerable, if not ideal.
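
To make the build path point concrete, here is a toy sketch in Python. It
is not a real toolchain: toy_build() and its prefix_map argument are
hypothetical stand-ins for a compiler that records source paths in its
output, and for a path-rewriting option in the spirit of GCC's
-ffile-prefix-map=old=new.

```python
import hashlib


def toy_build(source: bytes, build_dir: str, prefix_map=None) -> bytes:
    """Toy 'compiler' that records the source path in its output, the way
    debug info often does. Hypothetical; not a real toolchain."""
    recorded = f"{build_dir}/main.c"
    if prefix_map is not None:
        old, new = prefix_map
        if recorded.startswith(old):
            # Rewrite the leading build directory, as a prefix map would.
            recorded = new + recorded[len(old):]
    return f"SRC={recorded}\n".encode() + source


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


SOURCE = b"int main(void) { return 0; }\n"

# Two different build directories: the artifacts differ.
unnormalized_match = digest(toy_build(SOURCE, "/build/foo-1.0")) == digest(
    toy_build(SOURCE, "/tmp/scratch")
)

# Same inputs, but each build maps its own directory to ".": identical output.
normalized_match = digest(
    toy_build(SOURCE, "/build/foo-1.0", ("/build/foo-1.0", "."))
) == digest(toy_build(SOURCE, "/tmp/scratch", ("/tmp/scratch", ".")))

print(unnormalized_match, normalized_match)  # False True
```

Always building in the same directory gets the same result as the prefix
map without touching the package, which is why it works as a shortcut.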

You cannot just build on "any machine": the machine needs to have a
sufficiently similar build environment (e.g. exactly matching compiler
versions, same architecture, etc.), and whether the build path is part of
that or not is simply a decision to make.

Several (many?) other distros normalize the build path as part of their
standard build tooling; Debian is arguably a latecomer to that practice.

I have definitely argued in favor of addressing build path issues, and
encourage people to fix them, and have personally spent more than a
small amount of time working on it, and we have made huge progress on
fixing (tens of?) thousands of them.

There are only so many hours in the day and so many people actively
working on fixing things... there may be bigger fires to put out at the
moment.


live well,
  vagrant




Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread Holger Levsen
On Mon, Mar 04, 2024 at 11:52:07AM -0800, John Gilmore wrote:
> Why would these become "wishlist" bugs as opposed to actual reproducibility bugs
> that deserve fixing, just because one server at Debian no longer invokes this
> bug because it always uses the same build directory?

because it's "not one server at Debian" but what many ecosystems do: build in a
deterministic path (eg /$pkg/$version or whatever) or record the path as part
of the build environment, to have it deterministic as well.

in the distant past, before namespacing became popular, using a random path
was a solution to allow parallel builds of the same software & version.

and yes, this is a shortcut and a tradeoff, similar to demanding to build
in a certain locale. also it takes reproducibility from around 80-85% of all
packages to >95%, IOW with this shortcut we can have meaningful reproducibility
*many years* sooner, than without.

and I'd really rather like to see Debian 100% reproducible in 2030, than in 2038.
and some subsets today, or much sooner.


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Homophobia is a sin against god.




Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread John Gilmore
Vagrant Cascadian wrote:
> > > to make it easier to debug other issues, although deprioritizing them
> > > makes sense, given buildd.debian.org now normalizes them.

James Addison via rb-general  wrote:
> Ok, thank you both.  A number of these bugs are currently recorded at severity
> level 'normal'; unless told not to, I'll spend some time to double-check their
> details and - assuming all looks OK - will bulk downgrade them to 'wishlist'
> severity a week or so from now.

I may be confused about this.  These bug reports are that a package cannot
be reproducibly built because its output binary depends on the directory in which
it was built?

Why would these become "wishlist" bugs as opposed to actual reproducibility bugs
that deserve fixing, just because one server at Debian no longer invokes this
bug because it always uses the same build directory?

If an end user can't download a source package (into any directory on
any machine), and build it into the same exact binary as the one that Debian
ships, this is not a "wishlist" idea for some future enhancement.  This
is a real issue that prevents the code from being reproducible.

How am I confused?

John



Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread James Addison via rb-general
On Wed, 28 Feb 2024 at 12:06, Chris Lamb  wrote:
>
> Vagrant Cascadian wrote:
>
> > There are real-world build path issues, and while it is possible to work
> > around them in various ways, I think they are still issues worth fixing
> > to make it easier to debug other issues, although deprioritizing them
> > makes sense, given buildd.debian.org now normalizes them.
>
> +1.
>
> And for this reason, I think we should keep the buildpath-related
> bugs as well. They should all be 'wishlist' priority anyway, and I
> wouldn't like to bet my hat that the usertag metadata is accurate and
> comprehensive enough to blindly close them in the first place. (We
> only really used the usertags to do some rough-and-ready statistics
> on broad issue categories.)

Ok, thank you both.  A number of these bugs are currently recorded at severity
level 'normal'; unless told not to, I'll spend some time to double-check their
details and - assuming all looks OK - will bulk downgrade them to 'wishlist'
severity a week or so from now.


Re: reprotest: inadvertent misconfiguration in salsa-ci config

2024-03-04 Thread James Addison via rb-general
Hi Chris, Vagrant,

On Tue, 27 Feb 2024 at 17:44, Vagrant Cascadian wrote:
>
> On 2024-02-27, Chris Lamb wrote:
> >> * Update reprotest to handle a single-disabled-variations-value as a
> >>   special case - treating it as vary and/or emitting a warning.
>
> Well, I would broaden this to include an arbitrary number of negating
> options:
>
>   --variations=-time,-build_path
>
> That seems just as invalid.
>
> The one special case I could see is "--variations=-all" where you might
> want to be normalizing as much as possible.

Hmm, yep.  So when there are only subtractions, we _could_ assume an
implicit '+all' at the beginning of the 'variations' argument.

And along that line of thinking, we could emit a warning to stderr:

  $ reprotest auto --dry-run --variations=-timezone
  Implicitly expanding variations '-timezone' to '+all,-timezone'
  ...
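
As a rough Python sketch of that expansion rule (expand_variations() is a
hypothetical helper for illustration, not reprotest's actual parser, and
leaving a lone '-all' untouched is my assumption based on the special case
Vagrant mentions):

```python
import sys


def expand_variations(spec):
    """Return (effective_spec, warning_or_None) for a --variations value.

    Hypothetical helper; it does not reflect reprotest's real code.
    """
    items = [item.strip() for item in spec.split(",") if item.strip()]
    only_subtractions = bool(items) and all(
        item.startswith("-") for item in items
    )
    # Assumption: "-all" on its own is left alone, since it plausibly
    # means "normalize as much as possible".
    if not only_subtractions or items == ["-all"]:
        return spec, None
    expanded = ",".join(["+all"] + items)
    warning = f"Implicitly expanding variations '{spec}' to '{expanded}'"
    return expanded, warning


effective, warning = expand_variations("-timezone")
if warning:
    # Warn on stderr so the rewrite is visible in CI logs.
    print(warning, file=sys.stderr)
```

A spec that already has an addition, such as '+all,-time', passes through
unchanged and produces no warning.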

> > On whether to magically/transparently fix this, needless to say, it's
> > considered bad practice to change the behaviour of software that has
> > already been released — I would, as a rule, subscribe to that idea.
> > However, we should bear in mind that this idea revolves around what
> > users are *expecting*, not necessarily what the software actually
> > does.
> >
> > I say that because I hazard that all 400 usages are indeed expecting
> > that `--variations=-foo` functions the same as `--variations=all,-foo`
> > (or `--vary=-foo`), and so this proposed change would merely be
> > modifying reprotest to reflect their existing expectations. It would
> > not therefore be a violation of the "don't break existing
> > functionality" dictum.
> >
> > (Saying that, the addition of a warning that we are doing so would
> > definitely not go amiss.)
>
> Hrm. Less inclined toward this approach; expectations can shift with
> time and context and culture and whatnot. That said, I agree the current
> behavior is confusing, and we should change something explicitly, rather
> than implicitly...

Changing-existing-behaviours could arguably be even more problematic for
cases like this where we're talking about continuous integration checks.

Breaking/unbreaking unrelated CI pipelines seems like something we should be
careful to avoid.

> >> * Treat removal of a variance factor from an already-empty-context
> >> as an error.
> >
> > I'm also tempted by this as well. :)  How would this be experienced by
> > most DDs? Would their new pushes to Salsa now suddenly fail in the
> > reprotest job of the pipeline? If so, that's not too awful, given that
> > the prominent error message would presumably let them know precisely
> > how to fix it.
>
> I would much prefer an error message if we can correctly identify this.

That'd be nice - perhaps something like:

  Failed to parse variations: '-timezone'; did you mean '+all,-timezone'?

I've opened a merge request[1] to explore this error-treatment approach; it
lacks useful error messaging so far, but I'll attempt to add that soon.
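
For discussion, a minimal Python sketch of what that check and message
could look like. This is not the code in the merge request:
check_variations() and VariationsError are hypothetical names, and
allowing a lone '-all' is an assumption taken from the special case
mentioned earlier in the thread.

```python
class VariationsError(ValueError):
    """Hypothetical error type for an invalid --variations value."""


def check_variations(spec):
    """Return the parsed items, or raise if the spec only subtracts.

    Sketch only; reprotest's real parsing is more involved.
    """
    items = [item.strip() for item in spec.split(",") if item.strip()]
    only_subtractions = bool(items) and all(
        item.startswith("-") for item in items
    )
    # Assumption: "-all" alone stays valid (normalize everything).
    if only_subtractions and items != ["-all"]:
        suggestion = ",".join(["+all"] + items)
        raise VariationsError(
            f"Failed to parse variations: '{spec}'; "
            f"did you mean '{suggestion}'?"
        )
    return items


try:
    check_variations("-time,-build_path")
except VariationsError as err:
    message = str(err)
    print(message)
```

Unlike a silent expansion, this leaves the behaviour of valid specs alone
and only fails the cases we already consider broken.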

> Some possible expected behaviors to consider treating as invalid, and
> issue an error:
>
>   --variations=-build_path
>
>   --variations=-time,-build_path
>
> This almost makes me want to entirely deprecate --variations, and switch
> to recommending "--vary=-all,+whatever" or "--vary=-all
> --vary=+whatever" instead of ever using --variations.
>
> I'm not sure the variations syntax enables much that cannot be more
> unambiguously expressed with --vary.

I do think that supporting two command-line argument names that provide
similar operations (and use similar names!) is confusing.

However I'm inclined to limit the effect of any behaviour changes here to the
specific cases that we know are problematic (ref previous thoughts about CI
infrastructure).

> That said, the reprotest code is a bit hairy, and I am not sure what
> sort of refactoring will be needed to make this possible. In particular,
> how --auto-build is implemented, where it systematically tests each
> variation one at a time. That said, refactoring might be needed
> regardless. :)

That's a neat bit of functionality in auto-build.  As far as I can tell, it
seems agnostic of whether the build specifications are provided by 'vary' or
'variations' -- but test coverage would be better at confirming that.

Regards,
James