Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread John Neffenger
Thank you, Vagrant, for taking my concerns seriously. I realize you've 
been working on this much longer than I have, so I appreciate your 
perspective.


On 3/6/24 10:55 AM, Vagrant Cascadian wrote:

> That means that we do not always support each other in all things, but
> we can support each other in most things, and that seems more important
> to me, at least in this case.


That is the crux of the issue for me. Until now, I thought we were 
rather united in our goal, and we did support one another. Now our goals 
are fragmenting, and I think that makes us weaker.


I thought our common goal, no matter our project affiliation, was to 
enable reproducible builds everywhere: not just Debian, not just Linux 
distributions, but also F-Droid, Flatpak, AppImage, Snap packages, 
different build farms -- anywhere open-source software is built. That 
requires us to work together to eliminate all sources of nondeterminism.


My fear is that, by fragmenting our goals and losing the critical 
support of Debian, the rest of us may never get to that common goal.


John



Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread Vagrant Cascadian
On 2024-03-05, John Neffenger wrote:
> On 3/5/24 2:11 PM, Vagrant Cascadian wrote:
>>> I have no way to change these choices.
>> 
>> Then clearly you have not been provided sufficient information,
>> configuration, software, etc. in order to reproduce the build!
>
> Rather, I really can't change it or configure it any differently.
>
> Three builds:
>
> (1) A build on Launchpad submitted from their webpage uses this path:
>
>     /build/openjfx/parts/jfx/build/
>
> (2) A remote build on Launchpad submitted locally with this command:
>
>     $ snapcraft remote-build
>
> uses this path:
>
>     /build/snapcraft-openjfx-64b793849f913c7228cd17db40a05187/parts/jfx/build/
>
> (3) And a build run entirely local with this command:
>
>     $ snapcraft
>
> uses this path:
>
>     /root/parts/jfx/build/
>
> What am I to do?

Well, to state the obvious in this case, yes, you either need to fix
your tooling to support some mechanism to provide a consistent build
path, switch to different tooling that already supports a consistent
build path, or fix this particular software to build reproducibly
regardless of build paths.

Each approach has different advantages and disadvantages.
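
As a rough illustration of the last option (making the output independent of
the build path), the sketch below imitates what a prefix-map option such as
GCC's -ffile-prefix-map=old=new does: a recorded source path is rewritten to
a fixed prefix before it lands in the artifact. Everything except the three
build directories quoted above is an assumption made for illustration: the
fake_build function, the src/Main.c file name, and the /usr/src/jfx/ target
prefix are hypothetical and do not describe snapcraft or OpenJFX.

```python
import hashlib

# The three build directories reported in the quoted message.
BUILD_DIRS = [
    "/build/openjfx/parts/jfx/build/",
    "/build/snapcraft-openjfx-64b793849f913c7228cd17db40a05187/parts/jfx/build/",
    "/root/parts/jfx/build/",
]


def fake_build(build_dir, prefix_map=None):
    """Hypothetical stand-in for a compiler that records source paths.

    Returns the bytes of a pretend artifact embedding the absolute path of
    one source file, optionally rewritten through an (old, new) prefix map.
    """
    source = build_dir + "src/Main.c"  # hypothetical source file
    if prefix_map is not None:
        old, new = prefix_map
        if source.startswith(old):
            source = new + source[len(old):]
    return b"OBJ\0" + source.encode("utf-8") + b"\0code"


def digest(data):
    """SHA-256 hex digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


# Without mapping, each build directory yields a different artifact.
unmapped = {digest(fake_build(d)) for d in BUILD_DIRS}

# Mapping each build directory to one fixed prefix removes the difference.
mapped = {
    digest(fake_build(d, prefix_map=(d, "/usr/src/jfx/"))) for d in BUILD_DIRS
}

print(len(unmapped), len(mapped))
```

The other two options (a consistent build path provided by the tooling, or
different tooling) would correspond to calling fake_build with the same
build_dir every time, which gives identical output without any mapping.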


>> That was a fundamentally different issue about having builds not produce
>> bit-for-bit identical results still meeting some sort of reproducible
>> criterion, as opposed to this discussion, which is, as I see it, about
>> normalizing the path in which the build is performed in order to get
>> bit-for-bit identical results.
>
> I understand and recognize the difference you highlight between this 
> discussion and the previous one. Yet I would hesitate to call it 
> fundamental for the reasons below.
>
> The main reason people didn't want to relax any requirements back in 
> October 2022 is because then the pressure is off -- it removes our 
> leverage. If you lower our standards, we may never get the upstream 
> projects to the goal we really want: fully reproducible builds 
> independent of these random differences.

I guess we differ on the "main reason" ... both of us having
participated in that discussion. :)

I agree that higher standards are in general better, but I am more
concerned with the outcome than this particular issue regarding build
paths.

That said, I am very glad to hear there are projects actively working on
fixing build path issues!

I argued time and time again in favor of continuing to test build paths
in Debian, largely because some commonly used Debian tooling still
varies build paths out of the box. I have filed dozens of build path
related bugs, marked hundreds of packages as affected by build paths, and
pushed for related changes in core packaging tooling in Debian
(e.g. dpkg, debhelper) to fix build path issues... but I also see the
pragmatic reasons why it is tolerable, if not ideal, to just use
consistent build paths.


> It has sometimes taken me years(!) to get a single reproducible builds 
> pull request accepted.

Likewise. Which...

> If they find out they can be "reproducible" without some of these
> bothersome changes, it just makes my job that much more difficult.

... is why some people might want to prioritize which issues they want
to spend their time on. We always have to pick our battles, and allow
others to pick their battles.

That means that we do not always support each other in all things, but
we can support each other in most things, and that seems more important
to me, at least in this case.


> I'll make the same argument I made over a year ago:
>
> Reproducible builds is about /blasting/ away all the useless, 
> meaningless differences: the timestamps of files created during the 
> build, the unsorted order of files in their directories, or the random 
> build paths used in a transient container. When the useless differences 
> are removed, the meaningful differences can be found.

That is certainly one angle on it, and a good one!

Yet, the Reproducible Builds Definition is more flexible. It gives room
for individual projects to focus on their own priorities, while
requiring sticking to bit-for-bit reproducibility. 
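
To make the "useless differences" named in the quote concrete, here is a
minimal sketch of packing files into a tar archive with the file order and
timestamps normalized. It uses only the Python standard library; the
deterministic_tar helper and the sample file names are hypothetical, and
falling back to 0 when SOURCE_DATE_EPOCH is unset is an assumption of this
sketch, not something the specification mandates.

```python
import hashlib
import io
import os
import tarfile


def deterministic_tar(files, epoch):
    """Pack {name: bytes} into an uncompressed tar with normalized metadata."""
    buffer = io.BytesIO()
    with tarfile.open(
        fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT
    ) as archive:
        for name in sorted(files):      # fixed order, not directory order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = epoch          # fixed timestamp, not build time
            info.mode = 0o644
            info.uid = info.gid = 0     # no builder-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Assumption: use 0 when SOURCE_DATE_EPOCH is not set in the environment.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

# Same content, supplied in a different order, as two "builds" might see it.
first = deterministic_tar({"b.txt": b"two", "a.txt": b"one"}, epoch)
second = deterministic_tar({"a.txt": b"one", "b.txt": b"two"}, epoch)

print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

With the noise removed, any remaining difference between two archives would
come from the file contents themselves, which is the point made in the
quoted argument.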


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread Vagrant Cascadian
On 2024-03-05, John Gilmore wrote:
> A quick note:
> Vagrant Cascadian wrote:
>> It would be pretty impractical, at least for Debian tests, to test
>> without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
>> debian/changelog for quite a few years now.
>
> Making a small patch to the local dpkg to alter or remove the value of
> SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
> using that version of dpkg, would tell you which of them (newly) fail to
> reproduce because they depend on SOURCE_DATE_EPOCH.

Sure... which brings us to...

>> Sounds like an interesting project for someone with significant spare
>> time and computing resources to take on!
>
> It looks to me like the whole Ubuntu source code (that gets into the
> standard release) fits in about 25 GB.  The Debian 12.0.0 release
> sources fit in 83GB (19 DVD images).  Both of these are under 1% of a
> 10TB disk drive that runs about $200.  A recent Ryzen mini-desktop,
> with a 0.5TB SSD that could cache it all, costs about $300.  Is this
> significant computing resources?  For another $40 we could add a better
> heat sink and a USB fan.  How many days would recompiling a whole
> release take on this $540 worth of hardware?

You also notably left out RAM requirements, which are almost more
important than CPU, from what I've seen!

You were not talking about a single pass through the archive, you asked
for a combinatorially explosive comparison (e.g. with and without build
paths, with and without SOURCE_DATE_EPOCH, with and without locale
differences, with and without username variations, etc.) ... and for it
to continue to be useful, you'd have to keep doing it... indefinitely.

Debian currently tests over 25 variations (most of which have actually
resulted in differences in the wild):

  https://tests.reproducible-builds.org/debian/index_variations.html

To systematically identify these "simply" through building each possible
combination for any significant set of software... is a much larger
task. Obviously, you could narrow it to only the set of variations you
want to research, or for a limited package set.

At least for Debian, which has what I would guess is significantly more
computing power than you've described, the tests usually did no better
than about 30 days from the oldest build, meaning some packages were always
behind. We also blacklist some packages that just take too much RAM, disk
or time, though that is considerably less than 1% of ~35k packages. More
importantly, that is with only two builds per package, not testing all
2^25 (about 33.5 million) on/off combinations of 25 interacting
variations per package.
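
The size of such a build matrix can be sketched as follows, assuming each
variation is an independent on/off switch (an assumption; some tested
variations may take more than two values). The four variation names are
taken from the examples earlier in this message; build_matrix and
matrix_size are hypothetical helpers, not part of any Debian tooling.

```python
from itertools import product

# Hypothetical subset of variations, named after those mentioned above.
VARIATIONS = ["build_path", "source_date_epoch", "locale", "username"]


def build_matrix(variations):
    """Yield one dict per build, with each variation switched on or off."""
    for flags in product([False, True], repeat=len(variations)):
        yield dict(zip(variations, flags))


def matrix_size(count):
    """Number of on/off combinations for `count` independent variations."""
    return 2 ** count


matrix = list(build_matrix(VARIATIONS))
print(len(matrix))        # 16 builds for 4 variations
print(matrix_size(25))    # 33554432 builds for 25 variations
```

Under that assumption, an exhaustive test of 25 variations needs tens of
millions of builds per package, which is why narrowing the set of
variations or the package set matters so much.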


> (I agree that the "spare" time to set it up and configure the build
> would be the hard part. This is why I advocate for writing and
> releasing, directly in the source release DVDs, the tools that would
> automate the recompilation and binary comparison.  The end user should
> be able to boot the matching binary release DVD, download or copy in the
> source DVD images, and type "reproduce-release".)

Automation can help significantly, although at some point you need to
write all that automation, write the code that processes the results
meaningfully, and verify that it is working correctly... and continue to
verify it as new package versions come in, and so on.


In short, easier said than done?


live well,
  vagrant




Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread Holger Levsen
On Tue, Mar 05, 2024 at 11:51:16PM +, Richard Purdie wrote:
> FWIW Yocto Project is a strong believer in build reproducibility
> independent of build path and we've been quietly chipping away at those
> issues.
[...] 
> OpenEmbedded-Core (around 1000 pieces of software) is 100% reproducible
> and we have the tests to prove it running daily, building in different
> build paths and comparing the output.

that's awesome!

btw, https://www.yoctoproject.org/reproducible-build-results/ (linked
from https://reproducible-builds.org/who/projects/#Yocto%20Project)
doesn't show any results?

> We're working on our wider layers too, e.g. meta-openembedded has
> another 2000+ pieces of software and less than 100 are not
> reproducible.

nice.

we had 35000 pieces of software in Debian, of which ~2000 were not
reproducible with nondeterministic build paths. Now, with build paths
as part of the build environment, it's less than half that.
 
> So even if Debian doesn't do this, there is interest elsewhere and I
> believe good progress is being made.
 
nice!


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Living in a privileged region and belonging to a generation that is
probably better off than any before or after it, and that has plundered
our Earth's resources to an unprecedented degree.




Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread James Addison via rb-general
Hi Vagrant,

Narrowing in on (or perhaps nitpicking) a detail:

On Mon, 4 Mar 2024 at 20:41, Vagrant Cascadian wrote:
>
> On 2024-03-04, John Gilmore wrote:
> > Vagrant Cascadian wrote:
> >> > > to make it easier to debug other issues, although deprioritizing them
> >> > > makes sense, given buildd.debian.org now normalizes them.
> >
> > James Addison via rb-general wrote:
> >> Ok, thank you both.  A number of these bugs are currently recorded at
> >> severity level 'normal'; unless told not to, I'll spend some time to
> >> double-check their details and - assuming all looks OK - will bulk
> >> downgrade them to 'wishlist' severity a week or so from now.
>
> Well, I think we should change it to "minor" rather than "wishlist"
> severity, but that may be splitting hairs; I do not find a huge amount
> of difference between Debian bug severities... they are pretty much
> either critical/serious/grave and thus must be fixed, or
> normal/minor/wishlist and fixed when someone feels like it.

The Debian bug severity descriptions[1] provide some more nuance, and that
reassures me that wishlist should be appropriate for most of these bugs
(although I'll inspect their contents before making any changes).

Regards,
James

[1] - https://www.debian.org/Bugs/Developer#severities