Re: Reproducible Builds in May 2024: Missing paper link

2024-06-11 Thread Vagrant Cascadian
On 2024-06-11, John Gilmore wrote:
> Chris Lamb  wrote:
>> Secondly, Ludovic Courtès, Timothy Sample, Simon Tournier and Stefano
>> Zacchiroli have collaborated to publish a paper on "Source Code
>> Archiving to the Rescue of Reproducible Deployment" [42]. Their paper
>> was motivated because:
>> 
>> > The ability to verify research results and to experiment with
>> > methodologies are core tenets of science. As research results are
>> > increasingly the outcome of computational processes, software plays
>> > a central role. GNU Guix [43] is a software deployment tool that
>> > supports reproducible software deployment, making it a foundation
>> > for computational research workflows. To achieve reproducibility, we
>> > must first ensure the source code of software packages Guix deploys
>> > remains available.
>> 
>> A PDF of this article [44] is also available.
>> 
>>  [42] https://hal.science/hal-04586520
>>  [43] https://guix.gnu.org/
>>  [44] https://hal.science/hal-04582287/document
>
> Those links 42 and 44 do not lead to the cited paper.  They lead to
> the first paper discussed (which apparently appears twice in hal.science).

Link 42 looks correct to me.

Link 44 is definitely the wrong article.

The correct link for 44 is:

  https://hal.science/hal-04586520/document


Updated on website:

  https://salsa.debian.org/reproducible-builds/reproducible-website/-/commits/master?ref_type=heads

Thanks!

live well,
  vagrant


Debian NMU Sprint Thursday, June 6th 17:00 UTC!

2024-05-21 Thread Vagrant Cascadian
I am hoping to schedule some Non-Maintainer Uploads (NMU) sprints,
starting two Thursdays from now...

Planning on meeting on irc.oftc.net in the #debian-reproducible channel
at 17:00UTC and going for an hour or two or three. Feel free to start
early or stay late, or even fix things on some other day!

We will have one or more Debian Developers available to sponsor uploads,
so even if you can't upload yourself but you know how to build a debian
package, please join us!


Unapplied patches:

  https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&reproducible=1

This list is sorted by the oldest bugs with patches not marked pending,
so we can target bugs that have just stalled out for whatever reason,
but feel free to pick bugs that scratch your particular itch.

We will want to make sure the patch still applies and/or refresh the
patches, make sure it still solves the issue, and update the bug report
where appropriate.


Documentation about performing NMUs:

  https://www.debian.org/doc/manuals/developers-reference/pkgs.html#nmu

We will be uploading to the DELAYED queue (presumably with a delay of
between 10 and 15 days).
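
For anyone who has not done an NMU before, the basic shape is something
like the following sketch (package name, version, bug number and delay
are all placeholder values, not a recipe to copy verbatim):

  # fetch the source and apply the patch from the bug report
  apt-get source foo
  cd foo-1.2
  patch -p1 < ../fix-reproducibility.patch
  # record the NMU in the changelog and do a source-only build
  dch --nmu "Apply patch from #123456 to fix a reproducibility issue."
  dpkg-buildpackage -S
  # upload to the DELAYED queue, here with a 10-day delay
  dput --delayed=10 ftp-master ../foo_1.2-3.1_source.changes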


If the package has been orphaned we can generally upload without delay
(check the https://tracker.debian.org/PACKAGE page which usually lists
this) and mark it as maintained by "Debian QA Group
" if needed.

If you are impatient, try fixing QA packages, as you can upload fixes
without delays:

  https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_maint_debian-qa.html


Let's fix some bugs!


live well,
  vagrant


Re: (java) Builds not reproducible on armhf

2024-05-21 Thread Vagrant Cascadian
On 2024-05-20, Mechtilde Stehmann wrote:
> I want to clean up my Java packages.
>
> There are several with FTBR. I found that the date of the *.poms is a
> date from 1970.
>
> for example they are the packages
>
> vinnie

Looking at the history for vinnie:

  https://tests.reproducible-builds.org/debian/history/armhf/vinnie.html

It is only very recently that this started happening (2024-05-04)
without source changes in vinnie itself, so I would suspect some change
in the toolchain used to produce the .pom files?

commons-email is similar, although starting 2024-04-04:

  https://tests.reproducible-builds.org/debian/history/armhf/commons-email.html

ez-vcard is similar too, starting 2024-04-20:

  https://tests.reproducible-builds.org/debian/rb-pkg/trixie/armhf/diffoscope-results/ez-vcard.html

Although some of those builds also have differences in some xz
contents... might just be related to the timestamp differences.


Wild hunch is one build is run on a 64-bit kernel (without a linux32
personality) and one build on a 32-bit kernel... that is one of the main
differences between these armhf test builds and builds on other
architectures, where this does not seem to happen...
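
(If you want to see the difference a kernel personality makes, something
like this illustrates it on an arm64 machine running a 32-bit userland;
the reported machine strings are examples:)

  # without a 32-bit personality, a build sees the 64-bit kernel:
  uname -m                   # e.g. aarch64
  # with the linux32 personality, the same machine reports 32-bit arm:
  setarch linux32 uname -m   # e.g. armv8l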


live well,
  vagrant


Re: silx package from Debian

2024-05-06 Thread Vagrant Cascadian
On 2024-05-04, PICCA Frederic-Emmanuel wrote:
> Hello, I am trying to understand the non-reproducible status of the Debian
> silx package.
>
> here is the info for the new version 2.0.0
>
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/silx.html
>
> Can you help me understand what is going on ?

Looks like it is probably some sort of sort-ordering or randomness
issue in whatever is generating the documentation. Probably not silx
specific, but the tools it uses.

Disabling parallelism *might* help, but hard to know without further
testing...
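
If you want to poke at it locally, something like this (an untested
sketch, using reprotest's --vary syntax) should vary only parallelism
between two builds of the unpacked source tree:

  # build twice, holding everything constant except the number of CPUs:
  reprotest --vary=-all,+num_cpus . -- null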


> Does it block also the Debian package migration ?

It does not currently affect package migration.


live well,
  vagrant


Re: diffoscope 265 released 💠

2024-04-19 Thread Vagrant Cascadian
On 2024-04-19, Chris Lamb wrote:
> The diffoscope maintainers are pleased to announce the release of
> version 265 of diffoscope.

Signed tag please? :)

live well,
  vagrant


NGI Zero funding projects

2024-04-18 Thread Vagrant Cascadian
Hey folks!

Do you have a reproducible builds or related project you wanted to work
on, but need some funding (~5k-50k euro) to make it happen?

Noticed that NGI Zero is accepting project applications until June 1st:

  https://nlnet.nl/core/

They have funded some interesting projects in the past, such as various
bootstrappable builds work (e.g. GNU Mes):

  https://nlnet.nl/project/index.html

There may be future rounds as well if the deadline is a bit tight,
though I do hear the application process is pretty straightforward.

It is mostly focused on funding EU based developers.


live well,
  vagrant


Re: Sticker Giveaway: Read code 🎷🐛

2024-03-31 Thread Vagrant Cascadian
On 2024-04-01, kpcyrd wrote:
> in February I printed about 2k stickers to manifest the concept of 
> reviewing source code, picturing a bug throwing a party within the 
> codebases nobody reads.
>
> I usually spread these in my communities in person; due to recent events
> I've decided to give some of them away by mail. To keep things simple 
> I'm going to eat the shipping cost.

Does shipping include the cat? Or was that just to demonstrate size?

live well,
  vagrant


Re: Reproducible Builds for recent Debian security updates

2024-03-30 Thread Vagrant Cascadian
On 2024-03-29, Vagrant Cascadian wrote:
> So far, I have not found any reproducibility issues; everything I tested
> I was able to get to build bit-for-bit identical with what is in the
> Debian archive.
>
> I only tested bookworm security updates (not bullseye)
...
> Not yet finished building:
>
>   openvswitch

So, the builds of openvswitch failed in the test suite...

... I performed another build with tests disabled, and the amd64
packages were bit-for-bit identical, but one of the arch:all packages,
"openvswitch-source" had an already known issue; embedded information
(username, uid, group, gid, timestamp ...) in the included tarball.

This matches the previous version tested in the reproducible builds test
infrastructure:

  https://tests.reproducible-builds.org/debian/dbdtxt/bookworm/amd64/openvswitch_3.1.0-2.diffoscope.txt.gz

This is an explainable issue and I would say it does not indicate
anything surprising or unexpected or malicious; it is just unfortunate
that it is not bit-for-bit reproducible, as it actually requires
analysis!
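
(To illustrate the sort of embedded metadata involved, a verbose tar
listing shows the owner, group and timestamp recorded for each member;
the file names and output here are hypothetical:)

  tar -tvf openvswitch-source.tar.gz | head -2
  # -rw-r--r-- builduser/builduser 1234 2023-01-15 12:34 openvswitch/README.rst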

The good news is that newer versions (~3.2.2+) in Debian trixie and
unstable of "openvswitch-source" fix this by shipping the source in a
directory rather than a tarball, which dpkg normalizes when generating
the .deb. So at least for future versions this issue is already fixed.


live well,
  vagrant


Re: Reproducible Builds for recent Debian security updates

2024-03-30 Thread Vagrant Cascadian
On 2024-03-30, Vagrant Cascadian wrote:
> On 2024-03-30, Salvatore Bonaccorso wrote:
>> On Fri, Mar 29, 2024 at 07:38:35PM -0700, Vagrant Cascadian wrote:
>>> Philipp Kern asked about trying to do reproducible builds checks for
>>> recent security updates to try to gain confidence about Debian's buildd
>>> infrastructure, given that they run builds in sid chroots which may have
>>> used or built or run a vulnerable xz-utils...
> ...
>> There would be an upcoming (or actually postponed) util-linux update
>> as well. Could you as extra paranoia please verify these here as well
>> (I assume its enough for you that the source package is signed, I
>> stripped the signature from the changes):
>>
>> https://people.debian.org/~carnil/tmp/util-linux/
>
> I don't see any source packages there, just .deb .changes and signed
> .buildinfo files! The signed .buildinfo files are great, but would
> definitely need the source code ... looks like the util-linux changes
> are in a git branch, but a signed .dsc would be nice just to be sure I
> am testing the same thing. That said, testing from git and getting
> bit-for-bit identical results ... would be confidence inspiring!
> Hmmm. Might just go for it, and if we have issues, maybe try to dig up
> the .dsc? :)

Hah. Almost in the time it took me to wonder about git vs. .dsc builds,
even with some minor differences in the build-depends, managed a
bit-for-bit identical build of util-linux:amd64 and util-linux:all!

Tarball of build logs and .buildinfo files:

  https://people.debian.org/~vagrant/util-linux-2.38.1-5+deb12u1.verification.tar.zst
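
(The verification step itself is mostly just comparing checksums, along
these lines; the file names are illustrative:)

  # checksum the rebuilt package and compare against the signed .buildinfo:
  sha256sum util-linux_2.38.1-5+deb12u1_amd64.deb
  grep -A99 '^Checksums-Sha256:' util-linux_2.38.1-5+deb12u1_amd64.buildinfo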

live well,
  vagrant


Re: Reproducible Builds for recent Debian security updates

2024-03-30 Thread Vagrant Cascadian
On 2024-03-30, Salvatore Bonaccorso wrote:
> On Fri, Mar 29, 2024 at 07:38:35PM -0700, Vagrant Cascadian wrote:
>> Philipp Kern asked about trying to do reproducible builds checks for
>> recent security updates to try to gain confidence about Debian's buildd
>> infrastructure, given that they run builds in sid chroots which may have
>> used or built or run a vulnerable xz-utils...
...
> Thanks a lot for doing this verification work!

It is such an obvious application for Reproducible Builds that many
people have worked on for many years. So... I daresay, my pleasure and
honor. :)


> There would be an upcoming (or actually postponed) util-linux update
> as well. Could you as extra paranoia please verify these here as well
> (I assume its enough for you that the source package is signed, I
> stripped the signature from the changes):
>
> https://people.debian.org/~carnil/tmp/util-linux/

I don't see any source packages there, just .deb .changes and signed
.buildinfo files! The signed .buildinfo files are great, but would
definitely need the source code ... looks like the util-linux changes
are in a git branch, but a signed .dsc would be nice just to be sure I
am testing the same thing. That said, testing from git and getting
bit-for-bit identical results ... would be confidence inspiring!
Hmmm. Might just go for it, and if we have issues, maybe try to dig up
the .dsc? :)

live well,
  vagrant


Reproducible Builds for recent Debian security updates

2024-03-29 Thread Vagrant Cascadian
Philipp Kern asked about trying to do reproducible builds checks for
recent security updates to try to gain confidence about Debian's buildd
infrastructure, given that they run builds in sid chroots which may have
used or built or run a vulnerable xz-utils...

So far, I have not found any reproducibility issues; everything I tested
I was able to get to build bit-for-bit identical with what is in the
Debian archive.

I only tested bookworm security updates (not bullseye), and I tested the
xz-utils update now present in unstable, which took a little trial and
error to find the right snapshot! The build dependencies for Debian
bookworm (a.k.a. stable) were *much* easier to satisfy, as it is not a
moving target!
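
(For reference, pinning a build chroot to a particular archive state is
roughly a matter of a sources.list entry like the following; the
timestamp shown is only illustrative:)

  deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20240329T000000Z/ unstable main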


Debian bookworm security updates verified:

  cacti iwd libuv1 pdns-recursor samba composer fontforge knot-resolver
  php-dompdf-svg-lib squid yard

Not yet finished building:

  openvswitch

Did not yet try some time and disk-intensive builds:

  chromium firefox-esr thunderbird

Debian unstable updates verified:

  xz-utils


A tarball of build logs (including some failed builds) and .buildinfo
files is available at:

  https://people.debian.org/~vagrant/debian-security-rebuilds.tar.zst


Some caveats:

Notably, xz-utils has a build dependency that pulls in xz-utils, and the
version used may have been a vulnerable version (partly vulnerable?),
5.6.0-0.2.

The machine where I ran the builds had done some builds using packages
from sid over the last couple of months, so it may have at some point
run the vulnerable xz-utils code; not the absolute cleanest of
checks... but at least some sort of data point.

The build environment used tarballs that had usrmerge applied (as it is
harder to not apply usrmerge these days), while the buildd
infrastructure chroots do not have usrmerge applied. But this did not
appear to cause significant problems, although it pulled in a few more
perl dependencies!


I used sbuild with the --chroot-mode=unshare mode. For the xz-utils
build I used some of the ideas developed in an earlier verification
builds experiment:

  https://salsa.debian.org/reproducible-builds/debian-verification-build-experiment/-/blob/e003ddf19de13db2d512c25417e4bec863c3a082/sbuild-wrap#L71
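
(Each rebuild looked something like the following; exact options varied
per package, and the package name and repository line here are only
illustrative:)

  # build in a throwaway unshare-mode chroot, with the security archive
  # available for build dependencies:
  sbuild --chroot-mode=unshare --dist=bookworm \
    --extra-repository='deb http://security.debian.org/debian-security bookworm-security main' \
    PACKAGE.dsc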


It was great to apply Reproducible Builds to real-world uses!


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-12 Thread Vagrant Cascadian
On 2024-03-12, Holger Levsen wrote:
> On Mon, Mar 11, 2024 at 06:24:22PM +, James Addison via rb-general wrote:
>> Please find below a draft of the message I'll send to each affected 
>> bugreport.
>
> looks good to me, thank you for doing this!
>  
>> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
>> continue to test build-path variance, at least until we decide otherwise.
>
> this is in fact a bug and should be fixed with the next reprotest release.

That is not a reprotest bug, but an infrastructure issue for the
debian-specific salsa-ci configuration. Reprotest is not a
debian-specific tool.

Reprotest should continue to vary build paths by default; reprotest
historically and currently defaults to enabling all variations and
making an exception does not seem worth the opinionated change of
behavior. By design, reprotest makes it easy to configure which
variations to enable and disable as needed.


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread Vagrant Cascadian
On 2024-03-05, John Neffenger wrote:
> On 3/5/24 2:11 PM, Vagrant Cascadian wrote:
>>> I have no way to change these choices.
>> 
>> Then clearly you have not been provided sufficient information,
>> configuration, software, etc. in order to reproduce the build!
>
> Rather, I really can't change it or configure it any differently.
>
> Three builds:
>
> (1) A build on Launchpad submitted from their webpage uses this path:
>
>/build/openjfx/parts/jfx/build/
>
> (2) A remote build on Launchpad submitted locally with this command:
>
>$ snapcraft remote-build
>
> uses this path:
>
>   /build/snapcraft-openjfx-64b793849f913c7228cd17db40a05187/parts/jfx/build/
>
> (3) And a build run entirely local with this command:
>
>$ snapcraft
>
> uses this path:
>
>/root/parts/jfx/build/
>
> What am I to do?

Well, to state the obvious in this case, yes, you either need to fix
your tooling to support some mechanism to provide a consistent build
path, switch to different tooling that already supports a consistent
build path, or fix this particular software to build reproducibly
regardless of build paths.

Each approach has different advantages and disadvantages.


>> That was a fundamentally different issue about having builds not produce
>> bit-for-bit identical results still meeting some sort of reproducible
> criterion, whereas this discussion is, as I see it, about
>> normalizing the path in which the build is performed in order to get
>> bit-for-bit identical results.
>
> I understand and recognize the difference you highlight between this 
> discussion and the previous one. Yet I would hesitate to call it 
> fundamental for the reasons below.
>
> The main reason people didn't want to relax any requirements back in 
> October 2022 is because then the pressure is off -- it removes our 
> leverage. If you lower our standards, we may never get the upstream 
> projects to the goal we really want: fully reproducible builds 
> independent of these random differences.

I guess we differ on the "main reason" ... both of us having
participated in that discussion. :)

I agree that higher standards are in general better, but I am more
concerned with the outcome than this particular issue regarding build
paths.

That said, I am very glad to hear there are projects actively working on
fixing build path issues!

I argued time and time again in favor of continuing to test build paths
in Debian, largely because some commonly used debian tooling still
varies build paths out of the box, I have filed dozens of build path
related bugs and marked hundreds of packages affected by build paths,
pushed for related changes in core packaging tooling in Debian
(e.g. dpkg, debhelper) to fix build paths issues... but I also see the
pragmatic reasons why it is tolerable, if not ideal, to just use
consistent build paths.


> It has sometimes taken me years(!) to get a single reproducible builds 
> pull request accepted.

Likewise. Which...

> If they find out they can be "reproducible" without some of these
> bothersome changes, it just makes my job that much more difficult.

... is why some people might want to prioritize which issues they want
to spend their time on. We always have to pick our battles, and allow
others to pick their battles.

That means that we do not always support each other in all things, but
we can support each other in most things, and that seems more important
to me, at least in this case.


> I'll make the same argument I made over a year ago:
>
> Reproducible builds is about /blasting/ away all the useless, 
> meaningless differences: the timestamps of files created during the 
> build, the unsorted order of files in their directories, or the random 
> build paths used in a transient container. When the useless differences 
> are removed, the meaningful differences can be found.

That is certainly one angle on it, and a good one!

Yet the Reproducible Builds Definition is more flexible. It gives room
for individual projects to focus on their own priorities, while still
requiring bit-for-bit reproducibility.


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-06 Thread Vagrant Cascadian
On 2024-03-05, John Gilmore wrote:
> A quick note:
> Vagrant Cascadian  wrote:
>> It would be pretty impractical, at least for Debian tests, to test
>> without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
>> debian/changelog for quite a few years now.
>
> Making a small patch to the local dpkg to alter or remove the value of
> SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
> using that version of dpkg, would tell you which of them (newly) fail to
> reproduce because they depend on SOURCE_DATE_EPOCH.

Sure... which brings us to...

>> Sounds like an interesting project for someone with significant spare
>> time and computing resources to take on!
>
> It looks to me like the whole Ubuntu source code (that gets into the
> standard release) fits in about 25 GB.  The Debian 12.0.0 release
> sources fit in 83GB (19 DVD images).  Both of these are under 1% of a
> 10TB disk drive that runs about $200.  A recent Ryzen mini-desktop,
> with a 0.5TB SSD that could cache it all, costs about $300.  Is this
> significant computing resources?  For another $40 we could add a better
> heat sink and a USB fan.  How many days would recompiling a whole
> release take on this $540 worth of hardware?

You also notably left out RAM requirements, which are almost more
important than CPU, from what I've seen!

You were not talking about a single pass through the archive, you asked
for a combinatorially explosive comparison (e.g. with and without build
paths, with and without SOURCE_DATE_EPOCH, with and without locale
differences, with and without username variations, etc.) ... and for it
to continue to be useful, you'd have to keep doing it... indefinitely.

Debian currently tests over 25 variations (most of which have actually
resulted in differences in the wild):

  https://tests.reproducible-builds.org/debian/index_variations.html

To systematically identify these "simply" through building each possible
combination for any significant set of software... is a much larger
task. Obviously, you could narrow it to only the set of variations you
want to research, or for a limited package set.

At least for Debian, with what I would guess is significantly more
computing power than you've described, we usually did no better than
about 30 days from the oldest build, meaning some packages were always
behind. We also blacklist some packages that just take too much RAM,
disk or time, though that is considerably less than 1% of ~35k
packages. More importantly, that is with only two builds per package,
not testing all 625 permutations of 25 interacting variations per
package.


> (I agree that the "spare" time to set it up and configure the build
> would be the hard part. This is why I advocate for writing and
> releasing, directly in the source release DVDs, the tools that would
> automate the recompilation and binary comparison.  The end user should
> be able to boot the matching binary release DVD, download or copy in the
> source DVD images, and type "reproduce-release".)

Automation can help significantly, although at some point you need to
write all that automation, write the code that processes the results
meaningfully, and verify that it is working correctly... and continue to
verify it as new package versions come in, and so on.


In short, easier said than done?


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-05 Thread Vagrant Cascadian
On 2024-03-05, John Gilmore wrote:
>>   ... it makes reproducibility from around 80-85% of all
>>  packages to >95%, IOW with this shortcut we can have meaningful
>>  reproducibility *many years* sooner, than without.
...
> I'd rather that we knew and documented that 57% of
> packages are absolutely reproducible, 23% require SOURCE_DATE_EPOCH, and
> 12% still require a standardized source code directory, than to claim
> all 95% are "meaningfully reproducible" today.

Sounds like an interesting project for someone with significant spare
time and computing resources to take on!

I take "meaningfully reproducible" to mean it is documented how to
produce bit-for-bit identical results. In some cases, this requires
metadata (e.g. Debian .buildinfo file) that you need to reproduce the
build environment, and in some cases, this means you use the standard
build tool for the distribution (e.g. nix or guix).

Those numbers Holger mentioned arose because we historically had a
compromise where our tests on tests.reproducible-builds.org did not
vary the build path for Debian testing but did vary it for Debian
unstable, and the difference mostly held at about 10-15% over the
years.

In Debian, the build path is usually included in the .buildinfo file (at
least for builds produced by Debian), which describes the packages and
dependencies and various things about the build environment necessary to
reproduce the build.

It would be pretty impractical, at least for Debian tests, to test
without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
debian/changelog for quite a few years now. Unless you want to test
reproducibility of antique Debian releases...
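
(For reference, the value dpkg exports can be derived by hand from a
Debian source tree:)

  # dpkg sets SOURCE_DATE_EPOCH to the timestamp of the latest changelog entry:
  SOURCE_DATE_EPOCH=$(dpkg-parsechangelog -STimestamp)
  export SOURCE_DATE_EPOCH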


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-05 Thread Vagrant Cascadian
On 2024-03-05, John Neffenger wrote:
> On 3/5/24 8:08 AM, John Gilmore wrote:
>> Our instructions for reproducing any package would have to identify what
>> container/chroot/namespace/whatever the end-user must set up to be able
>> to successfully reproduce a package.

The build instructions always have to identify what defines the build
environment, and exactly what that includes may differ from project to
project.


> And even then, it won't always work.
>
> I need to verify the JavaFX builds done by Launchpad, for example, where 
> its LXD container uses a build path as follows:
>
> /build/openjfx/parts/jfx/build/
>
> When I run the same build locally using the same command and a local LXD 
> container, it uses a build path as follows and fails to be reproducible:
>
> /root/parts/jfx/build/
>
> I have no way to change these choices.

Then clearly you have not been provided sufficient information,
configuration, software, etc. in order to reproduce the build!


> I intend to fix this reproducibility bug, and I shouldn't get away
> with not fixing it!

> JDK-8307082: Build path is recorded in JavaFX Controls module
> https://bugs.openjdk.org/browse/JDK-8307082

Great, please do, we will all be better off for it having been fixed!


>> If we move the goal posts in order to claim victory, who are we fooling
>> but ourselves?

There are no moving goalposts, as the goal has always been to be able
to independently verify bit-for-bit results.

Maybe we take the bike path to get there, maybe we go by train or
hovercraft or jetpack. Some ways might be easier or more expensive or
have other consequences or downsides, but as long as the destination is
bit-for-bit reproducible... so be it, if it does the job.

Normalized build environments have been a technique to achieve
reproducible builds, even going back to the early work in bitcoin and
tor over a decade ago, and are used by various projects today to achieve
bit-for-bit identical reproducible builds.

While making software immune to various forms of non-determinism in the
build environment is certainly preferred, it is not the one and only
true way to achieve bit-for-bit identical results that are able to be
independently verified.

  https://reproducible-builds.org/docs/definition/

  "A build is reproducible if given the same source code, build
  environment and build instructions, any party can recreate bit-by-bit
  identical copies of all specified artifacts.

  The relevant attributes of the build environment, the build
  instructions and the source code as well as the expected reproducible
  artifacts are defined by the authors or distributors. The artifacts of
  a build are the parts of the build results that are the desired
  primary output."

The authors or distributors may choose to include build paths or various
other things (e.g. LANG=C, LC_ALL=C.UTF-8) as part of their build
instructions.

So yes, the fewer rube-goldbergian contraptions you need to set up in
order to produce the build environment, the better, surely!

It is technically possible to get an exact matching clock across two
builds for matching timestamps, but it is non-trivial and I would say
unreasonable (but very interesting academic work!). Given current
technologies, it would be unreasonable to expect setting the quark spin
state of particles involved in the build process... some things are
relatively easy to normalize.

Builds that build independent of the build path are better for
reproducible builds and even other reasons (e.g. storing the build path
takes a few extra "useless" bits). With most software, it is possible to
get it to build independent of the build path. With some software, it is
unfortunately more difficult.


> I agree completely. In fact, almost all of us agreed completely on this 
> issue in October 2022:
>
> Give guidance on reproducible builds #1865
> https://github.com/coreinfrastructure/best-practices-badge/issues/1865
>
> Why is this coming up again as if we've forgotten all those arguments 
> against it?

That was a fundamentally different issue about having builds not produce
bit-for-bit identical results still meeting some sort of reproducible
criterion, whereas this discussion is, as I see it, about
normalizing the path in which the build is performed in order to get
bit-for-bit identical results.


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-03-04 Thread Vagrant Cascadian
On 2024-03-04, John Gilmore wrote:
> Vagrant Cascadian wrote:
>> > > to make it easier to debug other issues, although deprioritizing them
>> > > makes sense, given buildd.debian.org now normalizes them.
>
> James Addison via rb-general  wrote:
>> Ok, thank you both.  A number of these bugs are currently recorded at 
>> severity
>> level 'normal'; unless told not to, I'll spend some time to double-check 
>> their
>> details and - assuming all looks OK - will bulk downgrade them to 'wishlist'
>> severity a week or so from now.

Well, I think we should change it to "minor" rather than "wishlist"
severity, but that may be splitting hairs; I do not find a huge amount
of difference between debian bug severities... they are pretty much
either critical/serious/grave and thus must be fixed, or
normal/minor/wishlist and fixed when someone feels like it.


> I may be confused about this.  These bug reports are that a package cannot
> be reproducibly built because its output binary depends on the directory in 
> which
> it was built?
>
> Why would these become "wishlist" bugs as opposed to actual reproducibility 
> bugs
> that deserve fixing, just because one server at Debian no longer invokes this
> bug because it always uses the same build directory?
>
> If an end user can't download a source package (into any directory on
> any machine), and build it into the same exact binary as the one that Debian
> ships, this is not a "wishlist" idea for some future enhancement.  This
> is a real issue that prevents the code from being reproducible.

I agree it is a real issue, but admit it is fairly easy to work around:
given that most package building tools use chroots or containers or
similar, it seems acceptable to treat build paths as a lower
priority. Compared to timestamps, which are non-trivial to force to use
the exact same clock moving at the exact same rate, I would say build
path normalization is quite tolerable, if not ideal.

You cannot just build on "any machine": the machine needs to have a
sufficiently similar build environment (e.g. exactly matching compiler
versions, same architecture, etc.), and whether the build path is part
of that or not is simply a decision to make.
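
(A crude but illustrative way to check whether a given build captured
its path, with the package name and path hypothetical:)

  # unpack a built .deb and search it for the directory the build ran in:
  dpkg-deb -x foo_1.0-1_amd64.deb tmp/
  grep -r --text /build/foo-1.0 tmp/ && echo "build path embedded"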

Several (many?) other distros normalize the build path as part of their
standard build tooling; Debian is arguably a latecomer to that practice.

I have definitely argued in favor of addressing build path issues, and
encourage people to fix them, and have personally spent more than a
small amount of time working on it, and we have made huge progress on
fixing (tens of?) thousands of them.

There are only so many hours in the day and so many people actively
working on fixing things... there may be bigger fires to put out at the
moment.


live well,
  vagrant


Re: Two questions about build-path reproducibility in Debian

2024-02-27 Thread Vagrant Cascadian
On 2024-02-15, James Addison via rb-general wrote:
> A quick recap: in July 2023, Debian's package build infrastructure
> (buildd) intentionally began using a fixed directory path during
> package builds (bug #1034424).  Previously, some string randomness
> existed within each source build directory path.
>
> I've two questions related to buildpaths - one relevant to the
> Salsa-CI team, and the other a RB-team housekeeping question:
>
>   1. [Salsa] Recently Debian's CI pipeline was reconfigured[1] to
> enable more variance in builds.  However: I think that change also
> (inadvertently?) enabled buildpath variation.  Is that useful and/or
> aligned with Debian package migration incentives[2] -- or should we
> disable that buildpath variance?

I think it might be worth disabling build path variations by default in
salsa-ci, although making it possible for people to override.


>   2. [RB] Housekeeping: we use Debian's bugtracker to record packages
> with buildpath-related build problems[3].  Do we want to keep those
> bugs open, or should we close them?

I think the bugs should remain open, but perhaps downgraded to minor or
wishlist?

While buildd.debian.org does now use a predictable path, sbuild does not
by default and requires slightly tricky manual intervention to get the
right path; many people still may perform local builds in their home
directory; I am not sure if pbuilder now defaults to matching
buildd.debian.org, though it is possible to specify the build path (as
seen on tests.reproducible-builds.org!); reprotest still uses randomized
build paths, although a WIP branch exists:

  https://salsa.debian.org/reproducible-builds/reprotest/-/merge_requests/22
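
(As I understand it, the sbuild part of that intervention amounts to
passing an explicit path; the path and package shown here are only
examples:)

  # pin the build to a fixed path instead of a randomized one:
  sbuild --build-path=/build/foo-1.0 foo_1.0-1.dsc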

There are real-world build path issues, and while it is possible to work
around them in various ways, I think they are still issues worth fixing
to make it easier to debug other issues, although deprioritizing them
makes sense, given buildd.debian.org now normalizes them.


live well,
  vagrant


Re: reprotest: inadvertent misconfiguration in salsa-ci config

2024-02-27 Thread Vagrant Cascadian
On 2024-02-27, Chris Lamb wrote:
>> * Update reprotest to handle a single-disabled-variations-value as a
>>   special case - treating it as vary and/or emitting a warning.

Well, I would broaden this to include an arbitrary number of negating
options:

  --variations=-time,-build_path

That seems just as invalid.

The one special case I could see is "--variations=-all" where you might
want to be normalizing as much as possible.
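
To make the expected equivalences concrete:

  reprotest --variations=all,-time . -- null  # explicit: everything except time
  reprotest --vary=-time . -- null            # what people expect -time to mean
  reprotest --variations=-time . -- null      # arguably invalid: subtracts from nothing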


> On whether to magically/transparently fix this, needless to say, it's
> considered bad practice to change the behaviour of software that has
> already been released; I would, as a rule, subscribe to that idea.
> However, we should bear in mind that this idea revolves around what
> users are *expecting*, not necessarily what the software actually
> does.
>
> I say that because I hazard that all 400 usages are indeed expecting
> that `--variations=-foo` functions the same as `--variations=all,-foo`
> (or `--vary=-foo`), and so this proposed change would merely be
> modifying reprotest to reflect their existing expectations. It would
> not therefore be a violation of the "don't break existing
> functionality" dictum.
>
> (Saying that, the addition of a warning that we are doing so would
> definitely not go amiss.)

Hrm. Less inclined toward this approach; expectations can shift with
time and context and culture and whatnot. That said, I agree the current
behavior is confusing, and we should change something explicitly, rather
than implicitly...


>> * Treat removal of a variance factor from an already-empty-context
>> as an error.
>
> I'm also tempted by this as well. :)  How would this be experienced by
> most DDs? Would their new pushes to Salsa now suddenly fail in the
> reprotest job of the pipeline? If so, that's not too awful, given that
> the prominent error message would presumably let them know precisely
> how to fix it.

I would much prefer an error message if we can correctly identify this.

Some possible cases to consider treating as invalid, issuing an error:

  --variations=-build_path

  --variations=-time,-build_path

This almost makes me want to entirely deprecate --variations, and switch
to recommending "--vary=-all,+whatever" or "--vary=-all
--vary=+whatever" instead of ever using --variations.

I'm not sure the variations syntax enables much that cannot be more
unambiguously expressed with --vary.

That said, the reprotest code is a bit hairy, and I am not sure what
sort of refactoring will be needed to make this possible. In particular,
how --auto-build is implemented, where it systematically tests each
variation one at a time. Refactoring might be needed regardless. :)


live well,
  vagrant


Re: How to verify a package by rebuilding it locally on Debian?

2024-02-13 Thread Vagrant Cascadian
On 2024-02-12, cen wrote:
> I would like to verify that a package is reproducible by rebuilding it 
> locally on Debian (bookworm).
...
> I found https://buildinfos.debian.net and I can in theory fetch a 
> .buildinfo file from there using the correct package version and arch 

Yeah, buildinfos.debian.net should get you the .buildinfo file for
packages actually present in Debian...
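
(For nano, assuming the buildinfo-pool layout the site uses, the fetch
looks something like:)

  wget https://buildinfos.debian.net/buildinfo-pool/n/nano/nano_7.2-1_amd64.buildinfo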


> but debrebuild is not happy about it:
>
> debrebuild --buildresults=./artifacts --builder=mmdebstrap 
> nano_7.2-1_amd64.buildinfo
> Unknown option: buildresults
> nano_7.2-1_amd64.buildinfo contained a GPG signature; it has NOT been 
> validated (debrebuild does not support this)!
> Use of uninitialized value $srcpkgver in substitution (s///) at 
> /usr/bin/debrebuild line 246.
> refusing to overwrite the input buildinfo file

Well, this looks very similar to the documented use in the debrebuild
manpage, so probably a bug report to devscripts/debrebuild is in order.

If you're lucky, debrebuild *should* work, but there have been issues
with snapshot.debian.org that make it less reliable than one might
hope.

There is a work-in-progress on a snapshot replacement for the purposes
of rebuilding all packages currently in Debian, though it needs more
work and possibly a different frontend, or added support for it in
debrebuild, as its design differs a little from snapshot.debian.org.


So, in short, no, there is nothing quite working yet, although there is
work in that direction; now that we have demonstrated reproducible
builds as more than theoretically possible, this is a pretty important
goal for Debian in 2024!


live well,
  vagrant


Re: Please review the draft for December's report

2024-01-11 Thread Vagrant Cascadian
On 2024-01-11, John Gilmore wrote:
> https://reproducible-builds.org/reports/2023-12/
>
>   "Reproducible Builds in December 2023
>
>Welcome to the November 2023 report..."
>
> It seems better to NOT reproduce the previous month's header quite so
> accurately.  ;-/

Heh, thanks!

Just pushed a fix.

live well,
  vagrant


Re: Debating Full Source Bootstrap

2023-11-16 Thread Vagrant Cascadian
On 2023-11-16, aho...@0w.se wrote:
> On Wed, Nov 15, 2023 at 11:11:47AM -0800, Vagrant Cascadian wrote:
>> On 2023-11-15, aho...@0w.se wrote:
>> > I challenge you to explain how the use (of an arbitrary implementation)
>> > of a toolchain and of the other necessary tools affects the
>> > certainty of *source-only-based* provenance of the result in VSOBFS.
>
>> It certainly seems source-based, and it makes a strong correlation
>> between the source and the resulting artifacts by getting to a
>> bit-for-bit identical result from diverse paths.
>
> You seem to be unaware of the fact that VSOBFS ensures equivalence between
> the artifacts and the sources.

I am operating under the assumption (based on what you have said, as I
have not personally verified it) that VSOBFS provides bit-for-bit
identical artifacts built from a given source, with an arbitrary set of
toolchain implementations on a variety of OSes.

Which is to say, no, I am not unaware of this...

It is exactly why I have been praising VSOBFS. It combines aspects of
reproducible builds, diverse-double compiling, and bootstrapping to
provide this "equivalence between the artifacts and the sources" with
what I would call a very high degree of confidence.


> Calling equivalence "a strong correlation" can be mistaken for an attempt
> to spread FUD. This would be very unfortunate, wouldn't it?

It is most unfortunate that you appear to be inferring that from what I
said.


>> Source only? Sure!
Verifiably so? Sure!
>> Full source? *shrug*
>
> You still did not answer the question, so let me repeat:
>
> how the use (of an arbitrary implementation)
> of a toolchain and of the other necessary tools affects the
> certainty of *source-only-based* provenance of the result in VSOBFS?
> ^^^
>
> My answer:   does not affect in any way.
> Your answer: ?

I did not refute what you are saying because I did not disagree with it.
I agree, based on what you are saying VSOBFS does, that it provides a
very strong degree of certainty that the sources are what is used to
produce bit-for-bit identical artifacts regardless of toolchain. With my
reproducible builds hat on, that is something to cheer on!

I simply do not agree that VSOBFS is a Full Source Bootstrap, as
mentioned in the article that seemed to evoke such a strong response.

Both the Guix Full Source Bootstrap and VSOBFS provide an auditability
path from source to binary artifacts with very strong confidence. This
is not a zero-sum game.


>> I also note, that presumably using a guix or live-bootstrap based
>> toolchain as one of the possible diverse implementations for VSOBFS,
>> makes an even stronger correlation. That is the beauty of diverse
>
>> These projects can be used to make even stronger claims than any
>> individual project could alone.
>
> Your reasoning is based on an incorrect premise that VSOBFS would
> lack some of its crucial key virtues. This can only indicate that
> you did not get sufficient information (I can hardly think of any
> other reason?).

Not at all! I made no claim that it lacks any "crucial key virtues".


> Guix can not become "a foundation for" or "an implementation of VSOBFS",
> because the very concept of VSOBFS is to be its own complete foundation,
> usable with a wide set of starting points.

You can take an arbitrary C toolchain, yes? By using the C toolchain
that Guix or live-bootstrap provide to build VSOBFS as one of the
possible C toolchains, as part of the set of starting points, you get
the benefits of VSOBFS as well as the guix/live-bootstrap projects.

VSOBFS's strength does not rely on the auditability of the potential set
of diverse starting points, but surely it is more ideal if some of the
starting points are themselves more auditable?


> Let me provide some basic facts:
>
> 1. VSOBFS yields the byte-for-byte identical result, irrespective of
> the host platform used to start from. This reflects the equivalence
> to the source, not a "correlation".

I do not believe anything is infallible, and so I typically try to speak
in terms of degree of confidence. So to me it is a strong correlation,
or strong confidence of equivalence if you prefer, not an absolute
proof.

This certainly makes me terrible at marketing.


> 2. VSOBFS is a full-strength solution.
>
> Talking about "stronger correlation" and "stronger claims" can presumably
> only stem from your insufficient familiarity with the matter.

Again, I speak in terms of confidence and stronger claims, not
absolutes.


> Otherwise it could even look like an insistent continuation of
> FUD. Nice that we have avoided such an uncomfortable interpretation.

I praise VSOBFS for an imp

Debating Full Source Bootstrap

2023-11-15 Thread Vagrant Cascadian
On 2023-11-15, aho...@0w.se wrote:
> On Tue, Nov 14, 2023 at 03:00:29PM -0800, Vagrant Cascadian wrote:
>> On 2023-11-14, aho...@0w.se wrote:
>> > On Tue, Nov 14, 2023 at 10:18:01AM -0800, Vagrant Cascadian wrote:
>> >> On 2023-11-14, aho...@0w.se wrote:
>> > The result of VSOBFS does not depend on the host binaries used in
>> > the process. You can freely replace them with ones of your choice,
>> > as long as those are functional at all.
>> 
>> Not quite full agreement, apparently. Just because you can freely
>> replace them does not mean to me that it is fully from source. It still
>> depends on arbitrary toolchains outside of the source. That kind of just
>> sounds like... bootstrapping.
>
> I appreciate your friendly tone and the occasion to discuss
> the topics related to reproducible builds and to VSOBFS.
>
> At the same time, it is hard to appreciate that you continue with
> persuasive definition of "dependency", superficially convenient to
> discredit the VSOBFS in the contended priority claim.

Can you build it without a preexisting C toolchain and running kernel?

To me, something required to build is a... dependency.

It is a bit disappointing to have something so straightforward and
presented in good faith be treated as anything else.


> I challenge you to explain how the use (of an arbitrary implementation)
> of a toolchain and of the other necessary tools affects the
> certainty of *source-only-based* provenance of the result in VSOBFS.

It certainly seems source-based, and it makes a strong correlation
between the source and the resulting artifacts by getting to a
bit-for-bit identical result from diverse paths.

Source only? Sure!
Verifiably so? Sure!
Full source? *shrug*

I also note, that presumably using a guix or live-bootstrap based
toolchain as one of the possible diverse implementations for VSOBFS,
makes an even stronger correlation. That is the beauty of diverse
implementations!

These projects can be used to make even stronger claims than any
individual project could alone.


>> > sure about the source provenance of the resulting OS, regardless which
>> > hard- and software you have used.
>> 
>> These are great properties! But... not what I would call a full source
>> bootstrap. So perhaps we just disagree on terms. I would call VSOBFS
>
> We do disagree on terms.
>
>> something like "Diversely Verifiable Bootstrap" based on the description.
...
> a redefinition of VSOBFS (which for a reason stands for *all* of
> "Verifiable Source Only Bootstrap") feels like a hostile move
> meant to undermine my priority position against Guix's offensive marketing.

I tried to incorporate my understanding and excitement of how VSOBFS
incorporates (elements of?) Diverse Double-Compiling into a
bootstrapping process and a way to simply describe that.

There was simply no hostility intended, apologies.


live well,
  vagrant


Re: Debating Full Source Bootstrap

2023-11-14 Thread Vagrant Cascadian
On 2023-11-14, aho...@0w.se wrote:
> On Tue, Nov 14, 2023 at 10:18:01AM -0800, Vagrant Cascadian wrote:
>> On 2023-11-14, aho...@0w.se wrote:
>> > On Sun, Nov 12, 2023 at 06:19:31PM -0800, Vagrant Cascadian wrote:
>> >> The very thing the "Full-Source Bootstrap" builds is a C development
>> >> toolchain; that is arguably the whole point of the "Full-Source
>> >> Bootstrap" ... to avoid starting with a C development toolchain, by
>> >> starting from source, and building up to a working C toolchain...
>> >
>> > Before addressing in detail, TL;DR:
>> > The above looks regrettably like a persuasive definition[1].
>> ...
>> > [1] "a form of stipulative definition which purports to describe the true
>> > or commonly accepted meaning of a term, while in reality stipulating
>> > an uncommon or altered use, usually to support an argument for some 
>> > view"
>> > https://en.wikipedia.org/wiki/Persuasive_definition
>> 
>> I dispute the use of "altered or uncommon". Starting with a prebuilt C
>> compiler and claiming to bootstrap from source is a far more altered or
>> uncommon meaning for "Full Source Bootstrap" to me than bootstrapping
>> from an auditable hex binary.
>
>> > Why would "full-source" mean something other than
>> > that dependencies on anything but source are excluded?
>> 
>> I absolutely agree with you here! Which is why I am so confused by this
>> whole argument!
>
> Then we are in full agreement.
> The result of VSOBFS does not depend on the host binaries used in
> the process. You can freely replace them with ones of your choice,
> as long as those are functional at all.

Not quite full agreement, apparently. Just because you can freely
replace them does not mean to me that it is fully from source. It still
depends on arbitrary toolchains outside of the source. That kind of just
sounds like... bootstrapping.

Though... what is really exciting is that VSOBFS has the excellent
property that you can replace the starter toolchain with some set of
binaries; this has many (all?) of the properties of Diverse
Double-Compiling that I was hoping would be incorporated in a bootstrap
process! That is truly, truly great!


>> I don't understand why you are arguing that something that depends on a
>> prebuilt existing C compiler toolchain is a full source bootstrap.
>
> You think probably about dependencies on the *contents* of a binary,
> in other words, on a certain *implementation* of some functionality,
> be it a compiler or a copy program.
>
> VSOBFS postulates multiple, diverse, unrelated bootstraps which makes
> any faulty corresponding implementation to produce a diversion, while
> sane host systems all yield the same result.

This is really great, yes.


> The sanity indication is reliable to any desirable degree, by adding
> the diversity and choosing host systems which in practical terms are
> least vulnerable to possible coordinated attacks.

Yes!


> Moreover, some of this work (to bootstrap on diverse and protected
> systems) has already been done by yours truly. VSOBFS itself consists
> only of source code and you can easily test that the result of your
> VSOBFS-bootstrap corresponds to the announced one. Then you can be
> sure about the source provenance of the resulting OS, regardless which
> hard- and software you have used.

These are great properties! But... not what I would call a full source
bootstrap. So perhaps we just disagree on terms. I would call VSOBFS
something like "Diversely Verifiable Bootstrap" based on the description.

I mean, most open source projects are built from source, with an
existing toolchain. The fact that VSOBFS builds the toolchain to build
the toolchain to build VSOBFS is part of a source based bootstrap
process... not quite what I would call a full source bootstrap.


>> If pushed, and there does feel to be a bit of pushing going on, I think
>> I would argue that neither Guix nor VSOBFS are yet technically "Full
>> Source Bootstraps", as they depend on pre-existing binaries (kernel and
>> C compiler in one case, kernel and guile binary in another).
>
> Even from this point of view (which I do not agree with, as argued above)
> the phrase
> "something that had never been achieved, to our knowledge, since the
> birth of Unix"
> does not belong there in the Guix blog which started the controversy.
>
> This is what I kindly ask to correct.

I sincerely doubt that will get changed at this point by reiterating the
same arguments, but you have definitely made your case, and I think the
arguments have been heard and understood by many of the people involved,
even if in the end some people choose to disagree.


live well,
  vagrant


Debating Full Source Bootstrap

2023-11-14 Thread Vagrant Cascadian
On 2023-11-14, aho...@0w.se wrote:
> On Sun, Nov 12, 2023 at 06:19:31PM -0800, Vagrant Cascadian wrote:
>> The very thing the "Full-Source Bootstrap" builds is a C development
>> toolchain; that is arguably the whole point of the "Full-Source
>> Bootstrap" ... to avoid starting with a C development toolchain, by
>> starting from source, and building up to a working C toolchain...
>
> Before addressing in detail, TL;DR:
> The above looks regrettably like a persuasive definition[1].
...
> [1] "a form of stipulative definition which purports to describe the true
> or commonly accepted meaning of a term, while in reality stipulating
> an uncommon or altered use, usually to support an argument for some view"
> https://en.wikipedia.org/wiki/Persuasive_definition

I dispute the use of "altered or uncommon". Starting with a prebuilt C
compiler and claiming to bootstrap from source is a far more altered or
uncommon meaning for "Full Source Bootstrap" to me than bootstrapping
from an auditable hex binary.


> Why would "full-source" mean something other than
> that dependencies on anything but source are excluded?

I absolutely agree with you here! Which is why I am so confused by this
whole argument!

I don't understand why you are arguing that something that depends on a
prebuilt existing C compiler toolchain is a full source bootstrap.

At the very least, we may have to agree to disagree here.

What Guix (and live-bootstrap) is doing looks more closely like a
"Full-Source Bootstrap" when reading the term at face value, at least to
me. You seem to think what VSOBFS did is, and should get the claim of
first, or at least remove the claim of guix having a first.

I have always felt that guix's "Full-Source Bootstrap" has some chinks
in its armor, depending on a kernel and guile binary, and that they may
someday have to publish the "More Totally Completely Really Very
Fullerest-Source Bootstrap" post when those dependencies are removed.

If pushed, and there does feel to be a bit of pushing going on, I think
I would argue that neither Guix nor VSOBFS are yet technically "Full
Source Bootstraps", as they depend on pre-existing binaries (kernel and
C compiler in one case, kernel and guile binary in another).


But this bickering is truly tiresome!

  https://xkcd.com/386/

Please, remember to rest!


At the end of the day, I think many projects have done tremendously
interesting and valuable things in this space! I look forward to
improvements in the state-of-the-art in reproducible bootstrapping.


live well,
  vagrant


Re: GNU Mes 0.25 released

2023-11-12 Thread Vagrant Cascadian
On 2023-11-11, aho...@0w.se wrote:
> On Sat, Nov 11, 2023 at 07:38:42AM +0100, Janneke Nieuwenhuizen wrote:
>> We are happy to announce the release of GNU Mes 0.25!
>
> Regrettably, the post includes a reference to
>
>>   version 0.24.2 has realized the first Full Source Bootstrap for Guix
>>   
>> .
>
> which makes a false claim:
>
> "[...] something that had never been achieved, to our knowledge, since the
> birth of Unix.
> We refer to this as the Full-Source Bootstrap"
>
> The author must have been well aware that a complete and reproducible
> full source bootstrap to a Posix-like OS with a C99 development toolchain
> has been done by other parties earlier.

The very thing the "Full-Source Bootstrap" builds is a C development
toolchain; that is arguably the whole point of the "Full-Source
Bootstrap" ... to avoid starting with a C development toolchain, by
starting from source, and building up to a working C toolchain...

So citing some other project that allegedly got there first, while
depending on an already existing C development toolchain, seems like
comparing apples to oranges or maybe even broccoli.

I do not see this alleged "false claim", because they are fundamentally
different projects, which claim to accomplish different things, by
different routes, even if they are similar in their end result.

Other approaches to bootstrapping are certainly welcome, and valuable,
and useful, but bickering over this specific claim where you are
comparing different things seems unhelpful at best, and replying to each
and every post referencing the alleged "false claims" seems actively
disruptive to me.

Let the mes folks have their release announcement, already!


I will admit ... The dark secret of the guix full-source bootstrap is
that it does require a running kernel and a guile binary to orchestrate
the process... and so there are technically *some* binary seeds in the
process in order to execute the first bit of source code in the "Full
Source Bootstrap", though the guile orchestration could in theory be
done by hand, as I understand it, leaving the running kernel as the
only binary dependency... and there is work in progress to actually
provide a bootstrap path on UEFI and even bare metal using the same
bootstrap path...


One of the best things about bootstrapping is that there is more than
one way to do it, each has different properties and tradeoffs, and
people can work on whatever part in the complex chain of a bootstrap
process that strikes their fancy and brings them joy!

For example, I look forward (with a long view) to a bootstrap path that
embeds diverse double-compilation as part of the bootstrap, leveraging
the strengths of both reproducible builds and bootstrappable builds!


live well,
  vagrant

p.s. I have taken the liberty of reducing the CC list significantly.


signature.asc
Description: PGP signature


Re: Verification Builds and Snapshots For Debian

2023-10-12 Thread Vagrant Cascadian
On 2023-10-12, Marek Marczykowski-GΓ³recki wrote:
> On Sat, Sep 30, 2023 at 04:59:33PM -0700, Vagrant Cascadian wrote:
>> On 2023-09-20, Lucas Nussbaum wrote:
>> > On 19/09/23 at 13:52 -0700, Vagrant Cascadian wrote:
>> >> Snapshotting the archive(s) multiple times per day, today, tomorrow, and
>> >> going forward will at least enable doing verification rebuilds of
>> >> packages starting from this point, with less immediate overhead than
>> >> trying to replicate the entire functionality or more complete history of
>> >> snapshot.debian.org.
>> 
>> In the meantime, I worked on a naive implementation of this, using
>> debmirror and btrfs snapshots (zfs or xfs are other likely candidates
>> for filesystem-level snapshots). It is working better than I expected!
>
> Isn't this more or less what has been tried few times before, and it
> works only until you load it with years worth of data?

Well, then we partition it off by year or whatever unit of size works?
It is also possible that improvements in the underlying filesystem and
disk technologies over the years make it more viable now than in the
past?

I definitely don't have more than a few months using this method at this
time, so sure, it could all come screeching to a halt at some
point...

I can continue to backfill from snapshot.debian.org until it breaks, I
suppose. :)

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Verification Builds and Snapshots For Debian

2023-10-12 Thread Vagrant Cascadian
On 2023-10-12, Vagrant Cascadian wrote:
> On 2023-10-12, Chris Lamb wrote:
>>> In the meantime, I worked on a naive implementation of this, using
>>> debmirror and btrfs snapshots (zfs or xfs are other likely candidates
>>> for filesystem-level snapshots). It is working better than I expected!
>> […]
>>> Currently weighing in at about 550GB, each snapshot of the archive for
>>> amd64+all+source is weighing in under 330GB if I recall correctly... so
>>> that is over a month worth of snapshots for the cost of about two full
>>> snapshots. Obviously, adding more architectures would dramatically
>>> increase the space used (Would probably add arm64, armhf, i386, ppc64el
>>> and riscv64 if I were to do this again).
>>
>> This sounds like great progress. :)  Do you have any updates since you
>> posted your message?
>
> It's still running!

The original btrfs-based one now has snapshots as far back as July 2023
up to the present, and is currently 974GB. So that looks like roughly
330GB of growth per month.

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Verification Builds and Snapshots For Debian

2023-10-12 Thread Vagrant Cascadian
On 2023-10-12, Chris Lamb wrote:
>> In the meantime, I worked on a naive implementation of this, using
>> debmirror and btrfs snapshots (zfs or xfs are other likely candidates
>> for filesystem-level snapshots). It is working better than I expected!
> […]
>> Currently weighing in at about 550GB, each snapshot of the archive for
>> amd64+all+source is weighing in under 330GB if I recall correctly... so
>> that is over a month worth of snapshots for the cost of about two full
>> snapshots. Obviously, adding more architectures would dramatically
>> increase the space used (Would probably add arm64, armhf, i386, ppc64el
>> and riscv64 if I were to do this again).
>
> This sounds like great progress. :)  Do you have any updates since you
> posted your message?

It's still running! And now I have one running on an xfs filesystem,
and one on btrfs. Only the xfs one is publicly available via:

  http://snapshot.reproducible-builds.org/snapshot-experiment 

That one only started earlier this month, but in theory it could pull in
the updates from the btrfs snapshots for a little more redundancy.

Also managed to backfill some older generations from
snapshot.debian.org, maybe as far back as July? That's only available on
the currently not publicly accessible btrfs implementation.

Could probably set up some proxy to make the ones on btrfs available
publicly too.

> (Are you snapshotting after each dinstall and labelling them with some
> timestamp…? Or perhaps you have some other, cleverer, scheme?)

The timestamp I am using is the most recent timestamp from any relevant
Release file. This way, the timestamp for any given mirror state
(presuming you are mirroring the same distributions and architectures)
should match if you had two snapshots running independently.
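
Roughly, the core of it is something like this (the mirror path is made
up; consider it a sketch rather than the actual script):

  # take the newest Date: field across all the Release files, and format
  # it the way snapshot.debian.org names its generations
  newest=$(sed -n 's/^Date: //p' /srv/mirror/debian/dists/*/Release \
    | xargs -I{} date --utc --date={} +%s | sort -n | tail -n1)
  snapshot_id=$(date --utc --date="@$newest" +%Y%m%dT%H%M%SZ)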

For the main repositories (e.g. not security or incoming), I am syncing
from a leaf mirror that happens to be very close on the network, so I
just schedule it to run from cron roughly when I expect the leaf mirror
to be finished updating, and then a second pass some hours later just in
case so we are less likely to miss a snapshot generation or get an
incomplete generation. Really want to avoid missing snapshots or partial
snapshots; that could certainly use some more solid error checking, as
it mostly relies on debmirror doing the right thing.

It also is currently missing debian-installer images, though I *think*
that would be reasonably easy to add by passing more arguments to
debmirror. For the first proof-of-concept I focused on .deb and .udeb
packages, to be able to rebuild packages.


live well,
  vagrant


Re: Verification Builds and Snapshots For Debian

2023-10-06 Thread Vagrant Cascadian
On 2023-09-30, Vagrant Cascadian wrote:
> On 2023-09-20, Lucas Nussbaum wrote:
>> On 19/09/23 at 13:52 -0700, Vagrant Cascadian wrote:
>>> * Looking forward and backwards at snapshots
>>> 
>>> I do think that a more complete snapshot approach is probably better
>>> than package-specific snapshots, and it might be worth doing
>>> forward-looking snapshots of ftp.debian.org (and security.debian.org and
>>> incoming.debian.org), in addition to trying to fill out all the missing
>>> past snapshots to be able to attempt verification builds of older
>>> packages, such as all of bookworm.
>>> 
>>> Snapshotting the archive(s) multiple times per day, today, tomorrow, and
>>> going forward will at least enable doing verification rebuilds of
>>> packages starting from this point, with less immediate overhead than
>>> trying to replicate the entire functionality or more complete history of
>>> snapshot.debian.org.
>
> In the meantime, I worked on a naive implementation of this, using
> debmirror and btrfs snapshots (zfs or xfs are other likely candidates
> for filesystem-level snapshots). It is working better than I expected!

xfs seems to work well enough too, by using "cp --archive --reflink" to
produce snapshots:

  http://snapshot.reproducible-builds.org/snapshot-experiment/archive/debian/

A little klunkier than btrfs snapshots, but workable.
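
For comparison, the two styles boil down to something like this (the
paths are made up):

  # btrfs: native, read-only snapshot of the mirror subvolume
  btrfs subvolume snapshot -r /srv/mirror /srv/snapshots/20231006T083909Z
  # xfs: no native snapshots, but a reflink copy shares all data blocks
  cp --archive --reflink=always /srv/mirror /srv/snapshots/20231006T083909Z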

It would also be possible with btrfs or xfs (and presumably zfs somehow)
to make by-checksum reflinked copies, which might make it possible to
(at least partially) reassemble a repository from the contents if you
have the relevant Release, Packages and Sources files...

I have not migrated all of the snapshots captured on btrfs... because
it is a bit tricky to sync the files efficiently. What becomes tricky
is transferring those files to another machine or making backups;
rsync can break the reflinked files if you are not careful. There are
patches for rsync to support reflinks efficiently:

  https://github.com/WayneD/rsync/issues/119
  https://github.com/WayneD/rsync/issues/153


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Verification Builds and Snapshots For Debian

2023-09-30 Thread Vagrant Cascadian
On 2023-09-20, Lucas Nussbaum wrote:
> On 19/09/23 at 13:52 -0700, Vagrant Cascadian wrote:
>> * Looking forward and backwards at snapshots
>> 
>> I do think that a more complete snapshot approach is probably better
>> than package-specific snapshots, and it might be worth doing
>> forward-looking snapshots of ftp.debian.org (and security.debian.org and
>> incoming.debian.org), in addition to trying to fill out all the missing
>> past snapshots to be able to attempt verification builds of older
>> packages, such as all of bookworm.
>> 
>> Snapshotting the archive(s) multiple times per day, today, tomorrow, and
>> going forward will at least enable doing verification rebuilds of
>> packages starting from this point, with less immediate overhead than
>> trying to replicate the entire functionality or more complete history of
>> snapshot.debian.org.

In the meantime, I worked on a naive implementation of this, using
debmirror and btrfs snapshots (zfs or xfs are other likely candidates
for filesystem-level snapshots). It is working better than I expected!
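
At its heart it is just debmirror on a schedule; a sketch (options
trimmed, paths made up):

  debmirror --host=deb.debian.org --root=debian --method=http \
    --dist=bookworm,bookworm-backports,trixie,sid,experimental \
    --arch=amd64 --progress /srv/mirror/debian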

It currently has snapshots for debian amd64 on bookworm,
bookworm-backports, bookworm-proposed-updates, trixie, sid and
experimental (or I guess, rc-buggy...), and debian-security for
bookworm-security. This might be a little redundant, but just in case,
it also covers incoming.debian.org for most of the above codenames,
starting between September 20th and 22nd (with some gaps as I was
sorting out what was worth capturing; it currently does not include
debian-installer images, for example, and some generations missed
.udebs). Soon it will start capturing October, and beyond! The machine
it is running on happens to be very close to a debian mirror, which is
helpful! It also seems to have caught some snapshot generations that
snapshot.debian.org missed!

I also tried to backfill some snapshots from snapshot.debian.org for
"debian" and "debian-security" for roughly the same codenames, with more
success than I expected, capturing all of September and edging into
August so far. I hope to get as far back as maybe June, so that anything
built since the bookworm release has relevant snapshots. It mostly
works, although once in a while I appear to trip some download limits
and it stalls out.

The whole setup currently weighs in at about 550GB, and each snapshot of
the archive for amd64+all+source weighs in under 330GB if I recall
correctly... so that is over a month's worth of snapshots for the cost
of about two full snapshots. Obviously, adding more architectures would
dramatically increase the space used (I would probably add arm64, armhf,
i386, ppc64el and riscv64 if I were to do this again).


I'm in the process of using this snapshot mirror, calling out to
grep-dctrl and dose-builddebcheck (look mom, no database!), to generate
apt sources.list entries pointing to the appropriate snapshots for each
.buildinfo from September, and will eventually perform verification
builds for each of these. I think it covers roughly 6000 .buildinfo
files, which is not nothing!
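
The generated apt entries end up looking roughly like this (the host
and generation here are illustrative, not the real ones), one line per
snapshot generation needed to satisfy the .buildinfo:

  deb http://snapshot.example.org/archive/debian/20230916T083909Z/ unstable main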


>> I wonder if having multiple snapshot.debian.org implementations might
>> actually be a desireable thing, as it is so essential to the ability to
>> do long-term reproducible builds verification builds, and having
>> additional independent snapshots could provide redundancy and the
>> ability to repair breakages if one of the services fails in some way.
>
> What is the state of efforts regarding alternate snapshot.d.o
> implementations?

The main one I was aware of:

  https://github.com/fepitre/debian-snapshot

I believe snapshot.reproducible-builds.org, which used this, is
currently on hiatus, but I hope to see it picked up again in 2024,
possibly with a different implementation...


> Has someone explored an implementation backed by S3-compatible storage,
> which would easily allow hosting it in a cloud?

No idea, but multiple options would be good! Would probably want to use
a lot of redundancy (multiple S3 providers, multiple "local" mirrors,
etc.), just because this sort of thing is so difficult to fix
retroactively (if possible at all)...

How difficult is it to implement deduplication with S3 storage? Saw a
few hits with a quick search...


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Verification Builds and Snapshots For Debian

2023-09-20 Thread Vagrant Cascadian
On 2023-09-19, Vagrant Cascadian wrote:
> * Some actual results!
>
> Testing only arch:all and arch:amd64 .buildinfos, I had decent luck with
> 2023/09/16:
>
> total buildinfos to check: 538
> attempted/building: 535
>
> unreproducible: 28  5 %
> reproducible:   461 85 %
> failed: 46  8 %
> unknown:3   0 %
...
> I also had similar results for 2023-09-15 and 2023-09-17, but ... this
> morning most of those results myseriously disappeared!?! No idea what
> happened to them.

It was nagging at me, so I re-ran the builds for those days, and it was
not too bad, and looks similar to the initial results ... from memory:

2023/09/15
total buildinfos to check: 151
attempted/building: 151
unreproducible: 34  22 %
reproducible:   97  64 %
failed: 20  13 %
unknown:0   0 %

2023/09/17
total buildinfos to check: 152
attempted/building: 149
unreproducible: 10  6 %
reproducible:   125 82 %
failed: 14  9 %
unknown:3   1 %

Or, for all three days 2023-09-15 to 2023-09-17 combined:

total buildinfos to check: 839
attempted/building: 835
unreproducible: 72  8 %
reproducible:   683 81 %
failed: 80  9 %
unknown:4   0 %

Still not bad for real-world testing.


Also, my test environments unintentionally introduced a few more
variations, for example:

+Build-Tainted-By:
+ merged-usr-via-aliased-dirs

The thorn in Debian's side strikes again. The tarballs I was using were
usrmerged, but the buildds are still not doing usrmerge. It is more
fiddly to set up a non-usrmerge base tarball than it used to be, but it
is doable, and at least the .buildinfo records this information.


Mysterious discrepancies in dependency differences:

  libfontconfig-dev (= 2.14.2-6),
  libfontconfig1 (= 2.14.2-6),
- libfontconfig1-dev (= 2.14.2-6),

Apparently libfontconfig-dev provides libfontconfig1-dev, and
libfontconfig1-dev is a transitional package, and sometimes
dpkg-genbuildinfo decides to include it explicitly and... sometimes not?
I do not think this particular case is likely to change the build
results, at least.


 Environment:
- DEB_BUILD_OPTIONS="parallel=6"
+ DEB_BUILD_OPTIONS="parallel=7"
+ LANG="C.UTF-8"
  LC_ALL="C.UTF-8"

Could have set up the builds to use the same level of parallelism easily
enough.

LANG was trickier. Some of the buildd .buildinfo files explicitly set
LANG="C.UTF-8" but some have it undefined. If I left it unset, it ended
up using LANG="en_US.UTF-8".  I chose to consistently use LANG="C.UTF-8"
in my testing.  Although I am not even entirely sure C.UTF-8 is a valid
value for LANG...
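
Next time I will likely just pin the whole environment up front, along
these lines (a sketch):

  # match the Environment: section of the .buildinfo before building
  export LANG=C.UTF-8 LC_ALL=C.UTF-8
  export DEB_BUILD_OPTIONS="parallel=7"
  locale -a | grep -i 'c.utf'   # sanity-check that the locale exists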


live well,
  vagrant


signature.asc
Description: PGP signature


Verification Builds and Snapshots For Debian

2023-09-19 Thread Vagrant Cascadian
I experimented with verification builds, rebuilding packages that were
recently built by the Debian buildd infrastructure... relatively soon
after the .buildinfo files are made available, without relying on
snapshot.debian.org... with the goal of getting bit-for-bit identical
verification of newly added packages in the Debian archive.

Overall, I think the results are promising and we should actually try
something kind of like this in a more systematic way!


Fair warning, this has turned into quite a long email...


* Background

For the most part in Debian, we have been doing CI builds, where a
package is built twice and the results compared, but this does not
verify the packages in the official Debian archive. It is useful,
especially for catching regressions in toolchains and such, but
verifying the packages people actually use is obviously desirable.

In order to actually perform a verification build, you need the exact
same packages installed in a build environment...

There was a beta project performing verification builds that appears to
have stalled sometime in 2022:

  https://beta.tests.reproducible-builds.org/

From what I recall, one of the main challenges was the reliability of
the snapshot.debian.org service, which led to the development of an
alternative snapshotting service, although that is currently not yet
completed...

At some point, debsnapshot was used to perform some limited testing, but
this was also dependent on a reliable snapshot.debian.org.

There have been several other attempts at rebuilders for Debian, but
the main challenge usually seems to come down to a working snapshot
service in order to be able to sufficiently reproduce the build
environment a package was originally built in...


* Summary of approach for this experiment

Copy a .buildinfo file from either
coccia.debian.org:/srv/ftp-master.debian.org/buildinfo/2023/09/16 or
https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2023/09/16
or other dates, but something fairly recent for best results...

Create a package-specific snapshot of all the exact versions of packages
in the .buildinfo file (Installed-Build-Depends).

Build a package with the exact versions from the .buildinfo file added
as build-dependencies, with the package-specific snapshot added to
available repositories (as well as a bunch of others), leveraging "sbuild
--build-dep-resolver=aptitude" to resolve the potentially complicated
build dependencies.
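
Stripped of all error handling, the heart of that step looks something
like this (the filenames are made up):

  # pull the exact versioned dependency list out of the .buildinfo...
  deps=$(awk '/^Installed-Build-Depends:/{f=1; next} /^[^ ]/{f=0} f' \
    hello_2.10-3_amd64.buildinfo | tr -d '\n')
  # ...and let aptitude resolve it against the package-specific snapshot
  sbuild --dist=unstable --build-dep-resolver=aptitude \
    --add-depends="$deps" hello_2.10-3.dsc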

This supports sid and experimental reasonably well, including binNMUs.
It also supports the few bookworm-proposed-updates and
bookworm-backports .buildinfo files to some degree. Not sure where to
get .buildinfo files from debian-security, but would love to test those
as well! In theory it supports trixie as well, but nearly all packages
for trixie currently get built in sid/unstable rather than directly in
trixie.

I found that building sid and experimental worked best starting with a
slightly out-of-date trixie tarball, as it was almost always easier to
upgrade packages than to downgrade. Currently bookworm-proposed-updates
and bookworm-backports are fairly stable, although possibly the same
issue might apply.


* Package specific snapshots vs. complete snapshots

I have mixed feelings on the package-specific snapshots. It solves the
problem of getting old versions of packages to verify the build (or at
least could, with a bit more work), but with some drawbacks (custom apt
keyring, redundant information in *many* little snapshots, kind of
complicated).

Having explored package-specific snapshots, I think a better approach
might be to make forward-looking snapshots of ftp.debian.org,
incoming.debian.org and ideally security.debian.org (in addition to
snapshot.debian.org or a replacement)...

With locally available complete snapshots, each .buildinfo can be
processed as soon as possible to find the list of snapshots that would
satisfy the dependencies (to reduce the likelihood of having to rummage
through older snapshots to find dependencies)... and make an addendum to
the .buildinfo file that includes enough information to fully resolve
all the build dependencies... allowing the build to be performed at some
other time. This addendum might also need to recommend a snapshot for
the build chroot or base tarball, though that might be a bit trickier.

This could avoid having to leverage something like metasnap.debian.net,
which can process a .buildinfo and spit out the relevant snapshots.


* The Code

My proof-of-concept collection of scripts, configuration and total
lack of documentation:

  
https://salsa.debian.org/reproducible-builds/debian-verification-build-experiment

In retrospect, I should clearly have started by poking more at
debrebuild and other prior art... oops!

This also did not handle the syncing of the .buildinfo files at all,
which I did manually for this experiment, but that is a fairly
straightforward problem, and buildinfos.debian.net does this already.


* Some actual results!

Testing only arch:all and arch:amd64 .buildinfos, I had decent luck with
2023/09/16:

total buildinfos to check: 538
attempted/building: 535

unreproducible: 28  5 %
reproducible:   461 85 %
failed: 46  8 %
unknown:3   0 %

Re: Please review the draft for July's report

2023-08-02 Thread Vagrant Cascadian
On 2023-08-02, David A. Wheeler wrote:
> Sphinx just merged a change, I recommend adding a note about it. E.g.,
> just before "Lastly in news, kpcyrd posted to our mailing list
> announcing a new β€œrepro-env” tool" add this:
>
> The [Sphinx](https://github.com/sphinx-doc/sphinx) documentation tool
> recently [accepted a change to improve deterministic reproducibility
> of
> documentation](https://github.com/sphinx-doc/sphinx/pull/11312). It's
> internal util.inspect.object_description attempts to sort collections,
> but this can fail. The change handles the failure case by using
> string-based object descriptions as a fallback deterministic sort
> ordering, as well as adding recursive object-description calls for
> list and tuple datatypes. As a result, documentation generated by
> Sphinx will be more likely to be automatically reproducible.

Added.

> (Sorry I can't hop on to the git repo right now, but I hope that helps.)

Thanks for bringing it up! Excited to see how much this change might
improve things!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: trying to reproduce hello-traditional from Debian. .buildinfo file? next steps?

2023-08-02 Thread Vagrant Cascadian
On 2023-08-02, Carles Pina i. Estany wrote:
> This is Debian specific but I cannot find a reproducible builds Debian
> specific mailing list. Let me know if I should ask elsewhere. Feel free
> to send me some pointers to read it myself.

There is also reproducible-bui...@lists.alioth.debian.org more
specifically for Debian, although rb-general works too. We can all learn
from the quirks of other projects. :)


> TL;DR: I'm trying to build hello-traditional from Debian and have the
> same result as Debian. I cannot do it. Pointers welcome. I thought of
> using the .buildinfo file to reproduce the build environment and deps
> but unsure of the best way and if this is the way.

Yes, you usually need to use the same packages as listed in the
.buildinfo, and in general the same build path (tools like sbuild and
pbuilder randomize the build path by default). Although it looks like
hello-traditional is generally reproducible with varied build paths, so
more likely it is just different build dependencies.

It is sometimes possible to get bit-for-bit identical results even with
some variations in the build-dependencies, but it is not expected. More
like a happy fluke. :)


> I'm trying to reproduce the build of the package hello-traditional. I
> understand from here:
> https://tests.reproducible-builds.org/debian/rb-pkg/bookworm/amd64/hello-traditional.html
>
> That should be reproducible.
>
> I've done:
> $ sbuild --no-clean --arch-any --arch-all --no-source --dist=stable 
> --arch=amd64 
> http://deb.debian.org/debian/pool/main/h/hello-traditional/hello-traditional_2.10-6.dsc
>
> Multiple times in two Debian systems (Debian 12.1 and 11.7, I know that
> should only depend on the schroot...) and every time I get:
>
> f712bac966e8fc2d1660bc5d61328a8e9f8354a93c119bb2137169dbdaeb22ab  
> hello-traditional_2.10-6_amd64.deb

So yeah, bookworm is now a stable release, so barring security or point
release updates, you are very likely to get the same exact packages
installed way more often than in testing or unstable.


> But the package that I can retrieve from Debian has a different sha256:
> $ curl -s 
> http://ftp.de.debian.org/debian/pool/main/h/hello-traditional/hello-traditional_2.10-6_amd64.deb
>  | sha256sum
> e39004ec8c3309f909d5442596f9fc442082cd8e28f03e7c438a65fb5bfd9956  -

But the hello-traditional on Debian's mirror was built in December of
2022, several months before the bookworm release, and many build
dependencies have since changed...


> And my question is: how to achieve the same Build ID?
>
> I thought of using the .buildinfo file from:
> https://tests.reproducible-builds.org/debian/buildinfo/bookworm/amd64/hello-traditional_2.10-6_amd64.buildinfo
>
> But I'm not sure what is the best way (besides installing the same exact
> packages in the schroot and setting the Environment) to do it. And I'm
> not sure that this is the way to go anyway, tool that might exist, etc.

There is some tooling to try to reproduce the exact build environment,
although it is somewhat hindered by issues with snapshot.debian.org. If
you're lucky, you might get it to work. There is a work-in-progress
replacement, but I am not sure of the status at this moment.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Unreproducible tar files on go.googlesource.com

2023-07-18 Thread Vagrant Cascadian
On 2023-07-18, kpcyrd wrote:
> while packaging govulncheck for Arch Linux I noticed a checksum mismatch 
> for a tar file I downloaded from go.googlesource.com.
...
> https://go.googlesource.com/vuln/+archive/refs/tags/v1.0.0.tar.gz
>
> I downloaded the file 3 times and got a different sha256 every time, it 
> seems the tar file records the current time when downloading it.
>
> 1st: ddf7cfd295eef68ba284b6471b88dea8efb91b5a115cbead2a3303dce55db94f
> 2nd: 4e9e72a8d19faf25a303d46af559471e8698321d131cec05f31419e2fc9ab43a
> 3rd: 37d9a2b04e9d73effdfbe565012f47456be2360f9389ebd89a981ce27c8bf4ce
>
> I figured I'd share this here for documentation purpose.

Wonder if there is anyone who could nudge them to fix that?

FWIW, it looks like Guix uses git instead of tarballs for projects
hosted on go.googlesource.com, which gives some other benefits, such as
archival at softwareheritage.org.


live well,
  vagrant


signature.asc
Description: PGP signature


Reproducible Builds at Flock 2023

2023-07-17 Thread Vagrant Cascadian
Yesterday I was excited to learn there is some renewed interest in
Reproducible Builds in the Fedora community!

  https://flock2023.sched.com/event/1Or8e/reproducible-builds-hackfest

  https://flocktofedora.org/

  Cork, Ireland August 2nd through 4th


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Irregular status update about reproducible live-build ISO images

2023-07-04 Thread Vagrant Cascadian
On 2023-07-04, David A. Wheeler wrote:
>> On Jul 2, 2023, at 11:37 AM, Roland Clobus  wrote:
>> here is the 18th update of the status for reproducible live-build ISO images 
>> [1].
>> 
>> Single line summary: Live images are looking good, and the number of 
>> (passed) automated tests is growing
>> 
>> Reproducible status:
>> * All major desktops build reproducibly with bullseye, bookworm, trixie and 
>> sid
>
> Spectacular work!
>
> How close are things to having the *released* versions of the
> Debian live images & (main) packages reproducible?
> I can't tell if this means "it's possible to create reproducible builds" or
> "the packages people are using are the reproducible builds".
> Sorry if this is obvious to everyone else.

My understanding is the live images themselves are bit-for-bit
reproducible, with the inputs being the actual .deb packages from the
debian archive. The individual .deb packages might not necessarily be
independently reproducible when built from source.

This is similar to what Tails has historically achieved, and it
continues to be part of the release process for Tails, if I understand
correctly...

The reproducibility of various package sets looks fairly good for most
desktops and Tails (which is a live image that is mostly GNOME based):

  https://tests.reproducible-builds.org/debian/trixie/amd64/index_pkg_sets.html

Looks to be about 92% to 95% reproducible at the moment...


Would be interesting to get various live image package sets up there
too!


live well,
  vagrant


signature.asc
Description: PGP signature


PackagingCon, Berlin, October 26-28 2023

2023-06-13 Thread Vagrant Cascadian
This seems like it might be a good conference for a reproducible builds
talk:

  https://packaging-con.org/

Call For Proposals closes end of July:

  https://cfp.packaging-con.org/2023/cfp

It is also the weekend before the Reproducible Builds World Summit in
Hamburg, so not too far from Berlin.

Unfortunately, not sure I will be able to make it, but... hopefully
someone can!

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Vagrant Cascadian
On 2023-05-29, David A. Wheeler wrote:
> On Sun, 28 May 2023 21:10:36 -0700, Vagrant Cascadian 
>  wrote:
>
>> Do such tools actually exist, or are we talking about something
>> theoretical here?  I am nervous about investing too much energy in
>> something without a specific, precise, working proof of concept.
>> 
>> In your earlier mention of OSSGadget, it was not immediately clear that
>> anything in there could actually do this sort of analysis... ?
>
> OSSGadget is a collection of tools.
> One of its tools is oss-reproducible, which measures this:
>
> https://github.com/microsoft/OSSGadget/blob/main/src/oss-reproducible/README.md

Ah, I got lost on the top-level README.md, which did not mention it
at all (although looking back, apparently that was mentioned in the
initial email on this thread ... sorry, lazy me).


> They originally called it just verifying a "reproducible build".
> I learned about the tool, thought it was neat but I told them
> that using the term "reproducible build" for this was confusing.
> They agreed and decided to change their term to
> "semantically reproducible build". I thought the approach was
> interesting and so posted about it here.

Well, thanks for nudging them away from completely abusing the term
"Reproducible Builds"! Still not a fan of prepending with
"semantically" ...


>> I still expect it will be harder to actually do "semantically
>> reproducible builds" than "fully reproducible builds.
>
> This isn't intended for the developers and builders.
> It's a way to identify some packages that are low risk
> because, while the builds aren't reproducible, the
> differences are unlikely to be an issue.

So, maybe a third-party reviewing their dependency trees?


>> To be honest, it sounds like a lot of extra work to avoid fixing things
>> properly...
>
> As a user I often cannot choose what the builder or developer do.
> I can propose a patch set, but it takes time to create them,
> and there's no guarantee the project will accept them.

Well, the above-referenced oss-reproducible does not appear to be for
your average user either...

Even if a project will not accept the changes, presuming it is FOSS, you
can test the reproducibility locally and see what changes are necessary
to fix it. Although for huge dependency chains this obviously becomes
impractical... but huge dependency chains are arguably impractical in
their own right (and yet painfully pervasive).
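
Even without any distro tooling, the quick local check is just two
builds and a diffoscope run (the build command here is a stand-in for
whatever the project uses):

  ./build.sh && cp -a output output.first
  ./build.sh && diffoscope output.first output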


Any tooling that facilitates review of software obviously is useful; I
certainly do not object to that.

I guess the initial mail subject "Introducing: Semantically reproducible
builds" kind of came off a bit wrong to me ... almost as a product
announcement, starting by essentially proposing something that weakens
the term "Reproducible Builds" that so many have worked so hard to
establish.

... which reminded me of a discussion a while back, which felt similar
in weakening the meaning of reproducible builds:

  https://github.com/coreinfrastructure/best-practices-badge/issues/1865

I feel that issue was even worse than the current thread, but many of
the same points were made there, so it felt a bit like rehashing old
arguments...

I at least can see that the oss-reproducible language for "Semantic
Reproducibility" does clearly state the relationship to bit-for-bit
identical... but it still feels a bit too easy to conflate or confuse
with what feels to me like the authentic, real reproducible builds.


While I see that Reproducible Builds can be hard to achieve 100% of the
time (especially with projects outside of one's control or influence)
and I see value in getting a given build artifact more reproducible than
before ... and stepping stones and strategies that are useful to get
there may include partial fixes or workarounds... and I myself use such
stepping stones routinely...

Apparently, I hold a very strong stance on keeping the *focus* on
bit-for-bit reproducibility!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Vagrant Cascadian
On 2023-05-29, Bernhard M. Wiedemann via rb-general wrote:
> On 29/05/2023 06.10, Vagrant Cascadian wrote:
>> Do such tools actually exist, or are we talking about something
>> theoretical here?
>
> https://github.com/openSUSE/build-compare/ is in use for 13 years.
>
> And strip-nondeterminism can be used to build another such tool.

Sure, I am well aware of strip-nondeterminism and somewhat aware of the
openSUSE build-compare. Debian uses strip-nondeterminism extensively, as
it is part of the majority of standard debhelper based packages ... so I
almost forget it is even there sometimes!
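
For reference, the manual invocation looks roughly like this (the jar
is a made-up example):

  # normalize known-nondeterministic bits, e.g. zip member timestamps
  strip-nondeterminism --timestamp="$SOURCE_DATE_EPOCH" libfoo.jar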


> They will only ever be able to normalize or ignore certain known classes 
> of differences. It is good enough to avoid review of many diffs.

Exactly. You can do this sort of thing, and it is useful in many cases,
but there are limits. It can only modify things in very specific
contexts. Each context is essentially a feature, and great care needs
to be taken to make sure it does not break things or normalize too much.

That said, strip-nondeterminism (and presumably similar tooling) will
have occasional bugs that break the resulting binaries, artifacts,
etc.


> e.g. https://rb.zq1.de/compare.factory/report-202303.txt has
> not-bit-by-bit-identical: 673
> build-compare-failed: 483
>
> So for 190 packages build-compare found that they only had insignificant 
> diffs and were considered semantically equivalent, so I could spend more 
> time, debugging the other 483 diffs.

These approaches are definitely useful to troubleshoot reproducibility
issues, by stripping out all the things that are deemed safe to
sanitize, normalize, etc., and leaving only the more inscrutable things
to scrutinize. I do not dispute that!


>> I very much worry that the meaning of Reproducible Builds may gradually
>> get whittled down
>
> I share this concern, which is why I have been calling this 
> semi-reproducible to distinguish it from bit-reproducible / 
> fully-reproducible.
> That 'semi-' prefix should give people a good hint of what it is and if 
> not, encourage them to ask for details. "sort-of-reproducible" or 
> "almost-but-not-quite-reproducible" could also be an option :-)

semi-reproducible still leaves me a bit nervous, but is definitely
clearer than semantic. :)


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Introducing: Semantically reproducible builds

2023-05-28 Thread Vagrant Cascadian
On 2023-05-28, David A. Wheeler wrote:
> On Sun, 28 May 2023 13:04:40 +0100, James Addison via rb-general 
>  wrote:
>> Thanks for sharing this.
>> 
>> I think that the problem with this idea and name are:
>> 
>> - That it does not allow two or more people to share and confirm that
>> they have the same build of some software.
>
> Sure they can, they just use the same process (e.g., use the same tool to
> verify it). E.g., if you rebuild it, and the two builds are the same EXCEPT
> for the datetime stamps, it's semantically reproducible (not fully 
> reproducible).

Do such tools actually exist, or are we talking about something
theoretical here?  I am nervous about investing too much energy in
something without a specific, precise, working proof of concept.

In your earlier mention of OSSGadget, it was not immediately clear that
anything in there could actually do this sort of analysis... ?


>> - That it does not allow tests to fail-early, catching and preventing
>> reproducibility  regressions (semantic or otherwise).
>
> It's *possible* to fail early, though the CPU requirements are admittedly
> higher (because you have to do much more than a bit-for-bit test).
>
> But I expect that in practice, the use of "semantically reproducible builds"
> is long *after* any CI/CD process of the package being analyzed.
> The problem, in many cases, is that the package was not created in a way
> that supports reproducible builds, so the goal is to try to estimate the
> risk of the package when it is *not* a reproducible build.

I still expect it will be harder to actually do "semantically
reproducible builds" than "fully reproducible builds".

To be honest, it sounds like a lot of extra work to avoid fixing things
properly...

I find it hard to believe it could be so close that you can
programmatically determine something is (probably!) mostly harmless and
yet still have it be implausible to go all the way to make a properly
reproducible build.

That flies in the face of the thousands of packages I have personally
reviewed, and submitted patches for hundreds of them... sometimes only
partially successful, but in the vast majority of cases I end up staring
at megabytes of leftover gibberish... or something bit-for-bit
reproducible.

The main type of issue you have mentioned, timestamps, is usually the
easiest one to identify and actually remove from resulting builds.

The other one, arbitrary files that end up in the package? That also
sounds like it ought to be something easy to fix; part of the build
process should remove files known to arbitrarily appear if they are
truly non-functional and not intended to be part of the built
artifacts. Although why these files appear arbitrarily and unpredictably
is a bit of a concerning situation...

The only solid case I can think of off the top of my head would be
embedded cryptographic signatures, but as I understand it, there are
projects (android apks used by f-droid? rpm?) that have some ways to
deal with this, even if they are less than ideal.

If there are still outstanding issues on a given piece of software (or
toolchain), it should be tracked in some sort of issue tracker...


>> - That the naming terminology conflates with true reproducible builds,
>> therefore creating the potential for misunderstanding to consumers.
>
> Naming is hard. As long as the term is carefully defined I think it works.
> You can use "fully reproducible build" when you want to contrast, and
> that makes it clear that a normal "reproducible build" is the stronger test
> (at the cost of being sometimes harder to achieve).

I do not like the idea of calling this thing using the Reproducible
Builds name... it is hard enough to get across the bit-for-bit identical
meaning of Reproducible Builds, and using it to mean something else will
inevitably dilute that meaning, no matter how many other very clearly
and appropriately selected words you prefix it with.

I very much worry that the meaning of Reproducible Builds may gradually
get whittled down by adding more exceptions up to the point of being
little more than a checkbox in a compliance checklist... with little
actual benefit to the world at large.

The primary benefit of Reproducible Builds is that it is easy to
actually verify the result is in fact reproducible... or not.


Well, I appear to have strongly held opinions!


live well,
  vagrant


signature.asc
Description: PGP signature


GCC, binutils, and Debian's build-essential set

2023-04-30 Thread Vagrant Cascadian
I have been poking at gcc and binutils this month; they take a good long
while to build...

Inspired by how close we are to making the Debian build-essential set
reproducible, and how important that set of packages is in general... I
have some progress, some hope, and I daresay, some fears...

  
https://tests.reproducible-builds.org/debian/bookworm/amd64/pkg_set_build-essential-depends.html

Fixing issues in these packages should in general help most distros,
although some of the specific issues addressed are pretty
Debian-specific.


With a few patches, and a lot of caveats, I was able to get
binutils building reproducibly:

  https://bugs.debian.org/1033958
  files in source tarball in arbitrary order

  https://bugs.debian.org/1033958
  build paths embedded in debug symbols

This required disabling PGO (profile guided optimization) by passing
DEB_BUILD_OPTIONS=nopgo, so that configure is not passed
--enable-pgo-build=lto ... without that, some non-determinism ended up
in the binaries.

Also, I disabled the test suites with DEB_BUILD_OPTIONS=nocheck, since
the test results, which embed various timing and other information, are
intentionally included in the package. This was discussed a while back,
with some ideas on how to move forward:

  https://bugs.debian.org/950585

Most of these issues are arguably debian-specific issues (e.g. we do not
need to test build paths, the tarball is specific to debian packaging,
upstream does not embed test suites in the packages, etc.), although the
speed improvements of PGO will likely tempt many distros to adopt
it...


Fixing the embedded test suite results in a more generalized way will
also be necessary for...

GCC!

The GCC packages in Debian also embed the test suite results... but
thankfully, Debian puts all the test suite results into a single
package, so that is easy enough to exclude, at least!

One caveat is building gcc with:

  DEB_BUILD_PROFILES="nocheck nodoc nopgo nolto"

For many of the same reasons as with binutils...
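
Concretely, the builds were invoked along these lines (a hedged sketch;
the exact incantation varied):

  DEB_BUILD_OPTIONS="nocheck nopgo" \
  DEB_BUILD_PROFILES="nocheck nodoc nopgo nolto" \
    dpkg-buildpackage --no-sign -b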

At first, I worked on the gcc-12 packages a fair amount, discovered some
bugs in GNU Modula2 which were embedding timestamps, only to find that
gcc-13 includes GNU Modula2... in a way that does not appear to embed
the timestamps! Yay!

Curiously, even though I have been building gcc-* with
DEB_BUILD_OPTIONS=nodoc, it is clearly still building some
documentation! So there may be some documentation that is successfully
excluded but has other issues.

I did find some documentation that embeds the timestamp:

  https://gcc.gnu.org/git?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/scripts/run_doxygen;h=42ed9eb4f5dfa8ed7697f4ac50353133ff6a8e6b;hb=cc035c5d8672f87dc8c2756d9f8367903aa72d93#l133

I have some builds going right now testing a fix for that... moderate
hopes they will be successful... we will see.

Also, there are _formulas.log files that get shipped in the gcc-*
packages and embed some timestamps and timing data produced by texlive,
but I suspect/hope that these can simply be removed from the packages;
they appear to be build artifacts from some documentation generation
process.

There may be build path issues, but those are somewhat ignorable
(although much Debian tooling defaults to randomizing the build path, so
it would still be nice to fix)...


Broadening the horizons a bit, I poked at a few of the
build-essential-build-depends package set (the packages needed to build
build-essential):

  
https://tests.reproducible-builds.org/debian/bookworm/amd64/pkg_set_build-essential-depends.html

And fixed issues in php8.2 (with a nice little patch from Jelle!), and
the maintainer uploaded the fixes already!

  https://bugs.debian.org/1034892
  Timestamps in phar files

  https://bugs.debian.org/1034423
  Paths to sed in php-config8.2 and phpize8.2

And twisted:

  https://bugs.debian.org/1034499
  timestamp embedded in .html documentation

And boost:

  https://bugs.debian.org/1034740
  build date and time embedded in .html documentation

And qemu:

  https://bugs.debian.org/1034431
  FTBFS creating vof.bin when /bin/sh -> bash


For all of this, I have leveraged the reprotest --auto-build feature
extensively, which will do a normalized build, another normalized build,
an everything-you-asked-to-vary build, followed by a build for each type
of variation... mostly just to get it to the point where the two
normalized builds build reproducibly or nearly so, to be able to focus
on the smallest number of issues at a time, and it has been paying off,
if slowly...
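
For anyone wanting to try it, the invocation is roughly (the .dsc and
chroot name stand in for whatever you are testing with):

  reprotest --auto-build foo_1.0-1.dsc -- schroot unstable-amd64-sbuild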


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Sphinx: localisation changes / reproducibility

2023-04-26 Thread Vagrant Cascadian
On 2023-04-26, James Addison wrote:
> On Wed, 26 Apr 2023 at 18:48, Vagrant Cascadian
>  wrote:
>>
>> On 2023-04-26, James Addison wrote:
>> > On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
>> >  wrote:
>> >> > James Addison  wrote:
>> >> This is why in the reproducible builds documentation on timestamps,
>> >> there is a paragraph "Timestamps are best avoided":
>> >>
>> >>   https://reproducible-builds.org/docs/timestamps/
>> >>
>> >> Or as I like to say "There are no timestamps quite like NO timestamps!"
>> >
>> > I see a parallel between the use of timestamps as a key for
>> > data-lookup (as in Holger's developers-reference package), and the use
>> > of locale as a similar data-lookup key (as in the case of localised
>> > documentation builds).
>>
>> > I'm not sure what the equivalent approach is for localisation, though.
>> > Command-line software, for example, requires at least one written
>> > natural-language to be usable, and as a second use case, providing
>> > natural-language documentation with software is highly recommended (is
>> > it part of the software?  maybe not.  but a sufficiently-confusing
>> > poorly-translated error message could be as serious as a code-related
>> > bug, I think?).
>> >
>> > Linking back to my recent experience with Sphinx, and from the
>> > perspective of allowing-users-to-verify-their-software, I'd tend to
>> > think that an ideally-produced, reproducible, localised software would
>> > include _all_ available translations in the build artifact.  Some of
>> > that could be retrieved at runtime (gettext, for example), and some
>> > could be static (file-backed HTML documentation, where runtime lookups
>> > might not be so straightforward).
>>
>> I struggle to see the parallel. A timestamp is an arbitrary value based
>> on when you built it, whereas the locale-rendered document should be
>> reproducibly translated based on the translations you have available at
>> the time you run whatever process generates the translated version of
>> the document/binary, and regardless of the locale of the build
>> environment.
>
> Ok, I think I understand.  Please check my understanding, though: I
> interpret your perspective as matching the ideal-world scenario that
> John outlined, where the SOURCE_DATE_EPOCH value has no effect at all
> on the output of the build

Yes, ideally SOURCE_DATE_EPOCH does not matter. It is a workaround to
embed a (hopefully meaningful) timestamp, when from a reproducible
builds perspective, ideally there would be no timestamp at all in the
resulting artifacts. SOURCE_DATE_EPOCH is a tolerable compromise when
leaving out timestamps entirely is too difficult to achieve
(technically, politically, emotionally, logistically ...).


> Until then, I see both the build-time (SOURCE_DATE_EPOCH) and
> build-locale as inputs that do affect the output of software build
> systems, and believe that relevant guidance could help projects
> migrate towards reproducibility.

I would say a build should be reproducible regardless of the build
environment locale.

If you want to generate, say, README.fr.txt, the build process
translating that from README.txt should force the locale to use to
generate that document (e.g. LC_ALL=fr_FR.UTF-8), ignoring the locale of
the host system (e.g. C.UTF-8) and the locale of the user logged into
that system (e.g. es_ES.UTF-8); in this case, the locale of the build
environment should be made irrelevent by whatever build process is
used. Maybe the build logs respect the user or system locale in some
ways, but the resulting build artifact (e.g. README.fr.txt) should be
immune to the system and user locale settings.
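
In shell terms, something like this (the generator command is a
placeholder for whatever tool does the rendering):

  # the target locale is fixed by the build recipe, not inherited
  LC_ALL=fr_FR.UTF-8 render-doc README.txt > README.fr.txt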


>> While there almost certainly might be more than one legitimate
>> translation for a given work, your process for rendering it should
>> really only have one particular output given a particular input
>> (e.g. the source language input and the descriptions of how to translate
>> it to the desired language)... barring, of course, bugs in the system
>> ... or am i missing something entirely?
>
> No, I don't think you missed anything, and I think we have the same
> understanding of the components.  We're likely arriving from different
> perspectives on the problem space.
>
> My question is approximately this: for some source software developed
> in a natural language that I don't read or understand, and that
> includes statically-built documentation (say, HTML files for example),
> could I determine that the distributed software (a

Re: Sphinx: localisation changes / reproducibility

2023-04-26 Thread Vagrant Cascadian
On 2023-04-26, James Addison wrote:
> On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
>  wrote:
>> > James Addison  wrote:
>> This is why in the reproducible builds documentation on timestamps,
>> there is a paragraph "Timestamps are best avoided":
>>
>>   https://reproducible-builds.org/docs/timestamps/
>>
>> Or as I like to say "There are no timestamps quite like NO timestamps!"
>
> I see a parallel between the use of timestamps as a key for
> data-lookup (as in Holger's developers-reference package), and the use
> of locale as a similar data-lookup key (as in the case of localised
> documentation builds).

> I'm not sure what the equivalent approach is for localisation, though.
> Command-line software, for example, requires at least one written
> natural-language to be usable, and as a second use case, providing
> natural-language documentation with software is highly recommended (is
> it part of the software?  maybe not.  but a sufficiently-confusing
> poorly-translated error message could be as serious as a code-related
> bug, I think?).
>
> Linking back to my recent experience with Sphinx, and from the
> perspective of allowing-users-to-verify-their-software, I'd tend to
> think that an ideally-produced, reproducible, localised software would
> include _all_ available translations in the build artifact.  Some of
> that could be retrieved at runtime (gettext, for example), and some
> could be static (file-backed HTML documentation, where runtime lookups
> might not be so straightforward).

I struggle to see the parallel. A timestamp is an arbitrary value based
on when you built it, whereas the locale-rendered document should be
reproducibly translated based on the translations you have available at
the time you run whatever process generates the translated version of
the document/binary, and regardless of the locale of the build
environment.

With runtime translation, you want translation from the source language
to the operating locale of the environment you have called it in... but
that should still be systematic, no?

In a traditional translation process, as I understand it, you have the
source language, some system of translating that document or bit of text
into another language (maybe by words, strings, partial or whole
documents, etc.).

While there may well be more than one legitimate translation for a
given work, your process for rendering it should really only have one
particular output given a particular input (e.g. the source language
input and the descriptions of how to translate it to the desired
language)... barring, of course, bugs in the system ... or am I missing
something entirely?

Unless, I guess, you're using some Machine Learning model to produce
your translations?


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Sphinx: localisation changes / reproducibility

2023-04-18 Thread Vagrant Cascadian
On 2023-04-17, John Gilmore wrote:
> James Addison  wrote:
>> When the goal is to build the software as it was available to the
>> author at the time of code commit/check-in - and I think that that is
>> a valid use case - then that makes sense.
>
> I think of the goal as being less related to the author, and more
> related to the creator of a widespread binary release (such as a Linux
> distribution, or an app that goes into an app-store).
>
> The goal is then that the recipient of that binary release can verify
> that the source code they obtained from the same place is able to
> rebuild that exact widespread binary release.  This proves that the
> source code can be trusted for some purposes, such as being used to read
> it to understand what the binary does.  Or to make small bug-fixes to it.
> Or to become the base for further evolution of the project if the
> maintainer is suddenly "hit by a bus" and stops making further
> releases.

Yes!

> James Addison  wrote:
>> Inverting the question somewhat: if a single source-base is rebuilt
>> using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01
>> and 2023-04-18), then what are expected/valid differences in the
>> resulting output?
>
> In the ideal circumstances, the resulting output would be identical,
> because the build process would have no dependencies on
> SOURCE_DATE_EPOCH.
...
> Much code in Linux does not reach that ideal (yet!).  Instead, builds of
> non-ideal code use SOURCE_DATE_EPOCH as a crutch to limit their
> dependencies on the local build environment, replacing those
> dependencies with a dependency on SOURCE_DATE_EPOCH.
>
> So, if you rebuild a non-ideal package with two different values of
> SOURCE_DATE_EPOCH, you will get two different binaries that differ in
> the areas of dependency.  For example, if the documentation embeds a
> build-date in its page footer, you'd expect every page of the built
> documentation would differ.  If the "--version" output of the program
> embeds the build date, then the code that produces that output would
> differ.  Etc.  In fact, "fuzzing" their code with different values
> of SOURCE_DATE_EPOCH can help a maintainer identify where those
> dependencies still remain.

Nice explanations!

This is why in the reproducible builds documentation on timestamps,
there is a paragraph "Timestamps are best avoided":

  https://reproducible-builds.org/docs/timestamps/

Or as I like to say "There are no timestamps quite like NO timestamps!"


> We try to talk package authors out of such dependencies, but ultimately
> it's their package and they make the architectural decisions.  To some
> of them it's incredibly important that the build date appears in the
> man-page.  Reproducibility usually features lower among their priorities
> than it does in ours.

Yeah, SOURCE_DATE_EPOCH is really just a standardized *workaround* when
it is too difficult to convince upstream to remove timestamps entirely.
Sometimes it might be technically difficult to remove all timestamps,
sometimes it is different priorities or social norms or expectations.
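
When the workaround is unavoidable, the pattern the spec recommends is
simple enough (assuming a git checkout):

  # clamp any embedded date to the last source change, not the build time
  export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
  date --utc --date="@$SOURCE_DATE_EPOCH" +%Y-%m-%d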


live well,
  vagrant


signature.asc
Description: PGP signature


distro-info-data and SOURCE_DATE_EPOCH (was: Sphinx: localisation changes / reproducibility)

2023-04-14 Thread Vagrant Cascadian
On 2023-04-14, Holger Levsen wrote:
> i'm wondering whether distro-info should respect SOURCE_DATE_EPOCH: 
> src:developers-reference builds different content based on the build
> date, due to using distro-info and distro-info knows that in 398 days
>  trixie will be released :))) 
> see  
> https://tests.reproducible-builds.org/debian/rb-pkg/bookworm/arm64/diffoscope-results/developers-reference.html
>
> (src:developers-reference is "my" package using sphinx.)

This also recently came up for extrepo-data:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020648#15

We probably should file a bug on distro-info and then mark it as
affecting extrepo-data and developers-reference, and any other relevant
packages. Maybe we should also create an issue for it in
reproducible-notes.git.


live well,
  vagrant


signature.asc
Description: PGP signature


Real World Reproducibility in Debian (was Re: Debian and reproducible-builds.org incoherence?)

2023-04-13 Thread Vagrant Cascadian
On 2023-04-13, David A. Wheeler wrote:
>> On Apr 12, 2023, at 11:46 AM, Chris Lamb  
>> wrote:
>> This is, unfortunately, a little misleading. To clarify, this
>> statement only means that *tests.reproducible-builds.org* believes
>> that the fbreader source package is reproducible β€” it doesn't promise
>> that the binary packages on the official Debian mirrors are
>> bit-for-bit identical with anything.
>> 
>> This is, of course, not ideal. Still, this is what folks on this list
>> are getting at when they say they "want to make Debian 'really'
>> reproducible".
>
> Any progress on that front? What can be done to change things so that the
> packages people normally *use* are reproducible?

I think it is not nearly as bad as people think, and we undersell
ourselves when we say we do not have "real" reproducibility testing for
Debian. The work we have done and continue to do has made significant
real-world reproducibility possible!

The numbers actually looked quite good (~93%) when comparing against
the packages in Debian Bullseye (a.k.a. the current stable release)
the last time it was done:

  
https://beta.tests.reproducible-builds.org/debian_bullseye.html#debian-bullseye-amd64+all

I am not sure exactly how up-to-date it is, but the changes in Debian
stable releases are very minimal; I would not expect much has changed
since the last run.

That is very similar to our "fuzzed" build testing (~96%), though I am
somewhat surprised the real-world numbers were not consistently better
than the fuzz testing numbers!


... Unfortunately, beta.tests.reproducible-builds.org testing against
packages in Bookworm and Unstable is currently lagging, with most of
the results stuck in a pending state... although of the packages that
were tested, it follows a similar curve (e.g. the vast majority are
reproducible).


The main blocker to making consistent comparisons against the packages
in the Debian archive more prominent is getting a reliable replacement
for snapshot.debian.org, to be able to rebuild with the same packages
that were used at the time of the original build.

There was a talk a couple years ago that touches on many of those
challenges:

  
https://debconf21.debconf.org/talks/22-making-use-of-snapshotdebianorg-for-fun-and-profit/

I've also read some recent reports suggesting that snapshot.debian.org
is missing quite a few archive states, too:

  https://bugs.debian.org/1031628

And it was marked as "wontfix" and closed...

So I do feel we really need an alternative. Or a significant turn of
events to make snapshot.debian.org *much* better.


There is significant work in progress on getting an alternative for
snapshot.debian.org going, but it can take several months to sync that
much data... and every time it gets caught up, there is more data
accumulated in the meantime... so it is not a "just do it already" kind
of thing.

Obviously, in the meantime, we are not sitting there watching the
progress bars... :)


Another approach might be to try to keep on top of packages as soon
as they are built and we have a .buildinfo file for them: schedule a
build and match the build environment before the package versions in
the archive change too much... that would still not be entirely
systematic, but might be possible (at least for amd64/x86_64) with a
huge number of builders and a bit more explicit coordination with
various infrastructure teams to make it realistic.

For example, right now we sync the .buildinfo files several times a day
to https://buildinfos.debian.net ... but if we could get the .buildinfo
files to land in https://incoming.debian.org in real-time and/or some
coordination from the https://buildd.debian.org machines that perform
the builds (or the wanna-build infrastructure that coordinates
buildd.debian.org) to inform us of a new .buildinfo file... we would be
more likely to be able to get a sufficiently identical build environment
in nearly real-time.

My long-term fantasy of course, would be that https://buildd.debian.org
performs two builds and only uploads them to the archive if they are
successfully built reproducibly. We have significant enough coverage of
the archive that at least making that an opt-in feature on a per-package
basis would be a good start. (e.g. if you normalize build paths, we
should be able to reach 90-95% reproducible).


> I think it'd be better to have package build process insert tools like
> strip-nondeterminism, and then have *actual* packages reproducible, than
> be strongly confident in the packages no one uses.

Well, strip-nondeterminism is already used on the vast majority of
packages already, as it is integrated with debhelper, the main packaging
framework used in debian.
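
As a rough illustration of the kind of normalization it performs, here
is a small, untested sketch (my own, not strip-nondeterminism's actual
code) that clamps every member timestamp in a zip archive to
SOURCE_DATE_EPOCH; the real tool handles many more formats and corner
cases:

  import os, time, zipfile

  def normalize_zip(src_path, dst_path):
      # zip cannot represent dates before 1980, so clamp upward if needed
      sde = max(int(os.environ.get("SOURCE_DATE_EPOCH", "0")), 315532800)
      stamp = time.gmtime(sde)[:6]  # (year, month, day, hour, minute, second)
      with zipfile.ZipFile(src_path) as src, \
           zipfile.ZipFile(dst_path, "w") as dst:
          for info in src.infolist():
              data = src.read(info.filename)
              info.date_time = stamp  # normalize the embedded timestamp
              dst.writestr(info, data)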

There is some academic work in what I would describe as normalized
builds (even normalizing the order of system calls and cpu ticks, if I
remember correctly), but for that to be useful, you have to normalize
the original build as well in the same way... and it is SLOW.


S

sbuild, reprotest and the unsharing spirit

2023-03-31 Thread Vagrant Cascadian
Last month, I pondered about the future of reprotest and some related
ideas and tooling:

  
https://lists.reproducible-builds.org/pipermail/rb-general/2023-February/002876.html

This month, I fleshed out a method of usefully using reprotest as a hook
to sbuild (a package build tool for Debian) using sbuild's unshare mode:

  https://salsa.debian.org/reproducible-builds/sbuild-unshare-reprotest

It even has a README.md...

It is essentially a small wrapper around mmdebstrap (which can generate
a base tarball) and a configuration for sbuild that calls reprotest as a
hook to compare against the build that sbuild normally produces. This
leverages sbuild to set up the build environment, build a package, tweak
the build environment and build again...

Eventually may write another wrapper to generate the config, which would
allow more flexibility in enabling and disabling variations, or using
the reprotest --auto-build feature (e.g. perform builds with each
individual variation).

Experimented with essentially skipping the "sbuild" build, and just
using reprotest, so that it is possible to disable build path variations
or use --auto-build meaningfully.

Still a little rough around the edges, but I think it is a bit easier
setup for doing reproducible builds fuzz testing for Debian-style
packages. I also like that the diffoscope output ends up in the build
log, and at least the initial build is unpolluted by
reprotest/diffoscope dependencies.


Long term, I would love to explore integrating some sort of "unshare"
chroot mode into reprotest, to be a little more distro-agnostic while
still relatively easy to set up.

If there are other tools that basically implement a simple unshare
usernamespace'ed chroot and/or reproducibility fuzz testing, curious to
hear about them!

I have not gone too far into making any more significant changes to
reprotest. I really have my eye set on trying to remove randomization in
all the variations, and instead deterministically vary things. Did this
for locale variations, but there are plenty of calls to random.choice
left!


In order to test this environment, I also made a very quick example
unreproducible package:

  https://salsa.debian.org/reproducible-builds/notveryreproducible

... which was somewhat inspired by the more comprehensive examples of
unreproducibility:

  https://github.com/bmwiedemann/theunreproduciblepackage


And with those tools in hand (and a newish reasonably fast build
machine), I am now cranking on binutils and gcc, the last major holdouts
in the Debian build-essential set:

  
https://tests.reproducible-builds.org/debian/bookworm/amd64/pkg_set_build-essential.html


live well,
  vagrant


signature.asc
Description: PGP signature


Re: verifiable source-only bootstrap from scratch

2023-03-09 Thread Vagrant Cascadian
On 2023-03-08, aho...@0w.se wrote:
> We seem to be the first project offering bootstrappable and verifiable
> builds without any binary seeds.
>
> The project's website is at [1]
...
> [1] the site is available through the Tor/onion network
> (for the advantages of convenient and privacy-friendly hosting) at
> http://rbzfp7h25zcnmxu4wnxhespe64addpopah5ckfpdfyy4qetpziitp5qd.onion/

Is there a URL other than via tor .onion network to read up on what this
project is actually doing?

While I applaud and support the use of tor, exclusively using tor is a
bit of a surprise and seems to severely limit the number of people who
will even read about it at all.

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Does diffoscope compares disk partitions

2023-03-01 Thread Vagrant Cascadian
On 2023-03-01, John Gilmore wrote:
>>> So, overall, I actually don't think that diffoscope has the requested
>>> support, and it's not "just" a bug of failed identification.
>
> I have been surprised at how much effort has gone into "diffoscope" as a
> total fraction of the Reproducible Builds effort.  Perhaps it is a case
> akin to the drunk looking for his keys under the streetlight where he
> can see, rather than in the dark where he dropped them.

I daresay it is more akin to someone looking for lost keys by inventing
a flashlight to look near where they dropped them, and they happen to
have dropped the keys into a bin with miscellaneous arbitrary pieces and
bits of things, many of which are shaped and/or sized roughly like your
keys... and wow, nobody before thought to make a flashlight shine at so
many different wavelengths, or detect the density of the materials, or
produce a harmonic resonance with certain materials, or a high intensity
laser to burn off all the organic detritus that has accumulated,
and... only to find out that people have been losing their keys in this
bin for decades, and we found someone else's keys too! Oh, look, with
this small tweak, we could also detect antique coins...


> (It's easier to hack diffoscope than to hack thousands of
> irreproducible packages.)

Fixing reproducibility issues blindfolded does not seem like an
efficient way to fix issues either. We have already fixed tens of
thousands of issues, and have thousands more that we are working on.
Diffoscope is a highly useful tool towards that end.


live well,
  vagrant


signature.asc
Description: PGP signature


Future of reprotest and alternatives (sbuild wrapper)?

2023-02-27 Thread Vagrant Cascadian
I have managed to make some changes to reprotest now and again, but as a
whole, cannot say I can wrap my head around the code enough to maintain
it.

It also contains forks of some autopkgtest code, last updated in 2017,
if I am reading the git logs correctly. It is apparently no longer
working with current versions of qemu with the qemu backends:

  https://bugs.debian.org/1001250

I think it was forked largely to remove Debian-isms in the autopkgtest
code, which looks to be only packaged on Debian derivatives:

  https://repology.org/project/autopkgtest/versions

I am not sure how widely reprotest is used outside of Debian, though
it has support for Debian, Arch, Fedora and Guix.

Without a maintainer, reprotest can limp along, occasionally gaining a
feature or fixing a bug... but it could really use someone actively
working on it! Not sure what the future looks like with the status
quo...


Are there other tools used on other distros to do similar sorts of
things? Basically reproducibility fuzz-testing... I know Bernhard uses
some tooling for openSUSE...


The last few days I have been taking a look at using sbuild (which is
very specific to .deb package building) to implement some of the
functionality that reprotest does. I am certainly not the first person
to have explored or toyed around with this idea:

  https://bugs.debian.org/847805
  https://bugs.debian.org/875445

... and the builds at https://tests.reproducible-builds.org/debian do
something similar with pbuilder instead of sbuild.

Some new developments in sbuild, namely using the "unshare" mode, make
it a little more compelling to me than before, as it is possible to
build without requiring root access on modern Debian systems, blocking
network access to the build, and is fairly easy to set up a working
environment (e.g. mmdebstrap can create a tarball without root
privileges). I know for some distros, these sorts of features are just
integrated into standard build tooling, but this is Debian!

Basically, one needs to generate an sbuild.conf that implements a few
variations, as some options are not possible from the commandline
(although many are!):

  $chroot_mode = 'unshare';
  $run_lintian = 0;
  $build_env_cmnd = '/usr/local/bin/unreproducibility';
  $manual_depends = ['faketime'];
  $external_commands = {
  "chroot-setup-commands" => [
  # /bin/sh -> /bin/bash or /bin/dash
  ['ln', '-svf', '/bin/bash', '/bin/sh'],
  # /usr/share/zoneinfo/Etc/GMT-14 vs. GMT+12
  ['ln', '-svf', '/usr/share/zoneinfo/Etc/GMT-14', '/etc/localtime'],
  # create /usr/local/bin/unreproducibility
  [ 'printf "#!/bin/sh -x\nsetarch linux32 --uname-2.6 faketime \'+397 days\' \$@" > /usr/local/bin/unreproducibility' ],
  [ 'chmod', '+x', '/usr/local/bin/unreproducibility' ],
  [ 'cat', '/usr/local/bin/unreproducibility' ],
],
  };
  $build_path = '/build/firstbuild/';
  # first build user, blocked by: https://bugs.debian.org/1032046
  #$build_user = 'user';

And then a second build with a different sbuild.conf with different
variations, as needed... and then fire up diffoscope and compare the
two. A wrapper around sbuild like this seems fairly maintainable to me,
at least in theory, but maybe I am just naive and unimaginative... :)


live well,
  vagrant


signature.asc
Description: PGP signature


Re: python datetime .. grrr

2023-02-11 Thread Vagrant Cascadian
On 2023-02-11, Larry Doolittle wrote:
> verilator 5.006-2 in Debian is not reproducible
>   
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/verilator.html
> and I finally figured out why.  It's timezone handling in python3 datetime.
>
> $ cat verilator_doc.py
> # Distilled from upstream verilator docs/guide/conf.py
> # (post-commit 87a7881d46)
> from datetime import datetime
> import os
> try:
> # https://reproducible-builds.org/specs/source-date-epoch/
> doc_now = datetime.fromtimestamp(int(os.environ["SOURCE_DATE_EPOCH"]))
> print("Using SOURCE_DATE_EPOCH")
> except Exception:
> doc_now = datetime.now()
> # Date format to ISO
> today_fmt = doc_now.strftime("%F")
> print(today_fmt)
> $ cat repro 
> export SOURCE_DATE_EPOCH=$(date -d "07 Feb 2023 17:17:27 +0100" +%s)
> echo $SOURCE_DATE_EPOCH
> # https://tests.reproducible-builds.org/debian/index_variations.html
> TZ="/usr/share/zoneinfo/Etc/GMT+12" python3 verilator_doc.py
> TZ="/usr/share/zoneinfo/Etc/GMT-14" python3 verilator_doc.py
> $ sh repro
> 1675786647
> Using SOURCE_DATE_EPOCH
> 2023-02-07
> Using SOURCE_DATE_EPOCH
> 2023-02-08
>
> I've spent more than an hour RTFM and trying different ways to get
> python3 datetime to ignore the local timezone when computing dates.
> No joy.  Surely someone here has learned how to that?

There are some python examples that might be helpful:

  https://reproducible-builds.org/docs/source-date-epoch/

I would recommend linking to the documentation about source-date-epoch
rather than the spec itself that you linked to in your reproducer code,
as it has practical hands-on examples that would be a bit too much to
embed into the specification.
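
As for the reproducer itself: the date shifts with TZ because
datetime.fromtimestamp() converts in local time; pinning the conversion
to UTC removes that variation. A minimal sketch along the lines of
those examples (untested):

  import os
  from datetime import datetime, timezone

  doc_now = datetime.fromtimestamp(
      int(os.environ["SOURCE_DATE_EPOCH"]),
      tz=timezone.utc,  # so TZ=GMT+12 vs GMT-14 cannot shift the date
  )
  print(doc_now.strftime("%Y-%m-%d"))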

Would it also be possible to simply remove the date from the
documentation? Unless you actually patch the documentation between
versions in debian, using SOURCE_DATE_EPOCH for the publication date is
still somewhat of an arbitrary date that might be more misleading than
accurate in many cases...

live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprint Tuesday, January 10th, 16:00 UTC!

2023-01-04 Thread Vagrant Cascadian
First Debian NMU Sprint of 2023... this coming Tuesday, January 10th,
16:00 UTC!

Some past sprints:

  
https://lists.reproducible-builds.org/pipermail/rb-general/2022-November/002756.html

IRC:

  irc.oftc.net #debian-reproducible

Unapplied patches:

  
https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&reproducible=1

Documentation about performing NMUs:

  https://www.debian.org/doc/manuals/developers-reference/pkgs.html#nmu

If you are impatient, try fixing QA packages, as you can upload fixes
without delays:

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_maint_debian-qa.html


live well,
  vagrant


signature.asc
Description: PGP signature


Last Debian NMU Sprint of the year, December, 29th Thursday 17:00 UTC!

2022-12-22 Thread Vagrant Cascadian
On 2022-11-20, Vagrant Cascadian wrote:
> Since the previous sprints were fun and productive, I am planning on
> doing NMU sprints every Thursday in December (1st, 8th, 15th, 22nd,
> 29th). We are planning on meeting on irc.oftc.net in the
> #debian-reproducible channel at 17:00UTC and going for an hour or two or
> three. Feel free to start early or stay late, or even fix things on some
> other day!

Last call! December 29th, 17:00 UTC...

> We will have Debian Developers available to sponsor uploads, so even if
> you can't upload yourself but you know how to build a debian package,
> please join us!
>
> Unapplied patches:
>
>   
> https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&reproducible=1

We have finally closed more patches than we opened! When we started
doing these NMU sprints back in september, it was around ~250 unapplied
patches... and it is down to 249! Great marketing, I know.

I think most remaining patches are *only* two or three years stale...

We might continue these in 2023 and switch up the days and times, and of
course, feel free to go solo and upload an NMU outside of these
sprints!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: How to talk to skeptics?

2022-12-14 Thread Vagrant Cascadian
On 2022-12-14, Bernhard M. Wiedemann via rb-general wrote:
> a colleague of mine is rather skeptic towards bootstrapping and 
> reproducible-builds.
>
> E.g. he wrote
>
> https://fy.blackhats.net.au/blog/html/2021/05/12/compiler_bootstrapping_can_we_trust_rust.html

This seems to miss the point that the sources *are* auditable, even if
after the fact, even if imperfectly, whereas the binaries are orders of
magnitude harder to audit.

Also, I am curious how to address the bootstrapping problem if a
compromised binary ever worked its way into the upstream-provided
binary compiler that you blindly trust?

Even if downstream distributions such as OpenSUSE bootstrap from a
binary upstream compiler with each new rust version, I sure would hope
that upstream can *prove* beyond a reasonable doubt that what they
produced is legit in an auditable way... and while I am biased, it seems
a bootstrappable and reproducible build is the best current known way to
have very high confidence...


That many people use rustup to install rust and nothing has (noticeably)
gone horribly wrong yet does not win me over in any argument regarding
security. The https://rustup.rs recommendation of:

  curl ... | sh

... is relying on the weakest link in the chain of "trusted" certificate
authorities; a security vulnerability that is not so much a back door
vulnerability, as a wide open front door with the lights on in the dead
of night.


The argument that you can't trust the source code is a valid and
important concern, but outside the scope of reproducible builds, and
there are ways of addressing that through peer review of source code,
independent third-party review, and fastidious audit logs of who
committed what.

The bugdoor argument kind of falls down eventually, because logically,
if someone can trivially inject plausible but incorrect source code (and
well...  I guess they can), why bother reviewing source code at all? Why
bother tracking who committed it at all? Since it is impossible to
perfectly review source code, may as well not do any kind of review at
all... right? Uh, no.

All review and auditing processes will catch some bugs, and all security
measures raise the bar by some degree... using all known best practices
will catch as much as we can plausibly catch with our non-infinite
resources, despite being imperfect.


I wonder if the reproducible builds focus on bit-for-bit identical
perfection gets people's heads stuck in the idea of perfection in all
ways? While bit-for-bit identical builds are possible, we do not claim
they are absolute, incontrovertible proof of a perfect build. It is just
one measure of confidence among many. A good measure, in my opinion,
but just one tool.


Compromised compilers most definitely have been released into the
wild. It is getting a little old now, but XcodeGhost (a.k.a. Strawhorse)
falls squarely into this category:

  https://en.wikipedia.org/wiki/XcodeGhost

Even without more current examples, even though it is difficult to pull
off... it is clearly possible, has been done, and been executed by well
funded entities in the past... and is, by design, hard to detect. I have
no reason to believe that was a one-off playground experiment.


And yes, you eventually get down to how do you trust hardware... there
are a lot of rabbit holes here, and at the end of the day, you need to
prioritize what is the next important thing is, or what gets you the
most value in the short, medium and long term.

Bootstrappable and Reproducible Builds is probably more in the medium to
long term realm... yet can demonstrate some benefits almost
immediately... if you only focus on the short term, the long-term work
will never happen. I daresay that what the world needs now is a bit more
long-term thinking in general.


> and the effect can also be seen in his packaging such as
> https://build.opensuse.org/package/show/openSUSE:Factory/rust1.65
> that ships with two gigabytes of bootstrap compiler binaries for various 
> architectures instead of using our existing rust packages of version N-1 
> "because compilation takes twice as long".
>
> He also once pointed me to
> https://blog.cmpxchg8b.com/2020/07/you-dont-need-reproducible-builds.html

And for a more light-hearted take...

You don't *need* computers either. :)

In a similar vein:

  https://xkcd.com/2368/

Especially I think the alt-text nailed it.


> In the end, it would be useful to collect some well-worded / 
> well-thought counter-arguments on r-b.o (if we don't have that already)
>
> https://reproducible-builds.org/docs/buy-in/ could provide some input.
>
> Any thoughts and/or volunteers?


I think Morten Linderud had really good points when this came up before:

  
https://lists.reproducible-builds.org/pipermail/rb-general/2020-August/002008.html


live well,
  vagrant


signature.asc
Description: PGP signature


Re: buildinfo question

2022-12-13 Thread Vagrant Cascadian
On 2022-12-13, James Addison via rb-general wrote:
> As Debian's buildinfo[1] wiki page hints, it's difficult to determine
> whether a build dependency is genuinely required at build-time,
> compared to: it was required in the past, but has become dependency
> cruft.
>
> I was wondering: are there reproducible-builds efforts underway (in
> Debian or other ecosystems) to determine the packages that were
> involved (first-pass approximation: at least one file belonging to the
> package was read from the filesystem by a child of the build process
> -- anything else?) during a reproducible package build?

I have certainly used a simpler but related ad-hoc technique: building
with and without a build-dependency, and checking for bit-for-bit
identical results.
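
For the comparison step, something as small as this untested sketch
(the helper is hypothetical, not an existing tool) is enough to check
two build trees for bit-for-bit identical results:

  import hashlib
  from pathlib import Path

  def tree_digest(root):
      # hash relative paths plus file contents, in a stable order
      h = hashlib.sha256()
      for p in sorted(Path(root).rglob("*")):
          if p.is_file():
              h.update(str(p.relative_to(root)).encode())
              h.update(p.read_bytes())
      return h.hexdigest()

  # build once with the dependency installed, once without, then:
  # tree_digest("build-with-dep") == tree_digest("build-without-dep")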

It would be interesting to do something more systematic like your
suggestion, though I'm not aware of anything at the moment.

live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprints in December, Thursdays 17:00 UTC!

2022-11-20 Thread Vagrant Cascadian
Since the previous sprints were fun and productive, I am planning on
doing NMU sprints every Thursday in December (1st, 8th, 15th, 22nd,
29th). We are planning on meeting on irc.oftc.net in the
#debian-reproducible channel at 17:00UTC and going for an hour or two or
three. Feel free to start early or stay late, or even fix things on some
other day!

We will have Debian Developers available to sponsor uploads, so even if
you can't upload yourself but you know how to build a debian package,
please join us!

Unapplied patches:

  
https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&reproducible=1

This list is sorted by the oldest bugs with patches not marked pending,
so we can target bugs that have just stalled out for whatever reason,
but feel free to pick bugs that scratch your particular itch.

We will want to make sure the patch still applies and/or refresh the
patches, make sure it still solves the issue, and update the bug report
where appropriate.

Documentation about performing NMUs:

  https://www.debian.org/doc/manuals/developers-reference/pkgs.html#nmu

We will be uploading to the DELAYED queue (presumably between 10 and
15 days).

If the package has been orphaned we can generally upload without delay
(check the https://tracker.debian.org/PACKAGE page which usually lists
this) and mark it as maintained by "Debian QA Group
" if needed.

If you are impatient, try fixing QA packages, as you can upload fixes
without delays:

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_maint_debian-qa.html


Let's fix some bugs!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: citests vs. (verification |re)builds

2022-11-14 Thread Vagrant Cascadian
On 2022-11-13, Vagrant Cascadian wrote:
> On 2022-11-13, kpc...@archlinux.org wrote:
>> On 11/13/22 22:59, Vagrant Cascadian wrote:
>> They both serve different purposes, Build Environment Fuzzing helps 
>> detect issues before they show up during Verification Builds but can 
>> also mislead, if you already have a diverse set of Verification Builders 
>> and they never run into the issue, is there an issue to begin with?
>
> With normalized build environments, this is significantly less of an
> issue... so I can see what you are getting at!
>
> Here are some other angles to consider...
>
> Doing two consecutive builds and comparing them is really helpful to see
> if toolchain fixes actually worked or not, without re-uploading all the
> packages to the "official" archive; doing verification builds would
> necessarily use the unfixed toolchains of the original .buildinfo file.

It also just occurred to me how important regression testing is to see
if toolchain updates break reproducibility... perhaps even more
important than checking for fixes.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: citests vs. (verification |re)builds

2022-11-13 Thread Vagrant Cascadian
On 2022-11-13, kpc...@archlinux.org wrote:
> On 11/13/22 22:59, Vagrant Cascadian wrote:
>> I'm not sure how exactly to structure a rewording or adjustment of the
>> website and whatnot, but would like to start the conversation, at least!
>
> Thanks for bringing this up, maybe we should be more explicit what's 
> being tested, this is currently not clear when looking at 
> https://reproducible-builds.org/citests/.

Yeah, clarifying what type of builds are performed would be very
helpful...


> I'd suggest having a page (and also place it more prominently) that is 
> more explicit around this:
>
> -- 8< --
>
> ## Verification Builds (imo this is the only true reproducible builds)
>
> Binary artifacts are downloaded and compared to binaries built from 
> source (using the official buildinfo file as additional build input, if 
> the projects needs one for reproducible builds).
>
> https://reproducible.archlinux.org/ (Arch Linux)
> https://beta.tests.reproducible-builds.org/ (Debian, Qubes)
> https://r-b.engineering.nyu.edu/ (Arch Linux)
> https://rebuilderd.dustri.org/ (Tails)
>
> ## Build Environment Fuzzing
>
> The source code is downloaded and built 2+ times in a diverse set of 
> environments.
>
> https://tests.reproducible-builds.org/archlinux/
> https://tests.reproducible-builds.org/coreboot/
> https://tests.reproducible-builds.org/debian/
> https://tests.reproducible-builds.org/freebsd/
> https://tests.reproducible-builds.org/netbsd/
> https://tests.reproducible-builds.org/openwrt/
> https://reproducible-builds.openeuler.org/
> https://www.yoctoproject.org/reproducible-build-results/

This seems like a reasonable start, thanks!


> ## Unclear
>
> I don't know what these services are doing, can somebody help categorize 
> them?
>
> https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

This is a bit hard to place... neither really "verification" nor
"fuzzing" builds.

It is comparing builds from (at least) two different build farms
(a.k.a. substitute servers). The build environment is largely
normalized, but some differences do reveal themselves (e.g. differing
cpu implementations).

Any user of guix could build a given package, and compare the results to
publicly available substitute servers. Guix is fundamentally a
source-based distro; binary substitutes of packages are an opt-in
(opt-out?) performance optimization...

I'm not sure there is an equivalent to a .buildinfo; there is a
derivation file, which describes the inputs used in the build process,
and which is used to define a given build as essentially a hash of the
inputs. The .narinfo might contain the derivation file and I think the
signed hashes of the build result, which starts to look almost like a
.buildinfo file... but I'm not sure.

There is no "original" build in the guix context to verify against, but
it is (at least) two builders comparing against each other... so is
arguably not a "verification build" but the build environment is
normalized as much as possible instead of intentionally fuzzing the
build environment.


> https://r13y.com/

I would guess Nix is similar to Guix here, but do not know for sure.


> They both serve different purposes, Build Environment Fuzzing helps 
> detect issues before they show up during Verification Builds but can 
> also mislead, if you already have a diverse set of Verification Builders 
> and they never run into the issue, is there an issue to begin with?

With normalized build environments, this is significantly less of an
issue... so I can see what you are getting at!

Here are some other angles to consider...

Doing two consecutive builds and comparing them is really helpful to see
if toolchain fixes actually worked or not, without re-uploading all the
packages to the "official" archive; doing verification builds would
necessarily use the unfixed toolchains of the original .buildinfo file.

Fuzzing does sometimes reveal bugs that would not be revealed in a
normalized build environment; for example, I have seen error messages
reproducibly getting embedded in manpages, where fuzzing with a
different locale (or some other variation) actually revealed the issue,
and led to removing what triggered the embedded error in the first
place. Fuzzing in this way can reveal papered-over or hidden
issues.

Fuzzing-style building can provide greater confidence in the results with a
smaller number of builders. It does not rely as much on unintentional
subtle differences between a diverse pool of builders discovering
issues, because you are actively seeking out issues through intentional
variations.

In conclusion, I still think they're all valuable; let us all scratch
our own particular itches. :)


> I also thin

citests vs. (verification |re)builds

2022-11-13 Thread Vagrant Cascadian
So, when going to check the reproducibility status of a package in
archlinux, I went to:

  https://reproducible-builds.org/citests/

Which has a link for archlinux tests:

  https://tests.reproducible-builds.org/archlinux/

But I was informed that those tests are not really working...

And looking at various other projects on the "citests" page, a few of
those I suspect are not working all that well either...


It was suggested I really should look at the tests referred to at:

  https://reproducible-builds.org/who/projects/

Which links to:

  https://reproducible.archlinux.org/

Which gets us to the question of continuous integration test builds
vs. rebuilds a.k.a. verification builds. They're both useful for
slightly different purposes, and it might be good to clarify the
distinction on the website somehow?

In either case ... it would be nice to give a heads-up on parts of the
infrastructure that are known to have issues...

I'm not sure how exactly to structure a rewording or adjustment of the
website and whatnot, but would like to start the conversation, at least!


live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprint Thursday November 17th 17:00 UTC!

2022-11-12 Thread Vagrant Cascadian
On 2022-11-11, Chris Lamb wrote:
> Can you clarify whether you meant *Wednesday* November 16th or
> Thursday November *17th*? :)

Oops! The 17th!

live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprint Thursday November 17th 17:00 UTC!

2022-11-11 Thread Vagrant Cascadian
On 2022-11-11, Chris Lamb wrote:
> Can you clarify whether you meant *Wednesday* November 16th or
> Thursday November *17th*? :)

Oops! Thursday November 17th!

live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprint Thursday November 16th 17:00 UTC!

2022-11-10 Thread Vagrant Cascadian
We were productive and had some fun with the previous NMU sprints:

  
https://lists.reproducible-builds.org/pipermail/rb-general/2022-September/002689.html

So we are planning on meeting on irc.oftc.net in the
#debian-reproducible channel at 17:00UTC and going for an hour or two or
three.

We will have Debian Developers available to sponsor uploads, so even if
you can't upload yourself but you know how to build a debian package,
please join us!

Unapplied patches (currently 287):

  
https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&reproducible=1

This list is sorted by the oldest bugs with patches not marked pending,
so we can target bugs that have just stalled out for whatever reason,
but feel free to pick bugs that scratch your particular itch.

We will want to make sure the patch still applies and/or refresh the
patches, make sure it still solves the issue, and update the bug report
where appropriate.

Documentation about performing NMUs:

  https://www.debian.org/doc/manuals/developers-reference/pkgs.html#nmu

We will be uploading to the DELAYED queue (presumably between 10 and
15 days).

If the package has been orphaned we can generally upload without delay
(check the https://tracker.debian.org/PACKAGE page which usually lists
this) and mark it as maintained by "Debian QA Group
" if needed.


Let's fix some bugs!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Debian NMU Sprint Thursday 16:00 UTC!

2022-11-08 Thread Vagrant Cascadian
On 2022-11-08, Chris Lamb wrote:
>> > We are planning on meeting on irc.oftc.net in the #debian-reproducible
>> > channel at 16:00UTC and going for an hour or two or three.
>>
>> It was fun, so we hope to do this roughly every two weeks!
>> Next one is thus planned for Thursday, October 6th, 16:00 UTC!
>
> I enjoyed the sprint on October 6th and found it both fun and
> productive; can we schedule another one...?

Basically, I'm usually up for it any Thursday at that time slot. Two
days might be a bit too soon to schedule (though I would do it if someone
else wanted to!)...

  how about November 17th and December 1st? 16:00 UTC or 17:00 UTC?

I did a solo one on October 20th and still got a couple NMUs in:

  msp430mcu_20120406-2.3_source.ftp-master.upload
  checkpw_1.02-1.2_source.ftp-master.upload

And did the legwork that led to a QA upload the following day:

  madlib_1.3.0-3_source.ftp-master.upload


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Please review the draft for September's report

2022-10-05 Thread Vagrant Cascadian
On 2022-10-05, David A. Wheeler wrote:
>> On Oct 5, 2022, at 3:50 PM, Chris Lamb  wrote:
>> Please review the draft for September's Reproducible Builds report:
>> 
>>  https://reproducible-builds.org/reports/2022-09/?draft
>
> As always, thanks! A few proposed tweaks below.
>
> --- David A. Wheeler
>
> 
>
> First, an easy nit:
>   s/David Wheeler/David A. Wheeler/g
> if you would please. I do answer to "David" (and many other things!).
> However, there are a ridiculous number of "David Wheelers", so I always use 
> my initial in written materials to reduce confusion.

Fixed.

> Second:
>> David Wheeler also pointed out that there are some potential upcoming 
>> changes to the OpenSSF Best Practices badge for open source software in 
>> relation to reproducibility. Whilst the badge programme has three 
>> certification levels (β€œpassing”, β€œsilver” and β€œgold”), the β€œgold” level 
>> includes the criterion that β€œThe project MUST have a reproducible build”.
> This was merely a proposal for a change, based on some projects' requests -
> whether or not it happens depends on feedback!
> Indeed, based on current feedback I doubt it'll go anywhere. So I think it'd 
> be clearer written this way:
>> David A. Wheeler also posted a proposed change to the OpenSSF Best Practices 
>> badge for open source software in relation to reproducible builds. Whilst 
>> the badge programme has three certification levels (β€œpassing”, β€œsilver” and 
>> β€œgold”), the β€œgold” level includes the criterion that β€œThe project MUST have 
>> a reproducible build”.
> Then delete the "However," that follows it, and change "we raised" to "were 
> raised".

Done. Noticed another typo when I added that part too ...

Thanks!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Debian NMU Sprint Thursday 16:00 UTC!

2022-09-22 Thread Vagrant Cascadian
On 2022-09-21, Vagrant Cascadian wrote:
> We are planning on meeting on irc.oftc.net in the #debian-reproducible
> channel at 16:00UTC and going for an hour or two or three.

It was fun, so we hope to do this roughly every two weeks!
Next one is thus planned for Thursday, October 6th, 16:00 UTC!

> We will have at least two Debian Developers available to sponsor
> uploads, so even if you can't upload yourself but you know how to build
> a debian package, please join us!
>
>
> Unapplied patches (currently ~250):
>
>   
> https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&fnewerval=7&flastmodval=7&reproducible=1&sortby=last_modified&sorto=asc&format=html#results

bugs closed: 1
bugs marked pending: 7
bugs followup: 2

Excluding bugs marked pending, I think we're down to 232 patches to
upload, ideally before the freeze in January.

Seemed like Holger focused on bugs with a high popularity contest score,
and I focused on bugs that haven't seen activity since 2015!


h01ger:
- mailed #1010957 asking for an update and whether to remove the patch tag 
for now
- uploaded src:gmp to DELAYED/15 fixing #1009931
- mailed  #1017372 and asked for maintainer opinion on the patch
- uploaded src:time to DELAYED/15 fixing #983202

vagrant:
- verify and update patch for mylvmbackup https://bugs.debian.org/782318
- uploaded mylvmbackup to DELAYED/10
- verify/update patches for libranlip
  https://bugs.debian.org/788000
  https://bugs.debian.org/846975
  https://bugs.debian.org/1007137
- uploaded libranlip to DELAYED/10
- verified patch for cclive https://bugs.debian.org/824501
- uploaded cclive to DELAYED/10
- was unable to reproduce the issue with two patches:
  #791423 linuxtv-dvb-apps: please make the build reproducible
Marked as done
  #794398 clhep: please make the build reproducible
Uncertain of status


live well,
  vagrant


signature.asc
Description: PGP signature


Debian NMU Sprint Thursday 16:00 UTC!

2022-09-21 Thread Vagrant Cascadian
Holger and I were chatting about doing more Debian NMUs
(Non-Maintainer-Uploads) to clear the huge backlog of reproducible
builds patches submitted... and we may as well get started this
Thursday!

We are planning on meeting on irc.oftc.net in the #debian-reproducible
channel at 16:00UTC and going for an hour or two or three.

We will have at least two Debian Developers available to sponsor
uploads, so even if you can't upload yourself but you know how to build
a debian package, please join us!


Unapplied patches (currently ~250):

  
https://udd.debian.org/bugs/?release=sid&patch=only&pending=ign&merged=ign&done=ign&fnewerval=7&flastmodval=7&reproducible=1&sortby=last_modified&sorto=asc&format=html#results

The list is sorted by activity, so we can target bugs that have just
stalled out for whatever reason, but feel free to pick bugs that scratch
your particular itch.

We will want to make sure the patch still applies and/or refresh the
patches, make sure it still solves the issue, and update the bug report
where appropriate.


Documentation about performing NMUs:

  https://www.debian.org/doc/manuals/developers-reference/pkgs.html#nmu

We will be uploading to the DELAYED queue (presumably between 10 and
15 days).


If this is fun and productive, we might keep doing this approximately
once or twice a month!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Making reproducible builds & GitBOM work together in spite of low-level component variation

2022-06-24 Thread Vagrant Cascadian
On 2022-06-24, David A. Wheeler wrote:
>> On Jun 22, 2022, at 2:28 PM, Vagrant Cascadian 
>>  wrote:
> Fair enough. Let's use Debian as an example. The "typical"
> way I've seen Linux kernel headers installed would be by running:
>
>> sudo apt install linux-headers-$(uname -r)
>
> This command would *NOT* work any more with reproducible builds if GitBOM is 
> used
> and the kernel is updated. Even if the headers don't change the resulting
> *executable* code, the GitBOM hashes would. be recorded in the resulting
> compiled objects (e.g., ELF files), and they would be *different*. What's 
> more,
> since the GitBOMs are transitive, all the generated executables would be 
> transitively different.
>
> The solution is either to run on the same old kernel (e.g., in a VM), or to 
> install
> the linux-headers-VERSION for the build being reproduced (NOT for the actual 
> running kernel).
> The latter *does* work fine for a container (as I noted earlier).

Right.

The only issue I see is if you're somehow guessing about the header
files to use from the running kernel (e.g. "uname -r"), rather than
using the information present in your build metadata (e.g. .buildinfo,
GitBOM, etc.) ... you need to consistently install the same packages
and/or files in your build environment to get something to build
reproducibly.

In general, this is accomplished with chroots, containers or virtual
machines purpose-built to only contain the packages and/or files needed.

Obviously this only works if your build environment is completely set up
before the build process starts; if the build process downloads stuff
from the network, all manner of non-determinism can work its way into the
builds!


Do you have another concrete example of something that might get
inferred, rather than explicitly defined in the GitBOM? It would seem a
bug in the build process to have implicit inputs rather than explicit
ones.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Making reproducible builds & GitBOM work together in spite of low-level component variation

2022-06-22 Thread Vagrant Cascadian
On 2022-06-22, Vagrant Cascadian wrote:
> On 2022-06-22, David A. Wheeler wrote:
>> GitBOM is explained at <https://gitbom.dev/>. As they explain it, its 
>> purpose is to:
>>  β€’ Build a compact Artifact Dependency Graph (ADG), tracking every 
>> source code file incorporated into each built artifact.
>>  β€’ Embed a unique, content-addressable reference for that Artifact 
>> Dependency Graph (ADG), the GitBOM identifier, into the artifact
>> at build time.

In my previous reply, I somehow glossed over the fact that the ADG and
GitBOM identifier are embedded in the artifacts at build time...

I can see the value in embedding provenance information in the build
artifacts, but that makes reproducible builds considerably harder to
achieve if it is recording *everything* about the build environment.

Because GitBOM metadata is intentionally included in the build artifacts
themselves, maybe GitBOM should be very discriminating in what is
included in the GitBOM; I don't imagine GitBOM records the running cpu
microcode, but that is arguably just as relevent as the running
kernel...


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Making reproducible builds & GitBOM work together in spite of low-level component variation

2022-06-22 Thread Vagrant Cascadian
On 2022-06-22, David A. Wheeler wrote:
> The challenge is that I believe that there will be subtle variations in 
> inputs caused by
> very low-level components, particularly kernels & potentially also
> low-level
> runtimes like the C runtime. This could result in irreproducibility of
> anything with GitBOMs
> if the whole process is applied without some corrective factor.
>
> I'm going to use the Linux kernel as an example here. That said,
> I suspect the issue is broader (it would at least apply to any kernel).
>
> Programs running on a Linux kernel eventually must call the kernel.
> To support this, the Linux kernel provides a mechanism to export its API. See
> "exporting kernel headers" here:
> https://docs.kernel.org/kbuild/headers_install.html#:~:text=The%20linux%20kernel's%20exported%20header,used%20with%20these%20system%20calls.
>
> These header files are either used directly by programs to call to the kernel,
> or are processed & converted into other files that end up getting embedded in
> intermediate runtimes (typically the C runtime).
>
> But here's the thing: kernel header files change on basically every release,
> e.g., to add new system calls or new flags. In practically all cases these 
> changes
> don't change the result of executing a build, and thus don't currently 
> interfere
> with reproducible builds. If GitBOM data is added, however, this variance will
> cause different hashes to be included, causing all build results 
> (transitively) to be
> different when you use an even *slightly* different kernel version.

> POTENTIAL SOLUTIONS
>
> Here are some potential solutions I can see:
>
> 1. For reproducible builds, rebuild on *EXACTLY* the same kernel version, C 
> library, etc.
>   This means that you can't just use containers to control rebuilds, since 
> typically containers are
>   designed to be able to run on arbitrary kernels & people normally upgrade 
> containers.
>   You'll need to build on whole new VMs with specifically-configured kernels,
>   *NOT* just embed this in containers. You also need to record exactly which
>   kernel was used to compiler it.

This seems more relevant for the way GitBOM records provenance
information than it does for achieving a reproducible build.

Kernel version differences are tested on Debian's 31k+ packages:

  https://tests.reproducible-builds.org/debian/index_variations.html

Most of the reproducibility issues I've encountered seem to involve
embedding the kernel version, not header data.

I don't recall off the top of my head how many packages have been
manually fixed, but the remaining packages in debian that are affected
by kernel version differences amount to about 30 packages out of 31k+
total:

  
https://tests.reproducible-builds.org/debian/issues/bookworm/captures_kernel_version_issue.html
  
https://tests.reproducible-builds.org/debian/issues/bookworm/captures_kernel_version_via_CMAKE_SYSTEM_issue.html

So from a Reproducible Builds perspective, the running kernel should not
really matter... and that is a good thing!
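
The anti-pattern is usually as mundane as this made-up sketch (the
header version string is a placeholder for whatever your build inputs
actually declare):

  import platform

  # Unreproducible: captures whatever kernel the build machine happens
  # to be running at build time.
  bad_banner = "built on Linux %s" % platform.release()

  # Better: derive the string from a declared build input, or drop it.
  headers_version = "6.1.0-0"  # hypothetical, from build metadata
  good_banner = "built against linux headers %s" % headers_version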


> 2. For reproducible builds, redirect header file content requests so they use 
> the same
>header files, etc., as the original build. GitBOM doesn't care what the 
> underlying kernel
>version is really, it's just recording the inputs *used*. This means 
> containers can
>once again be used, even when the kernel changes, but it does complicate
>performing reproducible builds.

Feels a bit unclean... either you record what you care about honestly,
or you decide you don't care about it and don't record it. I think the
key is transparency about the process.


> 3. Have compiler flags/configurations to *omit* certain files from the GitBOM 
> results.
>   After all, you're not actually *including* the kernel in the generated 
> results, so it makes
>   sense to omit those files from the point of view of "what is being included 
> in this application"?
>   Ed Warnicke hates this idea, because it creates a "blind spot" in GitBOM.

Yeah, I can see why someone would not like this approach.


> 4. Tweak the definition of reproducible builds so that it's a bit-by-bit 
> identical
> copy of a specified artifact, but the artifact can be *part* of a file.

In practice, there are a few cases where this is done, e.g. .apk and
.rpm files embed signatures which need to be stripped out for
reproducibility comparison.

Excluding some bits and verifying the rest adds complication to the
verification process, and thus opportunities for errors, and I believe
at least once resulted in incorrect results due to bugs in the
verification process...


> Basically, the checked artifacts are the files NOT including GitBOM.
> Since the *executed* parts would be identical, just not certain metadata,
> the risk of subverted code seems small. Sure, someone might slip secrets
> into the unchecked parts, but that's not really why most people are interested
> in reproducible builds.

Presuming I am understanding GitBOM correc

Re: Reproducibility of "core" packages in GNU Guix

2022-06-01 Thread Vagrant Cascadian
On 2022-05-02, Vagrant Cascadian wrote:
> $ guix challenge --diff=none $(cat guix-base-set)
>
> /gnu/store/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5 contents 
> differ:

Proving more difficult than I'd hoped for, smallish diffs in the .ko
files and in the bzImage and System.map, but nothing obvious leaping out
at me. The corresponding files are reproducible in Debian bookworm...

Working on this lead me to notice a bug in diffoscope at least:

  https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/305


> /gnu/store/7qz2jlghm4gc87jww5j24c5mcip0whzy-keyutils-1.6.3 contents differ:

Patch:

  https://issues.guix.gnu.org/55758

There was already a patch in debian to set the date using an environment
variable. Might be worth working up a patch to support SOURCE_DATE_EPOCH
and push it upstream... or nudging upstream to drop the timestamp
entirely. :)


> /gnu/store/ajw8nnrnd6hr183skwqdgc8c7mazg97h-isl-0.23 contents differ:

Patch:

  https://issues.guix.gnu.org/55757

Disabling parallel building in guix fixes it for me consistently,
although Debian's "isl" package is reproducible but... builds with
parallelism. (well, not reproducible on i386, but who's really
counting?)

What about other distros? Do you do anything to make "isl" reproducible?


> /gnu/store/45b6181w68a3lprx9m6riwgyinw3y145-guix-1.3.0-25.c1719a0 contents 
> differ:
> /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8 contents differ:

Both of these were not *just* due to parallelism as I'd
hoped... inscrutable guile...


So, 2 out of the 5 remaining packages have plausible fixes (out of 47
total)... not too bad!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: What should be the proper practice to manage `.dsc` files on Reprepro?

2022-05-27 Thread Vagrant Cascadian
On 2022-05-27, David A. Wheeler wrote:
> I think that in general *signatures* should be separated from *what
> they are signing*, preferably by being different files.
>
> This solves reproducibility problems. It also solves other problems,
> e.g., it's quite possible for multiple people to sign something (e.g.,
> "I also approve of this") - that shouldn't change what was being
> signed.
>
> Of course, it's possible to interpret a file as two parts, the part
> being signed & the signatures. So it's definitely *possible* to
> combine them. But there are risks that the split will be implemented
> incorrectly. It's easier if they're always handled separately; it
> reduces the risk of incorrect handling.

Absolutely agreed!

That said, in the context of Debian's .dsc files, there are decades of
history here that are not likely to change... any time soon.

Though really, the .dsc files themselves are in fact the signed metadata
here; the actual data is the PACKAGE_VERSION.orig.tar.gz and other
files that the .dsc file references. Though that still makes it a little
tricky to have multi-party signatures...


live well,
  vagrant


signature.asc
Description: PGP signature


Re: What should be the proper practice to manage `.dsc` files on Reprepro?

2022-05-26 Thread Vagrant Cascadian
On 2022-05-26, Yaobin Wen wrote:
> In my company, we use *Ubuntu (18.04)* and are practicing reproducible
> builds. Our code is built into a lot of *.deb* packages using *debuild* (and
> related tools). We have made a lot of effort to make our builds
> reproducible by following the Achieve deterministic builds
>  and Known issues related to
> reproducible builds
> . We have
> made a lot of progress and are still working on it.

It is great to hear of your work on reproducible builds!


> We set up a company-wide *Reprepro server to serve the Debian packages
> that we build regularly*. We publish both *.deb* files and the *GPG-signed
> .dsc* files to the Reprepro server. BTW, our build system is designed to do
> a "*change-only build*": if a package is not changed since the last build,
> i.e., its *changelog* is not changed, the package is not built again this
> time. *Their .deb and .dsc files are still added to Reprepro but because
> these files remain unchanged, Reprepro can successfully "add" them (but in
> fact they are skipped)*. I figured this point may be important to
> understand why I have my questions below.

I don't follow what you mean by "the package is not built again this
time" and "Their .deb and .dsc files are still added to Reprepro". How
can a package be added if it is not built?

What's unclear to me is why you're uploading the same .deb and .dsc
files to the same reprepro repository multiple times... ?


> Although we have solved many reproducibility issues in the *.deb* files, *I
> found the .dsc files were changed when* I rebuilt the packages (by deleting
> the previously built *.deb* and *.dsc* files) so Reprepro refuses to
> include them and reports the following error:
>
> ERROR: '' cannot be included as
>> 'pool/'.
>> Already existing files can only be included again, if they are the same,
>> but:
>> md5 expected: , got: 
>> sha1 expected: , got: 
>> sha256 expected: , got: 
>
>
> *diffoscope* told me the *.dsc* files *only differ in their GPG
> signatures* - the related source tarball (.orig.tar.gz) and
> debian tarball (.debian.tar.xz) *have not changed between
> builds.*

That's to be expected...


> I understand that, because as this SO answer says
> , the GPG signature is
> generated using the creation time as an input. I found the issue
> cryptographic_signature
> 
> that
> made me think we should not have signed our *.dsc* files, but the Debian
> Admin's Handbook
> 
> shows that the *.dsc* files are supposed to be signed by the maintainers.
> In addition, in the Known Issues list
> , I didn't
> seem to find any issue that's related to the *.dsc* files.

If you want to know which party claims to have built a given .dsc file,
you need the signature on it. If you track that information some
other way, you *could* use unsigned .dsc files...

Or you could re-use the original .dsc files, if all the contents they
reference are bit-for-bit identical. If you want to store the new ones
somewhere else as a "proof of having rebuilt it again" you could do
that, but obviously not in the exact same repository.


> *After reading around, I'm guessing my understanding about
> reproducible builds may not be totally correct, so I want to ask here:*
>
>1. *Should the .dsc files be reproducible, too?* Because Reprepro can
>manage *.dsc* files, I've been thinking that *.dsc* files should be
>reproducible, but now it seems not?

If you build them in the same build environment, with the same source
code, they should be reproducible *minus the signatures*, as you've
noted...

Generally, from a reproducible builds perspective, the .dsc file is
considered an input to the build process rather than a result of a build
process. Though admittedly, .dsc files are themselves artifacts of a
source-only build, so it is a bit of a grey area.
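
If you do want to compare two .dsc files modulo their signatures,
stripping the clearsign armor is straightforward; here is a minimal,
untested sketch (note that clearsigned text may dash-escape lines
starting with "-", which .dsc fields normally avoid, so this ignores
that case):

  def unsigned_payload(text):
      # return the signed payload of a clearsigned .dsc, or the text
      # unchanged if it is not clearsigned at all
      lines = text.splitlines()
      if not lines or lines[0] != "-----BEGIN PGP SIGNED MESSAGE-----":
          return text
      start = lines.index("") + 1  # blank line ends the armor headers
      end = lines.index("-----BEGIN PGP SIGNATURE-----")
      return "\n".join(lines[start:end]) + "\n"

  # two rebuilds of the same source should then satisfy:
  # unsigned_payload(a) == unsigned_payload(b)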


>2. In my case, since my company maintains both *.deb* files and *.dsc*
> files
>in Reprepro, if one day we need to build the code of an earlier version, we
>would inevitably generate different *.dsc* files because of the GPG
>signatures. *Am I supposed to publish the .dsc files to the same
>Reprepro server that we maintain our regular build?* Because I've been
>thinking *.dsc* files should also be reproducible, I've been thinking we
>should keep using the same Reprepro server. *But now it looks like we
>need to prepare a second Reprepro server to hold the packages of the
>earlier version.*

So you're looking to be able to recreate your whole repository from
scratch (maybe from git repositories or some other VCS?) at

Re: Help with arm64 binaries build reproducibility issue

2022-05-22 Thread Vagrant Cascadian
On 2022-05-22, Luca Boccassi wrote:
> We have been having an issue with making the systemd build reproducible
> on arm64. On x86 it's all fine, but on arm there are differences in the
> built binaries that I cannot explain - I don't speak arm assembly so I
> can't really tell where they are coming from.

I brought up a thread about armhf back in December:

  https://lists.debian.org/debian-arm/2021/12/threads.html#7

The short of it was that it was doing something dependent on the running
kernel, rather than the userspace architecture.
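
A classic instance of that pattern (a hypothetical build-script
mistake, not necessarily what that package did): consulting "uname -m",
which reports the running kernel's architecture rather than the
userspace architecture:

  $ uname -m                   # may print "aarch64" under an arm64 kernel
  $ dpkg --print-architecture  # prints "armhf" inside an armhf chroot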

But... for arm64, I don't think we actually vary the kernel on
tests.reproducible-builds.org ... so there are other reproducibility
issues at play.

Maybe try debian-...@lists.debian.org


live well,
  vagrant


signature.asc
Description: PGP signature


Re: faketime breaks quilt patched file times in Debian

2022-05-21 Thread Vagrant Cascadian
On 2022-05-01, Vagrant Cascadian wrote:
> On 2022-05-01, Holger Levsen wrote:
>> On Sat, Apr 30, 2022 at 03:53:13PM +0200, Roland Rosenfeld wrote:
>>> [tl;dr faketime results on broken file timestamps for quilt patched
>>> files on salsa]
...
> reprotest uses faketime to implement time variations, and salsa-ci has
> reprotest pipelines...
>
> This reminds me to explore user namespaces for reprotest (on the hope
> that maybe time namespaces would be possible too...).

time namespaces basically seem to only be useful for things like
"tricking" uptime, not actually modifying the time that the OS reports.

In searching for information on time namespaces, there was some
discussion of implementing actual hardware clock time namespace features
if there were any use-cases; that might be worth exploring with linux
upstream at some point!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Reproducibility of "core" packages in GNU Guix

2022-05-02 Thread Vagrant Cascadian
On 2022-05-02, zimoun wrote:
> On Mon, 02 May 2022 at 06:11, Vagrant Cascadian 
>  wrote:
>> $ guix challenge --diff=none $(cat guix-base-set)
...
>> The fact that the guix and guile packages do not build reproducibly is a
>> little disappointing as they're both so central to guix itself; I
>> suspect parallelism triggers those reproducibility issues (from
>> experience with Debian), though that may just reveal other issues in
>> guile itself.
>
> About Guix, probably bug#44835 [1] for one, I guess.  And note this old
> Guile bug#20272 [2] for two, which implies unreproducible Guix.
>
> 1: <http://issues.guix.gnu.org/issue/44835>

This one is regarding build paths, and since guix normalizes the build
path in the build environment, it should not affect builds in guix...

> 2: <https://issues.guix.gnu.org/issue/20272>

But this one definitely touches on the parallelism issue!

Thanks for the links! I remembered commenting on them... just didn't
find the links with a quick search...



live well,
  vagrant


signature.asc
Description: PGP signature


Reproducibility of "core" packages in GNU Guix

2022-05-02 Thread Vagrant Cascadian
On 2022-04-27, Vagrant Cascadian wrote:
> Lately, I've been trying to get a handle on the status of the really
> core packages in Debian
...
> I'd also be really curious to hear about the status of similar package
> sets in other distros!

With my metaphorical guix hoodie[1] on...

$ guix describe

Generation 73   May 02 2022 05:21:25(current)
  guix 9dafaf1
repository URL: /home/vagrant/src/guix
branch: master
commit: 9dafaf163574edca5cb4eac0f8dc3edbb0ef0a75

$ guix challenge --diff=none $(cat guix-base-set)

/gnu/store/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5 contents differ:
  no local build for 
'/gnu/store/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5'
  
https://ci.guix.gnu.org/nar/zstd/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5:
 19rg55v51wliy9v30sm82f38rxm1lqjpfqs6r63ikb3vklnj0pnw
  
https://bordeaux.guix.gnu.org/nar/lzip/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5:
 14fax6g9sx7qj64z73hrh8ydlbv6kxzhd1hbyqz7v0ra51bprv1k
/gnu/store/7qz2jlghm4gc87jww5j24c5mcip0whzy-keyutils-1.6.3 contents differ:
  no local build for 
'/gnu/store/7qz2jlghm4gc87jww5j24c5mcip0whzy-keyutils-1.6.3'
  
https://ci.guix.gnu.org/nar/lzip/7qz2jlghm4gc87jww5j24c5mcip0whzy-keyutils-1.6.3:
 1sag2bq9kbp5np3fpakyi4xg96kxq5xwbb7ib4hamx2bqh6vscr9
  
https://bordeaux.guix.gnu.org/nar/lzip/7qz2jlghm4gc87jww5j24c5mcip0whzy-keyutils-1.6.3:
 07ln4fqgvg0ag2d881xhgdw2h3m1lqzs6xlac8p7rz2rgx0wx1yr
/gnu/store/ajw8nnrnd6hr183skwqdgc8c7mazg97h-isl-0.23 contents differ:
  no local build for '/gnu/store/ajw8nnrnd6hr183skwqdgc8c7mazg97h-isl-0.23'
  https://ci.guix.gnu.org/nar/lzip/ajw8nnrnd6hr183skwqdgc8c7mazg97h-isl-0.23: 
03a180af1my7lmsnig01qhrirxa2fp7j052jw9kv5ff4i6ya7fh4
  
https://bordeaux.guix.gnu.org/nar/lzip/ajw8nnrnd6hr183skwqdgc8c7mazg97h-isl-0.23:
 1j24gc6ysa9d3z4hq6lsxvdik94ddb7nj93krv7cs5lmbmjwmqw7
/gnu/store/45b6181w68a3lprx9m6riwgyinw3y145-guix-1.3.0-25.c1719a0 contents 
differ:
  no local build for 
'/gnu/store/45b6181w68a3lprx9m6riwgyinw3y145-guix-1.3.0-25.c1719a0'
  
https://ci.guix.gnu.org/nar/lzip/45b6181w68a3lprx9m6riwgyinw3y145-guix-1.3.0-25.c1719a0:
 0p7lhfxcx7bfjfwlyrp6h5j9fcyzswyj2wkbnhcd3fgxm5swdi6c
  
https://bordeaux.guix.gnu.org/nar/lzip/45b6181w68a3lprx9m6riwgyinw3y145-guix-1.3.0-25.c1719a0:
 0yfpcsmvbnzw0vpjrjwwrjih4ss3yvk7cy4k6ibdpsn7dcx9kw2c
/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8 contents differ:
  no local build for '/gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8'
  
https://ci.guix.gnu.org/nar/lzip/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8: 
0vppx6fk1a7gvk9ccz9ma992w1h5bhfk535acddrnkhyrk92z5ln
  
https://bordeaux.guix.gnu.org/nar/lzip/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8:
 05w5i5zq1k1avqx2gqxnqynn5lmdizis9babk34dkmnazb3h77kb

47 store items were analyzed:
  - 42 (89.4%) were identical
  - 5 (10.6%) differed
  - 0 (0.0%) were inconclusive


I love that Guix really has batteries included when it comes to
reproducible builds verification! :)

At first, I thought I would have to build all this stuff locally, but
then I realized guix actually has two independent build farms, so guix
challenge can compare the results between them! For more data points,
one could build them all locally!
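
For a local data point, something like the following should also work
(a sketch): "guix build --check" rebuilds a store item that is already
present locally and complains if the result differs:

  $ guix build --check --no-grafts guile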


The fact that the guix and guile packages do not build reproducibly is a
little disappointing as they're both so central to guix itself; I
suspect parallelism triggers those reproducibility issues (from
experience with Debian), though that may just reveal other issues in
guile itself.


The linux-libre package *ought* to be reproducible; I hope it is
something easy to fix there...

$ guix challenge --diff=diffoscope linux-libre

/gnu/store/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5 contents differ:
  no local build for 
'/gnu/store/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5'
  
https://ci.guix.gnu.org/nar/zstd/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5:
 19rg55v51wliy9v30sm82f38rxm1lqjpfqs6r63ikb3vklnj0pnw
  
https://bordeaux.guix.gnu.org/nar/lzip/8gmqvwf0ccqfyimficcnhxvrykwx6y8g-linux-libre-5.17.5:
 14fax6g9sx7qj64z73hrh8ydlbv6kxzhd1hbyqz7v0ra51bprv1k
 ...
 0%  ETA:  4 days, 2:03:47

Ok... well, I guess I won't wait for the results...


A better "core" package set for GNU Guix could surely be created. I came
up with this list of packages by taking the essential, required and
build-essential package sets from Debian, tweaking the package names
appropriately, dropping debian-specific stuff, and adding guile and
guix to create "guix-base-set":

acl
attr
audit
bash
binutils
bzip2
coreutils
diffutils
e2fsprogs
elogind
findutils
gawk
gcc
glibc
gmp
grep
guile
guix
gzip
isl
keyutils
libcap
libcap-ng
libnsl
libselinux
libsigsegv
libtirpc
libxcrypt
linux-pam
linux-libre
mpfr
ncurses
openssl
patch
pcre
pcre2
perl
readline
rpcsvc-proto
sed
shadow
tar
tzdata
util-linux
xz
zlib
zstd


> I would also like to see if there is 

Re: faketime breaks quilt patched file times in Debian

2022-05-01 Thread Vagrant Cascadian
On 2022-05-01, Holger Levsen wrote:
> On Sat, Apr 30, 2022 at 03:53:13PM +0200, Roland Rosenfeld wrote:
>> [tl;dr faketime results on broken file timestamps for quilt patched
>> files on salsa]
>
> which is one of several reasons why (in 2014 or so) we choose not to use
> faketime to achieve reproducible-builds.
>  
>> Since some time I observe random broken reproducibilty in some of my
>> Debian packages (xfig, igerman98, libsnmp-session-perl) in the weekly
>> salsa CI runs.
>> 
>> Today I tracked this down to faketime being used in Salsa CI
>> reprotest in combination with quilt modifying the file.
>
> why is salsa CI using faketime in the first place?

reprotest uses faketime to implement time variations, and salsa-ci has
reprotest pipelines...

This reminds me to explore user namespaces for reprotest (on the hope
that maybe time namespaces would be possible too...).

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Status of Required/Essential/Build-Essential in Debian

2022-04-29 Thread Vagrant Cascadian
On 2022-04-28, Chris Lamb wrote:
>> Lately, I've been trying to get a handle on the status of the really
>> core packages in Debian, namely the essential, required and
>> build-essential package sets. The first two are present on nearly every
>> Debian system, and build-essential is the set of packages assumed to be
>> available whenever you build a package in Debian.
>
> Wow, this is an excellent summary of the status here -- thanks! And
> indeed, an explicit & extra thank you for taking the time to write it
> up so it could be posted here β€” I made an analogous and scrappy list
> for myself in ~2018 to target these high-value packages, but I was
> now clearly in error in not taking the time to clean it up and share
> it... :(

The fact that the list has been for the most part shrinking is part of
what made me excited to take the time to write it up. :)


>> The more difficult issue with apt is caused by toolchain bugs in
>> doxygen:
>>
>> https://tests.reproducible-builds.org/debian/issues/nondeterminstic_todo_identifiers_in_documentation_generated_by_doxygen_issue.html
>>
>> https://tests.reproducible-builds.org/debian/issues/nondeterministic_ordering_in_documentation_generated_by_doxygen_issue.html
>
> For the sake of completeness (we talked about this issue elsewhere
> recently, Vagrant; this is for the benefit of this list/thread..),
> although there are only two notes linked immediately above, I suspect
> there are more than the two issues in doxygen, and the program could
> well warrant a sustained attack... especially as it would affect a
> *lot* of packages.

Yes, there are definitely more problems and it would be lovely to fix them
all! For apt's specific case I *think* solving those two issues might
be sufficient for apt.


live well,
  vagrant


signature.asc
Description: PGP signature


Status of Required/Essential/Build-Essential in Debian

2022-04-27 Thread Vagrant Cascadian
Lately, I've been trying to get a handle on the status of the really
core packages in Debian, namely the essential, required and
build-essential package sets. The first two are present on nearly every
Debian system, and build-essential is the set of packages assumed to be
available whenever you build a package in Debian.

I will summarize below the outstanding issues for Debian with these
package sets.

I'd also be really curious to hear about the status of similar package
sets in other distros! I would also like to see if there is anything in
Debian or other distros that still needs to be pushed upstream, so we
can all benefit!


Essential:

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_essential.html

Almost done with essential, at 95% reproducible:

The only outlier is glibc, which currently doesn't build, but a version
that does build in debian experimental has a patch submitted specific to
Debian's packaging of glibc:

  different file permissions on ld.so.conf* and others
  https://bugs.debian.org/1010233


Required:

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_required.html

Also nearly there, at 88.9% reproducible (and one probably obsolete
package in the list, gcc-9):


apt has two remaining issues, one of which is trivial to fix:

 BuildId differences triggered by RPATH
 https://bugs.debian.org/1009796

The more difficult issue with apt is caused by toolchain bugs in
doxygen:

 
https://tests.reproducible-builds.org/debian/issues/nondeterminstic_todo_identifiers_in_documentation_generated_by_doxygen_issue.html
 
https://tests.reproducible-builds.org/debian/issues/nondeterministic_ordering_in_documentation_generated_by_doxygen_issue.html

There is a workaround patch for apt to disable building of documentation:

 support "nodoc" build profile
 https://bugs.debian.org/1009797
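
With that patch applied, the workaround amounts to something like this
(a sketch; the "nodoc" profile is activated via the standard
build-profile knobs, and many packages also consult DEB_BUILD_OPTIONS):

  $ DEB_BUILD_OPTIONS=nodoc dpkg-buildpackage -us -uc --build-profiles=nodoc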


Build-Essential:

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_build-essential.html

Not bad at 87.1% reproducible.

linux has two issues, one unidentified issue relating to build paths,
and another documentation issue:

  
https://tests.reproducible-builds.org/debian/issues/randomness_in_documentation_generated_by_sphinx_issue.html


libzstd has one remaining issue, where it embeds build paths in assembly
objects:

  
https://tests.reproducible-builds.org/debian/issues/build_path_captured_in_assembly_objects_issue.html


gmp has one outstanding set of patches to fix build path issues:

  Embedded build paths in various files
  https://bugs.debian.org/1009931


binutils has several identified issues and probably some unidentified
issues:

  included log files introduce reproducibility issues (debian specific?)
  https://bugs.debian.org/950585
  
https://tests.reproducible-builds.org/debian/issues/unstable/test_suite_logs_issue.html

  source tarball embeds build user and group (debian specific)
  https://bugs.debian.org/1010238


krb5 has one really perplexing issue related to build paths triggering
seemingly unrelated changes in the documentation, possibly toolchain
related (sphinx? doxygen?):

  differing build paths trigger different documentation
  https://bugs.debian.org/1000837


gcc-12 (and probably other gcc variants) also embeds test suite logs
very similar to binutils described above. Probably many other issues,
especially related to build-time profile-guided-optimization and... who
knows! GCC also takes so long to build, it can be difficult for our test
infrastructure to actually build and/or run diffoscope without timing
out...


openssl contains a few unidentified issues relating to build paths, some
test suite failures in our test infrastructure, and a couple known build
path related issues:

  
https://tests.reproducible-builds.org/debian/issues/build_path_captured_in_assembly_objects_issue.html

  Embeded compiler flags contain build paths
  https://bugs.debian.org/1009934


Build-Essential-Depends Bonus Round! (all the packages that
Build-Essential needs to build itself):

  
https://tests.reproducible-builds.org/debian/unstable/amd64/pkg_set_build-essential-depends.html

At 86.3% reproducible, it still doesn't look too bad, and there are a
lot of patches submitted and/or in progress. It is a much larger set of
packages, so I won't even try to summarize the status here.


A few closing thoughts...

A fair number of these are build path issues, which we do not test in
Debian testing (currently bookworm), only in debian unstable and
experimental. So the numbers in general look a bit better for
testing/bookworm. Other distros by-and-large do not test build path
variations, and while I'd like to fix those issues, they're a little
lower-priority.

Two other remaining issues are toolchain issues for documentation using
sphinx and doxygen, and are the last blockers for fixing apt and linux
(as well as numerous other packages). This seems like a high priority to
fix!

I have been chewing on the ideas of how to resolve the embed

Re: Please review the draft for December's report

2022-01-04 Thread Vagrant Cascadian
On 2022-01-04, John Neffenger wrote:
> On 1/3/22 7:08 AM, Chris Lamb wrote:
>> Please review the draft for December's Reproducible Builds report:
>> 
>>https://reproducible-builds.org/reports/2021-12/?draft
>
> Would it be helpful to add a section about upstream changes regarding 
> reproducible builds made by the upstream projects themselves?

Addressing upstream project reproducibility issues is absolutely
welcome! :)

> The OpenJDK project has made good progress lately. All of my personal 
> Java projects are now reproducible when using the JDK 19 tools directly 
> in an early-access build. The last piece I needed was this pull request:
>
> 8276766: Enable jar and jmod to produce deterministic timestamped content
> https://github.com/openjdk/jdk/pull/6481
>
> This change has been integrated into JDK 19 (to be released in September 
> 2022), and a back-port of the commit has been requested for JDK 18 (to 
> be released on March 22, 2022).
>
> The full discussion of the change is found below in the CSR 
> (Compatibility and Specification Review):
>
> JDK-8277755: Enable jar and jmod to produce deterministic timestamped 
> content
> https://bugs.openjdk.java.net/browse/JDK-8277755

This is all very exciting news to me! Thanks for working on it and
bringing it to our attention.
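
For anyone who wants to try it, that change adds a --date option to the
jar and jmod tools (a sketch, assuming a JDK 19 early-access build and
a hypothetical classes/ directory; the timestamp is ISO-8601):

  $ jar --create --file=app.jar --date=2022-01-01T00:00:00Z -C classes .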


If you foresee being a regular contributor, please sign up for an
account at salsa.debian.org and we can get you access to the repository.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: PackageRebuilder status

2021-09-26 Thread Vagrant Cascadian
On 2021-09-25, Frédéric Pierret wrote:
> I'm happy to announce that I've worked on results layout for
> PackageRebuilder and you can find results at:
> https://rebuild.notset.fr

Just in case it isn't obvious, this is a view of reproducibility of
actual packages shipped by Debian! In my opinion, this is really
exciting news!!!

Previously the tests of debian packages on tests.reproducible-builds.org
only test the building of a package twice, which is great for finding
issues, but the results are not a meaningful comparison against packages
anyone actually uses.

I've seen numbers in the ranges of 60-90% reproducible, depending on the
package sets and the timing ("pending" tests for new uploads and such
tend to make the numbers fluctuate quite a bit, especially on the
smaller package sets) ... which, all things considered, is pretty good,
and fairly similar to what tests.reproducible-builds.org shows, maybe
even slightly better in some cases.


> I've cherry picked a lot of the codes from Archlinux rebuilderd
> website. You can find the source code at
> https://github.com/fepitre/package-rebuilder-website. Any contribution
> is welcomed to make it visually better. I'm not a front web developer
> at all and this is my first ReactJS app :).

Debian is catching up to Arch Linux again... :)


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Need help with getting a package to build reproducibly on arm*

2021-02-11 Thread Vagrant Cascadian
On 2021-01-08, Vagrant Cascadian wrote:
> On 2021-01-08, Vagrant Cascadian wrote:
>> On 2021-01-07, Vagrant Cascadian wrote:
>>> On 2021-01-07, Michael Biebl wrote:
>>>> Am 07.01.21 um 18:24 schrieb Michael Biebl:
>>>>> as can be seen at [1], systemd does not build reproducibly on armhf and
>>>>> arm64 (while there is no problem on amd64 and i386).
>>>>> 
>>>>> The problem is, I have no idea what the diffoscope diff [2] means and
>>>>> how I can make the package build reproducibly everywhere or how I can
>>>>> further investigate this.
>>>>> 
>>>>> Any help here is greatly appreciated as I think reproducible-builds are
>>>>> a great effort and I'd like to support that as much as I can.
>>>>> 
>>>>> Regards,
>>>>> Michael
>>>>> 
>>>>> 
>>>>> [1]
>>>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/systemd.html
>>>>> [2]
>>>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/systemd.html
>>>
>>> My best wild guesses would be parallelism, filesystem ordering or locale
>>> differences causing various sort ordering differences.
>>>
>>> I'm running a local build on arm64 with "reprotest --auto-build" to see
>>> if it can help give us any better leads, will see if that shows anything
>>> more useful... it could take some time on not particularly fast
>>> hardware.
>>
>> First attempt was reproducible for me (two normalized builds and one
>> varied build), though I couldn't vary the clock with reprotest
>> (libfaketime appears to trigger issues with building systemd)... or
>> fileordering, user, group or hostname due to some limitations in my
>> typical test environment. The command I ran was:
>>
>>   reprotest  --verbose --min-cpus=1 
>> --vary=-user_group,-domain_host,-fileordering,-time auto -- null
> ...
>
> But the second attempt for some reason did produce some interesting
> results... why it didn't happen the first time suggests it is not
> deterministic.
>
> │ │ │ ├── ./usr/bin/bootctl
> ...
> │ │ │ │ ├── strings --all --bytes=8 {}
> │ │ │ │ │ @@ -250,15 +250,15 @@
> │ │ │ │ │  SystemdOptions
> │ │ │ │ │  Failed to set SystemdOptions EFI variable: %m
> │ │ │ │ │  supported
> │ │ │ │ │  not supported
> │ │ │ │ │  Failed to query reboot-to-firmware state: %m
> │ │ │ │ │  Failed to parse argument: %s
> │ │ │ │ │  Failed to set reboot-to-firmware option: %m
> │ │ │ │ │ -/EFI/systemd/systemd-bootaa64.efi
> │ │ │ │ │ +/EFI/systemd/systemd-bootarm.efi
> │ │ │ │ │  Failed to access EFI variables. Is the "efivarfs" filesystem mounted?
> │ │ │ │ │  Failed to determine current boot order: %m
>
> This suggests to me that the running kernel is somehow used to determine
> the userspace architecture, effectively similar to:
>
>   
> https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_arch_issue.html
>
>
> The armhf builders on tests.reproducible-builds.org for Debian do not
> systematically test this. I'm not sure about the arm64 builders, but I
> think they run the same kernel, so it seems unlikely to be the only
> issue. Also surprised the i386 builder doesn't catch this. Then again,
> it only happened on my second try, which suggests this is
> non-deterministic in some way; maybe the slower armhf/aarch64
> architectures trigger it more often?
>
> I'll later post the results of the diffoscope output somewhere and give
> a closer look.

And those results I promised:

  https://people.debian.org/~vagrant/reproducible/systemd.20210108.dE8pOx/

Nothing terribly obvious to me, though as mentioned, the running kernel
may be a factor for the arm64 and armhf platforms.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Outreachy Summer 2021

2021-01-31 Thread Vagrant Cascadian
On 2021-01-21, Mattia Rizzolo wrote:
> We are pondering whether to do a round of Outreachy this year.
> Contrary to last years', we are going to throw the topic out much earlier,
> and see if some good proposals come up for that round.
>
> Example for the past rounds would be:
>  * 
> https://wiki.debian.org/SummerOfCode2019/ApprovedProjects/ReproducibleBuilds
>  * 
> https://wiki.debian.org/Outreachy/Round15/Projects/ReproducibleBuildsOfDebian
>
> But really, we would love to see some more interesting ideas in there,
> including volunteers to take charge of such projects, if there are any.
>
> Examples for the kind of projects we are looking for include workflow
> changes, large refactoring work, new features of our tools, specific
> reproducibility fixes, etc. etc.  But crucially they should fit in that
> sweet spot of (a) requiring more time and energy than a weekend project,
> but (b) not being so complicated that they would take forever, and
> (c) actually being possible to 'complete' in a satisfactory
> manner.

I would be interested in fleshing out a proposal for an outreachy
internship to develop a tool to parse diffoscope output and make
best-effort suggestions about the possible types of discovered
reproducibility issues (e.g. build path, timestamp, kernel version,
username, hostname, etc.)  ... kind of like a retroactive autoclassify
or "reprotest --auto-build" when you don't actually have the binaries
but you have diffoscope output, like the current Debian tests on
tests.reproducible-builds.org.

Does that sound like an appropriately ambitious (but not too ambitious)
goal for Outreachy?


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Need help with getting a package to build reproducibly on arm*

2021-01-08 Thread Vagrant Cascadian
On 2021-01-08, Vagrant Cascadian wrote:
> On 2021-01-07, Vagrant Cascadian wrote:
>> On 2021-01-07, Michael Biebl wrote:
>>> Am 07.01.21 um 18:24 schrieb Michael Biebl:
>>>> as can be seen at [1], systemd does not build reproducibly on armhf and
>>>> arm64 (while there is no problem on amd64 and i386).
>>>> 
>>>> The problem is, I have no idea what the diffoscope diff [2] means and
>>>> how I can make the package build reproducibly everywhere or how I can
>>>> further investigate this.
>>>> 
>>>> Any help here is greatly appreciated as I think reproducible-builds are
>>>> a great effort and I'd like to support that as much as I can.
>>>> 
>>>> Regards,
>>>> Michael
>>>> 
>>>> 
>>>> [1]
>>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/systemd.html
>>>> [2]
>>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/systemd.html
>>
>> My best wild guesses would be parallelism, filesystem ordering or locale
>> differences causing various sort ordering differences.
>>
>> I'm running a local build on arm64 with "reprotest --auto-build" to see
>> if it can help give us any better leads, will see if that shows anything
>> more useful... it could take some time on not particularly fast
>> hardware.
>
> First attempt was reproducible for me (two normalized builds and one
> varied build), though I couldn't vary the clock with reprotest
> (libfaketime appears to trigger issues with building systemd)... or
> fileordering, user, group or hostname due to some limitations in my
> typical test environment. The command I ran was:
>
>   reprotest  --verbose --min-cpus=1 
> --vary=-user_group,-domain_host,-fileordering,-time auto -- null
...

But the second attempt for some reason did produce some interesting
results... why it didn't happen the first time suggests it is not
deterministic.

│ │ │ ├── ./usr/bin/bootctl
...
│ │ │ │ ├── strings --all --bytes=8 {}
│ │ │ │ │ @@ -250,15 +250,15 @@
│ │ │ │ │  SystemdOptions
│ │ │ │ │  Failed to set SystemdOptions EFI variable: %m
│ │ │ │ │  supported
│ │ │ │ │  not supported
│ │ │ │ │  Failed to query reboot-to-firmware state: %m
│ │ │ │ │  Failed to parse argument: %s
│ │ │ │ │  Failed to set reboot-to-firmware option: %m
│ │ │ │ │ -/EFI/systemd/systemd-bootaa64.efi
│ │ │ │ │ +/EFI/systemd/systemd-bootarm.efi
│ │ │ │ │  Failed to access EFI variables. Is the "efivarfs" filesystem mounted?
│ │ │ │ │  Failed to determine current boot order: %m

This suggests to me that the running kernel is somehow used to determine
the userspace architecture, effectively similar to:

  
https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_arch_issue.html


The armhf builders on tests.reproducible-builds.org for Debian do not
systematically test this. I'm not sure about the arm64 builders, but I
think they run the same kernel, so it seems unlikely to be the only
issue. Also surprised the i386 builder doesn't catch this. Then again,
it only happened on my second try, which suggests this is
non-deterministic in some way; maybe the slower armhf/aarch64
architectures trigger it more often?

I'll later post the results of the diffoscope output somewhere and give
a closer look.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Need help with getting a package to build reproducibly on arm*

2021-01-08 Thread Vagrant Cascadian
On 2021-01-07, Vagrant Cascadian wrote:
> On 2021-01-07, Michael Biebl wrote:
>> Am 07.01.21 um 18:24 schrieb Michael Biebl:
>>> as can be seen at [1], systemd does not build reproducibly on armhf and
>>> arm64 (while there is no problem on amd64 and i386).
>>> 
>>> The problem is, I have no idea what the diffoscope diff [2] means and
>>> how I can make the package build reproducibly everywhere or how I can
>>> further investigate this.
>>> 
>>> Any help here is greatly appreciated as I think reproducible-builds are
>>> a great effort and I'd like to support that as much as I can.
>>> 
>>> Regards,
>>> Michael
>>> 
>>> 
>>> [1]
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/systemd.html
>>> [2]
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/systemd.html
>
> My best wild guesses would be parallelism, filesystem ordering or locale
> differences causing various sort ordering differences.
>
> I'm running a local build on arm64 with "reprotest --auto-build" to see
> if it can help give us any better leads, will see if that shows anything
> more useful... it could take some time on not particularly fast
> hardware.

First attempt was reproducible for me (two normalized builds and one
varied build), though I couldn't vary the clock with reprotest
(libfaketime appears to trigger issues with building systemd)... or
fileordering, user, group or hostname due to some limitations in my
typical test environment. The command I ran was:

  reprotest  --verbose --min-cpus=1 
--vary=-user_group,-domain_host,-fileordering,-time auto -- null

So maybe one of those disabled variations, but all those are also varied
on all the platforms that tests.reproducible-builds.org tests for
Debian, so... hrm.


Another possibility is the locale used... reprotest picks more or less
at random, while:

  https://tests.reproducible-builds.org/debian/index_variations.html

  amd64: LANG="et_EE.UTF-8"
  i386: LANG="de_CH.UTF-8"
  arm64: LANG="nl_BE.UTF-8"
  armhf: LANG="it_CH.UTF-8"

Similar for LC_ALL and LANGUAGE.

But I somewhat doubt both nl_BE and it_CH would break in the same
way...


The other thing that's maybe a bit different is parallelism:

  XXX on amd64: 16 or 15
  XXX on i386: 10 or 9
  XXX on armhf: 5 or 3

But the difference between 3-5 cores and 9-10 or 15-16 doesn't seem very
likely to trigger issues either...


Was hoping reprotest would at least point us in a clearer direction for
what to test for ... but not today.

I'll chew on it a bit more and possibly try to stir up some more
possibilities.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Need help with getting a package to build reproducibly on arm*

2021-01-07 Thread Vagrant Cascadian
On 2021-01-07, Michael Biebl wrote:
> Am 07.01.21 um 18:24 schrieb Michael Biebl:
>> as can be seen at [1], systemd does not build reproducibly on armhf and
>> arm64 (while there is no problem on amd64 and i386).
>> 
>> The problem is, I have no idea what the diffoscope diff [2] means and
>> how I can make the package build reproducibly everywhere or how I can
>> further investigate this.
>> 
>> Any help here is greatly appreciated as I think reproducible-builds are
>> a great effort and I'd like to support that as much as I can.
>> 
>> Regards,
>> Michael
>> 
>> 
>> [1]
>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/systemd.html
>> [2]
>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/systemd.html

My best wild guesses would be parallelism, filesystem ordering or locale
differences causing various sort ordering differences.

I'm running a local build on arm64 with "reprotest --auto-build" to see
if it can help give us any better leads, will see if that shows anything
more useful... it could take some time on not particularly fast
hardware.


live well,
  vagrant


signature.asc
Description: PGP signature


Office Hours / Ask Me Anything 2021-01-07 18:00-20:00 UTC

2020-12-07 Thread Vagrant Cascadian
We will set aside some time to be available for asking questions about
anything related to Reproducible Builds.

This is an opportunity to ask introductory questions and is intended to
be welcoming to newcomers, though of course, any questions relating to
Reproducible Builds should be fair game!


We had fun at the first session:

  
http://meetbot.debian.net/reproducible-builds/2020/reproducible-builds.2020-11-30-17.19.log.html

So we are going to try this again!


Our next session is planned for January 7th, 18:00 UTC going until 20:00
UTC:

  https://time.is/compare/1800_07_Jan_2021_in_UTC


The location will be irc.oftc.net in the #reproducible-builds
channel. If you are new to IRC, there is a web interface available:

  https://webchat.oftc.net/?channels=reproducible-builds


Remember to wait for a few minutes after asking a question, to give
people a chance to respond. Once you have joined the channel, even if
there is a conversation already going, jump right in, no need to ask
permission to speak!


Please share this with anyone or with any networks where you think there
might be people interested in Reproducible Builds.


Thanks!


live well,
  vagrant


signature.asc
Description: PGP signature


"Office Hours / Ask Me Anything" 2020-11-30 17:00-20:00 UTC

2020-11-25 Thread Vagrant Cascadian
Hi!

We are experimenting with setting aside some time to be available for
asking questions about anything related to Reproducible Builds.

This is an opportunity to ask introductory questions and is intended to
be welcoming to newcomers, though of course, any questions relating to
Reproducible Builds should be fair game!

Our first session is planned for November 30th, 17:00 UTC going until
20:00 UTC:

  https://time.is/compare/1700_30_Nov_2020_in_UTC

The location will be irc.oftc.net in the #reproducible-builds
channel. If you are new to IRC, there is a web interface available:

  https://oftc.net/WebChat/

Please share this with anyone or with any networks where you think there
might be people interested in Reproducible Builds.


Thanks!


live well,
  vagrant


signature.asc
Description: PGP signature


poll for IRC breakout session: How to debug various distros

2020-11-02 Thread Vagrant Cascadian
At our last IRC meeting, it was decided to host an IRC session about
sharing our distro-specific Reproducible Builds debugging workflows (or
at least, that's what I *thought* we were doing), e.g.:

  https://github.com/bmwiedemann/reproducibleopensuse/blob/devel/howtodebug


We picked the date as 2020-11-16, but have not decided on a time... yet!
Poll is here:

  https://framadate.org/CFKEmKOb3avTbPFe

If you're interested in such a session, please add yourself to the poll!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Help with a local configuration

2020-10-19 Thread Vagrant Cascadian
On 2020-10-19, Elías Alejandro wrote:
> Dear all,
> I hope you are well. I have a newbie question about how to configure
> my local box to reproduce bug[1] and finally fix it. I was following
> [2] but I got a successful message without modifying the source.
> Maybe I need to add another configuration?
> ---
> dpkg-deb: building package 'gpick' in '../gpick_0.2.6~rc1-4_amd64.deb'.
> dpkg-deb: building package 'gpick-dbgsym' in
> '../gpick-dbgsym_0.2.6~rc1-4_amd64.deb'.
>  dpkg-genbuildinfo --build=binary
>  dpkg-genchanges --build=binary >../gpick_0.2.6~rc1-4_amd64.changes
> dpkg-genchanges: info: binary-only upload (no source code included)
>  dpkg-source --after-build .
> dpkg-buildpackage: info: binary-only upload (no source included)
> Reproducible, even when varying as much as reprotest knows how to! :)
> ===
> Reproduction successful
> ===
> No differences in ./*.deb
> 1ee58347431220a1481ac7823e2636c170937be867835040e3aec9bb5bf5b37d
> ./gpick-dbgsym_0.2.6~rc1-4_amd64.deb
> 327be523546805e107daa9bc56d55f488718e68890c756290f87c61aa9a8eade
> ./gpick_0.2.6~rc1-4_amd64.deb
> The last command took 1741.127 seconds.
> -
>
> If you need any more information please let me know.

It would be helpful to say what command you actually ran. :)


> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956591
> [2] https://wiki.debian.org/ReproducibleBuilds/Howto#Testing_procedure

I'll speculate that you were using (since it is in the instructions
you were following):

  $ sudo reprotest --vary=-build_path,domain_host.use_sudo=1 --auto-build 
XXX.dsc -- schroot unstable-amd64-sbuild

Since the issue in bug #956591 is build_path related, you shouldn't
disable the build path variation (e.g. --vary=-build_path).

From the gpick source directory and with the build-depends installed, a
simple test (without chroots, and disabling some variations that require
more complicated setup):

  reprotest --auto-build --store-dir=$(mktemp -d) --min-cpus=10 
--vary=-user_group,-domain_host,-fileordering auto -- null

It told me that there were issues when varying the build_path,
consistent with the reported bug.


So a minimal command that should reproduce the issue:

  reprotest --vary=-all,+build_path auto -- null


If you look at:

  
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/gpick.html

You'll notice that it generally only has reproducibility issues in
"unstable". We don't vary the build path in the testing/bullseye or
stable/buster tests, so that makes sense.


Hope that helps get you to a useful place to debug the issue further!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: GNU Mes rebuild is definitely an application of DDC!

2020-10-12 Thread Vagrant Cascadian
On 2020-10-12, David A. Wheeler wrote:
> In the discussion today I was pointed to this awesome post about
> creating a reproducible bootstrap of the GNU Mes C compiler:
> https://reproducible-builds.org/news/2019/12/21/reproducible-bootstrap-of-mes-c-compiler/
>
> I was asked if this counted as an application of Diverse
> Double-Compiling (DDC). Unless I’m grossly misunderstanding something,
> that is *definitely* an application of DDC! Different compilers are
> being used with the same source code in a special way to verify that
> the results are bit-for-bit identical. That’s what DDC is all
> about.

Great!


> The compilers being used in the DDC process aren’t as diverse
> as one might like, so there are limits to the result (as discussed in
> section 6 of my dissertation).

Indeed.


> But that’s definitely the real deal. In fact, it shows how DDC &
> reproducible builds can work together to provide a very strong
> countermeasure against the trusting trust attack & other kinds of
> maliciously subverted executables.

OK!


> I wrote a summary explaining it here:
> https://dwheeler.com/trusting-trust/#real-world

That sums it up very nicely, thanks!


> If I missed anything, or if anything is wrong, let me know.

Some minor typos:

  s/GNU MeS/GNU Mes/
  s/big-for-bit/bit-for-bit/
  s/distributions GNU Guix, Nix and Debian)/distributions (GNU Guix, Nix and 
Debian)/


> But I think it’s worth noting that this really is an application of
> DDC to gain confidence in a reproducible bootstrap.

Excellent.

Thanks for following up and the good conversation on IRC.


Now we need to step up our compiler diversity and OS diversity for
future tests!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Evaluation of bundling .buildinfo in .deb proposal

2020-08-31 Thread Vagrant Cascadian
On 2020-08-31, kpcyrd wrote:
> I'm a bit short on time, sorry in advance if the email is a little 
> short/blunt:
>
> - What was the original motivation of putting the size and checksum of the
>   package into the buildinfo file? We aren't tracking this info in Arch Linux
>   and it turned out we didn't need those fields to implement a rebuilder.
>   Please consider simply dropping those fields instead of trying to build a
>   tool to work around this.

It's an attestation of "with source X and build environment Y I
produced artifacts Z".

So I guess you're saying if the .buildinfo is embedded in the binary
package, you can get the hashes of the artifacts from the binaries
themselves as, obviously, you already have the binaries.


>   In Arch Linux we consider the buildinfo file a build parameter to ensure the
>   build environment is always identical, but strictly speaking it's not a 
> build
>   output (even though it's generated during the package build, but you can
>   generate it without actually running the whole build). Having access to the 
> build
>   outputs is not necessary and out of scope of "recording the build
>   environment". In my opinion everything in the buildinfo file that goes 
> beyond
>   "a collection of parameters for the build" is feature-creep at the cost of
>   complexity.
>
>   This also solves the .changes problem (if I understood it correctly). The
>   buildinfo file is available very early (as long as you stop referencing 
> build
>   outputs) and you can simply include it when creating the deb in the first
>   place instead of manipulating it afterwards.

I'm definitely coming around to this idea.


> - The current debian reproducible builds effort is very focused on debian.org,
>   but virtually none of that can be downstreamed by debian derivates. Having
>   externally hosted buildinfo files is an effort that every downstream would
>   need to repeat and every rebuilder need to know about. All Arch Linux
>   downstreams I've checked ship buildinfo files, while zero debian downstreams
>   do. This is an advantage that's currently not mentioned yet.

Nice point!


> - The "having the buildinfo file in the binary package is wasteful" argument 
> is
>   a micro optimization that pushes a non-trivial amount of extra complexity on
>   the debian r-b developers. Considering that debian rebuilder tooling is 
> still
>   very sparse due to the lack of developer resources I'm not sure that's a
>   smart trade-off.

It is largely very compressible, though with source packages that
produce multiple binaries it costs a bit more, shipping a copy in each
binary instead of once per source package.

Whether you then extract the .buildinfo when installing the package is
another resource question; Arch Linux does not, though doing so might
make it easier to get that information out of the installed system than
re-fetching it. Might not be worth it, though.

That said, embedding the .buildinfo seems like a much more workable
approach to a problem we've had for so long...


> - I don't understand the concern about source-only uploads. The uploader can't
>   know the build environment that buildd is going to setup, therefore the
>   buildinfo file needs to be generated by buildd anyway.

The .buildinfo is also generated by the buildd. This is part of the
"multiple signers" property that .buildinfo files were attempting to
address. For the initial upload, you typically have one .buildinfo from
the uploader, and one from each of the architectures it was built
on. The buildd on the relevant architecture in this case becomes a
(somewhat sloppy) rebuilder.

If in fact the .buildinfo is embedded in the binary package, and the
.buildinfo itself is made effectively reproducible, it's been pointed
out to me on IRC that you could then just sign the binary package itself.


Even if we do manage to start embedding .buildinfo information inside
the .deb files, we could still produce external .buildinfo files with
checksums just as is done now, though minor differences in build
environment will then always result in differing builds even if the
packages are otherwise reproducible...


> Sorry for being rather Arch centric in this email, but I think it's a good 
> idea
> to ensure you're familiar with how other distros solved the problem that
> debian is facing since a few years.

Thank you very much!


live well,
  vagrant


signature.asc
Description: PGP signature


Re: setting -fdebug-prefix-map via envvar

2020-06-26 Thread Vagrant Cascadian
On 2020-06-26, hartmut wrote:
> a) The build process should be well documented and obvious. Using an
> environment variable internally, inside the compiler, is very bad for
> that, because the value of the envvar may be unknown after the build.
> It is possible to use an environment variable as an argument: gcc ...
> $SPECIALOPTION, but the expanded value should be documented. A good
> idea is documenting the effective command line, maybe via an echo
> statement.

Well, this requires either documenting the build path of the build
environment, or documenting the environment variable that was used to
sanitize the build path. Either way, the compiler is embedding
information about the build environment that is potentially
non-deterministic. Which is why we document the build environment.

Creating a build in the same build path in many cases requires root
privileges of some form; using an environment variable is *much* easier
to set without any special privileges.
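
The "same build path" workaround typically looks something like this (a
sketch; the /build/pkg path is arbitrary, and note the root privileges
it needs compared to merely exporting an environment variable):

  $ sudo mkdir -p /build/pkg
  $ sudo mount --bind "$PWD" /build/pkg
  $ cd /build/pkg && dpkg-buildpackage -us -uc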


> b) Absolute paths at build time: in my experience the absolute path is
> stored in object files while compiling and influences the machine code,
> unexpectedly and unfortunately, but it was so with some compilers. My
> solution in such cases was re-compiling under the same absolute paths,
> storing the "sandbox" of sources in the same absolute path on any
> machine. A symbolic link on Unix is helpful. Because I needed to do
> that on a Windows PC (XP), I used the subst command: "subst B:
> D:\my\special\path" and started the compilation from "B:\localmake".
> Hint: on Windows the B: drive was a diskette in the old days, so it is
> usually free to use.

Sure, building in the same path is the easiest way to work around these
sorts of issues, if you can easily control the build path. Which is why
in the reproducible builds testing for Debian the "unstable" suite
varies the build path and the "testing" suite builds in the same build
path. This gives us a rough idea to compare how many packages are
affected by this issue.

Long-term, we should address the build-path issue, but in the short to
middle term, it seems to me there are other more important issues to
address, since it is relatively easy to work around...


live well,
  vagrant


signature.asc
Description: PGP signature


Re: setting -fdebug-prefix-map via envvar

2020-06-25 Thread Vagrant Cascadian
On 2020-06-26, Bill Allombert wrote:
> Is it possible to set -fdebug-prefix-map via an environment variable or
> a similar mechanism rather than through the command line ?
>
> The issue is that adding -fdebug-prefix-map=PREFIX to CFLAGS
> leads to PREFIX leaking in buildlogs and in generated Makefiles and 
> similar files that can end up inside packages, making previously
> reproducible packages unreproducible.

We had proposed BUILD_PATH_PREFIX_MAP to address exactly these kinds of
issues:

  https://reproducible-builds.org/specs/build-path-prefix-map/

Unfortunately it was not accepted into upstream GCC as they did not like
accepting input from an environment variable (yes, they let
SOURCE_DATE_EPOCH slip through!) ... not sure if many or any other
compilers have adopted it.
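
To make the distinction concrete (a sketch; the second command is
hypothetical, since GCC never merged the proposal, and the spec writes
map entries as dst=src):

  # status quo: the mapping rides on the command line, so the build
  # path leaks into build logs and generated Makefiles:
  $ gcc -g -fdebug-prefix-map="$PWD"=/usr/src/pkg -c foo.c

  # the proposal: the same mapping supplied via the environment:
  $ BUILD_PATH_PREFIX_MAP="/usr/src/pkg=$PWD" gcc -g -c foo.c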

It would be good to revisit at some point if there is any way we could
conceivably get GCC developers on board with some variant of the
proposal. Maybe try getting into other toolchains first?


My best thought was --build-path-from-env=BUILD_PATH_PREFIX_MAP to at
least make it explicit as to which environment variable was affecting
behavior, without embedding the value of the environment variable in the
commandline... but I didn't get positive feedback on this approach back
when I presented at GNU Tools Cauldron last year and floated the idea.


live well,
  vagrant


signature.asc
Description: PGP signature


RE: [EXTERNAL] Re: Reproducible Builds Verification Format

2020-06-04 Thread Vagrant Cascadian
On 2020-05-15, Jason Zions via rb-general wrote:
> kpcyrd:
>> The argument was that a debian/arch rebuilder *always* needs to take
>> the buildinfo file as a rebuild input. That's the reason the buildinfo is
>> shipped inside the arch package, collecting detached buildinfo files is a
>> debian thing, but only the buildinfo file for the build that was actually
>> uploaded into the archive is useful for anything.

> This is one of the challenges we face today. The buildinfo file is
> required to rebuild a package. That's fine, most of the time.

I very much like nudging this conversation towards buildinfo files, and
you bring up a very interesting use-case...


> When an upstream team issues a patch (e.g. a fix for a security
> issue), I need to build the updated package immediately and get it
> into the hands of my users. It's often the case that, when I build
> that patched package, there's no buildinfo file yet because a build
> hasn't yet appeared in the Debian repo.

I do find it implausible to reproduce a .deb package without having
identical source, if for no other reason than the debian/changelog file
will contain differences...

So are you suggesting we do per-file comparisons of reproducibility
within the individual .deb files? Off the top of my head, it has some
downsides (more complicated comparison process) and some big upsides
(finer-grained comparisons could have a higher reproducibility hit rate
for the bits you actually care about). Though it would be a big shift
from the current direction.
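
Something like the following is roughly what that would mean (a sketch
with hypothetical file names; diffoscope already recurses into archive
members like this):

  $ dpkg-deb --fsys-tarfile one.deb > one.tar
  $ dpkg-deb --fsys-tarfile other.deb > other.tar
  $ diffoscope one.tar other.tar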


> It might be 24-48 hours before that package appears. For security
> patches, that delay is troublesome.

Typically these days, at least the developer's .buildinfo file is
uploaded simultaneously with the source, and not long after for most
architectures, the .buildinfo produced by one of the "buildd" machines.

But, as you're probably aware, the official mirror archive doesn't
publish the .buildinfo files publicly (https://bugs.debian.org/763822),
so we only have services like buildinfos.debian.net and
buildinfo.debian.net (yes, two different sites!) which typically involve
some delays before the .buildinfo lands... it should be in the ballpark
of less than 6 hours, maybe 12 hours in the worst case, but not 24-48
hours.

I dream of someday moving debian towards doing source-only builds and
the .deb files only land in debian if multiple builds successfully
reproduce the build... but that would likely add even further delays to
the binary release!


> In Marrakesh, we talked about the distro maker just being one among
> multiple rebuilders; often the "first one in", but not required to be
> first in. It seems to me that we'd want each rebuilder to
>  - build without an input buildinfo for a given source if none is
>available from the clearinghouse
>  - record its output buildinfo and checksum information in the
>clearinghouse

I do very much like this; I prefer models where there is no "official"
.buildinfo files, just a bunch of .buildinfo files produced by various
parties which are essentially attestations of "with source X and
toolchain Y I produced artifacts Z" that can be analyzed for comparison
(ideally in an automated or semi-automated fashion), perhaps by a
"clearinghouse" service like you're suggesting?


I'm a little nervous at all the discussions about "trusted rebuilders"
with a binary reproducible/not reproducible result; it looses a lot of
potential information valuable for diagnosis. That said, if "trusted
rebuilders" means *something* actually gets implemented sooner since it
is simpler to implement, so be it!


> Most of the time, the Debian build process would be first-in; it would
> build without an input buildinfo and record the buildinfo and
> checksums in the clearinghouse. Rebuilders would then rebuild, using
> the recorded buildinfo, and record the checksum they got. Any
> differences would trigger email to all the builders.

> If rebuilders got out ahead of the Debian process, they would build
> (without buildinfo), then record their buildinfo and
> checksums. Multiple rebuilders in parallel might do this; as soon as
> the second rebuilder completes, conflicts would be detected and raised
> to human eyes for resolution.

Again, I'm curious if you mean building the same published .dsc (debian
source code) file as Debian builds, or some locally created variant of
the source?


> Marek:
>> I have built package , version , with source hash  and
>> got binary package(s)  with hash .
>> -- signed by (re)builder 
>> 
>> Other information, like what rebuilder needs to know, or what 
>> environment was used etc could be optional, or even totally separate.
>> And in fact, we do have a format for that extra info already: 
>> buildinfo file. And I think that should be kept separated.

> That's insufficient for the "rebuilders are out ahead of the distro
> maker" scenario I outlined above. Rebuilders who structure their
> rebuild environment to duplicate (as much as po

Re: Build reproducibility metrics

2020-06-04 Thread Vagrant Cascadian
On 2020-06-03, Christopher Baines wrote:
> Combining that with the substitute server operated by Tobias, which has
> a pretty awesome substitute availability of over 90% for recent
> revisions, not only is there data from 4 different substitute servers to
> use in the comparison, but the proportion of packages where there isn't
> sufficient data is pretty low, below 10%.
>
> I'm currently using the data.guix-patches.cbaines.net instance of the
> Guix Data Service, you can see the package substitute availability for
> the latest revision using this URL [1], and the package reproducibility
> at this URL [2].
>
> 1: 
> https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-substitute-availability
> 2: 
> https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-reproducibility
>
> Some caution is needed when interpreting this data. It's most probably
> less up to date than what you'd get through running the guix weather or
> guix challenge commands, as it takes the Guix Data Service time to query
> the data, that querying process isn't very reliable at the moment
> either. Additionally, the "matching" percentage could easily go down if
> that output is built with a different hash in the future.
>
> While the number itself maybe isn't the most useful thing, I like that
> clicking through to the "Not matching" outputs will show a list of
> outputs which didn't build reproducibly, which is something that could
> help identify reproducibility issues to investigate and fix.
>
> I think things are coming together on the substitute server side. The
> goal I have in mind for this is for users of Guix to be able to have
> greater trust in the substitutes they use, through trusting substitutes
> only if it's been built reproducibly on multiple substitute servers. It
> would be great to see work start soon on how guix as a client to
> substitute servers might be enhanced to check for reproducibility when
> fetching substitutes.

Really glad to see great progress on this!

I've CC'ed the reproducible builds list, as others might be interested
to see too! Original post:

  https://lists.gnu.org/archive/html/guix-devel/2020-06/msg00034.html

live well,
  vagrant


signature.asc
Description: PGP signature


Re: Link to weekly news broken on rb debian page sidebar

2020-03-29 Thread Vagrant Cascadian
On 2020-03-29, Boyuan Yang wrote:
> I'm just writing to let you know that the link to weekly news at the
> sidebar of https://tests.reproducible-builds.org/debian/reproducible.html
> is now broken. It still uses the old alioth-related URL. Maybe it
> should be replaced by https://reproducible-builds.org/news/ ?

Thanks!

Pushed fix:

  
https://salsa.debian.org/qa/jenkins.debian.net/-/commit/cc3c9b0466477989a14c4dadbfaa09921b494877

Not sure how long it will take to propagate, but it'll get there
eventually.


live well,
  vagrant


signature.asc
Description: PGP signature


Re: Research on Reproducible Builds

2020-03-04 Thread Vagrant Cascadian
On 2020-03-05, Chris Lamb wrote:
> David A. Wheeler wrote:
>> The paper's description of date handling sounds odd, where "dates" are 
>> really counts of system calls. If the "starting date" is arbitrary (like Jan 
>> 1, 1970) that would look odd. But if the "starting date" were forcibly set 
>> to a 
>> human-reasonable value (like the date-time of the last commit, or of the 
>> latest source 
>> file), then it might be easier to accept the results.
>
> This was curious to me too -- to wit, the paper describes that the Debian
> «wheezy» distribution was being built so it was interesting to me that
> the first timestamp in the debian/changelog was not chosen, à la
> SOURCE_DATE_EPOCH.

If I recall correctly, wheezy was chosen precisely because it had *fewer*
reproducibility fixes ... so that they could more easily tell how much
dettrace solved...

live well,
  vagrant

