Some of these dreams and the outlines of these concepts have been around quite a bit longer than this year, even.  I think some differential diagnosis about what makes this draft different, and why it makes the choices it does, would be useful.

Some things I'd like to see identified and explicitly discussed more frequently in this concept space:

- What's the "primary key"?  In other words, how can I meaningfully expect to identify this one attestation record, or this one build instruction document?

- What are the "secondary keys" I could plausibly expect to select on if I have a zillion of these, and want to find those that should or should not align in results?

- What parts of this info do we expect to be useful, and why?  (What user story caused a certain piece of info to seem relevant and actionable enough to include?)

- What things *could* we imagine someone proposing to put in this info which we might reject because we don't believe it would be useful, and why?
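To make the "key" questions concrete, here is a minimal sketch of what a rebuild attestation record might look like with its identifying fields called out explicitly.  All field names here are my invention for illustration, not taken from the rbvf draft:

```python
# Hypothetical rebuild attestation record.  Every field name below is
# illustrative only -- none are taken from the actual rbvf draft.
record = {
    # Candidate "primary key": just enough to pin down exactly one
    # rebuild attempt of exactly one artifact.
    "artifact_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "rebuilder_id": "rebuilder.example.org",
    # Candidate "secondary keys": fields one would select or group on
    # when comparing a zillion records from different rebuilders.
    "distribution": "archlinux",
    "package": "hello",
    "version": "2.10-3",
    # The payload: the actual result being attested.
    "reproducible": True,
}

# A compound primary key: which artifact, as rebuilt by whom.
primary_key = (record["artifact_sha256"], record["rebuilder_id"])
```

Two records from different rebuilders that share the same `artifact_sha256` but disagree on `reproducible` are exactly the interesting case such keys let you find.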

The motivations of "a generic way to compare results" are good.  But good intentions can only carry us so far.  These four things are some of the first considerations I have when looking at a format proposal.  Without some thought about the "keys", I don't know how it will deliver on "comparability" at scale.  Without some meta-documentation of not just the data that goes _in_, but also the kind of data that _doesn't_, I worry that the spec will become a kitchen sink, sopping up more data with time regardless of its relevance, and correspondingly becoming less and less useful over time.

I don't know if these are the only four questions to ask, nor will I claim they are perfect, but they're some of the first things that come to my mind as heuristics, and I share them in the hope that they can be a useful whetstone for someone else's thoughts.



As an incidental aside, I think what's currently listed in that github link as "origin_uri" may be mistaken in its conception of "URI".  The examples are such things as "http://ftp.us.debian.org/" and "https://download.docker.com/", and I'm sure these are _locations_, not _identifiers_ -- URLs, not URIs.

And I would question (begging forgiveness from anyone who knows my refrain already) whether "locations" as any sort of primary key are a sturdy idea to build upon.  They're terribly centralized, and they provide very little insurance against mutability events, which can make every other document that refers to them instantly useless.  Content-addressing may have some potential to address this; git (at least in concept) has shown us the way...
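The content-addressing idea above can be sketched in a few lines: derive the identifier from the bytes themselves, so it stays valid no matter which mirror serves them or whether the original URL ever changes.  This is only an illustration of the principle, not anything from the rbvf draft:

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive a location-independent identifier from the bytes themselves,
    in the spirit of git's object model: same content, same address,
    regardless of where (or whether) the content is currently hosted."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Two mirrors serving identical bytes yield the identical address,
# so documents referring to it never dangle when a mirror moves.
addr = content_address(b"example package contents")
```

A URL can still be carried alongside as a *hint* for where to fetch the bytes; the point is only that the hint should not double as the key.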



Cheers to all hopeful rebuilders :)


On 5/12/20 11:00 PM, Paul Spooren wrote:
Hi all,

at the RB Summit 2019 in Marrakesh there were some intense discussions about
*rebuilders* and a *verification format*. While first discussed only with
participants of the summit, it should now be shared with a broader audience!

A quick introduction to the topic of *rebuilders*: Open source projects usually
offer compiled packages, which is great in case I don't want to compile every
installed application. However, it raises the question of whether distributed
packages are what they claim to be. This is where *reproducible builds* and
*rebuilders* join the stage. The *rebuilders* try to recreate offered binaries,
following the upstream build process as closely as necessary.

To make the results accessible, storable, and usable by tooling, they
should all follow the same schema: hello, *reproducible builds verification
format* (rbvf). The format tries to be as generic as possible to cover all open
source projects offering precompiled packages. It stores the rebuilder
results of what is reproducible and what is not.

Rebuilders should publish those files publicly and sign them. Tools then collect
those files and process them for users and developers.

Ideally, multiple institutions spin up their own rebuilders so users can trust
those rebuilders and only install packages verified by them.

The format is just a draft, so please join in and share your thoughts. I'm happy
to extend, explain and discuss all the details. Please find it here[0].

As a proof of concept, there is already a *collector* which compares upstream
provided packages of Archlinux and OpenWrt with the results of rebuilders.
Please see the frontend here[1].

If you already perform any rebuilds of your project, please contact me about how
to integrate the results into the collector!

Best,
Paul


[0]: https://github.com/aparcar/reproducible-builds-verification-format
[1]: https://rebuild.aparcar.org/

