Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-21 Thread Ximin Luo
Jonathan McDowell:
> On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote:
>> I have trouble imagining what could make Buildinfo.tgz hard, but make
>> Buildinfo.xz easy - could you explain this in more detail, please?
> 
> Debian's archive information is largely stored within a database; things
> like the Packages and Contents files are generated each archive run from
> this database, rather than incrementally updating a file. It is easy to
> generate a Buildinfo.xz file from information contained within the
> database (I have some proof-of-concept code locally that does the
> beginnings of this), but generating a tar file like you are describing
> is either a case of storing each .buildinfo in the database and
> generating the tar each run, or adding and deleting files to an existing
> tarball. It seems overly intensive and doesn't really seem to scale.
> 
>> Regarding the OpenPGP signatures, they are vital - but I also see no
>> need to strip them in your model. From the point-of-view of the FTP
>> archive, there is no immediate need to read or understand the contents
>> of the buildinfo file. [*] It's just a dumb data blob, it shouldn't
>> matter to Debian whether it's clearsigned or not.
> 
> What I was trying to do with my proposal was turn it from being a dumb
> data blob which wasn't easily mapping to the Debian infrastructure, to
> something where almost all the information (everything except the actual
> signature from the original builder) could be provided alongside the
> binaries themselves, enabling people to have what they required to
> confirm they could reproduce the builds themselves. *I* think this is
> incredibly useful, even if it doesn't achieve everything possible with
> reproducible-builds, and I also think that it would provide a sound
> basis for another Debian service (perhaps under debian.net to start
> with) where multiple builders (starting with the original builder) would
> be able to upload their claims, based directly off the buildinfo
> information from the archive network. Yes, that's probably an extra
> step for the original builder, but it also (to me) seems to be more
> flexible and a stronger statement as multiple independent builders can
> all confirm things in a single place.
> 
> It sounds like this isn't compatible with where reproducible-builds is
> heading though, so apologies for the noise.
> 

I don't mean to suggest a database is not useful. I thought I was talking to
ftp-masters through you, so I wanted to be very clear about the security
properties we're aiming for, and get a common understanding about that first.

But I'm not sure why you say it's incompatible - could you not also store the
detached signatures within the database, and generate the original file
(including signature) from this and the other information? The signatures are
much smaller than the rest of the file.
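
To illustrate the idea of storing the signature next to the other data and
regenerating the uploaded file on demand, here is a minimal sketch. The
function names and the three-column split are assumptions made for
illustration, not anything dak does today; only the armour marker lines come
from the OpenPGP clearsign format. No cryptography is performed here: the
point is only that the original bytes can be reproduced exactly, so a later
signature check by the end user still sees what the builder signed.

```python
# Armour marker lines from the OpenPGP cleartext signature format.
MSG_BEGIN = "-----BEGIN PGP SIGNED MESSAGE-----"
SIG_BEGIN = "-----BEGIN PGP SIGNATURE-----"


def split_clearsigned(text):
    """Split a clearsigned document into (header, body, signature) strings.

    The body is kept exactly as uploaded (still dash-escaped), so joining
    the three parts gives back the original file byte for byte.
    """
    if not text.startswith(MSG_BEGIN + "\n"):
        raise ValueError("not a clearsigned document")
    # The first blank line ends the armour headers ("Hash: ...").
    head_end = text.index("\n\n") + 2
    # Body lines starting with a dash are escaped, so this marker can only
    # match the real signature block.
    sig_start = text.index("\n" + SIG_BEGIN, head_end - 1) + 1
    return text[:head_end], text[head_end:sig_start], text[sig_start:]


def join_clearsigned(header, body, signature):
    """Regenerate the uploaded file from the three stored parts."""
    return header + body + signature
```

The body could then be parsed into database fields as usual, as long as the
exact uploaded text is also kept (or can be regenerated exactly) for this
join step.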

In fact, we do indeed have longer-term plans for Debian infrastructure to look
into this data rather than treat it as an opaque data blob - for example, buildds
themselves could try to reproduce a given buildinfo uploaded by a DD, and send
alerts about packages that can't be reproduced. (I hinted at this by the "more
advanced" behaviours I mentioned in my previous email.) But I wanted to start
off with a simple yet strongly-secure model first.

What I described is not supposed to contradict the ability for users to
"confirm they could reproduce the builds themselves". As I mentioned, a
major use-case is to allow others to download "all the buildinfo files for a
given binary package", then they check this locally.

Perhaps the confusion is in the suggestion of a single Buildinfo.tgz. Let me
disclaim this for now - I wasn't present for the discussions around why all of
this information needs to be in one file, and it actually does *not* make sense
to me. An obvious alternative is to cat all the buildinfo files for a given
source package into one $source-$version.buildinfos.gz file and store this in
pool/. This would also make it easy to look up buildinfo files for a given
binary later. Could someone tell me why this approach isn't suitable?
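
A rough sketch of what "cat into one .gz file" could look like, assuming each
buildinfo is written as its own gzip member. The function names are
hypothetical; the file name pattern is the one suggested above. This relies
on the fact that concatenated gzip members decompress to the concatenation of
their contents.

```python
import gzip


def pool_name(source, version):
    """File name following the $source-$version.buildinfos.gz suggestion."""
    return f"{source}-{version}.buildinfos.gz"


def pack_buildinfos(buildinfos):
    """Compress each buildinfo text as one gzip member and concatenate them."""
    return b"".join(gzip.compress(text.encode("utf-8")) for text in buildinfos)


def unpack_buildinfos(blob):
    """Decompress all members; the result is the files cat'ed together."""
    return gzip.decompress(blob).decode("utf-8")
```

One consequence of this layout is that adding a buildinfo from another
builder later is a plain append of one more member, without rewriting the
existing data.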

Now going back to "users confirming rebuilds":

The reason why I started off with this high-security dumb-data-blob approach is
to make the security arguments and reasoning very simple and obvious, so it's
harder to accidentally weaken or subvert it in the future. Debian isn't even
involved in the security logic - it's purely the end-user verifier program.

Another benefit of signatures is that they give you more information in the
cases where you might not want to build it yourself (e.g. very large programs).
If you strip this information, then only Debian is "attesting" to a particular
hash (which it didn't even build). If you keep this information, then you can
aggregate multiple people's attempts to build a given binary.
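
As a small illustration of that aggregation, assuming the signatures have
already been verified and each claim has been reduced to a (builder,
artifact, sha256) tuple (this tuple layout and the function name are
hypothetical):

```python
from collections import defaultdict


def aggregate_claims(claims):
    """Map each (artifact, sha256) pair to the set of builders attesting it.

    `claims` is an iterable of (builder, artifact, sha256) tuples taken from
    buildinfo files whose signatures were checked beforehand.
    """
    attesters = defaultdict(set)
    for builder, artifact, sha256 in claims:
        attesters[(artifact, sha256)].add(builder)
    return dict(attesters)
```

A user who does not want to rebuild a very large program could then look at
how many independent builders agree on the hash they downloaded, and whether
anyone attests to a different one.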

Eventually we could have buildinfo-only uploads, just like we have binary-only
or source-only uploads. Then for important binaries 

Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-21 Thread Ximin Luo
Jonathan McDowell:
> On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote:
>> Note that the builder is a *distinct entity* from the distribution.
>> It's important to keep the *original* signature by B on C. It breaks
>> our security logic, to strip the signature and re-sign C using (e.g.)
>> the Debian archive release keys - because the entity in charge of this
>> release key is not the one that actually performed the build. Doing
>> this, would allow malicious builders to re-attribute their misdeeds to
>> look like it's the fault of Debian.
> 
> Debian already does this in the context of the fact that Package files
> etc are signed by the archive key. It's possible to go and grab the .dsc
> file to see who did the file build, but day-to-day no one is using these
> to verify the binaries they receive. I care more that Debian stands
> behind the packages I download than being able to verify individually
> who built each of the packages I'm running - there's no meaningful way I
> can attribute trust to *all* of the people who packaged something I have
> installed.
> 

You have this backwards.

"Being able to verify individually who built each of the packages I'm running"

is *exactly* what is required to *not* have to

"attribute trust to *all* of the people who packaged something I have
installed."

and that is one major (probably the main) goal of R-B.

Now that I point this out - do you agree, and does it change your mind on 
anything you previously said?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds


Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-21 Thread Jonathan McDowell
On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote:
> Jonathan McDowell:
> > Having been impressed by the current status of reproducible builds
> > and the fact it looks like we're close to having the important
> > pieces in Debian proper, I have started to have a look at how I
> > could help out with this bug. I've done some poking around in the
> > dak code, and think I have a vague idea of how to achieve what I
> > think is wanted.
> > 
> > First, it is helpful to describe what I think is wanted. What I
> > think we need is the archive network to have, alongside the binary
> > packages it contains, details of exactly how to build those
> > binaries. This is, I believe, the information contained in the
> > .buildinfo files.
> > 
> In our newest discussions, this purpose is secondary. The primary
> purpose of buildinfo files is to record what *one particular builder
> actually did in order to produce some output*. Or, equivalently:
>
>   | A buildinfo file, abstractly, is a *claim* C by some builder entity B that
>   | "I executed process P with env/input I to produce output results R".
>
> This latter form is slightly easier to reason about, in terms of
> security properties. We securely bind the claim C (the contents of the
> buildinfo file) to the entity B using a cryptographic signature.

I think the problem here is it's not clear (on either side) who "we" or
"our" means. Different people want different things from reproducible
builds, or have different opinions about relative priorities.

As a *minimum* I think distributions should be providing the information
of how a particular binary was produced. I suppose what it sort of maps
to is "I executed process P with env/input I to produce output results
R" (though, of course, distros already provide R; that's the binaries
shipped). You've used all the letters I might want to refer to it by, so
let's call it Z.

The claim, C, is a signature over Z by B. It's useful extra information,
but it's not required for me to ensure that the source I have builds the
binaries I have.

> Note that the builder is a *distinct entity* from the distribution.
> It's important to keep the *original* signature by B on C. It breaks
> our security logic, to strip the signature and re-sign C using (e.g.)
> the Debian archive release keys - because the entity in charge of this
> release key is not the one that actually performed the build. Doing
> this, would allow malicious builders to re-attribute their misdeeds to
> look like it's the fault of Debian.

Debian already does this in the context of the fact that Package files
etc are signed by the archive key. It's possible to go and grab the .dsc
file to see who did the file build, but day-to-day no one is using these
to verify the binaries they receive. I care more that Debian stands
behind the packages I download than being able to verify individually
who built each of the packages I'm running - there's no meaningful way I
can attribute trust to *all* of the people who packaged something I have
installed.

> Now back to the "secondary" purpose:
> 
> Using this information "B claims C", other reproduction programs
> (that we're also developing) can attempt to actually reproduce the
> binaries described. It would do this, by (1) reading the buildinfo
> file (2) recreating _some_ of the environment stored in C, and (3)
> executing the process, and see if it gives R.

You don't need the signature to validate the reproducibility.

> The "_some_" in clause (2) is currently up-for-debate, but the
> important thing is that this can be changed in the future *without
> affecting already-produced buildinfo files*. It may even well be the
> case that in the future we'd want to support different values for
> "_some_" for a given reproduction tool.
> 
> The main point is that, this is not a concern of the producer nor
> distributor of the buildinfo files. I.e.: you guys (the FTP team) only
> have to care about making these signed-claims available to be
> downloaded by users, and it is up to the users to run a tool that
> "interprets" these claims for purposes such as actually attempting
> reproduction of a binary.

To clarify: I am not a member of the FTP team and do not claim to
represent them. I am a DD who was present at the DebConf talk about
reproducible builds, was impressed by how far it's come, and asked how I
could help get what was missing and still required into Debian.

> In this way, we achieve full end-to-end security properties
> (verifiability of build) between the producers (builders) and
> consumers (users). Distributors only need to care about availability,
> they take no part in the security (except for the case where they are
> also a builder, as noted already).

I think I take a less strict view on this, which may be where some of
the disconnect comes from. I care that Debian stands behind its builds.
I'd like the builder claims to be available (and my original mail did
talk about the fact I didn't think I 

Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-20 Thread Ximin Luo
Hey, Lunar has stopped doing reproducible builds as a regular thing, and I'm
taking over his previous responsibilities. I was also the main other person in
formulating the ideas behind the "next iteration" of buildinfo, that dkg
described in message #10 earlier in this thread, with Message-ID
<87vb8f58rg@alice.fifthhorseman.net>.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763822#10

Jonathan McDowell:
> Having been impressed by the current status of reproducible builds and
> the fact it looks like we're close to having the important pieces in
> Debian proper, I have started to have a look at how I could help out
> with this bug. I've done some poking around in the dak code, and think I
> have a vague idea of how to achieve what I think is wanted.
> 
> First, it is helpful to describe what I think is wanted. What I think we
> need is the archive network to have, alongside the binary packages it
> contains, details of exactly how to build those binaries. This is, I
> believe, the information contained in the .buildinfo files.
> 

In our newest discussions, this purpose is secondary. The primary purpose of
buildinfo files is to record what *one particular builder actually did in order
to produce some output*. Or, equivalently:

  | A buildinfo file, abstractly, is a *claim* C by some builder entity B that
  | "I executed process P with env/input I to produce output results R".

This latter form is slightly easier to reason about, in terms of security
properties. We securely bind the claim C (the contents of the buildinfo file)
to the entity B using a cryptographic signature.
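
To make the abstract form concrete, here is a minimal sketch of a claim as a
data structure. The class and field names are hypothetical, and JSON is used
only to get a deterministic byte string for the example; real .buildinfo
files use a Debian control-file style format, and the signature itself would
be made with OpenPGP over the file, which is not shown.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """C: "I executed process P with env/input I to produce output results R"."""

    process: str   # P, e.g. the build command that was run
    inputs: dict   # I, e.g. source checksum and installed build dependencies
    outputs: dict  # R, mapping artifact name to checksum

    def to_bytes(self):
        """Deterministic serialisation: the bytes builder B would sign."""
        doc = {
            "process": self.process,
            "inputs": self.inputs,
            "outputs": self.outputs,
        }
        return json.dumps(doc, sort_keys=True).encode("utf-8")
```

The signature by B covers exactly these bytes, so anyone changing P, I or R
afterwards invalidates the binding between B and C.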

Note that the builder is a *distinct entity* from the distribution. It's
important to keep the *original* signature by B on C. It breaks our security
logic, to strip the signature and re-sign C using (e.g.) the Debian archive
release keys - because the entity in charge of this release key is not the one
that actually performed the build. Doing this, would allow malicious builders
to re-attribute their misdeeds to look like it's the fault of Debian.

(Of course there is the special case where the builder *is* Debian, but even in
this case it's good practise to have separate keys for every buildd, plus a
separate release signing key. We can discuss these details separately though.)

Anyway, that's our "next iteration" definition of buildinfo files, along with a
simplified discussion of the rationale. I wrote down more elsewhere, but I'll
keep this short for now, to avoid overwhelming readers.

Now back to the "secondary" purpose:

Using this information "B claims C", other reproduction programs (that we're
also developing) can attempt to actually reproduce the binaries described. It
would do this, by (1) reading the buildinfo file (2) recreating _some_ of the
environment stored in C, and (3) executing the process, and see if it gives R.
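
The three steps can be sketched roughly as follows. This is not one of the
reproduction programs mentioned above; all names are hypothetical, the claim
is assumed to be already parsed into plain dicts, and the actual build is
passed in as a callable so that the example stays self-contained.

```python
import hashlib


def sha256_hex(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def check_reproduction(claimed_outputs, environment, rebuild, keys_to_recreate):
    """Return the artifacts whose rebuilt checksum differs from the claim.

    claimed_outputs: R, mapping artifact name -> claimed sha256 hex digest.
    environment: the full environment recorded in the claim C.
    rebuild: callable taking the partial environment, returning name -> bytes.
    keys_to_recreate: the parts of the environment this tool recreates,
        i.e. the "_some_" in step (2).
    """
    # Step (2): recreate only the selected part of the recorded environment.
    partial_env = {k: environment[k] for k in keys_to_recreate if k in environment}
    # Step (3): run the build and compare each result against R.
    rebuilt = rebuild(partial_env)
    mismatches = []
    for name, claimed in sorted(claimed_outputs.items()):
        data = rebuilt.get(name)
        if data is None or sha256_hex(data) != claimed:
            mismatches.append(name)
    return mismatches
```

Because keys_to_recreate belongs to the tool rather than to the buildinfo
file, it can change later without touching claims that were already produced.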

The "_some_" in clause (2) is currently up-for-debate, but the important thing
is that this can be changed in the future *without affecting already-produced
buildinfo files*. It may even well be the case that in the future we'd want to
support different values for "_some_" for a given reproduction tool.

The main point is that, this is not a concern of the producer nor distributor
of the buildinfo files. I.e.: you guys (the FTP team) only have to care about
making these signed-claims available to be downloaded by users, and it is up to
the users to run a tool that "interprets" these claims for purposes such as
actually attempting reproduction of a binary.

In this way, we achieve full end-to-end security properties (verifiability of
build) between the producers (builders) and consumers (users). Distributors
only need to care about availability, they take no part in the security
(except for the case where they are also a builder, as noted already).

> This bug has previously talked about a tarball of .buildinfo files,
> presented as Buildinfos.tgz alongside the Packages file. From looking at
> the current architecture of dak I do not believe that this is an easy
> option.
> 
> I propose instead a Buildinfo.xz (or gz or whatever) file, which is a
> single text file containing all of the buildinfo information that
> corresponds to the Packages list. What is lost by this approach are the
> OpenPGP signatures that .buildinfo files can have on them. I appreciate
> this is an important part of the reproducible builds aim, but I believe
> one of its strengths is the ability for multiple separate package builds
> to attest that they have used that buildinfo information to build the
> exact same set of binary artefacts. This is not something that easily
> scales on the archive network and I think it is better served by a
> separate service; it would be possible to take the package snippet from
> the buildinfo file and sign that alone, uploading the signature to the
> attestation service. For "normal" Debian operation the usual archive
> signatures would provide a basic level of attestation of chain 

Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-03 Thread Johannes Schauer
Hi Jonathan,

Quoting Jonathan McDowell (2016-07-25 22:29:39)
> Having been impressed by the current status of reproducible builds and
> the fact it looks like we're close to having the important pieces in
> Debian proper, I have started to have a look at how I could help out
> with this bug. I've done some poking around in the dak code, and think I
> have a vague idea of how to achieve what I think is wanted.

Having tried hacking dak myself, I want to especially thank you for looking
into that!

> (Additionally it is not clear to me where the dpkg status for buildinfo
> creation is; I have heard that it's close to happening, but I can't find
> anything on recent list archives about it - pointers appreciated!)

You are probably aware of #138409?

It scrolled out of my IRC history already but I think guillem said in
#debian-dpkg that releasing a dpkg version with buildinfo support was blocked
by coordination with dak because he wants to make sure that dpkg support aligns
with what dak ends up supporting.

Thanks!

cheers, josch



Re: [Reproducible-builds] Moving towards buildinfo on the archive network

2016-08-02 Thread Vagrant Cascadian
On 2016-07-25, Jonathan McDowell wrote:
> I propose instead a Buildinfo.xz (or gz or whatever) file, which is a
> single text file containing all of the buildinfo information that
> corresponds to the Packages list. What is lost by this approach are the
> OpenPGP signatures that .buildinfo files can have on them. I appreciate
> this is an important part of the reproducible builds aim, but I believe
> one of its strengths is the ability for multiple separate package builds
> to attest that they have used that buildinfo information to build the
> exact same set of binary artefacts. This is not something that easily
> scales on the archive network and I think it is better served by a
> separate service; it would be possible to take the package snippet from
> the buildinfo file and sign that alone, uploading the signature to the
> attestation service. For "normal" Debian operation the usual archive
> signatures would provide a basic level of attestation of chain of build
> information.
>
> The rest of this mail continues on the above assumptions. If you do not
> agree with the above the below is probably null and void, so ignore it
> and instead educate me about what the requirements are and I'll try and
> adjust my ideas based on that.
>
> So. If a single Buildinfo.xz file is acceptable, with the attestation
> being elsewhere, I think this is doable without too much hackery in dak.
> There are some trade-offs to make though, and I need to check which are
> acceptable and which are viewed as too much.

I just wanted to give a huge thanks for taking a good look at this, even
if it isn't exactly what has been specced out by earlier
reproducible-builds discussions. Evaluating a somewhat different
approach, especially if it turns out to be more feasible (at least from
some angles), is really valuable in my eyes.

FWIW, I wasn't involved in the discussions spelling out what the
reproducible builds project wanted in the archive, so I don't have much
concrete to say, but you've clearly given some serious thought and
effort to this, so I didn't want it to slip through the cracks!

I tried to read through some of the documentation I could find:

  https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification
  https://reproducible-builds.org/events/athens2015/debian-buildinfo-review/
  https://reproducible-builds.org/events/athens2015/buildinfo-content/

Having reviewed the above, there doesn't seem to be a huge conflict that
you haven't at least considered already.

Hopefully, someone with more history and context with the .buildinfo
file discussions can chime in soonish...


live well,
  vagrant

