Re: [Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-21 Thread Ximin Luo
On 20/09/15 20:43, Jérémy Bobbio wrote:
> Ximin Luo:
>> With our current .buildinfo setup, the above process is more
>> complicated, because we *only* store hashes of the binary build
>> environment.
> 
> [..]
> 
> The idea to put a hash of the binary package in the
> Build-Environment is a late addition to the original idea. 
> 

Sure, I realised after I posted that the binary hashes hadn't been implemented 
yet. That's a side issue though.

> In any cases, we currently don't have code to store any hash of the
> Build-Environment. If we wanted to store hashes of binary packages, then
> we would need to have them in /var/lib/dpkg/status and it's not done
> yet, even if Guillem said this would be a good thing to have.
> 

`apt-cache show [pkg]` will list hashes of binaries. Is there some reason we 
can't just do this?
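
Roughly what I have in mind, as a sketch only (Python). The stanza below is made-up sample text in the usual `apt-cache show` layout with a placeholder hash, and `parse_stanza` is just a helper name I picked; whether the build tooling may call APT at all is a separate question.

```python
def parse_stanza(text):
    """Parse one 'Field: value' stanza into a dict (folded lines are ignored)."""
    fields = {}
    for line in text.splitlines():
        if not line or line[0].isspace():
            continue  # skip blank lines and folded continuation lines
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample in the layout `apt-cache show` prints; the hash is a placeholder.
SAMPLE = """\
Package: hello
Version: 2.10-1
Architecture: amd64
Filename: pool/main/h/hello/hello_2.10-1_amd64.deb
SHA256: 0000000000000000000000000000000000000000000000000000000000000000
Description: example package
 folded continuation line
"""

stanza = parse_stanza(SAMPLE)
# Hypothetical way of recording the binary hash next to package and version.
entry = "{Package} (= {Version}) sha256:{SHA256}".format(**stanza)
print(entry)
```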

>> Currently, to run a DDC test, we would have to read the buildinfo
>> file, find the hashes of the binary build-deps, lookup the source
>> packages that corresponds to these hashes, find a different binary
>> build-deps for these hashes, and run our DDC-checker. This takes many
>> round trips, and contacting external infrastructure that isn't
>> necessary.
> 
> You would not need to lookup the source packages using hashes. Using
> package and version gives you enough info to retrieve a specific source
> package from the archive.
> 
>> If .buildinfo files contained source hashes, the DDC-checker could
>> work more directly, without requiring a remote repository of source
>> hash <-> binary hash mappings.
> 
> I'm interested in `.buildinfo` in the context of the Debian project. The
> Debian archive is designed to be immutable. A specific version of a
> package will always correspond to the same source and binary files.
> So I don't see why one would do complex “source hash - binary hash
> mapping” when you can just rely on what is in the archive (and what has
> been archived by snapshot.debian.org).
> 

It's a good principle to design something to rely on as little external 
infrastructure as possible. Just because we already depend on some 
infrastructure doesn't mean we should add more dependencies on it.

Suppose someone did a source-only mirror in the future, because binaries are 
too costly to store. Then, the .buildinfo files (with source hashes) can still 
be used against this mirror.

The "intuitive meaning" that we would like a .buildinfo file to have, is to 
describe immutably the input and the output. For testing and verification 
purposes, the input is the *source code* of the build-deps and of the target.

Getting reproducible builds to work is IMO fixing a massive bug that has 
existed for decades. Normally, when you run a fixed program against fixed 
input, what do you expect? Fixed output. Binary-hash-only .buildinfo files 
would only help to prove that this bug doesn't exist. *But that's not an 
incredible achievement.* Great, f(x) == g(y) when f == g and x == y, whoopee? 
We should aim higher, to be able to generate fixed-binary proofs for when only 
the source code (and not necessarily the binaries) matches.

> If by building thing that ought to match a specific package version you
> get different result, then you will have to investigate in any cases.
> 
> 
> Implementation-wise, getting the hash of the .dsc in the .buildinfo is
> going to be very tricky. dpkg does not know about what's available in
> the archive. It just knows about packages which are or were installed.
> 

`apt-cache showsrc [pkg]` has the right information in there, but it's a bit 
messy. I need to test this without a deb-src line, though.

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git



signature.asc
Description: OpenPGP digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-21 Thread Jérémy Bobbio
Ximin Luo:
> > Implementation-wise, getting the hash of the .dsc in the .buildinfo is
> > going to be very tricky. dpkg does not know about what's available in
> > the archive. It just knows about packages which are or were installed.
> > 
> 
> `apt-cache showsrc [pkg]` has the right information in there, but it's a bit 
> messy. I need to test this without a deb-src line, though.

Building Debian packages doesn't involve APT in any way. There is
currently no coupling in the direction dpkg → APT.

(That's why we need to get the hashes of the binary packages into
/var/lib/dpkg/status before dpkg-genbuildinfo can write them.)

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   



Re: [Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-21 Thread Ximin Luo
On 20/09/15 19:22, Johannes Schauer wrote:
> Hi,
> 
> Quoting Ximin Luo (2015-09-20 18:49:16)
>> Currently, to run a DDC test, we would have to read the buildinfo file, find
>> the hashes of the binary build-deps, lookup the source packages that
>> corresponds to these hashes, find a different binary build-deps for these
>> hashes, and run our DDC-checker. This takes many round trips, and contacting
>> external infrastructure that isn't necessary.
>>
>> If .buildinfo files contained source hashes, the DDC-checker could work more
>> directly, without requiring a remote repository of source hash <-> binary
>> hash mappings.
> 
> which packages would benefit from this?
> 

Every package that is (or might be, in the future) a build-dep of another 
package would benefit, because it would make it easier to check that *source* 
build-deps result in a fixed binary, regardless of how they are compiled (e.g. 
if they're compiled by something compromised). (This is being discussed in the 
other branch of the thread.)

gcc and clang are only examples for the DDC case, but the point generally 
applies to (a) { checking that binary0(source1)==binary1 } vs (b) { checking 
that source0(source1)==binary1 }. For DDC, we do (b) and select source0 = 
source1, but it's harder to select this if we only have information about (a).

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git




Re: [Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-20 Thread Jérémy Bobbio
Ximin Luo:
> With our current .buildinfo setup, the above process is more
> complicated, because we *only* store hashes of the binary build
> environment.

I'm sorry but this is not accurate regarding the current
specification [1]. It says:

    Build-Environment

        List of all packages forming the build environment, their
        architecture if different from build architecture, and their
        version.
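
To make that concrete, here is a minimal Python sketch of reading such a field. The entry syntax `name[:arch] (= version)` with comma separators is an assumption borrowed from dpkg dependency fields, not something the text quoted above fixes, and the package versions in the example are made up.

```python
import re

# Assumed entry syntax: "name[:arch] (= version)", entries separated by commas.
ENTRY = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.-]*)"
    r"(?::(?P<arch>[a-z0-9-]+))?"
    r"\s*\(=\s*(?P<version>[^\s)]+)\)$"
)


def parse_build_environment(value, build_arch):
    """Return (name, arch, version) tuples; arch defaults to the build arch."""
    packages = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue
        match = ENTRY.match(raw)
        if match is None:
            raise ValueError("unrecognised entry: %r" % raw)
        arch = match.group("arch") or build_arch
        packages.append((match.group("name"), arch, match.group("version")))
    return packages


# Hypothetical field value; the versions are illustrative only.
example = "base-files (= 9.4), gcc-5 (= 5.2.1-17), libc6:i386 (= 2.19-22)"
environment = parse_build_environment(example, build_arch="amd64")
```

Note that nothing in such an entry is a hash: package, architecture and version are all that is recorded.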

The idea to put a hash of the binary package in the
Build-Environment is a late addition to the original idea. It came as a
way to make the job of `srebuild` easier: retrieving a specific binary
package by its hash is already part of the snapshot.debian.org interface.
It also makes it unnecessary to find the relevant repository snapshot and
avoids the related headaches with how to handle expired signatures.

In any case, we currently don't have code to store any hash of the
Build-Environment. If we wanted to store hashes of binary packages, then
we would need to have them in /var/lib/dpkg/status and that's not done
yet, even though Guillem said this would be a good thing to have.

> Currently, to run a DDC test, we would have to read the buildinfo
> file, find the hashes of the binary build-deps, lookup the source
> packages that corresponds to these hashes, find a different binary
> build-deps for these hashes, and run our DDC-checker. This takes many
> round trips, and contacting external infrastructure that isn't
> necessary.

You would not need to look up the source packages using hashes. Using
package and version gives you enough info to retrieve a specific source
package from the archive.
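
As a sketch of that lookup (Python, building URLs only, no network access): the `/mr/package/<package>/<version>/srcfiles` and `/file/<hash>` paths are how I understand the machine-readable interface of snapshot.debian.org, so treat them as an assumption to verify against its documentation. The function names are made up for illustration.

```python
from urllib.parse import quote

SNAPSHOT = "https://snapshot.debian.org"


def srcfiles_url(source_package, version):
    """URL assumed to list the files (by hash) of one source package version."""
    return "%s/mr/package/%s/%s/srcfiles" % (
        SNAPSHOT,
        quote(source_package, safe=""),
        quote(version, safe=""),  # escapes the ':' of an epoch, if any
    )


def file_url(sha1_hex):
    """URL assumed to serve the file whose hash is `sha1_hex`."""
    return "%s/file/%s" % (SNAPSHOT, sha1_hex)


print(srcfiles_url("hello", "2.10-1"))
```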

> If .buildinfo files contained source hashes, the DDC-checker could
> work more directly, without requiring a remote repository of source
> hash <-> binary hash mappings.

I'm interested in `.buildinfo` in the context of the Debian project. The
Debian archive is designed to be immutable. A specific version of a
package will always correspond to the same source and binary files.
So I don't see why one would do complex “source hash - binary hash
mapping” when you can just rely on what is in the archive (and what has
been archived by snapshot.debian.org).

If building something that ought to match a specific package version
gives you a different result, then you will have to investigate in any case.


Implementation-wise, getting the hash of the .dsc in the .buildinfo is
going to be very tricky. dpkg does not know about what's available in
the archive. It just knows about packages which are or were installed.

 [1]: https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   



[Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-20 Thread Ximin Luo
Hi list,

BACKGROUND
==========

One of the main points of reproducible builds is to enable Diverse 
Double-Compiling (DDC): http://www.dwheeler.com/trusting-trust/

To take an example, I can convince myself that my /bin/gcc5 corresponds exactly 
to the source code /src/gcc5, if I can:

1. assume that at least one of /bin/clang, /bin/gcc4.9 is not compromised

2. /bin/clang /src/gcc5 -o /bin/gcc5_b1; /bin/gcc4.9 /src/gcc5 -o /bin/gcc5_b2

3. /bin/gcc5_b1 /src/gcc5 -o /bin/gcc5_b1a; /bin/gcc5_b2 /src/gcc5 -o 
/bin/gcc5_b2a

4. cmp /bin/gcc5_b1a /bin/gcc5_b2a

If this exits 0 and (1) was true (and gcc5 is non-buggy), then /bin/gcc5_b1a 
corresponds exactly to /src/gcc5, and so does /bin/gcc5 if it is byte-identical 
to it. If this exits 1, then at least one of /bin/clang, /bin/gcc4.9 is 
compromised (or the build is not deterministic).
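
A toy model of steps 2 to 4, as a sketch only: nothing here runs a real compiler. The `Binary` class, `compile_with` and `ddc` are names made up for illustration, and the model assumes that builds are deterministic and that a trojaned compiler always passes its trojan on to what it compiles.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Binary:
    implements: str       # source whose behaviour this binary has
    emitted_by: str       # behaviour of the compiler that produced it
    trojan: bool = False  # whether a self-propagating trojan is present

    def digest(self):
        """Stand-in for hashing the binary's bytes."""
        blob = "%s|%s|%s" % (self.implements, self.emitted_by, self.trojan)
        return hashlib.sha256(blob.encode()).hexdigest()


def compile_with(compiler, source):
    """Toy compile step: the output behaves like `source`, its bytes depend
    on how the compiler behaves, and a trojan in the compiler is inherited."""
    return Binary(implements=source,
                  emitted_by=compiler.implements,
                  trojan=compiler.trojan)


def ddc(cc_a, cc_b, source):
    """Return True if both two-stage builds end in identical output."""
    stage1_a = compile_with(cc_a, source)       # step 2
    stage1_b = compile_with(cc_b, source)
    stage2_a = compile_with(stage1_a, source)   # step 3
    stage2_b = compile_with(stage1_b, source)
    return stage2_a.digest() == stage2_b.digest()  # step 4


clang = Binary(implements="clang", emitted_by="vendor")
gcc49 = Binary(implements="gcc4.9", emitted_by="vendor")
evil_clang = Binary(implements="clang", emitted_by="vendor", trojan=True)

print(ddc(clang, gcc49, "gcc5"))       # both clean
print(ddc(evil_clang, gcc49, "gcc5"))  # one compromised
```

In this model the two stage-1 binaries differ in their bytes even when both compilers are clean; only the stage-2 outputs are expected to match, which is why step 3 is needed.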

More generally, if we assume that /bin/cc0 is good, pick /bin/cc{1..n}, and 
run the above for each $i, then the set of compilers that generated the same 
final output as cc0 is also good.
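
A minimal sketch of that selection in Python, assuming we have already collected the digest of each compiler's final (second-stage) output. The function name and the digests below are placeholders, not real results.

```python
def agreeing_with(trusted, final_digests):
    """Names of all compilers whose final output equals the trusted one's."""
    reference = final_digests[trusted]
    return {name for name, digest in final_digests.items()
            if digest == reference}


# Placeholder digests, for illustration only.
final_digests = {
    "cc0": "aaaa",
    "cc1": "aaaa",
    "cc2": "bbbb",  # disagrees with cc0, so it is not vouched for
    "cc3": "aaaa",
}

good = agreeing_with("cc0", final_digests)
print(sorted(good))
```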

PROBLEM
=======

With our current .buildinfo setup, the above process is more complicated, 
because we *only* store hashes of the binary build environment. This means that 
we can try to reproduce the build, but it makes it more awkward to run DDC, and 
communicates "the wrong thing".

The point of the .buildinfo file is to say "with these build-deps and this 
environment, you can build this source code to get this binary target". Of 
course if you build something with different tools, you expect to get a 
different result, and that is why we have these files. However, "these 
build-deps" from a human level refers to the source code, not the binary code. 
That is, if we replace our binary build-deps with something *compiled from the 
same source code*, they should behave identically, and we *should still be able 
to reproduce the same binary target hash*. This is a key principle of DDC.

Currently, to run a DDC test, we would have to read the buildinfo file, find 
the hashes of the binary build-deps, look up the source packages that 
correspond to these hashes, find different binary build-deps built from those 
sources, and run our DDC-checker. This takes many round trips and involves 
contacting external infrastructure, which shouldn't be necessary.

If .buildinfo files contained source hashes, the DDC-checker could work more 
directly, without requiring a remote repository of source hash <-> binary hash 
mappings. It could even build the build-deps itself, without worrying about the 
binary hashes of the results, perhaps on a different host architecture. 
Importantly, it also states the *intentions* of this file much better.
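
To illustrate, a hedged Python sketch of what a checker could do with such an entry. The field names (`source_sha256`, `binary_sha256`) and the package data are hypothetical; no such fields exist in the current specification.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


# Stand-in for the bytes of a build-dep's .dsc file.
dsc_bytes = b"placeholder .dsc contents\n"

# Hypothetical build-dep entry carrying a source hash; the binary hash is
# optional here, since a DDC-checker would rebuild the build-dep itself.
entry = {
    "package": "gcc-5",
    "version": "5.2.1-17",
    "source_sha256": sha256_hex(dsc_bytes),
    "binary_sha256": None,
}


def source_matches(entry, candidate_dsc):
    """True if locally obtained source is what the entry refers to."""
    return sha256_hex(candidate_dsc) == entry["source_sha256"]


print(source_matches(entry, dsc_bytes))
print(source_matches(entry, b"tampered contents\n"))
```

The check needs only the source bytes and the file itself, so it would also work against a source-only mirror.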

(Lunar tells me on IRC that this is less feasible, but let's discuss this 
further and see if we can come up with better solutions.)

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git




Re: [Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

2015-09-20 Thread Johannes Schauer
Hi,

Quoting Ximin Luo (2015-09-20 18:49:16)
> Currently, to run a DDC test, we would have to read the buildinfo file, find
> the hashes of the binary build-deps, lookup the source packages that
> corresponds to these hashes, find a different binary build-deps for these
> hashes, and run our DDC-checker. This takes many round trips, and contacting
> external infrastructure that isn't necessary.
> 
> If .buildinfo files contained source hashes, the DDC-checker could work more
> directly, without requiring a remote repository of source hash <-> binary
> hash mappings.

which packages would benefit from this?

Clearly, a DDC check of C compilers like gcc and clang would benefit from this.

Is there any other language where the compiler is written in the same language
that it compiles and where there exists more than one compiler that has enough
features to compile it?

Otherwise, I'd say that your argument is quite weak because it would only make
checking of two packages in Debian easier (gcc and clang). And I think that
even this check would probably not need to be done more often than, let's say,
once per month as a jenkins job which can do the necessary mapping in a shell
script.

Is there a stronger argument for storing source and binary hashes in the
buildinfo itself?

cheers, josch

