Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

2015-06-25 Thread Johannes Schauer
Hi,

Quoting Guillem Jover (2015-06-26 06:30:39)
> On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> > Some people suggested that we should record a checksum of the `.deb`
> > installed as a way to unambiguously referring to a specific package.
> 
> In principle the tuple pkgname-version-arch should be unique per
> archive, otherwise bad-things-will-happen. Of course that does not
> cover locally built packages and similar, or mixing different archives
> with duplicated tuples, but then those are probably out-of-scope for
> reproducible builds *in* Debian anyway, I guess.

I would like to second this.

In my work on real dependency solvers, we need an answer to the question of
what makes a package unique and, as Guillem already pointed out, a binary
package is uniquely identified by its packagename-version-arch tuple.

In principle it would be possible to extend this definition with a fourth
tuple member, a checksum of some sort, but that would mean that even more
software, like dpkg and apt, would have to be adapted to follow this new
definition of uniqueness.

So instead of doing that, I would rather have everybody who builds binary
packages that could end up being mixed with Debian packages realize
that *the name-ver-arch tuple they use for them must be unique*. If they don't
manage to do that, then somebody should make them aware that packages are
identified by the name-ver-arch tuple.
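
The uniqueness rule can be checked mechanically. What follows is only a
minimal sketch, assuming package records are already available as
dictionaries keyed like Packages index fields (`Package`, `Version`,
`Architecture`, `SHA256`). The function name `find_tuple_collisions` and the
sample records are hypothetical and not part of dpkg or apt:

```python
from collections import defaultdict


def find_tuple_collisions(records):
    """Return name-version-arch tuples that map to more than one checksum.

    `records` is an iterable of dicts with the keys Package, Version,
    Architecture and SHA256 (an assumed input format, for illustration).
    """
    seen = defaultdict(set)
    for rec in records:
        key = (rec["Package"], rec["Version"], rec["Architecture"])
        seen[key].add(rec["SHA256"])
    # A tuple is ambiguous when two different files claim the same identity.
    return {key: sums for key, sums in seen.items() if len(sums) > 1}


# Hypothetical records from two archives that reuse the same tuple.
records = [
    {"Package": "foo", "Version": "1.0-1", "Architecture": "amd64", "SHA256": "aa"},
    {"Package": "foo", "Version": "1.0-1", "Architecture": "amd64", "SHA256": "bb"},
    {"Package": "bar", "Version": "2.0-1", "Architecture": "all", "SHA256": "cc"},
]
collisions = find_tuple_collisions(records)
```

Here only the `foo` tuple would be reported, since two different checksums
claim the same name-ver-arch identity.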

Since David pointed out that this is a real problem, I think this issue might
need more awareness.

In summary, yes, this could be solved technically, but I would prefer a social
solution that spreads awareness of the uniqueness problem.

cheers, josch


___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

2015-06-25 Thread Guillem Jover
Hi!

On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously referring to a specific package.

In principle the tuple pkgname-version-arch should be unique per
archive, otherwise bad-things-will-happen. Of course that does not
cover locally built packages and similar, or mixing different archives
with duplicated tuples, but then those are probably out-of-scope for
reproducible builds *in* Debian anyway, I guess.

> The main benefit that I can think of is that it would allow to directly
> retrieve the file from snapshot.debian.org based on the hash [2].

Personally I find the point that David mentioned to be a bit more
interesting. :)

> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a package currently installed. I have a couple of memories where this
> could have been useful outside of the aforementioned use case.
>
> From my limited knowledge of dpkg's internals, computing checksums
> and adding a new field to the status file doesn't seem hard to
> implement.

The general idea seems worthwhile in principle. The devil is in the
details though, and with dpkg, the implementation is usually not the
hard part. :)

David also pointed out some of the possible issues. Others that quickly
come to mind would be:

 * Checksum of what exactly? Although the seemingly obvious answer
   might be “the entire .deb container”, depending on what one wants,
   the interesting data might be different. For example, essential for
   apt would appear to be control.tar and data.tar, and you might not
   want to reinstall if some other member changes; when using signed
   packages changes to the signatures might also be relevant. Other
   .deb members might also be relevant in case another tool wants to
   use them.
 * Currently dpkg extracts the control.tar with dpkg-deb directly to
   disk, and gets the data.tar contents piped from dpkg-deb, so it does
   not get direct access to the whole file, which means the checksum
   would need to be computed out-of-band, needing to process the .deb
   one more time, which might be wasteful.
 * A possibility could be to pre-compute the checksum on creation or
   modification time, and store it in the debian-binary member for
   example. The problem with that is that tools that modify .debs
   might not generate a checksum, or worse, might not update it. And
   this would also not benefit old binaries.
 * Another possibility might be to make dpkg-deb compute the checksum
   when parsing the .deb and output it on a supplied fd through a
   command-line option.
 * Even when dpkg was being used through dselect, where the checksums
   from the archive were fresh and at reach from the available file,
   dpkg has never propagated them to the status file. I guess mainly
   because at the time of «dpkg -i», there was no guarantee that those
   packages corresponded to the ones from the archive.
 …
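
The "checksum of what exactly?" question above can be made concrete with a
small sketch. It assumes only that a .deb is an ar container, and it computes
one SHA-256 per member in a single pass, so a caller can compare just the
members it cares about (for example control.tar and data.tar). This is an
illustration, not how dpkg-deb works; `deb_member_digests` and
`_sha256_exact` are hypothetical names:

```python
import hashlib

AR_MAGIC = b"!<arch>\n"
AR_HEADER_LEN = 60  # fixed-size header preceding every ar member


def _sha256_exact(stream, size, chunk_size=65536):
    """Hash exactly `size` bytes from `stream` without loading them all."""
    digest = hashlib.sha256()
    remaining = size
    while remaining > 0:
        block = stream.read(min(chunk_size, remaining))
        if not block:
            raise ValueError("truncated ar member")
        digest.update(block)
        remaining -= len(block)
    return digest.hexdigest()


def deb_member_digests(stream):
    """Return {member name: SHA-256 hex digest} for an ar archive stream."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    digests = {}
    while True:
        header = stream.read(AR_HEADER_LEN)
        if not header:
            break  # clean end of archive
        if len(header) != AR_HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # The name occupies bytes 0-15; some ar writers append a "/".
        name = header[0:16].decode("ascii").rstrip(" /")
        # The decimal member size occupies bytes 48-57.
        size = int(header[48:58].decode("ascii").strip())
        digests[name] = _sha256_exact(stream, size)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
    return digests
```

Called on a file opened in binary mode, this returns one digest per member,
which leaves the choice of which members matter to the consumer.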

Thanks,
Guillem


Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

2015-06-24 Thread David Kalnischkies
On Tue, Jun 23, 2015 at 09:31:05AM +0200, Jérémy Bobbio wrote:
> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a package currently installed. I have a couple of memories where this
> could have been useful outside of the aforementioned use case.

Not exactly a different use case, but higher level package managers have
pretty much the same problem while figuring out if the installed v1 is
the version v1 available from archive A, archive B or an entirely
different one you may or may not want to upgrade away from.

You can either ignore this problem and declare v1 a unique identifier or
go crazy like apt does and guess based on fields like the dependencies
whether this is the same version or not, with all the subtle bugs arising
from either…


The big problem I see is which hash to store. dpkg itself currently
supports only MD5; adding more means finding a free implementation of
them to embed, or adding a new (then pseudo-essential) (pre-)dependency.
APT has so far opted for public-domain implementations, but you get
complaints about them being slower than openssl and the like, which in
turn is a can of (license-)worms apt hasn't wanted to open so far.
If dpkg is willing to do that, on the other hand…

Otherwise, sticking with MD5 means that we formally require MD5 to be
available in repositories (we don't currently, but I guess very few
actually go to the trouble of not having it, so that is more of
a theoretical concern) and that it is harder to get a deb file securely,
as you can treat the MD5 only as an ID; you need to establish
authenticity some other way (= look up more secure hashes in the Packages
file, which itself has to be validated and so on, which is hard™ and,
unlike with e.g. SHA256, rules out simply using snapshots.d.o).
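
The "MD5 only as an ID" idea can be sketched as follows, assuming a Packages
index that has already been authenticated by other means (that validation is
the hard part and is not shown). The stored MD5 is used only to select
candidate entries; the file is accepted only if its SHA256 matches one of
them. The function names are hypothetical, and the stanza parser is
deliberately simplified:

```python
import hashlib


def parse_stanzas(text):
    """Parse a control-style index into a list of field dictionaries.

    Simplified: continuation lines (starting with whitespace) are appended
    to the previous field; comments and other corner cases are ignored.
    """
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        last = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and last:
                fields[last] += "\n" + line.strip()
            elif ":" in line:
                last, _, value = line.partition(":")
                fields[last] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def verify_by_md5_id(stored_md5, data, stanzas):
    """Use the MD5 only for lookup, then require a SHA256 match."""
    candidates = [s for s in stanzas if s.get("MD5sum") == stored_md5]
    actual = hashlib.sha256(data).hexdigest()
    return any(s.get("SHA256") == actual for s in candidates)
```

The MD5 never decides authenticity here; it only narrows down which index
entries are compared against the stronger digest.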


A way to sidestep these problems would be to allow package managers to
ask dpkg to store arbitrary additional fields. This would basically
promote the (multi-release long) transition period you would have anyway,
as dpkg can't retroactively calculate the hashes for already installed
packages, to an eternal "chaos" of never being quite sure if the fields
are available…


Best regards

David Kalnischkies



Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

2015-06-23 Thread Daniel Kahn Gillmor
On Tue 2015-06-23 03:31:05 -0400, Jérémy Bobbio wrote:

> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously referring to a specific package.
> The main benefit that I can think of is that it would allow to directly
> retrieve the file from snapshot.debian.org based on the hash [2].

I like the idea of storing a cryptographically-strong digest of each
installed package.  I'm no expert on package management, but dpkg does
sound to me like the right place to keep this record, for whatever
that's worth.

>  [2]: https://anonscm.debian.org/cgit/mirror/snapshot.debian.org.git/plain/API
>   URL: /file/

This API is a little weird in that it doesn't specify the hash
algorithm.  Their examples are all 160 bits, hex-encoded, which makes me
suspect that they're using SHA1.  While SHA1 isn't completely
practically broken yet, it's probably not a good idea to rely on it in
situations like this that depend on the digest mechanism's
collision-resistance over binary objects.  We haven't seen a forced
SHA-1 collision in published research yet, but it's just a matter of
time (and we don't know what the SHA-1 collision attacks look like in
private research).

A stronger digest from the SHA2 family (SHA-256 or SHA-512) would be
preferable if we're hardcoding the choice of digest in this first
implementation.  Allowing for algorithm agility (hash selection at
runtime) is another option, but it seems like extra engineering work.
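
One common way to get algorithm agility is to store the algorithm name next
to the digest. A minimal sketch using Python's hashlib follows. The
`algorithm:hexdigest` string format, the allow-list, and the function names
are assumptions made for illustration, not an existing dpkg or snapshot
convention:

```python
import hashlib
import hmac

# Allow-list of digests; an assumption for this sketch, not a dpkg policy.
ALLOWED_ALGORITHMS = ("sha256", "sha512")


def tagged_digest(data, algorithm="sha256"):
    """Return 'algorithm:hexdigest' so the choice of hash travels with it."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def verify_tagged(data, tagged):
    """Recompute the digest named in `tagged` and compare the two."""
    algorithm, sep, _ = tagged.partition(":")
    if not sep:
        raise ValueError("missing algorithm prefix")
    return hmac.compare_digest(tagged_digest(data, algorithm), tagged)
```

Hardcoding SHA-256 would be simpler. The tagged form only pays off if the
digest is expected to change later without breaking older records.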

--dkg


[Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

2015-06-23 Thread Jérémy Bobbio
Hi!

While thinking one more time about the current specification for
`.buildinfo` files [1], I remembered one unresolved question.

The `Build-Environment` field currently has the same syntax as
`Built-Using`: a list of packages and their exact version. This works
fine but might not be optimal.

Some people suggested that we should record a checksum of the `.deb`
installed as a way to unambiguously refer to a specific package.
The main benefit that I can think of is that it would allow retrieving
the file directly from snapshot.debian.org based on the hash [2].

But, as far as I know, this information is currently not recorded by
dpkg and there is no way to know for sure which `.deb` has been used for
a package currently installed. I can recall a couple of cases where this
could have been useful outside of the aforementioned use case.

From my limited knowledge of dpkg's internals, computing checksums
and adding a new field to the status file doesn't seem hard to
implement.
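
To make the idea concrete, here is a rough sketch of what adding such a field
to a status stanza could look like. The field name `Deb-SHA256` and the
helper `with_checksum_field` are purely hypothetical; dpkg does not currently
record this, and the stanza handling is simplified (the new field is assumed
to be a single line):

```python
def with_checksum_field(stanza, digest, field="Deb-SHA256"):
    """Return a copy of a status stanza with the checksum field set.

    Any existing occurrence of the (hypothetical) field is replaced so the
    stanza never carries two conflicting values.
    """
    prefix = field.lower() + ":"
    lines = [
        line
        for line in stanza.strip("\n").splitlines()
        if not line.lower().startswith(prefix)
    ]
    lines.append(f"{field}: {digest}")
    return "\n".join(lines) + "\n"


# Hypothetical stanza in the style of ADMINDIR/status.
stanza = (
    "Package: foo\n"
    "Status: install ok installed\n"
    "Architecture: amd64\n"
    "Version: 1.0-1\n"
)
updated = with_checksum_field(stanza, "abc123")
```

Packages installed before such a field existed would simply lack it, so any
consumer would have to treat the field as optional.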

What do you think? Would such a feature be a good addition to dpkg?
I'm willing to spend time writing a patch.

 [1]: https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification
 [2]: https://anonscm.debian.org/cgit/mirror/snapshot.debian.org.git/plain/API
  URL: /file/

-- 
Lunar                                .''`.
lu...@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'`
                                      `-

