Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?
Hi,

Quoting Guillem Jover (2015-06-26 06:30:39)
> On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> > Some people suggested that we should record a checksum of the `.deb`
> > installed as a way to unambiguously refer to a specific package.
>
> In principle the tuple pkgname-version-arch should be unique per
> archive, otherwise bad-things-will-happen. Of course that does not
> cover locally built packages and similar, or mixing different archives
> with duplicated tuples, but then those are probably out-of-scope for
> reproducible builds *in* Debian anyway, I guess.

I would like to second this. During my work on real dependency solvers,
we needed an answer to the question of what makes a package unique and,
as Guillem already pointed out, a binary package is uniquely identified
by its packagename-version-arch tuple.

In principle it would be possible to extend this definition with a
fourth tuple member, a checksum of some sort, but that would mean that
even more software like dpkg and apt would have to be adapted to follow
this new definition of uniqueness.

So instead of doing that, I would rather that everybody building binary
packages that could potentially end up being mixed with Debian packages
realized that *the name-ver-arch tuple they use for them must be
unique*. If they don't manage to do that, then somebody should make
them aware of the problem that packages are unique by the name-ver-arch
tuple.

Since David pointed out that this is a real problem, I think this issue
might need more awareness.

In summary: yes, this could be solved technically, but I would prefer a
social solution which spreads awareness about the uniqueness problem.

cheers, josch

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
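As an illustration of the uniqueness rule discussed in this message, here is a minimal Python sketch that flags name-version-arch tuples seen with more than one checksum. The function name, the record shape (name, version, arch, sha256) and the sample values are assumptions made up for the example; nothing here reflects how dpkg or apt store or compare packages.

```python
from collections import defaultdict


def find_conflicting_tuples(records):
    """Return the (name, version, arch) tuples that map to several checksums.

    `records` is an iterable of (name, version, arch, checksum) tuples.
    This input shape is hypothetical and chosen only for the example.
    """
    seen = defaultdict(set)
    for name, version, arch, checksum in records:
        seen[(name, version, arch)].add(checksum)
    # A tuple is only a problem if it identifies more than one distinct file.
    return {key: sums for key, sums in seen.items() if len(sums) > 1}


# Made-up sample data: the first two records share a tuple but not a checksum.
SAMPLE_RECORDS = [
    ("hello", "2.10-1", "amd64", "aaaa"),  # e.g. from an archive
    ("hello", "2.10-1", "amd64", "bbbb"),  # e.g. a local rebuild, same tuple
    ("hello", "2.10-1", "i386", "cccc"),
]

if __name__ == "__main__":
    for key, sums in find_conflicting_tuples(SAMPLE_RECORDS).items():
        print(key, sorted(sums))
```

A check of this kind could only report the duplication after the fact; it would not replace the social convention argued for above.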
Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?
Hi!

On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously refer to a specific package.

In principle the tuple pkgname-version-arch should be unique per
archive, otherwise bad-things-will-happen. Of course that does not
cover locally built packages and similar, or mixing different archives
with duplicated tuples, but then those are probably out-of-scope for
reproducible builds *in* Debian anyway, I guess.

> The main benefit that I can think of is that it would allow retrieving
> the file directly from snapshot.debian.org based on the hash [2].

Personally I find the point that David mentioned to be a bit more
interesting. :)

> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a package currently installed. I have a couple of memories where this
> could have been useful outside of the aforementioned use case.
>
> From my limited knowledge of dpkg's internals, computing checksums
> and adding a new field to the status file doesn't seem hard to
> implement.

The general idea seems worthwhile in principle. The devil is in the
details though, and with dpkg, the implementation is usually not the
hard part. :) David also pointed out some of the possible issues.
Others that quickly come to mind would be:

* Checksum of what exactly? Although the seemingly obvious answer
  might be “the entire .deb container”, depending on what one wants,
  the interesting data might be different. For example, the members
  essential for apt would appear to be control.tar and data.tar, and
  you might not want to reinstall if some other member changes; when
  using signed packages, changes to the signatures might also be
  relevant. Other .deb members might also be relevant in case another
  tool wants to use them.

* Currently dpkg extracts the control.tar with dpkg-deb directly to
  disk, and gets the data.tar contents piped from dpkg-deb, so it does
  not get direct access to the whole file. This means the checksum
  would need to be computed out-of-band, processing the .deb one more
  time, which might be wasteful.

* A possibility could be to pre-compute the checksum at creation or
  modification time, and store it in the debian-binary member for
  example. The problem with that is that tools that modify .debs might
  not generate a checksum, or worse, might not update it. And this
  would also not benefit old binaries.

* Another possibility might be to make dpkg-deb compute the checksum
  when parsing the .deb and output it on a supplied fd through a
  command-line option.

* Even when dpkg was being used through dselect, where the checksums
  from the archive were fresh and within reach in the available file,
  dpkg has never propagated them to the status file. I guess mainly
  because at the time of «dpkg -i», there was no guarantee that those
  packages corresponded to the ones from the archive.

…

Thanks,
Guillem
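The "checksum of what exactly?" question above can be made concrete with a small Python sketch that walks the ar container of a `.deb` and digests each member separately, so a caller could choose to compare only control.tar and data.tar. This is an out-of-band illustration under stated assumptions, not how dpkg or dpkg-deb work: the function name `deb_member_digests` is hypothetical, only the common ar header layout (60-byte member headers, even-byte padding) is handled, and SHA-256 is an arbitrary choice.

```python
import hashlib
import io

AR_MAGIC = b"!<arch>\n"   # global header of an ar archive
HEADER_LEN = 60           # fixed size of each ar member header


def deb_member_digests(stream, algorithm="sha256"):
    """Yield (member_name, hex_digest) for each member of an ar archive.

    `stream` is a binary file-like object positioned at the start of the
    archive. The function name and the default algorithm are assumptions
    made for this example.
    """
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is bytes 0-15, space padded; some ar variants append '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is bytes 48-57, decimal, space padded.
        size = int(header[48:58].decode("ascii").strip())
        digest = hashlib.new(algorithm)
        remaining = size
        while remaining:
            chunk = stream.read(min(65536, remaining))
            if not chunk:
                raise ValueError("truncated ar member")
            digest.update(chunk)
            remaining -= len(chunk)
        if size % 2:
            stream.read(1)  # member data is padded to an even offset
        yield name, digest.hexdigest()
```

Called as `dict(deb_member_digests(open(path, "rb")))`, this would return one digest per member. It reads the whole file a second time, which is exactly the cost the out-of-band bullet above warns about.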
Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?
On Tue, Jun 23, 2015 at 09:31:05AM +0200, Jérémy Bobbio wrote:
> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a package currently installed. I have a couple of memories where this
> could have been useful outside of the aforementioned use case.

Not exactly a different use case, but higher level package managers
have pretty much the same problem while figuring out if the installed
v1 is the version v1 available from archive A, archive B or an entirely
different one you may or may not want to upgrade away from. You can
either ignore this problem and declare v1 a unique identifier or run
crazy like apt does and guess based on fields like the dependencies if
this is the same version or not, with all the subtle bugs arising from
either…

The big problem I see is which hash to store. dpkg itself currently
supports only MD5; adding more means finding a free implementation of
them to embed, or adding a new (then pseudo-essential) (pre-)dependency.
APT has so far opted for public-domain implementations, but you get
complaints about it being slower than openssl and the like, which in
exchange is a can of (license-)worms apt didn't want to open so far. If
dpkg is willing to do that on the other hand…

Otherwise, sticking with MD5 means that we formally require MD5 to be
available in repositories (we don't currently, but I guess very few
actually go to the trouble of not having it, so that is more a
theoretical concern) and that it is harder to get a deb file securely,
as you can treat the MD5 only as an ID; you need to establish
authenticity some other way (= look up more secure hashes in the
Packages file, which itself has to be validated and so on, which is
hard™ and, unlike with e.g. SHA256, rules out simply using
snapshots.d.o).

A way to sidestep these problems would be to allow package managers to
ask dpkg to store arbitrary additional fields. This would basically
promote the (multi-release long) transition period, which you would
have anyway as dpkg can't retroactively calculate the hashes for
already installed packages, to an eternal "chaos" of never being quite
sure if the fields are available…

Best regards

David Kalnischkies
Re: [Reproducible-builds] Storing .deb checksums in ADMINDIR/status?
On Tue 2015-06-23 03:31:05 -0400, Jérémy Bobbio wrote:
> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously refer to a specific package.
> The main benefit that I can think of is that it would allow retrieving
> the file directly from snapshot.debian.org based on the hash [2].

I like the idea of storing a cryptographically-strong digest of each
installed package. I'm no expert on package management, but dpkg does
sound to me like the right place to keep this record, for whatever
that's worth.

> [2]: https://anonscm.debian.org/cgit/mirror/snapshot.debian.org.git/plain/API
> URL: /file/

This API is a little weird in that it doesn't specify the hash
algorithm. Their examples are all 160 bits, hex-encoded, which makes me
suspect that they're using SHA1.

While SHA1 isn't completely practically broken yet, it's probably not a
good idea to rely on it in situations like this that depend on the
digest mechanism's collision-resistance over binary objects. We haven't
seen a forced SHA-1 collision in published research yet, but it's just
a matter of time (and we don't know what the SHA-1 collision attacks
look like in private research).

A stronger digest from the SHA2 family (SHA-256 or SHA-512) would be
preferable if we're hardcoding the choice of digest in this first
implementation. Allowing for algorithm agility (hash selection at
runtime) is another option, but it seems like extra engineering work.

--dkg
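To show roughly what "algorithm agility" could look like, here is a minimal Python sketch that selects the digest at runtime and prefixes the result with the algorithm name, so a stored value says which hash produced it. The function name `labelled_digest` and the `algorithm:hexdigest` label format are assumptions for illustration only; they are not a dpkg field format or the snapshot.debian.org API.

```python
import hashlib
import io


def labelled_digest(stream, algorithm="sha256", chunk_size=65536):
    """Return '<algorithm>:<hexdigest>' for the bytes readable from `stream`.

    The label format is a hypothetical convention for this example; it
    only illustrates recording the algorithm next to the digest.
    """
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("unsupported digest algorithm: " + algorithm)
    digest = hashlib.new(algorithm)
    # Read in chunks so large files do not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return algorithm + ":" + digest.hexdigest()


if __name__ == "__main__":
    data = b"example package contents"
    print(labelled_digest(io.BytesIO(data)))            # SHA-256 by default
    print(labelled_digest(io.BytesIO(data), "sha512"))  # chosen at runtime
```

A hex-encoded SHA-1 value is 40 characters (160 bits), which matches the length observation above; a labelled value removes the need to guess the algorithm from the length.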
[Reproducible-builds] Storing .deb checksums in ADMINDIR/status?
Hi!

While thinking one more time about the current specification for
`.buildinfo` files [1], I remembered one unresolved question.

The `Build-Environment` field currently has the same syntax as
`Built-Using`: a list of packages and their exact version. This works
fine but might not be optimal.

Some people suggested that we should record a checksum of the `.deb`
installed as a way to unambiguously refer to a specific package.
The main benefit that I can think of is that it would allow retrieving
the file directly from snapshot.debian.org based on the hash [2].

But, as far as I know, this information is currently not recorded by
dpkg and there is no way to know for sure which `.deb` has been used for
a package currently installed. I have a couple of memories where this
could have been useful outside of the aforementioned use case.

From my limited knowledge of dpkg's internals, computing checksums
and adding a new field to the status file doesn't seem hard to
implement.

What do you think? Would such a feature be a good addition to dpkg?
I'm willing to spend time writing a patch.

[1]: https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification
[2]: https://anonscm.debian.org/cgit/mirror/snapshot.debian.org.git/plain/API
     URL: /file/

-- 
Lunar                                .''`.
lu...@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'`
                                      `-