On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously refer to a specific package.
In principle the tuple pkgname-version-arch should be unique per
archive, otherwise bad-things-will-happen. Of course that does not
cover locally built packages and similar, or mixing different archives
with duplicated tuples, but then those are probably out-of-scope for
reproducible builds *in* Debian anyway, I guess.
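To make the failure mode concrete, here is a minimal sketch of how one
could spot tuples that map to more than one checksum once several
archives are mixed. The function name and the input shape (an iterable
of name, version, arch, digest) are made up for illustration; this is
not something dpkg or apt does today.

```python
from collections import defaultdict


def conflicting_tuples(entries):
    """Return the (name, version, arch) tuples seen with more than one digest.

    `entries` is an iterable of (name, version, arch, digest) rows, for
    example gathered from the package indices of several archives.
    """
    seen = defaultdict(set)
    for name, version, arch, digest in entries:
        seen[(name, version, arch)].add(digest)
    # Only tuples that are ambiguous (two or more distinct digests) are kept.
    return {key: sorted(digests) for key, digests in seen.items() if len(digests) > 1}
```

An empty result means the tuple is enough to identify a package within
the inputs given; anything else is exactly the case where a recorded
checksum would add information.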
> The main benefit that I can think of is that it would allow to directly
> retrieve the file from snapshot.debian.org based on the hash.
Personally I find the point that David mentioned to be a bit more
> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a package currently installed. I have a couple of memories where this
> could have been useful outside of the aforementioned use case.
> From my limited knowledge of dpkg's internals, computing checksums
> and adding a new field to the status file doesn't seem hard to
The general idea seems worthwhile in principle. The devil is in the
details though, and with dpkg, the implementation is usually not the
hard part. :)
David also pointed out some of the possible issues. Others that quickly
come to mind would be:
* Checksum of what exactly? Although the seemingly obvious answer
might be “the entire .deb container”, depending on what one wants,
the interesting data might be different. For example, essential for
apt would appear to be control.tar and data.tar, and you might not
want to reinstall if some other member changes; when using signed
packages changes to the signatures might also be relevant. Other
.deb members might also be relevant in case another tool wants to
* Currently dpkg extracts the control.tar with dpkg-deb directly to
disk, and gets the data.tar contents piped from dpkg-deb, so it does
not get direct access to the whole file, which means the checksum
would need to be computed out-of-band, needing to process the .deb
one more time, which might be wasteful.
* A possibility could be to pre-compute the checksum on creation or
modification time, and store it in the debian-binary member for
example. The problem with that is that tools that modify .debs
might not generate a checksum, or worse might not update it. And
this would also not benefit old binaries.
* Another possibility might be to make dpkg-deb compute the checksum
when parsing the .deb and output it on a supplied fd through a
* Even when dpkg was being used through dselect, where the checksums
from the archive were fresh and within reach in the available file,
dpkg has never propagated them to the status file. I guess mainly
because at the time of «dpkg -i», there was no guarantee that those
packages corresponded to the ones from the archive.
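To illustrate the “checksum of what exactly?” question, here is a rough
sketch that walks the ar container of a .deb once and produces both a
checksum of the entire file and one per member. It is only an
illustration under a few assumptions: the function name is made up,
SHA-256 is an arbitrary choice, members are read fully into memory, and
none of this reflects how dpkg or dpkg-deb are implemented.

```python
import hashlib
import io

AR_MAGIC = b"!<arch>\n"   # global header of an ar archive
HEADER_LEN = 60           # fixed size of each ar member header


def deb_checksums(stream):
    """Return (whole_file_sha256, {member_name: sha256}) for an ar stream.

    The stream is read sequentially and only once, so it can be a pipe.
    """
    whole = hashlib.sha256()
    magic = stream.read(len(AR_MAGIC))
    if magic != AR_MAGIC:
        raise ValueError("not an ar archive")
    whole.update(magic)

    members = {}
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        whole.update(header)

        # Name is bytes 0-15 (space padded), size is bytes 48-57 (decimal).
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))

        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated ar member")
        whole.update(data)
        members[name] = hashlib.sha256(data).hexdigest()

        # Member data is padded to an even offset with a single byte.
        if size % 2:
            whole.update(stream.read(1))

    return whole.hexdigest(), members
```

With something like this, a change to any other member or to the ar
headers alters the whole-file checksum while the control.tar and
data.tar checksums stay the same, which is the distinction a consumer
would have to decide on before a field gets added to the status file.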
Reproducible-builds mailing list