Bug#802241: please store the hash of the installed .deb and allow to query it

2018-04-24 Thread Drew Parsons
I got caught out by this apt bug when handling sasview. I had installed
a local build of sasview with my patch for 4.2.0~git20180309-3. Stuart
Prescott added another patch and uploaded.

I assumed apt had picked up on the update and installed the final,
official version.  But it hadn't, even though the contents and
changelog had changed.  I was left wondering why the "new" version was
misbehaving (still showing the bug that Stuart had patched).

Drew




2017-08-26 Thread Holger Levsen
hi,

while I very much like this idea, please don't store md5sums, but rather
sha256sums.

Thank you!


-- 
cheers,
Holger (wondering why I seem to have to write this in 2017)






2017-08-26 Thread Julian Andres Klode
On Sat, Aug 26, 2017 at 03:16:52PM +0200, Mattia Rizzolo wrote:
> On Sat, Aug 26, 2017 at 02:14:16PM +0200, Julian Andres Klode wrote:
> > I also want this for delta debs, to identify local rebuilds being
> > installed, and prevent delta installation failure in such cases.
> 
> yay another user!
> 
> > > To me it seems that:
> > > * we are mostly interested in the hash of the whole container: all the
> > >   use cases highlighted above would require this;
> > > * If ↑ then the hash can't be pre-computed and stored inside the
> > >   container.
> > 
> > Practically speaking, for your use case you only need a hash of the
> > file tree. My proposal for a package id is to use the md5sum of
> > DEBIAN/md5sums. This can be stored in DEBIAN/control in an id
> > field and generated at build time. 
> 
> That's not true, as we need the hash also (for example) of all the
> maintainer scripts which are not in data.tar (I assume that's what you
> meant by "file tree").  Also, we have seen packages where the only
> difference is the order of entries in the md5sums file, therefore making
> the build not reproducible by our (higher than policy) standards.
> We really need the whole container.

I don't see why you need maintainer scripts. When you are building
a package what you care about is the state you are building in. And
the maintainer scripts are irrelevant in that state. But well, you could
hash them too.

> 
> > We can also use cat DEBIAN/md5sums DEBIAN/control | md5sum (without an
> > Id field in control) as the ID, and then append that to control. This
> > means that dependency relations and stuff is included as well. That's
> > useful for the solver use case; but it's not really relevant for
> > the reproducible build use case - dependencies on the installed
> > system, description, etc should not matter for you.
> 
> Well, DEBIAN/control contains the dependencies generated during the
> package build, and we are interested in them as well…
> In short: we do care about both data.tar and control.tar.  After all, we
> do compare the hashes of the final .deb container.

I fail to see why you'd be interested in dependencies. Your stated use
case is "allowing to recreate the same build-environment of a past build
we would need to know which packages were installed.". To recreate a
build environment you do not have to care about dependencies, nor do
you have to care about maintainer scripts (as soon as you involve
upgrades, the result might be different anyway, if you always bootstrap
the exact tree, it might be useful). Because the matter of fact is that
neither maintainer scripts nor dependencies affect how a package is
built (once build-depends are installed).

There's also the question of conffiles. They might affect the environment,
but: The actually installed ones might be different from the ones in the
package (which essentially is the same upgrade problem you have for
maintainer scripts); hence it's not really useful to include them in
the hash (but it does not hurt either).

> As I saw it when I originally thought of the problem the only sane
> solution to this for me would be to have dpkg compute the hash of the
> .deb before unpacking it, and store it in its $admindir/status file, but
> that makes the installation process very CPU-intensive, to the point
> that very probably it's too much to be bearable on many systems.

If you really want everything, the wise choice is to just hash the
entire tree dpkg-deb is packaging up and then add that to the ID
field. You can easily reconstruct the ID by unpacking the package,
removing the ID field from debian/control and rehashing.
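
A minimal sketch of that idea, assuming the package has been unpacked
into a directory with its control files under DEBIAN/. The field name
"Id", the choice of SHA-256, and the way paths and contents are framed
before hashing are illustrative assumptions, not anything dpkg-deb does
today. Only regular files are covered; permissions, symlinks and empty
directories are ignored here.

```python
import hashlib
import os

ID_FIELD = b"id:"  # hypothetical field name; the thread only calls it "ID"


def strip_id_field(control: bytes) -> bytes:
    """Return the control file contents without the (assumed one-line) Id field."""
    lines = control.splitlines(keepends=True)
    return b"".join(l for l in lines if not l.lower().startswith(ID_FIELD))


def tree_id(root: str) -> str:
    """SHA-256 over the sorted regular files below root, ignoring any Id field."""
    control_path = os.path.join("DEBIAN", "control")
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are out of scope for this sketch
            paths.append(os.path.relpath(full, root))

    digest = hashlib.sha256()
    for rel in sorted(paths):  # fixed order, independent of directory layout
        with open(os.path.join(root, rel), "rb") as f:
            data = f.read()
        if rel == control_path:
            data = strip_id_field(data)
        # Frame each entry as path, NUL, length, NUL, contents so that
        # file boundaries cannot be confused.
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(str(len(data)).encode("ascii") + b"\0")
        digest.update(data)
    return digest.hexdigest()
```

Because the Id line is dropped before hashing, calling tree_id() on a
tree whose control file already carries the field gives the same value
as on the tree before the field was added.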

Once mtree is in, this also includes stuff like permissions, xattrs.
Then we can also SHA256 everything and use that as an ID.

The only thing not covered by this is the layout of the tar files,
the compression, and the layout of the final ar file. But that's
not really relevant to any of our use cases.

For APT, we specifically need the ID to be in the package and dpkg
not to add any missing IDs, otherwise we cannot rely on the ID as
the installed one might have one but the remote one not.

For deltas, we only care that it has *some* ID, for what it's worth,
it could be a random UUID (but that's not reproducible). I do need
that to be in debian/control, as this will allow us to change the
ID algorithm at any point in time and not require us to recompute
the id when generating the delta.

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
  |  Ubuntu Core Developer |
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline').  Thank you.




2017-08-26 Thread Mattia Rizzolo
On Sat, Aug 26, 2017 at 02:14:16PM +0200, Julian Andres Klode wrote:
> I also want this for delta debs, to identify local rebuilds being
> installed, and prevent delta installation failure in such cases.

yay another user!

> > To me it seems that:
> > * we are mostly interested in the hash of the whole container: all the
> >   use cases highlighted above would require this;
> > * If ↑ then the hash can't be pre-computed and stored inside the
> >   container.
> 
> Practically speaking, for your use case you only need a hash of the
> file tree. My proposal for a package id is to use the md5sum of
> DEBIAN/md5sums. This can be stored in DEBIAN/control in an id
> field and generated at build time. 

That's not true, as we need the hash also (for example) of all the
maintainer scripts which are not in data.tar (I assume that's what you
meant by "file tree").  Also, we have seen packages where the only
difference is the order of entries in the md5sums file, therefore making
the build not reproducible by our (higher than policy) standards.
We really need the whole container.
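
To illustrate the ordering point, here is a small sketch with two
made-up md5sums files that list the same entries in a different order.
The paths and file contents are hypothetical; the only claim is that
hashing the raw md5sums bytes is sensitive to entry order.

```python
import hashlib


def md5sums_id(md5sums: bytes) -> str:
    """md5 of the raw bytes of a DEBIAN/md5sums file."""
    return hashlib.md5(md5sums).hexdigest()


def entries(md5sums: bytes) -> set:
    """The set of 'digest  path' lines, ignoring their order."""
    return set(md5sums.splitlines())


# Hypothetical entries; the digests come from made-up file contents.
line_foo = hashlib.md5(b"foo contents").hexdigest().encode() + b"  usr/bin/foo\n"
line_doc = (hashlib.md5(b"doc contents").hexdigest().encode()
            + b"  usr/share/doc/foo/copyright\n")

build_a = line_foo + line_doc
build_b = line_doc + line_foo

same_entries = entries(build_a) == entries(build_b)   # True: same files
same_id = md5sums_id(build_a) == md5sums_id(build_b)  # False: different bytes
```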

> We can also use cat DEBIAN/md5sums DEBIAN/control | md5sum (without an
> Id field in control) as the ID, and then append that to control. This
> means that dependency relations and stuff is included as well. That's
> useful for the solver use case; but it's not really relevant for
> the reproducible build use case - dependencies on the installed
> system, description, etc should not matter for you.

Well, DEBIAN/control contains the dependencies generated during the
package build, and we are interested in them as well…
In short: we do care about both data.tar and control.tar.  After all, we
do compare the hashes of the final .deb container.


As I saw it when I originally thought of the problem the only sane
solution to this for me would be to have dpkg compute the hash of the
.deb before unpacking it, and store it in its $admindir/status file, but
that makes the installation process very CPU-intensive, to the point
that very probably it's too much to be bearable on many systems.
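
For reference, hashing the whole container amounts to one sequential
read of the .deb. The following is only a sketch of that step in
Python, not how dpkg would implement it; the function name and chunk
size are arbitrary, and SHA-256 is used as suggested elsewhere in this
bug.

```python
import hashlib


def deb_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so memory use does not grow with the
        # size of the package.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The value is equivalent to what sha256sum prints for the same file, so
it could be compared against the SHA256 field of a Packages index.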

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
more about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-





2017-08-26 Thread Julian Andres Klode
On Sun, Oct 18, 2015 at 06:20:01PM +, Mattia Rizzolo wrote:
> Package: dpkg
> Version: 1.18.3
> Severity: wishlist
> X-Debbugs-CC: reproducible-bui...@lists.alioth.debian.org
> 
> Hi dpkg people,
> 
> in the context of allowing to recreate the same build-environment of a
> past build we would need to know which packages were installed.
> Currently we rely on (pkgname, arch, version) tuples to uniquely
> identify a binary package, but as you can easily imagine this is not
> unique at all, definitely not in the multi distro universe, possibly not
> even across suites.

I also want this for delta debs, to identify local rebuilds being
installed, and prevent delta installation failure in such cases.

> 
> To me it seems that:
> * we are mostly interested in the hash of the whole container: all the
>   use cases highlighted above would require this;
> * If ↑ then the hash can't be pre-computed and stored inside the
>   container.

Practically speaking, for your use case you only need a hash of the
file tree. My proposal for a package id is to use the md5sum of
DEBIAN/md5sums. This can be stored in DEBIAN/control in an id
field and generated at build time. 

We can also use cat DEBIAN/md5sums DEBIAN/control | md5sum (without an
Id field in control) as the ID, and then append that to control. This
means that dependency relations and stuff is included as well. That's
useful for the solver use case; but it's not really relevant for
the reproducible build use case - dependencies on the installed
system, description, etc should not matter for you.
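
Both variants can be sketched in a few lines of Python. This is only
an illustration of the two proposals above, operating on the file
contents as bytes; the function names are made up, and the second one
assumes the control data passed in does not yet contain the Id field.

```python
import hashlib


def id_from_md5sums(md5sums: bytes) -> str:
    """First variant: the md5sum of DEBIAN/md5sums."""
    return hashlib.md5(md5sums).hexdigest()


def id_from_md5sums_and_control(md5sums: bytes, control: bytes) -> str:
    """Second variant: like `cat DEBIAN/md5sums DEBIAN/control | md5sum`.

    control must be the control file without an Id field.
    """
    return hashlib.md5(md5sums + control).hexdigest()


# Hypothetical package contents, for illustration only.
example_md5sums = (hashlib.md5(b"#!/bin/sh\n").hexdigest().encode()
                   + b"  usr/bin/foo\n")
example_control = b"Package: foo\nVersion: 1.0\nArchitecture: all\n"

file_tree_id = id_from_md5sums(example_md5sums)
full_id = id_from_md5sums_and_control(example_md5sums, example_control)
```

With the second variant, changing only a Depends line in control
changes the ID; with the first it does not.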

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
  |  Ubuntu Core Developer |
When replying, only quote what is necessary, and write each reply
directly below the part(s) it pertains to ('inline').  Thank you.




2015-10-18 Thread Mattia Rizzolo
Package: dpkg
Version: 1.18.3
Severity: wishlist
X-Debbugs-CC: reproducible-bui...@lists.alioth.debian.org

Hi dpkg people,

in the context of allowing to recreate the same build-environment of a
past build we would need to know which packages were installed.
Currently we rely on (pkgname, arch, version) tuples to uniquely
identify a binary package, but as you can easily imagine this is not
unique at all, definitely not in the multi distro universe, possibly not
even across suites.
This can also help higher-level package managers to identify
which archive is providing the installed package, as David Kalnischkies
pointed out in https://lists.debian.org/20150624164233.GA25413@crossbow

I would think to just add a field in /var/lib/dpkg/status but YMMV and
I'm happy with everything.

As a side effect this enables anyone to easily tell whether a package
came from the Debian archive or from somewhere else.


This matter was already briefly discussed on the mailing list, and ended
up with some open questions in
https://lists.debian.org/20150623073105.GE5719@loar so
let's file this bug to track it more easily.

To me it seems that:
* we are mostly interested in the hash of the whole container: all the
  use cases highlighted above would require this;
* If ↑ then the hash can't be pre-computed and stored inside the
  container.


Thanks in advance for everything!

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
more about me:  http://mapreri.org  : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-

