Re: Please test gzip -9n - related to dpkg with multiarch support

2012-02-07 Thread Henrique de Moraes Holschuh
On Tue, 07 Feb 2012, Ben Hutchings wrote:
> But it's worse than this: even if dpkg decompresses before comparing,
> debsums won't (and mustn't, for backward compatibility).  So it's

Maybe you can switch to sha256 and add the new functionality while at
it?  Detect which mode is in use (raw md5sum vs. sha256 of the
uncompressed content) by the size of the hash.  Old debsums won't work
with the new files, but is that really a problem?  That's what stable
updates and backports are for...
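
A rough sketch of that idea in Python, with entirely hypothetical
function and file handling (this is not dpkg or debsums code), just to
show how the digest length alone could select the verification mode:

    import gzip
    import hashlib

    def verify_entry(stored_digest, path):
        """Check one checksum-file entry against the file on disk."""
        if len(stored_digest) == 32:
            # Legacy entry: MD5 of the file exactly as shipped.
            with open(path, "rb") as f:
                data = f.read()
            actual = hashlib.md5(data).hexdigest()
        elif len(stored_digest) == 64:
            # New-style entry: SHA-256 of the decompressed content for
            # .gz files, of the raw bytes otherwise.
            if path.endswith(".gz"):
                with gzip.open(path, "rb") as f:
                    data = f.read()
            else:
                with open(path, "rb") as f:
                    data = f.read()
            actual = hashlib.sha256(data).hexdigest()
        else:
            raise ValueError("unrecognised digest length")
        return actual == stored_digest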

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Please test gzip -9n - related to dpkg with multiarch support

2012-02-07 Thread Ben Hutchings
On Tue, Feb 07, 2012 at 10:04:04PM +, Neil Williams wrote:
> On Tue, 07 Feb 2012 19:11:16 +0100
> Michael Biebl  wrote:
> 
> > On 07.02.2012 18:07, Joey Hess wrote:
> > > Neil Williams wrote:
> > >> I'd like to ask for some help with a bug which is tripping up my tests
> > >> with the multiarch-aware dpkg from experimental - #647522 -
> > >> non-deterministic behaviour of gzip -9n.
> > > 
> > > pristine-tar hat tricks[1] aside, none of gzip, bzip2, xz are required
> > > to always produce the same compressed file for a given input file, and I
> > > can tell you from experience that there is a great deal of variation. If
> > > multiarch requires this, then its design is at worst broken, and at
> > > best, there will be a lot of coordination pain every time there is a
> > > new/different version of any of these that happens to compress slightly
> > > differently.
> 
> Exactly. I'm not convinced that this is fixable at the gzip level, nor
> is it likely to be fixable by the trauma of changing from gzip to
> something else. That would be pointless.
> 
> What matters, to me, is that package installations do not fail
> somewhere down the dependency chain in ways which are difficult to fix.
> Compression is used to save space, not to provide unique identification
> of file contents. Since it is now clear that the compression is getting
> in the way of dealing with files which are (in terms of their actual
> *usable* content) identical, the compression needs to be taken out of
> the comparison operation. Where the checksum matches, that's all well
> and good (problems with md5sum collisions aside); where it does not
> match, dpkg cannot deem that the files conflict without creating a
> checksum based on the decompressed content of the two files.
[...]

But it's worse than this: even if dpkg decompresses before comparing,
debsums won't (and mustn't, for backward compatibility).  So it's
potentially necessary to fix up the md5sums file for a package
installed for multiple architectures, if it contains a file that was
compressed differently.
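
To make the "fix up the md5sums file" part concrete, a rough sketch
(paths, helper names and behaviour are purely illustrative, not how
dpkg actually stores its metadata) could look like:

    import hashlib
    import os

    def fixup_md5sums(md5sums_path, root="/"):
        """Rewrite entries to match the files actually on disk."""
        fixed = []
        with open(md5sums_path) as f:
            for line in f:
                recorded, rel_path = line.rstrip("\n").split(None, 1)
                with open(os.path.join(root, rel_path), "rb") as installed:
                    actual = hashlib.md5(installed.read()).hexdigest()
                if actual != recorded:
                    # The on-disk file came from the other architecture's
                    # package and was compressed differently; record what
                    # is really there so plain debsums keeps passing.
                    recorded = actual
                fixed.append("%s  %s\n" % (recorded, rel_path))
        with open(md5sums_path, "w") as f:
            f.writelines(fixed)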

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
  - Albert Camus





Re: Please test gzip -9n - related to dpkg with multiarch support

2012-02-07 Thread Russ Allbery
Neil Williams  writes:

> Maybe the way to solve this properly is to remove compression from the
> uniqueness check - compare the contents of the file in memory after
> decompression. Yes, it will take longer but it is only needed when the
> md5sum (which already exists) doesn't match.

Another possible solution is to just give any package an implicit Replaces
(possibly constrained to /usr/share/doc) on any other package with the
same name and version and a different architecture.  This isn't as
defensive, in that it doesn't catch legitimate bugs where someone has made
a mistake and the packages contain different contents, but it also solves
the binNMU issue (well, "solves"; the changelog will randomly swap back
and forth between the packages, but I'm having a hard time being convinced
this is a huge problem).
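
Expressed as a rough predicate (all names hypothetical; the real change
would of course live inside dpkg's conflict handling), the proposed rule
amounts to something like:

    def implicitly_replaces(pkg_a, pkg_b, path):
        """May pkg_b silently overwrite pkg_a's copy of path?"""
        return (
            pkg_a["name"] == pkg_b["name"]
            and pkg_a["version"] == pkg_b["version"]
            and pkg_a["arch"] != pkg_b["arch"]
            # Optionally constrain the implicit Replaces to docs only.
            and path.startswith("/usr/share/doc/")
        )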

-- 
Russ Allbery (r...@debian.org)   





Re: Please test gzip -9n - related to dpkg with multiarch support

2012-02-07 Thread Neil Williams
On Tue, 07 Feb 2012 19:11:16 +0100
Michael Biebl  wrote:

> On 07.02.2012 18:07, Joey Hess wrote:
> > Neil Williams wrote:
> >> I'd like to ask for some help with a bug which is tripping up my tests
> >> with the multiarch-aware dpkg from experimental - #647522 -
> >> non-deterministic behaviour of gzip -9n.
> > 
> > pristine-tar hat tricks[1] aside, none of gzip, bzip2, xz are required
> > to always produce the same compressed file for a given input file, and I
> > can tell you from experience that there is a great deal of variation. If
> > multiarch requires this, then its design is at worst broken, and at
> > best, there will be a lot of coordination pain every time there is a
> > new/different version of any of these that happens to compress slightly
> > differently.

Exactly. I'm not convinced that this is fixable at the gzip level, nor
is it likely to be fixable by the trauma of changing from gzip to
something else. That would be pointless.

What matters, to me, is that package installations do not fail
somewhere down the dependency chain in ways which are difficult to fix.
Compression is used to save space, not to provide unique identification
of file contents. Since it is now clear that the compression is getting
in the way of dealing with files which are (in terms of their actual
*usable* content) identical, the compression needs to be taken out of
the comparison operation. Where the checksum matches, that's all well
and good (problems with md5sum collisions aside); where it does not
match, dpkg cannot deem that the files conflict without creating a
checksum based on the decompressed content of the two files. A checksum
failure on a compressed file is clearly unreliable and will generate
dozens of unreproducible bugs.

MultiArch has many benefits, but saving space is not why MultiArch
exists, and systems which will use MultiArch in anger are unlikely to
be short of either RAM or swap space. Yes, the machines *targeted* by
the builds which become possible once MultiArch is available for
Emdebian will definitely be "low resource" devices, but those devices
do NOT need to use MultiArch themselves. In the parlance of autotools,
--build and --host, MultiArch is a build tool, not a host mechanism. If
you have the resources to cross-build something, you have the resources
to checksum the decompressed content of some files.

As for using MultiArch to install non-free i386 packages on amd64, it
is less of a problem simply because the number of packages installed as
MultiArch packages is likely to be much smaller. Even so, although the
likelihood drops, the effect of one of these collisions getting through
is the same.

> This seems to be a rather common problem, as evidenced by e.g.
> 
> https://bugs.launchpad.net/ubuntu/+source/clutter-1.0/+bug/901522
> https://bugs.launchpad.net/ubuntu/+source/libtasn1-3/+bug/889303
> https://bugs.launchpad.net/ubuntu/oneiric/+source/pam/+bug/871083

See the number of .gz files in this list:
http://people.debian.org/~jwilk/multi-arch/same-md5sums.txt

> In Ubuntu they started to work-around that by excluding random files
> from being compressed. So far I refused to add those hacks to the Debian
> package as this needs to be addressed properly.

Maybe the way to solve this properly is to remove compression from the
uniqueness check - compare the contents of the file in memory after
decompression. Yes, it will take longer but it is only needed when the
md5sum (which already exists) doesn't match.

The core problem is that the times when the md5sum of the compressed
file won't match are unpredictable. No workaround is going to be
reliable, because there is no apparent logic to which files become
affected, and any file which was affected in libfoo0_1.2.3 could well
be completely blameless in libfoo0_1.2.3+b1.

(binNMUs aren't the answer either, because they could just as easily
transfer the bug from libfoo0 to libfoo-dev and so on.)

There appears to be plenty of evidence that checksums of compressed
files are only useful until the checksums fail to match, at which point
I think dpkg will just have to fall back to decompressing the contents
in RAM / swap and doing a fresh checksum on the contents of each
contentious compressed file. If the checksums of the contents match,
the compressed file on the filesystem wins.
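
A minimal sketch of that fallback, assuming gzipped files and purely
illustrative function names (this is the proposed logic, not dpkg
code):

    import gzip
    import hashlib

    def same_usable_content(installed_gz, incoming_gz):
        def raw_md5(path):
            with open(path, "rb") as f:
                return hashlib.md5(f.read()).hexdigest()

        def content_md5(path):
            with gzip.open(path, "rb") as f:
                return hashlib.md5(f.read()).hexdigest()

        # Fast path: the compressed bytes are identical.
        if raw_md5(installed_gz) == raw_md5(incoming_gz):
            return True
        # Compressed bytes differ; compare the decompressed content.
        return content_md5(installed_gz) == content_md5(incoming_gz)

If that returns True, the compressed file already on the filesystem
wins, as described above.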

Anything else and Debian loses all the reproducibility which is so
important to developers and users. When I need to make a cross-building
chroot from unstable (or write a tool for others to create such
chroots), it can't randomly fail today, work tomorrow and fail
with some other package the day after.

If others agree, I think that bug #647522, currently open against gzip,
could be reassigned to dpkg and retitled along the lines of "do not rely
on checksums of compressed files when determining MultiArch file
collisions".

-- 


Neil Williams
=
http://www.linux.codehelp.co.uk/


