Re: Let's shrink Packages.xz

2014-07-28 Thread Ian Jackson
Russ Allbery writes (Re: Let's shrink Packages.xz):
 Ian Jackson ijack...@chiark.greenend.org.uk writes:
  But the problem with lots of small packages is not that the Packages.xz
  has too many bytes.
 
  It's that the packaging tools, UIs (for users and developers), and
  humans, need to think about too many packages.
 
  This makes packaging tools slow, UIs cluttered, and humans confused.
 
 So we need to figure out how to solve that problem.  But "don't package
 things" is not a good solution to that problem, obviously, and "don't
 package small things" or "don't use a packaging structure that works well
 for our tools" aren't much better.

I think the right answer is that we should try to avoid creating lots
of small packages even if that is a bit more work for the specific
package.

The reason this argument keeps coming up is that the people looking at
a particular package see almost entirely the costs of aggregating into
a single package.  The costs of disaggregating are diffused across the
whole of the user and developer population.

So there is a need to (a) exercise some self-restraint and (b) educate
developers who have failed to restrain themselves.

 I think it's important, when looking at a problem like this, to
 distinguish between problems that are fixable via a change in policy
 versus problems that are only deferrable.  This is a problem that's
 deferrable by changing what and how we package, but not fixable.  The
 amount of software in the world is going to continue to grow, and Debian
 is hopefully going to continue to grow with it, which means that the
 package list is going to get longer regardless of our policies around
 packaging small things.  So all those problems are going to happen no
 matter what, which means we should find better solutions to them.

Yes, the overall costs of having lots of packages are going to grow
because there is always going to be more software and more complicated
software.

But improving our tools won't make this problem go away, either.  The
relationship between capability, degradation due to overload, and
effort put into scaling, is complicated, but we cannot expect to ever
make a system that will scale indefinitely.

So we will always need to compromise between having lots of packages
because that's convenient for those packages and having fewer packages
because that's convenient for the rest of the system.

The debate is simply where to put that boundary.  Personally I think
this should be spelled out more clearly in policy.  Having consistency
in approach across the archive would be valuable.

Ian.


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
https://lists.debian.org/21462.30846.824437.707...@chiark.greenend.org.uk



Re: Let's shrink Packages.xz

2014-07-25 Thread Ian Jackson
Russ Allbery writes (Re: Let's shrink Packages.xz):
 I'm fairly sure Jakub's message was in response to the recent discussion
 about small Node.js packages and the frequent complaints that we should
 not introduce small packages into the archive because it bloats our
 metadata.
 
 Reducing the size of Packages.xz by 11% or 22% would leave room for quite
 a lot of small packages while not making the problem any worse than it is
 today.

But the problem with lots of small packages is not that the
Packages.xz has too many bytes.

It's that the packaging tools, UIs (for users and developers), and
humans, need to think about too many packages.

This makes packaging tools slow, UIs cluttered, and humans confused.

Ian.





Re: Let's shrink Packages.xz

2014-07-25 Thread Matt Zagrabelny
On Fri, Jul 25, 2014 at 6:50 AM, Ian Jackson
ijack...@chiark.greenend.org.uk wrote:

 Reducing the size of Packages.xz by 11% or 22% would leave room for quite
 a lot of small packages while not making the problem any worse than it is
 today.

 But the problem with lots of small packages is not that the
 Packages.xz has too many bytes.

 It's that the packaging tools, UIs (for users and developers), and
 humans, need to think about too many packages.

 This makes packaging tools slow, UIs cluttered, and humans confused.

What is "too many packages"?

It seems that the UI would need to be able to handle an arbitrary
number of packages. There have been many thousands of packages since
Woody/Sarge.

-m





Re: Let's shrink Packages.xz

2014-07-25 Thread Russ Allbery
Ian Jackson ijack...@chiark.greenend.org.uk writes:

 But the problem with lots of small packages is not that the Packages.xz
 has too many bytes.

 It's that the packaging tools, UIs (for users and developers), and
 humans, need to think about too many packages.

 This makes packaging tools slow, UIs cluttered, and humans confused.

So we need to figure out how to solve that problem.  But "don't package
things" is not a good solution to that problem, obviously, and "don't
package small things" or "don't use a packaging structure that works well
for our tools" aren't much better.

I think it's important, when looking at a problem like this, to
distinguish between problems that are fixable via a change in policy
versus problems that are only deferrable.  This is a problem that's
deferrable by changing what and how we package, but not fixable.  The
amount of software in the world is going to continue to grow, and Debian
is hopefully going to continue to grow with it, which means that the
package list is going to get longer regardless of our policies around
packaging small things.  So all those problems are going to happen no
matter what, which means we should find better solutions to them.

Packaging tools need better, faster algorithms.  UIs need better
information hiding: we have a lot of that already with, for example,
shared libraries, which the average user never has to see (since they're
pulled in via other packages the user wants), and therefore doesn't care
how many of them there are in the archive.  Data formats need to deal with
large numbers of packages better.  Because, no matter what, we're going to
have to deal with large numbers of packages.

And, as a bonus, if we solve the underlying problem, we can use a more
natural packaging strategy where an upstream corresponds to a package,
without the complexity of artificial lumping together of packages that are
actually distinct.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/





Re: Let's shrink Packages.xz

2014-07-25 Thread Gerrit Pape
On Fri, Jul 25, 2014 at 10:07:25AM -0700, Russ Allbery wrote:
 Ian Jackson ijack...@chiark.greenend.org.uk writes:
  But the problem with lots of small packages is not that the Packages.xz
  has too many bytes.
  It's that the packaging tools, UIs (for users and developers), and
  humans, need to think about too many packages.
  This makes packaging tools slow, UIs cluttered, and humans confused.
 
 So we need to figure out how to solve that problem.  But "don't package
 things" is not a good solution to that problem, obviously, and "don't
 package small things" or "don't use a packaging structure that works well
 for our tools" aren't much better.

SCNR
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422139#151

Regards, Gerrit.





Re: Let's shrink Packages.xz

2014-07-18 Thread Chris Bannister
On Wed, Jul 16, 2014 at 08:40:29PM +0200, Ondřej Surý wrote:
 On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote:
  Ondřej Surý ond...@sury.org writes:
   On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote:
  
   Food for thought:
   Which fields take up most space in Packages.xz[0]?
  
   I am still lost - what problem are we trying to solve here?
   Could we at least define it to see if the problem exists?
  
  I'm fairly sure Jakub's message was in response to the recent discussion
  about small Node.js packages and the frequent complaints that we should
  not introduce small packages into the archive because it bloats our
  metadata.
  
  Reducing the size of Packages.xz by 11% or 22% would leave room for quite
  a lot of small packages while not making the problem any worse than it is
  today.
 
 Ok, that makes much more sense now. Still, is the main problem the
 download size or the size on the disk (I can guess that it can be a
 problem on embedded archs)? Or both?

Or just being a tidy citizen and trying to avoid unnecessary wastage?

-- 
If you're not careful, the newspapers will have you hating the people
who are being oppressed, and loving the people who are doing the 
oppressing. --- Malcolm X





Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 12:26:30PM -0500, Jeff Epler wrote:
 actually used by current versions of apt. (ideally you'd just go sha256,
 but iirc it's the md5sum that is used in practice, even today.  but
 please find that thread, don't trust my summary)

- apt-get --print-uris defaults to MD5, as there were at least some
  clients expecting exactly that; jigdo was the one given in the bug
  report that led to this default for the time being (#576420). If that
  is still the case, who knows? In the last iteration the thread
  mysteriously died after I mentioned that we need someone who checks…
  If you don't like the default:
  -o Acquire::ForceHash=<hash you wanna force>
  Still up for takers of course, but I am not holding my breath…
- the pdiffs index is a SHA1-only file format at the moment
- Description-md5 is not security related; it is just needed for mapping,
  so using something stronger would be nonsense. Something weaker
  would work equally well, but that might be way too ugly a transition.
- apt-get source uses MD5 at the moment in all released versions;
  I guess other clients might as well, as the field name is super handy…
  (and for us it is also an ABI-breaking change)
- the rest uses the first hash it can find out of SHA512, 256, 1 and
  MD5 (checking in the order given here, not the order presented in the
  file). Check for yourself if you really care at which point which one
  was added… –– modulo all the bugs included of course.
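
The selection order in the last bullet could be sketched roughly like
this. It only illustrates "strongest available hash first, regardless
of the order of the fields in the file" and is not apt's actual code;
the function name pick_hash and the dict-based stanza are made up for
the example.

```python
# Preference order as described above: strongest first.
PREFERENCE = ("SHA512", "SHA256", "SHA1", "MD5sum")


def pick_hash(stanza):
    """Return (field, value) for the strongest hash present, or None."""
    for field in PREFERENCE:
        if field in stanza:
            return field, stanza[field]
    return None


# The order of fields in the stanza is irrelevant; only PREFERENCE counts.
stanza = {
    "MD5sum": "md5-placeholder",
    "SHA1": "sha1-placeholder",
    "SHA256": "sha256-placeholder",
}
print(pick_hash(stanza))
```

With the placeholder stanza above, the SHA256 entry wins even though
MD5sum is listed first.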

The latter two change in the yet-to-be-finished version currently
residing in experimental, in so far as 'source' stops relying on the
Files field alone and certain cases in the code will do an
all-known-hashes comparison instead of best-only (it's difficult to
explain which ones these are without expecting a good understanding of
how files are acquired by apt, so I go with "each time we can do it
for free", which is surprisingly often 'thanks' to our architecture).


Best regards

David Kalnischkies


signature.asc
Description: Digital signature


Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote:
 Description-md5 794.3 KiB   11.9%

Needed to provide a mapping as versions change a lot more often than
descriptions do; also, historically, Translation-* were outside of the
control of ftpmasters (at least, that is what history digging told me).
It is also relatively new in the Packages file, which leads me to:

With a slight change in semantics we could drop the field from the
Packages file again anyhow: At the moment it is the MD5sum of the long
description. If it isn't present the clients are expected to calculate
it for themselves (well, this was required to work with Translation-*
before we moved long descriptions to Translation-en, so very new clients
might not know about that). So if we change this to MD5sum of whatever
is in the description field (short or long), we could drop it from the
Packages file and clients will again calculate this themselves to look
stuff up with it in the Translation files (where this field came from).
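
A rough sketch of the "clients calculate it themselves" idea follows.
The helper name description_key, the in-memory translations table and
the choice to hash the text plus a trailing newline are assumptions for
illustration; a real client would have to match whatever the archive
software hashes.

```python
import hashlib


def description_key(description):
    """MD5 hex digest of a description, used purely as a lookup key."""
    # Assumption: the digest covers the UTF-8 text plus a trailing newline.
    return hashlib.md5((description + "\n").encode("utf-8")).hexdigest()


# Hypothetical Translation-* index keyed by (package, digest).
translations = {}
desc = "example tool\n A longer explanation of what the tool does."
translations[("foo", description_key(desc))] = "Beispielwerkzeug"

# A new version with an unchanged description maps to the same entry,
# so no Description-md5 field is needed in Packages to find it.
print(translations.get(("foo", description_key(desc))))
```

MD5 is only a mapping key here, so its cryptographic weakness does not
matter for this use.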

I haven't tested, but that should work without any change in apt (okay,
apt-ftparchive needs to be patched), so first stop for someone wanting
to drive this is probably dak - takers? Other servers and maybe clients
need to be adapted, but that could be done in a rather uncoordinated way,
as there is usually just one server creating both Packages and
Translation-* files, so it will have the same semantic interpretation,
and clients either take what they get or already implicitly have the
"whatever is in the field" semantics.

(sidenote: see my other mail for the non-existent security implications
of using md5 here if you care)


 Description        463.4 KiB    7.0%

The ftpmasters actually wanted to drop that in their final implementation
of the long description split-out. We got the short description back, as
keeping it wasn't part of the initial plan and clients didn't like that
(= apt-cache search would segfault, for example); besides that, I prefer
to have at least a short description around in any case. I think if we
drop one of them, it should be the -md5 field, as it isn't as compressible
as human-readable text… (not to mention quite useless for a human).


 SHA256            1463.8 KiB   22.0%
 SHA1               938.9 KiB   14.1%
 MD5sum             752.4 KiB   11.3%

I *guess* the most painless drop would be SHA1. Entirely dropping it
from the archive means changing the pdiff infrastructure though.
Someone ought to check that claim…

Dropping MD5 will break some scripts parsing apt output. I personally
hate breaking users, so any takers to check/fix that at least Debian
tools do not break? Entirely dropping would be easy after this is done
(modulo Description-md5 of course, but see there).

Adding/changing to SHA512 in the indexes is probably close to useless;
in the Release file the benefit is probably not worthwhile, but it is
there if the need should arise. I have some hope that with
apt/experimental we will be able to add new hashsums with less pain
(aka: no ABI break), too, but that is just a sidenote.


 [other fields - present hopefully only for comparison purposes]

For the rest it is hopefully clear why we can't drop them, even though
I kinda like the idea of dropping dependencies… would make installing
stuff so much simpler… ;)


 Format changes ala base-whatever, \0, …

Changing the format is _*EXTREMELY*_ painful. It is also nice to have
a textfile you can work with easily… If you want to improve, this
improvement should be factored into a compression algorithm so that not
every parser in the universe needs to be rewritten… (one of apt's
test cases uses 'rev' as a compression algorithm.  You just need to set
some options, advertise the availability in the Release file and you are
good to go…)


Best regards

David Kalnischkies




Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Wed, Jul 16, 2014 at 02:23:34PM +0200, David Kalnischkies wrote:
 With a slight change in semantic we could drop the field from the
 Packages file again anyhow: At the moment it is the MD5sum of the long
 description. If it isn't present the clients are expected to calculate
 it for themselves (well, this was required to work with Translation-*
 before we moved long descriptions to Translation-en, so very new clients
 might not know about that). So if we change this to MD5sum of whatever
 is in the description field (short or long), we could drop it from the
 Packages file and clients will again calculate this themselves to look
 stuff up with it in the Translation files (where this field came from).

FTR: This has an obvious side effect though: If (let's say) foo/sid has
feature x and advertises it in the long description (but the short one
didn't change), while foo/stable didn't have it, it is undefined which
description will be shown; it could be either, but it will be only one,
as the other is 'discarded' as a duplicate.

Never happens with stable/single archives of course which I was thinking
of, but in sid you would see it relatively often. Multiply this with
'outdated' translations still being shown as current. (Did I mention
that I dislike feature lists in descriptions as they are always out of
date even without that?)

You could mix and match it of course, like dropping it for stable as
long descriptions aren't going to change there, so that short would be
identifier enough, but well…


Best regards

David Kalnischkies


P.S.: the version number doesn't work nicely here, as experimental/security
do not have translated descriptions, but can piggyback on them this way.
Which is the historical reason for this altogether, as they were first
not in the archive and then just file imports (which they still might or
might not be, I have no idea – and my last speculation in a train ended
in the previous mail - I blame the broken air conditioning…)




Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
Hi Jakub,

On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote:
 Food for thought:
 Which fields take up most space in Packages.xz[0]?

I am still lost - what problem are we trying to solve here?
Could we at least define it to see if the problem exists?

Ondrej
-- 
Ondřej Surý ond...@sury.org
Knot DNS (https://www.knot-dns.cz/) – a high-performance DNS server





Re: Let's shrink Packages.xz

2014-07-16 Thread Russ Allbery
Ondřej Surý ond...@sury.org writes:
 On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote:

 Food for thought:
 Which fields take up most space in Packages.xz[0]?

 I am still lost - what problem are we trying to solve here?
 Could we at least define it to see if the problem exists?

I'm fairly sure Jakub's message was in response to the recent discussion
about small Node.js packages and the frequent complaints that we should
not introduce small packages into the archive because it bloats our
metadata.

Reducing the size of Packages.xz by 11% or 22% would leave room for quite
a lot of small packages while not making the problem any worse than it is
today.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/





Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote:
 Ondřej Surý ond...@sury.org writes:
  On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote:
 
  Food for thought:
  Which fields take up most space in Packages.xz[0]?
 
  I am still lost - what problem are we trying to solve here?
  Could we at least define it to see if the problem exists?
 
 I'm fairly sure Jakub's message was in response to the recent discussion
 about small Node.js packages and the frequent complaints that we should
 not introduce small packages into the archive because it bloats our
 metadata.
 
 Reducing the size of Packages.xz by 11% or 22% would leave room for quite
 a lot of small packages while not making the problem any worse than it is
 today.

Ok, that makes much more sense now. Still, is the main problem the
download size or the size on the disk (I can guess that it can be a
problem on embedded archs)? Or both?

Dropping md5+sha1 or even introducing sha-224 instead of sha-256 would
help in this case.

Having the fallback mechanism leaves the door open for stripping+downgrade
attacks anyway.

Isn't switching to an optimized binary format an option? But I guess it
probably won't be that much better than a good compression algorithm.

O.
-- 
Ondřej Surý ond...@sury.org
Knot DNS (https://www.knot-dns.cz/) – a high-performance DNS server





Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk

Food for thought:
Which fields take up most space in Packages.xz[0]?

(whole file)      6662.0 KiB  100.0%
SHA256            1463.8 KiB   22.0%
SHA1               938.9 KiB   14.1%
Description-md5    794.3 KiB   11.9%
MD5sum             752.4 KiB   11.3%
Depends            473.0 KiB    7.1%
Description        463.4 KiB    7.0%
Filename           338.9 KiB    5.1%
Homepage           183.1 KiB    2.7%
Tag                176.1 KiB    2.6%
Size               168.3 KiB    2.5%
Maintainer         161.3 KiB    2.4%
Installed-Size     144.6 KiB    2.2%
Package            134.5 KiB    2.0%
Version             73.4 KiB    1.1%
Suggests            68.9 KiB    1.0%
Recommends          63.7 KiB    1.0%


[0] More precisely: for each field, how much would Packages.xz shrink if 
we removed this (and only this) field?
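
For anyone wanting to repeat the measurement in [0], it could be
approximated along these lines. This is a sketch, not the script that
produced the table: the helper names and the tiny sample stanza are
invented, and results depend on the xz settings used.

```python
import lzma


def strip_field(text, field):
    """Drop `field` (and its continuation lines) from every stanza."""
    out, skipping = [], False
    for line in text.splitlines(keepends=True):
        if line.startswith((" ", "\t")):
            # Continuation line: it belongs to the field above it.
            if not skipping:
                out.append(line)
            continue
        skipping = line.startswith(field + ":")
        if not skipping:
            out.append(line)
    return "".join(out)


def xz_size(text):
    """Size in bytes of the xz-compressed text."""
    return len(lzma.compress(text.encode("utf-8")))


def savings(text, fields):
    """Bytes saved in the xz output when each field is removed on its own."""
    base = xz_size(text)
    return {f: base - xz_size(strip_field(text, f)) for f in fields}


# Tiny made-up stanza; a real run would read an uncompressed Packages file.
sample = (
    "Package: foo\nVersion: 1.0\nDescription: example\n long text\n"
    "SHA256: " + "ab" * 32 + "\n\n"
)
print(savings(sample, ["SHA256", "Description"]))
```

On a toy input like this the numbers mean little; the approach only
becomes informative on a full Packages file.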


--
Jakub Wilk





Re: Let's shrink Packages.xz

2014-07-14 Thread Jeff Epler
I performed a few little experiments, too.

First, I tried encoding the various digests as base64 or base93, rather
than hex.  In each case, the file grew in size; base93 was the worst.

Eliminating all the headers (e.g., replacing "Package: foo" with simply
"foo") saved 3.2%.  Replacing each one with an integer after its first
occurrence (e.g., since the first line is "Package:", every subsequent
"Package:" line starts "0:" instead) saved 1.7%.

Using "\0" instead of ": " and "\n" saved 1.5%.  Using ":" instead of ": "
saved 1.8%.

None of these do as much good as simply getting rid of one file hash.  I
know this has been talked about in the past, and as far as I recall the
result, nobody could agree on a hash to drop, in light of both the
expected future security of various hashes, as well as the hashes
actually used by current versions of apt. (ideally you'd just go sha256,
but iirc it's the md5sum that is used in practice, even today.  but
please find that thread, don't trust my summary)

Jeff





Re: Let's shrink Packages.xz

2014-07-14 Thread ابراهیم محمدی
Isn't a single (rather small) hash value enough for almost all users?


On Mon, Jul 14, 2014 at 8:55 PM, Jakub Wilk jw...@debian.org wrote:

 Food for thought:
 Which fields take up most space in Packages.xz[0]?

 (whole file)      6662.0 KiB  100.0%
 SHA256            1463.8 KiB   22.0%
 SHA1               938.9 KiB   14.1%
 Description-md5    794.3 KiB   11.9%
 MD5sum             752.4 KiB   11.3%
 Depends            473.0 KiB    7.1%
 Description        463.4 KiB    7.0%
 Filename           338.9 KiB    5.1%
 Homepage           183.1 KiB    2.7%
 Tag                176.1 KiB    2.6%
 Size               168.3 KiB    2.5%
 Maintainer         161.3 KiB    2.4%
 Installed-Size     144.6 KiB    2.2%
 Package            134.5 KiB    2.0%
 Version             73.4 KiB    1.1%
 Suggests            68.9 KiB    1.0%
 Recommends          63.7 KiB    1.0%


 [0] More precisely: for each field, how much would Packages.xz shrink if
 we removed this (and only this) field?

 --
 Jakub Wilk






Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
ابراهیم محمدی mebra...@gmail.com writes:

 Isn't a single (rather small) hash value enough for almost all users?

Using multiple hashes gives us some theoretical robustness against a break
in one of the hash functions provided that all clients check all the
hashes and the hashes would fail independently (which is likely).  The
basic idea is that it's much harder to come up with a simultaneous hash
collision with both SHA-1 and SHA-2 than breaking either of them
independently.  I'm a bit dubious the clients actually check, though.
Also, it's questionable whether protecting against this theoretical
possibility is a good tradeoff.  If SHA-2 is broken suddenly, we have
larger problems than the integrity of the Packages file, and hopefully
we'd get a bit of advance warning (like we have with MD5) and be able to
introduce a new hash at that point.
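
To make the "provided that all clients check all the hashes" condition
concrete, a client-side check could look roughly like this. It is a
sketch only: verify_all and the field-to-algorithm table are invented
for the example and do not describe what apt does.

```python
import hashlib

# Packages field name -> hashlib algorithm name.
ALGORITHMS = {"MD5sum": "md5", "SHA1": "sha1", "SHA256": "sha256"}


def verify_all(data, stanza):
    """True only if every known hash listed in the stanza matches `data`."""
    checked = 0
    for field, algo in ALGORITHMS.items():
        if field in stanza:
            if hashlib.new(algo, data).hexdigest() != stanza[field]:
                return False
            checked += 1
    # Refuse a stanza that carries no known hash at all.
    return checked > 0


payload = b"example package contents"
stanza = {
    name: hashlib.new(algo, payload).hexdigest()
    for name, algo in ALGORITHMS.items()
}
print(verify_all(payload, stanza))
```

A client that stops at the first matching hash gets none of the
theoretical benefit, since an attacker then only has to defeat that one
function.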

MD5 may still be required for backward compatibility; otherwise, it's the
obvious one to drop.

If we were going to keep only one, we should keep SHA256, as that's the
most robust from a cryptographic standpoint at this point (SHA-3 may get
there, but is still too new), but obviously all the clients have to
support that.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/





Re: Let's shrink Packages.xz

2014-07-14 Thread Peter Palfrader
On Mon, 14 Jul 2014, Russ Allbery wrote:

 ابراهیم محمدی mebra...@gmail.com writes:
 
  Isn't a single (rather small) hash value enough for almost all users?
 
 Using multiple hashes gives us some theoretical robustness against a break
 in one of the hash functions provided that all clients check all the
 hashes and the hashes would fail independently (which is likely).

I would like to see some supporting evidence for the claim that they
will likely fail independently.  In particular given that they are all
the same construct.

 The basic idea is that it's much harder to come up with a simultaneous
 hash collision with both SHA-1 and SHA-2 than breaking either of them
 independently.

ISTR reading papers that put this "much harder" into doubt.  But I can't
find those references, alas.

I think just having a single, strong hash in Packages ought to be
sufficient.

Cheers,
-- 
                           |  .''`.       ** Debian **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/





Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Peter Palfrader wea...@debian.org writes:
 On Mon, 14 Jul 2014, Russ Allbery wrote:

 Using multiple hashes gives us some theoretical robustness against a
 break in one of the hash functions provided that all clients check all
 the hashes and the hashes would fail independently (which is likely).

 I would like to see some supporting evidence for the claim that they
 will likely fail independently.  In particular given that they are all
 the same construct.

SHA-1 and SHA-2 are relatively independent constructions, so it seems
intuitive to me that achieving a hash collision simultaneously with both
constructions would be harder than finding a hash collision for either of
them independently.

I admit that this argument is much stronger for SHA-2 and SHA-3, where
there is no commonality at all between the algorithms (that I know of).

That said...

 I think just having a single, strong hash in Packages ought to be
 sufficient.

...I agree with this.  I think that, even if this approach works and all
the clients check, the level of additional security that we get from
having multiple hashes isn't worth the overhead.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/





Re: Let's shrink Packages.xz

2014-07-14 Thread Nathan Schulte

Jeff Epler wrote:

First, I tried encoding the various digests as base64 or base93, rather
than hex.  In each case, the file grew in size; base93 was the worst.


Are you sure you performed this calculation correctly?

ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII 
character is a nibble of the digest; that's a 100% increase (factor of 
2) over the bare digest (or a raw mapping of 8 bits of digest to an 8 
bit character set).


base64 encodes 6 bits as 8; that should only be a 33.3% increase (factor 
of 1.333).


I've never heard of base93, but I found a reference that I think 
describes what you mean [0].  This should provide even better efficiency 
over base64, as should any binary-to-ascii mapping of higher radix. 
Perfect segue...


What are we looking for in an encoding?  I'm guessing this needs to be 
printable, suitable for human consumption (or at least copy/paste / 
consumption via text editor), and 7-bit compat?


Is this even up for debate?  The community at large (computer users), 
Debian included, seems to have standardized on message digests as ASCII 
hex...


[0] http://kiwigis.blogspot.com/2013/09/base-93-integer-shortening-in-c.html

--
Nate





Re: Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk

* Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:
The basic idea is that it's much harder to come up with a
simultaneous hash collision with both SHA-1 and SHA-2 than breaking
either of them independently.


ISTR reading papers that put this "much harder" into doubt.  But I
can't find those references, alas.


You might have had this paper in mind:
https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
Quoting §4: “If F and G are good iterated hash functions with no attack 
better than the generic birthday paradox attack, we claim that the hash 
function F||G obtained by concatenating F and G is not really more secure 
that F or G by itself.”


--
Jakub Wilk





Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Jakub Wilk jw...@debian.org writes:

 You might have had this paper in mind:
 https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
 Quoting §4: “If F and G are good iterated hash functions with no attack
 better than the generic birthday paradox attack, we claim that the hash
 function F||G obtained by concatenating F and G is not really more secure
 that F or G by itself.”

Ah, if that's the case, that's an argument about a different use case.  I
wouldn't expect just adding more hashes to add more security when the
hashes haven't been broken.  SHA-256 by itself provides more than enough
security if one assumes that it has ideal properties.

The (theoretical) security benefit argued for here is precisely the case
where the hash functions *do* have attacks better than the generic
birthday paradox attack (that we possibly don't know about yet).  It's
basically a defense in depth argument, coupled with the argument that the
special construction of a file to create a collision for one hash function
may be incompatible with the special construction of a file required to
create a collision with the other hash function.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/


--
Archive: https://lists.debian.org/87oawr2194@windlord.stanford.edu



Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Nathan Schulte wrote:
 ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII
 character is a nibble of the digest; that's a 100% increase (factor
 of 2) over the bare digest (or a raw mapping of 8 bits of digest
 to an 8 bit character set).

The figures given refer to changes to the size of the compressed text, not
to the plaintext.

 I've never heard of base93, but I found a reference that I think
 describes what you mean [0].  This should provide even better
 efficiency over base64, as should any binary-to-ascii mapping of
 higher radix. Perfect segue...

It can have lower space efficiency in compressed text (in fact, that's
exactly what happened), even if it *is* more efficient in the plain text.
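To make the plain-versus-compressed distinction concrete, here is a rough sketch that encodes the same set of digests as hex and as base64 and reports both the plain and the xz-compressed sizes. The input (2000 SHA-256 digests of made-up strings) is an assumption for illustration only; a real Packages file embeds each digest among a lot of other text, so the compressed figures there can come out differently.

```python
import base64
import hashlib
import lzma

# Hypothetical input: 2000 SHA-256 digests of made-up strings.
raw = [hashlib.sha256(b"package-%d" % i).digest() for i in range(2000)]

# One digest per line, once as ASCII hex and once as base64.
hex_text = "\n".join(d.hex() for d in raw).encode("ascii")
b64_text = "\n".join(
    base64.b64encode(d).decode("ascii") for d in raw
).encode("ascii")


def report(label, text):
    """Print and return the plain size and the xz-compressed size."""
    packed = lzma.compress(text)  # default container format is .xz
    print(f"{label}: plain {len(text)} bytes, xz {len(packed)} bytes")
    return len(text), len(packed)


hex_plain, hex_xz = report("hex   ", hex_text)
b64_plain, b64_xz = report("base64", b64_text)
```

Hex is always the larger of the two in plain text; whether it stays larger after compression depends on the compressor and on the surrounding data, which is why the two sets of figures should not be mixed.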

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: https://lists.debian.org/20140714195507.ga5...@khazad-dum.debian.net



Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Jakub Wilk wrote:
 * Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:
 The basic idea is that it's much harder to come up with a
 simultaneous hash collision with both SHA-1 and SHA-2 than
 breaking either of them independently.
 
 ISTR reading papers that put this much harder into doubt.  But I
 can't find those references, alas.
 
 You might have had this paper in mind:
 https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
 Quoting §4: “If F and G are good iterated hash functions with no
 attack better than the generic birthday paradox attack, we claim
 that the hash function F||G obtained by concatenating F and G is not
 really more secure that F or G by itself.”

We don't want F||G to be more secure than F or G by itself.  We want it to be
at least as secure as the stronger of F or G.

Which means it continues being secure if one of G or F, but not both, is
compromised.
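A toy sketch of that property, under loud assumptions: weak_hash below is a made-up stand-in for a compromised function (it keeps only 8 bits of SHA-1 output so that a second preimage can be brute-forced in a moment), and concat_hash is just the two digests side by side. It is not how any real archive tool computes checksums.

```python
import hashlib
from itertools import count


def weak_hash(data: bytes) -> str:
    # Toy stand-in for a compromised hash: only 8 bits of SHA-1 output.
    return hashlib.sha1(data).hexdigest()[:2]


def strong_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def concat_hash(data: bytes) -> str:
    # F||G: the two digests placed side by side.
    return weak_hash(data) + strong_hash(data)


original = b"genuine package contents"

# Brute-force a second preimage against the weak function alone.
forged = next(
    candidate
    for candidate in (b"forged-%d" % i for i in count())
    if weak_hash(candidate) == weak_hash(original)
)

print("weak hash fooled:  ", weak_hash(forged) == weak_hash(original))
print("F||G still rejects:", concat_hash(forged) != concat_hash(original))
```

The forged input passes the weak check but fails the combined one, because the combined digest only matches if both halves match.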

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: https://lists.debian.org/20140714195728.gb5...@khazad-dum.debian.net



Re: Let's shrink Packages.xz

2014-07-14 Thread Dimitri John Ledkov
On 14 July 2014 20:57, Henrique de Moraes Holschuh h...@debian.org wrote:
 On Mon, 14 Jul 2014, Jakub Wilk wrote:
 * Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:
 The basic idea is that it's much harder to come up with a
 simultaneous hash collision with both SHA-1 and SHA-2 than
 breaking either of them independently.
 
 ISTR reading papers that put this much harder into doubt.  But I
 can't find those references, alas.

 You might have had this paper in mind:
 https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
 Quoting §4: “If F and G are good iterated hash functions with no
 attack better than the generic birthday paradox attack, we claim
 that the hash function F||G obtained by concatenating F and G is not
 really more secure that F or G by itself.”

 We don't want F||G to be more secure than F or G by itself.  We want it to be
 at least as secure as the stronger of F or G.

 Which means it continues being secure if one of G or F, but not both, is
 compromised.


Huh, I'm not quite sure that multiple hashes actually gain us anything
at all in terms of compromisation, since ultimately all our archive
metadata is protected by a single hash only.

Whilst replacing individual files and simultaneously matching multiple
hash algorithms is an interesting problem, it's much more interesting
to match SHA256 of Release file such that Release.gpg validates, then
you can replace /all/ files with valid checksums across the board. Or
otherwise generate/break the archive signing key.

So RSA 4096 key and SHA256 signature is what ultimately secures our
current archive, all other hashes in the Packages file are there
merely to assert that it's the right binary that is signed and that
one downloaded it correctly.

Thus can we please drop MD5 and SHA1 hashes? Anything that can't
validate SHA256, can't validate Release.gpg/InRelease and is thus
insecure.

$ gpg -v --verify Release.gpg Release
gpg: armor header: Version: GnuPG v1.4.12 (GNU/Linux)
gpg: Signature made Mon 14 Jul 2014 10:02:39 PM BST using RSA key ID 46925553
gpg: using PGP trust model
gpg: Good signature from Debian Archive Automatic Signing Key
(7.0/wheezy) ftpmas...@debian.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: A1BD 8E9D 78F7 FE5C 3E65  D8AF 8B48 AD62 4692 5553
gpg: binary signature, digest algorithm SHA256
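The chain described above (signature covers Release, Release lists the SHA256 of Packages, Packages lists the SHA256 of each .deb) can be sketched roughly as follows. This is a simplified illustration, not apt's actual code: the file contents and the path names are invented, the parsing handles only the fields shown, and the signature check on Release itself is assumed to have been done already with gpg as in the transcript.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def release_sha256_entries(release_text: str) -> dict:
    """Map path -> (digest, size) from the SHA256 section of a Release file."""
    entries, in_section = {}, False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
        elif in_section and line.startswith(" "):
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
        else:
            in_section = False
    return entries


def index_is_trusted(release_text: str, path: str, index_bytes: bytes) -> bool:
    """Check an index file against the (already signature-checked) Release."""
    digest, size = release_sha256_entries(release_text)[path]
    return len(index_bytes) == size and sha256_hex(index_bytes) == digest


def deb_is_trusted(packages_bytes: bytes, deb_bytes: bytes) -> bool:
    """Check a .deb against a single-stanza Packages file (simplified)."""
    fields = dict(
        line.split(": ", 1) for line in packages_bytes.decode().splitlines()
    )
    return (
        int(fields["Size"]) == len(deb_bytes)
        and fields["SHA256"] == sha256_hex(deb_bytes)
    )


# Invented example data; every name below is hypothetical.
deb = b"pretend this is a .deb"
packages = (
    "Package: hello\n"
    "Filename: pool/main/h/hello/hello_1.0_amd64.deb\n"
    f"Size: {len(deb)}\n"
    f"SHA256: {sha256_hex(deb)}\n"
).encode()
index_path = "main/binary-amd64/Packages"
release = (
    "Suite: example\n"
    "SHA256:\n"
    f" {sha256_hex(packages)} {len(packages)} {index_path}\n"
)

print(index_is_trusted(release, index_path, packages))
print(deb_is_trusted(packages, deb))
```

Every check below Release reduces to a SHA256 comparison, so an attacker who can forge the one signed digest (or the signing key) does not need to touch the per-file MD5 or SHA1 entries at all.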

-- 
Regards,

Dimitri.


--
Archive: https://lists.debian.org/canbhluhxyly1rn+fwp12i3zoxhfcenvdoakxn+kaidweh3r...@mail.gmail.com



Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Dimitri John Ledkov x...@debian.org writes:

 Huh, I'm not quite sure that multiple hashes actually gain us anything
 at all in terms of compromisation, since ultimately all our archive
 metadata is protected by a single hash only.

 Whilst replacing individual files and simultaneously matching multiple
 hash algorithms is an interesting problem, it's much more interesting
 to match SHA256 of Release file such that Release.gpg validates, then
 you can replace /all/ files with valid checksums across the board. Or
 otherwise generate/break the archive signing key.

Ah, yes, excellent point.

So yes, other than backward compatibility, I see no reason to keep any
hash other than the hash we're also using for the GnuPG signature.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/


-- 
Archive: https://lists.debian.org/87y4vvy0wa@windlord.stanford.edu