Re: Let's shrink Packages.xz

2014-07-28 Thread Ian Jackson
Russ Allbery writes (Re: Let's shrink Packages.xz): Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think

Re: Let's shrink Packages.xz

2014-07-25 Thread Ian Jackson
Russ Allbery writes (Re: Let's shrink Packages.xz): I'm fairly sure Jakub's message was in response to the recent discussion about small Node.js packages and the frequent complaints that we should not introduce small packages into the archive because it bloats our metadata. Reducing

Re: Let's shrink Packages.xz

2014-07-25 Thread Matt Zagrabelny
On Fri, Jul 25, 2014 at 6:50 AM, Ian Jackson ijack...@chiark.greenend.org.uk wrote: Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. But the problem with lots of small packages is not

Re: Let's shrink Packages.xz

2014-07-25 Thread Russ Allbery
Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow,

Re: Let's shrink Packages.xz

2014-07-25 Thread Gerrit Pape
On Fri, Jul 25, 2014 at 10:07:25AM -0700, Russ Allbery wrote: Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to

Re: Let's shrink Packages.xz

2014-07-18 Thread Chris Bannister
On Wed, Jul 16, 2014 at 08:40:29PM +0200, Ondřej Surý wrote: On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 12:26:30PM -0500, Jeff Epler wrote: actually used by current versions of apt. (ideally you'd just go sha256, but iirc it's the md5sum that is used in practice, even today. but please find that thread, don't trust my summary) - apt-get --print-uris defaults to MD5 by

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote: Description-md5 794.3 KiB 11.9% Needed to provide a mapping as versions change a lot more often than descriptions do; also, historically, Translation-* were outside of the control of ftpmasters (at least, that is what history

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Wed, Jul 16, 2014 at 02:23:34PM +0200, David Kalnischkies wrote: With a slight change in semantic we could drop the field from the Packages file again anyhow: At the moment it is the MD5sum of the long description. If it isn't present the clients are expected to calculate it for themselves

Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
Hi Jakub, On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? Ondrej -- Ondřej Surý ond...@sury.org Knot

Re: Let's shrink Packages.xz

2014-07-16 Thread Russ Allbery
Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? I'm fairly sure

Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at

Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk
Food for thought: Which fields take up most space in Packages.xz[0]?
(whole file)     6662.0 KiB  100.0%
SHA256           1463.8 KiB   22.0%
SHA1              938.9 KiB   14.1%
Description-md5   794.3 KiB   11.9%
MD5sum            752.4 KiB   11.3%
Depends           473.0 KiB    7.1%
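One way to estimate per-field figures like those above is to delete a field from every stanza, recompress, and take the difference in compressed size. The following is a minimal sketch of that idea, not necessarily the method used for the original numbers. The names `strip_field`, `field_costs`, and `make_sample` are hypothetical, and the sample data is synthetic rather than a real Packages file.

```python
import hashlib
import lzma


def compressed_size(text: str) -> int:
    """Size in bytes of the xz-compressed text (default lzma settings)."""
    return len(lzma.compress(text.encode("utf-8")))


def strip_field(text: str, field: str) -> str:
    """Remove `field` and its continuation lines from every stanza."""
    prefix = field.lower() + ":"
    kept = []
    skipping = False
    for line in text.splitlines(keepends=True):
        if line[:1] in (" ", "\t"):
            # Continuation line: belongs to the field that precedes it.
            if skipping:
                continue
        else:
            # New field or blank stanza separator: re-evaluate.
            skipping = line.lower().startswith(prefix)
            if skipping:
                continue
        kept.append(line)
    return "".join(kept)


def field_costs(text: str, fields: list[str]) -> dict[str, int]:
    """Compressed bytes saved by dropping each field on its own."""
    total = compressed_size(text)
    return {f: total - compressed_size(strip_field(text, f)) for f in fields}


def make_sample(n: int) -> str:
    """Synthetic stand-in for a Packages file (assumption, not real data)."""
    stanzas = []
    for i in range(n):
        digest = hashlib.sha256(f"pkg{i}".encode()).hexdigest()
        stanzas.append(
            f"Package: pkg{i}\n"
            f"Version: 1.0-{i}\n"
            f"SHA256: {digest}\n"
            f"Description: example package {i}\n"
            f" long description line\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    sample = make_sample(500)
    for name, saved in field_costs(sample, ["SHA256", "Description"]).items():
        print(f"{name}: {saved} bytes saved after recompression")
```

Because xz models the whole stream, the per-field savings measured this way need not add up to the size of the whole file; they only indicate which fields are expensive.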

Re: Let's shrink Packages.xz

2014-07-14 Thread Jeff Epler
I performed a few little experiments, too. First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst. Eliminating all the headers (e.g., replacing Package: foo with simply foo) saved 3.2%. Replacing each one with
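The digest-encoding experiment can be tried on synthetic data. The sketch below is an illustration under assumptions, not a reproduction of the measurement on the real Packages file: it encodes the same pseudo-random SHA-256 digests as hex and as base64, then reports raw and xz-compressed sizes. The helper names `fake_digests` and `encoded_sizes` are hypothetical, and base93 is left out because the Python standard library has no such codec.

```python
import base64
import hashlib
import lzma


def fake_digests(n: int) -> list[bytes]:
    """Deterministic stand-ins for package digests (assumption: synthetic)."""
    return [hashlib.sha256(str(i).encode()).digest() for i in range(n)]


def encoded_sizes(digests: list[bytes]) -> dict[str, tuple[int, int]]:
    """Return {encoding: (raw_bytes, xz_compressed_bytes)}."""
    encodings = {
        "hex": [d.hex() for d in digests],
        "base64": [base64.b64encode(d).decode("ascii") for d in digests],
    }
    sizes = {}
    for name, lines in encodings.items():
        blob = "\n".join(lines).encode("ascii")
        sizes[name] = (len(blob), len(lzma.compress(blob)))
    return sizes


if __name__ == "__main__":
    for name, (raw, packed) in encoded_sizes(fake_digests(10000)).items():
        print(f"{name:7s} raw={raw} xz={packed}")
```

Digests are effectively random, so no encoding can compress below the digests' own 32 bytes each. The question the experiment answers is only which text encoding the compressor handles with less overhead.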

Re: Let's shrink Packages.xz

2014-07-14 Thread ابراهیم محمدی
Isn't a single (rather small) hash value enough for almost all users? On Mon, Jul 14, 2014 at 8:55 PM, Jakub Wilk jw...@debian.org wrote: Food for thought: Which fields take up most space in Packages.xz[0]? (whole file) 6662.0 KiB 100.0% SHA256 1463.8 KiB 22.0% SHA1

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
ابراهیم محمدی mebra...@gmail.com writes: Isn't a single (rather small) hash value enough for almost all users? Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check all the hashes and the hashes would fail

Re: Let's shrink Packages.xz

2014-07-14 Thread Peter Palfrader
On Mon, 14 Jul 2014, Russ Allbery wrote: ابراهیم محمدی mebra...@gmail.com writes: Isn't a single (rather small) hash value enough for almost all users? Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Peter Palfrader wea...@debian.org writes: On Mon, 14 Jul 2014, Russ Allbery wrote: Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check all the hashes and the hashes would fail independently (which is likely).

Re: Let's shrink Packages.xz

2014-07-14 Thread Nathan Schulte
Jeff Epler wrote: First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst. Are you sure you performed this calculation correctly? ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII character

Re: Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk
* Peter Palfrader wea...@debian.org, 2014-07-14, 20:25: The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently. ISTR reading papers that put this much harder into doubt. But I can't find

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Jakub Wilk jw...@debian.org writes: You might have had this paper in mind: https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf Quoting §4: “If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash

Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Nathan Schulte wrote: ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII character is a nibble of the digest; that's a 100% increase (factor of 2) over the bare digest (or a raw mapping of 8 bits of digest to an 8 bit character set). The figures given
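The uncompressed arithmetic quoted above can be checked directly. This small sketch computes the expansion factor of a text encoding from the number of digest bits each 8-bit character carries, and compares it with the encoded length of a 32-byte SHA-256 digest. The function name `expansion_factor` is hypothetical. Base64 is included for comparison, and its padding makes the real length slightly larger than the ideal ratio.

```python
import base64
import hashlib


def expansion_factor(bits_per_char: int) -> float:
    """Encoded size relative to the bare digest, at 8 bits per character."""
    return 8 / bits_per_char


digest = hashlib.sha256(b"example").digest()  # 32 bytes of bare digest

hex_len = len(digest.hex())               # one character per nibble
b64_len = len(base64.b64encode(digest))   # includes '=' padding

print("hex:   ", hex_len, "chars, factor", hex_len / len(digest))
print("base64:", b64_len, "chars, factor", b64_len / len(digest))
print("ideal factors:", expansion_factor(4), expansion_factor(6))
```

These factors describe the file before compression only. Whether a denser encoding also yields a smaller Packages.xz depends on how well the compressor models each alphabet.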

Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Jakub Wilk wrote: * Peter Palfrader wea...@debian.org, 2014-07-14, 20:25: The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently. ISTR reading papers that put this much

Re: Let's shrink Packages.xz

2014-07-14 Thread Dimitri John Ledkov
On 14 July 2014 20:57, Henrique de Moraes Holschuh h...@debian.org wrote: On Mon, 14 Jul 2014, Jakub Wilk wrote: * Peter Palfrader wea...@debian.org, 2014-07-14, 20:25: The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Dimitri John Ledkov x...@debian.org writes: Huh, I'm not quite sure that multiple hashes actually gain us anything at all in terms of compromise, since ultimately all our archive metadata is protected by a single hash only. Whilst replacing individual files simultaneously matching