Re: Let's shrink Packages.xz
Russ Allbery writes (Re: Let's shrink Packages.xz): Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow, UIs cluttered, and humans confused. So we need to figure out how to solve that problem. But don't package things is not a good solution to that problem, obviously, and don't package small things or don't use a packaging structure that works well for our tools aren't much better. I think the right answer is that we should try to avoid creating lots of small packages even if that is a bit more work for the specific package. The reason this argument keeps coming up is that the people looking at a particular package see almost entirely the costs of aggregating into a single package. The costs of disaggregating are diffused across the whole of the user and developer population. So there is a need to (a) exercise some self-restraint (b) educate developers who have failed to restrain themselves. I think it's important, when looking at a problem like this, to distinguish between problems that are fixable via a change in policy versus problems that are only deferrable. This is a problem that's deferrable by changing what and how we package, but not fixable. The amount of software in the world is going to continue to grow, and Debian is hopefully going to continue to grow with it, which means that the package list is going to get longer regardless of our policies around packaging small things. So all those problems are going to happen no matter what, which means we should find better solutions to them. Yes, the overall costs of having lots of packages are going to grow because there is always going to be more software and more complicated software. But improving our tools won't make this problem go away, either. 
The relationship between capability, degradation due to overload, and effort put into scaling, is complicated, but we cannot expect to ever make a system that will scale indefinitely. So we will always need to compromise between having lots of packages because that's convenient for those packages and having fewer packages because that's convenient for the rest of the system. The debate is simply where to put that boundary. Personally I think this should be spelled out more clearly in policy. Having consistency in approach across the archive would be valuable. Ian. -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/21462.30846.824437.707...@chiark.greenend.org.uk
Re: Let's shrink Packages.xz
Russ Allbery writes (Re: Let's shrink Packages.xz): I'm fairly sure Jakub's message was in response to the recent discussion about small Node.js packages and the frequent complaints that we should not introduce small packages into the archive because it bloats our metadata. Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow, UIs cluttered, and humans confused. Ian. -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/21458.17524.168121.211...@chiark.greenend.org.uk
Re: Let's shrink Packages.xz
On Fri, Jul 25, 2014 at 6:50 AM, Ian Jackson ijack...@chiark.greenend.org.uk wrote: Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow, UIs cluttered, and humans confused. What is too many packages? It seems that the UI would need to be able to handle an arbitrary number of packages. There have been many thousands of packages since Woody/Sarge. -m -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/caolfk3xaxcftk57cd-qcudwnl1xwr84ebuougg_9qagqto0...@mail.gmail.com
Re: Let's shrink Packages.xz
Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow, UIs cluttered, and humans confused. So we need to figure out how to solve that problem. But don't package things is not a good solution to that problem, obviously, and don't package small things or don't use a packaging structure that works well for our tools aren't much better. I think it's important, when looking at a problem like this, to distinguish between problems that are fixable via a change in policy versus problems that are only deferrable. This is a problem that's deferrable by changing what and how we package, but not fixable. The amount of software in the world is going to continue to grow, and Debian is hopefully going to continue to grow with it, which means that the package list is going to get longer regardless of our policies around packaging small things. So all those problems are going to happen no matter what, which means we should find better solutions to them. Packaging tools need better, faster algorithms. UIs need better information hiding: we have a lot of that already with, for example, shared libraries, which the average user never has to see (since they're pulled in via other packages the user wants), and therefore doesn't care how many of them there are in the archive. Data formats need to deal with large numbers of packages better. Because, no matter what, we're going to have to deal with large numbers of packages. And, as a bonus, if we solve the underlying problem, we can use a more natural packaging strategy where an upstream corresponds to a package, without the complexity of artificial lumping together of packages that are actually distinct. 
-- Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/874my5nz8y@windlord.stanford.edu
Re: Let's shrink Packages.xz
On Fri, Jul 25, 2014 at 10:07:25AM -0700, Russ Allbery wrote: Ian Jackson ijack...@chiark.greenend.org.uk writes: But the problem with lots of small packages is not that the Packages.xz has too many bytes. It's that the packaging tools, UIs (for users and developers), and humans, need to think about too many packages. This makes packaging tools slow, UIs cluttered, and humans confused. So we need to figure out how to solve that problem. But don't package things is not a good solution to that problem, obviously, and don't package small things or don't use a packaging structure that works well for our tools aren't much better. SCNR https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422139#151 Regards, Gerrit. -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/20140725172802.16488.qm...@22ba4216a2609a.315fe32.mid.smarden.org
Re: Let's shrink Packages.xz
On Wed, Jul 16, 2014 at 08:40:29PM +0200, Ondřej Surý wrote: On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? I'm fairly sure Jakub's message was in response to the recent discussion about small Node.js packages and the frequent complaints that we should not introduce small packages into the archive because it bloats our metadata. Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. Ok, that makes much more sense now. Still is the main problem the download size or the size on the disk (I can guess that it can be a problem on embedded archs). Or both? Or just being a tidy citizen and try to avoid unnecessary wastage? -- If you're not careful, the newspapers will have you hating the people who are being oppressed, and loving the people who are doing the oppressing. --- Malcolm X -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/20140718150937.GK8963@tal
Re: Let's shrink Packages.xz
On Mon, Jul 14, 2014 at 12:26:30PM -0500, Jeff Epler wrote:
> actually used by current versions of apt. (ideally you'd just go sha256, but iirc it's the md5sum that is used in practice, even today. but please find that thread, don't trust my summary)

- apt-get --print-uris uses MD5 by default, as there were at least some clients expecting exactly that. jigdo was the example given in the bug report that led to this default for the time being (#576420). Whether that is still the case, who knows? In the last iteration the thread mysteriously died after I mentioned that we need someone who checks… If you don't like the default: -o Acquire::ForceHash=<hash you wanna force>. Still up for takers of course, but I am not holding my breath…
- The pdiffs index is a SHA1-only file format at the moment.
- Description-md5 is not security-related; it is just needed for mapping, so using something stronger would be nonsense. Something weaker would work equally well, but that might be a way too ugly transition.
- apt-get source uses MD5 at the moment in all released versions. I guess other clients might as well, as the field name is super handy… (and for us it is also an ABI-breaking change).
- The rest uses the first hash it can find out of SHA512, SHA256, SHA1 and MD5 (checking in the order given here, not the order presented in the file). Check for yourself if you really care at which point which one was added… modulo all the bugs included, of course.

The latter two change in the yet-to-be-finished version currently residing in experimental, insofar as 'source' stops relying on the Files field alone and certain cases in the code will do an all-known-hashes comparison instead of best-only (it's difficult to explain which ones these are without assuming a good understanding of how files are acquired by apt, so I'll go with "each time we can do it for free", which is surprisingly often 'thanks' to our architecture).

Best regards

David Kalnischkies
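To make the difference between "best-only" and "all-known hashes" checking concrete, here is a minimal sketch in Python. It is not apt's code: the function names are hypothetical, the stanza is assumed to be an already parsed dict of field name to hex digest, and the field names (SHA512, SHA256, SHA1, MD5sum) are assumed to match those in the Packages file.

```python
import hashlib

# Preference order as described in the message: SHA512, SHA256, SHA1, MD5.
# Left: field name assumed to appear in the index; right: hashlib name.
PREFERENCE = [
    ("SHA512", "sha512"),
    ("SHA256", "sha256"),
    ("SHA1", "sha1"),
    ("MD5sum", "md5"),
]


def best_hash(stanza):
    """Return (field, hashlib_name, expected_hex) for the most preferred hash present."""
    for field, algo in PREFERENCE:
        if field in stanza:
            return field, algo, stanza[field]
    raise ValueError("stanza carries no known hash field")


def verify_best_only(stanza, data):
    """Check only the most preferred hash; weaker ones are ignored."""
    _, algo, expected = best_hash(stanza)
    return hashlib.new(algo, data).hexdigest() == expected


def verify_all_known(stanza, data):
    """Check every known hash that is present; all of them must match."""
    present = [(f, a) for f, a in PREFERENCE if f in stanza]
    if not present:
        raise ValueError("stanza carries no known hash field")
    return all(hashlib.new(a, data).hexdigest() == stanza[f] for f, a in present)
```

Note that the preference order lives in the code, not in the file, so reordering fields in the index does not change which hash a best-only client trusts.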
Re: Let's shrink Packages.xz
On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote:
> Description-md5   794.3 KiB  11.9%

Needed to provide a mapping, as versions change a lot more often than descriptions do; also, historically, Translation-* were outside of the control of ftpmasters (at least, that is what history digging told me). It is also relatively new in the Packages file, which leads me to:

With a slight change in semantics we could drop the field from the Packages file again anyhow. At the moment it is the MD5sum of the long description. If it isn't present, clients are expected to calculate it for themselves (well, this was required to work with Translation-* before we moved long descriptions to Translation-en, so very new clients might not know about that). So if we change this to the MD5sum of whatever is in the Description field (short or long), we could drop it from the Packages file, and clients would again calculate it themselves to look things up in the Translation files (where this field came from).

I haven't tested, but that should work without any change in apt (okay, apt-ftparchive needs to be patched), so the first stop for someone wanting to drive this is probably dak. Takers? Other servers and maybe clients need to be adapted, but that could be done in a rather uncoordinated way, as there is usually just one server creating both the Packages and Translation-* files, so it will have the same semantic interpretation, and clients either take what they get or already implicitly have the "whatever is in the field" semantics. (Side note: see my other mail for the non-existent security implications of using MD5 here, if you care.)

> Description       463.4 KiB   7.0%

ftpmasters actually wanted to drop that in their final implementation of the long description split-out. We got the short description back, as it wasn't part of the initial plan and clients didn't like that (= apt-cache search would segfault, for example); besides that, I prefer to have at least a short description around in any case.

I think if we drop one of them, it should be the -md5 field, as it isn't as compressible as human-readable text… (not to mention quite useless for a human).

> SHA256           1463.8 KiB  22.0%
> SHA1              938.9 KiB  14.1%
> MD5sum            752.4 KiB  11.3%

I *guess* the most painless drop would be SHA1. Entirely dropping it from the archive means changing the pdiff infrastructure, though. Someone ought to check that claim…

Dropping MD5 will break some scripts parsing apt output. I personally hate breaking users, so any takers to check/fix that at least Debian tools do not break? Entirely dropping it would be easy after this is done (modulo Description-md5 of course, but see there).

Adding/changing to SHA512 in the indexes is probably close to useless; in the Release file the benefit is probably not worthwhile, but it is there if the need arises. I have some hope that with apt/experimental we will be able to add new hashsums with less pain (aka: no ABI break), too, but that just as a side note.

[other fields - present hopefully only for comparison purposes]

For the rest it is hopefully clear why we can't drop them, even though I kinda like the idea of dropping dependencies… would make installing stuff so much simpler… ;)

Format changes ala base-whatever, \0, …

Changing the format is _*EXTREMELY*_ painful. It is also nice to have a text file you can work with easily… If you want to improve things, the improvement should be factored into a compression algorithm so that not every parser in the universe needs to be rewritten… (one of apt's test cases uses 'rev' as a compression algorithm. You just need to set some options, advertise the availability in the Release file and you are good to go…)

Best regards

David Kalnischkies
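The Description-md5 lookup described in this message could be sketched roughly as below. This is an illustration only: the function names are hypothetical, the Translation file is assumed to be already parsed into a dict keyed by MD5 hex digest, and the exact bytes that get hashed (here: the field text as UTF-8 plus one trailing newline) are an assumption, not something the thread specifies.

```python
import hashlib


def description_md5(description):
    """MD5 hex digest of a Description field value.

    ASSUMPTION: the hashed bytes are the UTF-8 text plus one trailing
    newline; real clients may normalise the text differently.
    """
    return hashlib.md5((description + "\n").encode("utf-8")).hexdigest()


def translated_description(stanza, translations):
    """Look up a translation, computing the key when the field is absent.

    stanza:       dict of Packages fields for one package
    translations: dict mapping Description-md5 values to translated text
    """
    key = stanza.get("Description-md5") or description_md5(stanza["Description"])
    # Fall back to the untranslated text if no translation is known.
    return translations.get(key, stanza["Description"])
```

With the proposed semantics, a stanza that omits Description-md5 still resolves to the same translation, because the client derives the key from whatever the Description field contains.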
Re: Let's shrink Packages.xz
On Wed, Jul 16, 2014 at 02:23:34PM +0200, David Kalnischkies wrote:
> With a slight change in semantic we could drop the field from the Packages file again anyhow: At the moment it is the MD5sum of the long description. If it isn't present the clients are expected to calculate it for themselves (well, this was required to work with Translation-* before we moved long descriptions to Translation-en, so very new clients might not know about that). So if we change this to MD5sum of whatever is in the description field (short or long), we could drop it from the Packages file and clients will again calculate this themselves to look stuff up with it in the Translation files (where this field came from).

FTR: This has an obvious side effect, though: if (let's say) foo/sid has feature x and advertises it in the long description (while the short one didn't change), and foo/stable didn't have it, it is undefined which description will be shown. It could be either, but it will be only one; the other is 'discarded' as a duplicate. This never happens with stable/single archives of course, which is what I was thinking of, but in sid you would see it relatively often. Multiply this by 'outdated' translations still being shown as current. (Did I mention that I dislike feature lists in descriptions, as they are always out of date even without that?)

You could mix and match of course, like dropping it for stable, as long descriptions aren't going to change there, so the short one would be identifier enough, but well…

Best regards

David Kalnischkies

P.S.: The version number doesn't work nicely here, as experimental/security do not have translated descriptions but can piggyback on them this way. Which is the historical reason for this altogether, as they were first not in the archive and then just file imports (which they still might or might not be, I have no idea; my last speculation in a train ended in the previous mail, and I blame the broken air conditioning…)
Re: Let's shrink Packages.xz
Hi Jakub, On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? Ondrej -- Ondřej Surý ond...@sury.org Knot DNS (https://www.knot-dns.cz/) – a high-performance DNS server -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/1405526596.17312.142348849.27365...@webmail.messagingengine.com
Re: Let's shrink Packages.xz
Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? I'm fairly sure Jakub's message was in response to the recent discussion about small Node.js packages and the frequent complaints that we should not introduce small packages into the archive because it bloats our metadata. Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. -- Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/87fvi1fc2g@windlord.stanford.edu
Re: Let's shrink Packages.xz
On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: Ondřej Surý ond...@sury.org writes: On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: Food for thought: Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? I'm fairly sure Jakub's message was in response to the recent discussion about small Node.js packages and the frequent complaints that we should not introduce small packages into the archive because it bloats our metadata. Reducing the size of Packages.xz by 11% or 22% would leave room for quite a lot of small packages while not making the problem any worse than it is today. Ok, that makes much more sense now. Still is the main problem the download size or the size on the disk (I can guess that it can be a problem on embedded archs). Or both? Dropping md5+sha1 or even introducing sha-224 instead of sha-256 would help in this case. Having the fallback mechanism leaves open door for stripping+downgrade attacks anyway. Switching to an optimized binary format isn't an option? But I guess it won't be probably that much better than a good compression algorithm. O. -- Ondřej Surý ond...@sury.org Knot DNS (https://www.knot-dns.cz/) – a high-performance DNS server -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/1405536029.10176.142404101.152ab...@webmail.messagingengine.com
Let's shrink Packages.xz
Food for thought: Which fields take up most space in Packages.xz[0]?

(whole file)     6662.0 KiB  100.0%
SHA256           1463.8 KiB   22.0%
SHA1              938.9 KiB   14.1%
Description-md5   794.3 KiB   11.9%
MD5sum            752.4 KiB   11.3%
Depends           473.0 KiB    7.1%
Description       463.4 KiB    7.0%
Filename          338.9 KiB    5.1%
Homepage          183.1 KiB    2.7%
Tag               176.1 KiB    2.6%
Size              168.3 KiB    2.5%
Maintainer        161.3 KiB    2.4%
Installed-Size    144.6 KiB    2.2%
Package           134.5 KiB    2.0%
Version            73.4 KiB    1.1%
Suggests           68.9 KiB    1.0%
Recommends         63.7 KiB    1.0%

[0] More precisely: for each field, how much would Packages.xz shrink if we removed this (and only this) field?

-- 
Jakub Wilk

Archive: https://lists.debian.org/20140714162547.ga3...@jwilk.net
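For anyone who wants to repeat this kind of measurement, the footnote's method (remove one field, recompress, compare) can be sketched as follows. This is not the script that produced the numbers above; the function names are hypothetical, and the input is assumed to be the uncompressed Packages text in the usual "Field: value" layout with indented continuation lines.

```python
import lzma


def strip_field(packages_text, field):
    """Remove every occurrence of `field`, including its continuation lines."""
    kept = []
    skipping = False
    for line in packages_text.splitlines(keepends=True):
        if line.startswith((" ", "\t")):
            # Continuation line: it shares the fate of the field it belongs to.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.startswith(field + ":")
        if not skipping:
            kept.append(line)
    return "".join(kept)


def xz_size(text):
    """Size in bytes of the xz-compressed text (default preset)."""
    return len(lzma.compress(text.encode("utf-8")))


def savings_per_field(packages_text, fields):
    """For each field, how many bytes the compressed file shrinks without it."""
    base = xz_size(packages_text)
    return {f: base - xz_size(strip_field(packages_text, f)) for f in fields}
```

Because each field is removed on its own, the per-field savings are not additive: dropping two fields together may save more or less than the sum of the individual figures.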
Re: Let's shrink Packages.xz
I performed a few little experiments, too.

First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst.

Eliminating all the headers (e.g., replacing "Package: foo" with simply "foo") saved 3.2%. Replacing each one with an integer after its first occurrence (e.g., since the first line is "Package:", every subsequent "Package:" line starts "0:" instead) saved 1.7%. Using "\0" instead of ": " and "\n" saved 1.5%. Using ":" instead of ": " saved 1.8%.

None of these does as much good as simply getting rid of one file hash. I know this has been talked about in the past, and as far as I recall the result, nobody could agree on a hash to drop, in light of both the expected future security of the various hashes and the hashes actually used by current versions of apt. (ideally you'd just go sha256, but iirc it's the md5sum that is used in practice, even today. but please find that thread, don't trust my summary)

Jeff

Archive: https://lists.debian.org/20140714172629.ga6...@unpythonic.net
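The digest-encoding experiment is easy to approximate without a real Packages file. The sketch below is an assumption-laden stand-in, not the experiment actually run: it uses synthetic SHA-256 digests, one per line, with hypothetical function names, and compares plain and xz-compressed sizes for hex and base64. It makes no claim about which encoding wins after compression; run it to see.

```python
import base64
import hashlib
import lzma


def sample_digests(count):
    """Deterministic stand-ins for real package hashes (32 bytes each)."""
    return [hashlib.sha256(str(i).encode()).digest() for i in range(count)]


def encoded_sizes(digests):
    """Plain and xz-compressed sizes of the digests, one per line."""
    hex_text = "\n".join(d.hex() for d in digests).encode("ascii")
    b64_text = "\n".join(
        base64.b64encode(d).decode("ascii") for d in digests
    ).encode("ascii")
    return {
        "raw": sum(len(d) for d in digests),
        "hex_plain": len(hex_text),
        "b64_plain": len(b64_text),
        "hex_xz": len(lzma.compress(hex_text)),
        "b64_xz": len(lzma.compress(b64_text)),
    }


if __name__ == "__main__":
    for name, size in encoded_sizes(sample_digests(5000)).items():
        print(name, size)
```

Digests are effectively random, so no encoding can compress below the raw digest size; what differs between encodings is how close the compressor gets to that floor.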
Re: Let's shrink Packages.xz
Isn't a single (rather small) hash value enough for almost all users?

On Mon, Jul 14, 2014 at 8:55 PM, Jakub Wilk jw...@debian.org wrote:
> Food for thought: Which fields take up most space in Packages.xz[0]?
>
> (whole file)     6662.0 KiB  100.0%
> SHA256           1463.8 KiB   22.0%
> SHA1              938.9 KiB   14.1%
> Description-md5   794.3 KiB   11.9%
> MD5sum            752.4 KiB   11.3%
> Depends           473.0 KiB    7.1%
> Description       463.4 KiB    7.0%
> Filename          338.9 KiB    5.1%
> Homepage          183.1 KiB    2.7%
> Tag               176.1 KiB    2.6%
> Size              168.3 KiB    2.5%
> Maintainer        161.3 KiB    2.4%
> Installed-Size    144.6 KiB    2.2%
> Package           134.5 KiB    2.0%
> Version            73.4 KiB    1.1%
> Suggests           68.9 KiB    1.0%
> Recommends         63.7 KiB    1.0%
>
> [0] More precisely: for each field, how much would Packages.xz shrink if we removed this (and only this) field?
>
> -- 
> Jakub Wilk
Re: Let's shrink Packages.xz
ابراهیم محمدی mebra...@gmail.com writes:
> Isn't a single (rather small) hash value enough for almost all users?

Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions, provided that all clients check all the hashes and the hashes would fail independently (which is likely). The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently.

I'm a bit dubious the clients actually check, though. Also, it's questionable whether protecting against this theoretical possibility is a good tradeoff. If SHA-2 is broken suddenly, we have larger problems than the integrity of the Packages file, and hopefully we'd get a bit of advance warning (like we have with MD5) and be able to introduce a new hash at that point.

MD5 may still be required for backward compatibility; otherwise, it's the obvious one to drop. If we were going to keep only one, we should keep SHA256, as that's the most robust from a cryptographic standpoint at this point (SHA-3 may get there, but is still too new), but obviously all the clients have to support that.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/

Archive: https://lists.debian.org/87lhrv3jfq@windlord.stanford.edu
Re: Let's shrink Packages.xz
On Mon, 14 Jul 2014, Russ Allbery wrote:
> ابراهیم محمدی mebra...@gmail.com writes:
>> Isn't a single (rather small) hash value enough for almost all users?
>
> Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check all the hashes and the hashes would fail independently (which is likely).

I would like to see some supporting evidence for the claim that they will likely fail independently, in particular given that they are all the same construct.

> The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently.

ISTR reading papers that put this "much harder" into doubt. But I can't find those references, alas.

I think just having a single, strong hash in Packages ought to be sufficient.

Cheers,
-- 
                           |  .''`.       ** Debian **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

Archive: https://lists.debian.org/20140714182533.gk...@anguilla.noreply.org
Re: Let's shrink Packages.xz
Peter Palfrader wea...@debian.org writes: On Mon, 14 Jul 2014, Russ Allbery wrote: Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check all the hashes and the hashes would fail independently (which is likely). I would like to see some supporting evidence for the claim that they will likely fail independently. In particular given that they are all the same construct. SHA-1 and SHA-2 are relatively independent constructions, so it seems intuitive to me that achieving a hash collision simultaneously with both constructions would be harder than finding a hash collision for either of them independently. I admit that this argument is much stronger for SHA-2 and SHA-3, where there is no commonality at all between the algorithms (that I know of). That said... I think just having a single, strong hash in Packages ought to be sufficient. ...I agree with this. I think that, even if this approach works and all the clients check, the level of additional security that we get from having multiple hashes isn't worth the overhead. -- Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/871ttn3hdw@windlord.stanford.edu
Re: Let's shrink Packages.xz
Jeff Epler wrote: First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst. Are you sure you performed this calculation correctly? ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII character is a nibble of the digest; that's a 100% increase (factor of 2) over the bare digest (or a raw mapping of 8 bits of digest to an 8 bit character set). base64 encodes 6 bits as 8; that should only be a 33.3% increase (factor of 1.333). I've never heard of base93, but I found a reference that I think describes what you mean [0]. This should provide even better efficiency over base64, as should any binary-to-ascii mapping of higher radix. Perfect segue... What are we looking for in an encoding? I'm guessing this needs to be printable, suitable for human consumption (or at least copy/paste / consumption via text editor), and 7-bit compat? Is this even up for debate? The community at large (computer users), Debian included, seems to have standardized on message digests as ASCII hex... [0] http://kiwigis.blogspot.com/2013/09/base-93-integer-shortening-in-c.html -- Nate -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/53c42996.4090...@gmail.com
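The plaintext expansion factors quoted here follow from how many bits each output character carries. A small sketch of that arithmetic, assuming an idealised encoder that packs exactly log2(radix) bits into each 8-bit character (real base93 schemes may pack slightly less efficiently, and the function name is hypothetical):

```python
import math


def expansion_factor(radix):
    """Encoded size divided by raw size for an ideal radix-N text encoding.

    Each 8-bit output character carries log2(radix) bits of payload.
    """
    return 8 / math.log2(radix)


if __name__ == "__main__":
    for radix in (16, 64, 93):
        print("base%d: x%.3f" % (radix, expansion_factor(radix)))
```

This only describes the uncompressed text; it says nothing about how well the result compresses afterwards.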
Re: Let's shrink Packages.xz
* Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:
>> The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently.
>
> ISTR reading papers that put this much harder into doubt. But I can't find those references, alas.

You might have had this paper in mind:
https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf

Quoting §4:

“If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash function F||G obtained by concatenating F and G is not really more secure that F or G by itself.”

-- 
Jakub Wilk

Archive: https://lists.debian.org/20140714191714.ga4...@jwilk.net
Re: Let's shrink Packages.xz
Jakub Wilk jw...@debian.org writes: You might have had this paper in mind: https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf Quoting §4: “If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash function F||G obtained by concatenating F and G is not really more secure that F or G by itself.” Ah, if that's the case, that's an argument about a different use case. I wouldn't expect just adding more hashes to add more security when the hashes haven't been broken. SHA-256 by itself provides more than enough security if one assumes that it has ideal properties. The (theoretical) security benefit argued for here is precisely the case where the hash functions *do* have attacks better than the generic birthday paradox attack (that we possibly don't know about yet). It's basically a defense in depth argument, coupled with the argument that the special construction of a file to create a collision for one hash function may be incompatible with the special construction of a file required to create a collision with the other hash function. -- Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/87oawr2194@windlord.stanford.edu
Re: Let's shrink Packages.xz
On Mon, 14 Jul 2014, Nathan Schulte wrote:
> ASCII hex encodes 4 bits as 8 (or 7. but really 8.), as each ASCII character is a nibble of the digest; that's a 100% increase (factor of 2) over the bare digest (or a raw mapping of 8 bits of digest to an 8 bit character set).

The figures given refer to changes to the size of the compressed text, not to the plaintext.

> I've never heard of base93, but I found a reference that I think describes what you mean [0]. This should provide even better efficiency over base64, as should any binary-to-ascii mapping of higher radix. Perfect segue...

It can have lower space efficiency in compressed text (in fact, that's exactly what happened), even if it *is* more efficient in the plain text.

-- 
One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot
Henrique Holschuh

Archive: https://lists.debian.org/20140714195507.ga5...@khazad-dum.debian.net
Re: Let's shrink Packages.xz
On Mon, 14 Jul 2014, Jakub Wilk wrote:

* Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:

The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently.

ISTR reading papers that put this much harder into doubt. But I can't find those references, alas.

You might have had this paper in mind: https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf

Quoting §4: “If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash function F||G obtained by concatenating F and G is not really more secure that F or G by itself.”

We don't want F||G to be more secure than F or G by itself. We want it to be at least as secure as the stronger of F or G. Which means it continues being secure if one of G or F, but not both, is compromised.

--
Henrique Holschuh

Archive: https://lists.debian.org/20140714195728.gb5...@khazad-dum.debian.net
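The "at least as secure as the stronger of F or G" property can be sketched as a check that accepts a file only when every listed digest matches. This is a minimal Python illustration, not how apt or dak implement it; `verify_all` and the sample payload are hypothetical names introduced here.

```python
import hashlib


def verify_all(data: bytes, expected: dict) -> bool:
    """Accept data only if every listed digest matches.

    `expected` maps a hashlib algorithm name (e.g. "sha1", "sha256")
    to the hex digest the file is supposed to have.
    """
    for algo, hexdigest in expected.items():
        if hashlib.new(algo, data).hexdigest() != hexdigest.lower():
            return False  # one mismatch is enough to reject
    return True


if __name__ == "__main__":
    payload = b"example package contents"  # stand-in for a .deb
    expected = {
        "sha1": hashlib.sha1(payload).hexdigest(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    print(verify_all(payload, expected))
    print(verify_all(payload + b"!", expected))
```

A forged file has to satisfy both comparisons at once, so a break in only one of the two functions is not sufficient to get it accepted.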
Re: Let's shrink Packages.xz
On 14 July 2014 20:57, Henrique de Moraes Holschuh h...@debian.org wrote:

On Mon, 14 Jul 2014, Jakub Wilk wrote:

* Peter Palfrader wea...@debian.org, 2014-07-14, 20:25:

The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently.

ISTR reading papers that put this much harder into doubt. But I can't find those references, alas.

You might have had this paper in mind: https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf

Quoting §4: “If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash function F||G obtained by concatenating F and G is not really more secure that F or G by itself.”

We don't want F||G to be more secure than F or G by itself. We want it to be at least as secure as the stronger of F or G. Which means it continues being secure if one of G or F, but not both, is compromised.

Huh, I'm not quite sure that multiple hashes actually gain us anything at all against compromise, since ultimately all our archive metadata is protected by a single hash only.

Whilst replacing individual files while simultaneously matching multiple hash algorithms is an interesting problem, it's much more interesting to match the SHA256 of the Release file such that Release.gpg validates; then you can replace /all/ files with valid checksums across the board. Or otherwise generate/break the archive signing key.

So an RSA 4096 key and a SHA256 signature are what ultimately secure our current archive; all other hashes in the Packages file are there merely to assert that it's the right binary that is signed and that one downloaded it correctly.

Thus, can we please drop the MD5 and SHA1 hashes? Anything that can't validate SHA256 can't validate Release.gpg/InRelease and is thus insecure.
$ gpg -v --verify Release.gpg Release
gpg: armor header: Version: GnuPG v1.4.12 (GNU/Linux)
gpg: Signature made Mon 14 Jul 2014 10:02:39 PM BST using RSA key ID 46925553
gpg: using PGP trust model
gpg: Good signature from Debian Archive Automatic Signing Key (7.0/wheezy) ftpmas...@debian.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: A1BD 8E9D 78F7 FE5C 3E65 D8AF 8B48 AD62 4692 5553
gpg: binary signature, digest algorithm SHA256

--
Regards,
Dimitri.

Archive: https://lists.debian.org/canbhluhxyly1rn+fwp12i3zoxhfcenvdoakxn+kaidweh3r...@mail.gmail.com
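The chain described in this message (signature on Release, then the SHA256 entries in Release covering the index files) can be sketched in Python as below. This is only an illustration under stated assumptions: it assumes the Release file has already been verified with gpg as in the command above, it relies on the usual layout of a `SHA256:` field followed by indented `digest size path` lines, and the function names are hypothetical rather than taken from apt.

```python
import hashlib


def parse_release_sha256(release_text: str) -> dict:
    """Map each path in the SHA256 section of a Release file to (hexdigest, size)."""
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):
                break  # an unindented line starts the next field
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
    return entries


def index_matches_release(release_text: str, path: str, content: bytes) -> bool:
    """Check a downloaded index file against its SHA256 entry in Release."""
    entry = parse_release_sha256(release_text).get(path)
    if entry is None:
        return False  # not listed in Release, so not covered by the signature
    digest, size = entry
    return len(content) == size and hashlib.sha256(content).hexdigest() == digest
```

With this structure, every file's integrity reduces to the one digest algorithm used for the Release signature, which is the point being made about the extra MD5 and SHA1 fields.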
Re: Let's shrink Packages.xz
Dimitri John Ledkov x...@debian.org writes:

Huh, I'm not quite sure that multiple hashes actually gain us anything at all against compromise, since ultimately all our archive metadata is protected by a single hash only.

Whilst replacing individual files while simultaneously matching multiple hash algorithms is an interesting problem, it's much more interesting to match the SHA256 of the Release file such that Release.gpg validates; then you can replace /all/ files with valid checksums across the board. Or otherwise generate/break the archive signing key.

Ah, yes, excellent point.

So yes, other than backward compatibility, I see no reason to keep any hash other than the hash we're also using for the GnuPG signature.

--
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/

Archive: https://lists.debian.org/87y4vvy0wa@windlord.stanford.edu