Hi, It might be good to keep a private hash in parallel with the MD5 public hash.
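A minimal sketch of one way to read this suggestion, assuming "private hash" means a keyed digest (HMAC) whose key is never published, stored next to the public MD5. The key value and the function name `revision_hashes` are made up for illustration and come from nowhere in the thread.

```python
import hashlib
import hmac

# Hypothetical server-side secret (an assumption for this sketch). In practice
# it would come from private configuration and never appear in public dumps.
SECRET_KEY = b"example-key-not-for-production"


def revision_hashes(text: bytes, key: bytes = SECRET_KEY) -> tuple:
    """Return (public_md5_hex, private_hmac_sha256_hex) for one revision text."""
    public = hashlib.md5(text).hexdigest()
    private = hmac.new(key, text, hashlib.sha256).hexdigest()
    return public, private


public_hash, private_hash = revision_hashes(b"Hello, wiki!")
print("public  (MD5):        ", public_hash)
print("private (HMAC-SHA256):", private_hash)
```

Someone who can craft two texts with the same MD5 still cannot predict the keyed digest without the key, so the private value could serve as the internal check while the MD5 stays public.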
cheers,
Jamie

----- Original Message -----
From: [email protected]
Date: Sunday, September 18, 2011 3:12 pm
Subject: Wikitech-l Digest, Vol 98, Issue 30
To: [email protected]

> Today's Topics:
>
>    1. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>    2. Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>    3. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Chad)
>    4. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>    5. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Chad)
>    6. Re: Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Roan Kattouw)
>    7. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Platonides)
>    8. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>    9. Re: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>   10. Re: Fwd: Adding MD5 / SHA1 column to revision table (discussing r94289) (Anthony)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 18 Sep 2011 16:57:22 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLR8Rhut8gdqizxmDuo5-CAd3Yi_S-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-7
>
> On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn <[email protected]> wrote:
> > On Sat, 17-09-2011, at 22:55 -0700, Robert Rohde wrote:
> >> On Sat, Sep 17, 2011 at 4:56 PM, Anthony <[email protected]> wrote:
> >
> > <snip>
> >
> >> > For offline analyses, there's no need to change the online
> >> > database tables.
> >>
> >> Need? That's debatable, but one of the major motivators is the desire
> >> to have hash values in database dumps (both for revert checks and for
> >> checksums on correct data import / export). Both of those are
> >> "offline" uses, but it is beneficial to have that information
> >> precomputed and stored rather than frequently regenerated.
> >
> > If we don't have it in the online database tables, this defeats the
> > purpose of having the value in there at all, for the purpose of
> > generating the XML dumps.
> >
> > Recall that the dumps are generated in two passes; in the first pass we
> > retrieve from the db and record all of the metadata about revisions, and
> > in the second (time-consuming) pass we re-use the text of the revisions
> > from a previous dump file if the text is in there. We want to compare
> > the hash of that text against what the online database says the hash is;
> > if they don't match, we want to fetch the live copy.
>
> Well, this is exactly the type of use in which collisions do matter.
> Do you really want the dump to not record the correct data when some
> miscreant creates an intentional collision?
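The second pass described in Message 1 could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the dictionary standing in for the previous dump file, the `fetch_live` callback, and the choice of SHA-1 are all hypothetical and are not taken from the actual dump scripts.

```python
import hashlib
from typing import Callable, Dict


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest; SHA-1 is a placeholder choice for this sketch."""
    return hashlib.sha1(data).hexdigest()


def text_for_revision(
    rev_id: int,
    db_hash: str,
    previous_dump: Dict[int, bytes],
    fetch_live: Callable[[int], bytes],
) -> bytes:
    """Reuse text from the previous dump only if its hash matches the database."""
    cached = previous_dump.get(rev_id)
    if cached is not None and sha1_hex(cached) == db_hash:
        return cached          # cheap path: no database text fetch needed
    return fetch_live(rev_id)  # missing or mismatching: fetch the live copy


# Toy data: revision 2 changed since the previous dump was written.
previous_dump = {1: b"unchanged text", 2: b"stale text"}
live_store = {1: b"unchanged text", 2: b"current text"}
db_hashes = {rev: sha1_hex(text) for rev, text in live_store.items()}

for rev in sorted(live_store):
    text = text_for_revision(rev, db_hashes[rev], previous_dump, live_store.__getitem__)
    print(rev, text)
```

A matching hash is trusted without ever looking at the live text, which is why the collision resistance of the chosen hash matters for this particular use.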
>
> ------------------------------
>
> Message: 2
> Date: Sun, 18 Sep 2011 17:00:32 -0400
> From: Anthony <[email protected]>
> Subject: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLQr4tTyBkrhwc5Lnf6Xw93eYDv02jAtCNtzRp910_CZ-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 1:55 AM, Robert Rohde <[email protected]> wrote:
> > If collision attacks really matter we should use SHA-1.
>
> If collision attacks really matter you should use, at least, SHA-256, no?
>
> > However, do
> > any of the proposed use cases care about whether someone might
> > intentionally inject a collision? In the proposed uses I've looked at,
> > it seems irrelevant. The intentional collision will get flagged
> > as a revert and the text leading to that collision would be discarded.
> > How is that a bad thing?
>
> Well, what if the checksum of the initial page hasn't been calculated
> yet? Then some miscreant sets the page to spam which collides, and
> then the spam gets reverted. The good page would be the one that gets
> thrown out.
>
> Maybe that's not feasible. Maybe it is. Either way, I'd feel very
> uncomfortable about the fact that someday someone might decide to use
> the checksums in some way in which collisions would matter.
>
> Now I don't know how important the CPU differences in calculating the
> two versions would be. If they're significant enough, then fine, use
> MD5, but make sure there are warnings all over the place about its
> use.
>
> (As another possibility, what if someone writes a bot to detect
> certain reverts? I can see spammers/vandals having a field day with
> this sort of thing.)
>
> >> For offline analyses, there's no need to change the online
> >> database tables.
> >
> > Need? That's debatable, but one of the major motivators is the desire
> > to have hash values in database dumps (both for revert checks and for
> > checksums on correct data import / export). Both of those are
> > "offline" uses, but it is beneficial to have that information
> > precomputed and stored rather than frequently regenerated.
>
> Why not in a separate file? There's no need to get permission from
> anyone or mess with the schema to generate a file with revision ids
> and checksums. If WMF won't host it at the regular dump location
> (which I can't see why they wouldn't), you could host it at
> archive.org.
>
>
> ------------------------------
>
> Message: 3
> Date: Sun, 18 Sep 2011 17:30:52 -0400
> From: Chad <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CADn73rM9R26GnyXGAFEC6_8Jb3AbT6ML0sVYyR-[email protected]>
> Content-Type: text/plain; charset=UTF-8
>
> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> <[email protected]> wrote:
> > It is meaningless to talk about cryptography without a threat
> > model, just as Robert says. Is anybody actually attacking us? Or
> > are we worried about accidental collisions?
> >
>
> I believe it began as accidental collisions, then everyone promptly
> put on their tinfoil hats and started talking about a hypothetical
> vandal who has the time and desire to generate hash collisions.
>
> -Chad
>
>
> ------------------------------
>
> Message: 4
> Date: Sun, 18 Sep 2011 17:47:51 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLSmMi4qqZLZmY3LzOmEO-8JgqjpJgmLfh-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> > <[email protected]> wrote:
> >> It is meaningless to talk about cryptography without a threat
> >> model, just as Robert says. Is anybody actually attacking us? Or
> >> are we worried about accidental collisions?
> >>
> >
> > I believe it began as accidental collisions, then everyone promptly
> > put on their tinfoil hats and started talking about a hypothetical
> > vandal who has the time and desire to generate hash collisions.
>
> Having run a wiki which I eventually abandoned due to various "Grawp
> attacks", I can assure you that there's nothing hypothetical about it.
>
>
> ------------------------------
>
> Message: 5
> Date: Sun, 18 Sep 2011 17:50:12 -0400
> From: Chad <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <cadn73rmgkgspg4nbvb34ekfkp99d5lwcnualqlrputa45ya...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Sun, Sep 18, 2011 at 5:47 PM, Anthony <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> >> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> >> <[email protected]> wrote:
> >>> It is meaningless to talk about cryptography without a
> >>> threat model, just as Robert says. Is anybody actually attacking
> >>> us? Or are we worried about accidental collisions?
> >>>
> >>
> >> I believe it began as accidental collisions, then everyone promptly
> >> put on their tinfoil hats and started talking about a hypothetical
> >> vandal who has the time and desire to generate hash collisions.
> >
> > Having run a wiki which I eventually abandoned due to various "Grawp
> > attacks", I can assure you that there's nothing hypothetical about it.
> >
>
> For those of us who do not know...what the heck is a Grawp attack?
> Does it involve generating hash collisions?
>
> -Chad
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 19 Sep 2011 00:00:11 +0200
> From: Roan Kattouw <[email protected]>
> Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
>     table (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CALoQHwEOyjQhzRKJM_efPCz7OrG=gbubz7wmgyqmzrwbgze...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 11:00 PM, Anthony <[email protected]> wrote:
> > Now I don't know how important the CPU differences in calculating the
> > two versions would be. If they're significant enough, then fine, use
> > MD5, but make sure there are warnings all over the place about its
> > use.
> >
> I ran some benchmarks on one of the WMF machines. The input I used is
> a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
> upload to Commons recently. For each benchmark, I hashed the file 25
> times and computed the average running time.
>
> MD5: 393 ms
> SHA-1: 404 ms
> SHA-256: 1281 ms
>
> Note that the input size is many times higher than $wgMaxArticleSize,
> which is set to 2000 KB at WMF. For historical reasons, we have some
> revisions in our history that are larger; Ariel would be able to tell
> you how large, but I believe nothing in there is larger than 10 MB. So
> I decided to run the numbers for more realistic sizes as well, using
> the first 2 MB and 10 MB, respectively, of my OGV file.
>
> For 2 MB (averages of 1000 runs):
>
> MD5: 5.66 ms
> SHA-1: 5.85 ms
> SHA-256: 18.56 ms
>
> For 10 MB (averages of 200 runs):
>
> MD5: 28.6 ms
> SHA-1: 29.47 ms
> SHA-256: 93.49 ms
>
> So yes, SHA-256 is a few times (just over 3x) more expensive to
> compute than SHA-1, which in turn is only a few percent slower than
> MD5. However, on the largest possible size we allow for new revisions
> it takes < 20 ms. It sounds like that's an acceptable worst case for
> on-the-fly population, since saves and parses are slow anyway,
> especially for 2 MB of wikitext. The 10 MB case is only relevant for
> backfilling, which we could do from a maintenance script, and < 100 ms
> is definitely acceptable there.
>
> Roan Kattouw (Catrope)
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 19 Sep 2011 00:07:32 +0200
> From: Platonides <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Chad wrote:
> > For those of us who do not know...what the heck is a Grawp attack?
> > Does it involve generating hash collisions?
> >
> > -Chad
>
> It's the name of a wikipedia vandal.
> http://en.wikipedia.org/wiki/User:Grawp
>
>
> ------------------------------
>
> Message: 8
> Date: Sun, 18 Sep 2011 18:01:47 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLRWeY5EhaxZb+wACoX4r5PpenW7fPMiEWrkiwNb=xa...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 5:50 PM, Chad <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 5:47 PM, Anthony <[email protected]> wrote:
> >> On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> >>> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> >>> <[email protected]> wrote:
> >>>> It is meaningless to talk about cryptography without a
> >>>> threat model, just as Robert says. Is anybody actually attacking
> >>>> us? Or are we worried about accidental collisions?
> >>>>
> >>>
> >>> I believe it began as accidental collisions, then everyone promptly
> >>> put on their tinfoil hats and started talking about a hypothetical
> >>> vandal who has the time and desire to generate hash collisions.
> >>
> >> Having run a wiki which I eventually abandoned due to various "Grawp
> >> attacks", I can assure you that there's nothing hypothetical about it.
> >>
> >
> > For those of us who do not know...what the heck is a Grawp attack?
> > Does it involve generating hash collisions?
>
> It does not involve generating hash collisions, but it involves
> finding various bugs in mediawiki and using them to vandalise, often
> by injecting javascript. The best description I could find was at
> Encyclopedia Dramatica, which seems to be taken down (there's a cache
> if you do a google search for "grawp wikipedia"). There's also a
> description at http://en.wikipedia.org/wiki/User:Grawp , which does
> not do justice to the "mad hacker skillz" of this individual and his
> intent on finding bugs in mediawiki and exploiting them.
>
> If you did something as lame as relying on no one generating an MD5
> collision (*), it would happen. If you use SHA-1, it may or may not
> happen, depending on how quickly computers get faster, and how many
> further attacks are made on the algorithm. If you use SHA-256 (**),
> it's significantly less likely to happen, and you'll probably have a
> warning in the form of an announcement on Slashdot that SHA-256 has
> been broken, before it happens.
>
> (*) Something which I have done myself on my home computer in a couple
> minutes, and apparently now can be done in a couple seconds.
>
> (**) Which, incidentally, is possibly the single most secure hash for
> Wikimedia to use at the current time. SHA-512 is significantly more
> "broken" than SHA-256, and the more theoretically secure hashes have
> received much less scrutiny than SHA-256. If you want to be more
> secure than SHA-256, you should combine SHA-256 with some other
> hashing algorithm.
>
>
> ------------------------------
>
> Message: 9
> Date: Sun, 18 Sep 2011 18:06:21 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>     (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLQ0YUq9j8zr52Lme2eo=ijyjN6x6CssF=xcfcsto6y...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 6:01 PM, Anthony <[email protected]> wrote:
> > There's also a
> > description at http://en.wikipedia.org/wiki/User:Grawp , which does
> > not do justice to the "mad hacker skillz" of this individual and his
> > intent on finding bugs in mediawiki and exploiting them.
>
> (and/or the Grawp copycats - personally I don't know if it was "Grawp"
> himself or a copycat that attacked my wiki)
>
>
> ------------------------------
>
> Message: 10
> Date: Sun, 18 Sep 2011 18:12:34 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
>     table (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>     <CAPreJLR7gd=jNMnrZ-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sun, Sep 18, 2011 at 6:00 PM, Roan Kattouw <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 11:00 PM, Anthony <[email protected]> wrote:
> >> Now I don't know how important the CPU differences in calculating the
> >> two versions would be. If they're significant enough, then fine, use
> >> MD5, but make sure there are warnings all over the place about its
> >> use.
> >>
> > I ran some benchmarks on one of the WMF machines. The input I used is
> > a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
> > upload to Commons recently. For each benchmark, I hashed the file 25
> > times and computed the average running time.
> >
> > MD5: 393 ms
> > SHA-1: 404 ms
> > SHA-256: 1281 ms
>
> Did you try any of the non-secure hash functions? If you're going to
> go with MD5, might as well go with the significantly faster CRC-64.
>
> If you're just using it to detect reverts, then you can run the CRC-64
> check first, and then confirm with a check of the entire message.
>
>
> ------------------------------
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
> End of Wikitech-l Digest, Vol 98, Issue 30
> ******************************************
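The "cheap checksum first, then confirm against the entire text" idea from Message 10 might look roughly like the sketch below. Python's standard library has no CRC-64, so `zlib.crc32` stands in for it here (a swapped-in checksum, not the one proposed in the thread). The class name `RevertDetector` and its methods are hypothetical, and keeping every text in memory is only for the sake of a short example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


class RevertDetector:
    """Find earlier identical revisions using a checksum as a pre-filter."""

    def __init__(self) -> None:
        self._by_checksum: Dict[int, List[int]] = defaultdict(list)
        self._texts: Dict[int, bytes] = {}

    def add(self, rev_id: int, text: bytes) -> Optional[int]:
        """Record a revision; return the id of an earlier identical one, if any."""
        checksum = zlib.crc32(text)  # stand-in for CRC-64
        match = None
        for earlier in self._by_checksum[checksum]:
            # A full comparison confirms the candidate, so a checksum
            # collision alone can never be reported as a revert.
            if self._texts[earlier] == text:
                match = earlier
                break
        self._by_checksum[checksum].append(rev_id)
        self._texts[rev_id] = text
        return match


detector = RevertDetector()
for rev_id, text in [(1, b"good page"), (2, b"spam"), (3, b"good page")]:
    print(rev_id, "reverts to", detector.add(rev_id, text))
```

Because the final decision rests on the byte-for-byte comparison, the checksum only needs to be fast, not collision-resistant, which is the trade-off the message points at.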
