Hi,

It might be good to keep a private hash in parallel with the MD5 public hash.
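
As a minimal sketch of what that could look like, assuming the private hash is a keyed hash (HMAC-SHA-256 here) whose key stays on the server. The function name and the key handling are hypothetical, not anything that exists in MediaWiki:

```python
import hashlib
import hmac


def revision_hashes(text: bytes, secret_key: bytes) -> tuple:
    """Return (public MD5 hex digest, private keyed hex digest) for a revision."""
    # Public value: plain MD5, fine for dumps and accidental-collision checks.
    public = hashlib.md5(text).hexdigest()
    # Private value: keyed with secret_key (an assumption of this sketch), so
    # someone who crafts an MD5 collision cannot predict or target this digest.
    private = hmac.new(secret_key, text, hashlib.sha256).hexdigest()
    return public, private
```

Two texts that share the public MD5 would still be expected to differ in the private value, so internal checks could rely on the second column while the first stays published.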

cheers,
Jamie


----- Original Message -----
From: [email protected]
Date: Sunday, September 18, 2011 3:12 pm
Subject: Wikitech-l Digest, Vol 98, Issue 30
To: [email protected]

> Send Wikitech-l mailing list submissions to
>       [email protected]
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>       https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> or, via email, send a message with subject or body 'help' to
>       [email protected]
> 
> You can reach the person managing the list at
>       [email protected]
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikitech-l digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
>    2. Fwd: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
>    3. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Chad)
>    4. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
>    5. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Chad)
>    6. Re: Fwd: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Roan Kattouw)
>    7. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Platonides)
>    8. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
>    9. Re: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
>   10. Re: Fwd: Adding MD5 / SHA1 column to revision table
>       (discussing r94289) (Anthony)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sun, 18 Sep 2011 16:57:22 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLR8Rhut8gdqizxmDuo5-CAd3Yi_S-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-7
> 
> On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn 
> <[email protected]> wrote:
> > On Sat, 17-09-2011 at 22:55 -0700, Robert Rohde wrote:
> >> On Sat, Sep 17, 2011 at 4:56 PM, Anthony <[email protected]> wrote:
> >
> > <snip>
> >
> >> > For offline analyses, there's no need to change the online
> >> > database tables.
> >>
> >> Need?  That's debatable, but one of the major motivators is the desire
> >> to have hash values in database dumps (both for revert checks and for
> >> checksums on correct data import / export).  Both of those are
> >> "offline" uses, but it is beneficial to have that information
> >> precomputed and stored rather than frequently regenerated.
> >
> > If we don't have it in the online database tables, this defeats the
> > purpose of having the value in there at all, for the purpose of
> > generating the XML dumps.
> >
> > Recall that the dumps are generated in two passes; in the first pass we
> > retrieve from the db and record all of the metadata about revisions, and
> > in the second (time-consuming) pass we re-use the text of the revisions
> > from a previous dump file if the text is in there.  We want to compare
> > the hash of that text against what the online database says the hash is;
> > if they don't match, we want to fetch the live copy.
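
A minimal sketch of the second-pass check described above. SHA-1 is only an assumption here (the thread has not settled on an algorithm), and `previous_dump` and `fetch_live` are hypothetical stand-ins for the old dump file and the live database, not real dump-script interfaces:

```python
import hashlib


def text_for_revision(rev_id, db_hash, previous_dump, fetch_live):
    """Return revision text, reusing the previous dump only if its hash matches.

    previous_dump: dict mapping rev_id -> text bytes (hypothetical stand-in)
    fetch_live:    callable taking rev_id and returning text bytes (hypothetical)
    """
    cached = previous_dump.get(rev_id)
    if cached is not None and hashlib.sha1(cached).hexdigest() == db_hash:
        return cached  # cheap path: the old dump text matches the stored hash
    return fetch_live(rev_id)  # missing or mismatched: fetch the live copy
```

Note that the cheap path trusts the hash completely, which is why an intentional collision matters for this particular use.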
> 
> Well, this is exactly the type of use in which collisions do matter.
> Do you really want the dump to not record the correct data when some
> miscreant creates an intentional collision?
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 18 Sep 2011 17:00:32 -0400
> From: Anthony <[email protected]>
> Subject: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLQr4tTyBkrhwc5Lnf6Xw93eYDv02jAtCNtzRp910_CZ-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 1:55 AM, Robert Rohde 
> <[email protected]> wrote:
> > If collision attacks really matter we should use SHA-1.
> 
> If collision attacks really matter you should use, at least, SHA-256,
> no?
> 
> > However, do
> > any of the proposed use cases care about whether someone might
> > intentionally inject a collision?  In the proposed uses I've looked
> > at, it seems irrelevant.  The intentional collision will get flagged
> > as a revert and the text leading to that collision would be discarded.
> > How is that a bad thing?
> 
> Well, what if the checksum of the initial page hasn't been calculated
> yet?  Then some miscreant sets the page to spam which collides, and
> then the spam gets reverted.  The good page would be the one that gets
> thrown out.
> 
> Maybe that's not feasible.  Maybe it is.  Either way, I'd feel very
> uncomfortable about the fact that someday someone might decide to use
> the checksums in some way in which collisions would matter.
> 
> Now I don't know how important the CPU differences in calculating the
> two versions would be.  If they're significant enough, then fine, use
> MD5, but make sure there are warnings all over the place about its
> use.
> 
> (As another possibility, what if someone writes a bot to detect
> certain reverts?  I can see spammers/vandals having a field day with
> this sort of thing.)
> 
> >> For offline analyses, there's no need to change the online
> >> database tables.
> >
> > Need?  That's debatable, but one of the major motivators is the desire
> > to have hash values in database dumps (both for revert checks and for
> > checksums on correct data import / export).  Both of those are
> > "offline" uses, but it is beneficial to have that information
> > precomputed and stored rather than frequently regenerated.
> 
> Why not in a separate file?  There's no need to get permission from
> anyone or mess with the schema to generate a file with revision ids
> and checksums.  If WMF won't host it at the regular dump location
> (which I can't see why they wouldn't), you could host it at
> archive.org.
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Sun, 18 Sep 2011 17:30:52 -0400
> From: Chad <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CADn73rM9R26GnyXGAFEC6_8Jb3AbT6ML0sVYyR-[email protected]>
> Content-Type: text/plain; charset=UTF-8
> 
> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> <[email protected]> wrote:
> > It is meaningless to talk about cryptography without a threat
> > model, just as Robert says. Is anybody actually attacking us? Or
> > are we worried about accidental collisions?
> >
> 
> I believe it began as accidental collisions, then everyone promptly
> put on their tinfoil hats and started talking about a hypothetical
> vandal who has the time and desire to generate hash collisions.
> 
> -Chad
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Sun, 18 Sep 2011 17:47:51 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLSmMi4qqZLZmY3LzOmEO-8JgqjpJgmLfh-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> > <[email protected]> wrote:
> >> It is meaningless to talk about cryptography without a threat
> >> model, just as Robert says. Is anybody actually attacking us? Or
> >> are we worried about accidental collisions?
> >>
> >
> > I believe it began as accidental collisions, then everyone promptly
> > put on their tinfoil hats and started talking about a hypothetical
> > vandal who has the time and desire to generate hash collisions.
> 
> Having run a wiki which I eventually abandoned due to various "Grawp
> attacks", I can assure you that there's nothing hypothetical about it.
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Sun, 18 Sep 2011 17:50:12 -0400
> From: Chad <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <cadn73rmgkgspg4nbvb34ekfkp99d5lwcnualqlrputa45ya...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> On Sun, Sep 18, 2011 at 5:47 PM, Anthony <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> >> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> >> <[email protected]> wrote:
> >>> It is meaningless to talk about cryptography without a threat
> >>> model, just as Robert says. Is anybody actually attacking us? Or
> >>> are we worried about accidental collisions?
> >>>
> >>
> >> I believe it began as accidental collisions, then everyone promptly
> >> put on their tinfoil hats and started talking about a hypothetical
> >> vandal who has the time and desire to generate hash collisions.
> >
> > Having run a wiki which I eventually abandoned due to various "Grawp
> > attacks", I can assure you that there's nothing hypothetical about it.
> >
> 
> For those of us who do not know...what the heck is a Grawp attack?
> Does it involve generating hash collisions?
> 
> -Chad
> 
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Mon, 19 Sep 2011 00:00:11 +0200
> From: Roan Kattouw <[email protected]>
> Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
>       table (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CALoQHwEOyjQhzRKJM_efPCz7OrG=gbubz7wmgyqmzrwbgze...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 11:00 PM, Anthony 
> <[email protected]> wrote:
> > Now I don't know how important the CPU differences in calculating the
> > two versions would be.  If they're significant enough, then fine, use
> > MD5, but make sure there are warnings all over the place about its
> > use.
> >
> I ran some benchmarks on one of the WMF machines. The input I used is
> a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
> upload to Commons recently. For each benchmark, I hashed the file 25
> times and computed the average running time.
> 
> MD5: 393 ms
> SHA-1: 404 ms
> SHA-256: 1281 ms
> 
> Note that the input size is many times higher than $wgMaxArticleSize,
> which is set to 2000 KB at WMF. For historical reasons, we have some
> revisions in our history that are larger; Ariel would be able to tell
> you how large, but I believe nothing in there is larger than 10 MB. So
> I decided to run the numbers for more realistic sizes as well, using
> the first 2 MB and 10 MB, respectively, of my OGV file.
> 
> For 2 MB (averages of 1000 runs):
> 
> MD5: 5.66 ms
> SHA-1: 5.85 ms
> SHA-256: 18.56 ms
> 
> For 10 MB (averages of 200 runs):
> 
> MD5: 28.6 ms
> SHA-1: 29.47 ms
> SHA-256: 93.49 ms
> 
> So yes, SHA-256 is a few times (just over 3x) more expensive to
> compute than SHA-1, which in turn is only a few percent slower than
> MD5. However, on the largest possible size we allow for new revisions
> it takes < 20 ms. It sounds like that's an acceptable worst case for
> on-the-fly population, since saves and parses are slow anyway,
> especially for 2 MB of wikitext. The 10 MB case is only relevant for
> backfilling, which we could do from a maintenance script, and < 100 ms
> is definitely acceptable there.
> 
> Roan Kattouw (Catrope)
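
For anyone who wants to repeat this kind of measurement, here is a rough sketch using Python's standard `hashlib` and `timeit`. It is not the script used for the numbers above (that is not shown in the thread), and the zero-filled payload is an assumption standing in for real revision text, so absolute timings will differ:

```python
import hashlib
import timeit


def benchmark(data: bytes, runs: int = 25) -> dict:
    """Average time in milliseconds to hash `data` once with each algorithm."""
    results = {}
    for name in ("md5", "sha1", "sha256"):
        # timeit returns the total seconds for `runs` calls of the lambda.
        total = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=runs)
        results[name] = total / runs * 1000.0
    return results


if __name__ == "__main__":
    payload = bytes(2 * 1024 * 1024)  # 2 MB of zero bytes, not real wikitext
    for name, ms in benchmark(payload, runs=20).items():
        print(f"{name}: {ms:.2f} ms")
```

Only the ratios between algorithms are worth comparing across machines.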
> 
> 
> 
> ------------------------------
> 
> Message: 7
> Date: Mon, 19 Sep 2011 00:07:32 +0200
> From: Platonides <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Chad wrote:
> > For those of us who do not know...what the heck is a Grawp attack?
> > Does it involve generating hash collisions?
> >
> > -Chad
> 
> It's the name of a wikipedia vandal.
> http://en.wikipedia.org/wiki/User:Grawp
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 8
> Date: Sun, 18 Sep 2011 18:01:47 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLRWeY5EhaxZb+wACoX4r5PpenW7fPMiEWrkiwNb=xa...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 5:50 PM, Chad <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 5:47 PM, Anthony <[email protected]> wrote:
> >> On Sun, Sep 18, 2011 at 5:30 PM, Chad <[email protected]> wrote:
> >>> On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson
> >>> <[email protected]> wrote:
> >>>> It is meaningless to talk about cryptography without a threat
> >>>> model, just as Robert says. Is anybody actually attacking us? Or
> >>>> are we worried about accidental collisions?
> >>>>
> >>>
> >>> I believe it began as accidental collisions, then everyone promptly
> >>> put on their tinfoil hats and started talking about a hypothetical
> >>> vandal who has the time and desire to generate hash collisions.
> >>
> >> Having run a wiki which I eventually abandoned due to various "Grawp
> >> attacks", I can assure you that there's nothing hypothetical about it.
> >>
> >
> > For those of us who do not know...what the heck is a Grawp attack?
> > Does it involve generating hash collisions?
> 
> It does not involve generating hash collisions, but it involves
> finding various bugs in MediaWiki and using them to vandalise, often
> by injecting javascript.  The best description I could find was at
> Encyclopedia Dramatica, which seems to be taken down (there's a cache
> if you do a google search for "grawp wikipedia").  There's also a
> description at http://en.wikipedia.org/wiki/User:Grawp , which does
> not do justice to the "mad hacker skillz" of this individual and his
> intent on finding bugs in MediaWiki and exploiting them.
> 
> If you did something as lame as relying on no one generating an MD5
> collision (*), it would happen.  If you use SHA-1, it may or may not
> happen, depending on how quickly computers get faster, and how many
> further attacks are made on the algorithm.  If you use SHA-256 (**),
> it's significantly less likely to happen, and you'll probably have a
> warning in the form of an announcement on Slashdot that SHA-256 has
> been broken, before it happens.
> 
> (*) Something which I have done myself on my home computer in a couple
> minutes, and apparently now can be done in a couple seconds.
> 
> (**) Which, incidentally, is possibly the single most secure hash for
> Wikimedia to use at the current time.  SHA-512 is significantly more
> "broken" than SHA-256, and the more theoretically secure hashes have
> received much less scrutiny than SHA-256.  If you want to be more
> secure than SHA-256, you should combine SHA-256 with some other
> hashing algorithm.
> 
> 
> 
> ------------------------------
> 
> Message: 9
> Date: Sun, 18 Sep 2011 18:06:21 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table
>       (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLQ0YUq9j8zr52Lme2eo=ijyjN6x6CssF=xcfcsto6y...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 6:01 PM, Anthony <[email protected]> wrote:
> > There's also a
> > description at http://en.wikipedia.org/wiki/User:Grawp , which does
> > not do justice to the "mad hacker skillz" of this individual and his
> > intent on finding bugs in mediawiki and exploiting them.
> 
> (and/or the Grawp copycats - personally I don't know if it was "Grawp"
> himself or a copycat that attacked my wiki)
> 
> 
> 
> ------------------------------
> 
> Message: 10
> Date: Sun, 18 Sep 2011 18:12:34 -0400
> From: Anthony <[email protected]>
> Subject: Re: [Wikitech-l] Fwd: Adding MD5 / SHA1 column to revision
>       table (discussing r94289)
> To: Wikimedia developers <[email protected]>
> Message-ID:
>       <CAPreJLR7gd=jNMnrZ-[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Sun, Sep 18, 2011 at 6:00 PM, Roan Kattouw <[email protected]> wrote:
> > On Sun, Sep 18, 2011 at 11:00 PM, Anthony <[email protected]> wrote:
> >> Now I don't know how important the CPU differences in calculating the
> >> two versions would be.  If they're significant enough, then fine, use
> >> MD5, but make sure there are warnings all over the place about its
> >> use.
> >>
> > I ran some benchmarks on one of the WMF machines. The input I used is
> > a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
> > upload to Commons recently. For each benchmark, I hashed the file 25
> > times and computed the average running time.
> >
> > MD5: 393 ms
> > SHA-1: 404 ms
> > SHA-256: 1281 ms
> 
> Did you try any of the non-secure hash functions?  If you're going to
> go with MD5, might as well go with the significantly faster CRC-64.
> 
> If you're just using it to detect reverts, then you can run the CRC-64
> check first, and then confirm with a check of the entire message.
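
A minimal sketch of that two-step revert check. Python's standard library has no CRC-64, so `zlib.crc32` is swapped in here purely for illustration. The function name and the in-memory `history` list are hypothetical:

```python
import zlib


def find_reverted_revision(new_text: bytes, history: list):
    """Return the index of an earlier revision identical to new_text, or None.

    history: list of (crc, text) pairs for earlier revisions, oldest first.
    """
    crc = zlib.crc32(new_text)
    for index, (old_crc, old_text) in enumerate(history):
        # Cheap checksum filter first; the full byte comparison then rules
        # out checksum collisions, accidental or deliberate.
        if old_crc == crc and old_text == new_text:
            return index
    return None
```

Because the final decision rests on the full comparison, a colliding checksum costs only an extra comparison and cannot produce a false revert.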
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> 
> 
> End of Wikitech-l Digest, Vol 98, Issue 30
> ******************************************
> 