Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-11-27 Thread Brion Vibber
So... this seems to have snuck back in a month ago: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/101021 https://bugzilla.wikimedia.org/show_bug.cgi?id=21860 Have we resolved the deployment questions on how to actually do the change? Just want to make sure ops has plenty of warning

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-11-27 Thread Tim Starling
On 28/11/11 08:29, Brion Vibber wrote: So... this seems to have snuck back in a month ago: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/101021 https://bugzilla.wikimedia.org/show_bug.cgi?id=21860 I don't think it really snuck, Rob has been talking about it for a while, see e.g.

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-11-27 Thread John Erling Blad
I have no idea about the schema changes, but choosing a digest for detecting identity reverts is pretty simple. The really difficult part is choosing a locality-sensitive hash or fingerprint that works for very similar revisions with a lot of content. I would propose that the digest is stored
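
The "simple" half of that point, spotting identity reverts with an ordinary digest, can be sketched roughly as follows. This is a minimal illustration only: the function names and the in-memory list of revision texts are assumptions made for the example, not MediaWiki code, and SHA-1 is used only because it is one of the two candidates named in the thread.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a revision's text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(revisions):
    """Yield (index, earlier_index) pairs where a revision's text is
    identical to an earlier revision other than its direct predecessor.

    `revisions` is a hypothetical list of revision texts, oldest first.
    """
    first_seen = {}          # digest -> index of the first revision with it
    previous_digest = None   # digest of the directly preceding revision
    for i, text in enumerate(revisions):
        digest = sha1_hex(text)
        # A repeat of the immediately preceding text is a null edit,
        # not a revert, so it is skipped here.
        if digest in first_seen and digest != previous_digest:
            yield i, first_seen[digest]
        first_seen.setdefault(digest, i)
        previous_digest = digest


history = ["A", "A B", "vandalism", "A B"]
reverts = list(find_identity_reverts(history))
print(reverts)
```

A digest only catches exact restorations; the near-duplicate case mentioned in the message needs a different kind of fingerprint and is not covered by this sketch.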

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Ariel T. Glenn
On 17-09-2011, Sat, at 22:55 -0700, Robert Rohde wrote: On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikim...@inbox.org wrote: snip For offline analyses, there's no need to change the online database tables. Need? That's debatable, but one of the major motivators is the

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Russell N. Nelson - rnnelson
developers wikitech-l@lists.wikimedia.org Sent: Sun, Sep 18, 2011 05:56:15 GMT+00:00 Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289) On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikim...@inbox.org wrote: On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde raro...@gmail.com

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Ilmari Karonen
On 09/18/2011 08:55 AM, Robert Rohde wrote: people find ways to improve the attacks on SHA-1. (The existing attacks usually require the ability to feed arbitrary binary strings into the hash function. Given that both browsers and MediaWiki will tend to reject binary data placed in an edit

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Russell N. Nelson - rnnelson
What is the threat? -Original message- From: Ilmari Karonen nos...@vyznev.net To: Wikimedia developers wikitech-l@lists.wikimedia.org Sent: Sun, Sep 18, 2011 20:20:34 GMT+00:00 Subject: Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnel...@clarkson.edu wrote: It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? You mean, like Grawp?

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 2:33 AM, Ariel T. Glenn ar...@wikimedia.org wrote: On 17-09-2011, Sat, at 22:55 -0700, Robert Rohde wrote: On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikim...@inbox.org wrote: snip For offline analyses, there's no need to change the online database

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Chad
On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnel...@clarkson.edu wrote: It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are we worried about accidental collisions? I believe it began as

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkil...@gmail.com wrote: On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnel...@clarkson.edu wrote: It is meaningless to talk about cryptography without a threat model, just as Robert says. Is anybody actually attacking us? Or are

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Chad
On Sun, Sep 18, 2011 at 5:47 PM, Anthony wikim...@inbox.org wrote: On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkil...@gmail.com wrote: On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnel...@clarkson.edu wrote: It is meaningless to talk about cryptography without a threat

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Platonides
Chad wrote: For those of us who do not know...what the heck is a Grawp attack? Does it involve generating hash collisions? -Chad It's the name of a Wikipedia vandal. http://en.wikipedia.org/wiki/User:Grawp

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 5:50 PM, Chad innocentkil...@gmail.com wrote: On Sun, Sep 18, 2011 at 5:47 PM, Anthony wikim...@inbox.org wrote: On Sun, Sep 18, 2011 at 5:30 PM, Chad innocentkil...@gmail.com wrote: On Sun, Sep 18, 2011 at 7:24 AM, Russell N. Nelson - rnnelson rnnel...@clarkson.edu

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 6:01 PM, Anthony wikim...@inbox.org wrote: There's also a description at http://en.wikipedia.org/wiki/User:Grawp , which does not do justice to the mad hacker skillz of this individual and his intent on finding bugs in MediaWiki and exploiting them. (and/or the Grawp

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread bawolff
Anthony wrote: It does not involve generating hash collisions, but it involves finding various bugs in MediaWiki and using them to vandalise, often by injecting JavaScript. The best description I could find was at Encyclopedia Dramatica, which seems to be taken down (there's a cache if you

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-18 Thread Anthony
On Sun, Sep 18, 2011 at 7:20 PM, Anthony wikim...@inbox.org wrote: On Sun, Sep 18, 2011 at 7:07 PM, bawolff bawolff...@gmail.com wrote: Anthony wrote: The pages you link to seem to indicate he's nothing more than a willy-on-wheels type vandal, who at worst tricked an admin into doing a delete

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-17 Thread Roan Kattouw
On Fri, Sep 16, 2011 at 6:48 PM, Thomas Gries m...@tgries.de wrote: Was there a certain reason to choose base 36? Why not recode to base 62 and save 3 bytes per checksum? I don't know, this was way, way before my time. But then, why use base 62 if you can use base 64? Encoders/decoders for
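
The size trade-off behind the base 36 / base 62 / base 64 question can be checked with a short sketch. The alphabet ordering and the helper names below are assumptions chosen for the illustration; they are not taken from MediaWiki, and the only thing measured is how many characters each encoding needs for a 128-bit (MD5) or 160-bit (SHA-1) value.

```python
import base64

# Digits, then lowercase, then uppercase: an assumed alphabet for the sketch.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_base(n: int, base: int) -> str:
    """Encode a non-negative integer using the first `base` alphabet symbols."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def max_length(bits: int, base: int) -> int:
    """Characters needed for the largest value that fits in `bits` bits."""
    return len(encode_base((1 << bits) - 1, base))


def base64_length(bits: int) -> int:
    """Length of the unpadded base-64 encoding of a `bits`-bit digest."""
    return len(base64.b64encode(bytes(bits // 8)).rstrip(b"="))


for name, bits in (("MD5", 128), ("SHA-1", 160)):
    print(
        name,
        "hex:", max_length(bits, 16),
        "base36:", max_length(bits, 36),
        "base62:", max_length(bits, 62),
        "base64 (unpadded):", base64_length(bits),
    )
```

For a 128-bit MD5 this gives 25 characters in base 36 against 22 in base 62, which matches the three bytes per checksum mentioned in the question; for a 160-bit SHA-1 the figures are 31 and 27. Unpadded base 64 comes out at the same length as base 62 for both digest sizes.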

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-17 Thread Platonides
Roan Kattouw wrote: On Fri, Sep 16, 2011 at 6:48 PM, Thomas Gries m...@tgries.de wrote: Was there a certain reason to choose base 36? Why not recode to base 62 and save 3 bytes per checksum? I don't know, this was way, way before my time. But then, why use base 62 if you can use base

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-17 Thread Robert Rohde
On Sat, Sep 17, 2011 at 8:26 AM, Roan Kattouw roan.katt...@gmail.com wrote: Minor detail: I think it's more likely we'll use SHA-1 hashes rather than MD5 hashes. Is there a good reason to prefer SHA-1? Both have weaknesses allowing one to construct a collision (with considerable effort), but I

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-17 Thread Anthony
On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde raro...@gmail.com wrote: Is there a good reason to prefer SHA-1? Both have weaknesses allowing one to construct a collision (with considerable effort) Considerable effort? I can create an MD5 collision in a few minutes on my home computer. Is

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-17 Thread Robert Rohde
On Sat, Sep 17, 2011 at 4:56 PM, Anthony wikim...@inbox.org wrote: On Sat, Sep 17, 2011 at 6:46 PM, Robert Rohde raro...@gmail.com wrote: Is there a good reason to prefer SHA-1? Both have weaknesses allowing one to construct a collision (with considerable effort) Considerable effort? I can

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Thomas Gries
RE: http://www.mediawiki.org/wiki/Requests_for_comment/Database_field_for_checksum_of_page_text#Field_type Recently, Adding MD5 / SHA1 column to revision table (discussing r94289) was discussed. For some applications, I use the technique of representing the 128 bit of md5 or other checksums

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Roan Kattouw
On Fri, Sep 16, 2011 at 8:15 AM, Thomas Gries m...@tgries.de wrote: For some applications, I use the technique of representing the 128 bit of md5 or other checksums as base-62 character strings instead of hexadecimal (base-16) strings. MediaWiki already uses a similar

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Thomas Gries
Am 16.09.2011 11:24, schrieb Roan Kattouw: For some applications, I use the technique of representing the 128 bit of md5 or other checksums as base-62 character strings instead of hexadecimal (base-16) strings. MediaWiki already uses a similar technique, storing SHA-1 hashes

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Neil Kandalgaonkar
On 9/16/11 9:48 AM, Thomas Gries wrote: Am 16.09.2011 11:24, schrieb Roan Kattouw: For some applications, I use the technique of representing the 128 bit of md5 or other checksums as base-62 character strings instead of hexadecimal (base-16) strings. MediaWiki already uses

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Daniel Friesen
On 11-09-16 09:48 AM, Thomas Gries wrote: Am 16.09.2011 11:24, schrieb Roan Kattouw: For some applications, I use the technique of representing the 128 bit of md5 or other checksums as base-62 character strings instead of hexadecimal (base-16) strings. MediaWiki already uses a

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-16 Thread Brion Vibber
On Fri, Sep 16, 2011 at 9:48 AM, Thomas Gries m...@tgries.de wrote: Am 16.09.2011 11:24, schrieb Roan Kattouw: For some applications, I use the technique of representing the 128 bit of md5 or other checksums as base-62 character strings instead of hexadecimal (base-16)

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-04 Thread Krinkle
2011/9/4 MZMcBride z...@mzmcbride.com Diederik van Liere wrote: I've suggested to generate bulk checksums as well but both Brion and Ariel see the primary purpose of this field to check the validity of the dump generating process and so they want to generate the checksums straight from

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-04 Thread Aryeh Gregor
On Sat, Sep 3, 2011 at 12:33 AM, Rob Lanphier ro...@wikimedia.org wrote: I generally suspect that a standard index is going to be a waste for the most urgent uses of this. It will rarely be interesting to search for common hashes between articles. The far more common case will be to search

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-04 Thread Diederik van Liere
Thanks for moving the page. Diederik On 2011-09-04, at 3:29 PM, Krinkle wrote: 2011/9/4 MZMcBride z...@mzmcbride.com Diederik van Liere wrote: I've suggested to generate bulk checksums as well but both Brion and Ariel see the primary purpose of this field to check the validity of the dump

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-03 Thread Diederik van Liere
Hi, I've suggested to generate bulk checksums as well but both Brion and Ariel see the primary purpose of this field to check the validity of the dump generating process and so they want to generate the checksums straight from the external storage. In a general sense, there are two use cases

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-03 Thread Roan Kattouw
On Sat, Sep 3, 2011 at 2:20 AM, Asher Feldman afeld...@wikimedia.org wrote: Is code written to populate rev_sha1 on each new edit? I believe that was part of Aaron's code that got reverted, yes. Offline generation of hashes is definitely possible, but the only reason you'd do it is to minimize

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-02 Thread Erik Moeller
On Thu, Aug 18, 2011 at 7:40 AM, Diederik van Liere dvanli...@gmail.com wrote: Hi! I am starting this thread because Brion's revision r94289 reverted r94289 [0] stating core schema change with no discussion [1]. Bumping this: What are the remaining open questions regarding this schema change?

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-02 Thread Asher Feldman
Would it be possible to generate offline hashes for the bulk of our revision corpus via dumps and load that into prod to minimize the time and impact of the backfill? When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Is code written to populate
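
The partial-index idea (keying on only the first 6 characters of the hash) can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the hex encoding, the helper names, and the dictionary standing in for a database index. The point it shows is that a short prefix narrows the candidates and the full digest is still compared before a match is reported.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 6  # "first 6 characters", as floated in the message above


def build_prefix_index(texts):
    """Map each 6-character digest prefix to the (rev_id, digest) pairs under it.

    `texts` is a hypothetical dict of revision id -> revision text.
    """
    index = defaultdict(list)
    for rev_id, text in texts.items():
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        index[digest[:PREFIX_LEN]].append((rev_id, digest))
    return index


def lookup(index, text):
    """Return revision ids whose full digest matches that of `text`."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    candidates = index.get(digest[:PREFIX_LEN], [])
    # The prefix only narrows the search; the full digest decides the match.
    return [rev_id for rev_id, full in candidates if full == digest]


revs = {1: "foo", 2: "bar", 3: "foo"}
idx = build_prefix_index(revs)
print(lookup(idx, "foo"))
```

With a hex encoding, six characters give 16^6 (about 16.8 million) distinct prefixes; a denser encoding such as base 36 would give more buckets for the same prefix length.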

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-02 Thread Daniel Friesen
Bug 2939 is one relevant bug to this, it could probably use an index. [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939 ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name] On 11-09-02 05:20 PM, Asher Feldman wrote: Would it be possible to generate offline hashes for

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-02 Thread Rob Lanphier
On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen li...@nadir-seen-fire.com wrote: On 11-09-02 05:20 PM, Asher Feldman wrote: When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Bug 2939 is one relevant bug to this, it could probably use an index. [1]

Re: [Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-09-02 Thread Daniel Friesen
On 11-09-02 09:33 PM, Rob Lanphier wrote: On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen li...@nadir-seen-fire.com wrote: On 11-09-02 05:20 PM, Asher Feldman wrote: When using for analysis, will we wish the new columns had partial indexes (first 6 characters?) Bug 2939 is one relevant bug to

[Wikitech-l] Adding MD5 / SHA1 column to revision table (discussing r94289)

2011-08-18 Thread Diederik van Liere
Hi! I am starting this thread because Brion's revision r94289 reverted r94289 [0] stating core schema change with no discussion [1]. Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash column (either md5 or sha1) in the revision table. The primary use case of this column will be to