On Sun, Sep 18, 2011 at 11:00 PM, Anthony <wikim...@inbox.org> wrote:
> Now I don't know how important the CPU differences in calculating the
> two versions would be.  If they're significant enough, then fine, use
> MD5, but make sure there are warnings all over the place about its
> use.
>
I ran some benchmarks on one of the WMF machines. The input I used is
a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to
upload to Commons recently. For each benchmark, I hashed the file 25
times and computed the average running time.

MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms
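
(Roughly, the measurement loop looked like the following Python sketch --
not the exact script I ran, and the file name is a placeholder. The file
is read into memory once up front so I/O doesn't skew the numbers.)

    import hashlib
    import time

    def bench(algo, data, runs=25):
        """Hash `data` `runs` times; return the average time in ms."""
        start = time.perf_counter()
        for _ in range(runs):
            hashlib.new(algo, data).hexdigest()
        elapsed = time.perf_counter() - start
        return elapsed / runs * 1000.0

    # Read the whole file into memory once so we time hashing, not I/O.
    with open('input.ogv', 'rb') as f:  # placeholder file name
        data = f.read()

    for algo in ('md5', 'sha1', 'sha256'):
        print('%s: %.2f ms' % (algo, bench(algo, data)))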

Note that the input size is many times larger than $wgMaxArticleSize,
which is set to 2000 KB at WMF. For historical reasons, some revisions
in our history exceed that limit; Ariel would be able to tell you how
large they get, but I believe nothing in there is larger than 10 MB. So
I also ran the numbers for more realistic sizes, using the first 2 MB
and the first 10 MB of my OGV file.
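
(Truncating is just a matter of slicing the buffer before hashing;
continuing the sketch above, with `data` and bench() reused:)

    # First N MB of the file, with per-size run counts as below.
    for size_mb, runs in ((2, 1000), (10, 200)):
        prefix = data[:size_mb * 1024 * 1024]
        for algo in ('md5', 'sha1', 'sha256'):
            print('%d MB %s: %.2f ms'
                  % (size_mb, algo, bench(algo, prefix, runs)))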

For 2 MB (averages of 1000 runs):

MD5: 5.66 ms
SHA-1: 5.85 ms
SHA-256: 18.56 ms

For 10 MB (averages of 200 runs):

MD5: 28.6 ms
SHA-1: 29.47 ms
SHA-256: 93.49 ms

So yes, SHA-256 is a few times (just over 3x) more expensive to
compute than SHA-1, which in turn is only a few percent slower than
MD5. However, on the largest size we allow for new revisions, even
SHA-256 takes < 20 ms. That sounds like an acceptable worst case for
on-the-fly population, since saves and parses are slow anyway,
especially for 2 MB of wikitext. The 10 MB case is only relevant for
backfilling, which we could do from a maintenance script, and < 100 ms
is definitely acceptable there.
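
(For the backfill, the maintenance script would basically just batch
over revisions with no hash yet. A rough Python sketch of that loop,
not actual MediaWiki code: sqlite3 stands in for the real DB layer,
the schema is simplified -- revision text really lives in a separate
table -- and the batch size is arbitrary.)

    import hashlib
    import sqlite3

    conn = sqlite3.connect('wiki.db')  # placeholder database
    BATCH = 100  # arbitrary batch size

    while True:
        rows = conn.execute(
            "SELECT rev_id, rev_text FROM revision "
            "WHERE rev_sha1 = '' LIMIT ?", (BATCH,)).fetchall()
        if not rows:
            break
        for rev_id, text in rows:
            # rev_sha1 would actually be base-36 encoded in MediaWiki;
            # plain hex here for brevity.
            sha1 = hashlib.sha1(text.encode('utf-8')).hexdigest()
            conn.execute(
                "UPDATE revision SET rev_sha1 = ? WHERE rev_id = ?",
                (sha1, rev_id))
        conn.commit()  # commit per batch so replication can keep up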

Roan Kattouw (Catrope)
