On Sun, Sep 18, 2011 at 11:00 PM, Anthony <wikim...@inbox.org> wrote:
> Now I don't know how important the CPU differences in calculating the
> two versions would be. If they're significant enough, then fine, use
> MD5, but make sure there are warnings all over the place about its
> use.

I ran some benchmarks on one of the WMF machines. The input I used is a 137.5 MB (144,220,582 bytes) OGV file that someone asked me to upload to Commons recently. For each benchmark, I hashed the file 25 times and computed the average running time.
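For reference, the benchmark amounts to something like the following sketch (a synthetic in-memory buffer stands in for the actual OGV file, and the timing helper is mine, not the script I actually ran):

```python
import hashlib
import time

# Synthetic 2 MB buffer standing in for the real input file.
data = bytes(2 * 1024 * 1024)

def bench(algo, data, runs=25):
    """Average wall-clock time in ms to hash `data` with `algo`."""
    start = time.perf_counter()
    for _ in range(runs):
        h = hashlib.new(algo)
        h.update(data)
        h.hexdigest()
    return (time.perf_counter() - start) / runs * 1000.0

for algo in ("md5", "sha1", "sha256"):
    print(f"{algo}: {bench(algo, data):.2f} ms")
```

Absolute numbers will of course vary with hardware and OpenSSL version; only the ratios between the algorithms matter here.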
MD5: 393 ms
SHA-1: 404 ms
SHA-256: 1281 ms

Note that the input size is many times larger than $wgMaxArticleSize, which is set to 2000 KB at WMF. For historical reasons, we have some revisions in our history that are larger; Ariel would be able to tell you how large, but I believe nothing in there is larger than 10 MB. So I decided to run the numbers for more realistic sizes as well, using the first 2 MB and 10 MB, respectively, of my OGV file.

For 2 MB (averages of 1000 runs):
MD5: 5.66 ms
SHA-1: 5.85 ms
SHA-256: 18.56 ms

For 10 MB (averages of 200 runs):
MD5: 28.6 ms
SHA-1: 29.47 ms
SHA-256: 93.49 ms

So yes, SHA-256 is a few times (just over 3x) more expensive to compute than SHA-1, which in turn is only a few percent slower than MD5. However, on the largest possible size we allow for new revisions it takes < 20 ms. It sounds like that's an acceptable worst case for on-the-fly population, since saves and parses are slow anyway, especially for 2 MB of wikitext. The 10 MB case is only relevant for backfilling, which we could do from a maintenance script, and < 100 ms is definitely acceptable there.

Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l