Would it be possible to generate offline hashes for the bulk of our revision corpus via dumps and load that into prod to minimize the time and impact of the backfill?
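
To make the offline idea concrete, here is a minimal sketch in Python. It assumes the (rev_id, text) pairs have already been extracted from a full dump, and that the hash is a plain hex SHA-1 of the UTF-8 text. The function names and the tab-separated output format are hypothetical, not anything that exists in MediaWiki.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 of the UTF-8 encoded revision text as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hash_rows(revisions):
    """Yield 'rev_id<TAB>sha1' lines from (rev_id, text) pairs.

    The tab-separated layout is an assumption, chosen only because it is
    easy to feed into a bulk loader.
    """
    for rev_id, text in revisions:
        yield f"{rev_id}\t{sha1_hex(text)}"


# Hypothetical sample data standing in for revisions read from a dump.
sample = [(1, "Hello world"), (2, "Hello world!"), (3, "Hello world")]
rows = list(hash_rows(sample))
```

Revisions 1 and 3 have identical text, so they get identical hashes. The resulting file could be generated away from prod and loaded in batches, leaving only the most recent edits for a live backfill.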
When the hashes are used for analysis, will we wish the new columns had partial indexes (on the first 6 characters, say)?

Is code already written to populate rev_sha1 on each new edit?

On Thu, Aug 18, 2011 at 7:40 AM, Diederik van Liere <[email protected]> wrote:

> Hi!
> I am starting this thread because Brion's revision r94541 reverted
> r94289 [0] stating "core schema change with no discussion" [1].
> Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash
> column (either md5 or sha1) in the revision table. The primary use
> case of this column will be to assist in detecting reverts. I don't think
> that data integrity is the primary reason for adding this column. The
> huge advantage of having such a column is that it will no longer be
> necessary to analyze full dumps to detect reverts; instead you can
> look for reverts in the stub dump file by looking for the same hash
> within a single page. The fact that there is a theoretical chance of a
> collision is not very important IMHO; it would just mean that in very
> rare cases in our research we would flag an edit as being reverted while
> it's not. The two bug reports contain quite long discussions and this
> feature has also been discussed internally quite extensively, but oddly
> enough it hasn't happened yet on the mailing list.
>
> So let's have a discussion!
>
> [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289
> [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541
> [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860
> [3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312
>
> Best,
>
> Diederik
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
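
For what it's worth, the revert detection described in the quoted message (looking for the same hash within a single page) could look roughly like the Python below. This is a sketch only: it assumes each page's revisions arrive in chronological order as (rev_id, sha1) pairs, find_reverts is a hypothetical name, and treating consecutive identical hashes as null edits rather than reverts is my own assumption.

```python
def find_reverts(page_revisions):
    """Return (reverting_rev_id, restored_rev_id) pairs for one page.

    page_revisions: (rev_id, sha1) pairs in chronological order.
    A revision counts as a revert when its hash matches an earlier
    revision of the same page and differs from the revision right before it.
    """
    first_seen = {}   # hash -> earliest rev_id carrying that hash
    reverts = []
    prev_hash = None
    for rev_id, sha1 in page_revisions:
        if sha1 in first_seen and sha1 != prev_hash:
            reverts.append((rev_id, first_seen[sha1]))
        first_seen.setdefault(sha1, rev_id)
        prev_hash = sha1
    return reverts


# Hypothetical history; short strings stand in for real SHA-1 values.
history = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "c"), (6, "b")]
reverts = find_reverts(history)
```

Here revision 3 restores the content of revision 1 and revision 6 restores revision 2, while revision 4 is skipped because it is identical to its immediate predecessor. A hash collision would show up as a false positive, which matches the rare mis-flagging accepted in the quoted message.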
