Bug 2939 [1] is one bug relevant to this; it could probably use an index.

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

On 11-09-02 05:20 PM, Asher Feldman wrote:
> Would it be possible to generate offline hashes for the bulk of our revision
> corpus via dumps and load that into prod to minimize the time and impact of
> the backfill?
>
> When using them for analysis, will we wish the new columns had partial indexes
> (first 6 characters?)
>
> Is code written to populate rev_sha1 on each new edit?
>
> On Thu, Aug 18, 2011 at 7:40 AM, Diederik van Liere
> <[email protected]> wrote:
>
>> Hi!
>> I am starting this thread because Brion's revision r94541 reverted
>> r94289 [0] stating "core schema change with no discussion" [1].
>> Bugs 21860 [2] and 25312 [3] advocate for the inclusion of a hash
>> column (either md5 or sha1) in the revision table. The primary use
>> case of this column will be to assist detecting reverts. I don't think
>> that data integrity is the primary reason for adding this column. The
>> huge advantage of having such a column is that it will no longer be
>> necessary to analyze full dumps to detect reverts; instead you can
>> look for reverts in the stub dump file by looking for the same hash
>> within a single page. The theoretical chance of a collision is not
>> very important IMHO; it would just mean that in very rare cases our
>> research would flag an edit as reverted when it was not. The two bug
>> reports contain quite long discussions, and this feature has also been
>> discussed extensively internally, but oddly enough not yet on the
>> mailing list.
>>
>> So let's have a discussion!
>>
>> [0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94289
>> [1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/94541
>> [2] https://bugzilla.wikimedia.org/show_bug.cgi?id=21860
>> [3] https://bugzilla.wikimedia.org/show_bug.cgi?id=25312
>>
>> Best,
>>
>> Diederik
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
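
For illustration only, here is a minimal sketch of the revert-detection idea
described in the quoted message: within a single page, a revision whose hash
matches an earlier revision's hash is treated as a revert to that revision.
The function names, the (rev_id, hash) tuple layout, and the use of a hex
SHA-1 digest are assumptions made for this example; they are not taken from
MediaWiki or from the stub dump format.

```python
import hashlib


def sha1_hex(text):
    """Return the hex SHA-1 digest of a revision's text (assumed encoding: UTF-8)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_reverts(revisions):
    """Find likely reverts within one page.

    `revisions` is a chronologically ordered list of (rev_id, digest) tuples
    for a single page (hypothetical layout). Returns a list of
    (reverting_rev_id, restored_rev_id) pairs.
    """
    first_seen = {}   # digest -> earliest rev_id with that content
    reverts = []
    previous = None   # digest of the immediately preceding revision
    for rev_id, digest in revisions:
        if digest == previous:
            # Text unchanged from the previous revision: not a revert.
            continue
        if digest in first_seen:
            reverts.append((rev_id, first_seen[digest]))
        else:
            first_seen[digest] = rev_id
        previous = digest
    return reverts


if __name__ == "__main__":
    history = [
        (1, sha1_hex("good text")),
        (2, sha1_hex("vandalism")),
        (3, sha1_hex("good text")),
    ]
    print(find_reverts(history))
```

As the quoted message notes, a hash collision would only cause an occasional
false positive, which is acceptable for research use.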
