https://bugzilla.wikimedia.org/show_bug.cgi?id=25312
Summary: MD5 in stub dumps
Product: Wikimedia
Version: unspecified
Platform: All
URL: http://stats.wikimedia.org/EN/EditsRevertsEN.htm
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: Downloads
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]
There is a growing audience for revert stats. Nimisz Gautam and Erik Zachte both
wrote scripts that generate revert stats by comparing revisions in the dumps
via MD5 sums. Rob Lanphier expects MD5 can be used for even fancier processing.
Right now the only way to harvest MD5s is to parse the full archive dumps,
which takes forever.
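To make the idea concrete, here is a minimal sketch of MD5-based revert detection. It is not the code from either script mentioned above. It assumes a revert is any revision whose text hashes to the same MD5 as an earlier, non-adjacent revision of the same page. The names `md5_hex`, `find_reverts` and `history` are hypothetical.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a revision's text (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_reverts(revisions):
    """Find identity reverts within one page's history.

    `revisions` is a list of (rev_id, text) pairs in chronological order.
    Returns (reverting_rev_id, restored_rev_id) pairs. This definition of
    a revert is an assumption made for illustration.
    """
    first_seen = {}          # digest -> id of the earliest revision with it
    previous_digest = None   # digest of the immediately preceding revision
    reverts = []
    for rev_id, text in revisions:
        digest = md5_hex(text)
        # Same content as an older revision, but not a no-op edit.
        if digest in first_seen and digest != previous_digest:
            reverts.append((rev_id, first_seen[digest]))
        first_seen.setdefault(digest, rev_id)
        previous_digest = digest
    return reverts


# Made-up page history: revision 3 restores the text of revision 1.
history = [
    (1, "Original text."),
    (2, "Original text. Vandalism!"),
    (3, "Original text."),
    (4, "Original text. A useful addition."),
]
reverts = find_reverts(history)
print(reverts)
```

Note that the sketch has to hash the full text of every revision, which is exactly the step that forces a pass over the full archive dumps today.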
The proposal is to store an MD5 in the stub dumps for every revision. This would
allow a monthly refresh of revert stats (see URL above) and regular publication
of revert data files for researchers.
e.g.
<page>
  <title>United States Declaration of Independence</title>
  <id>19</id>
  <revision>
    <id>1926607</id>
    <timestamp>2010-06-15T22:06:14Z</timestamp>
    <contributor>
      <username>Innotata</username>
      <id>172490</id>
    </contributor>
    <text id="1894246" />
    <md5>eff7d5dba32b4da32d9a67a519434d3f</md5>
  </revision>
</page>
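With the digest in the stub, a consumer would only need a light XML pass. The following is a rough sketch that reads the proposed `<md5>` element with Python's standard `xml.etree.ElementTree`. The `<md5>` element is only the proposal in this report, not an existing dump feature, and `SAMPLE` and `revision_digests` are hypothetical names. Real dump files wrap pages in a namespaced `<mediawiki>` root, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# The example page from this report, as a standalone XML fragment.
SAMPLE = """<page>
  <title>United States Declaration of Independence</title>
  <id>19</id>
  <revision>
    <id>1926607</id>
    <timestamp>2010-06-15T22:06:14Z</timestamp>
    <contributor>
      <username>Innotata</username>
      <id>172490</id>
    </contributor>
    <text id="1894246" />
    <md5>eff7d5dba32b4da32d9a67a519434d3f</md5>
  </revision>
</page>"""


def revision_digests(page_xml):
    """Return (title, revision_id, md5) tuples for one <page> fragment."""
    page = ET.fromstring(page_xml)
    title = page.findtext("title")
    rows = []
    for rev in page.findall("revision"):
        # findtext("id") only matches a direct child of <revision>,
        # so the contributor's <id> is not picked up by mistake.
        rows.append((title, int(rev.findtext("id")), rev.findtext("md5")))
    return rows


digests = revision_digests(SAMPLE)
print(digests)
```

The resulting (revision id, MD5) pairs could then be compared per page to find reverts without touching the revision text at all.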
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l