https://bugzilla.wikimedia.org/show_bug.cgi?id=25312

           Summary: MD5 in stub dumps
           Product: Wikimedia
           Version: unspecified
          Platform: All
               URL: http://stats.wikimedia.org/EN/EditsRevertsEN.htm
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: Downloads
        AssignedTo: tf...@wikimedia.org
        ReportedBy: erikzac...@infodisiac.com
                CC: ar...@wikimedia.org


There is a growing audience for revert stats. Nimisz Gautam and Erik Zachte both
wrote scripts that generate revert stats by comparing the MD5 sums of revisions
in the dumps. Rob Lanphier expects MD5 sums can be used for even fancier
processing.

Right now the only way to harvest MD5 sums is to parse the full archive dumps
and hash the text of every revision, which is very slow.

The proposal is to store an MD5 sum for every revision in the stub dumps. This
would allow a monthly refresh of the revert stats (see URL above) and regular
publication of revert data files for researchers.

For example, with a new <md5> element inside each <revision>:

  <page>
    <title>United States Declaration of Independence</title>
    <id>19</id>
    <revision>
      <id>1926607</id>
      <timestamp>2010-06-15T22:06:14Z</timestamp>
      <contributor>
        <username>Innotata</username>
        <id>172490</id>
      </contributor>
      <text id="1894246" />
      <md5>eff7d5dba32b4da32d9a67a519434d3f</md5>
    </revision>
  </page>
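
To illustrate how a stats script might consume the proposed element, here is a
minimal sketch, not the code used by the existing scripts. It assumes the <md5>
element sits directly under <revision> as in the example above, that the
fragment carries no XML namespace (real dumps declare one, so the tag names
would need adjusting), and that a revert is any revision whose MD5 matches an
earlier, non-adjacent revision of the same page. The function names
revisions_from_page and find_reverts are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(text):
    """MD5 of revision text, as a script would compute it from a full dump."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def revisions_from_page(page_xml):
    """Return [(revision id, md5)] for one <page> fragment of a stub dump.

    Assumes the proposed <md5> child element and no XML namespace.
    """
    page = ET.fromstring(page_xml)
    return [
        (int(rev.findtext("id")), rev.findtext("md5"))
        for rev in page.iter("revision")
    ]


def find_reverts(revisions):
    """Find reverts in one page's revisions, given in chronological order.

    A revision counts as a revert when its MD5 equals that of an earlier
    revision other than its immediate predecessor (which would be a null
    edit). Returns [(reverting revision id, restored revision id)].
    """
    first_seen = {}   # md5 -> id of the first revision with that content
    reverts = []
    previous = None
    for rev_id, digest in revisions:
        if digest in first_seen and digest != previous:
            reverts.append((rev_id, first_seen[digest]))
        first_seen.setdefault(digest, rev_id)
        previous = digest
    return reverts
```

With the sums already in the stub dump, only revisions_from_page and
find_reverts are needed; md5_hex is the step that currently forces a pass over
the full archive dumps.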
