https://bugzilla.wikimedia.org/show_bug.cgi?id=26563

           Summary: Add characters changed per revision for stub and full
                    article dumps
           Product: XML Snapshots
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: General
        AssignedTo: ar...@wikimedia.org
        ReportedBy: dvanli...@gmail.com
                CC: tf...@wikimedia.org


Adding a character delta (the change in character count) to each revision is
needed for edit analytics, in both the stub and the full article dumps.
Rob suggested using PHP's UTF-8 support (e.g. just calling
mb_strlen($buffer, 'UTF-8')) to quickly dispose of the multi-byte problem;
this would give us a fairly accurate character count. Counting characters
rather than bytes will allow us to compare across different languages.

If there are serious performance concerns then we can fall back to byte count.
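
To make the difference between the two counts concrete, here is a rough sketch
in Python (not the PHP dump code itself). The function names and the sample
revisions are made up for illustration, and it assumes the delta is simply the
change in text length relative to the previous revision, with the first
revision compared against an empty page. Python's len() on a str counts code
points, which should be comparable to mb_strlen($buffer, 'UTF-8').

```python
def char_deltas(revision_texts):
    """Change in character (code point) count for each revision."""
    deltas = []
    previous_len = 0  # assumption: first revision is compared to an empty page
    for text in revision_texts:
        current_len = len(text)  # counts code points, not bytes
        deltas.append(current_len - previous_len)
        previous_len = current_len
    return deltas


def byte_deltas(revision_texts):
    """Fallback: change in UTF-8 byte count for each revision."""
    deltas = []
    previous_len = 0
    for text in revision_texts:
        current_len = len(text.encode("utf-8"))  # size of the UTF-8 encoding
        deltas.append(current_len - previous_len)
        previous_len = current_len
    return deltas


# Hypothetical revision history of one page.
revisions = ["Zürich", "Zürich ist schön", "Zürich"]
print(char_deltas(revisions))  # [6, 10, -10]
print(byte_deltas(revisions))  # [7, 11, -11]
```

The byte deltas come out larger because each non-ASCII letter takes two bytes
in UTF-8, which is why byte counts are harder to compare across languages.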

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.