--- Comment #2 from Rob Lanphier <ro...@wikimedia.org> 2011-01-08 03:31:37 UTC
Committed r79856 into trunk. I did bytes because characters were a little more
involved. I added byte counts to both stub and full dumps.
I thought about not including the byte count in the full dump because it's
pretty trivial to get that count from most XML parsers. However, it is nice to
have the byte count that doesn't include any XML escaping introduced by the
dump, so I left it in.
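To show why a byte count taken before escaping can differ from what an XML parser would report for the dumped element, here is a minimal sketch. It assumes Python's xml.sax.saxutils.escape (which escapes &, < and >) as a stand-in for whatever escaping the dumper actually applies. The function names are illustrative only, not MediaWiki code.

```python
from xml.sax.saxutils import escape


def revision_byte_count(text: str) -> int:
    """Byte length of the raw revision text, encoded as UTF-8 (what the dump records)."""
    return len(text.encode("utf-8"))


def escaped_byte_count(text: str) -> int:
    """Byte length of the same text after XML escaping of &, < and >."""
    return len(escape(text).encode("utf-8"))


# "u with umlaut" takes 2 bytes in UTF-8; "<" and "&" grow when escaped.
sample = "x < y & \u00fc"
print(revision_byte_count(sample))  # raw UTF-8 bytes
print(escaped_byte_count(sample))   # bytes as serialized inside the XML
```

For this sample, the raw text is 10 bytes while the escaped form ("x &lt; y &amp; ü") is 17 bytes. That gap is exactly what a consumer would get wrong by measuring the serialized element instead of reading the recorded count.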
I'll document how I'd go about characters, just in case anyone wants to tackle
it. The JOIN of the "text" table in WikiExporter::dumpFrom would have to be
performed even in the case of a stub dump. WikiExporter()->text would need to
be passed as a new parameter into XMLDumpWriter::writeRevision(). The stub
logic in XMLDumpWriter::writeRevision() would need to be changed to use the new
parameter to see if we're dealing with a stub dump, rather than inferring it
from the absence of text. Finally, mb_strlen($foo, 'UTF-8') could be called.
It's not a ton of code (probably 10-15 lines changed, tops), but a change like
that is less likely to get fast-tracked to production.
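The change described above can be sketched roughly as follows. This is a hypothetical Python analogue of XMLDumpWriter::writeRevision(), not the real PHP. It assumes the text is always available, since the "text" table would be joined even for stubs. An explicit is_stub flag, rather than the absence of text, decides whether the body is written. The chars attribute name is an assumption for illustration. Python's len() on a str counts Unicode code points, which matches what mb_strlen($foo, 'UTF-8') returns.

```python
from xml.sax.saxutils import escape


def write_revision(text: str, is_stub: bool) -> str:
    """Hypothetical writeRevision() analogue: emits byte and character counts.

    The stub decision comes from the explicit is_stub parameter,
    not from inferring a stub when text is missing.
    """
    n_bytes = len(text.encode("utf-8"))  # what r79856 records
    n_chars = len(text)  # code points, like mb_strlen($text, 'UTF-8')
    attrs = f'bytes="{n_bytes}" chars="{n_chars}"'  # "chars" is an assumed name
    if is_stub:
        # Stub dumps still get both counts, but no body.
        return f"<text {attrs} />"
    return f"<text {attrs}>{escape(text)}</text>"


print(write_revision("h\u00e9llo", is_stub=True))
print(write_revision("a<b", is_stub=False))
```

The cost noted above shows up in the first line of the function: computing n_chars requires the full text, so stub dumps could no longer skip loading it.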
Wikibugs-l mailing list