https://bugzilla.wikimedia.org/show_bug.cgi?id=13721
Brion Vibber <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #5 from Brion Vibber <[email protected]>  2009-10-20 00:09:12 UTC ---
Ahhhh ok I think I see the base issue -- if a 2-byte or 3-byte char is cut off
at the 255-byte boundary when stored, it becomes an invalid char. The XML dump
outputter runs UTF-8 validation and turns the bad char into a valid U+FFFD ...
which is 3 bytes of UTF-8, over the 255-char limit again.

Yeah, this should be fixed in our DB and MediaWiki should be smarter about
truncation, but in the meantime it should be easy to make mwdumper smarter for
this too.
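
The overflow described in the comment can be reproduced in a few lines. What follows is only a minimal sketch in Python, not MediaWiki or mwdumper code: the function names `naive_roundtrip` and `truncate_utf8` are hypothetical, and the 255-byte limit is taken from the comment. It shows a value cut in the middle of a 2-byte character growing past the limit once the broken byte is replaced with U+FFFD, and one possible way to truncate on a character boundary instead.

```python
LIMIT = 255  # byte limit of the column, as described in the comment


def naive_roundtrip(text: str, max_bytes: int = LIMIT) -> str:
    """Cut the UTF-8 bytes blindly, then repair them the way a validator would.

    A character split at the boundary leaves a dangling lead byte, which the
    "replace" error handler turns into U+FFFD (3 bytes in UTF-8).
    """
    stored = text.encode("utf-8")[:max_bytes]
    return stored.decode("utf-8", errors="replace")


def truncate_utf8(text: str, max_bytes: int = LIMIT) -> str:
    """Truncate to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input was valid UTF-8, so the only invalid part after slicing is an
    # incomplete sequence at the very end; "ignore" drops exactly that.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    title = "a" * 254 + "é"  # 254 + 2 = 256 bytes, one byte over the limit
    print(len(naive_roundtrip(title).encode("utf-8")))  # over the limit again
    print(len(truncate_utf8(title).encode("utf-8")))    # within the limit
```

In this sketch the safe variant simply drops the partial character, so the stored value ends up a byte or two shorter than the limit rather than a few bytes longer.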
