https://bugzilla.wikimedia.org/show_bug.cgi?id=30705

--- Comment #4 from Brion Vibber <[email protected]> 2011-09-15 22:04:58 UTC ---
It looks like either the wiki is misconfigured for the database's character set
settings, or the database itself has been corrupted with an incorrect
Latin1-to-UTF-8 conversion applied on export or import.

Page contents usually survive this because they're stored in binary BLOB
fields, but page titles, usernames, edit comments, etc. may have been
misconverted.
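
To illustrate the kind of misconversion described above, here is a minimal
Python sketch (not part of MediaWiki; the sample title is made up). It treats
UTF-8 bytes as if they were Latin1 and then re-encodes them as UTF-8, which is
one way an export or import could mangle a title:

```python
# Hypothetical sample title; any non-ASCII text shows the effect.
title = "Café"

# What the wiki actually stored: UTF-8 bytes.
stored = title.encode("utf-8")

# A tool that believes the field is Latin1 reads each byte as one character...
misread = stored.decode("latin-1")

# ...and then "converts" that text to UTF-8, producing double-encoded bytes.
double_encoded = misread.encode("utf-8")

print(misread)         # CafÃ©
print(double_encoded)  # b'Caf\xc3\x83\xc2\xa9'
```

Plain ASCII text is unaffected by this, so the damage usually only shows up in
titles, usernames or comments that contain accented or non-Latin characters.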

Try switching the $wgMySQL5 setting and double-check the encodings. Ideally,
most newly configured wikis will be set to binary charset/collation, which
allows MediaWiki to speak UTF-8 Unicode without limitations. If fields claim to
be either latin1 or utf8 and the contents are clearly wrong when viewed
directly in the db, the database may be incorrectly set up.
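
When looking at values directly in the db, a rough check like the following
Python sketch may help tell correctly stored UTF-8 apart from double-encoded
data. The function name is hypothetical and this is only a heuristic, not a
MediaWiki tool; it also assumes strict Latin1, whereas MySQL's latin1 behaves
more like Windows-1252, so some characters (curly quotes, for instance) might
need "cp1252" instead:

```python
from typing import Optional


def repair_if_double_encoded(raw: bytes) -> Optional[str]:
    """Return the repaired text if raw looks double-encoded, else None."""
    try:
        text = raw.decode("utf-8")
        # Undo one Latin1-to-UTF-8 step and see if valid UTF-8 comes out.
        candidate = text.encode("latin-1").decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not reversible this way, so it was probably not double-encoded.
        return None
    # Pure ASCII round-trips unchanged; that is not evidence of damage.
    return candidate if candidate != text else None


print(repair_if_double_encoded(b"Caf\xc3\x83\xc2\xa9"))  # Café
print(repair_if_double_encoded(b"Caf\xc3\xa9"))          # None
```

Always try this on a copy of the data first; a value that happens to pass the
check is not guaranteed to have been damaged.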

This sometimes results from running mysqldump on wikis that were originally
set up with a really old configuration where fields were labeled as Latin1,
such as wikis upgraded from an old MySQL 4.0 instance, or installed with old
versions of MediaWiki that aimed primarily for MySQL 4.0 compatibility.

MySQL 4.0 and earlier have *no* customizable charset support, so whatever the
default charset was got used, even though we always actually sent/received
UTF-8 data. This sometimes results in data that is already UTF-8 getting
"converted to UTF-8" by mysqldump, or in some other sort of problem.

Sometimes also the database itself is still ok, but a reconfiguration of the
wiki has caused the old settings to be lost and it's now defaulting to using
the modern mode, which can end up doing a similar misconversion. Try turning
off $wgMySQL5 in this case?
