--- Comment #165 from T. Gries <m...@tgries.de> 2010-03-12 22:18:10 UTC ---
(In reply to comment #164)
> (In reply to comment #163)
> The options I know of are:
> 1) Use MySQL collation support, just using utf8 everywhere. This will mess up
> equality comparisons in ways we almost certainly don't want, doesn't support
> characters outside the BMP, etc.
> 2) Roll our own sortkeys. This probably requires a lot more work and is a lot
> less efficient (we'd need to keep an extra column+index around in all relevant
> tables). It also might be tricky, since the default Unicode Collation Algorithm
> gives very long sort keys, which we'd have to squish somehow (AFAIK).
> 3) Use some utf8 collation for cl_sortkey, but keep everything else binary. We
> could also add an extra page_sortkey using utf8 collation, which is no more
> expensive than (2). This sounds like the best option to me offhand. It would
> be as easy to implement as anything here, and wouldn't have such bad side
> effects. It would somewhat mess up pages with non-BMP characters in the name,

Could you start writing an extension that outputs the list of pages at
http://en.wikipedia.org/wiki/Special:AllPages in a user-definable collation,
such as utf8_general_ci, utf8_unicode_ci or utf8_swedish_ci? In my view, this
would be a good starting point.
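A rough sketch of what such an extension might compute, written in Python rather than MediaWiki's PHP purely for illustration. The collation names are borrowed from this comment, but the key functions (general_ci_key, swedish_ci_key) are hypothetical, crude approximations, not MySQL's actual utf8_general_ci or utf8_swedish_ci rules. The sketch follows option 3 above: the collation key is used only for ordering, with binary (UTF-8 byte) order as a tie-breaker, so titles that differ only in case or accents stay distinct. It also flags titles with non-BMP characters, which MySQL's 3-byte utf8 cannot store.

```python
import unicodedata
from typing import Callable, Dict, List, Union


def binary_key(title: str) -> bytes:
    # UTF-8 byte order, which matches Unicode code point order.
    return title.encode("utf-8")


def fold(text: str) -> str:
    # Crude case- and accent-insensitive form: decompose (NFD), drop
    # combining marks, then casefold, so "Éclair" -> "eclair".
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def general_ci_key(title: str) -> str:
    # Hypothetical stand-in for a *_general_ci-style collation.
    return fold(title)


# Hypothetical Swedish tailoring: å, ä, ö sort after z. The substitutes
# '{', '|' and '}' follow 'z' in code point order and are not valid in
# MediaWiki page titles, so they cannot collide with real title text.
SWEDISH_TAIL = {"å": "{", "ä": "|", "ö": "}"}


def swedish_ci_key(title: str) -> str:
    out = []
    # Normalize to NFC first so "å" is a single code point either way.
    for ch in unicodedata.normalize("NFC", title):
        out.append(SWEDISH_TAIL.get(ch.casefold(), fold(ch)))
    return "".join(out)


COLLATIONS: Dict[str, Callable[[str], Union[str, bytes]]] = {
    "binary": binary_key,
    "general_ci": general_ci_key,
    "swedish_ci": swedish_ci_key,
}


def list_pages(titles: List[str], collation: str) -> List[str]:
    key = COLLATIONS[collation]
    # Order by the collation key; tie-break on binary order so that titles
    # that compare equal under a case-insensitive key keep a fixed order.
    return sorted(titles, key=lambda t: (key(t), binary_key(t)))


def has_non_bmp(title: str) -> bool:
    # MySQL's 3-byte utf8 charset cannot store code points above U+FFFF.
    return any(ord(c) > 0xFFFF for c in title)


if __name__ == "__main__":
    sample = ["Zebra", "Öland", "Apple", "apple", "Ålesund", "Oslo", "Éclair"]
    for name in COLLATIONS:
        print(name, list_pages(sample, name))
```

With this sample, binary order puts every accented title after "apple", the general_ci-style key files "Öland" next to "Oslo", and the Swedish-style key moves "Ålesund" and "Öland" after "Zebra". That last difference is the kind of per-user choice a user-definable collation on Special:AllPages would expose.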
Wikibugs-l mailing list