https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #165 from T. Gries <[email protected]> 2010-03-12 22:18:10 UTC ---
(In reply to comment #164)
> (In reply to comment #163)
> The options I know of are:
> 1) Use MySQL collation support, just using utf8 everywhere. This will mess up
> equality comparisons in ways we almost certainly don't want, doesn't support
> characters outside the BMP, etc.
>
> 2) Roll our own sortkeys. Probably requires a lot more work, and a lot less
> efficient (need to keep extra column+index around in all relevant tables).
> Also might be tricky, since the default Unicode algorithm gives very long sort
> keys, which we'd have to squish somehow (AFAIK).
>
> 3) Use some utf8 collation for cl_sortkey, but keep everything else binary. We
> could also add an extra page_sortkey using utf8 collation, which is no more
> expensive than (2). This sounds like the best option to me offhand. It would
> be as easy to implement as anything here, and wouldn't have such bad side
> effects. It would somewhat mess up pages with non-BMP characters in the name,
> though.

Can you start writing an extension that outputs the list of pages
http://en.wikipedia.org/wiki/Special:AllPages
in a user-definable collation such as utf8_general_ci, utf8_unicode_ci, or
utf8_swedish_ci?

This would be a good starting point in my view.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
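
As a rough illustration of the request above, here is a minimal sketch in Python of how such a page list could be queried in a user-selected collation. It is not an existing extension. The helper name build_allpages_query and the whitelist are hypothetical, and the sketch assumes the MediaWiki page table with binary page_title and page_namespace columns, converting the title to utf8 at query time so that a COLLATE clause can be applied.

```python
# Hypothetical helper, not part of MediaWiki: builds a MySQL query that lists
# page titles ordered by a user-chosen collation.

# Collations named in the comment above; anything else is rejected so that
# user input is never pasted into the SQL text unchecked.
ALLOWED_COLLATIONS = ("utf8_general_ci", "utf8_unicode_ci", "utf8_swedish_ci")


def build_allpages_query(collation, namespace=0, limit=50):
    """Return a SQL string listing page titles sorted with `collation`."""
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError("unsupported collation: %r" % (collation,))
    # Assumption: page_title is stored as binary, so it is converted to utf8
    # before the collation is applied.
    return (
        "SELECT page_title FROM page "
        "WHERE page_namespace = %d "
        "ORDER BY CONVERT(page_title USING utf8) COLLATE %s "
        "LIMIT %d" % (int(namespace), collation, int(limit))
    )


if __name__ == "__main__":
    print(build_allpages_query("utf8_swedish_ci"))
```

Sorting on a converted expression like this presumably cannot use the existing index on page_title, which is one reason the quoted options discuss a dedicated sortkey column instead. As the quoted comment notes, titles with characters outside the BMP would still not sort correctly under these utf8 collations.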
