[Bug 164] Support collation by a certain locale (sorting order of characters)

bugzilla-daemon Fri, 12 Mar 2010 14:18:27 -0800

https://bugzilla.wikimedia.org/show_bug.cgi?id=164


--- Comment #165 from T. Gries <[email protected]> 2010-03-12 22:18:10 UTC ---
(In reply to comment #164)
> (In reply to comment #163)
> The options I know of are:
> 1) Use MySQL collation support, just using utf8 everywhere.  This will mess up
> equality comparisons in ways we almost certainly don't want, doesn't support
> characters outside the BMP, etc.
> 
> 2) Roll our own sortkeys.  Probably requires a lot more work, and a lot less
> efficient (need to keep extra column+index around in all relevant tables). 
> Also might be tricky, since the default Unicode algorithm gives very long sort
> keys, which we'd have to squish somehow (AFAIK).
> 
> 3) Use some utf8 collation for cl_sortkey, but keep everything else binary.
> We could also add an extra page_sortkey using utf8 collation, which is no more
> expensive than (2).  This sounds like the best option to me offhand.  It would
> be as easy to implement as anything here, and wouldn't have such bad side
> effects.  It would somewhat mess up pages with non-BMP characters in the name,
> though.
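
To make option (2) concrete, here is a minimal sketch of a home-grown sort key
in Python. It is an illustration only and NOT the Unicode Collation Algorithm:
it strips accents and folds case, then stores the result as UTF-8 bytes that a
binary-collated column could index. The function name simple_sortkey and the
sample titles are hypothetical.

```python
import unicodedata


def simple_sortkey(title: str) -> bytes:
    """Hypothetical sort key; a crude stand-in, not the real UCA."""
    # Decompose accented characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks so that accented letters sort with their base.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that "apple" and "Apple" get the same key.
    return stripped.casefold().encode("utf-8")


titles = ["Zebra", "Äpfel", "apple", "Éclair", "banana"]
# Comparing the byte keys mimics what a binary index on the key column would do.
ordered = sorted(titles, key=simple_sortkey)
print(ordered)
```

Keys like these stay about as short as the title itself and work for non-BMP
characters, but they ignore locale-specific rules (Swedish, for instance, sorts
"Ä" after "Z"), which is the part a real collation would have to supply.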

Can you start by writing an extension that outputs the list of pages (as on
http://en.wikipedia.org/wiki/Special:AllPages) in a user-definable collation
such as utf8_general_ci, utf8_unicode_ci, or utf8_swedish_ci? This would be a
good starting point in my view.
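
A rough sketch of the query such an extension might issue, written in Python
only for illustration (a real extension would be PHP). The function name
build_allpages_query and the whitelist are hypothetical, and the sketch assumes
the page table with binary page_title and page_namespace columns; the CONVERT
step is there because a utf8 collation cannot be applied directly to a binary
column.

```python
# Collations the user may choose from; taken from the examples in this comment.
ALLOWED_COLLATIONS = {"utf8_general_ci", "utf8_unicode_ci", "utf8_swedish_ci"}


def build_allpages_query(collation: str, limit: int = 50) -> str:
    """Return a MySQL query listing main-namespace titles in the given collation."""
    # Collation names cannot be bound as query parameters, so only
    # whitelisted values are ever interpolated into the SQL text.
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError(f"unsupported collation: {collation!r}")
    return (
        "SELECT page_title FROM page "
        "WHERE page_namespace = 0 "
        # Reinterpret the binary title as utf8 so the collation applies.
        f"ORDER BY CONVERT(page_title USING utf8) COLLATE {collation} "
        f"LIMIT {int(limit)}"
    )


print(build_allpages_query("utf8_swedish_ci", limit=10))
```

Sorting on a converted expression presumably cannot use the existing index on
page_title, so this would only be suitable as a proof of concept on small
wikis, not as the final design.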

