https://bugzilla.wikimedia.org/show_bug.cgi?id=164

--- Comment #164 from Aryeh Gregor <simetrical+wikib...@gmail.com> 2010-03-12 22:06:57 UTC ---
(In reply to comment #163)
> Thanks for pointing out. I must admit, that you are _fully_ right, also knowing
> of the collation differences. We developers should collaboratively think to
> find a satisfying solution for the "collation problem".

The options I know of are:

1) Use MySQL's collation support and just use utf8 everywhere.  This will mess
up equality comparisons in ways we almost certainly don't want, and it doesn't
support characters outside the BMP, etc.

2) Roll our own sortkeys.  This probably requires a lot more work and is a lot
less efficient (we'd need to keep an extra column+index around in all relevant
tables).  It also might be tricky, since the default Unicode algorithm gives
very long sort keys, which we'd have to squish somehow (AFAIK).

3) Use some utf8 collation for cl_sortkey, but keep everything else binary.  We
could also add an extra page_sortkey using utf8 collation, which is no more
expensive than (2).  This sounds like the best option to me offhand.  It would
be as easy to implement as anything here, and wouldn't have such bad side
effects.  It would somewhat mess up pages with non-BMP characters in the name,
though.
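
To make the non-BMP caveat in (1) and (3) a bit more concrete, here is a rough
Python sketch (not MediaWiki code; the helper names are made up for
illustration).  MySQL's utf8 charset stores at most three bytes per character,
while characters outside the BMP need four bytes in UTF-8, so titles for which
has_non_bmp() is true are the ones that would be affected:

```python
def has_non_bmp(title: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in title)


def max_utf8_bytes_per_char(title: str) -> int:
    """Largest UTF-8 encoded length of any single character in the title."""
    return max((len(ch.encode("utf-8")) for ch in title), default=0)


if __name__ == "__main__":
    # The last title is a single Gothic letter (U+10330), which is non-BMP.
    for title in ["Zürich", "東京", "\U00010330"]:
        print(repr(title), has_non_bmp(title), max_utf8_bytes_per_char(title))
```

A check like this could in principle be used to estimate how many existing
page titles would be affected before committing to either option.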

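As a very rough illustration of what "rolling our own sortkeys" in (2) could
mean, the Python sketch below builds a toy key by stripping accents and folding
case, then appends the original title as a tie-breaker.  This is only an
assumption-laden stand-in: it is not the Unicode Collation Algorithm, and
toy_sortkey is a hypothetical name, not an existing function.

```python
import unicodedata


def toy_sortkey(title: str) -> bytes:
    """Build a crude binary sort key; NOT a real Unicode collation key."""
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, then fold case.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    primary = base.casefold()
    # Append the original title after a NUL byte so that distinct titles
    # never share a key (assumes titles contain no NUL characters).
    return primary.encode("utf-8") + b"\x00" + title.encode("utf-8")


if __name__ == "__main__":
    titles = ["Zebra", "apple", "Éclair"]
    # Plain binary ordering, comparing raw UTF-8 bytes.
    print(sorted(titles, key=lambda t: t.encode("utf-8")))
    # Ordering by the toy key, which can itself be compared as binary.
    print(sorted(titles, key=toy_sortkey))
```

The key is a plain byte string, so it could be stored in a binary column and
indexed as such.  Note that even this toy key is longer than the title itself,
which hints at the size problem mentioned in (2).
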
