[Bug 164] Support collation by a certain locale (sorting order of characters)

bugzilla-daemon Thu, 22 Jul 2010 04:16:38 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=164


--- Comment #179 from Roan Kattouw <[email protected]> 2010-07-22 11:16:18 UTC ---
(In reply to comment #178)
> All this is something that can be avoided completely by using ICU and not
> depending on SQL backends for their support of many more collation locales
This is exactly what Aryeh is proposing. I think everyone agrees that it's
better to use binary sorting with munged sortkeys rather than SQL backends'
half-baked collation support, so you don't need to argue that.
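To make the "binary sorting with munged sortkeys" idea concrete, here is a minimal sketch. It assumes a toy munging function (`munged_sortkey`, a hypothetical name) that strips accents and folds case using Python's `unicodedata`. A real implementation would presumably get its keys from ICU; this stand-in only illustrates why a plain byte comparison on the munged key can give a sensible order when a byte comparison on the raw title does not.

```python
import unicodedata


def munged_sortkey(text: str) -> bytes:
    """Toy stand-in for a collation key (assumption: NOT what ICU produces).

    Decomposes the text, drops combining marks, folds case, and returns
    UTF-8 bytes so the result can be compared as a plain binary string.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold().encode("utf-8")


# Normalize to NFC so the accented letter is a single precomposed code point.
titles = [unicodedata.normalize("NFC", t) for t in ["Zebra", "Ábaco", "Abacus"]]

# Binary sort on the raw UTF-8 bytes: the accented title lands after "Zebra".
raw_order = sorted(titles, key=lambda t: t.encode("utf-8"))

# Binary sort on the munged key: accented and plain "A" titles sort together.
munged_order = sorted(titles, key=munged_sortkey)

print(raw_order)
print(munged_order)
```

The point of the sketch is only that the comparison itself stays a dumb byte comparison; all locale knowledge lives in whatever produces the key.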

> - Section headings in categories will never need to be stored, they are
> generated on the fly by reading the page names retrieved in the SQL result set
> using the {{COLLATIONMAP:}} function, with the specified locale in the
> "uselang=" HTTP query parameters, and the specified (or default) "clusters="
> parameter (whose default will be 1 or 0 as indicated above). They will be
> directly readable by users and do not require decoding anything from the stored
> sortkey.
> 
This is not that simple, as was pointed out on wikitech-l. Suppose you've got a
language where á sorts the same as a (that is, a and á are the same for sorting
purposes). Your sorted result could then look like:

Áa
Áb
Ac
Ád
Ae
Af
Ág
...

We humans understand that the proper heading for this is "A", not "Á", but how
will the software know that? Even if we store the original, unmunged sortkeys
(note that the sortkey is not necessarily equal to the page name: [[Albert
Einstein]] sorts as "Einstein, Albert") so we can differentiate A from Á, we
can't just take the first letter or even the most frequent one: in the list
shown above, both would give "Á" rather than "A".
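The problem with the listing above can be reproduced in a few lines. This is a sketch under stated assumptions: `munge` is a hypothetical accent-stripping function standing in for a locale where á and a are equal for sorting purposes, and the two heading heuristics are the naive ones discussed here, not anything MediaWiki is known to implement.

```python
import unicodedata
from collections import Counter


def munge(sortkey: str) -> str:
    """Toy collation (assumption): treat accented and plain letters as equal."""
    decomposed = unicodedata.normalize("NFD", sortkey)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


# The same seven entries as in the listing, in arbitrary input order.
# NFC keeps each accented letter as one precomposed code point.
entries = [
    unicodedata.normalize("NFC", t)
    for t in ["Ae", "Áb", "Af", "Áa", "Ád", "Ac", "Ág"]
]

# Sorting on the munged key interleaves "Á" and "A" entries.
ordered = sorted(entries, key=munge)

# Naive heuristic 1: use the first letter of the first entry as the heading.
first_letter = ordered[0][0]

# Naive heuristic 2: use the most frequent first letter as the heading.
most_frequent = Counter(t[0] for t in ordered).most_common(1)[0][0]

print(ordered)
print(first_letter, most_frequent)
```

With these seven entries both heuristics pick the accented letter, even though the unmunged sortkeys are available to the code.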

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
