https://bugzilla.wikimedia.org/show_bug.cgi?id=42412

--- Comment #3 from Bawolff (Brian Wolff) <[email protected]> ---
>Additionally, for a reason that is to me inexplicable, IcuCollation, as well as
>my derivation, insists on sorting articles starting with "A" under "⅍". I have
>no idea why this happens or how to change it properly.

btw, for anyone else following along - that issue has been split out to bug 43740
-----
Actual patch:

I don't think this is how extending uca-default was envisioned to work. I
believe the intention was more to use intl's built-in support for
language-specific collations [via new IcuCollation( 'pl' )] and add a
first-letters-pl.ser file. This would probably integrate the sorting of Polish
letters with other types of letters more seamlessly. However, how to generate a
first-letters-pl.ser file is a bit of an open question at the moment, and
probably requires a much expanded
maintenance/languages/generateCollationData.php file. [Although I have a vague
idea how to make one by hand that would work imperfectly, but probably
acceptably.]
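
To illustrate what a per-language first-letter table buys you, here is a rough
sketch in Python (not MediaWiki code). It assumes that first-letter lookup
works by finding the last known first letter whose sort key does not exceed the
title's sort key. POLISH_ORDER, sort_key and first_letter are hypothetical
names, and the toy sort_key is only a stand-in for a real ICU collation.

```python
import bisect

# Toy Polish alphabet order; a stand-in for real ICU collation (assumption).
POLISH_ORDER = "aąbcćdeęfghijklłmnńoóprsśtuwyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH_ORDER)}


def sort_key(text):
    """Toy sort key: one rank per character, unknown characters sort last."""
    return [RANK.get(ch, len(POLISH_ORDER)) for ch in text.lower()]


# Plays the role of a first-letters-pl.ser file: first letters in collation
# order, plus their precomputed sort keys.
FIRST_LETTERS = sorted(POLISH_ORDER.upper(), key=sort_key)
FIRST_KEYS = [sort_key(letter) for letter in FIRST_LETTERS]


def first_letter(title):
    """Return the last first letter whose key is <= the title's key."""
    pos = bisect.bisect_right(FIRST_KEYS, sort_key(title)) - 1
    return FIRST_LETTERS[max(pos, 0)]


titles = ["Łódź", "Lato", "Żaba", "Zebra", "Źrebak"]
for title in sorted(titles, key=sort_key):
    print(first_letter(title), title)
```

Because the letters and the titles go through the same key function, "Ł" gets
its own heading between "L" and "M" rather than being handled separately. The
toy key ignores everything a real collation does for non-Polish characters, so
titles starting with other characters end up in the last bucket here.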

----

With the actual patch, getFirstLetter takes a binary string. There's no
guarantee that the ICU collation won't use a binary code that is also a code
point for one of the "Polish" letters (in practice that's probably rare
though). Additionally, since it's a binary string, it's not guaranteed to be
valid UTF-8, and I'm not sure how mb_substr would handle invalid UTF-8.
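
A small Python sketch of both concerns (not PHP, and it says nothing about what
mb_substr actually does). The byte strings are made up for illustration and are
not real ICU sort keys; first_char_or_none is a hypothetical helper.

```python
def first_char_or_none(data: bytes):
    """Return the first UTF-8 character of data, or None if data does not
    start with a valid UTF-8 sequence."""
    if not data:
        return None
    # A UTF-8 character is 1 to 4 bytes long; try each prefix length in turn.
    for length in range(1, 5):
        try:
            return data[:length].decode("utf-8")
        except UnicodeDecodeError:
            continue
    return None


# Made-up binary keys (assumption: real sort keys are opaque byte strings).
key_that_looks_polish = b"\xc5\x81\x01\x05"  # starts with the UTF-8 bytes of "Ł"
key_that_is_invalid = b"\xff\x2a\x01"        # 0xFF never occurs in valid UTF-8

# Concern 1: a binary key can coincide with the encoding of a Polish letter.
print(first_char_or_none(key_that_looks_polish) == "Ł")

# Concern 2: a binary key need not be valid UTF-8 at all.
print(first_char_or_none(key_that_is_invalid) is None)
```

So any code that pulls the "first character" out of such a string needs to
decide what to do in both cases, instead of assuming it was handed text.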
