https://bugzilla.wikimedia.org/show_bug.cgi?id=46330
--- Comment #4 from Bartosz Dziewoński <[email protected]> ---
(In reply to comment #2)
> It would also be nice to know which standard the ICU implementation
> is supposed to comply with (my guess: SFS-EN 13710). There are a couple of
> slightly different standards.

I have no idea, to be honest. Wikimedia wikis are currently running ICU 4.8 (per bug 46036); that's all the information I can give you :)

The data used to "partition" the sorted list into headers is probably not standardised at all and somehow based on the information about primary-level collation data. For details you should probably look at the code that generates it, maintenance/language/generateCollationData.php.

(In reply to comment #3)
> I wonder if there is some fundamental flaw with the grouping of letters under
> these one-letter headers?

I don't think there's such a "fundamental flaw" in it; the list is generated using generalised data that's reasonably correct for most languages, and thus needs such modifications for some specific ones. For example, no modifications were needed for Portuguese, and Polish only required adding the appropriate letters with diacritics. You and Swedes are just unlucky, I suppose :)

It's interesting how those characters are sorted among Latin letters in Finnish, and at the end of the Latin alphabet in Polish or Portuguese.

I automatically created a category with all two-letter combinations of ASCII letters + Å, Ä, Ö:
http://users.v-lo.krakow.pl/~matmarex/testwiki-fi/index.php?title=Luokka:Autotest

It seems like we need to exclude those four characters: Ǥ, Ŋ, Ŧ, Ʒ. I'll submit a patch to do this later today.
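
For anyone who wants to build a similar test category, here is a rough sketch of how the two-letter combinations could be generated. It is not the script actually used for the Autotest category; the function name `two_letter_titles` and the choice of uppercase letters are assumptions made purely for illustration.

```python
from itertools import product
import string

# Assumption: uppercase ASCII letters plus Å, Ä, Ö (29 letters in total).
LETTERS = string.ascii_uppercase + "ÅÄÖ"


def two_letter_titles(letters=LETTERS):
    """Return every ordered two-letter combination of `letters`.

    Hypothetical helper: each returned string could serve as a page title
    (or sort key) for a test page placed in the category.
    """
    return ["".join(pair) for pair in product(letters, repeat=2)]


titles = two_letter_titles()
print(len(titles))  # 29 * 29 = 841 combinations
```

Viewing the resulting pages in a category on a wiki that uses the collation under test shows which first-letter header each combination ends up under.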
