https://bugzilla.wikimedia.org/show_bug.cgi?id=43740
Bawolff (Brian Wolff) <bawolff...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch-in-gerrit
            Summary|IcuCollation doesn't prune  |IcuCollation doesn't prune
                   |first letter elements that  |first letter elements that
                   |duplicate a prefix of       |duplicate a prefix of
                   |another first letter'       |another first letter's
                   |                            |sortkey

--- Comment #8 from Bawolff (Brian Wolff) <bawolff...@gmail.com> ---
Ok, I read up on ICU, and after quite a bit of googling this actually looks not that complicated.

From what I gather (if I read the docs right, which is a very big if), there should be no two primary collation elements where one collation element in its entirety is a prefix of some other collation element. See https://ssl.icu-project.org/repos/icu/icuhtml/trunk/design/collation/ICU_collation_design.htm, specifically:

  R2. A fractional weight cannot exactly match the initial bytes of another fractional weight at the same level.

So assuming nothing funky is done to compress the sort keys (which I don't think happens on the primary level, at least not currently), just looking for matching prefixes should work.

Anyhow, see gerrit change 55503. I still need to double check that unsetting an element of an array doesn't modify its sorted order (it doesn't seem to, but I should double check). It also might be prettier if the check for duplicate prefixes was merged into the general duplicate check, but I didn't see an easy way of doing that.

Anyhow, feedback appreciated.
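
For illustration only, here is a minimal Python sketch of the prefix check described in the comment. It is not the code from gerrit change 55503 (MediaWiki itself is PHP). The function name `prune_prefix_duplicates`, the shape of `letter_map`, and the byte values in the example are all hypothetical; the sketch also assumes each sort key may end in a NUL terminator that has to be ignored when comparing prefixes.

```python
def prune_prefix_duplicates(letter_map):
    """Drop first letters whose primary sort key starts with the sort key
    of an earlier, already kept first letter.

    letter_map: dict mapping a primary sort key (bytes) to a first letter (str).
    Returns a new dict, in sort-key order, without the prefix duplicates.
    """
    kept = {}
    prev = None  # trimmed sort key of the most recently kept letter

    # In byte-wise sorted order, every key that extends some key X comes
    # directly after X, so comparing against the last kept key is enough.
    for key in sorted(letter_map, key=lambda k: k.rstrip(b"\x00")):
        trimmed = key.rstrip(b"\x00")  # assumption: trailing NUL is a terminator
        if prev and trimmed.startswith(prev):
            # The key duplicates a prefix of a kept key: skip it and keep
            # comparing later keys against the same kept key.
            continue
        kept[key] = letter_map[key]
        prev = trimmed
    return kept


# Hypothetical sort keys, invented for this example (not real ICU output).
example = {
    b"\x29\x00": "A",
    b"\x29\x05\x00": "AE",  # starts with the key for "A", so it is pruned
    b"\x2b\x00": "B",
}
print(list(prune_prefix_duplicates(example).values()))  # prints ['A', 'B']
```

Building a new dict instead of deleting entries in place sidesteps the question of whether removing elements disturbs the sorted order.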