On Fri, 20 Feb 2015 10:04:32 +0200 Eli Zaretskii <[email protected]> wrote:
> > Date: Thu, 19 Feb 2015 22:02:57 +0000
> > From: Richard Wordingham <[email protected]>
> >
> > > First, collation data is overkill for search,
> > > since the order information is not required, so the weights are
> > > simply wasting storage.
> >
> > The big waste is not in text-dependent storage, but in the
> > processing for search orders that bear little relationship to
> > alphabetical order.
>
> Sorry, I don't think I follow: what is "processing for search orders"
> to which you allude here?

The examples in the CLDR root locale and in DUCET are the massive sets
of 'contractions' of consonants with vowels written before the
associated consonant in the scripts where spacing characters are stored
in the order written, namely Thai, Lao, Tai Viet and, soon, New Tai Lue.
When customised collations are applied, there are enormous sets for
Burmese (in CLDR) and New Tai Lue (not published in CLDR). The latter
two have 'logical order exception' final consonants. (The exception
here is that the logical order of characters in a word is not the order
one wants for sorting.)

> I'm not talking about localized features, like for "å" to match "aa"
> in Danish locales. I'm talking about matching strings that are
> equivalent under canonical and compatibility decompositions.

Nor was I. I was talking about the user interface - commands, menus
and messages.

> As for user sophistication, AFAIR, Microsoft Word finds "²" when you
> search for "2" by default, so it sounds like Word considers all users
> sophisticated enough for that. I think that's a solid enough
> precedent to follow.

But what switches the match off?

Richard.

