2011/8/31 Mark Davis ☕:
>> Another interesting question is: how can we encode in texts the fact
>> that a character usually considered a ligature in a language that
>> collates it as separate letters (even if the ligature is orthographic
>> and not just typographic) should still be collated as a single letter?
>> In other words, are there controls (or variation selectors, or other
>> means) that would disable the default expansions performed by a
>> correctly tailored collation? For example, in a French collator, is
>> there a way to disable the expansion of occurrences of "æ" into "ae"?
>
> The CLDR tailoring syntax allows DUCET expansions to be suppressed or
> changed for a particular locale.
> There is no mechanism in UCA to change expansions on a code-point basis.
> E.g., there is no way, within the same string "Cæsium Kværner", to have
> the first æ expand to 'ae' but the second sort after 'Z', as in Norwegian.
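To make the limitation concrete, here is a toy Python model. It is not real UCA or CLDR: the weights and the table names `FRENCH` and `NORWEGIAN` are hypothetical. A locale tailoring is modeled as a per-character table that is applied to every occurrence in the string, so both æ in "Cæsium Kværner" necessarily get the same treatment.

```python
# Toy model of locale tailoring, NOT real UCA/CLDR: all weights are invented.
# Letters a..z get primary weights 1..26; a tailoring table overrides
# individual characters and is applied to every occurrence in the string.

FRENCH = {"æ": [1, 5]}    # hypothetical: æ expands to a + e
NORWEGIAN = {"æ": [27]}   # hypothetical: æ is a separate letter after z


def sort_key(text: str, tailoring: dict[str, list[int]]) -> tuple[int, ...]:
    key: list[int] = []
    for ch in text.lower():
        if ch in tailoring:
            key.extend(tailoring[ch])
        elif "a" <= ch <= "z":
            key.append(ord(ch) - ord("a") + 1)
        # every other character (spaces, punctuation) is ignored here
    return tuple(key)


# Under one table, both occurrences of æ behave identically:
# either both expand (French) or neither does (Norwegian).
print(sort_key("Cæsium Kværner", FRENCH) == sort_key("Caesium Kvaerner", FRENCH))        # True
print(sort_key("Cæsium Kværner", NORWEGIAN) == sort_key("Caesium Kvaerner", NORWEGIAN))  # False
```

Whichever table is chosen, the choice is made per locale, never per occurrence, which is the gap Mark describes.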
You just provided the perfect example: we still have no way to specify that one occurrence of 'æ' should not be expanded while the other one should be. I would expect an orthographic convention, such as adding an invisible control after one of the occurrences to change the default behavior *locally*, so that a UCA tailoring could detect it (using the longest-match rule). But which kind of invisible control? Maybe the occurrence that should expand could be encoded as (a, ZWJ, e); in that case there is no longer any expansion for this substring, just an ignorable character in the middle...

-- Philippe.
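Both ideas can be sketched with a toy collator that does greedy longest-match lookup over a tailoring table. This is an illustration under stated assumptions, not real UCA: the weights are invented, ZWJ (U+200D) is simply given no weight in line with the "ignorable character" premise above, and using ZWNJ (U+200C) as the "expand this occurrence" marker is a purely hypothetical choice, since which control to use is exactly the open question.

```python
# Toy longest-match collator, NOT real UCA: weights and markers are hypothetical.
ZWJ = "\u200d"   # ZERO WIDTH JOINER: given no weight here (ignorable)
ZWNJ = "\u200c"  # hypothetical marker meaning "expand this æ"

# Hypothetical Norwegian-style tailoring. Keys may be multi-character
# contractions; values are primary weights (an empty list = ignorable).
NORWEGIAN = {
    "æ": [27],            # by default, æ is its own letter, after z
    "æ" + ZWNJ: [1, 5],   # a marked æ expands to a + e
    ZWJ: [],              # ZWJ contributes no weight
}
LONGEST = max(len(k) for k in NORWEGIAN)


def sort_key(text: str) -> tuple[int, ...]:
    s = text.lower()
    key: list[int] = []
    i = 0
    while i < len(s):
        # Longest match: try the longest table entry starting at i first.
        for n in range(min(LONGEST, len(s) - i), 0, -1):
            chunk = s[i:i + n]
            if chunk in NORWEGIAN:
                key.extend(NORWEGIAN[chunk])
                i += n
                break
        else:
            ch = s[i]
            if "a" <= ch <= "z":
                key.append(ord(ch) - ord("a") + 1)
            # anything else (e.g. spaces) is ignored in this toy model
            i += 1
    return tuple(key)


# Idea 1: a control after æ, caught by the longer contraction "æ" + ZWNJ.
marked = "Cæ" + ZWNJ + "sium Kværner"
# Idea 2: spell the expanding occurrence as a, ZWJ, e.
joined = "Ca" + ZWJ + "esium Kværner"
print(sort_key(marked) == sort_key(joined))  # True
```

In both strings the first occurrence collates as "ae" while the unmarked æ in "Kværner" still sorts after z. The first idea needs an extra contraction in every tailoring that should honor the marker; the second needs nothing beyond ZWJ being ignorable.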

