From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 27/10/2003 12:28, Mark Davis wrote:
>
> >Collation is very different, and already has mechanisms for dealing with
> >sequences. So no CGJ is needed there (except for case 2).
> >
> >Mark
> >
> Mark, can you outline what these mechanisms are or point me to a
> definition e.g. in a section of UTR #10? As I had understood it, the
> only way to deal with sequences of the sort I have in mind is to list
> each possible individually as a contraction. The Logical_Order_Exception
> property (see http://www.unicode.org/reports/tr10/ section 3.1.3) just
> might be useful, but doesn't seem to have the necessary flexibility as
> it causes a character to be swapped with ANY following character, not
> just with any of a restricted list of such characters. The backwards
> marking used for French accents (section 3.1.2) seems to apply over too
> long a string.

The backwards marking is not restricted to French accents at collation
level 2. You can use reverse ordering at any tailored level to fit other
needs, and you can also insert an extra collation level. So I think Mark
is right here: this gives you full control over the length of the
collating sequence at each level of the collation keys. Case 2 is
effectively an exception.

The bad thing is that the current default UCA ordering table does not
create such collation keys with intermediate levels for Hebrew vowels.
You need tailoring to create a base level with consonants, one level
with vowels, a third level for sin/shin dots, a fourth for meteg, a
fifth for accents... unless the text is encoded in logical order using
the CCO convention.

Philippe.
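
To make the idea of separate levels concrete, here is a minimal Python sketch of the five-level split described above. It is only an illustration under stated assumptions: the names level_of and sort_key are hypothetical, the assignment of code point ranges to levels is my own simplification, and raw code points stand in for real collation weights. It is not the UCA default table or any library's tailoring API.

```python
# Toy multi-level sort keys for pointed Hebrew.
# ASSUMPTION: the level assignment below is illustrative only; raw code
# points stand in for collation weights. This is not the UCA table.

def level_of(ch):
    """Return the (hypothetical) collation level of a character, or None."""
    cp = ord(ch)
    if 0x05D0 <= cp <= 0x05EA:
        return 0  # consonants: base level
    if 0x05B0 <= cp <= 0x05BB:
        return 1  # vowel points
    if cp in (0x05C1, 0x05C2):
        return 2  # shin dot, sin dot
    if cp == 0x05BD:
        return 3  # meteg
    if 0x0591 <= cp <= 0x05AF:
        return 4  # accents
    return None   # everything else is ignored in this sketch


def sort_key(text, backwards=frozenset()):
    """Build one weight tuple per level; reverse the levels in `backwards`."""
    levels = [[] for _ in range(5)]
    for ch in text:
        lvl = level_of(ch)
        if lvl is not None:
            levels[lvl].append(ord(ch))
    return tuple(
        tuple(reversed(w)) if i in backwards else tuple(w)
        for i, w in enumerate(levels)
    )


if __name__ == "__main__":
    a = "\u05D1\u05B8\u05D0"  # bet + qamats, alef
    b = "\u05D1\u05B7\u05D2"  # bet + patah, gimel
    # Code point order lets the vowel decide; the level-based key
    # compares all consonants before looking at any vowel.
    print(sorted([a, b]) == [b, a])
    print(sorted([a, b], key=sort_key) == [a, b])
```

Passing backwards={1} to sort_key reverses only the vowel level, which is the same kind of per-level backwards marking as the French accent case, applied at a different level.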