Collation is very different, and already has mechanisms for dealing with sequences. So no CGJ is needed there (except for case 2).
Mark
__________________________________
http://www.macchiato.com

----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Mon, 2003 Oct 27 09:09
Subject: Re: Merging combining classes, was: New contribution N2676

> On 27/10/2003 08:45, Mark Davis wrote:
>
> >>Thank you for the interesting thoughts. As I understand your suggestion,
> >>and bearing in mind that dagesh (and the rare rafe) are also consonant
> >>modifiers, you are effectively suggesting an order (already normalised):
> >>
> >>consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ
> >>vowel2 accent2
> >>
> >>with each element being optional, and CGJ being omitted when it is at
> >>the beginning or the end of the string of combining marks, or doubled.
> >>
> >>This would, I think, work, and at least come close to being rendered
> >>correctly with current fonts modified to ignore CGJ (which actually they
> >>should do anyway as CGJ is default ignorable). The down side is the
> >>
> >
> >There are two very different cases that appear to be conflated by the above
> >example.
> >
> The issue is not just one of rendering. See below.
>
> >1. Current engines incorrectly rendering canonically equivalent text.
> >
> >If a rendering engine renders X Y Z correctly, but doesn't render a
> >canonically-equivalent X Z Y correctly, then there is a problem in the engine.
> >[Note: this would be for sequences X Y Z that would actually occur in practice.]
> >
> >Using CGJ for this would simply be a mechanism to get by current deficiencies in
> >the engines.
> >
> No, it is more than this. It is also a mechanism to ensure that the
> string X Y Z is collated as the string X Y Z, and that this string is
> matched by a search for X Y, which is rather difficult if the canonical
> order is actually X Z Y, and the Z can be three or more characters which
> are moved into the middle of the string X Y.
>
> Note the following required collation order:
>
> X Z1
> X Z2
> ...
> X Zn
> X Y1 Z1
> X Y1 Z2
> ...
> X Y1 Zn
> X Y2 Z1
> X Y2 Z2
> ...
> X Y2 Zn
>
> This collation is simple when the string is ordered like this. But
> consider the problem of generating this collation when the strings are
> canonically reordered as follows, and Z is an arbitrary combination of 12
> different marks (I think the collation algorithm can do this only if
> every possible Y Z combination is listed as a collation contraction):
>
> X Z1
> X Z2
> ...
> X Zn
> X Z1 Y1
> X Z2 Y1
> ...
> X Zn Y1
> X Z1 Y2
> X Z2 Y2
> ...
> X Zn Y2
>
> >2. Unicode not making a distinction between X Y Z and X Z Y.
> >
> >Where there are cases where canonically-equivalent X Y Z and X Z Y should be
> >rendered differently, then CGJ could be used to preserve the distinction, as per
> >the UTC decision:
> >
> >[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining
> >grapheme joiner has the effect of preventing the canonical re-ordering of
> >combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
> >
> >[96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20 (on
> >the effect of combining grapheme joiner to prevent canonical re-ordering of
> >combining marks during normalization) for inclusion into Unicode 4.0.1 and
> >create a FAQ describing this effect as well. [L2/03-235, L2/03-236, L2/03-234]
> >
> Agreed. Has any text been drafted?
>
> --
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/
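
The effect named in consensus 96-C20, and the search problem raised in the quoted reply, can be checked with a small Python sketch. The concrete characters are an assumption chosen for illustration and do not come from the thread: X is lamed, Y is patah (combining class 17) and Z is hiriq (combining class 14), so the canonical order of X Y Z is X Z Y. The helper name nfc is likewise hypothetical.

```python
import unicodedata

# Illustrative choice of characters (an assumption, not taken from the thread).
LAMED = "\u05DC"  # X: HEBREW LETTER LAMED, combining class 0
PATAH = "\u05B7"  # Y: HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # Z: HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def nfc(text):
    """Normalize to NFC, which applies canonical reordering of marks."""
    return unicodedata.normalize("NFC", text)


plain = LAMED + PATAH + HIRIQ          # X Y Z as typed
guarded = LAMED + PATAH + CGJ + HIRIQ  # X Y CGJ Z

reordered = nfc(plain)    # marks are sorted by combining class: X Z Y
preserved = nfc(guarded)  # CGJ (class 0) blocks the reordering

for label, s in (("plain", reordered), ("guarded", preserved)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in s))

# A naive code-point search for X Y fails once Z has moved in between,
# but still succeeds when CGJ has kept the typed order.
print((LAMED + PATAH) in reordered)
print((LAMED + PATAH) in preserved)
```

This only shows plain code-point matching; it says nothing about how a collation implementation following the Unicode Collation Algorithm would weight the same strings.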