Collation is very different, and already has mechanisms for dealing with
sequences. So no CGJ is needed there (except for case 2).

Mark
__________________________________
http://www.macchiato.com

----- Original Message ----- 
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: "Philippe Verdy" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Mon, 2003 Oct 27 09:09
Subject: Re: Merging combining classes, was: New contribution N2676


> On 27/10/2003 08:45, Mark Davis wrote:
>
> >>Thank you for the interesting thoughts. As I understand your suggestion,
> >>and bearing in mind that dagesh (and the rare rafe) are also consonant
> >>modifiers, you are effectively suggesting an order (already normalised):
> >>
> >>consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ
> >>vowel2 accent2
> >>
> >>with each element being optional, and CGJ being omitted when it is at
> >>the beginning or the end of the string of combining marks, or doubled.
> >>
> >>This would, I think, work, and at least come close to being rendered
> >>correctly with current fonts modified to ignore CGJ (which actually they
> >>should do anyway as CGJ is default ignorable). The down side is the
> >>
> >>
> >
> >There are two very different cases that appear to be conflated by the above
> >example.
> >
> >
> The issue is not just one of rendering. See below.
>
> >1. Current engines incorrectly rendering canonically equivalent text.
> >
> >If a rendering engine renders X Y Z correctly, but doesn't render a
> >canonically-equivalent X Z Y correctly, then there is a problem in the
> >engine.
> >[Note: this would be for sequences X Y Z that would actually occur in
> >practice.]
> >
> >Using CGJ for this would simply be a mechanism to get by current deficiencies
> >in the engines.
> >
> >
> No, it is more than this. It is also a mechanism to ensure that the
> string X Y Z is collated as the string X Y Z, and that this string is
> matched by a search for X Y, which is rather difficult if the canonical
> order is actually X Z Y, and the Z can be three or more characters which
> are moved into the middle of the string X Y.
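
A minimal sketch of the search problem described above, assuming Python's
standard unicodedata module and using lamed as X, meteg (combining class 22)
as Y and patah (combining class 17) as Z. The choice of these three
characters is only an illustration, not something taken from the thread:

```python
import unicodedata

LAMED = "\u05dc"  # X: base consonant, combining class 0
METEG = "\u05bd"  # Y: combining class 22
PATAH = "\u05b7"  # Z: combining class 17

# Text as entered: X Y Z (meteg written before the vowel).
typed = LAMED + METEG + PATAH

# Canonical ordering sorts adjacent marks by combining class,
# so the lower-class patah is moved in front of the meteg: X Z Y.
normalized = unicodedata.normalize("NFD", typed)

# A plain substring search for X Y succeeds on the typed text
# but fails on the normalized text, because Z now sits between them.
found_in_typed = (LAMED + METEG) in typed
found_in_normalized = (LAMED + METEG) in normalized

print([f"U+{ord(c):04X}" for c in normalized])
print(found_in_typed, found_in_normalized)
```

With a Z of several marks the same thing happens, only with more characters
moved in between X and Y.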
>
> Note the following required collation order:
>
> X Z1
> X Z2
> ...
> X Zn
> X Y1 Z1
> X Y1 Z2
> ...
> X Y1 Zn
> X Y2 Z1
> X Y2 Z2
> ...
> X Y2 Zn
>
> This collation is simple when the string is ordered like this. But
> consider the problem of generating this collation when the strings are
> canonically reordered as follows, and Z is an arbitrary combination of 12
> different marks (I think the collation algorithm can do this only if
> every possible Y Z combination is listed as a collation contraction):
>
> X Z1
> X Z2
> ...
> X Zn
> X Z1 Y1
> X Z2 Y1
> ...
> X Zn Y1
> X Z1 Y2
> X Z2 Y2
> ...
> X Zn Y2
>
> >2. Unicode not making a distinction between X Y Z and X Z Y.
> >
> >Where there are cases where canonically-equivalent X Y Z and X Z Y should be
> >rendered differently, then CGJ could be used to preserve the distinction, as
> >per the UTC decision:
> >
> >[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining
> >grapheme joiner has the effect of preventing the canonical re-ordering of
> >combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
> >
> >[96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20
> >(on the effect of combining grapheme joiner to prevent canonical re-ordering of
> >combining marks during normalization) for inclusion into Unicode 4.0.1 and
> >create a FAQ describing this effect as well. [L2/03-235, L2/03-236,
> >L2/03-234]
> >
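
The effect named in 96-C20 can be checked with a short sketch, again
assuming Python's unicodedata module and an illustrative lamed, meteg, patah
sequence (my choice of characters, not from the UTC text). The strip_cgj
helper is hypothetical and only shows how a process that treats CGJ as
ignorable might compare strings:

```python
import unicodedata

LAMED = "\u05dc"  # base consonant, combining class 0
METEG = "\u05bd"  # combining class 22
PATAH = "\u05b7"  # combining class 17
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER, combining class 0

plain = LAMED + METEG + PATAH          # marks are free to reorder
guarded = LAMED + METEG + CGJ + PATAH  # CGJ sits between the marks

nfd_plain = unicodedata.normalize("NFD", plain)
nfd_guarded = unicodedata.normalize("NFD", guarded)

# Without CGJ the marks are swapped; with CGJ the class-0 character
# splits the run of marks, so each side is ordered on its own.
plain_reordered = nfd_plain != plain
guarded_unchanged = nfd_guarded == guarded


def strip_cgj(text: str) -> str:
    """Hypothetical helper: drop CGJ before comparing, as an ignorable."""
    return text.replace(CGJ, "")


# After normalization, the guarded string still carries the typed order.
keeps_typed_order = strip_cgj(nfd_guarded) == plain

print(plain_reordered, guarded_unchanged, keeps_typed_order)
```
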
> >
> Agreed. Has any text been drafted?
>
> -- 
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/
>
>
>

