Re: Merging combining classes, was: New contribution N2676

Peter Kirk Mon, 27 Oct 2003 11:11:57 -0800

On 27/10/2003 08:45, Mark Davis wrote:

>> Thank you for the interesting thoughts. As I understand your suggestion,
>> and bearing in mind that dagesh (and the rare rafe) are also consonant
>> modifiers, you are effectively suggesting an order (already normalised):
>> consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ
>> vowel2 accent2
>> with each element being optional, and CGJ being omitted when it is at
>> the beginning or the end of the string of combining marks, or doubled.
>> This would, I think, work, and at least come close to being rendered
>> correctly with current fonts modified to ignore CGJ (which actually they
>> should do anyway as CGJ is default ignorable). The down side is the
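
The ordering proposed above can be sketched in Python as a function that joins the non-empty groups of marks with CGJ (U+034F). Because only non-empty groups are joined, no CGJ appears at either end of the marks and none is doubled. This is a minimal illustration under stated assumptions:
- the name `build_cluster` and its parameters are hypothetical;
- each group is assumed to be in canonical order already;
- U+05BD HEBREW POINT METEG stands in for the "right meteg".

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def build_cluster(consonant, consonant_marks="", right_meteg="",
                  vowel1="", accent1="", vowel2="", accent2=""):
    """Hypothetical helper: consonant + marks in the proposed order.

    Groups: [dagesh rafe shin/sin-dot] CGJ [right meteg] CGJ [vowel accent]
    CGJ [vowel2 accent2]. Empty groups are skipped, so CGJ is never at
    either end of the marks and never doubled. Each group is assumed to
    be in canonical order already.
    """
    groups = [consonant_marks, right_meteg, vowel1 + accent1, vowel2 + accent2]
    return consonant + CGJ.join(g for g in groups if g)

if __name__ == "__main__":
    # bet + dagesh (ccc 21), meteg (ccc 22), patah (ccc 17)
    s = build_cluster("\u05D1", consonant_marks="\u05BC",
                      right_meteg="\u05BD", vowel1="\u05B7")
    # CGJ splits the marks into separate runs, so NFD leaves them in place.
    print(unicodedata.normalize("NFD", s) == s)  # True
    # Without CGJ, NFD moves patah in front of dagesh and meteg.
    print(unicodedata.normalize("NFD", s.replace(CGJ, ""))
          == "\u05D1\u05B7\u05BC\u05BD")  # True
```
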
> There are two very different cases that appear to be conflated by the above example.

The issue is not just one of rendering. See below.

> 1. Current engines incorrectly rendering canonically equivalent text.
> If a rendering engine renders X Y Z correctly, but doesn't render a
> canonically-equivalent X Z Y correctly, then there is a problem in the engine.
> [Note: this would be for sequences X Y Z that would actually occur in practice.]
> Using CGJ for this would simply be a mechanism to get around current deficiencies in the engines.
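
Point 1 rests on the definition of canonical equivalence. Two strings are canonically equivalent exactly when their NFD forms are identical, so an engine can test for it or simply normalize before shaping. Below is a minimal Python sketch using the standard `unicodedata` module. The function name is hypothetical, and the lamed/patah/hiriq sequence is an illustrative choice rather than one taken from this thread.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence holds exactly when the NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"  # ccc 0, 17, 14

x_y_z = LAMED + PATAH + HIRIQ
x_z_y = LAMED + HIRIQ + PATAH

# Same text as far as Unicode is concerned, so both should render alike.
print(canonically_equivalent(x_y_z, x_z_y))  # True
```
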

No, it is more than this. It is also a mechanism to ensure that the string X Y Z is collated as the string X Y Z, and that this string is matched by a search for X Y. That is rather difficult if the canonical order is actually X Z Y, where Z can be three or more characters that normalization moves into the middle of the string X Y.
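
The search problem can be reproduced with Python's `unicodedata` module. The character choices are illustrative only:
- X is lamed;
- Y is patah (combining class 17);
- Z is hiriq (combining class 14).

Because Z has the lower class, NFD moves it in front of Y, and a plain substring search for the normalized X Y no longer finds it. A CGJ between Y and Z blocks the move. A real search would also have to respect grapheme boundaries, which this sketch ignores.

```python
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

X = "\u05DC"    # HEBREW LETTER LAMED, ccc 0
Y = "\u05B7"    # HEBREW POINT PATAH, ccc 17
Z = "\u05B4"    # HEBREW POINT HIRIQ, ccc 14 (lower than Y)
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, ccc 0

needle = nfd(X + Y)

text = nfd(X + Y + Z)            # normalizes to X Z Y
print(needle in text)            # False: Z now sits between X and Y

text_cgj = nfd(X + Y + CGJ + Z)  # CGJ keeps Y adjacent to X
print(needle in text_cgj)        # True
```
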

Note the following required collation order:

X Z1
X Z2
...
X Zn
X Y1 Z1
X Y1 Z2
...
X Y1 Zn
X Y2 Z1
X Y2 Z2
...
X Y2 Zn

This collation is simple when the strings are ordered like this. But consider the problem of generating this collation when the strings are canonically reordered as follows, and Z is an arbitrary combination of 12 different marks (I think the collation algorithm can do this only if every possible Y Z combination is listed as a collation contraction):

X Z1
X Z2
...
X Zn
X Z1 Y1
X Z2 Y1
...
X Zn Y1
X Z1 Y2
X Z2 Y2
...
X Zn Y2
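
The mismatch can be seen with a plain code-point sort in Python. This is not the Unicode Collation Algorithm. The Hebrew points below are placeholders, chosen only because every Z has a lower combining class than every Y, so NFD turns each X Y Z into X Z Y; they are not meant as realistic spellings. Sorting the normalized strings directly groups them by Z first.

The key function `y_then_z_key` is hypothetical. It recovers the required order by splitting the marks at an assumed class threshold of 17 and comparing the Y part before the Z part. That is roughly the work a collator would otherwise need contractions to do.

```python
import unicodedata

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

X = "\u05D1"                 # HEBREW LETTER BET
Y1, Y2 = "\u05B7", "\u05B8"  # PATAH (ccc 17), QAMATS (ccc 18)
Z1, Z2 = "\u05B0", "\u05B4"  # SHEVA (ccc 10), HIRIQ (ccc 14)

# Logical strings in the required order: X Z*, then X Y1 Z*, then X Y2 Z*.
required = [X + Z1, X + Z2, X + Y1 + Z1, X + Y1 + Z2, X + Y2 + Z1, X + Y2 + Z2]
normalized = [nfd(s) for s in required]  # each X Y Z becomes X Z Y

# A plain code-point sort of the normalized strings groups by Z first.
naive = sorted(normalized)

def y_then_z_key(s: str):
    # Hypothetical key: marks with ccc >= 17 count as "Y", lower as "Z";
    # compare Y before Z, undoing the effect of canonical reordering.
    marks = s[1:]
    y = "".join(c for c in marks if unicodedata.combining(c) >= 17)
    z = "".join(c for c in marks if unicodedata.combining(c) < 17)
    return (s[0], y, z)

fixed = sorted(normalized, key=y_then_z_key)
print(naive == normalized)  # False
print(fixed == normalized)  # True
```
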

> 2. Unicode not making a distinction between X Y Z and X Z Y.
> Where there are cases where canonically-equivalent X Y Z and X Z Y should be
> rendered differently, then CGJ could be used to preserve the distinction, as per
> the UTC decision:
> [96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining
> grapheme joiner has the effect of preventing the canonical re-ordering of
> combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
> [96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20 (on the effect of combining grapheme joiner to prevent canonical re-ordering of combining marks during normalization) for inclusion into Unicode 4.0.1 and create a FAQ describing this effect as well. [L2/03-235, L2/03-236, L2/03-234]
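
CGJ has this effect because canonical reordering only sorts runs of consecutive characters with a non-zero canonical combining class. CGJ (U+034F) has class 0, so it ends one run and starts another. Below is a minimal Python sketch of that reordering step, with these assumptions:
- `canonical_order` is a hypothetical name;
- its input is assumed to be fully decomposed already;
- it is checked against `unicodedata`'s NFD only for characters that have no decompositions.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Stable-sort each maximal run of non-zero-class characters by class.

    Sketch of the reordering step of normalization only; s is assumed
    to be fully decomposed already (no decomposition is done here).
    """
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A class-0 character (base letter or CGJ) closes the current run.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))  # flush trailing run
    return "".join(out)

LAMED, PATAH, HIRIQ, CGJ = "\u05DC", "\u05B7", "\u05B4", "\u034F"

plain = LAMED + PATAH + HIRIQ         # one run (17, 14): reordered
joined = LAMED + PATAH + CGJ + HIRIQ  # two runs (17 | 14): left alone

print(canonical_order(plain) == unicodedata.normalize("NFD", plain))  # True
print(canonical_order(joined) == joined == unicodedata.normalize("NFD", joined))  # True
```
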

Agreed. Has any text been drafted?

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
