The issue is not just one of rendering. See below.

Thank you for the interesting thoughts. As I understand your suggestion, and bearing in mind that dagesh (and the rare rafe) are also consonant modifiers, you are effectively suggesting an order (already normalised):
consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ vowel2 accent2
with each element being optional, and with a CGJ omitted wherever it would fall at the beginning or the end of the string of combining marks, or directly after another CGJ.
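A minimal Python sketch of the proposed order, assuming each group's marks are already supplied in canonical order. The helper name `build_cluster` and its parameters are hypothetical; the code points are the standard ones (dagesh U+05BC, meteg U+05BD, rafe U+05BF, shin dot U+05C1, CGJ U+034F). Joining only the non-empty groups with CGJ gives the "omit at the start, the end, or when doubled" rule for free.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def build_cluster(consonant, consonant_marks="", right_meteg="", vowel_groups=()):
    """Assemble one Hebrew cluster in the proposed order (hypothetical helper).

    consonant_marks: dagesh, rafe and/or shin/sin dot, already in canonical order
    right_meteg:     a meteg (U+05BD) in the right-meteg position, or ""
    vowel_groups:    sequence of (vowel, accent) pairs; either may be ""
    """
    mark_groups = [consonant_marks, right_meteg]
    mark_groups += [vowel + accent for vowel, accent in vowel_groups]
    # Joining only the non-empty groups means a CGJ is never first or last
    # among the marks and is never doubled.
    return consonant + CGJ.join(g for g in mark_groups if g)

# shin + dagesh + shin dot, right meteg, qamats + etnahta
example = build_cluster("\u05E9", "\u05BC\u05C1", "\u05BD", [("\u05B8", "\u0591")])
# Each group is internally in canonical order and the CGJs (class 0) separate
# the groups, so normalization leaves the sequence untouched.
assert unicodedata.normalize("NFD", example) == example
```

With no dagesh and no meteg, `build_cluster("\u05D1", vowel_groups=[("\u05B7", "")])` yields plain bet + patah with no CGJ at all.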
This would, I think, work, and at least come close to being rendered
correctly with current fonts modified to ignore CGJ (which actually they
should do anyway as CGJ is default ignorable). The down side is the
There are two very different cases that appear to be conflated by the above
example.
1. Current engines incorrectly rendering canonically equivalent text.

No, it is more than this. It is also a mechanism to ensure that the string X Y Z is collated as the string X Y Z, and that this string is matched by a search for X Y. That is rather difficult if the canonical order is actually X Z Y, especially when Z can be three or more characters, all of which are moved into the middle of the string X Y.
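The search problem can be seen with Python's standard `unicodedata` module. As an illustrative choice (not one taken from the discussion), X = bet, Y = dagesh (combining class 21) and Z = patah (class 17); normalization sorts the marks by class, so the typed X Y Z is stored as X Z Y. Plain substring matching is a simplification of real search.

```python
import unicodedata

BET, DAGESH, PATAH = "\u05D1", "\u05BC", "\u05B7"  # combining classes 0, 21, 17
CGJ = "\u034F"                                      # combining class 0

def nfd(s):
    return unicodedata.normalize("NFD", s)

typed = BET + DAGESH + PATAH   # X Y Z as entered
stored = nfd(typed)            # canonical ordering gives X Z Y
query = nfd(BET + DAGESH)      # a search for X Y

print(stored == BET + PATAH + DAGESH)  # True: patah (17) moved before dagesh (21)
print(query in stored)                 # False: X Y is no longer contiguous

# A CGJ between Y and Z splits the run of marks, so nothing is reordered.
typed_cgj = BET + DAGESH + CGJ + PATAH
print(nfd(typed_cgj) == typed_cgj)     # True
print(query in nfd(typed_cgj))         # True
```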
If a rendering engine renders X Y Z correctly, but doesn't render a canonically-equivalent X Z Y correctly, then there is a problem in the engine. [Note: this would be for sequences X Y Z that would actually occur in practice.]
Using CGJ for this would simply be a mechanism to get around current deficiencies in
the engines.
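One way an engine could avoid the deficiency described here is to normalize its input before shaping, so every canonically equivalent spelling reaches glyph lookup in the same order. This is a sketch under that assumption, not a description of how any particular engine works; `shaping_input` is a hypothetical name. Two strings are canonically equivalent exactly when their NFD forms are identical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """True when the two strings have identical canonical decompositions."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def shaping_input(text: str) -> str:
    """Hypothetical preprocessing step: put marks in canonical order before
    glyph lookup, so the shaper sees one order for all equivalent spellings."""
    return unicodedata.normalize("NFD", text)

x_y_z = "\u05D1\u05BC\u05B7"  # bet, dagesh, patah
x_z_y = "\u05D1\u05B7\u05BC"  # bet, patah, dagesh

print(canonically_equivalent(x_y_z, x_z_y))           # True
print(shaping_input(x_y_z) == shaping_input(x_z_y))   # True
```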
Note the following required collation order:
X Z1 < X Z2 < ... < X Zn < X Y1 Z1 < X Y1 Z2 < ... < X Y1 Zn < X Y2 Z1 < X Y2 Z2 < ... < X Y2 Zn
This collation is simple when the strings are ordered like this. But consider the problem of generating this collation when the strings are canonically reordered as follows, and Z is an arbitrary combination of 12 different marks (I think the collation algorithm can do this only if every possible Y Z combination is listed as a collation contraction):
X Z1 < X Z2 < ... < X Zn < X Z1 Y1 < X Z2 Y1 < ... < X Zn Y1 < X Z1 Y2 < X Z2 Y2 < ... < X Zn Y2
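A toy model of the collation order above, using abstract symbols rather than real characters. The weights are hypothetical (every Z sorts below every Y) and Z is simplified to a single mark. A plain position-by-position key reproduces the required order for X Y Z strings but not for the reordered X Z Y strings; `logical_key`, which pulls Y back in front of Z, is only an illustration of the extra work needed, not how the Unicode Collation Algorithm operates.

```python
# Hypothetical weights: every Z mark sorts below every Y mark.
WEIGHT = {"X": 0, "Z1": 1, "Z2": 2, "Z3": 3, "Y1": 10, "Y2": 11}
Y_MARKS = {"Y1", "Y2"}

def lex_key(s):
    """Plain position-by-position comparison of weights."""
    return [WEIGHT[c] for c in s]

def logical_key(s):
    """Move Y marks back in front of Z marks (X Y Z order) before comparing."""
    marks = s[1:]
    ys = [m for m in marks if m in Y_MARKS]
    zs = [m for m in marks if m not in Y_MARKS]
    return lex_key([s[0]] + ys + zs)

def canonical_reorder(s):
    """Mimic a canonical reordering that puts Z before Y (X Z Y order)."""
    return (s[0],) + tuple(sorted(s[1:], key=lambda m: m in Y_MARKS))

# The required order, written in logical X Y Z form.
required = [("X",) + y + (z,)
            for y in [(), ("Y1",), ("Y2",)]
            for z in ["Z1", "Z2", "Z3"]]
stored = [canonical_reorder(s) for s in required]

print(sorted(required, key=lex_key) == required)  # True
print(sorted(stored, key=lex_key) == stored)      # False: X Z1 Y1 sorts before X Z2
print(sorted(stored, key=logical_key) == stored)  # True
```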
2. Unicode not making a distinction between X Y Z and X Z Y.

Agreed. Has any text been drafted?
Where canonically equivalent X Y Z and X Z Y should be rendered differently, CGJ could be used to preserve the distinction, as per the UTC decision:
[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining grapheme joiner has the effect of preventing the canonical re-ordering of combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
[96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20 (on
the effect of combining grapheme joiner to prevent canonical re-ordering of
combining marks during normalization) for inclusion into Unicode 4.0.1 and
create a FAQ describing this effect as well. [L2/03-235, L2/03-236, L2/03-234]
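The effect described in consensus 96-C20 can be checked with Python's `unicodedata` module. The Hebrew strings are illustrative only (bet with dagesh and patah in both orders); whether this particular pair ought to be distinguished is not the point. Without CGJ the two orders are identical under every normalization form; with a CGJ between the marks, each string is left unchanged by all four forms and the two remain distinct.

```python
import unicodedata

CGJ = "\u034F"
BET, DAGESH, PATAH = "\u05D1", "\u05BC", "\u05B7"
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def same_under_all_forms(a, b):
    """True when a and b normalize to the same string in every form."""
    return all(unicodedata.normalize(f, a) == unicodedata.normalize(f, b)
               for f in FORMS)

plain_xyz, plain_xzy = BET + DAGESH + PATAH, BET + PATAH + DAGESH
cgj_xyz, cgj_xzy = BET + DAGESH + CGJ + PATAH, BET + PATAH + CGJ + DAGESH

print(same_under_all_forms(plain_xyz, plain_xzy))  # True: canonically equivalent
print(same_under_all_forms(cgj_xyz, cgj_xzy))      # False: CGJ blocks reordering
```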
-- Peter Kirk http://www.qaya.org/