> Thanks a lot for these clarifications on the Hebrew usages that need those combining order overrides. This demonstrates that the need arises relatively infrequently, so introducing an ignorable "combining order override" control makes sense, without needing to add duplicate codepoints with corrected properties.
> What is important here is whether the lack of this override or separate codepoint makes the text ambiguous. From your comments, I see that the Hebrew logical order may not always need to be respected in the encoded string, provided that the character identity (for example the sin letter) is preserved, according to users' expectations (notably if a combined character is mapped on the common keyboard).
> I would then say that the Hebrew language would need to represent grapheme clusters as:
> - a logical combining sequence for the initial consonant and its modifier (like shin dot)
> - then the logical combining sequences for each extra vowel sign with their accentuation.
> The problem here is that consonant modifiers, vowels and accents in Hebrew are all encoded as combining characters, but each subgroup belongs to combining classes whose value ranges overlap. With the current model, only one combining sequence can be encoded, without sub-hierarchy. If only the Hebrew vowels had been encoded as separate base characters instead of combining characters, we would not have this problem, as they would initiate their own combining sequence.
> That's where a CCO (combining class override) control character (CGJ or other) can help: it can be used to supply the missing separate base character for vowels, notably for the second vowel group, but also for the consonant modifier (shin dot) if it is followed by a vowel group.
> We won't change the combining classes. And we won't reform the normalization rules as defined for NF* conformance. But we can add further normalization steps for Hebrew that describe the correct use of the combining order overrides and that correctly reorder all the combining characters after the initial consonant, to generate the correct logical order. And we can make font renderers accept this new encoding, by letting them recognize the CCO.
Thank you for the interesting thoughts. As I understand your suggestion, and bearing in mind that dagesh (and the rare rafe) are also consonant modifiers, you are effectively suggesting an order (already normalised):
consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ vowel2 accent2
with each element being optional, and with CGJ omitted wherever it would fall at the beginning or the end of the string of combining marks, or would be doubled.
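To make the order above concrete, here is a minimal Python sketch, assuming only the standard unicodedata module. The helper name join_groups and the sample syllable (shin with dagesh, shin dot, qamats and etnahta) are hypothetical choices for illustration, not part of any proposal text. It shows canonical reordering pulling the vowel in front of the consonant modifiers when no CGJ is present, and the CGJ-separated sequence surviving normalisation unchanged.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0

SHIN = "\u05e9"
DAGESH = "\u05bc"    # combining class 21
SHIN_DOT = "\u05c1"  # combining class 24
QAMATS = "\u05b8"    # combining class 18 (a vowel)
ETNAHTA = "\u0591"   # combining class 220 (an accent)


def join_groups(consonant, *groups):
    """Hypothetical helper: append the non-empty groups of marks to the
    consonant, with exactly one CGJ between adjacent groups.

    Joining only the non-empty groups means CGJ is never leading,
    trailing, or doubled in the string of combining marks.
    """
    present = [g for g in groups if g]
    return consonant + CGJ.join(present)


def codepoints(text):
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Without CGJ, canonical reordering sorts all marks by combining class,
# so the vowel (18) moves ahead of dagesh (21) and shin dot (24).
plain = SHIN + DAGESH + SHIN_DOT + QAMATS + ETNAHTA
nfd_plain = unicodedata.normalize("NFD", plain)

# With CGJ between the groups: modifiers, (no meteg), vowel + accent, (no vowel2).
marked = join_groups(SHIN, DAGESH + SHIN_DOT, "", QAMATS + ETNAHTA, "")
nfd_marked = unicodedata.normalize("NFD", marked)

print("plain  :", codepoints(plain))
print("NFD    :", codepoints(nfd_plain))
print("marked :", codepoints(marked))
print("stable :", nfd_marked == marked)
```

Because CGJ has combining class 0, it blocks reordering across the group boundary, which is why each group is sorted only within itself.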
This would, I think, work, and at least come close to being rendered correctly with current fonts modified to ignore CGJ (which they should actually do anyway, as CGJ is default ignorable). The downside is the large number of CGJs required. Dagesh occurs 171701 times in the Hebrew Bible (eBHS), shin dot 46277 times, and sin dot 12128 times. As this proposal would require CGJ to be added after any group of one or more of these together, followed by a vowel (nearly always present) or an accent, the effect of this proposal is that CGJ would have to be used nearly 200,000 times in the Hebrew Bible, instead of just over 1000 times. This is not in itself a reason to reject the idea, but it does undermine your initial argument in favour of CGJ.
I am not sure what you mean by "further normalization steps for Hebrew". If this means that users will be expected to input Hebrew in this order, perhaps with a keyboard driver which inserts the necessary CGJs, this is good. But I don't think it is reasonable to expect software producers to add an extra layer to their software specifically for Hebrew, especially when now they are refusing to add such a layer with more general applicability when specifically required to do so in the standard.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/