On 27/10/2003 06:54, Philippe Verdy wrote:

Thanks a lot for these details on the Hebrew usages that need those
combining order overrides.
This demonstrates that this occurs relatively infrequently, and so
introducing an ignorable "combining order override" control makes sense,
without needing to add duplicate codepoints with corrected properties.

What is important here is whether the lack of this override or separate
codepoint makes the text ambiguous. From your comments, I see that the
Hebrew logical order may not always need to be respected in the encoded
string, provided that the character identity (for example the sin letter) is
preserved, according to users' expectations (notably if a combined character
is mapped on the common keyboard).

I would then say that Hebrew should represent grapheme
clusters as:
- a logical combining sequence for the initial consonant and its modifier
(like shin dot)
- then the logical combining sequences for each extra vowel sign with its
accentuation.

The problem here is that consonant modifiers, vowels and accents in Hebrew
are all encoded as combining characters, but each subgroup belongs to
combining classes whose value ranges are overlapping. With the current
model, only 1 combining sequence can be encoded, without sub-hierarchy. If
only the Hebrew vowels had been encoded as separate base characters instead
of combining characters, we would not have this problem, as they would
initiate their own combining sequence.
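
As a small illustration of why the encoded order and the logical order can
diverge, the sketch below uses Python's standard unicodedata module to look up
the canonical combining classes of three Hebrew marks and to normalize a shin
typed as consonant, shin dot, vowel. The choice of hiriq as the vowel and the
variable names are assumptions made only for this example.

```python
import unicodedata

# Code points chosen for illustration: shin, shin dot, dagesh, hiriq.
SHIN, SHIN_DOT, DAGESH, HIRIQ = "\u05E9", "\u05C1", "\u05BC", "\u05B4"

# Canonical combining class of each mark, keyed by its Unicode name.
classes = {unicodedata.name(ch): unicodedata.combining(ch)
           for ch in (HIRIQ, DAGESH, SHIN_DOT)}

# Logical order as typed: consonant, consonant modifier, vowel.
typed = SHIN + SHIN_DOT + HIRIQ

# Canonical ordering sorts adjacent marks by combining class, so the vowel
# (lower class) is moved in front of the shin dot (higher class).
normalized = unicodedata.normalize("NFD", typed)

for name, value in classes.items():
    print(f"{name}: ccc={value}")
print([f"U+{ord(c):04X}" for c in normalized])
```

The consonant modifier ends up after the vowel in the normalized string, which
is the reordering that an override control would be meant to prevent.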

That's where a CCO (combining class override) control character (CGJ or
other) can help: it can be used to force a missing and separate base
character for vowels, notably for the second vowel group, but also for the
consonant modifier (shin dot) if it is followed by a vowel group.
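
A minimal sketch of that blocking effect, assuming CGJ (U+034F) is the control
used: because CGJ has combining class 0, canonical reordering does not move
marks across it. The example again uses Python's unicodedata module, with
hiriq as an arbitrary vowel.

```python
import unicodedata

SHIN, SHIN_DOT, HIRIQ = "\u05E9", "\u05C1", "\u05B4"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

without_cgj = SHIN + SHIN_DOT + HIRIQ
with_cgj = SHIN + SHIN_DOT + CGJ + HIRIQ

# Without CGJ the vowel is reordered in front of the shin dot.
reordered = unicodedata.normalize("NFD", without_cgj)

# With CGJ between them, the two marks are no longer adjacent, so the
# typed order survives normalization unchanged.
preserved = unicodedata.normalize("NFD", with_cgj)

print(reordered == without_cgj)  # order changed by normalization
print(preserved == with_cgj)     # order kept by normalization
```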

We won't change the combining classes. And we won't reform the normalization
rules as defined for NF* conformance. But we can add further normalization
steps for Hebrew, describing the correct use of the combining order
overrides, and that correctly reorders all the combining characters after
the initial consonant, to generate the correct logical order. And we can
make font renderers accept this new encoding, by letting them recognize the
CCO.




Thank you for the interesting thoughts. As I understand your suggestion, and bearing in mind that dagesh (and the rare rafe) are also consonant modifiers, you are effectively suggesting an order (already normalised):

consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ vowel2 accent2

with each element being optional, and CGJ being omitted when it is at the beginning or the end of the string of combining marks, or doubled.
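
To make the ordering concrete, here is a rough sketch of how such a sequence might be assembled. The function name build_cluster and its parameters are hypothetical, invented for this illustration; it simply joins the non-empty groups with CGJ, which has the effect of omitting CGJ at the start or end of the marks and never doubling it.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def build_cluster(consonant, modifiers="", right_meteg="",
                  vowel1="", vowel2=""):
    """Hypothetical helper: assemble one cluster in the proposed order.

    modifiers   -- dagesh, rafe, shin/sin dot, in that order
    right_meteg -- a meteg placed to the right of the vowel
    vowel1      -- first vowel followed by its accent, if any
    vowel2      -- second vowel followed by its accent, if any
    """
    groups = [modifiers, right_meteg, vowel1, vowel2]
    # Joining only the non-empty groups omits leading, trailing and
    # doubled CGJs automatically.
    return consonant + CGJ.join(g for g in groups if g)

SHIN, DAGESH, SHIN_DOT, HIRIQ = "\u05E9", "\u05BC", "\u05C1", "\u05B4"

cluster = build_cluster(SHIN, modifiers=DAGESH + SHIN_DOT, vowel1=HIRIQ)
bare = build_cluster(SHIN, vowel1=HIRIQ)

# The assembled sequence should already be in normalized order.
stable = unicodedata.normalize("NFD", cluster) == cluster
print([f"U+{ord(c):04X}" for c in cluster], stable)
```

In this sketch a consonant carrying only a vowel gets no CGJ at all, while a consonant with a dagesh or shin dot followed by a vowel gets exactly one.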

This would, I think, work, and at least come close to being rendered correctly with current fonts modified to ignore CGJ (which actually they should do anyway as CGJ is default ignorable). The downside is the large number of CGJs required. Dagesh occurs 171701 times in the Hebrew Bible (eBHS), shin dot 46277 times, and sin dot 12128 times. As this proposal would require CGJ to be added after any group of one or more of these together, followed by a vowel (nearly always present) or an accent, the effect of this proposal is that CGJ would have to be used nearly 200,000 times in the Hebrew Bible, instead of just over 1000 times. This is not in itself a reason to reject the idea, but it does undermine your initial argument in favour of CGJ.

I am not sure what you mean by "further normalization steps for Hebrew". If this means that users will be expected to input Hebrew in this order, perhaps with a keyboard driver which inserts the necessary CGJs, this is good. But I don't think it is reasonable to expect software producers to add an extra layer to their software specifically for Hebrew, especially when now they are refusing to add such a layer with more general applicability when specifically required to do so in the standard.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




