On 26/10/2003 19:58, John Hudson wrote:

...
Functionally, inserting a CGJ here resolves the problem fine. I'm just not convinced that CGJ is a good general solution to the normalisation problem: it works, but it requires deliberate insertion in every place where unwanted mark re-ordering may occur. If I have some free time over the next while, I'll try to figure out just how many places in the Bible text this would be needed: I suspect it is quite a lot. Of course, if you automatically insert CGJ after every mark, you can be sure that re-ordering will not take place, but you also lose any benefit of normalisation.


John Hudson

CGJ is likely to be needed:

1) whenever two vowels come together in non-canonical order: approximately 638 times in the WTS eBHS text of the Hebrew Bible (over 5 MB of UTF-8), with little variation in other texts - all but two of these cases are in Yerushala(y)im;

2) according to my proposal, for every occurrence of right meteg: approximately 905 times in eBHS but with a potentially large variation between texts;

3) possibly also for every occurrence of medial meteg: approximately 78 times in eBHS.
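Counts like those in case 1 can be approximated mechanically. The following is only a rough sketch, not the method used to produce the figures above: it assumes Python's standard unicodedata module, and the function name reorder_points is made up for illustration. It flags every place where a combining mark is directly followed by one of a lower non-zero combining class, which is where canonical reordering would swap the two. It cannot find right or medial meteg, since those are a matter of intended position, not of class order.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def reorder_points(text):
    """Return each index i where text[i] and text[i + 1] are combining marks
    that canonical reordering would swap (higher class directly before lower).
    Hypothetical helper, named for this illustration only."""
    hits = []
    for i in range(len(text) - 1):
        a = unicodedata.combining(text[i])
        b = unicodedata.combining(text[i + 1])
        if a > b > 0:
            hits.append(i)
    return hits

# lamed, patah (class 17), hiriq (class 14): the Yerushala(y)im pattern
sample = "\u05DC\u05B7\u05B4"
sample_with_cgj = "\u05DC\u05B7" + CGJ + "\u05B4"

print(reorder_points(sample))           # [1]
print(reorder_points(sample_with_cgj))  # []
```

Running len(reorder_points(text)) over a whole text would give an upper estimate of the places where a CGJ, or some other fix, is needed to protect the written order.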

Philippe made a good point that the ordering of combining characters relative to CGJ needs to be constrained as a spelling convention, because normalisation cannot constrain it. But the ordering chosen should follow the logic of the language.

In the case of Yerushalayim, the second vowel is somehow auxiliary and relates to an omitted consonant, whereas the first vowel and the accent (often but not always present) go with the lamed which is written. So in this case the appropriate order is <base character, vowel1, accent, CGJ, vowel2>. In the odd case of two vowels and two accents on one base character in Exodus 20:4 (see http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section 3.2), the most logical order is actually <base character, vowel1, accent1, CGJ, vowel2, accent2>, because the second accent (geresh) goes with the second vowel (patah).
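To illustrate the first ordering, here is a small sketch assuming Python's unicodedata module. The accent is an arbitrary stand-in (etnahta, class 220) chosen only for the demonstration, not taken from any particular verse. Without CGJ, normalisation moves the hiriq in front of the patah and the accent; with CGJ in the proposed position, the sequence is left alone.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"  # classes 0, 17, 14
ETNAHTA = "\u0591"  # stand-in accent for illustration, class 220
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER, class 0

def codepoints(s):
    """Render a string as a list of U+XXXX labels for easy comparison."""
    return [f"U+{ord(c):04X}" for c in s]

# <base, vowel1, accent, vowel2> without and with CGJ before vowel2
plain = LAMED + PATAH + ETNAHTA + HIRIQ
joined = LAMED + PATAH + ETNAHTA + CGJ + HIRIQ

plain_nfd = unicodedata.normalize("NFD", plain)
joined_nfd = unicodedata.normalize("NFD", joined)

print(codepoints(plain_nfd))   # hiriq now comes first among the marks
print(plain_nfd == plain)      # False: the written order was lost
print(joined_nfd == joined)    # True: CGJ blocked the reordering
```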

The situation is rather different for right meteg, if CGJ is used for this, as it is always written to the right of all other combining marks and the other marks are in their regular positions. So the most logical ordering would be <base character, meteg, CGJ, vowel, accent>.
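The same kind of check can be sketched for right meteg, again assuming Python's unicodedata module, with bet, patah and etnahta as arbitrary illustrative characters. Meteg has combining class 22, higher than any vowel point, so a meteg typed before the vowel without CGJ normalises to exactly the same sequence as an ordinary meteg and the distinction disappears. With CGJ after the meteg, the proposed order survives and stays distinct.

```python
import unicodedata

BET, METEG, PATAH = "\u05D1", "\u05BD", "\u05B7"  # classes 0, 22, 17
ETNAHTA = "\u0591"  # stand-in accent for illustration, class 220
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER, class 0

def nfd(s):
    """Shorthand for canonical decomposition with canonical reordering."""
    return unicodedata.normalize("NFD", s)

ordinary = BET + PATAH + METEG + ETNAHTA           # meteg in its usual place
right_no_cgj = BET + METEG + PATAH + ETNAHTA       # meteg first, unprotected
right_cgj = BET + METEG + CGJ + PATAH + ETNAHTA    # <base, meteg, CGJ, vowel, accent>

print(nfd(right_no_cgj) == nfd(ordinary))  # True: the two spellings collapse
print(nfd(right_cgj) == right_cgj)         # True: proposed order is stable
print(nfd(right_cgj) == nfd(ordinary))     # False: still distinguishable
```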

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




