On 24/08/2003 10:56, John Hudson wrote:

... However, this does raise the question of what happens to the ZWNJ in reordering

<bet, dagesh, holam, ZWNJ, alef>

If the holam ends up reordered before the dagesh, where does the ZWNJ end up? If it remains immediately in front of the alef, that's fine.

ZWNJ is not a combining character and so is unaffected by canonical reordering. Combining characters can never move from before it to after it or vice versa. Although CGJ is a combining character, it has the same effect on ordering as ZWNJ, because its combining class is zero.
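As a rough check of the point about reordering, here is a small Python sketch using the standard unicodedata module. The variable names are mine, and the result depends on the Unicode data shipped with the Python version in use, so treat it as an illustration rather than a definitive test.

```python
import unicodedata

BET, ALEF = "\u05D1", "\u05D0"
DAGESH, HOLAM = "\u05BC", "\u05B9"   # canonical combining classes 21 and 19
ZWNJ, CGJ = "\u200C", "\u034F"       # both have canonical combining class 0

# <bet, dagesh, holam, ZWNJ, alef>, the sequence asked about above
seq = BET + DAGESH + HOLAM + ZWNJ + ALEF
nfd = unicodedata.normalize("NFD", seq)

# Holam (19) sorts before dagesh (21); ZWNJ stays immediately before alef.
print([f"U+{ord(ch):04X}" for ch in nfd])

# A class-zero character between two marks blocks their reordering.
blocked = BET + DAGESH + CGJ + HOLAM
blocked_nfd = unicodedata.normalize("NFD", blocked)
print(blocked_nfd == blocked)
```

The second sequence shows why CGJ behaves like ZWNJ for ordering purposes: dagesh and holam would normally swap, but they cannot move past the class-zero character between them.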




...

In the absence of a CGNJ, and since CGJ does not have defined joining properties despite its misleading name, I have suggested using CGJ for this.


Since actual glyph ligation is occurring, the ZWNJ should be used to inhibit ligation. This is consistent with the Unicode 4.0 description of ZWJ and ZWNJ behaviour. ...

But this is where the problem comes. Because ZWJ and ZWNJ are not combining characters, they (theoretically, though not necessarily in your implementation) break the combining character sequence, and so break the link between the base character and the combining characters which follow them. In fact the following combining characters become a defective combining sequence whose rendering is undefined. I think MS Word currently inserts a dotted circle in this case, and this is conformant behaviour in the case of a defective combining sequence.


Is this correct, anyone, or am I overstating my case? Actually ZWJ is theoretically less of a problem because it does specify a ligature between the preceding and following combining character sequences. But ZWNJ specifies that they should be rendered separately.

... A question remains, however: should medial meteg with hataf be the default rendering of <hataf..., meteg>, or should such ligation require <hataf..., ZWJ, meteg>? This is a rendering issue, but one which affects encoding: if one set of fonts treats ligation as default and another set doesn't, users will produce documents with conflicting encoding conventions depending on the rendering of the fonts they are using (one can even imagine a single document, set in multiple fonts, using different character sequences to obtain the same rendering). Personally, I favour having the medial meteg as default rendering for <hataf..., meteg>, requiring <hataf..., ZWNJ, meteg> in order to obtain a left meteg, because the medial meteg appears to be the most common positioning in the manuscript tradition.

If we do use ZWJ/ZWNJ, and based on the principle in the standard (TUS 4.0 pp. 389-390) "These characters are not to be used in all cases where ligatures or cursive connections are desired; instead, they are only for overriding the normal behavior of the text", I would suggest that <hataf, meteg> should be rendered according to the font default which may vary (medial for a font based on BHS, left meteg for a font based on an edition in which this is the default); <hataf, ZWJ, meteg> should be used to prefer medial despite the default (not sure if this is ever required); and <hataf, ZWNJ, meteg> to inhibit medial when this must not be used (as in a few cases in BHS).




...

... Thus my suggestion (= indicates canonical equivalence):

left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
right meteg: <meteg, CGJ, vowel>
medial meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
left meteg (hataf vowel): <vowel, CGJ, meteg>


I basically agree, with the following modification:

left meteg (hataf vowel): <vowel, ZWNJ, meteg>

See the reasons above for not using ZWNJ here.




Does this mean that we are agreed that the medial meteg rendering should be normative?

I am not intending to say that. I want to say that it can be the default for a particular font or perhaps a font level attribute. Other fonts might have left meteg as the default with hatafs and no medial meteg glyphs; in that case the CGJ or ZWNJ would be ignored. Or they might have left meteg as the default but also have medial meteg glyphs, in which case a different mechanism would be required to request use of the medial meteg, perhaps with ZWJ.


So here is a more nuanced version of my suggestion:

left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
right meteg: <meteg, CGJ, vowel>
font's default position of meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
medial meteg (hataf vowel) (if supported by the font): TBD (<vowel, ZWJ, meteg> ???)
left meteg (hataf vowel): <vowel, CGJ, meteg>
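The "=" relations in the list above can be checked mechanically. The following is a minimal Python sketch, assuming hataf patah on a lamed as the example; the helper name canonically_equivalent is mine and not part of any library.

```python
import unicodedata

LAMED = "\u05DC"
HATAF_PATAH = "\u05B2"  # canonical combining class 12
METEG = "\u05BD"        # canonical combining class 22
CGJ = "\u034F"          # canonical combining class 0

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

vowel_meteg = LAMED + HATAF_PATAH + METEG
meteg_vowel = LAMED + METEG + HATAF_PATAH
meteg_cgj_vowel = LAMED + METEG + CGJ + HATAF_PATAH
vowel_cgj_meteg = LAMED + HATAF_PATAH + CGJ + METEG

print(canonically_equivalent(vowel_meteg, meteg_vowel))      # <vowel, meteg> = <meteg, vowel>
print(canonically_equivalent(meteg_cgj_vowel, vowel_meteg))  # CGJ keeps right meteg distinct
print(canonically_equivalent(vowel_cgj_meteg, vowel_meteg))  # CGJ keeps left meteg distinct
```

This only shows that the proposed sequences stay distinct under normalization. How each one is rendered is still up to the font, as discussed above.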


...

2.10 Extraordinary Points

The SII encoded only the upper extraordinary point, as 05C4 HEBREW MARK UPPER DOT. A character for the lower dot could be added, although it appears only a few times.

Agreed. Although this latter character is rare, it is in regular and undisputed use in a widely used text, and so probably does need to be encoded.


I am content either to have the lower punctum encoded or to use a generic combining mark (U+0323), although the latter raises issues for multiscript fonts in applications that do not support writing system-specific glyph substitution (currently all applications). ...

Presumably a font could be programmed to substitute a glyph based on context, especially for a combining mark where it would be relatively simple to determine that the base character is in the Hebrew block and so the Hebrew glyph variant is required. No help of course if you want an isolated diacritic or a Qere without Ketiv form.
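To make the contextual idea concrete, here is a hypothetical Python sketch of the selection logic only. Real fonts would express this in their own layout tables, not in Python, and the names base_of and dot_below_variant and the variant labels "hebrew" and "default" are invented for illustration.

```python
import unicodedata

HEBREW_BLOCK = range(0x0590, 0x0600)  # U+0590..U+05FF
DOT_BELOW = "\u0323"                  # COMBINING DOT BELOW

def base_of(text, index):
    """Return the nearest preceding non-mark character, or None if there is none."""
    for ch in reversed(text[:index]):
        if not unicodedata.category(ch).startswith("M"):
            return ch
    return None

def dot_below_variant(text, index):
    """Pick a (hypothetical) glyph variant for the mark at text[index]."""
    base = base_of(text, index)
    if base is not None and ord(base) in HEBREW_BLOCK:
        return "hebrew"
    return "default"

print(dot_below_variant("\u05D0" + DOT_BELOW, 1))            # alef + dot below
print(dot_below_variant("\u05D0\u05B7" + DOT_BELOW, 2))      # alef + patah + dot below
print(dot_below_variant("a" + DOT_BELOW, 1))                 # Latin base
print(dot_below_variant(DOT_BELOW, 0))                       # isolated mark, no base
```

The last call illustrates the limitation already mentioned: with no base character there is no context to decide from, so the sketch falls back to the default form.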


... What I am most keen to have is a clear statement from the UTC identifying 05C4 HEBREW MARK UPPER DOT as the upper punctum, as Jony indicates was intended by SII, and specifying a codepoint for the Hebrew number / masoretic note dot, which requires its own glyph and cannot be harmonised with the upper punctum character. Again, this could mean a new Hebrew block character or U+0307 could be used.

Note that until Jony's note on SII's intent, I had presumed U+05C4 to be the number / masoretic note dot, because of the absence of a corresponding lower mark to indicate that it was the upper punctum. Now I would like a definitive ruling from the UTC, to avoid future confusion.

Agreed. Notes should be added to the code charts for U+05C4, e.g. "= upper punctum extraordinarium", and for U+0307 e.g. "= Hebrew number dot", each with pointers to the other.



-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/




