----- Original Message -----
From: "Peter Constable" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 09, 2004 6:50 PM
Subject: [hebrew] Re: Draft proposal for Unicode encoding of holam male

> > >this does not make it a vowel.
> >
> > Only in the sense that...
>
> Bringing the discussion back on topic... Let me try to support Jony's
> position contra John for a moment. To avoid terminology like consonant
> and vowel, let me simply refer to "base" characters, meaning everything
> but the points -- the stuff that would be included in an unpointed text.
>
> There is a need to produce unpointed Biblical text, in which case the
> vav will appear in a single form, regardless of whether it corresponds
> in pointed text to C (vav) + V (holam) or to holam male. But the same
> base characters should be used in unpointed and pointed data. This has
> implications relating to the two alternative solutions:
>
> - (in PK's proposed solution) the vav + holam and the holam male text
> elements are both represented by a single character, VAV, or
>
> - (in John's alternate solution) a distinct character, holam male, must
> have an unpointed glyph variant
>
> The latter would be awkward for implementers and users. Therefore, the
> former is preferable.

Why not instead encode a VAV variant, to be used when VAV is not a real consonant but a special base that alters the meaning of the following vowel, i.e. <VAV, VS1, HOLAM>? This has the additional benefit that it can still be rendered (not strictly correctly) as <VAV, HOLAM>, i.e. as vav haluma with the central/right holam dot, if the special form is not supported. However, I wonder what impact it would have on collation, as variation selectors are normally ignorable... But it allows a font to treat <VAV, VS1> as a separate glyph ID with its own ligature and positioning pair for the following HOLAM. And it keeps the structure of Hebrew as a base consonant followed by optional vowel points.

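To make the suggestion concrete, here is a minimal Python sketch of the code point sequences involved. It assumes the <VAV, VS1> variant suggested above, which is only a proposal in this message and not a registered variation sequence; the helper names fallback and unpointed are made up for illustration.

```python
import unicodedata

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
VS1 = "\uFE00"    # VARIATION SELECTOR-1

# Hypothetical: <VAV, VS1> is the variant suggested in this message,
# not a variation sequence registered in Unicode.
HOLAM_MALE = VAV + VS1 + HOLAM
VAV_HALUMA = VAV + HOLAM


def is_variation_selector(ch):
    """True for VARIATION SELECTOR-1 through VARIATION SELECTOR-16."""
    return "\uFE00" <= ch <= "\uFE0F"


def fallback(text):
    """Drop variation selectors, as a renderer lacking the special form might."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def unpointed(text):
    """Keep only base characters: drop nonspacing marks and variation selectors."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and not is_variation_selector(ch)
    )
```

With these definitions, fallback(HOLAM_MALE) gives the plain <VAV, HOLAM> sequence, and unpointed() reduces both sequences to a bare VAV, which is the property the quoted message asks for: the same base character in pointed and unpointed data.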
By contrast, a separate <HOLAM MALE> vowel codepoint would preferably encode both the VAV glyph and the left holam dot already (so <HOLAM MALE, HOLAM> would probably be rendered as the HOLAM MALE base letter with a HOLAM point on the right, i.e. with two dots above the VAV glyph). It would need a new collation rule, and would require reencoding most texts that assume the opposite convention, where <VAV, HOLAM> was used to encode holam male and not the newer vav haluma.

Other proposals based on ZWJ and ZWNJ will just complicate things. In fact, as holam male is the most common case and vav haluma is rare, the legacy sequence <VAV, HOLAM> should preferably encode HOLAM MALE (to avoid reencoding too many texts).

But I'd like to see what can be done quite simply for Biblical texts (which are well known, stable and easily reencodable), and what is used in modern Hebrew with <VAV, HOLAM> (these resources are unlimited and unknown, and it is nearly impossible to guarantee that they can be reencoded easily). For example, I am thinking of modern personal names, toponyms and trademarks, which should not need any reencoding, as well as most modern publications, where this work could never be completed (notably newspaper texts and many cheap books and publications).

If, in modern pointed Hebrew, there is no real distinction between the glyphs shown for holam male and vav haluma, reencoding will not be necessary to render the text correctly, but it may affect some areas like collation. I suppose that a modern Hebrew collation would then probably collate a new <HOLAM MALE> codepoint as <VAV, HOLAM> for vav haluma; this is quite simple to do with a two-level collation key.
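
The two-level key mentioned above could look roughly like the following Python sketch. It is an illustration only: the HOLAM_MALE value is a private-use placeholder standing in for the hypothetical new codepoint, the function name collation_key is made up, and raw code point values are used as primary weights instead of a real collation table.

```python
VAV = "\u05D5"         # HEBREW LETTER VAV
HOLAM = "\u05B9"       # HEBREW POINT HOLAM
HOLAM_MALE = "\uE000"  # hypothetical placeholder (private-use code point)


def collation_key(text):
    """Build a (primary, secondary) key for comparing pointed Hebrew strings.

    Level 1 expands the hypothetical HOLAM MALE to <VAV, HOLAM>, so both
    spellings get equal primary weights. Level 2 records where a HOLAM MALE
    occurred, so the two spellings still differ as a tie-breaker.
    """
    primary = []
    secondary = []
    for ch in text:
        if ch == HOLAM_MALE:
            primary.extend([ord(VAV), ord(HOLAM)])
            secondary.extend([1, 0])
        else:
            primary.append(ord(ch))
            secondary.append(0)
    return (tuple(primary), tuple(secondary))
```

Sorting with key=collation_key then places a word spelled with <VAV, HOLAM> and the same word spelled with the placeholder next to each other, with the <VAV, HOLAM> spelling first; whether that tie-break order is the desirable one is a separate question.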

