On 28/07/2003 23:37, Jony Rosenne wrote:

> We had a discussion in the SII and the consensus was that we should object
> to:
>
> - any change or addition related to Hebrew that would invalidate existing
> Unicode data or require its modification or re-examination


I can agree that any change should not invalidate existing valid data. But that shouldn't imply that we must validate existing invalid data. There is a lot of existing data which, although encoded in Unicode characters, is invalid or mis-spelled in one sense or another, deliberately so in order to kludge a reasonably good visual representation from bad old software. For example, at www.mechon-mamre.org ZWJ is inserted after vav and before holam when the vav is a consonant because with certain software and font combinations that has the required effect of shifting the holam to the left. We can't simply declare in some kind of amnesty that every existing text is validly encoded.
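A minimal sketch of how such kludged sequences could be located, assuming the kludge is exactly the code point sequence vav, ZWJ, holam as described above. The function name and the sample word are hypothetical illustrations and are not taken from the Mechon Mamre text.

```python
import re

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ZWJ = "\u200D"    # ZERO WIDTH JOINER

# Assumption: the kludge is precisely vav + ZWJ + holam, nothing in between.
KLUDGE = re.compile(VAV + ZWJ + HOLAM)

def find_zwj_kludges(text):
    """Return the index of each vav that is followed by ZWJ and then holam."""
    return [m.start() for m in KLUDGE.finditer(text)]

# Hypothetical sample: mem, hiriq, tsade, sheva, vav, ZWJ, holam, tav.
sample = "\u05DE\u05B4\u05E6\u05B0\u05D5\u200D\u05B9\u05EA"
hits = find_zwj_kludges(sample)
```

A scan like this only finds the sequences; deciding whether each one should be kept, rewritten or flagged is still a matter for the guidelines.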

> - any change or addition to Unicode that would make the use of Hebrew more
> complicated or confuse the common user


Absolutely. But nothing confuses the common user more than not knowing how he or she is supposed to encode a particular text. What is needed is not so much changes to Unicode as clear guidelines for the common user.

> - any change or addition to Unicode that would require a user of Hebrew to
> have a higher level of knowledge, e.g. to distinguish between items not
> commonly distinguished, for example the two meanings of Vav with Holam.


Are we confining "user of Hebrew" to people who know how to speak the language? If so, these people already know how to distinguish the two meanings of vav with holam, because they pronounce them quite differently. Some users of biblical Hebrew may not know the pronunciation, but I don't think these are the people you have in mind.

On the other hand, if you are determined that these two graphically and semantically distinct entities should be encoded identically, then at least those of us who want or need to make a graphical or semantic distinction are not left entirely without a way ahead. For it does seem to be possible to determine algorithmically which vav with holam is which, though not entirely without ambiguity in some theoretical cases. The only ambiguity would be in cases where the word before the vav with holam consists only of a string of vavs with dagesh, of which the first may be a vowel (shuruq) or a consonant.

> - the suggestion to encode Biblical Hebrew separately is unacceptable.

I am glad to hear this clearly stated. I agree.

> The requirements of professional and knowledgeable users, such as Biblical
> scholars, should not be allowed to impose upon everyday users who are not
> blessed with such a profound knowledge and understanding.


Indeed. But also support for the special requirements of scholars should not be restricted just because it goes beyond the requirements of everyday users.

> Consequently, it was suggested that the several issues with Biblical Hebrew
> recently mentioned, and several more which were not, should be solved by
> means of markup, outside the scope of Unicode. This is how they have been
> addressed in many of the references given. This is our recommendation.


What references are you referring to? Haralambous? I accept that markup may be suitable for the rare cases of enlarged, reduced, raised and broken letters which he mentions, as these are semantically the base letter plus some essentially extra-textual information. But markup is not appropriate for distinguishing between commonly occurring letters which are distinct semantically and phonetically, as well as very often graphically, like the different forms of vav with holam. Or is markup being suggested as a solution to the Yerushala(y)im issue? If so, I fail to see how it addresses the problem, as markup does not inhibit normalisation.
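The normalisation problem can be seen in a short sketch, assuming Python's standard unicodedata module and the usual spelling of the end of Yerushala(y)im as lamed, patah, hiriq, final mem. The variable names are only illustrative.

```python
import unicodedata

LAMED = "\u05DC"      # HEBREW LETTER LAMED
PATAH = "\u05B7"      # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, canonical combining class 14
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM

# Order as written in the text: patah first, then hiriq.
written = LAMED + PATAH + HIRIQ + FINAL_MEM

# Canonical reordering sorts adjacent marks by combining class,
# so the hiriq (14) is moved in front of the patah (17).
normalized = unicodedata.normalize("NFC", written)

classes = (unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))
order_preserved = (normalized == written)
```

Whatever markup surrounds the word, the two points are still adjacent combining marks on the same base, so a normalising process will swap them.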

> Failing that, it was suggested that an existing Unicode character, such as
> ZERO WIDTH NO-BREAK SPACE, be used for "invisible" Hebrew letters, in cases
> such as Yerushala(y)im.


As there are many objections to ZWNBSP, would CGJ be an acceptable alternative? But I do see why you might prefer to use a zero-width base character here rather than a combining character, although a base character would not be appropriate for mittaxat in Exodus 20:4 or for right meteg.
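A minimal sketch of why either character would serve, assuming Python's standard unicodedata module: both have canonical combining class 0, so placing one between the patah and the hiriq stops canonical reordering. The helper name survives_nfc is hypothetical.

```python
import unicodedata

LAMED, PATAH, HIRIQ, FINAL_MEM = "\u05DC", "\u05B7", "\u05B4", "\u05DD"
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE

def survives_nfc(sequence):
    """True if NFC normalisation leaves the code point sequence unchanged."""
    return unicodedata.normalize("NFC", sequence) == sequence

bare = LAMED + PATAH + HIRIQ + FINAL_MEM
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM
with_zwnbsp = LAMED + PATAH + ZWNBSP + HIRIQ + FINAL_MEM

# Only the sequences with a class-0 character between the points keep their order.
results = (survives_nfc(bare), survives_nfc(with_cgj), survives_nfc(with_zwnbsp))
```

This shows only that the order of the points is protected; it says nothing about how a given font or renderer will display either sequence.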

> The third, and least favored, option is to add a special Unicode character
> to represent missing base characters such as the Yod in Yerushala(y)im.
>
> Jony







--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




