> We had a discussion in the SII and the consensus was that we should object to:
> - any change or addition related to Hebrew that would invalidate existing
> Unicode data or require its modification or re-examination

I can agree that any change should not invalidate existing valid data. But that shouldn't imply that we must validate existing invalid data. There is a lot of existing data which, although encoded in Unicode characters, is invalid or mis-spelled in one sense or another, deliberately so in order to kludge a reasonably good visual representation from bad old software. For example, at www.mechon-mamre.org ZWJ is inserted after vav and before holam when the vav is a consonant because with certain software and font combinations that has the required effect of shifting the holam to the left. We can't simply declare in some kind of amnesty that every existing text is validly encoded.

> - any change or addition to Unicode that would make the use of Hebrew more
> complicated or confuse the common user

Absolutely. But nothing confuses the common user more than not knowing how he or she is supposed to encode a particular text. What is needed is not so much changes to Unicode as clear guidelines for the common user.

> - any change or addition to Unicode that would require a user of Hebrew to
> have a higher level of knowledge, e.g. to distinguish between items not
> commonly distinguished, for example the two meanings of Vav with Holam.

Are we confining "user of Hebrew" to people who know how to speak the language? If so, these people already know how to distinguish the two meanings of vav with holam, because they pronounce them quite differently. Some users of biblical Hebrew may not know the pronunciation, but I don't think these are the people you have in mind.

On the other hand, if you are determined that these two graphically and semantically distinct entities should be encoded identically, then at least those of us who want or need to make a graphical or semantic distinction are not entirely stuffed, i.e. left without a way ahead. For it does seem to be possible to determine algorithmically, though not entirely without ambiguity in some theoretical cases, which vav with holam is which. The only ambiguity would be in cases where the part of the word before the vav with holam consists only of a string of vavs with dagesh, of which the first may be a vowel (shuruq) or a consonant.
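
To make the idea of an algorithmic distinction concrete, here is a minimal Python sketch. The rule it applies is an assumption of the sketch, not something spelled out above: a vav with holam is taken as a consonant when it starts the word or when the preceding letter already carries its own vowel point or sheva, and as the vowel (holam male) otherwise. The names `split_clusters` and `classify_vav_holam` are hypothetical. The sketch is also more cautious than the narrow ambiguity described above, since it flags every vav with holam that follows a vav with dagesh as ambiguous.

```python
VAV = "\u05D5"
HOLAM = "\u05B9"
DAGESH = "\u05BC"
# Hebrew vowel points and sheva, U+05B0..U+05BB (holam itself is included).
VOWELS = {chr(c) for c in range(0x05B0, 0x05BC)}


def is_hebrew_letter(ch):
    """True for the base letters alef..tav (U+05D0..U+05EA)."""
    return "\u05D0" <= ch <= "\u05EA"


def split_clusters(word):
    """Split a pointed word into (base letter, set of following marks)."""
    clusters = []
    for ch in word:
        if is_hebrew_letter(ch):
            clusters.append((ch, set()))
        elif clusters:
            # Anything that is not a base letter is attached to the last letter.
            clusters[-1][1].add(ch)
    return clusters


def classify_vav_holam(word):
    """Return (cluster index, label) for every vav that carries holam.

    Assumed rule, for illustration only: consonant if word-initial or if
    the preceding letter has its own vowel point or sheva; ambiguous if
    the preceding letter is an unvowelled vav with dagesh; vowel otherwise.
    """
    clusters = split_clusters(word)
    results = []
    for i, (base, marks) in enumerate(clusters):
        if base != VAV or HOLAM not in marks:
            continue
        if i == 0:
            label = "consonant"
        else:
            prev_base, prev_marks = clusters[i - 1]
            if prev_marks & VOWELS:
                label = "consonant"
            elif prev_base == VAV and DAGESH in prev_marks:
                label = "ambiguous"
            else:
                label = "vowel"
        results.append((i, label))
    return results


mitsvot = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05B9\u05EA"  # tsadi has sheva
shalom = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"   # lamed is unpointed
print(classify_vav_holam(mitsvot))
print(classify_vav_holam(shalom))
```

A real implementation would need to cope with accents, maqaf-joined words and the deliberately mis-encoded data mentioned earlier, none of which this sketch attempts.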

> - the suggestion to encode Biblical Hebrew separately is unacceptable.

I am glad to hear this clearly stated. I agree.

> The requirements of professional and knowledgeable users, such as Biblical
> scholars, should not be allowed to impose upon everyday users who are not
> blessed with such a profound knowledge and understanding.

Indeed. But also support for the special requirements of scholars should not be restricted just because it goes beyond the requirements of everyday users.

> Consequently, it was suggested that the several issues with Biblical Hebrew
> recently mentioned, and several more which were not, should be solved by
> means of markup, outside the scope of Unicode. This is how they have been
> addressed in many of the references given. This is our recommendation.

Which references do you mean? Haralambous? I accept that markup may be suitable for the rare cases of enlarged, reduced, raised and broken letters which he mentions, as these are semantically the base letter plus some essentially extra-textual information. But markup is not appropriate for distinguishing between commonly occurring letters which are distinct semantically and phonetically, as well as very often graphically, like the different forms of vav with holam. Or is markup being suggested as a solution to the Yerushala(y)im issue? If so, I fail to see how it addresses the problem, as markup does not inhibit normalisation.

> Failing that, it was suggested that an existing Unicode character, such as
> ZERO WIDTH NO-BREAK SPACE, be used for "invisible" Hebrew letters, in cases
> such as Yerushala(y)im.

As there are many objections to ZWNBS, would CGJ be an acceptable alternative? But I do see why you might prefer to use a zero width base character here rather than a combining character, although that would not be appropriate for mittaxat in Exodus 20:4 and for right meteg.
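
The normalisation problem behind the Yerushala(y)im case can be seen with a short Python sketch using the standard `unicodedata` module. It assumes that the intended encoding puts patah before hiriq on the lamed; the variable names are only illustrative. Because hiriq has a lower canonical combining class (14) than patah (17), normalisation swaps the two points, whereas an intervening character of combining class 0, such as CGJ (U+034F), keeps them in the order typed.

```python
import unicodedata

LAMED = "\u05DC"
PATAH = "\u05B7"  # canonical combining class 17
HIRIQ = "\u05B4"  # canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

# Assumed intended order for Yerushala(y)im: patah first, then hiriq.
intended = LAMED + PATAH + HIRIQ
normalised = unicodedata.normalize("NFC", intended)

# With CGJ between the two points, canonical reordering cannot cross it.
with_cgj = LAMED + PATAH + CGJ + HIRIQ
protected = unicodedata.normalize("NFC", with_cgj)

print([f"U+{ord(c):04X}" for c in normalised])
print([f"U+{ord(c):04X}" for c in protected])
```

This only demonstrates the reordering behaviour. Whether CGJ, a zero width base character or a new character is the right choice is a separate question.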

> The third, and least favored, option is to add a special Unicode character to represent missing base characters such as the Yod in Yerushala(y)im.
>
> Jony
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/