I believe this to be the wrong outlook. The real situation is that the real text has no Yod--deliberately so from a Masoretic standpoint. No invisible Yod should be inserted to 'emend' the text.
(Note that I am not making a pietistic argument, I'm not the least bit pietistic, though I suspect there are Biblical scholars who would take that view. I'm simply making a text faithfulness argument.)

K

----- Original Message -----
From: "Jony Rosenne" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 26, 2003 2:24 AM
Subject: RE: Yerushala(y)im - or Biblical Hebrew

> This explanation makes me unhappy with CGJ.
>
> Ken says: "The important things are that it is a) invisible, b) a combining
> mark, and c) has combining class zero".
>
> And: "There is no need for an invisible base character here".
>
> On the contrary, to represent the text we do need an invisible base
> character for the Hiriq, representing the unwritten Yod.
>
> Another possibility is to encode the Yod with a complex text (in the meaning
> non plain text) control saying the Yod is invisible.
>
> I think it is important, whatever solution is chosen, to represent the real
> situation, rather than just a sequence of codes that happens to be able to
> produce the desired visual output.
>
> Jony
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Kenneth Whistler
> > Sent: Saturday, July 26, 2003 2:40 AM
> > To: [EMAIL PROTECTED]
> > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Re: Yerushala(y)im - or Biblical Hebrew
> >
> > Ted continued:
> >
> > > If I recall correctly, the suggestion for using CGJ for yerushala(y)im
> > > was to encode it as: <...lamed, patah, cgj, hiriq, final mem>. Also, I
> > > seem to recall that this gave some people heartburn because CGJ was
> > > not intended to join two combining characters. What if this case were
> > > encoded as: <...lamed, patah, cgj, zwnbs, hiriq, final mem>? (Please
> > > forgive me if this is what had been proposed all along.)
> > >
> > > As I understand it from reading the description of CGJ (and ignoring
> > > for the moment that zwnbs has no visible glyph and is general category
> > > Cf), this is exactly what CGJ was designed for: treat the two base
> > > characters on either side of the CGJ as a single grapheme for the
> > > purpose of placing combining characters. This approach uses zero width
> > > no-break space to represent the "missing letter" interpretation of the
> > > two vowels pointed out by Jony Rosenne. Normalization wouldn't destroy
> > > the ordering of the vowels, and Hebrew-aware software could be written
> > > to do all this more-or-less transparently and automatically.
> >
> > Hmm. Some further clarifications are in order, since the
> > documentation for both of these characters has not quite
> > caught up to the UTC decisions regarding them. A lot of work
> > went into the Unicode 4.0 documentation on these, and the
> > Unicode 4.0 chapters will be posted online very soon -- at
> > which point it would be helpful if everyone concerned about
> > this issue takes the time to read the latest on these
> > characters in particular.
> >
> > First, about ZWNBS (U+FEFF). Because of the confusing overlap
> > of functionality of U+FEFF as the BOM (byte order mark) in
> > the Unicode encoding schemes and as what its name, ZERO WIDTH
> > NO-BREAK SPACE implies, the UTC (as of Unicode 3.2)
> > standardized a separate character, U+2060 WORD JOINER. That
> > character is described in UAX #14, Line Breaking Properties:
> >
> > http://www.unicode.org/reports/tr14/
> >
> > U+2060 is "the preferred choice for an invisible character to keep
> > other characters together that would otherwise be split
> > across the line at a direct break." U+FEFF retains that
> > semantic, for backwards compatibility, but its preferred use
> > is as the byte order mark only.
> >
> > So whether or not a line break format control character is
> > relevant to the Biblical Hebrew vowel problem (and I don't
> > think it is, actually), one should be talking about use of
> > U+2060 WORD JOINER (WJ), rather than U+FEFF ZWNBS in any such
> > new context.
> >
> > Second, there is U+034F COMBINING GRAPHEME JOINER (CGJ)
> > itself. The impetus for encoding the CGJ at all was to have a
> > plain text means of distinguishing, for example, an "ie"
> > sequence that weights as two units for collation and an "ie"
> > sequence that weights as a single unit for collation.
> >
> > During the debate about such an addition, the entity was
> > called various things, but the moniker "GRAPHEME JOINER"
> > caught on in the committee and stuck. There was also debate
> > about an equal and opposite "GRAPHEME NON-JOINER", on the
> > principle that inserting a GNJ between, e.g., a "ch" weighted
> > as a unit, so as to force it to be treated as two units would
> > be the more normal requirement in collation. However, the
> > committee did not develop consensus that that was a required
> > *character*, in part because insertion of *any* delimiting
> > character in that context could be taken as having that
> > effect or be tailored in collation to weight as desired to
> > distinguish it from the digraphic unit, for example.
> >
> > The "COMBINING" became part of the CGJ's name when it
> > became clear that the character should be given the
> > General Category Mn, making it a combining mark, rather
> > than General Category Cf to make it a format control.
> >
> > During this debate, high hopes were also placed on the
> > COMBINING GRAPHEME JOINER as being the magic bullet for all
> > kinds of things: it could "glue together" a pair of accents
> > so that they would render side-by-side instead of using the
> > default accent placement rules. It could also "glue together"
> > sequences of characters into a "grapheme cluster", so that
> > the grapheme cluster would become the target of an enclosing
> > combining mark -- that would resolve the problem of how to
> > get an enclosing circle to circle an arbitrary number, rather
> > than just a single digit, for example.
> >
> > In the end, however, the inconsistent and troubling
> > implications of this attempt at getting the Unicode
> > Standard further involved in the monkey business of trying
> > to be a glyph description language, rather than a character
> > encoding, caused many second thoughts. And the UTC formally
> > backed away from all those silver bullet aspects of CGJ. In
> > Unicode 4.0, CGJ has been stripped of all interpretation
> > except as an invisible mark which can be used to tailor
> > collation (and searching), so as to distinguish digraphic
> > units from sequences of the same characters.
> >
> > If you look at UAX #29, Text Boundaries, now, and in
> > particular, Section 3, Grapheme Cluster Boundaries, you will
> > see that CGJ has nothing to do with the definition of such
> > boundaries. While it has the Grapheme_Link property (as do
> > all the Indic viramas), Grapheme_Link is no longer even
> > mentioned in UAX #29, and Grapheme_Link is nowhere else used,
> > not even in a derived property.
> >
> > So the shorthand interpretation of CGJ currently is
> > "invisible target for collation tailoring of neighboring
> > characters into a digraphic unit." Even calling it by its
> > formal name, COMBINING GRAPHEME JOINER, immediately conjures
> > up the wrong connotations, so it is better to just use the
> > CGJ acronym and not spell it out. Or think of CGJ as standing
> > for "Collation kluGJe", if you wish. ;-)
> >
> > Now when you say:
> >
> > > If I recall correctly, the suggestion for using CGJ for yerushala(y)im
> > > was to encode it as: <...lamed, patah, cgj, hiriq, final mem>. Also, I
> > > seem to recall that this gave some people heartburn because CGJ was
> > > not intended to join two combining characters.
> >
> > If people are getting "heartburn" because CGJ is not intended
> > to join two combining characters, the problem they are having
> > is the result of a misunderstanding of the intent here.
> >
> > It is *true* that the CGJ is no longer intended to "join two
> > combining characters", although people tried for awhile to
> > see if it would work to "glue together two combining
> > characters" for different rendering.
> >
> > But the point of the CGJ proposal with respect to Biblical
> > Hebrew is *not* to somehow sneak back around to interpreting
> > the CGJ as gluing two combining characters together. Instead,
> > it turns out that the CGJ, whose interpretation has been
> > whittled down to being almost nothing, has the appropriate
> > set of character *properties* to serve to block canonical
> > reordering of a combining character sequence. The important
> > things are that it is a) invisible, b) a combining mark, and
> > c) has combining class zero. To serve the purpose of blocking
> > the canonical ordering, it doesn't have to *do* anything but
> > just sit there with its properties as defined. It doesn't
> > "join" anything, and it doesn't have anything to do with the
> > "grapheme" status of the resulting sequence.
> >
> > The only other Unicode characters with those properties are
> > the variation selectors, but those characters *do* have
> > cooccurrence constraints that prevent them from following a
> > combining mark (at least in a legally interpretable way).
> > That leaves the CGJ as the *only* Unicode character which has
> > the desired properties and which has no constraints against
> > occurrence in the middle of a combining character sequence.
> >
> > Another way of thinking of this is that in addition to CGJ
> > being the "Collation kluGJe", it can be interpreted as the
> > "Canonical Gradient Jigger", if we simply acknowledge the
> > fact that, given its current properties, if it occurs in the
> > relevant sequences of combining marks, it already has the
> > effect of jiggering the canonical gradients to produce just
> > the distinctions desired. ;-)
> >
> > > Of course, zwnbs is not a base character. If using zwnbs is a problem
> > > (because it has no visible glyph and/or because it has category Cf),
> > > then perhaps what is needed is another character (perhaps a new one)
> > > that has no width or visible glyph but can be treated as a base
> > > character (category Lo). That may be needed anyway, since some of the
> > > boundary definitions have special rules for zwnbs.
> >
> > There is no need for an invisible base character here. That
> > *would* be going further than is necessary to solve the
> > problem, and would create arguments about the actual content
> > of the text -- are we encoding an inherent consonant here or
> > not? Why go there, when the problem is simply to represent
> > the text as shown and then let commentators and phonologists
> > argue about whether the yod is "really" there or not.
> >
> > > Ted
> > >
> > > P.S. It's two p's but only one d. :)
> >
> > Sorry. Anticipatory doubling, I guess...
> >
> > --Ken
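
For anyone who wants to check the reordering behaviour Ken describes, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not part of any proposal: the variable names are mine, and the exact results depend on the Unicode data shipped with the Python build in use. It compares <lamed, patah, hiriq, final mem> with <lamed, patah, cgj, hiriq, final mem> under normalization, and prints the properties Ken lists for CGJ (a combining mark with combining class zero).

```python
import unicodedata

# Code points taken from the sequences discussed in the thread.
LAMED = "\u05DC"      # HEBREW LETTER LAMED
PATAH = "\u05B7"      # HEBREW POINT PATAH
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER


def code_points(text):
    """Return the text as a list of U+XXXX strings, for readable printing."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two encodings under discussion (illustrative variable names).
plain = LAMED + PATAH + HIRIQ + FINAL_MEM
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

# Canonical ordering sorts adjacent marks with nonzero combining class,
# so the two vowels may be swapped unless something with class zero
# sits between them.
plain_nfc = unicodedata.normalize("NFC", plain)
with_cgj_nfc = unicodedata.normalize("NFC", with_cgj)

if __name__ == "__main__":
    for name, ch in (("patah", PATAH), ("hiriq", HIRIQ), ("cgj", CGJ)):
        print(name, unicodedata.category(ch), unicodedata.combining(ch))
    print("plain     ->", code_points(plain_nfc))
    print("with CGJ  ->", code_points(with_cgj_nfc))
```

If the vowel order in the first sequence changes while the second comes back unchanged, that is the "just sit there with its properties" effect: nothing is joined, the CGJ merely interrupts the run of marks that canonical ordering would otherwise sort.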