On 14/09/2004 22:44, Andy Heninger wrote:
> Peter Kirk wrote:
>> I have in mind certain situations found in Hebrew (Ketiv/Qere blended
>> forms) in which anomalous (but quite frequently found) word forms begin
>> with a spacing combining character. The currently specified way of
>> supporting this situation is to use SPACE or NBSP followed by the
>> combining character (as these combining characters do not have
>> non-spacing clones). It would be highly undesirable to make a change
>> here which would allow word breaks, line breaks, etc. after the
>> combining character but before the rest of the word.
>
> The proposed change to word boundaries would have no effect on the case you describe, but word boundaries may already not be doing what you want. If you have a SPACE or NBSP preceding the combining character, the grapheme cluster composed of the space plus the combining char will behave as just a space, and be split off from the remainder of the word.
>
> I found 16 Hebrew characters that would be affected by the change, \u05B0 HEBREW POINT SHEVA through \u05C2 HEBREW POINT SIN DOT with a couple of holes in the middle of the range.

These are the Hebrew characters I had in mind. But then wouldn't the Hebrew accents 0591-05AF also be affected in the same way? If these don't have Grapheme_Extend = true, why not?

> To have these characters attach to a following word, an alphabetic base character is needed.
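For anyone who wants to look at the characters in the range quoted above, here is a minimal Python sketch. It assumes that the characters of interest are simply the nonspacing marks (General_Category Mn) between U+05B0 and U+05C2; that criterion is my guess, not something stated in the thread, and the helper name `nonspacing_marks` is made up for illustration. The result depends on the Unicode version bundled with your Python, so the count may not match the 16 quoted above.

```python
import unicodedata

def nonspacing_marks(first, last):
    """Return (code point, name) pairs for Mn characters in [first, last]."""
    found = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # Unassigned code points report category "Cn" and are skipped here,
        # as are punctuation characters such as MAQAF and PASEQ.
        if unicodedata.category(ch) == "Mn":
            found.append((cp, unicodedata.name(ch)))
    return found

# Range taken from the message above (hypothetical selection criterion: Mn only).
hebrew_points = nonspacing_marks(0x05B0, 0x05C2)
for cp, name in hebrew_points:
    print(f"U+{cp:04X} {name}")
```

The characters skipped inside the range (punctuation, and anything unassigned in the Unicode version in use) would be the "holes" mentioned above.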
Well, all of this rather surprises me, because we have been through this one on this list before and others have assured me that there is a special rule by which spaces with combining marks are treated specially. But I see, that is in TR14 under line breaking, not in TR29 under word breaking: "If U+0020 SPACE is used as a base character, it is treated as ID instead of SP." Well, it is perhaps more critical that there should be no line break in these situations than that there should be no word break. I must say I am confused as to why line breaking and word breaking are considered such different issues that they are dealt with entirely separately, when at least in the scripts I am familiar with the rules should be almost identical.
But this fact that SPACE or even NBSP with a combining character is treated as not part of a word for word boundary calculation is another strong argument that INVISIBLE LETTER is necessary, cf. Public Review Issue #41.
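To make the problem concrete, here is a rough Python sketch of the behaviour Andy describes. It is a deliberate simplification and not the TR29 algorithm: clusters are formed by gluing Mn/Me marks onto the preceding character, and any cluster whose base is SPACE or NBSP is treated as a word separator. The function names `simple_clusters` and `simple_words` are invented for this illustration.

```python
import unicodedata

def simple_clusters(text):
    """Group each character with the combining marks (Mn/Me) that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters

def simple_words(text):
    """Split on clusters whose base is SPACE or NBSP (simplified assumption)."""
    words, current = [], ""
    for cluster in simple_clusters(text):
        if cluster[0] in (" ", "\u00A0"):
            # The whole cluster, mark included, acts as a separator.
            if current:
                words.append(current)
            current = ""
        else:
            current += cluster
    if current:
        words.append(current)
    return words

# ALEF BET, then SPACE + HIRIQ, then GIMEL DALET
sample = "\u05D0\u05D1 \u05B4\u05D2\u05D3"
print(simple_clusters(sample))
print(simple_words(sample))
```

In this sketch the HIRIQ ends up in the separator cluster with the SPACE, so it belongs to neither word, which is exactly the outcome that is unwanted for the Ketiv/Qere forms.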
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/

