On 13/09/2004 23:39, Andy Heninger wrote:

In looking at how the proposed changes to the TR 29 word boundary rules would be implemented in the ICU library, I came across an odd situation in the rules.

...

While thinking about what to do about this, it struck me that it would probably be more consistent all the way around to remove the Grapheme Extend characters from the ALetter set. The only effect of this change would be on the breaking behavior of combining characters with no base character.

Any thoughts?

Would the effect of this be to allow (in some cases) a word break immediately after a combining character with no base letter?

I have in mind certain situations found in Hebrew (Ketiv/Qere blended forms) in which anomalous (but quite frequently found) word forms begin with a spacing combining character. The currently specified way of supporting this situation is to use SPACE or NBSP followed by the combining character (as these combining characters do not have non-spacing clones). It would be highly undesirable to make a change here which would allow word breaks, line breaks, etc. after the combining character but before the rest of the word.
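To make the sequence concrete, here is a minimal Python sketch that locates the position in question: the point just after a combining mark carried by SPACE or NBSP, where the rest of the word follows. It does not implement the TR 29 rules. The function names and the sample string are hypothetical, and general categories Mn/Me are used only as a rough stand-in for Grapheme Extend.

```python
import unicodedata

NBSP = "\u00A0"

def is_mark(ch):
    # Assumption: Mn/Me approximates Grapheme_Extend for this illustration.
    return unicodedata.category(ch) in ("Mn", "Me")

def baseless_mark_boundaries(text):
    """Return indexes just after a run of marks carried by SPACE or NBSP,
    where the next character is a letter (the rest of the word)."""
    positions = []
    i = 0
    while i < len(text):
        if text[i] in (" ", NBSP):
            j = i + 1
            # Skip over any combining marks sitting on the SPACE/NBSP carrier.
            while j < len(text) and is_mark(text[j]):
                j += 1
            has_marks = j > i + 1
            if has_marks and j < len(text) and unicodedata.category(text[j]).startswith("L"):
                positions.append(j)
            i = j
        else:
            i += 1
    return positions

# Hypothetical sample: NBSP + HEBREW POINT HIRIQ + two Hebrew letters.
sample = NBSP + "\u05B4" + "\u05D3\u05D1"
print(baseless_mark_boundaries(sample))
```

Any index this returns is a place where a break would split the combining character from the rest of the word, which is the case I would want the rules to keep together.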

Public Review Issue #41 proposes that a new INVISIBLE LETTER be used instead of SPACE or NBSP to carry the combining character in such situations. Presumably, if this is accepted, the problem will go away once this new letter is in use, as it has letter-like properties. But the existing usage with SPACE will continue to be found in documents that already exist.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




