From: "Peter Kirk" <[EMAIL PROTECTED]>
> On 29/10/2003 10:46, John Hudson wrote:
>
> > While we're about it, we could propose a spacing, non-breaking ELIDED
> > CHARACTER for use in ketiv/qere where combining marks need to be
> > applied to empty space within a word.
>
> How would this differ from NBSP? Now if it were a right-to-left
> character specifically for RTL scripts, that would help. But failing
> that one can safely use <RLM, NBSP>.
Isn't NBSP neutral for directionality (like SPACE)? Maybe the issue comes with word breaks, but the UTR (proposed UTS) defining text boundaries explicitly states that word breaks should not occur in the middle of a combining sequence (more exactly, in the middle of a grapheme cluster, if we consider Hangul syllables). So any diacritic added on top of a space or NBSP must stay attached to it.

What the Text Boundaries report does not say clearly is which break category is given to a space or NBSP carrying diacritics. My opinion is that a sequence made of a space character followed by modifiers (general category M) adopts the behavior of the "Lo" general category for the purpose of determining text boundaries. Its directionality remains neutral, unless one of its diacritics has an explicit directionality, in which case the first such diacritic determines it.

This means that the sequence <NBSP, KETIV> is treated as "Lo" for text boundaries and adopts the directionality of <KETIV>. Its minimum glyph width becomes 0 and its minimum height becomes the x-height of the font used to render it; further diacritics are laid out and centered around this zero-width base, possibly extending the glyph positioning box with the minimum layout box of each diacritic. (Note that some diacritics may have a minimum layout box smaller than their effective bounding box, notably if they form ligatures or are kerned with the surrounding base characters; this is true of "double diacritics", whose final layout depends on the next combining sequence.)

It would be equivalent to <SPACE, KETIV>, but I prefer keeping <SPACE> free of any diacritic, because it strongly implies a word boundary before and after it, and a candidate line boundary after it. Also, many algorithms ignore all SPACEs at the end of lines when determining line widths, including for the full justification of paragraphs; if these spaces carried diacritics, the diacritics would be rendered within the final margin, or not rendered at all.
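To make the proposed text-boundary behavior concrete, here is a minimal Python sketch under stated assumptions. `simple_clusters` groups a base character with its following combining marks (a simplification of the grapheme clusters in the Text Boundaries report that ignores Hangul and other extended rules), and `words` applies the rule proposed above: a space or NBSP carrying marks behaves like a letter ("Lo") instead of a word separator. All three function names and the splitting rule are hypothetical illustrations, not taken from the report itself. U+05B8 HEBREW POINT QAMATS stands in for the hypothetical KETIV mark.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    # Hypothetical simplification of grapheme clusters: a base character plus
    # any following combining marks (general category M*). Hangul syllable
    # rules and other extended cases are deliberately ignored.
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters


def boundary_category(cluster: str) -> str:
    # Proposed rule (an assumption, not part of the standard): a space
    # separator (Zs) that carries at least one mark behaves like "Lo".
    cat = unicodedata.category(cluster[0])
    if cat == "Zs" and len(cluster) > 1:
        return "Lo"
    return cat


def words(text: str) -> list[str]:
    # Split only on clusters whose boundary category is still Zs (a bare
    # SPACE or NBSP); <NBSP, mark> stays inside the surrounding word.
    result: list[str] = []
    current = ""
    for cluster in simple_clusters(text):
        if boundary_category(cluster) == "Zs":
            if current:
                result.append(current)
            current = ""
        else:
            current += cluster
    if current:
        result.append(current)
    return result


# ALEF, <NBSP, QAMATS>, BET, SPACE, GIMEL
text = "\u05D0\u00A0\u05B8\u05D1 \u05D2"
print(simple_clusters(text))
print(words(text))
```

With these assumptions the string yields two words: the first keeps <NBSP, QAMATS> in its middle, and only the bare SPACE separates it from the second. A conforming implementation would use the full segmentation rules rather than this base-plus-marks approximation.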
(NBSP does not have this problem: if it does not fit in the line, or occurs at the end of a line, it is still rendered, and its width is taken into account when determining where to actually insert line breaks.)
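The difference between SPACE and NBSP at the end of a line can be sketched in Python under simplifying assumptions: every character is one unit wide, break opportunities occur only after U+0020 SPACE, and trailing SPACEs are excluded from the measured width while NBSP is not. `line_width` and `break_lines` are hypothetical helpers for illustration, not the algorithm of any particular renderer, and combining marks are not handled.

```python
def line_width(line: str) -> int:
    # Hypothetical measure: one unit per character, ignoring trailing
    # U+0020 SPACEs as many justification algorithms do. rstrip(" ")
    # removes only U+0020, so a trailing NBSP (U+00A0) still counts.
    return len(line.rstrip(" "))


def break_lines(text: str, max_width: int) -> list[str]:
    # Greedy line filling. A break opportunity exists only after a run of
    # U+0020 SPACEs; NBSP never offers one.
    segments: list[str] = []
    current = ""
    for ch in text:
        if ch != " " and current.endswith(" "):
            segments.append(current)
            current = ""
        current += ch
    if current:
        segments.append(current)

    lines: list[str] = []
    line = ""
    for seg in segments:
        # Start a new line only if the current one is non-empty and the
        # segment would push its measured width past the limit.
        if line and line_width(line + seg) > max_width:
            lines.append(line)
            line = seg
        else:
            line += seg
    if line:
        lines.append(line)
    return lines


print(break_lines("ab cd ef", 5))
print(break_lines("ab\u00A0cd", 2))
```

In the first call the SPACE ending "ab cd " is not counted, so that line fits exactly; in the second, the NBSP offers no break and is counted, so the single line overflows. A real implementation would follow the Unicode line breaking algorithm and measure glyph advances rather than characters.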

