At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
> I will say again as I have said before - but the above (and what I
> snipped) is extra evidence for it - that what is broke ... is
> the rule that the isolated (generally spacing) form of a combining mark
> should be formed by SPACE or NBSP followed by the combining mark.

This has been the *intent* of the standard since its inception in
1989.

> There
> are many good reasons for not using SPACE for this, including default
> behaviour like inserting line breaks immediately after SPACE.

Nope. UAX #14 specifies the following regarding SPACE followed by
combining marks:

"If U+0020 SPACE is used as a base character, it is treated as AL
instead of SP."

This is an unfortunate typo in UAX #14. The correct statement is:

"If U+0020 SPACE is used as a base character, it is treated as ID
instead of SP."

See the description of this issue in the rules section of the UAX,
which is quite explicit:
LB 7a In all of the following rules, if a space is the base character for a combining mark, the space is changed to type ID. In other words, break before SP CM* in the same cases as one would break before an ID.


Treat SP CM* as if it were ID

As stated in [Unicode], Section 7.7 Combining Marks, combining characters are shown in isolation by applying them to either U+0020 SPACE (SP) or U+00A0 NO-BREAK SPACE (NBSP). The visual appearance is the same, but the line breaking result is different. Correspondingly, if there is no base, or if the base character is SP, CM* or SP CM* behave like ID.

This means that a combining character sequence of this type is treated
as a unit for the purposes of line breaking, and this overrides the
normal behavior of SPACE as a line break opportunity.

There's never a line break opportunity between a SPACE and a following
combining mark, but since SP is then treated like ID (the ideographic line
breaking class), there are break opportunities *before* the SP that would
not be there if NBSP were used.


Of course, UAX #14 only spells out default behavior,
but then "default behaviour" is what was claimed just above.

> Using NBSP rather than SPACE has several advantages, and has long been
> specified in Unicode, although not widely implemented. It is less likely
> to occur accidentally. But it has disadvantages, especially that it will
> always be a spacing character, whereas for display of isolated Indic
> vowels no extra spacing is required.

NBSP is not a fixed-width space.

Correct. Somewhere in the standard, we should point out that using a SPACE or NBSP as a base character does not require it to have the same width as elsewhere in the text, and that its width can (and should) be adjusted to best serve this 'base character' function.


A./