Andy Pepperdine wrote:
On Monday 05 March 2007 12:25, Henk de Leeuw wrote:
[...]
This question piqued my curiosity, so I created a simple file containing a non-breaking space and found that they encode it as Unicode 00A0, which http://www.unicode.org/charts/PDF/U0080.pdf defines as a non-breaking space, without specifying any other characteristics.

Checking at http://www.fileformat.info/info/unicode/char/00a0/index.htm indicates that .NET wants
  Char.IsWhiteSpace()  True.
But Java is odd; it has both:
  Character.isSpaceChar()  Yes
  Character.isWhitespace()  No

Nothing "odd" about that. Character.isSpaceChar() is defined:

  Determines if the specified character (Unicode code point) is a
  Unicode space character. A character is considered to be a space
  character if and only if it is specified to be a space character
  by the Unicode standard. This method returns true if the character's
  general category type is any of the following:

    * SPACE_SEPARATOR
    * LINE_SEPARATOR
    * PARAGRAPH_SEPARATOR

Character.isWhitespace() is defined:

  Determines if the specified character (Unicode code point) is
  white space according to Java. A character is a Java whitespace
  character if and only if it satisfies one of the following criteria:

        * It is a Unicode space character (SPACE_SEPARATOR,
        LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a
        non-breaking space ('\u00A0', '\u2007', '\u202F').
        * It is '\u0009', HORIZONTAL TABULATION.
        * It is '\u000A', LINE FEED.
        * It is '\u000B', VERTICAL TABULATION.
        * It is '\u000C', FORM FEED.
        * It is '\u000D', CARRIAGE RETURN.
        * It is '\u001C', FILE SEPARATOR.
        * It is '\u001D', GROUP SEPARATOR.
        * It is '\u001E', RECORD SEPARATOR.
        * It is '\u001F', UNIT SEPARATOR.

So the exclusion is by design.

\u00A0 is the classic non-breaking space (used, e.g., for HTML &nbsp;).
\u2007 is figure space (guaranteed to be the same width as digits).
\u202F is narrow no-break space ("narrow" is undefined).
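
For anyone who wants to check the two definitions quoted above, here is a minimal sketch that uses only java.lang.Character. The class name SpaceCheck and the helper describe() are made up for illustration; an ordinary space and a tab are included only for comparison.

```java
public class SpaceCheck {
    // Hypothetical helper: one-line report of how Java classifies a code point.
    static String describe(int cp) {
        return String.format("U+%04X isSpaceChar=%b isWhitespace=%b",
                cp, Character.isSpaceChar(cp), Character.isWhitespace(cp));
    }

    public static void main(String[] args) {
        // U+0020 ordinary space, U+00A0 no-break space, U+2007 figure space,
        // U+202F narrow no-break space, U+0009 horizontal tabulation.
        int[] samples = {0x0020, 0x00A0, 0x2007, 0x202F, 0x0009};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Going by the quoted definitions, the three no-break characters should report isSpaceChar=true and isWhitespace=false, the ordinary space true for both, and the tab the reverse of the no-break case.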

It seems to me that there is a basic design question here. There can be no doubt that \u2007 and \u202F are supposed to have locked widths, but \u00A0 is undefined in this respect, and /could/ be treated as an adjustable space for justification. However, I'll eat my hat if there aren't users who'll have a problem with that.

This makes yet another case (the third I've come across, I think) where a no-break character attribute, such as DeScribe had, would be the most obvious and convenient solution. Let \u00A0 be fixed width, as it is now, and let a no-break character attribute and an ordinary space handle the adjustable case.

--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"
