Andy Pepperdine wrote:
On Monday 05 March 2007 12:25, Henk de Leeuw wrote:
[...]
This question piqued my curiosity, so I created a simple file containing a
non-breaking space, and found they encode it as Unicode 00A0, which
http://www.unicode.org/charts/PDF/U0080.pdf defines as a non-breaking space,
but without giving any other characteristics.
Checking at http://www.fileformat.info/info/unicode/char/00a0/index.htm
indicates that .NET reports
Char.IsWhiteSpace() True.
But Java is odd; it has both:
Character.isSpaceChar() Yes
Character.isWhitespace() No
Nothing "odd" about that. Character.isSpaceChar() is defined:
Determines if the specified character (Unicode code point) is a
Unicode space character. A character is considered to be a space
character if and only if it is specified to be a space character
by the Unicode standard. This method returns true if the character's
general category type is any of the following:
* SPACE_SEPARATOR
* LINE_SEPARATOR
* PARAGRAPH_SEPARATOR
Character.isWhitespace() is defined:
Determines if the specified character (Unicode code point) is
white space according to Java. A character is a Java whitespace
character if and only if it satisfies one of the following criteria:
* It is a Unicode space character (SPACE_SEPARATOR,
LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a
non-breaking space ('\u00A0', '\u2007', '\u202F').
* It is '\u0009', HORIZONTAL TABULATION.
* It is '\u000A', LINE FEED.
* It is '\u000B', VERTICAL TABULATION.
* It is '\u000C', FORM FEED.
* It is '\u000D', CARRIAGE RETURN.
* It is '\u001C', FILE SEPARATOR.
* It is '\u001D', GROUP SEPARATOR.
* It is '\u001E', RECORD SEPARATOR.
* It is '\u001F', UNIT SEPARATOR.
So the exclusion is by design.
\u00A0 is the classic non-breaking space (used, e.g., for HTML &nbsp;).
\u2007 is figure space (guaranteed to be the same width as digits).
\u202F is narrow no-break space ("narrow" is undefined).
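
For anyone who wants to check this on their own JVM, here is a minimal
sketch that prints how the two methods classify these three characters,
with an ordinary space and a tab for comparison. The class name
NoBreakSpaceCheck and the helper classify() are made up for illustration;
only Character.isSpaceChar() and Character.isWhitespace() come from the
Java API.

```java
public class NoBreakSpaceCheck {

    // Hypothetical helper: formats the code point together with the
    // results of the two standard java.lang.Character predicates.
    static String classify(char c) {
        return String.format("U+%04X isSpaceChar=%b isWhitespace=%b",
                (int) c, Character.isSpaceChar(c), Character.isWhitespace(c));
    }

    public static void main(String[] args) {
        // Ordinary space, the three no-break spaces, and a tab.
        char[] samples = { ' ', '\u00A0', '\u2007', '\u202F', '\t' };
        for (char c : samples) {
            System.out.println(classify(c));
        }
    }
}
```

The three no-break spaces should come out as isSpaceChar=true and
isWhitespace=false, the ordinary space as true for both, and the tab the
other way round (it is a control character, not a Unicode space separator).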
It seems to me that there is a basic design question here. There can be
no doubt that \u2007 and \u202F are supposed to have locked widths, but
\u00A0 is undefined in this respect, and /could/ be treated as an
adjustable space for justification. However, I'll eat my hat if there
aren't users who'll have a problem with that.
This makes yet another case (the third I've come across, I think) where
a no-break character attribute, such as DeScribe had, would be the most
obvious and convenient solution. Let \u00A0 be fixed width, as it is
now, and let a no-break character attribute and an ordinary space handle
the adjustable case.
--
John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"