On 14.11.2011 at 18:30, [email protected] wrote:
1. No. That is not what Unicode is for. Unicode's goal is to subsume
all reasonable pre-existing encodings.
Unicode is even more than that. Look at all the annexes to Unicode 6.0.
Some reasonable pre-existing
encodings include a non-breaking space character, so Unicode includes one.
That does not mean Unicode says you should actually use it! There are
many precedents of Unicode providing multiple ways of representing
things, as a result of including characters from other systems, without
it being reasonable to demand that all Unicode-compatible systems must
support all of them. For instance, most of the U+FFxx range is devoted
to different kinds of hacks for handling partial-width characters in
Asian-language typesetting; the preferred way to do that nowadays is via
OpenType features, but the code points remain in the standard. The U+0000
to U+001F range is basically control characters for Teletype machines;
some of those, like U+000A and U+000D, are widely used in modern documents
(but in varying ways by different systems!) and others, like U+001D, are
virtually unheard-of. Unicode does NOT say everybody has to support them
all, let alone all in the same way.
Hmm, I have difficulty understanding the conformance chapter of
Unicode 6.0 exactly ( http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
), but it seems to me that claiming Unicode support is a very strong
statement.
The U+00A0 code point is not explicitly deprecated in Unicode, but it was
never a principle of Unicode that all implementations have to support all
defined control characters regardless of appropriateness to the particular
purpose. "Non-breaking space" is, from TeX's point of view, not really a
character at all, but a formatting command; and TeX already has a way of
dealing with formatting commands in general and this one in particular.
It is appropriate to say that the preferred way of handling non-breaking
spaces in TeX input is the existing TeX way; and saying that in NO WAY AT
ALL contradicts anything in Unicode. Unicode is servant, not master.
I think it's more like math being servant _and_ master of natural sciences.
2. Inevitably, people will include invalid characters in TeX input; and
U+00A0 is an invalid character for TeX input. The best way to deal with
it is to treat it like any other invalid character and generate an error
message. A reasonable alternative would be to say "it is whitespace; it
will be treated like other whitespace." That would mean ignoring its
breaking/non-breaking-ness, as we have for a long time similarly ignored
the special properties of U+0009 (tab). Of course, if users want to
define a special meaning for U+00A0 in their own input, they can do so
with the existing mechanisms for redefining the meanings of input
characters; but "U+00A0 is equivalent to U+007E (~)," for instance, should
never be the default and (because of trouble displaying it) shouldn't be
encouraged.
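
To make the three options in this point concrete, here is a minimal
sketch of a preprocessing step that could run over a source file before
it reaches TeX. It is only an illustration: the function name
handle_nbsp, the exception class, and the policy names "error", "space"
and "tie" are all made up for this example, and none of this is how
XeTeX itself behaves.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


class InvalidInputCharacter(ValueError):
    """Hypothetical error raised when the chosen policy rejects U+00A0."""


def handle_nbsp(text, policy="error"):
    """Apply one of three illustrative policies to U+00A0 in TeX source.

    "error": report the first U+00A0 like any other invalid character.
    "space": treat it as ordinary whitespace, dropping its non-breaking-ness.
    "tie":   map it to "~" (a user-level opt-in, not a suggested default).
    """
    if policy == "error":
        for lineno, line in enumerate(text.splitlines(), start=1):
            col = line.find(NBSP)
            if col != -1:
                raise InvalidInputCharacter(
                    f"U+00A0 at line {lineno}, column {col + 1}"
                )
        return text
    if policy == "space":
        return text.replace(NBSP, " ")
    if policy == "tie":
        return text.replace(NBSP, "~")
    raise ValueError(f"unknown policy: {policy!r}")
```

With the "tie" policy, handle_nbsp("Fig.\u00a01", "tie") gives "Fig.~1";
with "space" the same input becomes "Fig. 1" and the no-break property
is lost.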
Now we come to the problem that Unicode specifies a line-breaking
algorithm ( http://www.unicode.org/reports/tr14/tr14-26.html ), which
probably isn't exactly TeX's. I don't know these algorithms well enough
to compare them, so I would ask some master of this art to speak up
about this conflict.
3. No. Better to keep everything visible and backward compatible. U+007E
(~) should remain the preferred way of doing non-breaking space.
Should be, and is … (see other posts).
4. Not applicable because of the answer to #3. Users who do insist on
putting U+00A0 in their input presumably have *already* got their own
reasons to think that it's more convenient for them, including solutions
satisfactory to themselves for how to type it on keyboards and see it on
screens, so that's their business and not a problem we need to solve.
I'm personally trying hard to find a correct way. So far I have found a
very simple way to input special whitespace characters. (On Linux this
is easy with ibus.) Alas, I haven't found any editor better suited to my
TeX needs than Kile, and I haven't yet managed to make it highlight
these special whitespace characters properly.
=> Some experts can do all these things. That doesn't mean everyone
else should stick to "stupid old" 7-bit ASCII.
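
Until an editor highlights these characters, a small script can at least
list where they are. The following is a rough sketch, not a feature of
Kile or XeTeX: the function name special_spaces is invented here, and
"special whitespace" is assumed to mean any non-ASCII character in
Unicode general category Zs (space separator).

```python
import unicodedata


def special_spaces(text):
    """Yield (line, column, code point, name) for each non-ASCII space separator.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Zs" covers U+00A0, U+2009, U+202F, U+3000 and similar.
            if ord(ch) > 0x7F and unicodedata.category(ch) == "Zs":
                name = unicodedata.name(ch, "<unnamed>")
                yield lineno, col, f"U+{ord(ch):04X}", name
```

Looping over special_spaces(open("paper.tex", encoding="utf-8").read())
and printing each tuple would give a list of positions to check by hand;
"paper.tex" is of course just a placeholder file name.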
bye
Toscho
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex