Chris Jacobs wrote:

> [ cc Theodore Smith ]
> 
> So I had it wrong, it _is_ deprecated.

It isn't exactly "deprecated", since deprecation has a
rather strong sense in the standard, and is correlated with
the formal assignment of a deprecated property to the
character.

Use of the code point U+FEFF is clearly *not* deprecated
in the standard.

The current situation is briefly as follows:

The standard *requires* the use of U+FEFF for some of
the Unicode encoding schemes. Details are spelled out in:

http://www.unicode.org/book/preview/ch03.pdf

Because of those requirements and the nature of the encoding
scheme definitions, the occurrence of U+FEFF in initial
position in *some* of the encoding schemes forces its
interpretation as a zero width no-break space, rather
than as a byte order mark. The difference is roughly
as follows: a BOM is not formally part of the content
of the text, but rather is part of the specification
of the encoding scheme; a ZWNBSP is formally part of the
content of the text.
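
To make that concrete, here is a small Python sketch. The
codec names ("utf-16", "utf-16-be") are Python's labels for
the UTF-16 and UTF-16BE encoding schemes, not anything from
the standard's text, and the sample string is made up:

```python
import codecs

text = "abc"

# Bytes FE FF followed by "abc" in big-endian UTF-16.
data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# UTF-16 encoding scheme: the initial U+FEFF is a BOM. It selects the
# byte order and is not part of the decoded content.
as_utf16 = data.decode("utf-16")

# UTF-16BE encoding scheme: byte order is already fixed, so the initial
# U+FEFF is kept as a character (ZWNBSP) in the content.
as_utf16be = data.decode("utf-16-be")

print(len(as_utf16), len(as_utf16be))
```

The same bytes yield three characters under one scheme and
four under the other, which is exactly the confusing part.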

*Because* this distinction, which is required for backwards
compatibility with existing usage of U+FEFF, is rather
subtle and confusing, and *because*, nonetheless, the
idea of having a character to indicate a no-break position
is a useful one, the UTC standardized U+2060 WORD JOINER
(in Unicode 3.2) as the *preferred* character to use
for that no-break function.

In other words, if what you need is to glue things together,
i.e. a zero width no-break space *function*, then use
U+2060. If what you need is a BOM for the encoding scheme
specifications, then use U+FEFF.
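
A minimal Python sketch of that rule, assuming already-decoded
text. The helper name prefer_word_joiner is hypothetical, and
leaving an initial U+FEFF untouched is an assumption here,
since whether it is a BOM depends on the encoding scheme the
text came from:

```python
BOM = "\ufeff"          # U+FEFF
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

def prefer_word_joiner(text: str) -> str:
    """Replace every non-initial U+FEFF with U+2060.

    An initial U+FEFF is left alone (assumption: the caller decides
    whether it is a BOM for the encoding scheme in use).
    """
    if not text:
        return text
    head, tail = text[0], text[1:]
    return head + tail.replace(BOM, WORD_JOINER)

sample = "\ufeffnon\ufeffbreaking"
print(ascii(prefer_word_joiner(sample)))
```

A U+FEFF in the middle of the text can only be serving the
no-break function, so that is the case the sketch rewrites.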

What is *discouraged*, but not prohibited, of course, is
using U+FEFF for a zero width no-break space *function*,
precisely because that interacts so confusingly with
the BOM.

--Ken

