Philippe VERDY wrote:

> (In fact I also think that mapping invalid sequences to U+FFFD is also
> an error, because U+FFFD is valid, and the presence of the encoding
> error in the source is lost, and will not throw exceptions in further
> processings of the remapped text, unless the application constantly
> checks for the presence of U+FFFD in the text stream, and all modules
> in the application explicitly forbids U+FFFD within its interface...)

Mapping invalid sequences to U+FFFD is explicitly permitted by
conformance clause C12a (TUS 4.0, p. 61):

"When faced with [an] ill-formed code unit sequence while transforming
or interpreting text, a conformant process must treat the first code
unit... as an illegally terminated code unit sequence -- for example, by
signaling an error, filtering the code unit out, or representing the
code unit with a marker such as U+FFFD REPLACEMENT CHARACTER."

Of course, any subsequent process that handles this text would have to
understand this convention, and not choke if handed a U+FFFD.
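
A minimal sketch of the three options named in the clause, using
Python's built-in UTF-8 decoder. The sample bytes and the helper names
(decode_strict, has_replacement) are made up for illustration and are
not taken from the standard. The last helper shows the kind of check a
downstream process would need if it wanted to notice the marker.

```python
# Hypothetical input: the byte 0xFF can never occur in well-formed UTF-8.
data = b"abc\xffdef"

def decode_strict(raw):
    """Option 1: signal an error (reported here as None)."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None

# Option 1: the ill-formed sequence is signaled as an error.
signaled = decode_strict(data)

# Option 2: the offending code unit is filtered out.
filtered = data.decode("utf-8", errors="ignore")

# Option 3: the offending code unit is represented by U+FFFD.
replaced = data.decode("utf-8", errors="replace")

def has_replacement(text):
    """Check a later process could run to detect the U+FFFD marker.

    It cannot tell a substituted U+FFFD from one that was already
    present in well-formed input.
    """
    return "\ufffd" in text
```

With the filtering option no trace of the error remains in the result,
whereas with the U+FFFD option a later check such as has_replacement
can still find the marker.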

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

