Philippe VERDY wrote:

> (In fact I also think that mapping invalid sequences to U+FFFD is also
> an error, because U+FFFD is valid, and the presence of the encoding
> error in the source is lost, and will not throw exceptions in further
> processings of the remapped text, unless the application constantly
> checks for the presence of U+FFFD in the text stream, and all modules
> in the application explicitly forbids U+FFFD within its interface...)

Mapping invalid sequences to U+FFFD is explicitly permitted by
conformance clause C12a (TUS 4.0, p. 61):

"When faced with [an] ill-formed code unit sequence while transforming
or interpreting text, a conformant process must treat the first code
unit... as an illegally terminated code unit sequence -- for example, by
signaling an error, filtering the code unit out, or representing the
code unit with a marker such as U+FFFD REPLACEMENT CHARACTER."

Of course, any subsequent process that handles this text would have to
understand this convention, and not choke if handed a U+FFFD.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
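
P.S. To make the three options named in that passage concrete (signaling an error, filtering the code unit out, or substituting U+FFFD), here is a rough sketch using Python's built-in UTF-8 decoder and its standard error handlers. The sample bytes and the function name decode_three_ways are made up for illustration; nothing here comes from the standard itself, and other decoders may differ in how many markers they emit for longer ill-formed sequences.

```python
# Ill-formed UTF-8 (illustrative sample): 0xC3 is a lead byte that requires
# a continuation byte, but it is followed by ASCII 'b' instead.
SAMPLE = b"a\xc3b"


def decode_three_ways(raw):
    """Return (signaled, filtered, marked) for one byte string.

    Hypothetical helper: it simply runs Python's UTF-8 decoder three times,
    once per option listed in the quoted passage.
    """
    # Option 1: signal an error ("strict" raises UnicodeDecodeError).
    try:
        signaled = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        signaled = exc

    # Option 2: filter the offending code unit out.
    filtered = raw.decode("utf-8", errors="ignore")

    # Option 3: represent the offending code unit with U+FFFD.
    marked = raw.decode("utf-8", errors="replace")

    return signaled, filtered, marked


signaled, filtered, marked = decode_three_ways(SAMPLE)

# A later process that wants to know whether an error occurred upstream
# has to look for the marker explicitly.
had_replacement = "\ufffd" in marked
```

With the third option the resulting text is perfectly valid, so a downstream module only learns of the original error if it performs a check like the last line, which is the trade-off Philippe describes.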

