2013/9/18 Stephan Stiller <[email protected]>

> In what way does UTF-16 "use" surrogate code *points*? An encoding form
> is a mapping. Let's look at this mapping:
>
> - One *inputs* scalar values (not surrogate code points).

In fact the input is one code point.
Then the rest of the algorithm applies only if that code point has a scalar value (the application may or may not test for this). The standard does not specify what the converter will do otherwise, or whether it will produce conforming UTF-16 on output. Applications may still do whatever they want in that case, provided that the output is conforming to the standard whenever the input is conforming. In that case the application can claim conformance even if it uses these unspecified extensions (application conformance is distinct from conformance of the output, given that a non-standard extension in a conforming application may still produce conforming UTF-16 output... or not). Even the simple fact of the application returning an error can be considered a distinct output, which is also NOT part of the UTF-16 standard (UTF-16 contains nothing for encoding the concept of encoding errors). So conforming applications are free to:

- drop the offending code point silently, or
- generate some non-standard output, or
- replace that code point with another one that has a scalar value (no replacement character is specified in the UTF-16 standard), or
- emit some data/event on an out-of-band channel separate from the UTF-16 output stream, or
- stop the process (producing output that is truncated prematurely, or continuing but changing the status returned along with the UTF-16 output).
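A minimal sketch of what this looks like in practice: a toy UTF-16 encoder for a single code point, where the `on_error` parameter (a hypothetical name, not anything from the standard) selects among several of the conforming strategies listed above for code points that have no scalar value.

```python
def encode_utf16(cp, on_error="replace"):
    """Encode one code point to a list of UTF-16 code units (a sketch).

    Surrogate code points (U+D800..U+DFFF) have no scalar value, so their
    handling is left to the application; `on_error` picks a strategy.
    """
    if 0xD800 <= cp <= 0xDFFF:  # not a scalar value: unspecified territory
        if on_error == "replace":
            return [0xFFFD]     # substitute U+FFFD REPLACEMENT CHARACTER
        if on_error == "drop":
            return []           # drop the offending code point silently
        raise ValueError(f"lone surrogate U+{cp:04X}")  # report an error
    if cp <= 0xFFFF:
        return [cp]             # BMP code point: one code unit
    cp -= 0x10000               # supplementary plane: surrogate pair
    return [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
```

Each branch in the surrogate case produces (or refuses to produce) output that is still conforming UTF-16, which is all the standard asks of the stream itself.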

