It is much clearer now.  Thanks.  Just a few minor issues:

"Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points."

With the new definition of Unicode characters as Unicode scalar values, this excludes surrogate code points, which are also handled separately (and cause a parse error) in the step quoted below. You may want to say "Unicode code points" rather than "Unicode characters".
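To make the distinction concrete, here is a small Python sketch. The helper names `is_code_point` and `is_scalar_value` are hypothetical, chosen only for illustration. It shows that a surrogate such as U+D800 is a code point but not a scalar value (a "character" under the new definition). It also shows that a strict UTF-8 decoder, such as Python's, rejects the byte sequence ED A0 80, which would otherwise encode U+D800, and emits U+FFFD instead:

```python
def is_code_point(cp: int) -> bool:
    # Any integer in U+0000..U+10FFFF is a Unicode code point.
    return 0 <= cp <= 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    # Scalar values are all code points except the surrogates U+D800..U+DFFF.
    return is_code_point(cp) and not (0xD800 <= cp <= 0xDFFF)


# A lone surrogate is a code point but not a scalar value.
print(is_code_point(0xD800), is_scalar_value(0xD800))  # True False

# ED A0 80 is the naive UTF-8 form of U+D800; Python's strict decoder rejects
# it, and errors="replace" turns the rejected bytes into U+FFFD.
decoded = b"\xed\xa0\x80".decode("utf-8", errors="replace")
print(all(ch == "\ufffd" for ch in decoded))  # True
```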

"U+FFFD REPLACEMENT CHARACTERs" is sufficient, is used elsewhere, and probably reads better than "U+FFFD REPLACEMENT CHARACTER code points".

"All U+0000 NULL characters and code points in the range U+D800 to U+DFFF in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters and code points are parse errors."

The phrase "characters and code points" (in the second sentence) is awkward given that all characters are in fact code points.
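As an illustration only, the quoted step can be sketched in Python. The function name `preprocess` is hypothetical, and this is not the specification's own algorithm. The sketch tests each item purely as a code point, which is why a single notion ("code points") covers both U+0000 and the surrogates:

```python
from typing import Tuple


def preprocess(text: str) -> Tuple[str, int]:
    """Replace U+0000 and surrogate code points with U+FFFD.

    Returns the cleaned text and the number of replacements; the quoted
    step treats each such occurrence as a parse error.
    """
    out = []
    errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0 or 0xD800 <= cp <= 0xDFFF:
            out.append("\ufffd")
            errors += 1
        else:
            out.append(ch)
    return "".join(out), errors


# A Python str can hold lone surrogates, so such input can be built directly.
cleaned, n = preprocess("a\x00b\ud800c")
print(cleaned == "a\ufffdb\ufffdc", n)  # True 2
```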

--
Øistein E. Andersen