Re: [whatwg] Surrogate pairs and character references

Øistein E. Andersen Wed, 16 Sep 2009 17:38:19 -0700

It is much clearer now.  Thanks.  Just a few minor issues:

"Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points."

With the new definition of Unicode characters as Unicode scalar values, this excludes surrogate code points, which are also handled separately (and cause a parse error) in the step quoted below. You may want to say "Unicode code points" rather than "Unicode characters".

"U+FFFD REPLACEMENT CHARACTERs" is sufficient, used elsewhere and probably reads better than "U+FFFD REPLACEMENT CHARACTER code points".
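
To make the distinction between code points and scalar values concrete, here is a minimal Python sketch. The helper names is_surrogate and is_scalar_value are made up for illustration and do not come from the spec; the byte string is an arbitrary example, and Python's own UTF-8 decoder stands in for whatever decoder a user agent would actually use.

```python
def is_surrogate(cp: int) -> bool:
    # Surrogate code points occupy U+D800..U+DFFF.
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    # A Unicode scalar value is any code point except a surrogate.
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


# Bytes that cannot be decoded are replaced by U+FFFD REPLACEMENT CHARACTER.
# 0xFF can never appear in well-formed UTF-8, so it is replaced here.
decoded = b"ab\xffcd".decode("utf-8", errors="replace")
```

Under this definition U+D800 is a code point but not a scalar value, which is why the quoted sentence reads differently depending on which term it uses.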

"All U+0000 NULL characters and code points in the range U+D800 to U+DFFF in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters and code points are parse errors."

The phrase "characters and code points" (in the second sentence) is awkward given that all characters are in fact code points.
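
For reference, the replacement step quoted above could be sketched in Python roughly as follows. The function name preprocess and the idea of returning a parse-error count are assumptions made for illustration; the spec text only says that each occurrence is a parse error, not how errors are reported.

```python
from typing import Tuple

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def preprocess(text: str) -> Tuple[str, int]:
    """Replace U+0000 and surrogate code points, counting parse errors."""
    out = []
    parse_errors = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            # Each occurrence is replaced and counted as one parse error.
            out.append(REPLACEMENT)
            parse_errors += 1
        else:
            out.append(ch)
    return "".join(out), parse_errors
```

Note that the test is written purely in terms of code points, which is the wording the second sentence could use as well.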


--
Øistein E. Andersen
