The spec preview had a section about UTF-8 decoding and the handling
of invalid byte sequences,
http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8 . But I have
noticed this section has been removed from the current version. So what
algorithm is used for handling of invalid UTF-8
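
For illustration only, here is how one common decoder treats invalid UTF-8. This is a minimal sketch using Python's built-in codec with `errors="replace"`, which substitutes U+FFFD for bytes it cannot decode. It is not the algorithm from the spec section mentioned above, and decoders can differ in how many U+FFFD characters they emit for a single malformed sequence. The helper name `decode_utf8_lenient` is made up for this example.

```python
def decode_utf8_lenient(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for undecodable bytes.

    Hypothetical helper: relies on Python's built-in 'replace' error
    handler, not on the algorithm from any HTML specification.
    """
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8, so it is replaced.
    sample = b"caf\xc3\xa9 \xff ok"
    print(decode_utf8_lenient(sample))
```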
The spec states:
Any occurrences of any characters in the ranges U+0001 to U+0008,
U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
U+7FFFE,
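
The character list quoted above can be sketched as a membership test. The quote is cut off, so this sketch assumes the pattern of U+xFFFE and U+xFFFF pairs continues through plane 16 (U+10FFFE and U+10FFFF); that continuation and the function name `is_flagged_code_point` are assumptions, not text from the message.

```python
# Ranges taken from the quoted spec text (inclusive bounds).
FLAGGED_RANGES = (
    (0x0001, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xFDD0, 0xFDEF),
)


def is_flagged_code_point(cp: int) -> bool:
    """Return True if cp falls in the list of characters quoted above.

    Assumption: the truncated list continues with the last two code
    points of every plane up to U+10FFFF.
    """
    if not 0 <= cp <= 0x10FFFF:
        return False
    if any(low <= cp <= high for low, high in FLAGGED_RANGES):
        return True
    if cp == 0x000B:
        return True
    # U+FFFE / U+FFFF and their counterparts in each higher plane.
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


if __name__ == "__main__":
    for cp in (0x0001, 0x0041, 0x000B, 0x1FFFF):
        print(f"U+{cp:04X}: {is_flagged_code_point(cp)}")
```

Note that tab, line feed, form feed and carriage return (U+0009, U+000A, U+000C, U+000D) fall outside the quoted ranges, so the check returns False for them.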
On Thu, Oct 11, 2012 at 9:07 AM, Ian Hickson i...@hixie.ch wrote:
User agents are required to treat U+0001 the same as, say, A.
Yeah, that is how I understood the specification.
And from testing in Firefox and Chrome, it appears these characters are
ignored. But I see no mention of this anywhere to
On Wed, Oct 10, 2012 at 4:47 AM, Ian Hickson i...@hixie.ch wrote:
I could add a note... based on what Boris described, what would you want
the note to say and where would you want it placed, such that you would
have seen it when your original reading caused you to e-mail the list?
(This part
I noticed the specification usually treats null characters (U+0000) by
replacing them with the replacement character U+FFFD. In the other cases
they will be ignored by the tree construction stage when the mode is 'in
body', 'in table text', or 'in select'.
Would it not be simpler and more consistent to
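
The two behaviours described in this message can be sketched as a small function. This only restates the poster's summary (replace U+0000 with U+FFFD, except in the three named modes, where it is dropped); it is not the spec's actual tokenizer or tree-construction algorithm, and the names `IGNORING_MODES` and `handle_null_character` are hypothetical.

```python
REPLACEMENT_CHARACTER = "\ufffd"

# Insertion modes named in the message as ignoring null characters.
IGNORING_MODES = frozenset({"in body", "in table text", "in select"})


def handle_null_character(insertion_mode: str) -> str:
    """Return what a U+0000 becomes under the summary given above.

    Simplification: the real parser splits this work between the
    tokenizer and the tree construction stage; here one function
    models only the outcome as the message describes it.
    """
    if insertion_mode in IGNORING_MODES:
        return ""  # the character is dropped
    return REPLACEMENT_CHARACTER


if __name__ == "__main__":
    for mode in ("in body", "in select", "some other mode"):
        print(mode, repr(handle_null_character(mode)))
```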
On Tue, Oct 9, 2012 at 1:36 PM, Ian Hickson i...@hixie.ch wrote:
On Tue, 9 Oct 2012, Cameron Zemek wrote:
I noticed the specification usually treats null characters (U+0000) by
replacing them with the replacement character U+FFFD. In the other cases
they will be ignored by the tree construction