[whatwg] Handling of invalid UTF-8

2013-08-29 Thread Cameron Zemek
The spec preview had a section about UTF-8 decoding and the handling of invalid byte sequences (http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8), but I have noticed that this section has been removed from the current version. So what algorithm is used for handling invalid UTF-8?
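The thread does not say where the removed section went. Our understanding is that the byte-level rules now live in the UTF-8 decoder of the WHATWG Encoding Standard. The Python sketch below is modelled on that decoder's steps: each ill-formed subsequence becomes one U+FFFD, and a byte that cuts a sequence short is reprocessed as the start of a new one. It is illustrative only, not a conformance-tested implementation, and `utf8_decode` and `REPLACEMENT` are hypothetical names.

```python
REPLACEMENT = "\uFFFD"


def utf8_decode(data: bytes) -> str:
    """Decode bytes as UTF-8, emitting U+FFFD for each ill-formed subsequence."""
    out = []
    code_point = 0
    bytes_seen = 0
    bytes_needed = 0
    lower, upper = 0x80, 0xBF  # allowed range for the next continuation byte
    i = 0
    while i < len(data):
        byte = data[i]
        if bytes_needed == 0:
            if byte <= 0x7F:
                out.append(chr(byte))
            elif 0xC2 <= byte <= 0xDF:
                bytes_needed, code_point = 1, byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    lower = 0xA0  # reject overlong 3-byte forms
                elif byte == 0xED:
                    upper = 0x9F  # reject encoded surrogates
                bytes_needed, code_point = 2, byte & 0x0F
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    lower = 0x90  # reject overlong 4-byte forms
                elif byte == 0xF4:
                    upper = 0x8F  # reject code points above U+10FFFF
                bytes_needed, code_point = 3, byte & 0x07
            else:
                out.append(REPLACEMENT)  # C0, C1, F5-FF, or a stray continuation byte
            i += 1
            continue
        if not (lower <= byte <= upper):
            # Sequence cut short: emit one U+FFFD and reprocess this byte
            # as the start of something new (i is deliberately not advanced).
            code_point = bytes_needed = bytes_seen = 0
            lower, upper = 0x80, 0xBF
            out.append(REPLACEMENT)
            continue
        lower, upper = 0x80, 0xBF
        code_point = (code_point << 6) | (byte & 0x3F)
        bytes_seen += 1
        i += 1
        if bytes_seen == bytes_needed:
            out.append(chr(code_point))
            code_point = bytes_needed = bytes_seen = 0
    if bytes_needed != 0:
        out.append(REPLACEMENT)  # input ended in the middle of a sequence
    return "".join(out)


print(utf8_decode(b"\xe2\x82A") == "\ufffdA")  # truncated euro sign, then "A"
```

Reprocessing the offending byte, rather than swallowing it, is what keeps the "A" in the usage line above from being lost along with the broken sequence.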

[whatwg] Control and Undefined Characters

2012-10-10 Thread Cameron Zemek
The spec states: Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE,
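A minimal Python sketch of a membership test for the characters quoted above. The excerpt is cut off at U+7FFFE, so the sketch assumes the U+xFFFE/U+xFFFF pattern continues through every plane up to U+10FFFF. In line with the reply in this thread that U+0001 is treated "the same as, say, A", the scanner only reports offsets and leaves the text unchanged. `is_flagged_code_point` and `flag_parse_errors` are hypothetical names.

```python
from typing import List


def is_flagged_code_point(cp: int) -> bool:
    """True for the control and undefined characters listed in the quoted spec text."""
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F or 0x007F <= cp <= 0x009F:
        return True
    if 0xFDD0 <= cp <= 0xFDEF or cp == 0x000B:
        return True
    # Assumption: U+xFFFE and U+xFFFF in every plane (0 through 16) are included.
    # Masking with 0xFFFE matches exactly when the low 16 bits are FFFE or FFFF.
    return cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def flag_parse_errors(text: str) -> List[int]:
    """Return offsets of flagged characters; the characters themselves are kept."""
    return [i for i, ch in enumerate(text) if is_flagged_code_point(ord(ch))]


print(flag_parse_errors("a\x01b\x0bc"))  # [1, 3]
```

Note that tab, line feed, form feed and carriage return (U+0009, U+000A, U+000C, U+000D) fall outside the quoted ranges and are not flagged.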

Re: [whatwg] Control and Undefined Characters

2012-10-10 Thread Cameron Zemek
On Thu, Oct 11, 2012 at 9:07 AM, Ian Hickson i...@hixie.ch wrote: User agents are required to treat U+0001 the same as, say, A. Yeah, that is how I understood the specification. And testing in Firefox and Chrome, it appears these characters are ignored. But I see no mention of this anywhere to

Re: [whatwg] Null characters

2012-10-09 Thread Cameron Zemek
On Wed, Oct 10, 2012 at 4:47 AM, Ian Hickson i...@hixie.ch wrote: I could add a note... based on what Boris described, what would you want the note to say and where would you want it placed, such that you would have seen it when your original reading caused you to e-mail the list? (This part

[whatwg] Null characters

2012-10-08 Thread Cameron Zemek
I noticed the specification usually treats null characters (U+0000) by replacing them with the replacement character U+FFFD. In the other cases they are ignored by the tree construction stage when the insertion mode is 'in body', 'in table text', or 'in select'. Would it not be simpler and more consistent to
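A toy Python model of the two behaviours the post contrasts: a NULL becomes U+FFFD, except in the three insertion modes the post names, where it is dropped. This simplifies the spec's actual split between tokenizer and tree construction, and `IGNORING_MODES`, `handle_null` and `process` are hypothetical names; the mode strings are copied from the post.

```python
REPLACEMENT = "\uFFFD"

# Insertion modes named in the post, where a NULL reaching tree construction is ignored.
IGNORING_MODES = {"in body", "in table text", "in select"}


def handle_null(mode: str) -> str:
    """What a single U+0000 turns into under the behaviour described in the post."""
    return "" if mode in IGNORING_MODES else REPLACEMENT


def process(text: str, mode: str) -> str:
    """Apply the NULL policy for one mode to every character of the text."""
    return "".join(handle_null(mode) if ch == "\0" else ch for ch in text)


print(process("a\0b", "in body") == "ab")          # dropped
print(process("a\0b", "other") == "a\ufffdb")      # replaced
```

Seen this way, the proposal in the post amounts to making `handle_null` return the same value regardless of mode.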

Re: [whatwg] Null characters

2012-10-08 Thread Cameron Zemek
On Tue, Oct 9, 2012 at 1:36 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 9 Oct 2012, Cameron Zemek wrote: I noticed the specification usually treats null characters U+0000 by replacing them with the replacement character U+FFFD. The other cases it will be ignored by the tree construction