[whatwg] Handling of invalid UTF-8

2013-08-29 Thread Cameron Zemek
The spec preview had a section about UTF-8 decoding and the handling
of invalid byte sequences,
http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8 . But I have
noticed that this section has been removed from the current version. So what
algorithm is used for handling invalid UTF-8 byte sequences? Or is this no
longer part of the HTML 5 specification?

My testing in Firefox and Chrome seems to indicate that they replace the
first byte of an invalid sequence with the replacement character
(U+FFFD, �, see http://en.wikipedia.org/wiki/Replacement_character) and
then continue parsing from the next byte.
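
For illustration, the behaviour observed above can be sketched in a few lines
of Python. This is only a minimal sketch of the "replace the first byte, then
resume at the next byte" rule as described in this message; it is not the
algorithm text of any specification, and the function name
decode_replace_first_byte is made up for this example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_replace_first_byte(data: bytes) -> str:
    """Decode UTF-8; on an invalid sequence, emit U+FFFD for its first
    byte only and resume decoding at the very next byte.

    Hypothetical helper that mirrors the behaviour described in the
    message, not a transcription of a specification.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII byte: always valid on its own.
            out.append(chr(b))
            i += 1
            continue
        # Number of continuation bytes a well-formed lead byte requires.
        if 0xC2 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
        elif 0xF0 <= b <= 0xF4:
            need = 3
        else:
            # Stray continuation byte or a byte that can never start
            # a sequence (0x80-0xC1, 0xF5-0xFF).
            out.append(REPLACEMENT)
            i += 1
            continue
        chunk = data[i:i + need + 1]
        try:
            # A strict decode rejects truncated, overlong and surrogate
            # sequences.
            out.append(chunk.decode("utf-8"))
            i += need + 1
        except UnicodeDecodeError:
            # Replace only the first byte, then retry from the next one.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


# 0xE2 starts a three-byte sequence, but 0x28 ("(") is not a continuation
# byte, and the trailing 0xA1 is a stray continuation byte.
print(ascii(decode_replace_first_byte(b"\xe2\x28\xa1")))  # '\ufffd(\ufffd'
```

Note that a decoder that treats a truncated but otherwise valid prefix (for
example the bytes E2 82 followed by an ASCII letter) as a single error will
emit fewer replacement characters for that input than this sketch does, so
the rule above is only an approximation of what browsers do.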


Re: [whatwg] Handling of invalid UTF-8

2013-08-29 Thread Glenn Maynard
On Thu, Aug 29, 2013 at 5:29 PM, Cameron Zemek grom...@gmail.com wrote:

 In the spec preview it had a section about UTF-8 decoding and the handling
 of invalid byte sequences,
 http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8 . But I have
 noticed this section has been removed from the current version. So what
 algorithm is used for handling of invalid UTF-8 byte sequences? Or this no
 longer part of the HTML 5 specification?


http://www.whatwg.org/specs/web-apps/current-work/#dependencies has a
reference to the Encoding spec, which is where the UTF-8 decoding logic
lives now: http://encoding.spec.whatwg.org/#utf-8

-- 
Glenn Maynard


Re: [whatwg] Handling of invalid UTF-8

2013-08-29 Thread Ian Hickson
On Fri, 30 Aug 2013, Cameron Zemek wrote:

 In the spec preview it had a section about UTF-8 decoding and the 
 handling of invalid byte sequences, 
 http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8 

You really don't want to be using that as a reference. It's a very
out-of-date copy of a fork of the spec.

On Thu, 29 Aug 2013, Glenn Maynard wrote:
 
 http://www.whatwg.org/specs/web-apps/current-work/#dependencies has a 
 reference to the Encoding spec, which is where the UTF-8 decoding logic 
 lives now: http://encoding.spec.whatwg.org/#utf-8

Right, the HTML standard (http://whatwg.org/html) now uses the Encoding 
standard (http://encoding.spec.whatwg.org/) to define UTF-8 processing.

Let us know if you see anything wrong with either of these specs!

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'