On Jan 6, 2012, at 8:10 PM, Glenn Maynard <gl...@zewt.org> wrote:

> On Fri, Jan 6, 2012 at 7:36 PM, Jarred Nicholls <jar...@webkit.org> wrote:
>> Correction: rfc4627 doesn't describe BOM detection, it describes zero-byte 
>> detection.  My question remains, though: what exactly are you doing?  Do you 
>> do zero-byte detection?  Do you do BOM detection?  What's the order of 
>> precedence between zero-byte and/or BOM detection, HTTP Content-Type 
>> headers, and overrideMimeType if they disagree?  All of this would need to 
>> be specified; currently none of it is.
> 
> None of that matters if a single codec is the be-all and end-all.  If that's 
> the consensus, then that's it, period.
> 
> WebKit shares a single text decoder globally for HTML, XML, plain text, etc.; 
> the XHR payload runs through it before being passed to JSON.parse.  Read the 
> code if you're interested.  I would need to change the text decoder to skip 
> BOM detection for this one case, unless the spec added wording about 
> discarding the response when the encoding != UTF-8; then that could be 
> enforced entirely in XHR with no decoder changes.  I don't want to get hung up 
> on explaining WebKit's specific impl. details.
> 
> All of the details I asked about are user-visible, not WebKit implementation 
> details, and would need to be specified if encodings other than UTF-8 were 
> allowed.  I do think this should remain UTF-8 only, but if you want to 
> discuss allowing other encodings, these are things that would need to be 
> defined (which requires a clear proposal, not "read the code").

Of course; I apologize, I didn't mean it as a dismissal.  I just figured that if 
we are settled on one codec, I'd spare us the time.  I'm also on mobile :) 
I could provide you those details, assuming no decoding changes (enforcement) 
were made in WebKit, if you'd like.  But since this is a new API, we might as 
well just stick to UTF-8.
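
For reference, the zero-byte detection that RFC 4627 (section 3) describes can 
be sketched roughly like this in Python.  It only illustrates the null-octet 
table from the RFC: it does no BOM handling and says nothing about precedence 
against Content-Type or overrideMimeType, which is exactly the part that would 
still need specifying.  The function name detect_json_encoding is hypothetical.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess a JSON text's encoding from the pattern of null octets in its
    first four bytes, per the table in RFC 4627 section 3:

        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8

    Sketch only: BOMs are not considered, and inputs shorter than four
    octets simply fall back to UTF-8.
    """
    if len(data) >= 4:
        b0, b1, b2, b3 = data[:4]  # indexing bytes yields ints in Python 3
        # UTF-32 patterns are checked first because they also match the
        # looser UTF-16 patterns.
        if b0 == 0 and b1 == 0 and b2 == 0:
            return "utf-32-be"
        if b0 == 0 and b2 == 0:
            return "utf-16-be"
        if b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"
        if b1 == 0 and b3 == 0:
            return "utf-16-le"
    return "utf-8"


if __name__ == "__main__":
    raw = '{"a": 1}'.encode("utf-16-le")
    print(json.loads(raw.decode(detect_json_encoding(raw))))  # {'a': 1}
```

Every one of those branches is a user-visible behavior that a spec would have 
to pin down; restricting to UTF-8 removes all of them.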

> 
> I assume it's not using the exact same decoder logic as HTML.  After all, 
> that would allow non-Unicode encodings.

Not exact, but close.  For discussion's sake and in this context, you could 
call it the "Unicode" text decoder, which does BOM detection and switches 
Unicode codecs automatically.  For enforced UTF-8 I'd (have to) disable the BOM 
detection, but I could additionally avoid decoding altogether if the specified 
encoding is not explicitly UTF-8 (if that were part of the spec).  We'll make 
it work either way :)
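
To make the difference concrete, here is a rough Python sketch of the two 
behaviors: a "Unicode" decoder that lets a leading BOM switch codecs, versus an 
enforced-UTF-8 decoder that never does.  This is not WebKit's code; the 
function names are hypothetical, and two policy choices are assumptions: 
the enforced decoder returns None when a non-UTF-8 charset was declared, and 
it drops a leading UTF-8 BOM via Python's "utf-8-sig" codec.

```python
import codecs
from typing import Optional

# BOMs a sniffing decoder might look for, longest first, so the UTF-32LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniffing_decode(data: bytes, declared: str = "utf-8") -> str:
    """BOM detection wins over the declared encoding; the BOM is stripped."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name, errors="replace")
    return data.decode(declared, errors="replace")


def enforced_utf8_decode(data: bytes,
                         declared: Optional[str] = None) -> Optional[str]:
    """Always UTF-8; a UTF-16/32 BOM can never switch the codec.

    Hypothetical policy: if a charset other than UTF-8 was declared, skip
    decoding entirely and return None.
    """
    if declared is not None and codecs.lookup(declared).name != "utf-8":
        return None
    # "utf-8-sig" drops a leading UTF-8 BOM if present (an assumed choice).
    return data.decode("utf-8-sig", errors="replace")
```

With a UTF-16LE payload that starts with a BOM, sniffing_decode recovers the 
JSON text, while enforced_utf8_decode yields U+FFFD replacement characters 
that JSON.parse would then reject.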

> 
> -- 
> Glenn Maynard
> 
