Re: [websec] #22: content-type sniffing should include charset sniffing

Adam Barth Sun, 23 Oct 2011 20:38:03 -0700

I mean, that's how the code works, so it must be possible.  :)

Adam



On Sun, Oct 23, 2011 at 8:32 PM, Larry Masinter <[email protected]> wrote:
> I know it's complicated, but scanning text is necessarily part of determining 
> which application/something+xml  you have.  I think (but should really check 
> before saying this) that XML media type registrations describe what the 
> DOCTYPE or XML namespace or root element are, and that, to properly "sniff" 
> them, you'd have to scan text. But before you scan text, you have to 
> determine charset.
>
> So if we're going to support sniffing of media types in general, I don't see 
> how we can do that without also specifying charset determination.
>
>
>
> Larry
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Adam Barth
> Sent: Sunday, October 23, 2011 8:28 PM
> To: Tobias Gondrom
> Cc: [email protected]
> Subject: Re: [websec] #22: content-type sniffing should include charset 
> sniffing
>
> The charset sniffing is also complicated by the fact that sometimes user 
> agents need to parse some of the HTML to find a <meta> element.
> In some situations, user agents need to restart the parsing algorithm, which 
> is quite delicate and better to describe in the same document as HTML parsing 
> (at least for use by HTML processing engines).
>
> Adam
>
>
> On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom <[email protected]> 
> wrote:
>> <hat="individual">
>> I tend not to agree with that.
>>
>> The fact that charset sniffing might happen at the same time as
>> mime-sniffing does not seem like a strong argument to include this in
>> the draft.
>>
>> Furthermore I would rather have these issues separate:
>> First you determine the content-type and then after that you may want
>> to determine the charset used within that content-type (if you really
>> have to sniff the charset). I can also imagine that the charset sniffing
>> algorithm might depend on the application identified by the
>> sniffed mime-type, which again would speak against throwing it in together
>> with mime-sniffing.
>>
>> Kind regards, Tobias
>>
>>
>>
>> On 24/10/11 00:55, websec issue tracker wrote:
>>>
>>> #22: content-type sniffing should include charset sniffing
>>>
>>>  the HTML5 spec contains some algorithms for sniffing charset, overriding
>>>  labeled charset, etc.
>>>
>>>  MIME parameters like charset are as much a part of the content-type as the
>>>  base internet media type, and any sniffing of parameters and other
>>>  metadata (overriding content-type or guessing where it is not supplied or
>>>  wrong) should be included in this document, since the sniffing will happen
>>>  at the same time.
>>>
>>
>> _______________________________________________
>> websec mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/websec
>>
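To make the ordering argument in this thread concrete (charset has to be settled before any text scanning can identify an XML vocabulary), here is a minimal Python sketch. It is not taken from any message in the thread, from the draft, or from any browser's code. The function names (guess_charset, sniff_xml_root), the UTF-8 default, and the simplified regular expressions are all assumptions made for illustration; real user agents follow the much more detailed algorithms in the HTML and XML specifications.

```python
import codecs
import re
from typing import Optional, Tuple

# Longest BOMs first, so a UTF-32 LE BOM is not misread as UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified patterns (assumption: no DOCTYPE internal subset, no CDATA tricks).
_XML_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
_SKIP = re.compile(r"<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>", re.S)
_ROOT = re.compile(r"<(?:[A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)[^>]*>")


def guess_charset(data: bytes, default: str = "utf-8") -> Tuple[str, int]:
    """Step 1: pick a charset. Returns (charset, number of BOM bytes to skip)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    match = _XML_ENCODING.match(data)
    if match:
        name = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(name)  # reject labels Python cannot decode
            return name, 0
        except LookupError:
            pass
    return default, 0


def sniff_xml_root(data: bytes) -> Tuple[str, Optional[str]]:
    """Step 2: decode with the charset from step 1, then scan text for the root."""
    charset, skip = guess_charset(data)
    text = data[skip:].decode(charset, errors="replace")
    # Drop the XML declaration, processing instructions, comments and DOCTYPE.
    text = _SKIP.sub("", text)
    match = _ROOT.search(text)
    return charset, (match.group(1) if match else None)


if __name__ == "__main__":
    doc = codecs.BOM_UTF16_LE + '<svg xmlns="http://www.w3.org/2000/svg"/>'.encode(
        "utf-16-le"
    )
    # A naive byte scan misses the root element in UTF-16 input.
    print(b"<svg" in doc)
    print(sniff_xml_root(doc))
```

The byte-level check in the usage example is the point of the sketch: in UTF-16 the bytes of "<svg" are interleaved with zero bytes, so a scan that skips charset determination does not see the root element at all.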
