Re: [websec] #22: content-type sniffing should include charset sniffing

Adam Barth Sun, 23 Oct 2011 21:38:33 -0700

One way to look at the situation is that the sniffing algorithm
operates on octets, not on characters, which is actually what the
draft says.  In that view, it's perfectly well-defined without
reference to character sets.  The fact that the octets happen to
correspond to certain ASCII characters is somewhat beside the point.
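
To make the octets-versus-characters point concrete, here is a minimal
sketch (not the algorithm from the draft). The signature table and the
sniff() helper are hypothetical names chosen for illustration, and the
patterns are just a few well-known magic numbers. Nothing in it decodes
the input, so no charset is ever consulted:

```python
# Hypothetical signature table: (octet pattern, media type).
# Illustrative only; this is NOT the table from the draft.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF89a", "image/gif"),
    (b"<?xml", "text/xml"),
]

DEFAULT_TYPE = "application/octet-stream"


def sniff(data: bytes, default: str = DEFAULT_TYPE) -> str:
    """Return the first media type whose octet pattern is a prefix of data.

    The comparison is on raw bytes; the input is never decoded to text.
    """
    for pattern, media_type in SIGNATURES:
        if data.startswith(pattern):
            return media_type
    return default
```

In this sketch, a document whose octets differ (say, the same XML
declaration encoded as UTF-16) simply does not match the b"<?xml"
pattern and falls through to the default, unless the table also lists
those octets.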


Adam


On Sun, Oct 23, 2011 at 9:14 PM, Larry Masinter <[email protected]> wrote:
> I was talking about the necessary dependency of the specifications -- that 
> you couldn't specify media type sniffing completely without making at least a 
> normative reference to charset sniffing.
>
> The fact that the code works that way is evidence, of course, but we're not
> talking about the possibility of implementation (where a single implementation
> is evidence) but rather the orthogonality of interfaces (where the question is
> whether ALL implementations must follow this pattern).
>
> Larry
>
>
>
>
> -----Original Message-----
> From: Adam Barth [mailto:[email protected]]
> Sent: Sunday, October 23, 2011 8:37 PM
> To: Larry Masinter
> Cc: Tobias Gondrom; [email protected]
> Subject: Re: [websec] #22: content-type sniffing should include charset 
> sniffing
>
> I mean, that's how the code works, so it must be possible.  :)
>
> Adam
>
>
> On Sun, Oct 23, 2011 at 8:32 PM, Larry Masinter <[email protected]> wrote:
>> I know it's complicated, but scanning text is necessarily part of
>> determining which application/something+xml you have.  I think (but should
>> really check before saying this) that XML media type registrations describe
>> what the DOCTYPE, XML namespace, or root element is, and that, to properly
>> "sniff" them, you'd have to scan text. But before you scan text, you have to
>> determine the charset.
>>
>> So if we're going to support sniffing of media types in general, I don't see 
>> how we can do that without also specifying charset determination.
>>
>>
>>
>> Larry
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On
>> Behalf Of Adam Barth
>> Sent: Sunday, October 23, 2011 8:28 PM
>> To: Tobias Gondrom
>> Cc: [email protected]
>> Subject: Re: [websec] #22: content-type sniffing should include
>> charset sniffing
>>
>> The charset sniffing is also complicated by the fact that sometimes user 
>> agents need to parse some of the HTML to find a <meta> element.
>> In some situations, user agents need to restart the parsing algorithm, which 
>> is quite delicate and better to describe in the same document as HTML 
>> parsing (at least for use by HTML processing engines).
>>
>> Adam
>>
>>
>> On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom <[email protected]> 
>> wrote:
>>> <hat="individual">
>>> I tend not to agree with that.
>>>
>>> The fact that charset sniffing might happen at the same time as
>>> mime-sniffing does not seem like a strong argument to include this in
>>> the draft.
>>>
>>> Furthermore, I would rather keep these issues separate:
>>> first you determine the content-type, and after that you may want
>>> to determine the charset used within that content-type (if you really
>>> have to sniff the charset). I can also imagine that the charset-sniffing
>>> algorithm might depend on the application identified by the sniffed
>>> mime-type, which again would speak against throwing it in together
>>> with mime-sniffing.
>>>
>>> Kind regards, Tobias
>>>
>>>
>>>
>>> On 24/10/11 00:55, websec issue tracker wrote:
>>>>
>>>> #22: content-type sniffing should include charset sniffing
>>>>
>>>>  the HTML5 spec contains some algorithms for sniffing charset,
>>>>  overriding labeled charset, etc.
>>>>
>>>>  MIME parameters like charset are as much a part of the content-type
>>>>  as the base internet media type, and any sniffing of parameters and
>>>>  other metadata (overriding content-type or guessing where it is not
>>>>  supplied or wrong) should be included in this document, since the
>>>>  sniffing will happen at the same time.
>>>>
>>>
>>> _______________________________________________
>>> websec mailing list
>>> [email protected]
>>> https://www.ietf.org/mailman/listinfo/websec
>>>
>>
>
_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec
