I agree with Adam and Tobias that we should not pull all of charset
sniffing into this document. Many charset details depend on the MIME
type in the first place, and are carefully described in the respective
specs. For some transfer protocols, the question of charset may be
irrelevant (e.g. for text over WebSocket, which prescribes and checks
for UTF-8).
Larry is right that in some cases, some preliminary charset sniffing is
necessary to get at some information at the start of the document, but I
think we should strictly limit this draft to these cases.
Regards, Martin.
On 2011/10/24 13:14, Larry Masinter wrote:
I was talking about the necessary dependency of the specifications -- that you
couldn't specify media type sniffing completely without making at least a
normative reference to charset sniffing.
The fact that the code works that way is evidence, of course, but we're not
talking about possibility of implementation (where a single implementation is
evidence) but rather orthogonality of interfaces (where the question is whether
ALL implementations must follow this pattern.)
Larry
-----Original Message-----
From: Adam Barth [mailto:[email protected]]
Sent: Sunday, October 23, 2011 8:37 PM
To: Larry Masinter
Cc: Tobias Gondrom; [email protected]
Subject: Re: [websec] #22: content-type sniffing should include charset sniffing
I mean, that's how the code works, so it must be possible. :)
Adam
On Sun, Oct 23, 2011 at 8:32 PM, Larry Masinter <[email protected]> wrote:
I know it's complicated, but scanning text is necessarily part of determining which
application/something+xml you have. I think (but should really check before saying
this) that XML media type registrations describe what the DOCTYPE or XML namespace or
root element are, and that, to properly "sniff" them, you'd have to scan text.
But before you scan text, you have to determine charset.
So if we're going to support sniffing of media types in general, I don't see
how we can do that without also specifying charset determination.
Larry
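To make the dependency described above concrete, here is a minimal Python sketch. It is an illustration only, not taken from any spec or from this thread: the function names guess_charset and sniff_root_element are hypothetical, the UTF-8 fallback is an assumption, and the regex-based scan would be fooled by markup-like text inside comments. It only shows that the root element cannot be read until some charset has been chosen.

```python
import codecs
import re

# Byte-order marks. UTF-32 comes before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumption: without a BOM the prefix is ASCII-compatible, so the XML
# declaration can be matched directly on the raw bytes.
_XML_DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')

# First start tag. "<?" and "<!" constructs never match, because a name
# start character must follow the "<".
_START_TAG = re.compile(r"<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)")


def guess_charset(prefix: bytes) -> str:
    """Step 1: pick a charset from the BOM or the XML declaration."""
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name
    match = _XML_DECL.match(prefix)
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # only accept names Python can decode
            return name
        except LookupError:
            pass
    return "utf-8"  # assumed default for this sketch


def sniff_root_element(prefix: bytes):
    """Step 2: decode with the guessed charset, then scan the text."""
    text = prefix.decode(guess_charset(prefix), errors="replace")
    text = text.lstrip("\ufeff")  # drop a decoded BOM, if any
    match = _START_TAG.search(text)
    return match.group(1) if match else None
```

For example, a UTF-16 document is unreadable as ASCII bytes, so sniff_root_element has to go through guess_charset before it can see the root element name at all.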
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Adam Barth
Sent: Sunday, October 23, 2011 8:28 PM
To: Tobias Gondrom
Cc: [email protected]
Subject: Re: [websec] #22: content-type sniffing should include
charset sniffing
The charset sniffing is also complicated by the fact that sometimes user agents need
to parse some of the HTML to find a <meta> element.
In some situations, user agents need to restart the parsing algorithm, which is
quite delicate and better to describe in the same document as HTML parsing (at
least for use by HTML processing engines).
Adam
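The restart behaviour mentioned above can be sketched roughly as follows. This is a deliberately simplified Python illustration, not the HTML5 algorithm: the function name decode_with_restart, the windows-1252 tentative encoding, and the single regular expression standing in for real HTML parsing are all assumptions made for the example.

```python
import codecs
import re

# Crude stand-in for parsing: find a charset declared in a <meta> element.
_META_CHARSET = re.compile(
    r'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def decode_with_restart(body: bytes, tentative: str = "windows-1252"):
    """Return (text, charset, restarted).

    The body is first decoded with a tentative charset. If a <meta>
    element declares a different charset that Python knows, decoding
    starts again from the first byte with the declared charset.
    """
    text = body.decode(tentative, errors="replace")
    match = _META_CHARSET.search(text)
    if match is None:
        return text, tentative, False
    declared = match.group(1).lower()
    try:
        declared_codec = codecs.lookup(declared).name
    except LookupError:
        return text, tentative, False  # unknown label: keep the first pass
    if declared_codec == codecs.lookup(tentative).name:
        return text, tentative, False  # same codec: no restart needed
    # Restart: throw away the first pass and decode again from byte 0.
    return body.decode(declared, errors="replace"), declared, True
```

Even in this toy form, the second pass depends on output from the first, which is the kind of coupling to the parser that makes the real algorithm delicate.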
On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom <[email protected]> wrote:
<hat="individual">
I tend not to agree with that.
The fact that charset sniffing might happen at the same time as
mime-sniffing does not seem like a strong argument to include this in
the draft.
Furthermore, I would rather keep these issues separate:
First you determine the content-type, and only after that may you want
to determine the charset used within that content-type (if you really
have to sniff the charset). I can also imagine that the charset sniffing
algorithm might depend on the application identified by the
sniffed mime-type, which again would speak against throwing it in together
with mime-sniffing.
Kind regards, Tobias
On 24/10/11 00:55, websec issue tracker wrote:
#22: content-type sniffing should include charset sniffing
The HTML5 spec contains some algorithms for sniffing charset, overriding
labeled charset, etc.
MIME parameters like charset are as much a part of the content-type as the
base internet media type, and any sniffing of parameters and other
metadata (overriding content-type or guessing where it is not supplied or
wrong) should be included in this document, since the sniffing will happen
at the same time.
_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec