<section anchor="intro" title="Introduction">
<t>HTTP provides a way of labeling content with its
Content-Type, as an indication of the file format or language by
which the content is to be interpreted. Unfortunately, many web
servers, as deployed, supply incorrect Content-Type header
fields with their HTTP responses. In order to be compatible
with these servers, web clients consider the content of
HTTP responses as well as the Content-Type header fields when
determining how the content is to be interpreted (the "effective
media type"). Looking at content to determine its type (also known as
"sniffing") is also used when no Content-Type header field is
supplied.</t>
Seemed important to define "sniffing".
<list style="symbols">
<t>Q: Why doesn't file upload sniff?</t>
<t>Q: Where is the concept of 'privilege' defined?</t>
<t>Q: Why not treat sniffed content as a
different origin to prevent XSS?</t>
</list>
I'm not sure, but perhaps at least some of the bigger unaddressed issues could
be in the document? Probably the "status of this document" should just point to
the tracker and I should enter these things as issues; I'm not sure how the
group wants to track them.
<t>However, overly ambitious sniffing has resulted in a number
of security issues in the past. For example, consider a simple
server which allows users to upload content, which is then
served as simple content such as plain text or an image.
However, if the content is subsequently 'sniffed' to be active
content, a malicious user might be able to leverage
content sniffing to mount a cross-site scripting attack by
including JavaScript code in the uploaded file that a user agent
treats as text/html.</t>
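To make the upload scenario concrete, here is a minimal Python sketch. The
function name naive_sniff, the hint list, and the 512-byte window are all
hypothetical choices made for illustration; they are not taken from any user
agent or from this draft.

```python
# Hypothetical illustration: an overly ambitious sniffer that overrides
# the server's label whenever the body looks like HTML.
HTML_HINTS = (b"<html", b"<script", b"<body")  # assumed, illustrative list


def naive_sniff(declared_type: str, body: bytes) -> str:
    """Return the effective media type chosen by a too-eager sniffer."""
    prefix = body[:512].lower()  # inspect only the first 512 bytes (assumption)
    if any(hint in prefix for hint in HTML_HINTS):
        return "text/html"  # the label is ignored: this is the dangerous step
    return declared_type


# An attacker uploads a "text file" that the server labels text/plain.
upload = b"Hello!\n<script>alert(document.cookie)</script>\n"
effective = naive_sniff("text/plain", upload)
print(effective)
```

The server did nothing wrong here: it labeled the upload as text/plain. The
script only becomes active because the receiver replaced that label with
text/html.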
As I noted before, I wish there were more examples of sniffing security issues
since that's the main justification for this document, at least as a 'websec'
document.
<t>This document describes a method for sniffing that carefully
balances the compatibility needs of user agent implementors with the
security constraints.</t>
I only changed "algorithm" to "method" because of the many unspecified options
(e.g., how long to wait for additional data).
<t>Often, sniffing is done in a context where the use
of the data retrieved is not merely for independent presentation,
but for embedding (as an image, as video) or other uses
(as a style sheet, a script). </t>
I think this is the crux of some additional material, where you know that
you're sniffing a font or a script or a style sheet, and that knowledge
influences the sniffing decision.
<t>One can consider 'sniffing' in several categories:
<list style="symbols">
<t>Content delivered via a channel which does not allow
supplying Content-Type </t>
<t>Content delivered via HTTP, but no Content-Type supplied</t>
<t>Content-Type is malformed</t>
<t>Content-Type is duplicated with different values</t>
<t>Content-Type is syntactically legal, but content clearly
does not match constraints of specified content-type.</t>
<t>Content-Type is syntactically legal, content may actually match
constraints of specified content-type, but the content
is intended for use in a limited context, in which the
content could also be interpreted as another type.</t>
<t>Content matches the specified content-type constraints, and that
type is appropriate for the context of use, but there is some
other belief that content has been mislabeled.</t>
</list></t>
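The first four categories above can be told apart from the supplied labels
alone, without looking at the content. A rough Python sketch follows; the
function name classify, the returned strings, and the loose regular expression
are assumptions for illustration, not the full media-type grammar. Comparing
only type/subtype for duplicates is likewise a simplification.

```python
import re
from typing import Optional, Sequence

# Loose check for "type/subtype" with optional parameters; an assumption
# for illustration, not the full media-type grammar.
_MEDIA_TYPE = re.compile(
    r"^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+\s*(;.*)?$"
)


def classify(content_types: Optional[Sequence[str]]) -> str:
    """Classify the supplied Content-Type values.

    None means the delivery channel cannot carry a Content-Type at all;
    an empty sequence means the channel could, but none was supplied.
    """
    if content_types is None:
        return "channel-cannot-supply-type"
    if len(content_types) == 0:
        return "no-content-type"
    values = [v.strip() for v in content_types]
    if any(not _MEDIA_TYPE.match(v) for v in values):
        return "malformed"
    # Compare only type/subtype, case-insensitively (a simplification).
    essences = {v.split(";", 1)[0].strip().lower() for v in values}
    if len(essences) > 1:
        return "duplicated-with-different-values"
    return "syntactically-legal"
```

The remaining categories all start from the "syntactically-legal" result and
need the content itself, plus the context of use, to distinguish.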
<t>The supplied content-type usually comes from HTTP, but in
some situations, the link to the content contains a
content-type. (For example, in a style sheet or script.)
</t>
This is trying to address the question of when sniffing might result in "false
positives". The main issue is that sniffing needs to come up with a
definitive answer ("what is this") even in situations where the signature of
the data is consistent with multiple results (data could be interpreted as
application/octet-stream, text/plain, application/xml,
application/something1+xml, or application/something2+xml, and all of those
match the signature data). The same issue happens with zip-based packaging
formats...
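A small Python sketch of that ambiguity: the signature table below is
hypothetical and deliberately tiny, and the function name candidate_types is
made up for illustration. The point is only that one byte prefix can be
consistent with several media types at once.

```python
from typing import Dict, List

# Hypothetical signature table: each byte prefix is consistent with several
# media types. The entries are illustrative, not a normative list.
SIGNATURES: Dict[bytes, List[str]] = {
    b"<?xml": [
        "text/plain",
        "application/xml",
        "application/xhtml+xml",
        "image/svg+xml",
    ],
    b"PK\x03\x04": [  # ZIP local file header
        "application/zip",
        "application/epub+zip",
        "application/java-archive",
    ],
}


def candidate_types(data: bytes) -> List[str]:
    """Return every media type whose signature is consistent with the data."""
    candidates = ["application/octet-stream"]  # any byte sequence qualifies
    for prefix, types in SIGNATURES.items():
        if data.startswith(prefix):
            candidates.extend(types)
    return candidates
```

For an XML prolog this returns five candidates, yet a sniffer has to pick
exactly one of them; the signature alone cannot say which.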
<t>For ftp: and file: resources, the file extension is also examined.</t>
The widget packaging recommendation, which normatively references some version
of sniffing, also uses file extensions for some content and not others, but I
haven't figured out yet where that belongs.
<t> The methods described here have been constructed with
reference to content sniffing algorithms present in popular user
agents, an extensive database of existing web content, and
metrics collected from implementations deployed to a sizable
number of users <xref target="BarthCaballeroSong2009" />.</t>
<t>For reasons discussed in http://www.w3.org/2001/tag/doc/mime-respect,
sniffing should be avoided when the content could reasonably be
interpreted as the content-type supplied. If it is necessary to sniff
in such situations, it is preferable to do so only with care, e.g.,
by offering the user an alternative or explicit choice, or by noting
and remembering origins which have content that requires sniffing.</t>
This should turn into a reference. I know current implementors don't want to
bother warning users that their favorite sites actually are sending out
incorrect MIME labels, but we should still recommend it.
<t>Sniffing is by its nature a heuristic process, because there are
many situations where content matches the signatures and capabilities
of many different possible content-type values. False positives result
in security problems, while inconsistent sniffing results in
interoperability problems. For these reasons, any receiver of content
that attempts to follow the guidelines in this document MUST NOT
produce any value other than those permitted
in this specification.</t>
I'm still not sure what the scope of this document is, insofar as whether it is
normative for every browser.
Perhaps the best thing is to try to explicitly address "scope" by moving those
parts of the introduction which address scope into a separate section.
Larry
_______________________________________________
websec mailing list
https://www.ietf.org/mailman/listinfo/websec