On Sun, Jan 8, 2012 at 9:12 AM, Larry Masinter <[email protected]> wrote:
> <section anchor="intro" title="Introduction">
> <t>HTTP provides a way of labeling content with its
> Content-Type, as an indication of the file format / language by
> which the content is to be interpreted. Unfortunately, many web
> servers, as deployed, supply incorrect Content-Type header
> fields with their HTTP responses. In order to be compatible
> with these servers, web clients would consider the content of
> HTTP responses as well as the Content-Type header fields when
> determining how the content was interpreted (the "effective
> media type"). Looking at content to determine its type (aka
> "sniffing") is also used when no Content-Type header is
> supplied.</t>
>
> Seemed important to define “sniffing”.
>
> <list style="symbols">
> <t> Q: Why doesn't file upload sniff? </t>

Because it hasn't historically.

> <t>Q: where is the concept
> of 'privilege' defined?</t>

RFC 6454, but we might want to update the terminology to "authority"
to better align with that document.

> <t> Why not treat sniffed content as a
> different origin to prevent XSS? </t>

I answered this question in my previous mail.

> </list>
>
> I’m not sure, but at least some of the bigger unaddressed issues could be in
> the document? Probably the “status of this document” should just point to
> the tracker and I should enter in things as issues, not sure how the group
> wants to track these.

Referencing the tracker seems fine, but I would assume that's true of
every working document in every IETF working group.

> <t>However, overly ambitious sniffing has resulted in a number
> of security issues in the past. For example, consider a simple
> server which allows users to upload content, which is then
> served as simple content such as plain text or an images.
> However, if the content is subsequently 'sniffed' to be active
> content; for example, a malicious user might be able to leverage
> content sniffing to mount a cross-site script attack by
> including JavaScript code in the uploaded file that a user agent
> treats as text/html.</t>
>
> As I noted before, I wish there were more examples of sniffing security
> issues since that’s the main justification for this document, at least as a
> ‘websec’ document.

Feel free to add a reference to
<http://www.adambarth.com/papers/2009/barth-caballero-song.pdf>, which
contains a number of concrete attacks.

> <t>This document describes a method for sniffing that carefully
> balances the compatibility needs of user agent implementors with the
> security constraints.</t>
>
> I only changed “algorithm” to “method” because of the many unspecified
> options (e.g., how long to wait for additional data).
>
> <t>Often, sniffing is done in a context where the use
> of the data retrieved is not merely for independent presentation,
> but for embedding (as an image, as video) or other uses
> (as a style sheet, a script). </t>
>
> I think this is the crux of some additional material, where you know that
> you’re sniffing a font or a script or a style sheet, and that knowledge
> influences the sniffing decision.
>
> <t>One can consider 'sniffing' in several categories:
>
> <list style="symbols">
> <t>Content delivered via a channel which does not allow
> supplying Content-Type </t>
> <t>Content delivered via HTTP, but No Content-Type
> supplied</t>
> <t>Content-Type is malformed</t>
> <t>Content-Type is duplicated with different values</t>
> <t>Content-Type is syntactically legal, but content clearly
> does not match constraints of specified content-type. </t>
> <t>Content-Type is syntactically legal, content may actually match
> constraints of specified content-type, but the content
> is intended for use in a limited context, in which the
> content could also be interpreted as another type.</t>
> <t>Content matches the specified content-type constraints, and that
> type is appropriate for the context of use, but there is some
> other belief that content has been mislabeled.</t>
> </list></t>

I'm not sure what the point of this taxonomy is.

> <t>The supplied content-type usually comes from HTTP, but in
> some situations, the link to the content contains a
> content-type. (For example, in a style sheet or script.)
> </t>
>
> This is trying to address the question of when sniffing might result in
> “false positives”.
> The main issue is that sniffing needs to come up with a
> definitive answer (“what is this”) even in situations where the signature of
> the data is consistent with multiple results (data could be interpreted as
> application/octet-stream, text/plain, application/xml,
> application/something1+xml, application/something2+xml, and all of those
> match the signature data; same issue happens with zip-based packaging
> formats…

Why not just say that then.

> <t>ftp: and file: resources also examine the file extension.</t>
>
> The widget packaging recommendation, which normatively references some
> version of sniffing, also uses file extensions for some content and not
> others, but I haven’t figured out yet where that belongs.

The widget spec is very confused. I would pay more attention to code
that's been widely deployed.

> <t> The methods described here have been constructed with
> reference to content sniffing algorithms present in popular user
> agents, an extensive database of existing web content, and
> metrics collected from implementations deployed to a sizable
> number of users <xref target="BarthCaballeroSong2009" />.</t>
>
> <t>For reasons discussed in
> http://www.w3.org/2001/tag/doc/mime-respect,
> sniffing should be avoided when the content could likely be reasonably
> interpreted as the content-type supplied. If it is necessary to sniff
> in such situations, it is preferable to do so only with care, e.g.,
> by offering the user an alternative or explicit choice, or by noting
> and remembering origins which have content that requires sniffing.</t>

I strongly disagree with this last paragraph. If you have your heart
set on adding it, let's discuss it in a separate thread first.

> This should turn into a reference. I know current implementors don’t want
> to bother warning users that their favorite sites actually are sending out
> incorrect MIME labels, but we should still recommend it.

We shouldn't recommend behavior that implementations aren't going to
implement.

> <t>Sniffing is by its nature a heuristic process, because there are
> many situations where content matches the signatures and capabilities
> of many different possible content-type values.

I disagree with this statement as well. The sniffing we're talking
about here is not a heuristic. It's a historical anomaly that needs
to be corrected for in order for user agents to be compatible with
some web sites.

> False positives result
> in security problems, while inconsistent sniffing results in
> interoperability problems. For these reasons, implementations of
> any receiver of content, attempting to follow the guidelines in this
> document, MUST NOT result in any value other than those permitted
> in this specification.</t>
>
> I’m still not sure what the scope of this document is, insofar as whether it
> is normative for every browser.

It does.

> Perhaps the best thing is to try to explicitly address “scope” by moving
> those parts of the introduction which address scope into a separate section.

Adam
_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec
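
Illustration of the upload scenario quoted in this message (content
served as text/plain but treated by the user agent as text/html). This
is only a toy sketch, not the algorithm from the draft: the signature
list, the 512-byte window, and the function names naive_sniff and
conservative_sniff are all assumptions made up for the example.

```python
# Toy illustration only; NOT the sniffing algorithm from the draft.
# HTML_SIGNATURES, the 512-byte window, and both function names are
# hypothetical choices made for this example.

HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<script")


def naive_sniff(declared_type: str, body: bytes) -> str:
    """Overly ambitious: let the body override any declared type."""
    head = body[:512].lstrip().lower()
    if any(head.startswith(sig) for sig in HTML_SIGNATURES):
        return "text/html"  # active content, whatever the label said
    return declared_type


def conservative_sniff(declared_type: str, body: bytes) -> str:
    """Assumed policy: never promote text/plain or image/* to HTML."""
    if declared_type == "text/plain" or declared_type.startswith("image/"):
        return declared_type
    return naive_sniff(declared_type, body)


if __name__ == "__main__":
    # A user uploads this file; the server labels it text/plain.
    upload = b"<script>alert(document.cookie)</script>"
    print(naive_sniff("text/plain", upload))
    print(conservative_sniff("text/plain", upload))
```

With the naive function the uploaded file becomes text/html and the
script would run in the hosting site's origin; the conservative variant
keeps the declared text/plain label.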
