By the way, my colleague Juan has been doing an analysis of CFNetwork's content sniffing algorithm and it looks like CFNetwork doesn't have a heuristic for GIF. Based on our data, this is the second most important heuristic. Safari should see a noticeable compatibility gain from this convergence effort.
Adam

On Thu, Oct 9, 2008 at 10:38 AM, Adam Barth <[EMAIL PROTECTED]> wrote:
> Currently, every WebKit port has to implement its own content sniffing
> algorithm. This is problematic for compatibility and security. We
> should implement a content sniffing algorithm in WebCore so that it
> can be used by every port.
>
> Background
>
> A number of web servers don't properly set the Content-Type header
> when they serve responses. One common misconfiguration is to not send
> a Content-Type header at all or to send a bogus Content-Type header
> (e.g., with a value like "(null)" or "application/unknown"). To
> render these sites correctly, all browsers employ content sniffing
> algorithms that look at the contents of the response to determine the
> type of the resource.
>
> Some browsers have very aggressive content sniffing algorithms that
> often change the type of a resource. This can be dangerous if a web
> server allows users to upload content, such as images, and the browser
> treats these resources as HTML because this lets an attacker XSS the
> site. Designing a content sniffing algorithm is a careful balancing
> act between compatibility and security.
>
> WebKit
>
> WebKit itself does not contain a content sniffing algorithm, leaving
> each port to design its own. For example, Safari and Chromium each
> implement their own content sniffing algorithm and I imagine (although
> I haven't tested) that other ports do so as well. This causes
> unnecessary compatibility issues between different WebKit ports and
> leaves each port to fend for itself in avoiding the security
> pitfalls.
>
> I think it makes sense for WebCore itself to implement one content
> sniffing algorithm that every port can use. One starting point for
> this common implementation is the Chromium content sniffer, which is
> open source. A number of Chromium contributors, myself included, have
> spent a lot of effort tuning that content sniffer to maximize
> compatibility while minimizing attack surface, and we'd like everyone
> to benefit from our efforts.
>
> Standardization
>
> We've also been working with the HTML 5 working group on standardizing
> content sniffing algorithms across all browsers. Eventually, I'd like
> to see WebKit's content sniffer converge with the HTML 5
> specification. This process will likely involve the WebKit content
> sniffer and the HTML 5 specification evolving over time towards
> convergence.
>
> Feedback
>
> I'm sending this email to the list to get buy-in from the rest of the
> WebKit community on the general direction of implementing a content
> sniffer. I'd also like specific feedback about which content sniffing
> heuristics you think are important to include. As a starting point
> for discussion, you can see the Chromium content sniffer here:
>
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc?view=markup
>
> The top of that file has some comments that explain some of the
> guiding design choices in the algorithm and a comparison with the
> behavior of some other browsers.
>
> Adam
>
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
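
For readers unfamiliar with the technique, here is a minimal sketch of the kind of heuristic this thread discusses: sniffing a GIF only when the Content-Type header is missing or bogus. It is an illustration under stated assumptions, not the Chromium, CFNetwork, or HTML 5 algorithm. The names sniff_content_type, BOGUS_TYPES, and GIF_MAGIC are hypothetical; the bogus values are the two quoted in the message, and the GIF signatures are the standard "GIF87a" and "GIF89a" file headers.

```python
from typing import Optional

# Content-Type values treated as "no usable type" in this sketch (assumption):
# the empty string stands for a missing header, the other two are the bogus
# values quoted in the message above.
BOGUS_TYPES = {"", "(null)", "application/unknown"}

# Every GIF file begins with one of these two six-byte signatures.
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def sniff_content_type(declared: Optional[str], body: bytes) -> Optional[str]:
    """Return a MIME type for a response, sniffing only when needed.

    Hypothetical helper: a real sniffer would apply many more heuristics.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    normalized = (declared or "").split(";")[0].strip().lower()

    if normalized not in BOGUS_TYPES:
        # The server supplied a plausible type, so keep it. Never overriding
        # it is the conservative choice: an uploaded image is not reinterpreted.
        return normalized

    if body.startswith(GIF_MAGIC):
        return "image/gif"

    # Unknown: this sketch gives up rather than guessing.
    return None
```

Called as sniff_content_type("(null)", data), the sketch returns "image/gif" for data that starts with a GIF signature and None otherwise. Restricting sniffing to missing or bogus headers reflects the compatibility-versus-security trade-off described in the quoted background.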

