On Mon, Jul 13, 2009 at 12:49 AM, Jonas Sicking<jo...@sicking.cc> wrote:
> 2. How do we deal with identifying libraries.
> As Aaron Boodman pointed out, SHA hashes means that you can't make
> upgrades for security problems etc.

I think a hash would be fine.  You don't want to load a different
version of the library than the author intended, even if it may
supposedly have bug fixes.  Perhaps the author is relying on the bugs
that were fixed.  This should be transparent, so that the same script
will always be used whether browsers support this feature or not.

> 3. Compat when the browser doesn't have a library cached.
> A solution that includes using a different uri for browsers that do
> support caching than browsers that don't is scary since there is a big
> risk that the author will forget to update one but not the other.

How about you have an extra HTTP header like "X-Content-Hash"?  This
could provide a SHA256 hash (or something else that looks safe for
now, progressively upgradeable) of the content.  The browser can keep
its cached copies of these files indexed by hash.  If it tries
downloading a file, and notices that the hash is the same as a file
already downloaded, it can terminate the HTTP connection and use the
existing file (even if it's from a different site).  It will then
proceed as though it had actually downloaded the file: e.g., it will
respect the Expires headers separately (two sites might serve the same
file but have different expectations about how likely it is to
change).

Of course, when the file is downloaded, it would be indexed based on
actual SHA256, not the reported SHA256.  This way you can't poison the
cache.  If an incorrect hash is provided, that will potentially result
in the wrong script being used, of course, but you can't generate a
preimage of the bad hash (hopefully!), so attackers can't exploit it.
Unless they can inject HTTP headers, but then you're screwed anyway.

So this would basically be like ETag, but with the tag unique across
all sites.  In fact, maybe piggybacking it on ETag would be a better
idea, so you don't have to do stuff like terminate the TCP connection
unexpectedly.  (Which I don't *think* would be harmful? but is
certainly suboptimal.)  Just define a magic ETag format that nobody's
currently using.

I'm not sure how useful this would actually be, though.  Are there
really *so* many sites using the *exact* same version of jQuery or any
other single library?  It seems like a better idea would be to get all
the valuable user-created APIs standardized and implemented in some
form cross-browser, as a few (e.g., getElementsByClassName) have been.
 Then the common uses of jQuery or whatever would just be
compatibility layers for legacy browsers.  Kind of like how a bunch of
Boost is getting added to standard C++.  Of course, this requires more
effort, but it's more likely to actually work, in the long term.

> I believe all these problems are solvable, but they do need to be
> solved before considering inclusion in any spec. I'm also not
> convinced this needs to be included in the HTML5 spec, but that might
> depend on what the ultimate solution looks like.

The most obvious place to solve this seems to be HTTP, not HTML.  HTTP
is closer to the resource itself.  If you do something with HTML, like
an extra <link> attribute, then you're going to get authors updating
the HTML but not the thing it points to or vice versa.  An ETag-like
solution would be implemented either in the web server or whatever
script is serving the content, and those should always know whether
the file has changed.  (Modulo pathological behavior like something
changing the file and then forging the mtime/ctime.)

So yeah, doesn't seem like something for HTML 5.

Reply via email to