Hello,

as of 2012, some websites include popular JavaScript libraries from CDNs, such as Google's. The benefits are:

* Traffic savings for the site operator, because the JavaScript libraries are downloaded from the CDN and not from the site that uses them.
* If enough sites refer to the same external file, the browser will already have cached it, so even on a first visit the (potentially large) JavaScript file does not have to be downloaded.

There are however some drawbacks to this approach:

* Security: The site operator is trusting an external site. If the CDN serves a malicious file, it leads directly to code execution in the browser under the domain settings of the site including it (a form of cross-site scripting).
* Availability: The site depends on the CDN being available. If the CDN is down, the site may not work at all.
* Privacy: The CDN sees a request, including the HTTP Referer header, for every visitor of the site.
* Extra DNS lookup if the file is not already cached.
* Extra HTTP connection (a persistent connection cannot be reused, because it is a different site) if the file is not cached.

I am proposing a solution that solves all these problems, keeps the benefits and offers some extra advantages:

1. The site stores a copy of the library file(s) on its own site.
2. The web page includes the library from the site itself instead of from the CDN.
3. The script tag specifies a checksum calculated using a cryptographic hash function.

With this solution, whenever a browser downloads a file and stores it in the local cache, it also records the file's checksum. When a page references a file together with a checksum, the browser can check its cache for an (identical) file with that checksum, no matter what URL it was originally retrieved from, and use it instead of downloading the file again.
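To illustrate, here is a rough sketch in Python of how such a lookup could work. The cache structure, the function names and the fetch() callback are made up for illustration; this is not a real browser API, just the idea:

import hashlib

class ChecksumCache:
    """Toy model of a browser cache keyed by content checksum."""

    def __init__(self):
        self.by_digest = {}  # SHA-256 hex digest -> cached file contents

    def lookup(self, digest):
        # A hit counts no matter which URL the file was originally fetched from.
        return self.by_digest.get(digest)

    def store(self, contents):
        digest = hashlib.sha256(contents).hexdigest()
        self.by_digest[digest] = contents
        return digest

def load_script(cache, url, expected_digest, fetch):
    """fetch(url) -> bytes performs the download; it is only called on a cache miss."""
    cached = cache.lookup(expected_digest)
    if cached is not None:
        return cached  # no network access needed at all
    contents = fetch(url)
    if hashlib.sha256(contents).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch, refusing to use the file")
    cache.store(contents)
    return contents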

This suggestion has previously been discussed here ( http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-November/thread.html#7825 ), though for a different purpose (file integrity instead of caching identical files from different sites), and I don't feel the points raised back then apply here.

If a library is popular, chances are that many sites include the identical file and it will already be in the browser's cache. No network access is necessary to use it, which also improves the users' privacy. It does not matter that the sites store the library file at different URLs: the file is always identified by its checksum, so the cached copy can be used more often.

The checksum is specified using the fragment identifier component of the URI (RFC 3986, section 3.5).

Here's an example using a SHA-256 checksum:

<script type="text/javascript"
        src="jquery-1.7.1.min.js#!sha256!8171413fc76dda23ab32baa17b11e4fff89141c633ece737852445f1ba6c1bd"></script>

This will not break existing implementations, because the fragment is currently ignored for external JavaScript files.
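A site operator could generate the checksum value with a few lines of Python (hashlib is part of the standard library; the helper below is hypothetical and simply follows the #!sha256! syntax proposed above):

import hashlib

def script_tag(src, path):
    # Hash the exact bytes the server will deliver for this file.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return ('<script type="text/javascript" src="%s#!sha256!%s"></script>'
            % (src, digest))

print(script_tag("jquery-1.7.1.min.js", "jquery-1.7.1.min.js"))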

The checksum can be used for other HTML elements as well, if other kinds of files (style sheets, media files) start being used in identical versions on many websites.

There is of course the other interesting use of these checksums (which has previously been discussed on this list): making it trivial for the user to check the data integrity of downloads, when the checksum is added to the <a> tag. The MDHashTool already uses this idea: http://mdhashtool.mozdev.org/lfinfo.html
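As a sketch of what a browser or download tool could do with such a link, assuming only the #!<algorithm>!<hexdigest> fragment syntax proposed above (the function name is made up):

import hashlib
from urllib.parse import urlsplit

def verify_download(href, contents):
    """Check downloaded bytes against a '#!sha256!<hexdigest>' fragment, if present."""
    fragment = urlsplit(href).fragment
    if not fragment.startswith("!"):
        return True  # no checksum in the link, nothing to verify
    _, algorithm, expected = fragment.split("!", 2)
    return hashlib.new(algorithm, contents).hexdigest() == expected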

The integrity check could also prove to be a disadvantage: if a file is modified but the checksum in the web page is not updated, the file will no longer be used (a feature, not a bug).

I look forward to your comments and suggestions.

-Sven Neuhaus
-- 
Sven Neuhaus                                       Web: www.evision.de
Head of mobile development                       Tel.: +49 231 47790-0
Evision GmbH                           E-Mail: [email protected]
Wittekindstr. 105                Registergericht: Amtsgericht Dortmund
44139 Dortmund                               Registernummer: HRB 12477
