(If you missed the start of this thread, look here: http://allmydata.org/pipermail/tahoe-dev/2009-August/002724.html and http://allmydata.org/pipermail/tahoe-dev/2009-September/002804.html)

After a chat with Tyler Close, a few more ideas came to mind. He pointed out that a stubborn problem with web applications these days is that browser HTTP caches are not doing as much good as developers expect. Despite frequently-used things like "jquery.js" being cacheable, a substantial portion of clients show up to the server empty-handed and must re-download the library on pretty much every visit.

One theory is that the browser caches are too small, and pretty much everything is marked as cacheable, so it's just simple cache-thrashing. There's no way for the server to say "I'll be telling you to load jquery.js a lot, so prioritize it above everything else, and keep it in cache as long as you can". And despite hundreds of sites all using the exact same copy of jquery.js, there's no way to share those cached copies, which increases the cache pressure even more. Google is encouraging web developers to pull jquery.js from a google.com server to reduce that pressure, but of course that puts you at their mercy from a security point of view: they (plus everyone else who can meddle with your bits on the wire) can inject arbitrary code into that copy of jquery.js and compromise millions of secure pages.

So the first idea is that the earlier "#hash=XYZ" URL annotation could also be treated as a performance-improving feature. Basically, the browser's cache would gain an additional index that uses the "XYZ" secure hash (and *not* the hostname or full URL) as the key. Any fetch that occurs with this same XYZ annotation could be served from the local cache, without touching the network. As long as the previously-described rules were followed (downloads of a #hash=XYZ URL are validated against the hash, and rejected on mismatch), the cache could only be populated with validated files, and this would be a safe way to share common files between sites.
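To make that concrete, here's a rough sketch of what the hash-indexed lookup might look like. Everything here is illustrative: I'm assuming the annotation carries a hex-encoded SHA-256 digest (the real proposal might encode it differently), and `hashCache` / `fetchWithHash` are made-up names standing in for the browser's internal machinery.

  // Illustrative only: `hashCache` and `fetchWithHash` are made-up names
  // for the browser's internal machinery, and the digest encoding is
  // assumed (not specified by the proposal) to be hex SHA-256.
  const hashCache = new Map<string, ArrayBuffer>();

  async function fetchWithHash(url: string): Promise<ArrayBuffer> {
    const match = new URL(url).hash.match(/^#hash=([0-9a-f]{64})$/);
    if (!match) {
      // No annotation: fall back to a normal, origin-keyed fetch.
      return (await fetch(url)).arrayBuffer();
    }
    const expected = match[1];

    // Cache hit: serve the previously-validated bytes, no matter which
    // site originally caused them to be downloaded.
    const cached = hashCache.get(expected);
    if (cached !== undefined) return cached;

    // Cache miss: download, then validate before use (and before caching).
    const body = await (await fetch(url)).arrayBuffer();
    const digest = await crypto.subtle.digest("SHA-256", body);
    const actual = Array.from(new Uint8Array(digest))
      .map((b) => b.toString(16).padStart(2, "0"))
      .join("");
    if (actual !== expected) {
      throw new Error("hash mismatch, rejecting " + url);
    }

    hashCache.set(expected, body); // only validated bytes ever enter the cache
    return body;
  }

The important property is that nothing enters the shared cache unless it matched its hash, so a poisoned copy served to one site can't leak into another site's pages.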
The second idea involves some of the capability-security work, specifically Mark Miller's "Caja" group, which has developed a secure subset of JavaScript. Part of the capability world's effort is to identify different properties that a given object has, as determined by mechanical auditing of the code that implements that object. One of these properties is called "DeepFrozen", which basically means that the object has no mutable state and no access to mutable state. If Alice and Bob (who are both bits of code, in this example) share access to a DeepFrozen object, there's no way for one of them to affect the other through that object: they might as well have two identical, independent copies of the same object.

The common "memoization" technique depends upon the function being optimized being DeepFrozen, which is what guarantees that it will always produce the same output for any given input. (Note that this doesn't mean the function can't create, say, a mutable array and manipulate it while it runs; it just means it can't retain that array from one invocation to the next.)
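As a toy illustration of that requirement (my example, not Caja's machinery): a memoizer is only sound when the function it wraps can't reach or retain mutable state between calls.

  // `square` closes over nothing mutable, so caching its results is safe.
  // `counted` retains state between invocations (NOT DeepFrozen), so a
  // memoized copy would silently diverge from the original.
  function memoize(fn: (x: number) => number): (x: number) => number {
    const results = new Map<number, number>();
    return (x) => {
      if (!results.has(x)) results.set(x, fn(x));
      return results.get(x)!;
    };
  }

  const square = (x: number) => x * x;              // effectively DeepFrozen

  let calls = 0;
  const counted = (x: number) => x * x + calls++;   // not DeepFrozen

  const fastSquare = memoize(square);    // sound: same answer every time
  const fastCounted = memoize(counted);  // unsound: cached answers go stale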
So the second idea is that, if your #hash=XYZ-validated jquery.js library can be proven to be DeepFrozen (say, by passing it through the Caja verifier with a flag that says "only accept DeepFrozen classes" or something), then not only can you cache the JavaScript source code, you can also cache the parse tree, saving the time and memory needed to re-parse and evaluate the same source code on every single page load. (Incidentally, it is quite likely that jquery.js would pass a DeepFrozen auditor, or could be made to do so fairly easily: anything written in a functional style avoids using much mutable state.) This requires both the DeepFrozen property (which makes it safe to share the parsed data structure) and the #hash=XYZ validation (which ensures that the data structure was really generated from the right JS source code).

I know that one of the Caja goals is to eventually get the verifier functionality into the browser, since that's where it can do the most good. If that happened, then the performance improvements to be had by writing verifiable code and using strong URL references could serve as a carrot to draw developers toward these new techniques.
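A rough sketch of what that shared parse cache might look like, again with made-up names: `loadScript`, `Verifier`, and `Parser` are stand-ins for a Caja-style auditor and the engine's real parser, and `source` is assumed to have already passed the #hash=XYZ check described above.

  type Ast = unknown;                            // stands in for the engine's parse tree
  type Verifier = (source: string) => boolean;   // e.g. a Caja-style DeepFrozen auditor
  type Parser = (source: string) => Ast;         // the engine's own parser

  const parseCache = new Map<string, Ast>();     // keyed by the #hash=XYZ digest

  function loadScript(hash: string, source: string,
                      isDeepFrozen: Verifier, parse: Parser): Ast {
    // `source` has already been validated against `hash`, so the key really
    // does name these exact bytes.
    const cached = parseCache.get(hash);
    if (cached !== undefined) return cached;     // skip re-parsing entirely

    const ast = parse(source);
    if (isDeepFrozen(source)) {
      // Safe to share across pages and sites: the hash pins the exact source,
      // and DeepFrozen means no page can reach mutable state through it.
      parseCache.set(hash, ast);
    }
    return ast;
  }

Both checks matter: drop the hash and one site can poison another's parse tree; drop the DeepFrozen audit and two pages sharing the structure could interfere with each other through it.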
thoughts?

cheers,
 -Brian