Re: [whatwg] Subresource Integrity-based caching
On Fri, Mar 3, 2017 at 11:01 PM, Alex Jordan wrote:
> On Fri, Mar 03, 2017 at 09:21:20AM +0100, Anne van Kesteren wrote:
>> I think https://github.com/w3c/webappsec-subresource-integrity/issues/22
>> is the canonical issue, but no concrete ideas thus far.
>
> Great, thanks! I've got some thoughts on potential solutions; where
> would be the best place to put those - here or on GitHub? I'm assuming
> the latter but figured I'd ask :)

You are assuming correctly.

-- 
https://annevankesteren.nl/
Re: [whatwg] Subresource Integrity-based caching
I'd like to apologize to Alex Jordan for mistaking him for James Roper, and vice versa mistaking James Roper for Alex Jordan. In the previous email, when I said "your" as in "your suggestion" I meant to refer to Alex, while the hash comments were meant for James. I got confused by an email from James containing a fully quoted copy of the earlier email in which I quoted Alex, but with no text or comments from James, and I assumed for a moment it was the same person with different email addresses (work vs. private, or an alt, which is not unusual). I hope this confusion won't derail the topic entirely.

-- 
Roger Hågensen, Freelancer, Norway.
Re: [whatwg] Subresource Integrity-based caching
On 2017-03-03 01:02, James Roper wrote:
>> How about your misunderstanding of the fact that a hash can only ever
>> guarantee that two resources are different? A hash cannot guarantee that
>> two resources are the same. A hash does imply a high probability that they
>> are the same, but it can never guarantee it; such is the nature of a hash.
>> A carefully tailored jquery.js that matches the hash of the "original
>> jquery.js" could be crafted and contain a hidden payload. Now the browser
>> suddenly injects this script into all websites that the user visits that
>> use that particular version of jquery.js, which I'd call an extremely
>> serious security hole. You can't rely on length either, as that could also
>> be padded to match. Not to mention that this is also crossing the CORS
>> threshold (the first instance being from a different domain than the
>> current page, for example). Accidental (natural) collision probabilities
>> for sha256/sha384/sha512 are very low, but intentional ones are higher
>> than accidental ones.
>
> This is completely wrong. No one has *ever* produced an intentional
> collision in sha256 or greater.

Huh? When did I ever state that? I never said that sha256 or higher has been broken; do not put words/lies in my mouth, please. I find that highly offensive. I said "could"; just ask any cryptographer. It is highly improbable and completely impractical to attempt, but theoretically possible (the current state of quantum computing has not shown any magic bullet yet). I'm equally concerned with a natural collision: while the probability is incredibly small, the chance is 50/50 (if we imagine all files containing random data and random lengths, which they don't).

And as to my statement "a hash can only ever guarantee that two resources are different. A hash cannot guarantee that two resources are the same": again, that is true. You can even test this by using small enough hashes (CRC-4 or something simple) and editing a file, and you'll see that what I say is true. You know how these types of hashes work, right? They are NOT UNIQUE; if you want something unique, that is called a "perfect hash", which is not something you want to use for cryptography. If a hash like sha256 were unique, it would be a compression miracle, as you could then just "uncompress" the hash. Only if the data you hash is the same size as the hash can you perfectly re-create the data that is hashed. Which is what I proposed with my UUID suggestion. Do note that I'm talking about Version 1 UUIDs and not the random Version 4 ones, which are not unique.

> In case you missed the headlines, last week Google announced it created a
> sha1 collision. That is the first, and only, known sha1 collision ever
> created. This means sha1 is broken, and must not be used. Now it's unlikely
> (as in, it's not likely to happen in the history of a billion universes),
> but it is possible that at some point in the history of sha256 a collision
> was accidentally created. This probability is non-zero, which is greater
> than the impossibility of intentionally creating a collision, hence it is
> more likely that we will get an accidental collision than an intentional
> collision.

sha1 still has its uses. I haven't checked recently, but sha1, just like md5, is still OK to use with HMAC. It's also odd that you say sha1 should not be used at all; there is nothing wrong with using it as a file hash/checksum. With the number of files and the increase in data sizes, CRC32 is not that useful (unless you divide the file into chunks and provide a CRC32 array instead).
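[Editorial sketch, not part of the original thread.] To make the "test this with a small enough hash" point concrete, here is a minimal Node.js sketch that brute-forces a collision on a deliberately tiny hash (SHA-256 truncated to one byte, roughly CRC-8 strength). The inputs are made up; with full-length sha256/sha384/sha512 digests the same search is computationally infeasible, which is the crux of the disagreement above.

```js
// Minimal sketch: truncate SHA-256 to one byte to emulate a tiny hash,
// then search for two different inputs that share the same (tiny) digest.
const { createHash } = require('crypto');

function tinyHash(input) {
  // First byte of SHA-256 -- about as strong as an 8-bit checksum.
  return createHash('sha256').update(input).digest()[0];
}

const seen = new Map();
for (let i = 0; ; i++) {
  const candidate = `payload-${i}`; // arbitrary made-up inputs
  const h = tinyHash(candidate);
  if (seen.has(h)) {
    console.log(`collision: "${seen.get(h)}" and "${candidate}" both hash to ${h}`);
    break;
  }
  seen.set(h, candidate);
}
```

By the pigeonhole principle a collision is guaranteed within 257 attempts here; the same program pointed at the full 256-bit digest would, for all practical purposes, never terminate.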
A hash is not the right way to do what you want; a UUID and one (or multiple) trusted shared cache(s) is. The issue with using a hash is that at some point sha256 could become deprecated; does the browser start ignoring it then? Should it behave as if the javascript file had no hash, or as if it's potentially dangerous now? Also note that a UUID can be made into a valid URI, but I suggested adding an attribute instead, as that would make older browsers/versions "forward compatible", since the URL still works normally.

And to try not to run your idea entirely into the ground: it's not detailed enough. By that I mean you would need a way for the web designer to inform the browser that they do not want the scripts hosted on their site replaced by ones from another site. Requiring an opt-out is a pain in the ass, and where security is concerned one should never have to "opt out to get more secure"; one should be more secure by default. Which means that you would need to add another attribute, or modify the integrity one, to allow cache sharing (see the sketch below). Now, myself, I would never do that; even if the hash matches, I'd never feel comfortable running a script originating from some other site in the page I'm delivering to my visitor. I would not actually want the browser to even cache my script and provide that to other sites' pages. I might however feel comfortable adding
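[Editorial sketch, not part of the original thread.] A minimal browser-side sketch of how the opt-in, UUID-keyed shared cache discussed above could behave. Every name here is hypothetical: the `uuid` and `shared-cache` attributes and the `sharedCache` object are not part of any spec or browser; only `el.src`, `el.integrity`, and the `fetch()` integrity option are real platform features.

```js
// Hypothetical sketch only: a page opts in per script tag, and the browser
// consults a trusted shared cache keyed by UUID before hitting the network.
async function resolveScript(el, sharedCache) {
  const uuid = el.getAttribute('uuid');            // hypothetical attribute
  const optedIn = el.hasAttribute('shared-cache'); // hypothetical opt-in switch
  if (uuid && optedIn && sharedCache.has(uuid)) {
    // Serve the script body from the trusted shared cache.
    return sharedCache.get(uuid);
  }
  // Otherwise fetch normally, still enforcing the integrity attribute.
  const response = await fetch(el.src, { integrity: el.integrity || undefined });
  return response.text();
}
```

A plain Map keyed by UUID strings would be enough to experiment with the idea; the interesting part is the opt-in attribute, which keeps the default behaviour unchanged.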
Re: [whatwg] Subresource Integrity-based caching
On Thu, Mar 2, 2017 at 6:07 PM, Domenic Denicola wrote:
> I don't know what the latest is on attempting to get around this, although
> that document suggests some ideas.

I think https://github.com/w3c/webappsec-subresource-integrity/issues/22 is the canonical issue, but no concrete ideas thus far.

-- 
https://annevankesteren.nl/
Re: [whatwg] Subresource Integrity-based caching
On 3 Mar. 2017 00:09, "Roger Hågensen" wrote:
> On 2017-03-02 02:59, Alex Jordan wrote:
>> Here's the basic problem: say I want to include jQuery in a page. I
>> have two options: host it myself, or use a CDN.
>
> Not to be overly pedantic, but you might re-evaluate the need for jquery
> and other such frameworks. "HTML5" now does pretty much the same as these
> older frameworks with the same or less amount of code.
>
>> The fundamental issue is that there isn't a direct correspondence to
>> what a resource's _address_ is and what the resource _itself_ is. In
>> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
>> CDN are the exact same resource in terms of content, but are
>> considered different because they have different addresses.
>
> Yes and no. The URI is a unique identifier for a resource. If the URI is
> different then it is not the same resource. The content may be the same,
> but the resource is different. You are mixing up resource and content in
> your explanation. Address and resource are in this case the same thing.
>
>> 2. This could potentially be a carrot used to encourage adoption of
>> Subresource Integrity, because it confers a significant performance
>> benefit.
>
> This can be solved by improved web design. Serve a static page (and don't
> forget gzip compression), and then background-load the script and extra
> CSS etc. By the time the visitor has read/looked/scanned down the page the
> scripts are loaded. There is however some bandwidth-savings merit in your
> suggestion.
>
>> ...That's okay, though, because the fact that it's based on a hash
>> guarantees that the cache matches what would've been sent over the
>> network - if these were different, the hash wouldn't match and the
>> mechanism wouldn't kick in.
>> ...
>> Anyway, this email is long enough already but I'd love to hear
>> thoughts about things I've missed, etc.
>
> How about your misunderstanding of the fact that a hash can only ever
> guarantee that two resources are different? A hash cannot guarantee that
> two resources are the same. A hash does imply a high probability that they
> are the same, but it can never guarantee it; such is the nature of a hash.
> A carefully tailored jquery.js that matches the hash of the "original
> jquery.js" could be crafted and contain a hidden payload. Now the browser
> suddenly injects this script into all websites that the user visits that
> use that particular version of jquery.js, which I'd call an extremely
> serious security hole. You can't rely on length either, as that could also
> be padded to match. Not to mention that this is also crossing the CORS
> threshold (the first instance being from a different domain than the
> current page, for example). Accidental (natural) collision probabilities
> for sha256/sha384/sha512 are very low, but intentional ones are higher
> than accidental ones.

This is completely wrong. No one has *ever* produced an intentional collision in sha256 or greater. That's the whole point of cryptographic hashes: it is impossible to intentionally create a collision, and if it were possible to create one, the algorithm would need to be declared broken and never used again.

In case you missed the headlines, last week Google announced it created a sha1 collision. That is the first, and only, known sha1 collision ever created. This means sha1 is broken, and must not be used.

Now it's unlikely (as in, it's not likely to happen in the history of a billion universes), but it is possible that at some point in the history of sha256 a collision was accidentally created.
This probability is non-zero, which is greater than the impossibility of intentionally creating a collision, hence it is more likely that we will get an accidental collision than an intentional one.

> While I haven't checked the browser source code, I would not be surprised
> if browsers in certain situations cache a single instance of a script that
> is used on multiple pages of a website (different URL but the same hash).
> This would be within the same domain and usually not a security issue.
>
> It might be better to use UUIDs instead and a trusted "cache"; this cache
> could be provided by a 3rd party or by the browser developers themselves.
> Such a solution would require a uuid="{some-uuid-number}" attribute added
> to the script tag. If encountered, the browser could ignore the script URL
> and integrity attribute and use either a local cache (from earlier) or a
> trusted cache on the net somewhere.
>
> The type of scripts that would benefit from this are the ones that follow
> a Major.Minor.Patch version format, and a UUID would apply to the major
> version only, so if the major version changed then the script would
> require a new UUID. Only the most popular scripts and major versions of
> such would be cached, but those are usually the larger and more important
> ones anyway. It's your jquery, bootstrap, angular, Modernizr, and so on.
>
> --
> Roger Hågensen, Freelancer, Norway.
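[Editorial sketch, not part of the original thread.] Reading the quoted proposal literally, a script keeps its UUID across minor and patch releases and only a major-version bump mints a new one. A tiny illustrative sketch of that rule; the function name and version strings are made up:

```js
// Hypothetical sketch of the "one UUID per major version" rule described
// above: only a change in the semver major component requires a new UUID.
function needsNewUuid(previousVersion, newVersion) {
  const major = (v) => parseInt(v.split('.')[0], 10);
  return major(newVersion) !== major(previousVersion);
}

console.log(needsNewUuid('2.0.0', '2.1.4')); // false -- same major, keep the UUID
console.log(needsNewUuid('2.1.4', '3.0.0')); // true  -- new major, mint a new UUID
```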
Re: [whatwg] Subresource Integrity-based caching
Hi Alex! Glad to have you here. This is indeed a popular idea. The biggest problem with it is privacy concerns. The best summary I've seen is at https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html. In particular, if such a suggestion were implemented, any web page would be able to easily determine the browsing history of any user, similar to the old visited-link color trick. I don't know what the latest is on attempting to get around this, although that document suggests some ideas.
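[Editorial sketch, not part of the original thread.] To spell out the concern summarized in that document: with a content-addressed cache, a page could reference a resource that only one particular site uses and infer from the load time whether it was already cached. A rough, purely illustrative sketch; the URL, integrity value, and the timing threshold would all be attacker-chosen and are hypothetical here:

```js
// Illustrative sketch of the history-sniffing concern, not a working
// exploit: time a fetch of a script unique to some other site and guess
// from the latency whether it came from a (shared) cache.
async function probablyVisited(url, integrity) {
  const start = performance.now();
  try {
    await fetch(url, { integrity });
  } catch (e) {
    // Errors don't matter for the probe; only the elapsed time does.
  }
  const elapsed = performance.now() - start;
  // A near-instant response suggests a cache hit rather than a network
  // round trip. The 5 ms threshold is arbitrary, purely for illustration.
  return elapsed < 5;
}
```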
Re: [whatwg] Subresource Integrity-based caching
On 2017-03-02 02:59, Alex Jordan wrote:
> Here's the basic problem: say I want to include jQuery in a page. I
> have two options: host it myself, or use a CDN.

Not to be overly pedantic, but you might re-evaluate the need for jquery and other such frameworks. "HTML5" now does pretty much the same as these older frameworks with the same or less amount of code.

> The fundamental issue is that there isn't a direct correspondence to
> what a resource's _address_ is and what the resource _itself_ is. In
> other words, jQuery 2.0.0 on my domain and jQuery 2.0.0 on the Google
> CDN are the exact same resource in terms of content, but are
> considered different because they have different addresses.

Yes and no. The URI is a unique identifier for a resource. If the URI is different then it is not the same resource. The content may be the same, but the resource is different. You are mixing up resource and content in your explanation. Address and resource are in this case the same thing.

> 2. This could potentially be a carrot used to encourage adoption of
> Subresource Integrity, because it confers a significant performance
> benefit.

This can be solved by improved web design. Serve a static page (and don't forget gzip compression), and then background-load the script and extra CSS etc. By the time the visitor has read/looked/scanned down the page the scripts are loaded. There is however some bandwidth-savings merit in your suggestion.

> ...That's okay, though, because the fact that it's based on a hash
> guarantees that the cache matches what would've been sent over the
> network - if these were different, the hash wouldn't match and the
> mechanism wouldn't kick in.
> ...
> Anyway, this email is long enough already but I'd love to hear
> thoughts about things I've missed, etc.

How about your misunderstanding of the fact that a hash can only ever guarantee that two resources are different? A hash cannot guarantee that two resources are the same. A hash does imply a high probability that they are the same, but it can never guarantee it; such is the nature of a hash. A carefully tailored jquery.js that matches the hash of the "original jquery.js" could be crafted and contain a hidden payload. Now the browser suddenly injects this script into all websites that the user visits that use that particular version of jquery.js, which I'd call an extremely serious security hole. You can't rely on length either, as that could also be padded to match. Not to mention that this is also crossing the CORS threshold (the first instance being from a different domain than the current page, for example). Accidental (natural) collision probabilities for sha256/sha384/sha512 are very low, but intentional ones are higher than accidental ones.

While I haven't checked the browser source code, I would not be surprised if browsers in certain situations cache a single instance of a script that is used on multiple pages of a website (different URL but the same hash). This would be within the same domain and usually not a security issue.

It might be better to use UUIDs instead and a trusted "cache"; this cache could be provided by a 3rd party or by the browser developers themselves. Such a solution would require a uuid="{some-uuid-number}" attribute added to the script tag. If encountered, the browser could ignore the script URL and integrity attribute and use either a local cache (from earlier) or a trusted cache on the net somewhere.
The type of scripts that would benefit from this are the ones that follow a Major.Minor.Patch version format, and a UUID would apply to the major version only, so if the major version changed then the script would require a new UUID. Only the most popular scripts and major versions of such would be cached, but those are usually the larger and more important ones anyway. It's your jquery, bootstrap, angular, Modernizr, and so on.

-- 
Roger Hågensen, Freelancer, Norway.
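[Editorial sketch, not part of the original thread.] For readers following the integrity discussion above, this is roughly how a Subresource Integrity value is produced today: the digest of the exact bytes served, base64-encoded and prefixed with the algorithm name. A small Node.js sketch; the local file name is hypothetical:

```js
// Sketch: compute an SRI integrity value (sha384, base64) for a file.
const { createHash } = require('crypto');
const { readFileSync } = require('fs');

const bytes = readFileSync('jquery.js'); // hypothetical local copy of the script
const digest = createHash('sha384').update(bytes).digest('base64');
console.log(`integrity="sha384-${digest}"`); // value to place on the <script> tag
```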