Re: [whatwg] HTML resource packages
On Mon, Aug 9, 2010 at 1:40 PM, Justin Lebar justin.le...@gmail.com wrote:

> > Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>
> I'll post this to the bug when I get home tonight. But your comments are astute -- the page I used is a pretty bad benchmark for a variety of reasons. It sounds like you probably could hack up a much better one.
>
> > a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval?
>
> Since I was running on a simulated network with no random parameters (e.g. no packet loss), there was very little variance in load time across runs.

I suspect you are right. Still, it's good due diligence - especially for a whitepaper :-) The good news is that if it really is consistent, then it should be easy...

> > d) What did you do about subdomains in the test? I assume your test loaded from one subdomain?
>
> That's correct.
>
> > I'm betting time-to-paint goes through the roof with resource bundles :-)
>
> It does right now because we don't support incremental extraction, which is why I didn't bother measuring time-to-paint. The hope is that with incremental extraction, we won't take too much of a hit.

Well, here is the crux then. What should browsers optimize for? Should we take performance features which optimize for PLT, or time-to-first-paint, or something else? I have spent a *ton* of time trying to answer this question (as have many others), and this is just a tough one to answer. For now, I believe the Chrome/WebKit teams are in agreement that sacrificing time-to-first-render to decrease PLT is a bad idea. I'm not sure what the Firefox philosophy here is? One thing we can do to better evaluate features is to simply always measure both metrics. If both metrics get better, then it is a clear win. But without recording both metrics, we just don't really know how to evaluate whether a feature is good or bad.
Sorry to send you through more work - I am not trying to nix your feature :-( I think it is great you are taking the time to study all of this.

Mike

On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe m...@belshe.com wrote:

> Justin -
>
> Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)
>
> I have a few concerns about the benchmark:
>
> a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval?
>
> b) As you note in the report, slow start will kill you. I've verified this so many times it makes me sick. If you try more combinations, I believe you'll see this.
>
> c) The 1.3MB of subresources in a single bundle seems unrealistic to me. On one hand you say that it's similar to CNN, but note that CNN has JS/CSS/images, not just thumbnails like your test. Further, note that CNN pulls these resources from multiple domains; combining them into one domain may work, but certainly makes the test content very different from CNN. So the claim that it is somehow representative seems incorrect. For more accurate data on what websites look like, see http://code.google.com/speed/articles/web-metrics.html
>
> d) What did you do about subdomains in the test? I assume your test loaded from one subdomain?
>
> e) There is more to a browser than page-load-time. Time-to-first-paint is critical as well. For instance, in WebKit and Chrome, we have specific heuristics which optimize for time-to-render instead of total page load. CNN is always cited as a bad page, but it's really not - it just has a lot of content, both below and above the fold. When the user can interact with the page successfully, the user is happy. In other words, I know I can make WebKit's PLT much faster by removing a couple of throttles. But I also know that doing so worsens the user experience by delaying the time to first paint. So - is it possible to measure both times? I'm betting time-to-paint goes through the roof with resource bundles :-)
>
> If you provide the content, I'll try to run some tests. It will take a few days.
>
> Mike
Re: [whatwg] HTML resource packages
On 8/10/10 2:40 PM, Mike Belshe wrote:

> For now, I believe the Chrome/WebKit teams are in agreement that sacrificing time-to-first render to decrease PLT is a bad idea. I'm not sure what the Firefox philosophy here is?

Fairly similar (though we have had people complain at us when we do in fact incrementally load a page for 20s that WebKit just throws up on the screen all at once after sitting there with a blank viewport for 7s, for what it's worth).

-Boris
Re: [whatwg] HTML resource packages
On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote:

> If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified.

These kinds of heuristics are far beyond the scope of resource packages as we're planning to implement them. Again, I think this type of behavior is the domain of a large change to the networking stack, such as SPDY, not a small hack like resource packages.

-Justin

On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote:

> On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar justin.le...@gmail.com wrote:
>
> > I think this is a fair point. But I'd suggest we consider the following: * It might be confusing for resources from a resource package to show up on a page which doesn't opt-in to resource packages in general or to that specific resource package.
>
> Only if the resource package contains a different file from the real one. I suggest we treat this as a pathological case and accept that it will be broken and confusing -- or at least we consider how many extra optimizations we could make if we did accept that, before deciding whether the extra performance is worth the confusion.
>
> > * There's no easy way to opt out of this behavior. That is, if I explicitly *don't* want to load content cached from a resource package, I have to name that content differently.
>
> Why would you want that, if the files are the same anyway?
>
> > * The avatars-on-a-forum use case is less convincing the more I think about it. Certainly you'd want each page which displays many avatars to package up all the avatars into a single package. So you wouldn't benefit from the suggested caching changes on those pages.
>
> I don't see why not. If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified. So if twenty different people post on the page, and you've been browsing for a while and have eighteen of their avatars (this will be common; a handful of people tend to account for most posts in a given forum):
>
> 1) With no resource packages, you fetch two separate avatars (but on earlier page views you suffered).
>
> 2) With resource packages as you suggest, you fetch a whole resource package, 90% of which you don't need. In fact, you have to fetch a resource package even if you have 100% of the avatars on the page! No two pages will be likely to have the same resource package, so you can't share cache at all.
>
> 3) With resource packages as I suggest, you fetch only two separate avatars, *and* you got the benefits of resource packages on earlier pages.
>
> The UA gets to guess whether using resource packages would be a win on a case-by-case basis, so in particular, it should be able to perform strictly better than either (1) or (2), given decent heuristics. E.g., the heuristic "fetch the resource package if I need at least two files; fetch the file if I only need one" will perform better than either (1) or (2) in any reasonable circumstance.
>
> I think this sort of situation will be fairly common. Has anyone looked at a bunch of different types of web pages and done a breakdown of how many assets they have, and how they're reused across pages? If we're talking about assets that are used only on one page (image search) or all pages (logos, shared scripts), your approach works fine, but not if they're used on a random mix of pages. I think a lot of files will wind up being used on only particular subsets of pages.
> > In general, I think we need something like SPDY to really address the problem of duplicated downloads. I don't think resource packages can fix it with any caching policy.
>
> Certainly there are limits to what resource packages can do, but we can wind up closer to the limits or farther from them depending on the implementation details.
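Aryeh's case-by-case fetch heuristic ("fetch the resource package if I need at least two files, fetch the file if I only need one") can be sketched in a few lines. This is my own illustration of the idea from the thread; the function name and data shapes are invented, not from any spec:

```python
def plan_fetches(package_contents, needed, cached):
    """Decide between fetching a resource package and fetching files
    individually, per the heuristic in the discussion: fetch the
    package if at least two needed files are uncached, otherwise
    fetch the (at most one) missing file directly.

    package_contents: set of paths the package declares
    needed: paths referenced by the current page
    cached: paths already in the UA's cache
    """
    missing = [f for f in needed if f in package_contents and f not in cached]
    if len(missing) >= 2:
        return ("package", missing)
    return ("individual", missing)
```

In the forum example, a page with twenty avatars of which eighteen are cached yields two missing files, so this particular heuristic would still fetch the package; lowering the threshold is the kind of case-by-case tuning Aryeh leaves to the UA.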
Re: [whatwg] HTML resource packages
> Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820)

I'll post this to the bug when I get home tonight. But your comments are astute -- the page I used is a pretty bad benchmark for a variety of reasons. It sounds like you probably could hack up a much better one.

> a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval?

Since I was running on a simulated network with no random parameters (e.g. no packet loss), there was very little variance in load time across runs.

> d) What did you do about subdomains in the test? I assume your test loaded from one subdomain?

That's correct.

> I'm betting time-to-paint goes through the roof with resource bundles :-)

It does right now because we don't support incremental extraction, which is why I didn't bother measuring time-to-paint. The hope is that with incremental extraction, we won't take too much of a hit.

-Justin
Re: [whatwg] HTML resource packages
On 8/9/10 4:30 PM, Mike Belshe wrote:

> CNN is always cited as a bad page, but it's really not - it just has a lot of content, both below and above the fold.

It's a bad page because (1) it sends hundreds of kilobytes of content for no obvious reason whatsoever, most of it unused, and (2) it sends said content with no gzip compression.

-Boris
Re: [whatwg] HTML resource packages
The files I used for the rough benchmarks are available in a tarball at [1]. Live pages are at [2] and [3].

[1] http://people.mozilla.org/~jlebar/respkg/test/benchmark_files.tgz
[2] http://people.mozilla.org/~jlebar/respkg/test/test-pkg.html
[3] http://people.mozilla.org/~jlebar/respkg/test/test-nopkg.html

-Justin
Re: [whatwg] HTML resource packages
Justin Lebar:

> Christoph Päper christoph.pae...@crissov.de wrote:
>
> > Why do you want to put this on the HTML level (exclusively), not the HTTP level?
>
> If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package.

Yeah, it’s still wrong. Resource packages in HTML seem okay for the image gallery use case (and then could be done with ‘link’), but they’re commonly inappropriate for anything referenced from ‘link’, ‘script’ and ‘style’ elements. Your remark on loading order just proves this point: you want resource packages referenced before ‘head’. You should move one step further than the root element, i.e. to the transport layer.
Re: [whatwg] HTML resource packages
On Fri, Aug 6, 2010 at 12:46 AM, Christoph Päper christoph.pae...@crissov.de wrote:

> Justin Lebar:
>
> > Christoph Päper christoph.pae...@crissov.de wrote:
> >
> > > Why do you want to put this on the HTML level (exclusively), not the HTTP level?
> >
> > If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package.
>
> Yeah, it’s still wrong. Resource packages in HTML seem okay for the image gallery use case (and then could be done with ‘link’), but they’re commonly inappropriate for anything referenced from ‘link’, ‘script’ and ‘style’ elements. Your remark on loading order just proves this point: you want resource packages referenced before ‘head’. You should move one step further than the root element, i.e. to the transport layer.

We want resource packages to work for people who don't have the ability to set custom headers for their pages or who don't even know what an HTTP header is. I agree that it's a hack, but I don't understand how putting the packages information in the html element makes it inappropriate to load from a resource package resources referenced in link, script, and style elements. Is the issue just that the HTML file's |packages| attribute affects what we load when we see @import url() in a separate CSS file? This seems like a feature, not a bug, to me. SPDY will do this the Right Way, if we're patient.

-Justin
Re: [whatwg] HTML resource packages
On Fri, Aug 6, 2010 at 12:46 AM, Christoph Päper christoph.pae...@crissov.de wrote:

> Justin Lebar:
>
> > Christoph Päper christoph.pae...@crissov.de wrote:
> >
> > > Why do you want to put this on the HTML level (exclusively), not the HTTP level?
> >
> > If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package.
>
> Yeah, it’s still wrong. Resource packages in HTML seem okay for the image gallery use case (and then could be done with ‘link’), but they’re commonly inappropriate for anything referenced from ‘link’, ‘script’ and ‘style’ elements. Your remark on loading order just proves this point: you want resource packages referenced before ‘head’. You should move one step further than the root element, i.e. to the transport layer.

This doesn't seem to make sense. If you want resource packages referenced before head, then the nearest appropriate location is still html. Moving it up to the transport layer isn't *wrong*, but it's not *necessary* in this case.

~TJ
Re: [whatwg] HTML resource packages
On Tue, Aug 3, 2010 at 8:31 PM, Justin Lebar justin.le...@gmail.com wrote:

> We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/

I have some concerns about caching behavior here, which I've mentioned before. Consider a site that has a landing page with lots of first-time viewers. To accelerate that page view, you might want to add a resource package containing all the assets on the page, to speed up views in the cold-cache case. Some of those assets will be reused on other pages, and some will not.

When the user navigates to another page, what's supposed to happen? If you hadn't used resource packages at all, they would have a hot cache, so they'd get all the shared assets on every subsequent page view for free. But now they don't -- instead of the first view being slow, it's the second view, when they leave the landing page. This isn't a big improvement. So if resource packages don't share caches, you need to either give up on caching or put a given file in only one resource package on your whole site. The latter is not practical if pages use small, fairly random subsets of your assets and it's not feasible to package them all on every page view. Think avatars on a web forum: you might have 20 different avatars displayed per page, from a pool of tens of thousands or more. Do you have to decide between not using resource packages and not getting any caching?

You've said before that your goal in this requirement is predictability -- if there's an inconsistency between different resource packages, or between a resource package and the real file, you don't want users to get different results depending on what order they visit the pages in. This is fair enough, but I'm worried that the caching problems this approach causes will make it more of a hindrance than a benefit for a wide class of use cases. There's some possible inconsistency anyway whenever caching is permitted at all, because if the page provides incorrect caching headers, the UA might have an out-of-date copy. Also, different browsers will be inconsistent too, until all UAs in common use have implemented resource packages -- some will use the packaged file and some the real file. Is the extra inconsistency from letting the caches mix really too much to ask for the cacheability benefits? I don't think so.
Re: [whatwg] HTML resource packages
> So if resource packages don't share caches, you need to either give up on caching, [or] put a given file only in one resource package on your whole site. The latter is not practical if pages use small, fairly random subsets of your assets and it's not feasible to package them all on every page view. Think avatars on a web forum

I think this is a fair point. But I'd suggest we consider the following:

* It might be confusing for resources from a resource package to show up on a page which doesn't opt in to resource packages in general or to that specific resource package.

* There's no easy way to opt out of this behavior. That is, if I explicitly *don't* want to load content cached from a resource package, I have to name that content differently.

* The avatars-on-a-forum use case is less convincing the more I think about it. Certainly you'd want each page which displays many avatars to package up all the avatars into a single package. So you wouldn't benefit from the suggested caching changes on those pages. You might benefit on a user profile page which just displays one avatar. You might try to be clever and leave the avatar out of the profile page's resource package on the assumption that the UA already has that avatar in its cache. But then your page would load slower for users who visited the profile page without first getting the avatar from another resource package. Maybe you'd benefit from the suggested changes if you'd half-deployed resource packages on your site, so some pages had packages and others didn't. But I don't think that's a use case we should design for.

In general, I think we need something like SPDY to really address the problem of duplicated downloads. I don't think resource packages can fix it with any caching policy.
-Justin
Re: [whatwg] HTML resource packages
Justin Lebar:

> We at Mozilla are hoping to ship HTML resource packages in Firefox 4,
> http://people.mozilla.org/~jlebar/respkg/

| <html packages='[pkg1.zip img1.png script.js styles/style.css]
| [static/pkg2.zip]'>

| A page indicates in its html element that it uses one or more resource packages (…).

Why do you want to put this on the HTML level (exclusively), not the HTTP level? As far as I understand it, authors would usually put stylesheets, scripts and decorative images, but not HTML files, into a resource package. These are usually common to several pages or the entire site or domain. Images might be referenced from within HTML or CSS files. Why did you decide against <link rel=resource-package href="pkg1.zip#files='img1.png,…'"/> or something like that? (The hash part is just guesswork.)

* Argument: What about incremental rendering? If there are, for instance, lots of (content) images in the resource file, I will see them all at once as soon as the ZIP has been downloaded completely and decompressed, but with single files I would have seen them appear one after the other, which might have been enough.
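For concreteness, the draft's bracketed, space-separated packages attribute quoted above can be parsed in a few lines. This is my own sketch of the grammar as shown in the spec excerpt, not code from the spec itself:

```python
import re

def parse_packages(attr):
    """Parse a packages attribute such as
    "[pkg1.zip img1.png script.js styles/style.css] [static/pkg2.zip]".
    Returns (package_href, [declared files]) pairs. Per the draft, a
    group with no files listed declares the whole archive."""
    result = []
    for group in re.findall(r'\[([^\]]*)\]', attr):
        parts = group.split()
        if parts:  # skip empty bracket groups
            result.append((parts[0], parts[1:]))
    return result
```

Note how the last group, [static/pkg2.zip], comes back with an empty file list, the "all files" case that the empty-list pitfall discussed later in the thread turns on.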
Re: [whatwg] HTML resource packages
On 4 August 2010 20:08, Christoph Päper christoph.pae...@crissov.de wrote:

> * Argument: What about incremental rendering? If there are, for instance, lots of (content) images in the resource file I will see them all at once as soon as the ZIP has been downloaded completely and decompressed, but with single files I would have seen them appear one after the other, which might have been enough.

ZIP files are progressively renderable, dependent on file order.
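A rough illustration of why file order matters: in the ZIP format, each entry's local file header and data precede the next entry, so a consumer can extract entries off the wire in archive order without waiting for the central directory at the end of the file. A minimal sketch (my own, handling only stored and deflated entries without data descriptors):

```python
import struct
import zlib

def iter_zip_entries(stream):
    """Yield (name, data) for each ZIP entry, reading the stream
    strictly front to back -- the property that makes incremental
    extraction possible when files are ordered by when they're needed.
    Sketch only: no data-descriptor, ZIP64, or encryption support."""
    while True:
        sig = stream.read(4)
        if sig != b'PK\x03\x04':  # not a local file header: central
            break                  # directory reached, or stream ended
        (_, flags, method, _, _, _, csize, _,
         nlen, elen) = struct.unpack('<HHHHHIIIHH', stream.read(26))
        if flags & 0x08:
            raise ValueError("data-descriptor entries not supported")
        name = stream.read(nlen).decode('cp437')
        stream.read(elen)          # skip the extra field
        data = stream.read(csize)
        if method == 8:            # deflated
            data = zlib.decompress(data, -15)  # raw deflate stream
        yield name, data
```

Entries stored early in the archive are usable as soon as their bytes arrive, which is why a bundler that puts above-the-fold resources first keeps incremental rendering workable.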
Re: [whatwg] HTML resource packages
On Wed, Aug 4, 2010 at 12:11 PM, James May wha...@fowlsmurf.net wrote:

> On 4 August 2010 20:08, Christoph Päper christoph.pae...@crissov.de wrote:
>
> > * Argument: What about incremental rendering? If there are, for instance, lots of (content) images in the resource file I will see them all at once as soon as the ZIP has been downloaded completely and decompressed, but with single files I would have seen them appear one after the other, which might have been enough.
>
> ZIP files are progressively renderable, dependent on file order.

In my experience gzip compression blocks browser rendering until the compressed file has been received completely. I believe this is the reason we should not compress the HTML source, just its external binary components. I don't think the browser can separately decompress each block of a chunked transfer as it arrives -- am I wrong?

Diego Perini
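On the question above: gzip (DEFLATE) is in fact a streamable format, and a decoder can emit decompressed output as each chunk of a chunked transfer arrives, without waiting for the end of the body. A minimal sketch using Python's zlib to illustrate the principle (not any browser's actual implementation):

```python
import zlib

def gunzip_chunks(chunks):
    """Decompress gzip data incrementally, chunk by chunk, as it would
    arrive over a chunked transfer. Yields whatever decompressed bytes
    each chunk makes available -- no need for the full body first."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip wrapper
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()  # drain anything buffered at end of stream
    if tail:
        yield tail
```

Whether a given browser renders while decompressing is a separate engine decision, but the format itself does not force buffering the whole response.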
Re: [whatwg] HTML resource packages
On Wed, Aug 4, 2010 at 1:31 AM, Justin Lebar justin.le...@gmail.com wrote: We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ It seems a bit surprising that [pkg.zip img1.png img2.png] provides more files than [pkg.zip img1.png] but *fewer* files than [pkg.zip] (which includes all files). I can imagine people would write code like: print "<html packages='[cached-image-thumbnails.zip " . (join " ", @thumbnails_which_are_not_out_of_date) . "]'>"; (intending the package to be updated infrequently, and used only for images that haven't been modified since the last package update), and they would get completely the wrong behaviour when the list is empty. So maybe [pkg.zip] should mean no files (vs. pkg.zip, which would still mean all files). Filenames in zips are byte strings, not Unicode character strings. What should happen with non-ASCII in the zip's list of contents? People will use standard zip programs and frequently end up with various random character encodings in their files. Would browsers guess, decode as CP437, decode as UTF-8, or fail? Would they look at the zip header's language encoding flag? etc. What happens if the document contains multiple html elements (not all the root element)? (e.g. if it's XHTML, or the elements are added by scripts). The packages spec seems to assume there is only ever one. The note at the end of 4.1 seems to be about avoiding problems like http://evil.com/ saying: <html packages="eviloverride.zip"> <!-- gets downloaded from evil.com --> <base href="http://bank.com/"> <img src="http://bank.com/logo.png"> <!-- this shouldn't be allowed to come from the .zip --> Why is this particular example an important problem? If the attacker wants to insert their own files into their own pages, they can just do it directly without using packages.
Since this is (I assume) only used for resources like images and scripts and stylesheets, and not for a hrefs or iframe hrefs, I don't see how it would let the attacker circumvent any same-origin restrictions or do anything else dangerous. The opposite way seems more dangerous, where evil.com says: <html packages="http://evil.com/redirect.cgi?http://secret-bank-intranet-server/packages.zip"> <img src="http://evil.com/logo.png"> <!-- now use canvas to read the pixel data of the secret logo, since it was loaded from the evil.com origin --> Is anything stopping that? In 4.3 step 2: What is pkg-url initialised to? (The package href of p?) -- Philip Taylor exc...@gmail.com
Re: [whatwg] HTML resource packages
On 4 Aug 2010, at 11:46, Diego Perini wrote: * Argument: What about incremental rendering? If there are, for instance, lots of (content) images in the resource file I will see them all at once as soon as the ZIP has been downloaded completely and decompressed, but with single files I would have seen them appear one after the other, which might have been enough. ZIP files are progressively renderable, dependent on file order. In my experience gzip compression is blocking browser rendering until the compressed file has been received completely. I believe this is the reason we should not compress the HTML source, just its external binary components. I don't think the browser can separately decompress each block of a chunked transfer as it arrives, am I wrong? You are wrong. gzip compression is streamable, and browsers can uncompress parts of a gzipped file as it is downloaded. gzip only needs to buffer some data before returning an uncompressed chunk, but it's only a few KB. Chunks of gzipped data don't have to align with chunked HTTP encoding (those are independent layers). -- regards, Kornel
Re: [whatwg] HTML resource packages
People should probably consider reading the Web Apps Working Group's archives (they're public) about widget packaging. There are long discussions about zip and gzip, etc. http://www.w3.org/TR/widgets/#zip-archive Especially http://www.w3.org/TR/widgets/#character-sets covers character sets. As for zip streaming / gzip streaming: the zip format technically has ways to construct archives which are painful to handle. In practice I don't think that's a real problem (beyond that user agents would need to ensure they fail any packages which abuse those features). People tend to come late to the game and ask why we didn't use gzip. The general short answer is that gzip doesn't provide a file container format at all, and browsers tend to already support zip. So the cost of using zip is negligible, whereas adding something else which is messy (e.g. tar, star, pax) is painful. And if you think that tar is well specified, I have a bridge I'd like to sell you.
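The character-set issue raised in the thread is concrete: zip stores filenames as bytes, with only the header's language encoding flag (bit 11) to mark UTF-8. A small Python sketch (illustrative only; Python's zipfile is one typical reader, not normative for browsers) shows the flag-dependent behaviour:

```python
# Zip filenames are bytes; readers only know they are UTF-8 when the
# general-purpose bit flag's bit 11 is set. Python's zipfile, like many
# tools, falls back to decoding names as CP437 when the flag is absent.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("naïve.css", b"body { }")  # non-ASCII name

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    names = z.namelist()

# zipfile sets the UTF-8 flag when writing a non-ASCII name, so the name
# round-trips here; an archive written by a tool that omits the flag
# would come back decoded as CP437 instead.
assert names == ["naïve.css"]
```

This is exactly why the widgets spec link above had to pin down character-set handling explicitly rather than leave it to each reader's guess.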
Re: [whatwg] HTML resource packages
Brett Zamir bret...@yahoo.com wrote: 1) I think it would be nice to see explicit confirmation in the spec that this works with offline caching. Yes. I'll do that. 2) Could data files such as .txt, .json, or .xml files be used as part of such a package as well? 3) Can XMLHttpRequest be made to reference such files and get them from the cache, and if so, when referencing only a zip in the packages attribute, can XMLHttpRequest access files in the zip not spelled out by a tag like <link/>? I think this would be quite powerful/avoid duplication, even if it adds functionality (like other HTML5 features) which would not be available to older browsers. This is tricky. The problem is: if you have an <img> on a page which might be served from a resource package, we'll block the download of the image until we can either serve the request from a resource package or be sure that no package contains the image. I can imagine this behavior being confusing with XMLHttpRequests. On the other hand, it could certainly be powerful when used correctly. I think the natural thing is to go ahead and treat things requested by an XMLHttpRequest the same as anything else on a page and retrieve them from packages when possible. If you really don't want your XMLHttpRequest to block on a resource package, you can always use a POST. But I need to investigate more to determine whether this makes sense. 4) Could such a protocol also be made to accommodate profiles of packages, e.g., by a namespace being allowable somewhere for each package? This sounds way outside the scope of what we're trying to do with resource packages. I'm all for designing for the future, but I don't think we want to introduce the complexity even of these namespaces unless we intend to use them immediately. Maciej Stachowiak m...@apple.com wrote: Have you done any performance testing of this feature, and if so can you share any of that data? 
There's a document (PDF) with some rough performance numbers in the bug: https://bugzilla.mozilla.org/attachment.cgi?id=455820 Although the results are preliminary, I think doing much more than this on a simulated network for a test page might be going a bit overboard. Results from real pages over real networks would be much more meaningful at this point. Separately, I am curious to hear how http headers are handled; it's a TODO in the spec, and what the TODO says seems poor for the Content-Type header in particular. It would make it hard to use package resources in any context that looks at the MIME type rather than always sniffing. Any thoughts on this? The intent is for UAs to sniff the content-type of anything coming from a resource package, so I think that TODO needs to be turned on its head: the UA shouldn't apply any of the response headers from the resource package to its elements. Christoph Päper christoph.pae...@crissov.de wrote: A page indicates in its html element that it uses one or more resource packages (…). Why do you want to put this on the HTML level (exclusively), not the HTTP level? ... Images might be referenced from within HTML or CSS files. If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package. Why did you decide against <link rel="resource-package" href="pkg1.zip#files='img1.png,…'"/> or something like that? (The hash part is just guesswork.) We actually originally spec'ed resource packages with the <link> tag, but we encountered some difficulties with this. For example, it led to confusing behavior when a resource package was defined after a <link rel='javascript'>. Do we load the script from the network, or do we wait until we've received the whole <head> before loading any scripts? 
Resource packages as a link also interacted poorly with Mozilla's speculative parsing algorithm, which tries to download resources before we run the page's scripts. We probably could have come up with semantics which didn't run into problems with our own speculative parsing implementation, but we realized it would be difficult to spec it in such a way that we didn't make things very difficult for *someone*. * Argument: What about incremental rendering? The spec (and our implementation in Firefox) cares deeply about incremental rendering. Although the zip format isn't strictly suitable for incremental extraction, I defined alternate semantics in the spec which should work. Zip is better than tar-gz for this kind of thing for two reasons: * Zip file headers are uncompressed, so you don't have to extract the whole file in order to tell what's inside. * Entries in a zip file are individually compressed. Although this might cause you to compress less effectively, you can compress all your files ahead of time and construct a zip file on the fly pretty cheaply. Philip Taylor excors+wha...@gmail.com wrote: It seems a bit surprising that
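To illustrate why zip lends itself to incremental extraction (a rough sketch of the idea only, not the spec's actual algorithm): each entry is preceded by an uncompressed local header carrying its name and sizes, so a consumer can recover files in archive order as the bytes stream in, provided the sizes really are present up front (i.e. no data descriptors).

```python
# Sketch: pull entries out of a zip byte stream in order, using only the
# uncompressed local file headers (signature PK\x03\x04) that precede
# each entry. Assumes sizes are in the header, as Python's zipfile
# writes for seekable output.
import io
import struct
import zipfile
import zlib

def stream_entries(data):
    """Yield (name, file_bytes) from the prefix of a zip byte stream."""
    pos = 0
    while data[pos:pos + 4] == b"PK\x03\x04":
        # Local file header: fixed 30-byte part, then name, extra, data.
        (sig, ver, flags, method, mtime, mdate, crc, csize, usize,
         nlen, elen) = struct.unpack("<IHHHHHIIIHH", data[pos:pos + 30])
        name = data[pos + 30:pos + 30 + nlen].decode("utf-8")
        start = pos + 30 + nlen + elen
        raw = data[start:start + csize]
        if method == 8:  # deflate; entries are compressed individually
            raw = zlib.decompress(raw, -15)
        yield name, raw
        pos = start + csize  # next entry begins right after this one

# Build a small archive; entry order is the download/extraction order.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("img1.png", b"fake png bytes")
    z.writestr("script.js", b"console.log('hi');")

entries = dict(stream_entries(buf.getvalue()))
assert entries["img1.png"] == b"fake png bytes"
```

This is also why page authors control download priority with resource packages: the UA can act on each entry as its bytes finish arriving, in the order the files were zipped.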
Re: [whatwg] HTML resource packages
On Wed, Aug 4, 2010 at 9:01 PM, Justin Lebar justin.le...@gmail.com wrote: What happens if the document contains multiple html elements (not all the root element)? (e.g. if it's XHTML, or the elements are added by scripts). The packages spec seems to assume there is only ever one. The packages attribute should work like the manifest attribute currently works. I don't see language in the cache manifest section of HTML5 (6.6) specifying what happens when there are multiple html elements, so I hope I don't need to specify this either. :) http://whatwg.org/html#attr-html-manifest says: "The manifest attribute only has an effect during the early stages of document load. Changing the attribute dynamically thus has no effect (and thus, no DOM API is provided for this attribute)." Its effect is triggered from http://whatwg.org/html#parser-appcache (the html token in the "before html" insertion mode) or from http://whatwg.org/html#read-xml , so it will only ever run for the root html element of the document. The packages attribute is defined as running "Whenever the packages attribute is changed (including when the document is first loaded, if its html element has a packages attribute)", so it's not the same. If you do want it to work the same then you'll need to hook into the parser and ignore dynamic updates. -- Philip Taylor exc...@gmail.com
Re: [whatwg] HTML resource packages
If you do want it to work the same then you'll need to hook into the parser and ignore dynamic updates. Indeed. And since I explicitly *do* want dynamic updates, it'll need to change. Thanks. On Wed, Aug 4, 2010 at 1:32 PM, Philip Taylor excors+wha...@gmail.com wrote: On Wed, Aug 4, 2010 at 9:01 PM, Justin Lebar justin.le...@gmail.com wrote: What happens if the document contains multiple html elements (not all the root element)? (e.g. if it's XHTML, or the elements are added by scripts). The packages spec seems to assume there is only ever one. The packages attribute should work like the manifest attribute currently works. I don't see language in the cache manifest section of HTML5 (6.6) specifying what happens when there are multiple html elements, so I hope I don't need to specify this either. :) http://whatwg.org/html#attr-html-manifest says: The manifest attribute only has an effect during the early stages of document load. Changing the attribute dynamically thus has no effect (and thus, no DOM API is provided for this attribute). Its effect is triggered from http://whatwg.org/html#parser-appcache (html token in the before html insertion mode) or from http://whatwg.org/html#read-xml , so it will only ever run for the root html element of the document. The packages attribute is defined as running Whenever the packages attribute is changed (including when the document is first loaded, if its html element has a packages attribute), so it's not the same. If you do want it to work the same then you'll need to hook into the parser and ignore dynamic updates. -- Philip Taylor exc...@gmail.com
Re: [whatwg] HTML resource packages
2010/8/4 Kornel Lesiński kor...@geekhood.net On 4 Aug 2010, at 11:46, Diego Perini wrote: * Argument: What about incremental rendering? If there are, for instance, lots of (content) images in the resource file I will see them all at once as soon as the ZIP has been downloaded completely and decompressed, but with single files I would have seen them appear one after the other, which might have been enough. ZIP files are progressively renderable, dependent on file order. In my experience gzip compression is blocking browser rendering until the compressed file has been received completely. I believe this is the reason we should not compress the HTML source, just its external binary components. I don't think the browser can separately decompress each block of a chunked transfer as it arrives, am I wrong? You are wrong. gzip compression is streamable, and browsers can uncompress parts of a gzipped file as it is downloaded. gzip only needs to buffer some data before returning an uncompressed chunk, but it's only a few KB. Chunks of gzipped data don't have to align with chunked HTTP encoding (those are independent layers). Thank you for the information and for correcting my statements. I just tried it, and chunked transfer and gzip compression can definitely happen at the same time. The problem is that I see a strange effect on my pages if I enable Apache's SetOutputFilter DEFLATE: page progress and rendering are different. It works well with zlib.output_compression, with more or less no visible changes from the uncompressed case. I will have to dig into what makes this difference. Diego Perini -- regards, Kornel
[whatwg] HTML resource packages
We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ and the bug (complete with builds you can try and some preliminary performance numbers) is here: https://bugzilla.mozilla.org/show_bug.cgi?id=529208 You can think of resource packages as image spriting 2.0. A page indicates in its html element that it uses one or more resource packages (which are just zip files). Then when that page requests a resource (be it an image, a css file, a script, or whatever), the browser first checks whether one of the packages contains the requested resource. If so, the browser uses the resource out of the package instead of making a separate HTTP request for the resource. There's more detail than that, of course. Hopefully it's (mostly) clear in the spec. I envision two classes of users of resource packages. I'll call the first resource-constrained developers. These developers care about how fast their page is (who doesn't?), but can't spend weeks speeding up their page. For these developers, resource packages are an easy way to make their pages faster without going through the pain of spriting their images and packaging their js/css. The other class of users are the resource-unconstrained developers; think Google or Facebook. These developers have already put a huge amount of effort into making their pages fast, and a naive application of resource packages is unlikely to make them any faster. But these developers may be able to use resource packages cleverly to gain speedups. In particular, nobody (to my knowledge) currently sprites content images, such as the results of an image search. A determined set of developers should be able to construct resource packages for image search results on the fly and save some HTTP requests. 
So we can avoid rehashing here the common objections to resource packages, here's a brief overview of the arguments I've heard against the feature and my responses. * Argument: Packaging isn't the way forward. When you change one resource in a package you have to change the whole package and so the user has to re-download all the bits when most of what was in their cache would have been fine. This is of course correct, but we don't think it eliminates the utility of resource packages. The resource-constrained developer is probably happy with anything which speeds up page loads, even if it's not optimal when one part of the page changes. And the resource-unconstrained developer probably won't find resource packages too useful for non-dynamic content, so caching isn't an issue in that case. * Argument: We can already package things pretty well. Mozilla should instead be focusing on improving caching (or something else). I'd contend that we don't package particularly well in general. The Facebook homepage loads 100 separate resources on a cold cache, and they certainly care about speed. But anyway, this is just one project. We're also looking at caching. :) * Argument: Isn't this subsumed by HTTP pipelining? Mostly. But we can't turn on HTTP pipelining because transparent proxies break it. Resource packages have the further benefit that they allow page authors to explicitly set the order in which the UA will download the resources -- with pipelining, an important resource might get stuck behind a large, unimportant resource, while with resource packages, the UA always downloads resources in the order they appear in the zip file. Last, my understanding is that the HTTP pipeline isn't particularly deep, so perhaps resource packages fill the TCP pipe better on high-latency connections. I haven't looked into this, though. * Argument: What about SPDY? I think SPDY should subsume resource packages. 
But its deployment will require changes to both web clients and servers, so it will probably take a while after it's released before it's available on all web servers. And we have no idea when to expect SPDY to be ready for production. Resource packages, in contrast, are something we can have Right Now. Additionally, since resource packages are backwards-compatible -- a page which specifies resource packages should display just fine in a browser which doesn't support them -- we should be able to turn off resource packages in the future if we decide we don't want them anymore. We'd love to hear what you think of the specification and our implementation. -Justin
Re: [whatwg] HTML resource packages
On Tue, Aug 3, 2010 at 5:31 PM, Justin Lebar justin.le...@gmail.com wrote: We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ and the bug (complete with builds you can try and some preliminary performance numbers) is here: https://bugzilla.mozilla.org/show_bug.cgi?id=529208 [snip] We'd love to hear what you think of the specification and our implementation. I love it! You guys seem to have hit all the big points while sidestepping the obvious problems. Resource packages as Sprites 2.0 makes me very, very happy (and also more confident about removing any attempt at a spriting solution from CSS - it's the wrong layer). ~TJ
Re: [whatwg] HTML resource packages
This is and was a great idea. A few points/questions: 1) I think it would be nice to see explicit confirmation in the spec that this works with offline caching. 2) Could data files such as .txt, .json, or .xml files be used as part of such a package as well? 3) Can XMLHttpRequest be made to reference such files and get them from the cache, and if so, when referencing only a zip in the packages attribute, can XMLHttpRequest access files in the zip not spelled out by a tag like <link/>? I think this would be quite powerful/avoid duplication, even if it adds functionality (like other HTML5 features) which would not be available to older browsers. 4) Could such a protocol also be made to accommodate profiles of packages, e.g., by a namespace being allowable somewhere for each package? Thus, if a package is specified as say being under the XProc (XML Pipelining) namespace profile, the browser would know it could confidently look for a manifest file with a given name and act accordingly if the profile were eventually formalized through future specifications or implemented by general purpose scripting libraries or browser extensions, etc. Another example would be if a file packaging format were referenced by a page, allowing, along with a set of files, a manifest format like METS to be specified and downloaded, describing a sitemap for a package of files (perhaps to be added immediately to the user's IndexedDB database, navigated Gopher-like, etc.) and then made navigable online or offline if the files were included in the zip, thus allowing a single HTTP request to download a whole site (e.g., if a site offered a collection of books). And manifest files might be made to specify which files should be updated at a specific time independently of the package (e.g., checking periodically for an updated manifest file outside of a zip which could point to newer versions). 
Note: the above is not asking browsers to implement any such additional complex functionality here and now; rather, it is just to allow for the possibility of automated discovery of package files having a particular structure (e.g., with specifically named manifest files to indicate how to interpret the package contents) by providing a programmatically accessible namespace for each package which could be unique per application and interpreted in particular ways, including by general purpose JavaScript libraries. This is not talking about adding namespaces to HTML itself, but rather for specifying package profiles. Such extensibility would, as far as I can see it, allow for some very powerful declarative styles of programming in relation to handling of multiple files (whether resource files, data files, or complete pages), while piggybacking on the proposal's ability to minimize the HTTP requests needed to get them. best wishes, Brett On 8/4/2010 8:31 AM, Justin Lebar wrote: We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ and the bug (complete with builds you can try and some preliminary performance numbers) is here: https://bugzilla.mozilla.org/show_bug.cgi?id=529208 You can think of resource packages as image spriting 2.0. A page indicates in its html element that it uses one or more resource packages (which are just zip files). Then when that page requests a resource (be it an image, a css file, a script, or whatever), the browser first checks whether one of the packages contains the requested resource. If so, the browser uses the resource out of the package instead of making a separate HTTP request for the resource. There's more detail than that, of course. Hopefully it's (mostly) clear in the spec. I envision two classes of users of resource packages. 
I'll call the first resource-constrained developers. These developers care about how fast their page is (who doesn't?), but can't spend weeks speeding up their page. For these developers, resource packages are an easy way to make their pages faster without going through the pain of spriting their images and packaging their js/css. The other class of users are the resource-unconstrained developers; think Google or Facebook. These developers have already put a huge amount of effort into making their pages fast, and a naive application of resource packages is unlikely to make them any faster. But these developers may be able to use resource packages cleverly to gain speedups. In particular, nobody (to my knowledge) currently sprites content images, such as the results of an image search. A determined set of developers should be able to construct resource packages for image search results on the fly and save some HTTP requests. So we can avoid rehashing here the common objections to resource packages, here's a brief overview of the arguments I've heard against the feature and my responses.
Re: [whatwg] HTML resource packages
On Aug 3, 2010, at 5:31 PM, Justin Lebar wrote: We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ and the bug (complete with builds you can try and some preliminary performance numbers) is here: https://bugzilla.mozilla.org/show_bug.cgi?id=529208 Have you done any performance testing of this feature, and if so can you share any of that data? I'm particularly interested in: * Effect of using a resource package on page-load time, in the initial fully uncached case. * Effect of using a resource package on page-load time, in the case where the resources in the package have expired but have not changed. * Effect of using a resource package on page-load time, in the case where the resources in the package have expired and a subset of them have changed. (This could still be a win for packages.) * Effect of using a resource package on page-load time, in the case where everything in the package is cached. These are probably most interesting under high-latency network conditions (real or simulated). You address these points qualitatively in your comments but I'd love to see some numbers. That would make it easier to evaluate the performance tradeoffs. Separately, I am curious to hear how http headers are handled; it's a TODO in the spec, and what the TODO says seems poor for the Content-Type header in particular. It would make it hard to use package resources in any context that looks at the MIME type rather than always sniffing. Any thoughts on this? In general I am in favor of features that can improve page load times and which are … Cheers, Maciej