Re: [whatwg] Navigation and history traversal issues
[...], certainly before the parser stops, the user agent must update the session history with the new page. That invokes [2] update the session history with the new page, which invokes [3] Traverse the history to the new entry, which fires popstate in step 14. However, "after creating the Document object, but before any script execution" seems like it could happen before or after the body element has been parsed, so the alert may or may not happen.

Yeah, this is an oversight as specced. Fixed.

On Sun, 16 Sep 2012, Justin Lebar wrote:

Suppose an attack page evil.html controls a separate frame F (e.g. evil.html frames F, evil.html opened F as a popup window, or vice versa). We discovered that if evil.html causes F to

  1. load a.html
  2. start loading b.html
  3. load a.html#h

then step (3) cannot cancel the load of b.html. That is, the final session history from this sequence must be either

  a.html    -- oldest
  a.html#h
  b.html    -- current

or

  a.html    -- oldest
  b.html    -- current.

All browsers I tested gave one of the above two results. Doing anything else breaks the web (we shipped this in Firefox Nightly and people were unable to log into ingdirect.com, for example). I didn't investigate too thoroughly, but I believe what happens is, some sites use a link with href "#" and then navigate themselves in the link's onclick handler, without cancelling the click event. In that case, we do precisely steps 1-3 above.

As I read the spec, browsers are supposed to cancel the load of b.html in step 3 above. In the navigation algorithm [1], step 6 explicitly cancels the load of b.html, because the load of b.html has not matured. So if I understand correctly, the spec is dictating behavior that we know won't work and that no browser implements. The presence of steps 6 and 8 in the algorithm suggests that the spec is already trying to walk this line, so maybe I misunderstand what's going on, either in my tests or in the spec.
The existing text in the spec step 4 is attempting to prevent a page from having you click on a link to a href=http://paypal.com/; and in the unload change that to a location.href=http://paypa1.com/; navigation, or something similar but with the user typing in the location bar and the page hijacking that navigation. If it turns out that you can't ever block a cross-origin navigation, though, that's a lot easier to fix. :-) It's not that simple though. Browsers agree on this page that we should go to the second of the two cross-origin navigations (replace false with 1 in the script to run the test): http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1778 This one too (frame nav): http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1780 So this is presumably specific to fragment identifiers. And sure enough, when we change the latter one above to changing to a fragment identifier, it works as you describe: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1782 (Things aren't so simple in this example (same-page nav): http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1784 ...where Firefox no longer exhibits the restraint we're looking for here, but Chrome and Opera still do.) Anyway, yeah, looks like step 6 is just bogus. I've removed it. This now means that fragment identifier navigations just happen without screwing around with ongoing loads. == Issue #2 == Suppose again that evil.com controls a frame F, and evil.com causes F to 1. load a.html 2. load a.html#h 3. start loading b.html 4. go back When we go back, we traverse the history [2] from a.html#h to a.html. Per the spec, this doesn't cancel the load of b.html. This caused a problem for us in Firefox because we create a session history entry for b.html at the beginning of step 3 and insert it after the current one. 
Then, when the load of b.html completes, we use whichever session history entry happens to be after the current one, assuming that it was the session history entry we created earlier. [...] The fix for this bug is not as simple as merely ensuring that the session history entry's URL matches the document's URL. Due to hash navigations and pushstate, these URLs may not match even when we're behaving correctly. We fixed this bug by cancelling the load of b.html when you go back. This matches Chrome's behavior in my tests [3]. Notice that this means we're cancelling an outstanding network load due to a synchronous same-document load, which I said in part 1 breaks the web. But based on the (lack of) feedback we've received from our test audience, it seems that cancelling the load of b.html does /not/ break the web if the navigation from a.html to a.html#h is a history navigation. The right thing to do is probably to load b.html after a.html, so the final session history is a.html -- oldest b.html -- current. I /think/ this is what the spec says should happen
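The Issue #2 sequence can be walked through with a toy model. This is only a sketch under stated assumptions: the class and method names are hypothetical, it is not Gecko code, and it models nothing beyond what the message describes (an entry created when the load starts, placed after the current entry, and later looked up by position).

```javascript
// Toy session history. All names here are hypothetical, not browser internals.
class SessionHistory {
  constructor() {
    this.entries = [];
    this.current = -1;
  }
  // A navigation that completes immediately (initial load, fragment navigation).
  commit(url) {
    this.entries.splice(this.current + 1); // drop any forward entries
    this.entries.push({ url, pending: false });
    this.current = this.entries.length - 1;
  }
  // Start an asynchronous load: create its entry now, right after the current one.
  startLoad(url) {
    const entry = { url, pending: true };
    this.entries.splice(this.current + 1, 0, entry);
    return entry;
  }
  // Go back one entry; optionally cancel outstanding loads first.
  back(cancelPending) {
    if (cancelPending) {
      this.entries = this.entries.filter((e) => !e.pending);
    }
    if (this.current > 0) this.current--;
  }
  // The flawed completion step: trust position instead of identity.
  finishLoadByPosition() {
    this.current++;
    return this.entries[this.current];
  }
}

function runScenario(cancelPending) {
  const h = new SessionHistory();
  h.commit("a.html");     // 1. load a.html
  h.commit("a.html#h");   // 2. load a.html#h
  h.startLoad("b.html");  // 3. start loading b.html
  h.back(cancelPending);  // 4. go back
  return h;
}

// Without cancelling, the finished load adopts whatever entry follows the current one.
const buggy = runScenario(false);
const adopted = buggy.finishLoadByPosition();
console.log(adopted.url); // "a.html#h", not "b.html"

// With the fix described above, going back cancels the outstanding load.
const fixed = runScenario(true);
console.log(fixed.entries.map((e) => e.url), fixed.current);
```

In the first run the document for b.html ends up attached to the entry for a.html#h, which is the mismatch the message describes. In the second run the pending entry is gone before the load could complete.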
Re: [whatwg] Navigation and history traversal issues
I've also made back()/forward()/go() not work during the document's unload handler, since that could be used for griefing. I'm tempted to disable it entirely for all docs a la alert(), but I've no idea if that's Web-compatible and I suspect not.

I don't know what you mean by the last sentence here. In my tests, IE and Opera do not support cross-origin back/forward/go, if that's what you mean. I don't see any good reason for us to support that in Firefox, either, if we could get away with removing it.

I meant blocking all scripted back/forward session history traversal while any page is running the unload algorithms.

Ah, I see. I don't have any idea if that's a good idea or not, so, okay. :)

As far as cross-origin back/forward, there are 404 pages on the Web that have javascript:history.back() links; these would break for cross-origin links if we blocked cross-origin history traversal. I don't really see much point. What's the security risk?

The issue isn't a history.back() which crosses origins -- that seems fine -- but rather calling history.back() on a cross-origin window. (Sorry that wasn't clear.) It's not clear that this poses a security risk (otherwise, I'm sure we'd have removed it by now), aside from making it easier to tickle Firefox into buggy states like this bug [1].

But it's also not clear to me what benefit there is to being able to call back() on an arbitrary window. I guess I can navigate a window, so I might as well be able to make it go back? But those aren't quite the same thing.

-Justin

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=737307
Re: [whatwg] Document's base URI should use the document's *current* address
From an author's point of view, there's no such thing as the document's original URI and, unless you're a nerd, you've never heard of the base URI. There's just the document's URI, modified by pushState. From this point of view, I'd say it's less surprising that relative URIs would break when you change directories (hey, you *asked* for it) than that anchor refs would update the browser's address bar and document.location relative to the old URI.

In my tests, Chrome and Firefox both immediately change document.baseURI when you call pushState. Images (and I presume other resources) are resolved relative to the new base. I'm not sure why your earlier test with [...] seemed to work in Chrome, Hixie. :-/

I think this ship may have sailed.

-Justin

  <button onclick='push()'>Click me</button>
  <script>
  function push() {
    history.pushState('', '', '/' + Math.random() + '/file');
    alert(document.baseURI);
  }
  </script>
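To make the "change directories" case concrete, here is a minimal sketch of how one relative resource URL resolves before and after a push shaped like the one in the test above. The host, the paths and the fixed number standing in for Math.random() are all assumptions for illustration. It uses only the standard URL constructor, so it shows URL resolution itself, not what any particular browser does with its base URI.

```javascript
// Hypothetical original address of the page.
const originalAddress = "http://example.com/app/index.html";

// Same shape as the test's '/' + Math.random() + '/file', with a fixed number.
const pushedAddress = new URL("/0.123/file", originalAddress).href;

// A relative resource reference as it might appear in the page's markup.
const imageHref = "images/logo.png";

const beforePush = new URL(imageHref, originalAddress).href;
const afterPush = new URL(imageHref, pushedAddress).href;

console.log(pushedAddress); // http://example.com/0.123/file
console.log(beforePush);    // http://example.com/app/images/logo.png
console.log(afterPush);     // http://example.com/0.123/images/logo.png
```

If the base follows the current address, the same markup points at a different resource after the push. That is the breakage being weighed against the anchor-link behaviour.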
Re: [whatwg] Document's base URI should use the document's *current* address
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1342

It doesn't make sense that the second image is broken. (For some reason in Firefox I get an exception. Not sure if I'm misusing the API or if it's a bug in Firefox.)

Not sure what's going on with that Firefox exception. But I'm not terribly surprised that the second image shouldn't work... :)

Similarly, if for some bizarre reason the page pushState's to a new directory, shouldn't all the links point relative to that new directory?

That would break all existing images, stylesheets, scripts, etc., if their URLs are reused somehow.

Hm...maybe you're right. But then, how do we jibe this with #foo and ?foo links, both of which resolve relative to the current URI in both Firefox and WebKit?

- Start the Follow a hyperlink algorithm.
- [snip]
- It sets the document's current address to .../page.html#foo.

Well, this is pretty bad. document.location is the document's current address [1]. So clicking #foo changed document.location from page2.html to page.html#foo, which I certainly wouldn't expect (and does not match implementations).

-Justin

[1] The href attribute [of document.location] must return the current address of the associated Document object, as an absolute URL.

On Wed, Feb 15, 2012 at 3:50 PM, Ian Hickson i...@hixie.ch wrote:

On Wed, 20 Jul 2011, Justin Lebar wrote:

The spec as written decides whether a link is a same-resource reference or not based on comparing the URLs to what you're calling the original address, not comparing it to the current address. See the navigation algorithm, step 7 /Fragment identifiers/.

Maybe I'm misunderstanding, but this might not be the case in the history traversal algorithm.

In history traversal, the URLs compared are those of the entries involved. However, clicking a link is primarily navigation, not session history traversal (though it can involve the latter).
Step 6: If the specified entry has a URL whose fragment identifier differs from that of the current entry's when compared in a case-sensitive manner, and the two share the same Document object, then let hash changed be true. It's not clear to me what the current/specified entry's URL is, or where this is properly defined, but earlier, we say: Hm, yes, the spec doesn't quite clearly define the URL in all cases. Fixed. The current entry is usually an entry for the location of the Document. That's a non-normative statement. I've made it more explicitly so. and the document's location changes when we call push/replaceState. The current entry is whatever the algorithms last set the current entry to. I've made that clearer in the spec. As currently specified, we'll resolve #foo relative to the document's original URL; that is, clicking the link will take the user to page.html#foo, not page2.html#foo. But the intent of a link with href #foo is clearly to navigate within the current page, not to go somewhere else. Were you saying that this isn't the right interpretation of the spec? Because #foo is resolved relative to the document's base URI, which is the same as the document's original URI, so we decide that #foo is a same-document link? That's comforting, if it's true. :) When you click a link to #foo on a document whose current address is page2.html but whose document's address is page.html, then you go through these steps: - Start the Follow a hyperlink algorithm. - Resolve href relative to the a element. - This uses XML Base, with the fallback base url being the document's address, which is what you were calling the original URL. - This results in .../page.html#foo. - Navigate to that URL. - Step Fragment identifiers then compares this URL to the document's address (page.html, not page2.html), and finds a match. - Navigating to a fragment identifier is invoked and creates a new session history entry with the URL page.html#foo. - Traverse the history is then invoked. 
- It sets the document's current address to .../page.html#foo. - Scrolling happens. - The current entry's URL is ../page2.html and the specified entry's URL is .../page.html#foo so the fragids differ and hashchange fires. - The current entry becomes the new specified entry. Note that there are problems with what you describe: what if the new URL has a different path, and there are img elements whose URLs are relative, and after pushState() you clone one? Or what about relative links in the original markup? I don't think we can change the base URL on the fly, all kinds of problems could result. I agree there are problems with changing the base URI. But it seems much less intuitive for common use-cases not to change it. We can change my example above to use ?foo instead of #foo, and I think the same argument applies. Should a link with href ?foo always resolve relative to the document's original URI (unless the base is explicitly
Re: [whatwg] Document's base URI should use the document's *current* address
On Wed, Feb 15, 2012 at 5:31 PM, Ian Hickson i...@hixie.ch wrote:

On Wed, 15 Feb 2012, Justin Lebar wrote:

- It sets the document's current address to .../page.html#foo.

Well, this is pretty bad. document.location is the document's current address [1]. So clicking #foo changed document.location from page2.html to page.html#foo, which I certainly wouldn't expect (and does not match implementations).

Seems to me we should change the implementations then. There isn't any fundamental difference between linking to #foo and linking to page.html#foo if the base URL is page.html, as far as I can tell. If the implementations can't change, then I'll change the spec, but it really seems bad to me that relative URLs will break depending on when they are resolved relative to pushState() changes.

When I implemented pushState, I explicitly didn't want authors to have to rewrite all their anchor links after they changed the document's current URI.

From an author's point of view, there's no such thing as the document's original URI and, unless you're a nerd, you've never heard of the base URI. There's just the document's URI, modified by pushState. From this point of view, I'd say it's less surprising that relative URIs would break when you change directories (hey, you *asked* for it) than that anchor refs would update the browser's address bar and document.location relative to the old URI.

If we did make the change you're suggesting, we'd have to check that it doesn't break at least the major sites which use pushstate (Facebook, anyone?). And I'd want to try to coordinate the change with WebKit so we quickly move away from the old behavior. But I'm not convinced it's worthwhile, given that there's at least an argument for the current behavior.

-Justin

--
Ian Hickson               U+1047E
http://ln.hixie.ch/       U+263A
Things that are impossible just take longer.
Re: [whatwg] Document's base URI should use the document's *current* address
The spec as written decides whether a link is a same-resource reference or not based on comparing the URLs to what you're calling the original address, not comparing it to the current address. See the navigation algorithm, step 7 /Fragment identifiers/. Maybe I'm misunderstanding, but this might not be the case in the history traversal algorithm. Step 6: If the specified entry has a URL whose fragment identifier differs from that of the current entry's when compared in a case-sensitive manner, and the two share the same Document object, then let hash changed be true. It's not clear to me what the current/specified entry's URL is, or where this is properly defined, but earlier, we say: The current entry is usually an entry for the location of the Document. and the document's location changes when we call push/replaceState. In any case, the navigation algorithm is clear as written. As currently specified, we'll resolve #foo relative to the document's original URL; that is, clicking the link will take the user to page.html#foo, not page2.html#foo. But the intent of a link with href #foo is clearly to navigate within the current page, not to go somewhere else. Were you saying that this isn't the right interpretation of the spec? Because #foo is resolved relative to the document's base URI, which is the same as the document's original URI, so we decide that #foo is a same-document link? That's comforting, if it's true. :) Note that there are problems with what you describe: what if the new URL has a different path, and there are img elements whose URLs are relative, and after pushState() you clone one? Or what about relative links in the original markup? I don't think we can change the base URL on the fly, all kinds of problems could result. I agree there are problems with changing the base URI. But it seems much less intuitive for common use-cases not to change it. We can change my example above to use ?foo instead of #foo, and I think the same argument applies. 
Should a link with href ?foo always resolve relative to the document's original URI (unless the base is explicitly changed)? Similarly, if for some bizarre reason the page pushState's to a new directory, shouldn't all the links point relative to that new directory?

I kind of think this ship has sailed wrt implementations. Chrome and Firefox both have the same behavior in this respect. See http://people.mozilla.org/~jlebar/whatwg/test_pushstate_resolve.html (source included below, since I have a bad habit of deleting these test files right before someone else wants to look at them).

Ian, how hard do you think it would be to spec changing the base and resolve the issues with that?

-Justin

  <html>
  <body>
  <a href='#foo'>#foo</a><br>
  <a href='?foo'>?foo</a><br>
  <a href='foo'>foo</a><br>
  <button onclick='history.pushState("", "", Math.random())'>pushState to new file</button><br>
  <button onclick='history.pushState("", "", Math.random() + "/file")'>pushState to new directory</button>
  </body>
  </html>

On Tue, Jul 19, 2011 at 5:35 PM, Ian Hickson i...@hixie.ch wrote:

On Wed, 27 Apr 2011, Justin Lebar wrote:

The document base URL is used when fetching resources. Right now, if a page doesn't have a base element, the document base URL is set to the document's address. (I'm going to call this the document's original address.) The document's original address does not change when you call pushState; only the document's current address does.

I think the base URI should use the document's current address, not the original address. To see why this makes sense, consider the following scenario:

 * User loads page.html
 * Page calls pushState and changes its url to page2.html
 * User clicks on a link with href #foo.

As currently specified, we'll resolve #foo relative to the document's original URL; that is, clicking the link will take the user to page.html#foo, not page2.html#foo. But the intent of a link with href #foo is clearly to navigate within the current page, not to go somewhere else.
Firefox 4 already implements pushState as I'm suggesting here. The spec as written decides whether a link is a same-resource reference or not based on comparing the URLs to what you're calling the original address, not comparing it to the current address. See the navigation algorithm, step 7 /Fragment identifiers/. Note that there are problems with what you describe: what if the new URL has a different path, and there are img elements whose URLs are relative, and after pushState() you clone one? Or what about relative links in the original markup? I don't think we can change the base URL on the fly, all kinds of problems could result. -- Ian Hickson U+1047E )\._.,--,'``. fL http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
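The same-resource check Ian refers to can be approximated in a few lines. This is a simplified sketch, not the spec's algorithm. The function names and example.com addresses are hypothetical, it ignores the fetch-method condition, and comparing against the current address in the second case is my assumption about what basing everything on the current address would imply.

```javascript
// Hypothetical addresses; only the file names come from the thread's scenario.
const ORIGINAL = "http://example.com/page.html";  // address the document was loaded from
const CURRENT = "http://example.com/page2.html";  // address after pushState

// Return an absolute URL string with its fragment removed.
function stripFragment(url) {
  const u = new URL(url);
  u.hash = "";
  return u.href;
}

// True when the target differs from the given address only by its fragment.
function isFragmentNavigation(targetUrl, documentAddress) {
  return targetUrl.includes("#") &&
    stripFragment(targetUrl) === stripFragment(documentAddress);
}

// Resolve a link against a base, then classify it against an address.
function followLink(href, baseUrl, comparedAddress) {
  const target = new URL(href, baseUrl).href;
  return { target, sameDocument: isFragmentNavigation(target, comparedAddress) };
}

// Spec as described above: resolve and compare against the original address.
const asSpecced = followLink("#foo", ORIGINAL, ORIGINAL);
// Assumption: base and comparison both follow the current address.
const withCurrent = followLink("#foo", CURRENT, CURRENT);

console.log(asSpecced);   // target .../page.html#foo, sameDocument true
console.log(withCurrent); // target .../page2.html#foo, sameDocument true
```

Both variants treat #foo as a same-document navigation. They differ only in which URL ends up in the address bar, which is the disagreement in this thread.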
[whatwg] Document's base URI should use the document's *current* address
The document base URL [1] is used when fetching resources. Right now, if a page doesn't have a base element, the document base URL is set to the document's address. (I'm going to call this the document's original address.) The document's original address does not change when you call pushState; only the document's current address [2] does.

I think the base URI should use the document's current address, not the original address. To see why this makes sense, consider the following scenario:

 * User loads page.html
 * Page calls pushState and changes its url to page2.html
 * User clicks on a link with href #foo.

As currently specified, we'll resolve #foo relative to the document's original URL; that is, clicking the link will take the user to page.html#foo, not page2.html#foo. But the intent of a link with href #foo is clearly to navigate within the current page, not to go somewhere else.

Firefox 4 already implements pushState as I'm suggesting here.

-Justin

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#document-base-url
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/dom.html#the-document%27s-current-address
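The scenario can be checked with plain URL resolution. A minimal sketch: the example.com host and directory are assumptions, and the standard URL constructor stands in for "resolve relative to a base", so it says nothing about what a given browser picks as the base.

```javascript
// Hypothetical addresses; only page.html and page2.html come from the scenario.
const originalAddress = "http://example.com/dir/page.html";
const currentAddress = "http://example.com/dir/page2.html"; // after pushState

function resolve(href, base) {
  return new URL(href, base).href;
}

const fragmentVsOriginal = resolve("#foo", originalAddress);
const fragmentVsCurrent = resolve("#foo", currentAddress);
const queryVsOriginal = resolve("?foo", originalAddress);
const queryVsCurrent = resolve("?foo", currentAddress);

console.log(fragmentVsOriginal); // http://example.com/dir/page.html#foo
console.log(fragmentVsCurrent);  // http://example.com/dir/page2.html#foo
console.log(queryVsOriginal);    // http://example.com/dir/page.html?foo
console.log(queryVsCurrent);     // http://example.com/dir/page2.html?foo
```

The first line of output is what the message reads the spec as requiring. The second is the behaviour being proposed.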
Re: [whatwg] Onpopstate is Flawed
I'm not sure I follow you here. My idea for option A is that you never get a popstate when doing the initial parsing of a page.

Okay, I may still have misunderstood, despite my best efforts! :)

Option B: Fire popstates as we currently do, with the caveat that you never fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it.

Is my option B clear? It's also what the patch I have [1] does. We might want to make popstate sync again, since otherwise you have to schedule a task which synchronously checks if no state changes have occurred, and dispatches popstate only if appropriate. I know Olli has some thoughts on making popstate sync, and fwiw, FF currently dispatches it synchronously.

The main problem with this proposal is that it's a big change from what the API is today. However it's only a change in the situation when the spec today calls for firing popstate during the initial page load. Something that it seems like pages don't deal with properly today anyway, at least in the case of facebook.

Given the adoption the feature has seen, I guess I'd favor a smaller change. In particular, the option B above makes it possible to write correct pages without ever reading the DOM current state property -- it's there only as an optimization to allow pages to set their state faster, so no rush to put it in Right Away. In contrast, a correct page with option A would have to check its state at some point as it loads. I guess I don't see why it's better to make a big change than a small one, if they both work equally well.
-Justin [1] Patch v4: https://bugzilla.mozilla.org/show_bug.cgi?id=615501 On Mon, Feb 7, 2011 at 5:07 PM, Jonas Sicking jo...@sicking.cc wrote: On Sun, Feb 6, 2011 at 10:18 AM, Justin Lebar justin.le...@gmail.com wrote: 1) Fire popstates as we currently do, with the caveat that you never fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it. Proposal B has the advantage of requiring fewer changes. The more I think about this, the more I like this option. It's a smaller change than option A (though again, we certainly could expose the state object through a DOM property separately from this proposal), and I think it would be sufficient to fix some sites which are currently broken. (For instance, I've gotten Facebook to receive stale popstates and show me the wrong page just by clicking around quickly.) Furthermore, this avoids the edge case in option B of you don't get a popstate on the initial initial load, but you do get a popstate if you're reloading from far enough back in the session history, or after a session restore. I'm not sure I follow you here. My idea for option A is that you never get a popstate when doing the initial parsing of a page. So if you're reloading from session restore or if you're going far back enough in history that you end up parsing a Document, you never get a popstate. You get a popstate when and only when you transition between two history entries while remaining on the same Document. So the basic code flow would be: Whenever creating a part of the UI (for example during page load or if called upon to render a new AJAX page), use document.currentState to decide what state to render. Whenever you receive a popstate, rerender UI as described by the popstate. So no edge cases that I can think of? The main problem with this proposal is that it's a big change from what the API is today. 
However it's only a change in the situation when the spec today calls for firing popstate during the initial page load. Something that it seems like pages don't deal with properly today anyway, at least in the case of facebook. I was concerned that pages might become confused when they don't get a popstate they were expecting -- for instance, if you pushState before the initial popstate, a page may never see a popstate event -- but I think this might not be such a big deal. A call to push/replaceState would almost certainly be accompanied by code updating the DOM to the new state. Popstate's main purpose is to tell me to update the DOM, so I don't think I'd be missing much by not getting it in that case. That was my thinking too FWIW. / Jonas
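The "never fire a stale popstate" caveat can be sketched as a generation counter that is checked when the queued task runs. This is an illustrative model only: the class and its method names are hypothetical, the array of tasks stands in for the event loop, and it is not the patch referenced in this thread.

```javascript
// Hypothetical model: a counter is bumped on every state change, and a queued
// popstate is dropped if the counter moved after it was queued.
class HistoryModel {
  constructor() {
    this.generation = 0; // bumped by navigations and push/replaceState
    this.tasks = [];     // stand-in for the event loop's task queue
    this.fired = [];     // states actually delivered via popstate
  }
  // Stand-in for pushState/replaceState: a state change with no popstate.
  pushState() {
    this.generation++;
  }
  // Stand-in for a history traversal: a state change plus a queued popstate.
  traverse(state) {
    this.generation++;
    const queuedAt = this.generation;
    this.tasks.push(() => {
      if (this.generation !== queuedAt) return; // stale: something changed since
      this.fired.push(state);
    });
  }
  runTasks() {
    while (this.tasks.length > 0) {
      const task = this.tasks.shift();
      task();
    }
  }
}

const h = new HistoryModel();
h.traverse("A"); // popstate for A is queued...
h.pushState();   // ...but the page changes state before it runs
h.runTasks();
const afterStale = h.fired.slice();

h.traverse("B"); // nothing intervenes this time
h.runTasks();
console.log(afterStale, h.fired);
```

Only the popstate for "B" is delivered; the one for "A" is discarded because a push happened between queuing and dispatch. Dispatching popstate synchronously, as suggested above, would make the check unnecessary.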
Re: [whatwg] Onpopstate is Flawed
The problem with option B is that pages can't display correctly until the load event fires, which can be quite late in the game what with slow loading images and ads. It means that if you're on a page which uses state, and reload the page, you'll first see the page in a state-less mode while it's loading, and at some point later (generally when the last image finishes loading) it'll snap to be in the state it was when you pressed reload. You'll get the same behavior going back to a state-using page which has been kicked out of the fast-cache.

But isn't this problem orthogonal to option B? That is, we could still add the DOM property to address this concern, right? But at least with option B, one can write a correct page without reading that property -- that is, pages won't have to change in order to be as fast and correct as they currently are.

-Justin
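A page that reads such a DOM property at load time and then follows popstate might look roughly like this. It is a sketch under assumptions: `document.currentState` is the name used for the proposed property elsewhere in this thread, not a shipped API, and the fake window exists only so the flow can run outside a browser.

```javascript
// Render once from the state exposed at load, then again on every popstate.
// `win.document.currentState` is the proposed (hypothetical) property.
function installStateRendering(win, render) {
  render(win.document.currentState);
  win.addEventListener("popstate", (event) => render(event.state));
}

// Minimal fake window for illustration; not a browser API.
function makeFakeWindow(initialState) {
  const handlers = [];
  return {
    document: { currentState: initialState },
    addEventListener(type, handler) {
      if (type === "popstate") handlers.push(handler);
    },
    // Simulate a transition between two entries of the same document.
    firePopstate(state) {
      this.document.currentState = state;
      handlers.forEach((handler) => handler({ state }));
    },
  };
}

const rendered = [];
const win = makeFakeWindow({ view: "inbox" }); // e.g. restored after a reload
installStateRendering(win, (state) => rendered.push(state.view));
win.firePopstate({ view: "sent" });            // user presses back/forward
console.log(rendered);
```

The first render happens as soon as script runs, without waiting for the load event, which is the delay complained about above.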
Re: [whatwg] Onpopstate is Flawed
1) Fire popstates as we currently do, with the caveat that you never fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it. Proposal B has the advantage of requiring fewer changes. The more I think about this, the more I like this option. It's a smaller change than option A (though again, we certainly could expose the state object through a DOM property separately from this proposal), and I think it would be sufficient to fix some sites which are currently broken. (For instance, I've gotten Facebook to receive stale popstates and show me the wrong page just by clicking around quickly.) Furthermore, this avoids the edge case in option B of you don't get a popstate on the initial initial load, but you do get a popstate if you're reloading from far enough back in the session history, or after a session restore. I was concerned that pages might become confused when they don't get a popstate they were expecting -- for instance, if you pushState before the initial popstate, a page may never see a popstate event -- but I think this might not be such a big deal. A call to push/replaceState would almost certainly be accompanied by code updating the DOM to the new state. Popstate's main purpose is to tell me to update the DOM, so I don't think I'd be missing much by not getting it in that case. I don't know if this is something we can get done in time for FF4, but I can see. -Justin On Wed, Feb 2, 2011 at 3:37 PM, Justin Lebar justin.le...@gmail.com wrote: Oh, I think I now understand what Jonas meant. Proposal A, as I understand it: 1) Don't fire an initial popstate, because this causes stale popstates when pushState is called before the popstate. 2) Expose the state object to the DOM so pages can find out what the initial state is when they load. 
(The initial state might not be null if we're restoring after a crash, or if we're going back in history after we unloaded the document.) 3) Otherwise, fire popstate like normal, once for each navigation. (With the caveat that you never want to fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it.) I think we need the caveat in step 3 because firing popstate isn't synchronous (step 11 at [1]). But if we need that caveat, maybe it's better to do what Jonas originally proposed. Proposal B: 1) Fire popstates as we currently do, with the caveat that you never fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it. Proposal B has the advantage of requiring fewer changes. (We could, of course, add the DOM property later -- it's orthogonal to proposal B, but required by proposal A.) [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#traverse-the-history On Wed, Feb 2, 2011 at 2:48 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, Feb 2, 2011 at 2:34 PM, Justin Lebar justin.le...@gmail.com wrote: So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Yeah, I think it's too late to move to this approach. Even if we also include the new state in the popstate events? Such a change seems mostly additive to the current spec. My thinking was that if someone calls replaceState, then probably that means that they're currently changing the page to represent that new state. If they do that then I don't see that they initial popstate would help them in any way? I agree it's potentially misinformative to give the page a popstate in this case. 
But it's possible that a page might be built so that it doesn't begin to function properly until it receives the initial popstate. If a user clicks on a link and causes a replaceState call before the initial popstate, then such a page could break. But with my suggested change, pages have no reason to wait until the initial popstate fires. And in fact they can't since we don't fire it at all :) But yes, I agree that it could break already existing pages that have the above behavior. So the question is if webkit would be ok with such a change. So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Oh, is this why we needed the initial popstate? For instance, we persist state objects across session restore, so when the user restarts, a page could get an onload followed by a popstate with a non-null state object. [Aside: What we currently have doesn't work well for this case, since the page really needs the state object at the moment it starts to run script so it can decide what content to load, but it doesn't get
Re: [whatwg] Onpopstate is Flawed
I'm a bit uncomfortable with this behavior, since it seems that having replaceState cancel the initial popstate is at least somewhat surprising. How is this better than never firing an initial popstate?

-Justin

On Mon, Jan 31, 2011 at 6:32 PM, Jonas Sicking jo...@sicking.cc wrote:

On Thu, Dec 23, 2010 at 6:18 PM, Henry Chan henry.fai.hang.c...@gmail.com wrote:

It fixes the bit where back/forward before onload doesn't fire onpopstate. But no, it still doesn't let us detect initial onpopstate. And back/forward buttons don't work properly until onload. A workaround would be to assign the handlers to the a tags at onload but again that's not feasible for my site. I need it domready.

Please make onpopstate fire as early as possible in the navigation sequence. And drop the pending state object. I need exactly each firing. Not just the last one.

Would the following behavior solve your issue:

If pushState or replaceState is called before the initial popstate, simply don't fire the after-onload-popstate.

If the back button is pressed (or history.back() is called) after a pushState/replaceState, but before onload, fire a popstate for the newly transitioned to state. Still leave the after-onload-popstate canceled.

I.e. if the webpage calls pushState or replaceState before onload fires, then it is deemed that the page has transitioned to the new state and no after-onload-popstate is needed.

This behavior makes the most sense to me and allows the page to start handling state transitions before the page finishes loading.

/ Jonas
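Jonas's two rules can be laid out as a small state machine. This is a sketch of my reading of the proposal, not spec text. The class and method names are hypothetical, and a traversal before onload with no earlier push/replaceState is not covered by the proposal, so the model's behaviour there is an assumption.

```javascript
// Hypothetical model of one page load under the proposed rules.
class PageLoadModel {
  constructor(initialState) {
    this.state = initialState;
    this.loaded = false;
    this.initialPopstateCancelled = false;
    this.events = []; // what the page observes, in order
  }
  // Rule 1: push/replaceState before onload cancels the after-onload popstate.
  pushOrReplaceState(state) {
    if (!this.loaded) this.initialPopstateCancelled = true;
    this.state = state;
  }
  // Rule 2: a traversal fires popstate for the state it lands on, even before
  // onload. (Assumption: it does so whether or not a push came first.)
  traverseTo(state) {
    this.state = state;
    this.events.push("popstate(" + state + ")");
  }
  finishLoad() {
    this.loaded = true;
    this.events.push("load");
    if (!this.initialPopstateCancelled) {
      this.events.push("popstate(" + this.state + ")");
    }
  }
}

// Untouched load: the after-onload popstate fires as it does today.
const plain = new PageLoadModel("s0");
plain.finishLoad();

// replaceState before onload: no after-onload popstate.
const replaced = new PageLoadModel("s0");
replaced.pushOrReplaceState("s1");
replaced.finishLoad();

// pushState, then back, both before onload: one popstate, none after load.
const pushedThenBack = new PageLoadModel("s0");
pushedThenBack.pushOrReplaceState("s1");
pushedThenBack.traverseTo("s0");
pushedThenBack.finishLoad();

console.log(plain.events, replaced.events, pushedThenBack.events);
```

The second case is the one questioned at the top of this message: a page that waits for the initial popstate never sees it once replaceState has run.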
Re: [whatwg] Onpopstate is Flawed
So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Yeah, I think it's too late to move to this approach. My thinking was that if someone calls replaceState, then probably that means that they're currently changing the page to represent that new state. If they do that then I don't see that the initial popstate would help them in any way? I agree it's potentially misinformative to give the page a popstate in this case. But it's possible that a page might be built so that it doesn't begin to function properly until it receives the initial popstate. If a user clicks on a link and causes a replaceState call before the initial popstate, then such a page could break. It's an edge case, but that's exactly why it concerns me -- nobody's going to test to make sure that their page works properly if the initial popstate is canceled by a push/replaceState. So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Oh, is this why we needed the initial popstate? For instance, we persist state objects across session restore, so when the user restarts, a page could get an onload followed by a popstate with a non-null state object. [Aside: What we currently have doesn't work well for this case, since the page really needs the state object at the moment it starts to run script so it can decide what content to load, but it doesn't get the state object until after onload.] If we can't get rid of the initial popstate because of the above, then I think what Jonas proposed is reasonable. I just wish we had something with fewer gotchas. 
-Justin On Wed, Feb 2, 2011 at 2:15 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, Feb 2, 2011 at 2:05 PM, Justin Lebar justin.le...@gmail.com wrote: I'm a bit uncomfortable with this behavior, since it seems that having replaceState cancel the initial popstate is at least somewhat surprising. How is this better than never firing an initial popstate? My thinking was that if someone calls replaceState, then probably that means that they're currently changing the page to represent that new state. If they do that then I don't see that the initial popstate would help them in any way? Yet another solution would be to always expose the current state through a member on the window or the document. Then popstate would represent any transition in the current state and wouldn't be needed for the initial page load. So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. The main problem I can think of with this design, apart from it being a bigger change from what we've got, is what happens if someone modifies the current-state member on the window/document. While we can make the member read-only, that doesn't help if the state is a deep object hierarchy. In IndexedDB we decided to not attempt to solve the problem and instead rely on authors not to trigger the footgun. / Jonas
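To make the footgun Jonas mentions concrete, here is a small Python sketch of why a read-only member does not protect a deep object hierarchy. Both classes are hypothetical illustrations (nothing here is a proposed DOM API): SharedStateEntry hands out the stored object itself through a read-only property, while CopyingStateEntry pays for a deep copy on every read so callers cannot reach the stored state.

```python
import copy


class SharedStateEntry:
    """Read-only member, but the object behind it is shared with callers."""

    def __init__(self, state):
        self._state = copy.deepcopy(state)   # snapshot taken when stored

    @property
    def state(self):
        return self._state                   # same object on every read


class CopyingStateEntry:
    """Hands out a fresh deep copy on each read."""

    def __init__(self, state):
        self._state = copy.deepcopy(state)

    @property
    def state(self):
        return copy.deepcopy(self._state)


shared = SharedStateEntry({"filters": {"tag": "news"}})
try:
    shared.state = {}                        # rejected: no setter defined
except AttributeError:
    print("the member itself cannot be reassigned")

shared.state["filters"]["tag"] = "sports"    # ...but a nested write goes through
print(shared.state["filters"]["tag"])

copying = CopyingStateEntry({"filters": {"tag": "news"}})
copying.state["filters"]["tag"] = "sports"   # mutates a throwaway copy only
print(copying.state["filters"]["tag"])
```

The copying variant closes the hole at the cost of a copy per access, and it means two reads never return the same object. Relying on authors not to mutate the state, as Jonas describes for IndexedDB, corresponds to the first class.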
Re: [whatwg] Onpopstate is Flawed
Oh, I think I now understand what Jonas meant. Proposal A, as I understand it: 1) Don't fire an initial popstate, because this causes stale popstates when pushState is called before the popstate. 2) Expose the state object to the DOM so pages can find out what the initial state is when they load. (The initial state might not be null if we're restoring after a crash, or if we're going back in history after we unloaded the document.) 3) Otherwise, fire popstate like normal, once for each navigation. (With the caveat that you never want to fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it.) I think we need the caveat in step 3 because firing popstate isn't synchronous (step 11 at [1]). But if we need that caveat, maybe it's better to do what Jonas originally proposed. Proposal B: 1) Fire popstates as we currently do, with the caveat that you never fire a stale popstate -- that is, if any navigations or push/replaceStates have occurred since you queued the task to fire the popstate, don't fire it. Proposal B has the advantage of requiring fewer changes. (We could, of course, add the DOM property later -- it's orthogonal to proposal B, but required by proposal A.) [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#traverse-the-history On Wed, Feb 2, 2011 at 2:48 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, Feb 2, 2011 at 2:34 PM, Justin Lebar justin.le...@gmail.com wrote: So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Yeah, I think it's too late to move to this approach. Even if we also include the new state in the popstate events? Such a change seems mostly additive to the current spec. 
My thinking was that if someone calls replaceState, then probably that means that they're currently changing the page to represent that new state. If they do that then I don't see that the initial popstate would help them in any way? I agree it's potentially misinformative to give the page a popstate in this case. But it's possible that a page might be built so that it doesn't begin to function properly until it receives the initial popstate. If a user clicks on a link and causes a replaceState call before the initial popstate, then such a page could break. But with my suggested change, pages have no reason to wait until the initial popstate fires. And in fact they can't since we don't fire it at all :) But yes, I agree that it could break already existing pages that have the above behavior. So the question is if webkit would be ok with such a change. So during loading, any script that wants to know what the initial (or current) state is does not need to wait for the first popstate, but can simply grab the state and go. Oh, is this why we needed the initial popstate? For instance, we persist state objects across session restore, so when the user restarts, a page could get an onload followed by a popstate with a non-null state object. [Aside: What we currently have doesn't work well for this case, since the page really needs the state object at the moment it starts to run script so it can decide what content to load, but it doesn't get the state object until after onload.] If we can't get rid of the initial popstate because of the above, then I think what Jonas proposed is reasonable. I just wish we had something with fewer gotchas. I think my latest proposed change makes this a whole lot better since the state is immediately available to scripts. The problem with only sticking the state in an event is that there is really no good point to fire the event. The later you fire it the longer it takes before the page works properly. 
The sooner you fire it, the bigger the risk that some script runs too late to be able to catch the event. / Jonas
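The "never fire a stale popstate" caveat that appears in both Proposal A and Proposal B in this message can be sketched with a generation counter. This is a simplified Python model under stated assumptions, not the spec algorithm: popstate delivery is represented as a queue of tasks, every navigation or push/replaceState bumps a counter, and a queued task is dropped if the counter has moved on by the time it runs. The names SessionHistoryModel, traverse and run_tasks are invented for the example, and the initial popstate is deliberately left out.

```python
from collections import deque


class SessionHistoryModel:
    """Toy session history with asynchronous, stale-checked popstate."""

    def __init__(self, initial_state=None):
        self.entries = [initial_state]
        self.index = 0
        self.generation = 0      # bumped on every history change
        self.tasks = deque()     # queued popstate tasks: (generation, state)
        self.delivered = []      # states a popstate handler actually saw

    def _changed(self, queue_popstate):
        self.generation += 1
        if queue_popstate:
            self.tasks.append((self.generation, self.entries[self.index]))

    def push_state(self, state):
        del self.entries[self.index + 1:]
        self.entries.append(state)
        self.index += 1
        self._changed(queue_popstate=False)   # pushState itself fires nothing

    def replace_state(self, state):
        self.entries[self.index] = state
        self._changed(queue_popstate=False)

    def traverse(self, delta):
        target = self.index + delta
        if not 0 <= target < len(self.entries):
            return
        self.index = target
        self._changed(queue_popstate=True)    # popstate is queued, not fired now

    def run_tasks(self):
        while self.tasks:
            generation, state = self.tasks.popleft()
            if generation == self.generation:  # drop tasks that went stale
                self.delivered.append(state)


history = SessionHistoryModel("a")
history.push_state("b")
history.push_state("c")
history.traverse(-1)     # queues popstate for "b"
history.traverse(-1)     # queues popstate for "a"; the "b" task is now stale
history.run_tasks()
print(history.delivered)
```

Note that this check also drops the intermediate firing when two traversals happen back to back, which is in tension with Henry's earlier request for "exactly each firing". Whether that trade-off is acceptable is part of what the thread is debating.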
Re: [whatwg] Firing popstate for all history entry changes
It might also help if the event wasn't called popstate, since that implies a 1:1 relationship with pushState calls, but you can already get popstate events without corresponding pushState calls. historytraversal perhaps? I think we've decided here that the time for major changes to this API has passed -- it's already in use in the wild. If we *do* want to change the API, I'd like to get in line. :) However, it seems like the (web) developer's mental model for popstate would be much simpler if it fired whenever the current session history entry changed, regardless of whether it has a state object or was the first entry. This is the model Firefox uses, and we're prepared to ship it in the upcoming release of version 4. It's divergent from WebKit, which has already shipped, but WebKit is going to have to change anyway. (http://webkit.org/b/41372) -Justin On Wed, Aug 25, 2010 at 2:55 PM, Mihai Parparita mih...@chromium.org wrote: There's been some discussion on http://webkit.org/b/41372 about Gecko's vs. WebKit's implementation of the popstate event. It turns out that a careful reading of http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#history-traversal, specifically of item 10, indicates that if you have this sequence of steps: 1. Go to a page 2. Change the location's fragment to #step1 3. Change the location's fragment to #step2 4. Go back 5. Go back Then popstate should be fired after every step, except for step 4 (test case at https://bugs.webkit.org/attachment.cgi?id=65467). That's because in step 4 we're going back from one history entry without a state object to another without a state object, and the target entry is not the first one for the document either. However, it seems like the (web) developer's mental model for popstate would be much simpler if it fired whenever the current session history entry changed, regardless of whether it has a state object or was the first entry. 
Then if someone wished to listen to all history events, they would just have to use onpopstate, instead of a combination of onpopstate and onhashchange. It might also help if the event wasn't called popstate, since that implies a 1:1 relationship with pushState calls, but you can already get popstate events without corresponding pushState calls. historytraversal perhaps? Mihai
Re: [whatwg] HTML resource packages
On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified. These kinds of heuristics are far beyond the scope of resource packages as we're planning to implement them. Again, I think this type of behavior is the domain of a large change to the networking stack, such as SPDY, not a small hack like resource packages. -Justin On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar justin.le...@gmail.com wrote: I think this is a fair point. But I'd suggest we consider the following: * It might be confusing for resources from a resource package to show up on a page which doesn't opt-in to resource packages in general or to that specific resource package. Only if the resource package contains a different file from the real one. I suggest we treat this as a pathological case and accept that it will be broken and confusing -- or at least we consider how many extra optimizations we could make if we did accept that, before deciding whether the extra performance is worth the confusion. * There's no easy way to opt out of this behavior. That is, if I explicitly *don't* want to load content cached from a resource package, I have to name that content differently. Why would you want that, if the files are the same anyway? * The avatars-on-a-forum use case is less convincing the more I think about it. Certainly you'd want each page which displays many avatars to package up all the avatars into a single package. So you wouldn't benefit from the suggested caching changes on those pages. I don't see why not. 
If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified. So if twenty different people post on the page, and you've been browsing for a while and have eighteen of their avatars (this will be common, a handful of people tend to account for most posts in a given forum): 1) With no resource packages, you fetch two separate avatars (but on earlier page views you suffered). 2) With resource packages as you suggest, you fetch a whole resource package, 90% of which you don't need. In fact, you have to fetch a resource package even if you have 100% of the avatars on the page! No two pages will be likely to have the same resource package, so you can't share cache at all. 3) With resource packages as I suggest, you fetch only two separate avatars, *and* you got the benefits of resource packages on earlier pages. The UA gets to guess whether using resource packages would be a win on a case-by-case basis, so in particular, it should be able to perform strictly better than either (1) or (2), given decent heuristics. E.g., the heuristic "fetch the resource package if I need at least two files, fetch the file if I only need one" will perform better than either (1) or (2) in any reasonable circumstance. I think this sort of situation will be fairly common. Has anyone looked at a bunch of different types of web pages and done a breakdown of how many assets they have, and how they're reused across pages? If we're talking about assets that are used only on one page (image search) or all pages (logos, shared scripts), your approach works fine, but not if they're used on a random mix of pages. I think a lot of files will wind up being used on only particular subsets of pages. 
In general, I think we need something like SPDY to really address the problem of duplicated downloads. I don't think resource packages can fix it with any caching policy. Certainly there are limits to what resource packages can do, but we can wind up closer to the limits or farther from them depending on the implementation details.
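The heuristic Aryeh describes (fetch the package when enough files are missing, otherwise fetch the files one by one) can be written down as a small Python sketch. It assumes, as his argument does, that a file cached from any source is interchangeable with the packaged copy. The function name plan_fetches, the threshold parameter and the avatar paths are all made up for illustration; this is not how the resource packages spec or any UA decides.

```python
from typing import Iterable, List, Set, Tuple


def plan_fetches(
    needed: Iterable[str],
    cached: Set[str],
    package_contents: Set[str],
    min_missing_for_package: int = 2,
) -> Tuple[bool, List[str]]:
    """Return (fetch_package, files_to_fetch_individually)."""
    missing = [path for path in needed if path not in cached]
    from_package = [path for path in missing if path in package_contents]
    if len(from_package) >= min_missing_for_package:
        # The package covers enough missing files to be worth one request;
        # anything it does not contain is still fetched on its own.
        leftovers = [path for path in missing if path not in package_contents]
        return True, leftovers
    return False, missing


# Forum page with twenty avatars, eighteen already cached.
avatars = [f"/avatars/{n}.png" for n in range(20)]
cached = set(avatars[:18])
package = set(avatars)

print(plan_fetches(avatars, cached, package))
print(plan_fetches(avatars, cached, package, min_missing_for_package=3))
print(plan_fetches(avatars, set(avatars), package))
```

With the threshold of two quoted in the message, the eighteen-of-twenty case still fetches the package; a UA would need a higher threshold, or one based on bytes rather than file counts, to get the "fetch only two separate avatars" outcome from scenario (3). A fully warm cache fetches nothing either way.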
Re: [whatwg] HTML resource packages
Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820) I'll post this to the bug when I get home tonight. But your comments are astute -- the page I used is a pretty bad benchmark for a variety of reasons. It sounds like you probably could hack up a much better one. a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval? Since I was running on a simulated network with no random parameters (e.g. no packet loss), there was very little variance in load time across runs. d) What did you do about subdomains in the test? I assume your test loaded from one subdomain? That's correct. I'm betting time-to-paint goes through the roof with resource bundles:-) It does right now because we don't support incremental extraction, which is why I didn't bother measuring time-to-paint. The hope is that with incremental extraction, we won't take too much of a hit. -Justin On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe m...@belshe.com wrote: Justin - Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820) I have a few concerns about the benchmark: a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval? b) As you note in the report, slow start will kill you. I've verified this so many times it makes me sick. If you try more combinations, I believe you'll see this. c) The 1.3MB of subresources in a single bundle seems unrealistic to me. On one hand you say that it's similar to CNN, but note that CNN has JS/CSS/images, not just thumbnails like your test. Further, note that CNN pulls these resources from multiple domains; combining them into one domain may work, but certainly makes the test content very different from CNN. 
So the claim that it is somehow representative seems incorrect. For more accurate data on what websites look like, see http://code.google.com/speed/articles/web-metrics.html d) What did you do about subdomains in the test? I assume your test loaded from one subdomain? e) There is more to a browser than page-load-time. Time-to-first-paint is critical as well. For instance, in WebKit and Chrome, we have specific heuristics which optimize for time-to-render instead of total page load. CNN is always cited as a bad page, but it's really not - it just has a lot of content, both below and above the fold. When the user can interact with the page successfully, the user is happy. In other words, I know I can make webkit's PLT much faster by removing a couple of throttles. But I also know that doing so worsens the user experience by delaying the time to first paint. So - is it possible to measure both times? I'm betting time-to-paint goes through the roof with resource bundles:-) If you provide the content, I'll try to run some tests. It will take a few days. Mike On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar justin.le...@gmail.com wrote: On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified. These kinds of heuristics are far beyond the scope of resource packages as we're planning to implement them. Again, I think this type of behavior is the domain of a large change to the networking stack, such as SPDY, not a small hack like resource packages. -Justin On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar justin.le...@gmail.com wrote: I think this is a fair point. 
But I'd suggest we consider the following: * It might be confusing for resources from a resource package to show up on a page which doesn't opt-in to resource packages in general or to that specific resource package. Only if the resource package contains a different file from the real one. I suggest we treat this as a pathological case and accept that it will be broken and confusing -- or at least we consider how many extra optimizations we could make if we did accept that, before deciding whether the extra performance is worth the confusion. * There's no easy way to opt out of this behavior. That is, if I explicitly *don't* want to load content cached from a resource package, I have to name that content differently. Why would you want that, if the files are the same anyway? * The avatars-on-a-forum use case is less convincing the more I think about it. Certainly you'd want each page
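On Mike's point (a) in this thread, a 95% confidence interval for mean page-load time over repeated runs could be computed along these lines. This is a sketch in Python using a normal approximation (for a handful of runs a Student's t interval would be somewhat wider), and the sample load times below are invented purely to show the shape of the calculation; they are not measurements from the whitepaper.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_confidence_interval(
    samples: Sequence[float], confidence: float = 0.95
) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


# Hypothetical load times in seconds for the same page, five runs each.
noisy_runs = [4.1, 3.6, 5.0, 4.4, 3.9]
steady_runs = [4.20, 4.21, 4.19, 4.20, 4.20]

print(mean_confidence_interval(noisy_runs))
print(mean_confidence_interval(steady_runs))
```

This is consistent with Justin's reply: on a simulated network with no random parameters the run-to-run variance is tiny, so the interval is already narrow after a few runs, whereas real-network measurements would need many more.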
Re: [whatwg] HTML resource packages
The files I used for the rough benchmarks are available in a tarball at [1]. Live pages are at [2] and [3]. [1] http://people.mozilla.org/~jlebar/respkg/test/benchmark_files.tgz [2] http://people.mozilla.org/~jlebar/respkg/test/test-pkg.html [3] http://people.mozilla.org/~jlebar/respkg/test/test-nopkg.html -Justin On Mon, Aug 9, 2010 at 1:40 PM, Justin Lebar justin.le...@gmail.com wrote: Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820) I'll post this to the bug when I get home tonight. But your comments are astute -- the page I used is a pretty bad benchmark for a variety of reasons. It sounds like you probably could hack up a much better one. a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval? Since I was running on a simulated network with no random parameters (e.g. no packet loss), there was very little variance in load time across runs. d) What did you do about subdomains in the test? I assume your test loaded from one subdomain? That's correct. I'm betting time-to-paint goes through the roof with resource bundles:-) It does right now because we don't support incremental extraction, which is why I didn't bother measuring time-to-paint. The hope is that with incremental extraction, we won't take too much of a hit. -Justin On Mon, Aug 9, 2010 at 1:30 PM, Mike Belshe m...@belshe.com wrote: Justin - Can you provide the content of the page which you used in your whitepaper? (https://bug529208.bugzilla.mozilla.org/attachment.cgi?id=455820) I have a few concerns about the benchmark: a) Looks like pages were loaded exactly once, as per your notes? How hard is it to run the tests long enough to get to a 95% confidence interval? b) As you note in the report, slow start will kill you. I've verified this so many times it makes me sick. 
If you try more combinations, I believe you'll see this. c) The 1.3MB of subresources in a single bundle seems unrealistic to me. On one hand you say that it's similar to CNN, but note that CNN has JS/CSS/images, not just thumbnails like your test. Further, note that CNN pulls these resources from multiple domains; combining them into one domain may work, but certainly makes the test content very different from CNN. So the claim that it is somehow representative seems incorrect. For more accurate data on what websites look like, see http://code.google.com/speed/articles/web-metrics.html d) What did you do about subdomains in the test? I assume your test loaded from one subdomain? e) There is more to a browser than page-load-time. Time-to-first-paint is critical as well. For instance, in WebKit and Chrome, we have specific heuristics which optimize for time-to-render instead of total page load. CNN is always cited as a bad page, but it's really not - it just has a lot of content, both below and above the fold. When the user can interact with the page successfully, the user is happy. In other words, I know I can make webkit's PLT much faster by removing a couple of throttles. But I also know that doing so worsens the user experience by delaying the time to first paint. So - is it possible to measure both times? I'm betting time-to-paint goes through the roof with resource bundles:-) If you provide the content, I'll try to run some tests. It will take a few days. Mike On Mon, Aug 9, 2010 at 9:52 AM, Justin Lebar justin.le...@gmail.com wrote: On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: If UAs can assume that files with the same path are the same regardless of whether they came from a resource package or which, and they have all but a couple of the files cached, they could request those directly instead of from the resource package, even if a resource package is specified. 
These kinds of heuristics are far beyond the scope of resource packages as we're planning to implement them. Again, I think this type of behavior is the domain of a large change to the networking stack, such as SPDY, not a small hack like resource packages. -Justin On Mon, Aug 9, 2010 at 9:47 AM, Aryeh Gregor simetrical+...@gmail.com wrote: On Fri, Aug 6, 2010 at 7:40 PM, Justin Lebar justin.le...@gmail.com wrote: I think this is a fair point. But I'd suggest we consider the following: * It might be confusing for resources from a resource package to show up on a page which doesn't opt-in to resource packages in general or to that specific resource package. Only if the resource package contains a different file from the real one. I suggest we treat this as a pathological case and accept that it will be broken and confusing -- or at least we consider how many extra optimizations we could make if we did accept that, before deciding whether
Re: [whatwg] HTML resource packages
On Fri, Aug 6, 2010 at 12:46 AM, Christoph Päper christoph.pae...@crissov.de wrote: Justin Lebar: Christoph Päper christoph.pae...@crissov.de wrote: Why do you want to put this on the HTML level (exclusively), not the HTTP level? If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package. Yeah, it’s still wrong. Resource packages in HTML seem okay for the image gallery use case (and then could be done with ‘link’), but they’re commonly inappropriate for anything referenced from ‘link’, ‘script’ and ‘style’ elements. Your remark on loading order just proves this point: you want resource packages referenced before ‘head’. You should move one step further than the root element, i.e. to the transport layer. We want resource packages to work for people who don't have the ability to set custom headers for their pages or who don't even know what an HTTP header is. I agree that it's a hack, but I don't understand how putting the packages information in the html element makes it inappropriate to load from a resource package resources referenced in link, script, and style elements. Is the issue just that the HTML file's |packages| attribute affects what we load when we see @import url() in a separate CSS file? This seems like a feature, not a bug, to me. SPDY will do this the Right Way, if we're patient. -Justin
Re: [whatwg] HTML resource packages
So if resource packages don't share caches, you need to either give up on caching, [or] put a given file only in one resource package on your whole site. The latter is not practical if pages use small, fairly random subsets of your assets and it's not feasible to package them all on every page view. Think avatars on a web forum I think this is a fair point. But I'd suggest we consider the following: * It might be confusing for resources from a resource package to show up on a page which doesn't opt-in to resource packages in general or to that specific resource package. * There's no easy way to opt out of this behavior. That is, if I explicitly *don't* want to load content cached from a resource package, I have to name that content differently. * The avatars-on-a-forum use case is less convincing the more I think about it. Certainly you'd want each page which displays many avatars to package up all the avatars into a single package. So you wouldn't benefit from the suggested caching changes on those pages. You might benefit on a user profile page which just displays one avatar. You might try and be clever and leave the avatar out of the profile page's resource package on the assumption that the UA already has that avatar in its cache. But then your page would load slower for users who visited the profile page without first getting the avatar from another resource package. Maybe you'd benefit from the suggested changes if you'd half-deployed resource packages on your site, so some pages had packages and others didn't. But I don't think that's a use case we should design for. In general, I think we need something like SPDY to really address the problem of duplicated downloads. I don't think resource packages can fix it with any caching policy. 
-Justin On Fri, Aug 6, 2010 at 2:17 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Tue, Aug 3, 2010 at 8:31 PM, Justin Lebar justin.le...@gmail.com wrote: We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ I have some concerns about caching behavior here, which I've mentioned before. Consider a site that has a landing page that has lots of first-time viewers. To accelerate that page view, you might want to add a resource package containing all the assets on the page, to speed up views in the cold cache case. Some of those assets will be reused on other pages, and some will not. When the user navigates to another page, what's supposed to happen? If you hadn't used resource packages at all, they would have a hot cache, so they'd get all the shared assets on every subsequent page view for free. But now they don't -- instead of the first view being slow, it's the second view, when they leave the landing page. This isn't a big improvement. So if resource packages don't share caches, you need to either give up on caching, or put a given file only in one resource package on your whole site. The latter is not practical if pages use small, fairly random subsets of your assets and it's not feasible to package them all on every page view. Think avatars on a web forum: you might have 20 different avatars displayed per page, from a pool of tens of thousands or more. Do you have to decide between not using resource packages and not getting any caching? You've said before that your goal in this requirement is predictability -- if there's an inconsistency between different resource packages or between a resource package and the real file, you don't want users to get different results depending on what order they visit the pages in. 
This is fair enough, but I'm worried that the caching problems this approach causes will make it more of a hindrance than a benefit for a wide class of use-cases. There's some possible inconsistency anyway whenever caching is permitted at all, because if the page provides incorrect caching headers, the UA might have an out-of-date copy. Also, different browsers will be inconsistent too, until all UAs in common use have implemented resource packages -- some will use the packaged file and some the real file. Is the extra inconsistency from letting the caches mix really too much to ask for the cacheability benefits? I don't think so.
Re: [whatwg] HTML resource packages
Brett Zamir bret...@yahoo.com wrote: 1) I think it would be nice to see explicit confirmation in the spec that this works with offline caching. Yes. I'll do that. 2) Could data files such as .txt, .json, or .xml files be used as part of such a package as well? 3) Can XMLHttpRequest be made to reference such files and get them from the cache, and if so, when referencing only a zip in the packages attribute, can XMLHttpRequest access files in the zip not spelled out by a tag like link/? I think this would be quite powerful/avoid duplication, even if it adds functionality (like other HTML5 features) which would not be available to older browsers. This is tricky. The problem is: If you have an img on a page which might be able to be served from a resource package, we'll block the download of the image until we can either serve the request from a resource package or can be sure that no package contains the image. I can imagine this behavior being confusing with XMLHttpRequests. On the other hand, it could certainly be powerful when used correctly. I think the natural thing is to go ahead and treat things requested by an XMLHttpRequest the same as anything else on a page and retrieve them from packages as possible. If you really don't want your XMLHttpRequest to block on a resource package, you can always use a POST. But I need to investigate more to determine whether this makes sense. 4) Could such a protocol also be made to accommodate profiles of packages, e.g., by a namespace being allowable somewhere for each package? This sounds way outside the scope of what we're trying to do with resource packages. I'm all for designing for the future, but I don't think we want to introduce the complexity even of these namespaces unless we intend to use them immediately. Maciej Stachowiak m...@apple.com wrote: Have you done any performance testing of this feature, and if so can you share any of that data? 
There's a document (PDF) with some rough performance numbers in the bug: https://bugzilla.mozilla.org/attachment.cgi?id=455820 Although the results are preliminary, I think doing much more than this on a simulated network for a test page might be going a bit overboard. Results from real pages over real networks would be much more meaningful at this point. Separately, I am curious to hear how http headers are handled; it's a TODO in the spec, and what the TODO says seems poor for the Content-Type header in particular. It would make it hard to use package resources in any context that looks at the MIME type rather than always sniffing. Any thoughts on this? The intent is for UAs to sniff the content-type of anything coming from a resource package, so I think that TODO needs to be turned on its head: The UA shouldn't apply any of the response headers from the resource package to its elements. Christoph Päper christoph.pae...@crissov.de wrote: A page indicates in its html element that it uses one or more resource packages (…). Why do you want to put this on the HTML level (exclusively), not the HTTP level? ... Images might be referenced from within HTML or CSS files. If you reference an image from a CSS file and include that CSS file in an HTML file which uses resource packages, the image can be loaded from the resource package. Why did you decide against link rel=resource-package href=pkg1.zip#files='img1.png,…'/ or something like that? (The hash part is just guesswork.) We actually originally spec'ed resource packages with the link tag, but we encountered some difficulties with this. For example, it led to confusing behavior when a resource package was defined after a link rel='javascript'. Do we load the script from the network, or do we wait until we've received the whole head before loading any scripts? 
Resource packages as a link also interacted poorly with Mozilla's speculative parsing algorithm, which tries to download resources before we run the page's scripts. We probably could have come up with semantics which didn't run into problems with our own speculative parsing implementation, but we realized it would be difficult to spec it in such a way that we didn't make things very difficult for *someone*. * Argument: What about incremental rendering? The spec (and our implementation in Firefox) cares deeply about incremental rendering. Although the zip format isn't strictly suitable for incremental extraction, I defined alternate semantics in the spec which should work. Zip is better than tar-gz for this kind of thing for two reasons: * Zip file headers are uncompressed, so you don't have to extract the whole file in order to tell what's inside. * Entries in a zip file are individually compressed. Although this might cause you to compress less effectively, you can compress all your files ahead of time and construct a zip file on the fly pretty cheaply. Philip Taylor excors+wha...@gmail.com wrote: It seems a bit surprising that
Re: [whatwg] HTML resource packages
If you do want it to work the same then you'll need to hook into the parser and ignore dynamic updates. Indeed. And since I explicitly *do* want dynamic updates, it'll need to change. Thanks. On Wed, Aug 4, 2010 at 1:32 PM, Philip Taylor excors+wha...@gmail.com wrote: On Wed, Aug 4, 2010 at 9:01 PM, Justin Lebar justin.le...@gmail.com wrote: What happens if the document contains multiple html elements (not all the root element)? (e.g. if it's XHTML, or the elements are added by scripts). The packages spec seems to assume there is only ever one. The packages attribute should work like the manifest attribute currently works. I don't see language in the cache manifest section of HTML5 (6.6) specifying what happens when there are multiple html elements, so I hope I don't need to specify this either. :) http://whatwg.org/html#attr-html-manifest says: The manifest attribute only has an effect during the early stages of document load. Changing the attribute dynamically thus has no effect (and thus, no DOM API is provided for this attribute). Its effect is triggered from http://whatwg.org/html#parser-appcache (html token in the before html insertion mode) or from http://whatwg.org/html#read-xml , so it will only ever run for the root html element of the document. The packages attribute is defined as running Whenever the packages attribute is changed (including when the document is first loaded, if its html element has a packages attribute), so it's not the same. If you do want it to work the same then you'll need to hook into the parser and ignore dynamic updates. -- Philip Taylor exc...@gmail.com
[whatwg] HTML resource packages
We at Mozilla are hoping to ship HTML resource packages in Firefox 4, and we wanted to get the WhatWG's feedback on the feature. For the impatient, the spec is here: http://people.mozilla.org/~jlebar/respkg/ and the bug (complete with builds you can try and some preliminary performance numbers) is here: https://bugzilla.mozilla.org/show_bug.cgi?id=529208 You can think of resource packages as image spriting 2.0. A page indicates in its html element that it uses one or more resource packages (which are just zip files). Then when that page requests a resource (be it an image, a css file, a script, or whatever), the browser first checks whether one of the packages contains the requested resource. If so, the browser uses the resource out of the package instead of making a separate HTTP request for the resource. There's more detail than that, of course. Hopefully it's (mostly) clear in the spec. I envision two classes of users of resource packages. I'll call the first resource-constrained developers. These developers care about how fast their page is (who doesn't?), but can't spend weeks speeding up their page. For these developers, resource packages are an easy way to make their pages faster without going through the pain of spriting their images and packaging their js/css. The other class of users are the resource-unconstrained developers; think Google or Facebook. These developers have already put a huge amount of effort into making their pages fast, and a naive application of resource packages is unlikely to make them any faster. But these developers may be able to use resource packages cleverly to gain speedups. In particular, nobody (to my knowledge) currently sprites content images, such as the results of an image search. A determined set of developers should be able to construct resource packages for image search results on the fly and save some HTTP requests. 
So we can avoid rehashing here the common objections to resource packages, here's a brief overview of the arguments I've heard against the feature and my responses. * Argument: Packaging isn't the way forward. When you change one resource in a package you have to change the whole package and so the user has to re-download all the bits when most of what was in their cache would have been fine. This is of course correct, but we don't think it eliminates the utility of resource packages. The resource-constrained developer is probably happy with anything which speeds up page loads, even if it's not optimal when one part of the page changes. And the resource-unconstrained developer probably won't find resource packages too useful for non-dynamic content, so caching isn't an issue in that case. * Argument: We can already package things pretty well. Mozilla should instead be focusing on improving caching (or something else). I'd contend that we don't package particularly well in general. The Facebook homepage loads 100 separate resources on a cold cache, and they certainly care about speed. But anyway, this is just one project. We're also looking at caching. :) * Argument: Isn't this subsumed by HTTP pipelining? Mostly. But we can't turn on HTTP pipelining because transparent proxies break it. Resource packages have the further benefit that they allow page authors to explicitly set the order in which the UA will download the resources -- with pipelining, an important resource might get stuck behind a large, unimportant resource, while with resource packages, the UA always downloads resources in the order they appear in the zip file. Last, my understanding is that the HTTP pipeline isn't particularly deep, so perhaps resource packages fill the TCP pipe better on high-latency connections. I haven't looked into this, though. * Argument: What about SPDY? I think SPDY should subsume resource packages. 
But its deployment will require changes to both web clients and servers, so it will probably take a while after it's released before it's available on all web servers. And we have no idea when to expect SPDY to be ready for production. Resource packages, in contrast, are something we can have Right Now. Additionally, since resource packages are backwards-compatible -- a page which specifies resource packages should display just fine in a browser which doesn't support them -- we should be able to turn off resource packages in the future if we decide we don't want them anymore. We'd love to hear what you think of the specification and our implementation. -Justin
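The lookup described at the top of this message (check the packages first, fall back to a separate HTTP request) can be sketched roughly as follows. This is only an illustration under stated assumptions, not what the spec or the Firefox implementation does: resolveResource, the package objects, and the fetchFromNetwork callback are hypothetical names, and a package is modeled as a plain Map from path to contents rather than a real zip file.

```javascript
// Hypothetical model: each package is { name, entries }, where entries is a
// Map from resource path to contents. A real implementation would read a zip.
function resolveResource(path, packages, fetchFromNetwork) {
  // Check the packages in the order the page listed them.
  for (const pkg of packages) {
    if (pkg.entries.has(path)) {
      return { source: pkg.name, body: pkg.entries.get(path) };
    }
  }
  // No package contains the resource, so make a separate request.
  return { source: "network", body: fetchFromNetwork(path) };
}
```

The sketch ignores the blocking question raised earlier in the thread (what to do while a package is still downloading); it only shows the "package first, network second" ordering.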
Re: [whatwg] history.pushState() and replaceState()'s title parameter
Just to follow up on this: We just pushed a change to Firefox to completely ignore the title parameter, as WebKit does. We're getting close to locking down Firefox for the next release. If we want to do something more creative with the title parameter, now is the time for action. -Justin On Wed, Jun 23, 2010 at 11:15 AM, Justin Lebar justin.le...@gmail.com wrote: Safari 5 and Chrome 5 recently shipped the history.pushState and replaceState methods. Firefox 4 will also include those methods when it ships. pushState and replaceState take three arguments: An opaque data object, a title, and an optional URL. Currently, Safari and Chrome both ignore the title parameter. Jonas Sicking jo...@sicking.cc and I have been talking with Brady Eidson beid...@apple.com and Darin Fisher da...@chromium.org, about what we can do to clean up this API, since having an unused parameter in our brand-new functions is unfortunate. Ideally, we might change the pushState and replaceState methods themselves, perhaps changing them so they only take a URL and an optional data object. But since Chrome and Safari have already shipped the method, and since we hear that the functions are already being used on the web, it's probably too late to add or remove arguments from the functions. It seems that the intent of the spec as it stands is that the title parameter should show up in the session history list (shown e.g. when you click the down arrow next to the forward button), but not in the application's title bar. We think this is confusing (as evidence, observe that two browsers skipped this step!) and adds a lot of complexity for a small amount of gain, so we're not in favor of this approach. If modifying the document's title in the session history list is a desirable feature, then we could expose that property to the DOM just as we expose document.title. 
Seeing as we're stuck with the title argument in pushState and replaceState, we propose that it modify document.title in an intuitive way: * Before we unload a history entry, we save document.title into the history entry. * When we activate a history entry, we set document.title to the value stored in the history entry. * When we pushState, we set document.title to the title parameter after activating the new history entry. * When we replaceState, we set document.title to the title parameter. In the last two cases, if the title parameter is empty, we leave document.title unchanged. We think this is a good compromise between complexity and functionality. -Justin
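The four rules proposed above can be sketched as a small model. This is a toy under stated assumptions, not browser code: TitleModel and its methods are hypothetical names, the new entry created by pushState is assumed to start with the current document.title, and forward entries are assumed to be dropped on pushState.

```javascript
// Hypothetical model of the proposal; none of these names come from the spec.
class TitleModel {
  constructor(initialTitle) {
    this.documentTitle = initialTitle;
    this.entries = [{ title: initialTitle }];
    this.index = 0;
  }

  // Rule 1: before leaving an entry, save document.title into it.
  saveCurrent() {
    this.entries[this.index].title = this.documentTitle;
  }

  // Rule 3: set document.title to the title parameter after activating the
  // new entry; an empty title leaves document.title unchanged.
  pushState(title) {
    this.saveCurrent();
    this.entries.length = this.index + 1; // assumption: forward entries are dropped
    this.entries.push({ title: this.documentTitle });
    this.index += 1;
    if (title) this.documentTitle = title;
  }

  // Rule 4: replaceState sets document.title unless the title is empty.
  replaceState(title) {
    if (title) this.documentTitle = title;
  }

  // Rule 2: activating an entry restores the title stored in it.
  go(delta) {
    const target = this.index + delta;
    if (target < 0 || target >= this.entries.length) return;
    this.saveCurrent();
    this.index = target;
    this.documentTitle = this.entries[target].title;
  }
}
```

In this model a title set through pushState survives going back and forward, because it is saved into the entry whenever that entry is left.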
[whatwg] push/replaceState interacting with POSTs
We have a minor issue using replaceState in Bugzilla that we may or may not want to fix up in the spec. When you make a change to a bug, Bugzilla POSTs you from a nice-looking URL, say https://bugzilla.mozilla.org/show_bug.cgi?id=577720 , to https://bugzilla.mozilla.org/process_bug.cgi This is annoying because it breaks refresh and bookmarking, even though process_bug.cgi is logically displaying the same page that show_bug.cgi was previously displaying. Apparently fixing this the Right Way is difficult in Bugzilla, so the developers are considering using history.replaceState() to change the URL of process_bug.cgi back to show_bug.cgi?id=xxx. This works well, but it has the small problem that when you refresh the page after processing a bug, Firefox shows you the warning it shows when you refresh a page which was POST'ed to. I wonder if calling push/replaceState should cause the browser to consider the affected history entry as the result of a GET, even if it was the result of a POST. Bugzilla may be abusing the API here a bit, but it's still not clear that we're doing the right thing when we prompt the user on a refresh (or if we were to refuse to load the page on a session restore since the load isn't idempotent). I'm curious what the WhatWG thinks of this. -Justin
Re: [whatwg] push/replaceState interacting with POSTs
On Fri, Jul 16, 2010 at 3:11 PM, Aryeh Gregor simetrical+...@gmail.com wrote: What do other browsers do? Chrome 6.0.458.1 dev on Linux warns on refresh after a pushState or a replaceState. Firefox trunk (Mozilla/5.0 (X11; Linux x86_64; en-US; rv:2.0b2pre) Gecko/20100716 Minefield/4.0b2pre) warns on a refresh only after a replaceState. http://people.mozilla.org/~jlebar/test/general/pushstate-post.html -Justin On Fri, Jul 16, 2010 at 3:11 PM, Aryeh Gregor simetrical+...@gmail.com wrote: On Fri, Jul 16, 2010 at 1:13 PM, Justin Lebar justin.le...@gmail.com wrote: We have a minor issue using replaceState in Bugzilla that we may or may not want to fix up in the spec. When you make a change to a bug, Bugzilla POSTs you from a nice-looking URL, say https://bugzilla.mozilla.org/show_bug.cgi?id=577720 , to https://bugzilla.mozilla.org/process_bug.cgi This is annoying because it breaks refresh and bookmarking, even though process_bug.cgi is logically displaying the same page that show_bug.cgi was previously displaying. Apparently fixing this the Right Way is difficult in Bugzilla, so the developers are considering using history.replaceState() to change the URL of process_bug.cgi back to show_bug.cgi?id=xxx. This is a standard nuisance: you want to display a success/failure message. You don't want to just display it in the POST result, because then you get browser warnings, the URL can't be copy-pasted, etc. You don't want to tack it on as a URL parameter because then the success/failure messages can be forged. There's no good answer I'm aware of barring tedious server-side trickery (like queuing up a message for display on the next view of certain types of pages). replaceState() sounds like it should be a decent solution if implemented as you'd like, although it only works if JavaScript is enabled, so it's not ideal. 
This works well, but it has the small problem that when you refresh the page after processing a bug, Firefox shows you the warning it shows when you refresh a page which was POST'ed to. I wonder if calling push/replaceState should cause the browser to consider the affected history entry as the result of a GET, even if it was the result of a POST. Bugzilla may be abusing the API here a bit, but it's still not clear that we're doing the right thing when we prompt the user on a refresh (or if we were to refuse to load the page on a session restore since the load isn't idempotent). I'm curious what the WhatWG thinks of this. I'd think that hitting refresh when the URL has been changed by JavaScript should load the current URL displayed in the location bar. If this is different from the actual URL that the page was originally served from, then submitting POST data that was submitted for the current page probably makes no sense, so treating the new request in all ways as a GET seems like the only sensible thing. So I'd say this is a Firefox bug, if Firefox does this. (What do other browsers do? WebKit implements replaceState, right?)
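For concreteness, the Bugzilla workaround discussed in this thread might look roughly like the sketch below. It is an assumption-laden illustration, not Bugzilla's actual code: showBugUrl and restoreBugUrl are hypothetical helpers, and historyLike stands for any object with a replaceState(data, title, url) method, such as window.history in a browser.

```javascript
// Hypothetical helper: build the bookmarkable show_bug.cgi URL for a bug id.
function showBugUrl(bugId) {
  return "/show_bug.cgi?id=" + encodeURIComponent(String(bugId));
}

// Run on the page served by process_bug.cgi (the POST result) to put the
// bookmarkable URL back in the location bar without another page load.
function restoreBugUrl(historyLike, bugId) {
  historyLike.replaceState(null, "", showBugUrl(bugId));
}
```

In a browser this would be called as restoreBugUrl(window.history, 577720). Whether a later refresh is then treated as a GET of the new URL or still prompts about resubmitting the POST is exactly the open question in this thread.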
[whatwg] Ambiguity re firing the popstate event
Section 6.5.9.1 [1] says: The popstate event is fired when navigating to a session history entry that represents a state object. In contrast, section 6.5.9 [2] indicates in step 10 that a popstate event is fired if the history entry represents a state object or the first entry for a document. Unfortunately this ambiguity has caused WebKit and Mozilla to implement popstate in two different ways [3]. I think we can resolve this in the spec by changing the line from 6.5.9.1 to: The popstate event is fired when navigating to a session history entry. -Justin [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#event-definitions [2] http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#history-traversal [3] https://bugs.webkit.org/show_bug.cgi?id=41372
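The difference between the two readings, and the proposed fix, can be written as three predicates. This is only a sketch: the entry record and its field names (hasStateObject, isFirstEntryForDocument) are hypothetical and stand in for the spec's prose conditions.

```javascript
// Reading of section 6.5.9.1: fire only for entries that represent a state object.
function firesPerEventDefinition(entry) {
  return entry.hasStateObject;
}

// Reading of section 6.5.9, step 10: also fire for the first entry of a document.
function firesPerTraversalStep(entry) {
  return entry.hasStateObject || entry.isFirstEntryForDocument;
}

// Proposed wording: fire when navigating to any session history entry.
function firesPerProposal(_entry) {
  return true;
}
```

The two spec readings disagree on a first entry with no state object; the proposed wording additionally fires for an entry that has no state object and is not the first entry for its document.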
[whatwg] history.pushState() and replaceState()'s title parameter
Safari 5 and Chrome 5 recently shipped the history.pushState and replaceState methods. Firefox 4 will also include those methods when it ships. pushState and replaceState take three arguments: An opaque data object, a title, and an optional URL. Currently, Safari and Chrome both ignore the title parameter. Jonas Sicking jo...@sicking.cc and I have been talking with Brady Eidson beid...@apple.com and Darin Fisher da...@chromium.org, about what we can do to clean up this API, since having an unused parameter in our brand-new functions is unfortunate. Ideally, we might change the pushState and replaceState methods themselves, perhaps changing them so they only take a URL and an optional data object. But since Chrome and Safari have already shipped the method, and since we hear that the functions are already being used on the web, it's probably too late to add or remove arguments from the functions. It seems that the intent of the spec as it stands is that the title parameter should show up in the session history list (shown e.g. when you click the down arrow next to the forward button), but not in the application's title bar. We think this is confusing (as evidence, observe that two browsers skipped this step!) and adds a lot of complexity for a small amount of gain, so we're not in favor of this approach. If modifying the document's title in the session history list is a desirable feature, then we could expose that property to the DOM just as we expose document.title. Seeing as we're stuck with the title argument in pushState and replaceState, we propose that it modify document.title in an intuitive way: * Before we unload a history entry, we save document.title into the history entry. * When we activate a history entry, we set document.title to the value stored in the history entry. * When we pushState, we set document.title to the title parameter after activating the new history entry. * When we replaceState, we set document.title to the title parameter. 
In the last two cases, if the title parameter is empty, we leave document.title unchanged. We think this is a good compromise between complexity and functionality. -Justin
Re: [whatwg] History API, pushState(), and related feedback
On Thu, Jan 14, 2010, Hixie...oh dear. On Tue, 18 Aug 2009, Justin Lebar wrote: (An attempt at describing how pushstate is supposed to be used.) That's not quite how I would describe it. It's more that each entry in the session history has a URL and optionally some data. The data can be used for two main purposes: first, storing a preparsed description of the state in the URL so that in the simple case you don't have to do the parsing (though you still need the parsing for handling URLs passed around by users, so it's only a minor optimisation), and second, so that you can store state that you wouldn't store in the URL, because it only applies to the current Document instance and you would have to reconstruct it if a new Document were opened. An example of the latter would be something like keeping track of the precise coordinate from which a popup div was made to animate, so that if the user goes back, it can be made to animate to the same location. Or alternatively, it could be used to keep a pointer into a cache of data that would be fetched from the server based on the information in the URL, so that when going back and forward, the information doesn't have to be fetched again. Basically any information that you would not include in a URL describing the page, but which could be useful when going backwards and forwards in the history. Can we publish this somewhere? This is crucial and not obvious. If the Document is not recoverable, then recovering the state object makes little sense, IMHO. We should not be encouraging a world in which the meaningful state of a page is described by more than its URL. However, it's a UA decision whether to enable this or not. Yes, but we want to make sure we're making the right UA decision. :) I approached this from a different angle: Does it make sense to persist the fact that two history entries with (potentially) different URLs correspond to the same document across session history? 
If pushState is supposed to replace using the hash to store data, then we should persist this fact across session restores, right? But then we have to also persist the state data; otherwise, if the page used pushState with no URL argument, it wouldn't be able to distinguish between the two states. I think you have a strong argument above. On the other hand, the fact that history entries X and Y are in fact the same Document is itself page state which isn't stored in the URL. On Tue, 5 Jan 2010, Justin Lebar wrote: I think this is correct. A popstate event is always dispatched whenever a new session history entry is activated (6.10.3). Actually if multiple popstates are fired before 'load' fires, all but the last are discarded, and the last waits until after 'load' fires to be fired. But otherwise yes. Oh, interesting. I didn't even notice that popstate is async again. Good to know. -Justin
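The rule quoted above (popstates requested before 'load' are collapsed to the last one, which is held until after 'load') can be sketched as a tiny queue. This is a simplified model, not the spec's algorithm: PopstateQueue and its methods are hypothetical names, and the exact task ordering relative to the load event is glossed over.

```javascript
// Hypothetical model: dispatch is whatever actually fires the popstate event.
class PopstateQueue {
  constructor(dispatch) {
    this.dispatch = dispatch;
    this.loaded = false;
    this.pending = null; // at most one held popstate
  }

  // Called whenever a session history entry is activated.
  request(state) {
    if (this.loaded) {
      this.dispatch(state);
    } else {
      // Before 'load': a newer request replaces any earlier pending one.
      this.pending = { state };
    }
  }

  // Called once 'load' has fired.
  load() {
    this.loaded = true;
    if (this.pending) {
      const { state } = this.pending;
      this.pending = null;
      this.dispatch(state);
    }
  }
}
```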
Re: [whatwg] question about the popstate event
If I'm understanding the bug correctly, Brady is suggesting not that a popstate event isn't fired when we navigate back to a document which has been unloaded from memory, but that the state object in that popstate event is null. As I understand it, the crux of his argument relates to the algorithm to update the session history with the new page [1]: 2) If the navigation was initiated for entry update of an entry 1) Replace the entry being updated with a new entry representing the new resource and its Document object and related state. I think he's arguing that the set of related state that is copied to the new entry does not contain the state object. His evidence for this is mostly textual: This state is referenced in other parts of the spec, and in those places, it's made clear that the state consists of scroll position and form fields: (From comment #4 at https://bugs.webkit.org/show_bug.cgi?id=33224) I believe state in this context is not referring to state objects, but rather persisted user state as set forth in 5.11.9 step 3: For example, some user agents might want to persist the scroll position, or the values of form controls. I think this is a good point from a textual perspective. But I think it's clear that we actually want to persist state objects across Document unloads. If we didn't care about this use case, we could do away with state objects altogether. A document could just call pushstate with no state variable and store its state objects in a global variable indexed by an identifier in the URL. When the page receives a popstate, it checks its URL and grabs the relevant state object. Simple. (This doesn't handle multiple entries with the same URL, but hash navigation doesn't handle that either, so that's not a big problem.) My point is that state objects are pretty much useless unless you persist them after the document has been unloaded. 
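The alternative described above, where a page keeps its own state objects in a global keyed by an identifier in the URL, might look like the following sketch. The names (stateById, rememberState, recallState) and the "#s=" identifier format are made up for illustration.

```javascript
// Page-global table of state objects. It lives only as long as the Document.
const stateById = new Map();
let nextStateId = 0;

// Store a state object and return the URL suffix that identifies it.
function rememberState(state) {
  const id = String(nextStateId++);
  stateById.set(id, state);
  return "#s=" + id;
}

// On popstate, look the state object up again from the current URL.
function recallState(url) {
  const match = /#s=(\d+)$/.exec(url);
  if (!match) return null;
  return stateById.has(match[1]) ? stateById.get(match[1]) : null;
}
```

Because the table is an ordinary page global, it disappears when the Document is unloaded, which is the point being made: if the browser's own state objects were equally short-lived, they would offer little beyond this.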
I also think the fact that we take the structured clone of a state object before saving it (and that structured clone forbids pointers to DOM objects and whatnot) indicates that the spec intended for state objects to stick around after document unload. Otherwise, why bother making a restrictive copy? (It should go without saying that if you're saving state objects across document unloads, you should also be saving the "has same document" relationships between history entries. That is, suppose history entry A calls pushstate and creates history entry B. Some time later, the document for A and B is unloaded, then the user goes back to B, which is re-fetched into a fresh Document. Then the user clicks back, activating A. We should treat the activation of A from B as an activation between two entries with the same document, and not re-fetch A.) Where the spec needs to be clarified to support this, I think it should be. But let's first agree that this is the right thing to do. -Justin [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#update-the-session-history-with-the-new-page On Tue, Jan 12, 2010 at 3:54 PM, Darin Fisher da...@chromium.org wrote: Hi, I've been discussing this issue with Brady Eidson over at https://bugs.webkit.org/show_bug.cgi?id=33224, and his interpretation appears to be different. (I think he may have convinced me too.) I'd really like some help understanding how pushState is intended to work and to see how that lines up with the spec. Also, assuming Brady is correct, then I wonder why pushState was designed this way. It seems strange to me that entries in session history would disappear when you navigate away from a document that used pushState. -Darin On Tue, Jan 5, 2010 at 6:55 PM, Justin Lebar justin.le...@gmail.com wrote: From my reading of the spec, I would expect the following steps: 5. Page A is loaded. 6. The load event for Page A is dispatched. 7. The popstate event for Page A is dispatched. 
I think this is correct. A popstate event is always dispatched whenever a new session history entry is activated (6.10.3). -Justin On Tue, Jan 5, 2010 at 4:53 PM, Darin Fisher da...@chromium.org wrote: I'd like to make sure that I'm understanding the spec for pushState and the popstate event properly. Suppose, I have the following sequence of events: 1. Page A is loaded. 2. Page A calls pushState(foo, null). 3. The user navigates to Page B. 4. The user navigates back to Page A (clicks the back button once). Assuming the document of Page A was disposed upon navigation to Page B (i.e., that it was not preserved in a page cache), should a popstate event be generated as a result of step 4? From my reading of the spec, I would expect the following steps: 5. Page A is loaded. 6. The load event for Page A is dispatched. 7. The popstate event for Page A is dispatched. Do I understand correctly? Thanks, -Darin
Re: [whatwg] question about the popstate event
From my reading of the spec, I would expect the following steps: 5. Page A is loaded. 6. The load event for Page A is dispatched. 7. The popstate event for Page A is dispatched. I think this is correct. A popstate event is always dispatched whenever a new session history entry is activated (6.10.3). -Justin On Tue, Jan 5, 2010 at 4:53 PM, Darin Fisher da...@chromium.org wrote: I'd like to make sure that I'm understanding the spec for pushState and the popstate event properly. Suppose, I have the following sequence of events: 1. Page A is loaded. 2. Page A calls pushState(foo, null). 3. The user navigates to Page B. 4. The user navigates back to Page A (clicks the back button once). Assuming the document of Page A was disposed upon navigation to Page B (i.e., that it was not preserved in a page cache), should a popstate event be generated as a result of step 4? From my reading of the spec, I would expect the following steps: 5. Page A is loaded. 6. The load event for Page A is dispatched. 7. The popstate event for Page A is dispatched. Do I understand correctly? Thanks, -Darin
Re: [whatwg] Question about pushState
On Wed, Dec 16, 2009 at 3:06 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, Dec 16, 2009 at 11:51 AM, Darin Fisher da...@chromium.org wrote: I would have expected it to behave like a reference fragment navigation, which prunes *all* forward session history entries. I agree. I *think* what you are suggesting is what the implementation that Justin Lebar has written for Firefox does. Yes, with my patch, the forward button is never active after a pushState. It wasn't an intentional deviation from the spec, but I agree with Darin's reasoning: If pushState is a replacement for the hash-navigation hack, then it should behave like a hash navigation. -Justin
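The behavior agreed on here, where pushState prunes all forward entries just as a fragment navigation does, can be modeled with a short sketch. SessionHistory and its methods are hypothetical names for illustration, not an API from the spec or from Firefox.

```javascript
// Toy session history: a list of URLs plus the index of the current entry.
class SessionHistory {
  constructor(url) {
    this.entries = [url];
    this.index = 0;
  }

  // Model of pushState (or a fragment navigation): drop every entry after
  // the current one, then append the new entry and make it current.
  push(url) {
    this.entries.length = this.index + 1;
    this.entries.push(url);
    this.index = this.entries.length - 1;
  }

  back() {
    if (this.index > 0) this.index -= 1;
  }

  // The forward button is active only when entries exist after the current one.
  canGoForward() {
    return this.index < this.entries.length - 1;
  }
}
```

After going back twice from a, b, c and then pushing d, the list is a, d and the forward button is inactive.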
[whatwg] push/replaceState title parameter (was AJAX History Concerns)
On Mon, Nov 23, 2009 at 5:01 PM, Ian Hickson i...@hixie.ch wrote: On Fri, 13 Nov 2009, Justin Lebar wrote: On Thu, Nov 12, 2009 at 5:43 PM, Ian Hickson i...@hixie.ch wrote: The idea is that the string you would put into the back button or history menu is not the same as the string you would put into the title bar or bookmarks (i.e. not the same as title). That doesn't seem too unreasonable, but I think it's strange to set that title through push/replaceState, since an alternate page title is orthogonal to the idea of an AJAX page with state objects. No more so than an alternative URL, surely? I'm not sure I agree. It seems to me that if you set the page's URL, it's likely that you'll want to change the state object (if you're not storing all your data in the URL). On the other hand, one might want to change the history entry title without ever changing the URL or the state object. In the simple case, consider a page which uses no AJAX at all, but just wants to display a shorter title in the history than in the titlebar of the browser. Does it make sense for this page to call history.replaceState(null, 'new title');? It might be confusing to expose this alternate title in the document object, but perhaps we could expose it as a property or setter function somewhere else. Then we could persist it properly across forward / backs within the same document. It seems like that would just cause everyone to call pushState() and updateTitle() instead of just calling pushState(), except that then people would forget to update the title and your history would have a bunch of silly-looking titles like Inbox (3), Inbox (20), Inbox (4). Well, people are already going to have to call pushState() and then set document.title if they want to update the title at the top of the browser, even if they specify a title in pushState(). I imagine that most pages aren't going to try to maintain two parallel sets of titles. 
For these cases, I think a pushState() function without a title and propagating document.title changes into the history entry makes sense, because this is what those pages already were doing without pushstate. For those pages which really want to have two titles, it doesn't seem unreasonable to me that they should have to write an extra line of code to explicitly set the history entry's title. Without this extra setHistoryEntryTitle() function, I think the API for updating the history entry title becomes unnecessarily complicated. If you haven't used pushState() or replaceState(), then the history entry's title gets updated when you modify document.title. But as soon as you call one of those functions, the two titles become permanently unlinked, and further updates to the history entry's title have to go through replaceState. And if you want to change the history entry's title, you now have to save or reconstruct a copy of your state object just so you can pass it back to replaceState(). In addition to avoiding this complexity, the updateTitle() function has the advantage that it allows us to call |updateTitle(undefined)| (or something) to re-link the two titles. I guess the essential question is whether we see the history entry title as being a separate feature from pushState. If most or all pages will update the history entry title only in response to a pushState or a replaceState that they'd have made anyway, then maybe it makes sense to keep the history entry title there. But I don't see why the features should be coupled like that. By analogy, none of us would argue that we should couple setting document.title with clicking links and setting document.location. -Justin
Re: [whatwg] push/replaceState title parameter (was AJAX History Concerns)
On Mon, Nov 23, 2009 at 6:46 PM, Ian Hickson i...@hixie.ch wrote: On Mon, 23 Nov 2009, Justin Lebar wrote: I'm not sure I agree. It seems to me that if you set the page's URL, it's likely that you'll want to change the state object (if you're not storing all your data in the URL). On the other hand, one might want to change the history entry title without ever changing the URL or the state object. In the simple case, consider a page which uses no AJAX at all, but just wants to display a shorter title in the history than in the titlebar of the browser. Does it make sense for this page to call history.replaceState(null, 'new title');? I've never heard anyone asking for this; do you have a concrete example? In the absence of push/replaceState, changes to document.title propagate to the history entry title -- they're linked together. Calling pushState unlinks them in the sense that after the call, changes to document.title no longer affect the history entry's title. To modify the history entry's title when residing at a history entry which was pushState'd to, you have to call replaceState. Thus you'd need to call history.replaceState(currentStateObject, newTitle) when you changed document.title on a page which was pushState'd to and wanted to reflect that change in the history entry. Suppose Gmail wanted to update the unread messages count in both the history and in document.title. Honestly, I don't think adding an extra set of titles will be particularly useful, and I imagine that most websites will use just one title for both the history entry and the browser title. But that's exactly the problem: As soon as you call pushState, you now have to be aware that changes to document.title now no longer affect the history title. To be clear, my contention is that pushState shouldn't have a title parameter, not that we should have a updateHistoryEntryTitle() function. 
I'm fine with the idea of the history entry title reflecting the state of document.title immediately before the most recent time we navigated away from that entry, as it does now. But if we want to allow the titles to be set independently, I don't think pushState is the right mechanism. By analogy, none of us would argue that we should couple setting document.title with clicking links and setting document.location. Actually, I would; that's exactly what I'm arguing in fact. With normal navigation, the coupling is done by the UA (first setting the title to the URL, and then updating it when a title element is found during parsing). With pushState(), the navigation is implicit (scripted) and so the URL and title changes have to be done explicitly. This doesn't suggest that we shouldn't have a updateHistoryEntryTitle() function, just as the existence of title doesn't suggest that code for modifying the document's title should be document.navigateTo(document.location, newTitle) Adding an updateHistoryEntryTitle() function while leaving the title parameter in pushState might be better than things are now. But since we have to explicitly set document.title after a pushState anyway, removing the title from pushState doesn't create any more work for the vast majority of use cases. I don't see why we need to add all this complexity to support the edge use case where the history title and document title are different. -Justin
Re: [whatwg] Do we really need history.clearState()?
On Sat, Nov 14, 2009 at 5:23 PM, timeless timel...@gmail.com wrote:
> what if pushState returned a value which could be passed to clearState?

I'm not sure how this would work. What would clearState do with that value?

> (i can't find clearState in
> http://www.whatwg.org/specs/web-apps/current-work/#dom-history-pushstate

Hixie removed it a few days ago.

-Justin
Re: [whatwg] AJAX History Concerns
The title [argument to pushState] is purely advisory. User agents might use the title in the user interface. But unlike the URL which actually changes in the Document object and is therefore exposed to the DOM, this purely advisory title change is hidden from the DOM. I'm questioning the reasoning behind this distinction and am curious if it was intentional or not. What I did in my Firefox patch (which should be checked into trunk within a few weeks, I hope) is use that title only to identify the history entry in the pull-down back-forward menu (what's shown when you click the down arrow next to the forward button in Firefox). If you want the rest of the UI (e.g. browser title bar) to match up with this title, you have to set document.title. In fact, if you pushState with title 'Foo', then navigate back and then forward, the history entry's title will be reset to the document's title. (I intend to write something detailing tricks like these once we land the pushState patch.) On the one hand, the implementation as it is allows developers some control over the history entry title independent of the document title, and perhaps that's useful. On the other hand, most use cases I can imagine for setting the history entry title are only useful if it persists between back/forwards. It appears in my testing that if you do pushState(title1); document.title=title2 Firefox shows title2 in both the local and global history, so setting document.title appears to subsume most of the functionality of pushState's title argument. We could make the API change document.title and remember that change between back/forwards, but I think that would be unnecessarily complicated. After a pushState, you'd get a new document which shares all mutable state *except* its title with its sibling. Unless there's a compelling use for it, perhaps we should simplify the API by getting rid of the title parameter altogether. One can pretty easily update document.title on popStates manually. 
But perhaps I'm missing something; I recall at one time being convinced that the title parameter was important. :) [Given] A1 - A2 - B1 - *B2* - B3 - C1 - C2 When this method is invoked, the user agent must remove from the session history all the entries from the first state object entry for that Document object up to the last entry that references that same Document object, if any. In my original message I liberally interpreted this to mean the new current entry should be a copy of B3 but without the state object because, clearly, we just removeState()ed. I don't think removing the entry from the history implies that we clear its state object. Certainly the spec could be clarified, however. I don't think that Marius's reading, here that B1, B2, and B3 would all be removed, is completely unsupported by the text. But I also don't think that's what we want. If I understand things correctly, we always remove the current entry after a clearState. So perhaps the language could be When this method is invoked, the user agent must remove from the session history all the entries from the first state object entry for that Document up to the second-to-last entry that references the same Document. The current entry is then set to the one remaining entry for the Document. That said, we didn't implement clearState when we did push/replaceState because it's hard to get right and we don't currently have a compelling use case. There are probably lots of things we'd change if we were going to implement it -- for instance, why go back to the last entry instead of staying at the current one? But that's probably a conversation for another thread. -Justin
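The reworded clearState text suggested above can be sketched as a function over a list of entries. This is a toy model under two assumptions that are not in the spec text: entries are plain `{ name, doc }` objects, and all entries for one Document are contiguous. `clearStateSketch` is a hypothetical name.

```javascript
// Remove the entries from the first one for the current Document up to the
// second-to-last one, and make the one remaining entry current.
// Assumes each Document's entries are contiguous in the session history.
function clearStateSketch(entries, currentIndex) {
  const doc = entries[currentIndex].doc;
  const docs = entries.map((e) => e.doc);
  const first = docs.indexOf(doc);
  const last = docs.lastIndexOf(doc);
  const kept = [...entries.slice(0, first), ...entries.slice(last)];
  return { entries: kept, currentIndex: first };
}

// The example from the thread: A1 - A2 - B1 - *B2* - B3 - C1 - C2
const sessionNames = ["A1", "A2", "B1", "B2", "B3", "C1", "C2"];
const session = sessionNames.map((name) => ({ name, doc: name[0] }));
const cleared = clearStateSketch(session, 3); // B2 is the current entry
const clearedNames = cleared.entries.map((e) => e.name);
const clearedCurrent = cleared.entries[cleared.currentIndex].name;
```

Under this reading B1 and B2 are removed and B3 becomes the current entry, state object intact, while the A and C entries are left alone.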
[whatwg] Do we really need history.clearState()?
As I alluded to in the thread AJAX History Concerns, I'm not convinced that we need the history.clearState() function. I haven't been able to come up with a compelling case where a page would use this. I guess the idea is that I'm on Google Maps, which is using pushState to make a history entry every time I scroll the map. If I scroll around a lot, it might clobber my history and make it hard to go back to the page I was at before I began looking at the map. But it could be nice and at some point (possibly triggered by some user action) call clearState. Then I'd be able to click back and actually go back to the Document I was previously viewing. clearState as it exists doesn't match this use case particularly well. If we were concerned about clobbering history, we'd probably want to keep the two or three newest history entries and throw out all the rest of them. If you were really clever, you might be able to accomplish this by calling clearState and then using pushState to reconstruct the part of the history you want to keep. But getting the URLs right would be pretty tricky, especially if clearState took you to the last entry for the document, as currently specified. clearState is also useless if you don't use this single-document pushState model for your site. If we think clearing the history is useful for AJAX pages, I'm not sure why it wouldn't be useful for a web application which loads multiple documents. I think the use case I proposed is much better served by something like history.truncate(numBefore, numAfter), which would remove all but the numBefore entries before the current entry and the numAfter entries after the current entry. We'd subject this to the same-origin policy, of course, and stop removing entries in a direction as soon as we encountered an entry from another origin. I'm not sure if history.truncate() is a good idea -- do we really want to give pages that kind of control over the history? 
-- but at least I can actually imagine a page using it. Perhaps a better idea is leaving this whole issue to the UA, which could collapse all the entries from a single origin in the UI. Then we wouldn't need either function. -Justin
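To make the history.truncate(numBefore, numAfter) idea easier to discuss, here is a minimal sketch over a plain array. Nothing like this exists in any browser; `truncateSketch` and the `{ name, origin }` entry shape are assumptions. It also assumes one particular reading of the same-origin rule: only the contiguous run of same-origin entries around the current entry is eligible for removal.

```javascript
// Keep at most numBefore entries before and numAfter entries after the
// current entry, but never remove past an entry from another origin.
function truncateSketch(entries, currentIndex, numBefore, numAfter) {
  const origin = entries[currentIndex].origin;
  // Find the contiguous same-origin run that contains the current entry.
  let lo = currentIndex;
  while (lo > 0 && entries[lo - 1].origin === origin) lo -= 1;
  let hi = currentIndex;
  while (hi < entries.length - 1 && entries[hi + 1].origin === origin) hi += 1;

  const keepFrom = Math.max(lo, currentIndex - numBefore);
  const keepTo = Math.min(hi, currentIndex + numAfter);
  const kept = [
    ...entries.slice(0, lo),              // untouched: other origin and beyond
    ...entries.slice(keepFrom, keepTo + 1),
    ...entries.slice(hi + 1),             // untouched: other origin and beyond
  ];
  return { entries: kept, currentIndex: lo + (currentIndex - keepFrom) };
}

const mapHistory = [
  { name: "search", origin: "x" },
  { name: "map1", origin: "a" },
  { name: "map2", origin: "a" },
  { name: "map3", origin: "a" },
  { name: "map4", origin: "a" }, // current
  { name: "map5", origin: "a" },
  { name: "map6", origin: "a" },
  { name: "other", origin: "y" },
];
const truncated = truncateSketch(mapHistory, 4, 1, 1);
const truncatedNames = truncated.entries.map((e) => e.name);
```

With numBefore = numAfter = 1, the map entries collapse to map3, map4, map5, and one click on back from map3 reaches the search page.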
Re: [whatwg] Do we really need history.clearState()?
On Thu, Nov 12, 2009 at 1:08 PM, Olli Pettay olli.pet...@helsinki.fi wrote:
> On 11/12/09 10:00 PM, Justin Lebar wrote:
>> Perhaps a better idea is leaving this whole issue to the UA, which could collapse all the entries from a single origin in the UI. Then we wouldn't need either function.
>
> How would UA collapse entries from a single origin?

Right now, the back button means "take me to the previous history entry". The UA could add a "take me back to the previous document/origin" button.

Similarly, the browser could group all the entries from a document or origin in the drop-down menu of history entries (the down arrow next to the forward button in Firefox). When you click the down arrow, it could show a list of documents/origins, and when you hover over an entry in the list, it could expand to show all the entries associated with that document/origin.

Brady Eidson beid...@apple.com wrote:
> Imagine the use case of the checkout procedure at an online merchant. [...]

I think this is a pretty good example of where clearState actually helps. I'm not sure how general it is, though. A designer who wants to use clearState in this way is forced to begin the checkout wizard in a new Document. Maybe that's OK, but it seems like an arbitrary limitation to me.

-Justin
[whatwg] pushState / replaceState nits
In section 6.10.2:

"The pushState(data, title, url) method adds a state object to the history."
perhaps should be
"... adds a state object *entry* to the history."

"The replaceState(data, title, url) method updates the current entry in the history to have a state object."
perhaps should be
"The replaceState(data, title, url) method adds a stateObject to the current history entry or modifies (updates?) the entry's stateObject."

"When either of these methods are invoked"
should be
"When either of these methods is invoked".

-Justin
Re: [whatwg] Criticism of pushState (was Global Script proposal)
To be clear, I'm not suggesting that pushState obviates the need for global script. My point is that pushState is useful in its own right, with or without global script.

Without pushState, you can't make a non-hash navigation without hitting the network. Even if you're clever and store all of jQuery and your whole DOM in global storage, if you want to change the pre-hash part of the URI, you need to load a new page.

Imagine Google Maps trying to update the URI to match your current location as you pan around the map. Right now, they could update the hash as you panned. With pushState, they could update the URI in an arbitrary way. With global state, they'd have to load a new page every time you panned. That's obviously worse, and probably not even an option.

> Then why the heck would we want to come up with a fancier way to provide hash-navigation?

Perhaps the point is to do something which works like hash-navigation, but to the user, looks like real navigation. Imagine Bugzilla using pushState to navigate between bugs, but keeping the familiar show_bug.cgi?id=1234 URI. I don't pretend that the code necessary to make this work would be easy to write, but it's certainly no more difficult than changing the hash, and the resulting URLs are much nicer.

> Once you introduce pushState, you deviate from the normalcy -- now you can have a URL in the address bar that the user agent hasn't requested from the server.

Again, this is just what happens when you're at your Gmail Inbox and click a link to http://mail.google.com/mail/#Drafts. You now have a URL in the address bar that the UA hasn't requested from the server. pushState improves this -- at least now the URL you didn't request from the server looks like one which you plausibly might have requested from the server.

> I really don't care about how the URLs look. I just want the Web development to be easier. And in my humble opinion, building a request controller in JS and essentially a whole alternative reality navigation system using hashes is not.

If you don't care how URLs look or if you don't mind making a network request when you navigate a page, then don't use the feature! A lot of people do care about one or both of those things, though, and they're willing to go through the pain of developing these alternative-reality navigation systems.

pushState does not subsume global script. For many applications, storing the whole DOM in global script would get you sufficiently fast navigations -- I agree. But global script does not subsume pushState, either. Even with global script, you can't change the URI arbitrarily without navigating the page. Panning on Google Maps and changing the referer sent to a page are two instances where extra page navigations might be unacceptable.

I understand that pushState doesn't alleviate much of the pain of developing no-navigation web apps. But I don't think that's a reason to get rid of it.

-Justin
[whatwg] Criticism of pushState (was Global Script proposal)
Dimitri Glazkov wrote:
> But more to the point, I think globalScript is a good replacement for the pushState additions to the History spec.

I'm not sure I agree. pushState lets you change the URI very quickly, without doing any kind of navigation at all. To emulate a pushState with globalScript, you'd have to save and restore the whole document, and the browser would still have to do at least one network request, unless you were only changing the hash of the URI.

> I am becoming somewhat convinced that pushState is confusing, hard to get right, and full of fail. You should simply look at the motivation behind building JS-based history state managers -- it all becomes fairly clear.

Could you elaborate on these points? It seems to me that pushState attacks a specific problem and delivers a simple solution which is much better than the current workarounds (using the URL's hash to identify a page and store state). Yes, it's nontrivial to develop an AJAX app which uses pushState and works correctly with bookmarking and page refreshes. On the other hand, pushState makes this a lot easier than it would be otherwise.

> My big issue with pushHistory is that it messes with the nature of the Web: a URL is a resource you request from the server. Not something you arrive to via clever sleight of hand in a user agent.

Like it or not, this ship has already sailed. When I load Gmail, I'm taken to https://mail.google.com/mail/#inbox, but my browser never sends #inbox to the server as part of the HTTP request. Pandora and Facebook do something like this too. Perhaps the new intuition is that a URL tells you how to get back to where you were.

> So, you've managed to pushState your way to a.com/some/path/10/clicks/from/the/home/page. Now the user bookmarks it. What are you going to do now?

When reading this message in Gmail, my browser shows that I'm at https://mail.google.com/mail/#label/WhatWG/{guid} . If I bookmark this page and go back to it, Gmail takes me back to this exact message. There's no actual resource named #label/WhatWG/{guid} on Google's servers, but the URL I bookmarked is sufficient to identify where I was, and Gmail's servers were intelligent enough to take me there.

Maybe you think that Gmail's URLs should name real resources; maybe they should look like https://mail.google.com/mail.cgi?label=WhatWG&message={guid} or something. I'm not convinced this is better, but even if it suits you, pushState still helps you navigate between mail.cgi?label=WhatWG and mail.cgi?label=Drafts without a page refresh.

I think the pushState API is really useful, but what do I know? We're going to land it in Firefox trunk Real Soon Now, so developers and members of this list will be able to play with it and decide for themselves whether it's the right API to solve the problem at hand.

-Justin
Re: [whatwg] first script and impersonating other pages - pushState(url)
Mike Wilson wrote:
> The result is that the address bar URL can't be trusted, as any page on the site can impersonate any other without consent from that page or part of the site?

Someone will correct me if I'm wrong, but I think this is already pretty much the case with today's same-origin policy, albeit with a bit more work.

My understanding is that if A and B have the same origin, they can do whatever they want to each other's documents, including modifying content. So if you can control script at http://google.com/~mwilson , and a user has both your site and http://google.com/securesite open, then your malicious page can do whatever it wants to the secure page. That's why it's important that you trust all the JavaScript which runs on your origin.

-Justin
Re: [whatwg] Proposed changes to the History API
> Sorry, it seems we are not talking about the same application. Jonas referred to attachment pages in your bug database, which I assumed would f ex be a page like this one: https://bugzilla.mozilla.org/attachment.cgi?id=386244&action=edit (The textarea in this app is not created onload, it is delivered in the server-generated HTML and thus is subject to form field value persistence.)

STR:
* Open https://bugzilla.mozilla.org/attachment.cgi?id=386244&action=edit
* Click "Edit as comment"
* Change the text in the textarea
* Close and re-open your browser

Actual behavior: The textarea is back to its original state, read-only and without your edits. Even after you press "Edit as comment", the state still doesn't reflect the changes you made before you closed the browser.

Behavior with History API: When you click "Edit as comment" and as you type your comments, the page periodically saves the data to pageState. When the page receives a popstate, it restores the state of the textarea.

I imagine that one could rework the Bugzilla page to function better on browser restart using existing web technologies. But as the page is designed right now, some kind of pageStorage would be helpful.

-Justin
Re: [whatwg] Proposed changes to the History API
Mike Wilson wrote: What you're essentially saying here is that when restarting the browser, you will also restore history data, correct? For tabs that were open when the browser was closed, this will mean that these will reappear after restart with full history, being able to go Back and restore state on previous pages? Right. We already do this, sans popping a state object. But for pages that were explicitly closed, and then navigated to in a new tab, will you restore the full history in these as well? No. The state object is attached to the session history entry, not to the page's URI. If you close a tab, all its session history entries go away. If you navigate to a page which was open in the tab you just closed, that new instance of the page won't be aware of the old page's state object(s). And if there has been several sessions in parallel on that URL space, which one do you respawn for a navigation to a related page in a new tab? A navigation on a new tab would get an entirely new environment. Otherwise, like you suggested, this would be very confusing. -Justin
Re: [whatwg] Proposed changes to the History API
On Wed, Aug 19, 2009 at 5:31 PM, Jeremy Orlow jor...@chromium.org wrote:
> but here it seems like everything can just stay in memory...right?

My thought was that if you had a tab open and restarted the browser, that the state objects would be there after the restart, so we'd have to serialize to disk. I also thought that we'd persist this state data even after we take a Document out of memory.

It might be possible to store some subset of DOM objects while still meeting those requirements, but that seems like it might be a serious can of worms. Do you have a use case which would be facilitated by being able to store some DOM objects in this way?

-Justin
Re: [whatwg] Proposed changes to the History API
I guess this is just a vision about what the developer really wants to do, or are you thinking of any solutions that would actually allow changing path (or query string) without loading a new Document? The pushState function as currently specified allows you to do precisely this. History.pushState(obj, title, url) creates a new history entry with the given URL, but doesn't load a new document. It would further be nice if your comments weren't lost even if you navigate away from the page. This is the way it works in most browsers, as the browser persists form field values when you navigate back and forth in history. Right. But the difficulty with this page in particular is that it's structured such that it's difficult/impossible for the browser to properly restore its form state after a crash. Onload, the page creates a textarea and populates it with the text of the patch. So if we crash then restore, the page won't have created the textarea by the time the browser looks to restore the text. One can imagine reworking this page to make it play nicely with session restore as it currently exists, but what we really want is a way to programmatically do the restore. click link to navigate from page1.html#a to page1.html#b: [snip] I think this is pretty much what we want to do, except that we'd like to let authors use arbitrary URIs instead of constraining them to using URIs which differ only in their hashes, so we still want PopState to fire on all loads, not like hashchange. The idea of having an unload event similar to PopState is intriguing, however. -Justin
Re: [whatwg] Proposed changes to the History API
On Thu, Aug 20, 2009 at 11:20 AM, Jeremy Orlow jor...@chromium.org wrote:
> I see. It makes more sense why you mentioned the session storage element then. Note that there has been some discussion about whether session storage should survive crashes, but I know Safari and Chrome are currently planning to _not_ serialize it to disk.

I just did a quick test, and it appears that Firefox does save sessionStorage across browser sessions, but IE8 does not.

Leaving aside the question of what the right thing to do is with sessionStorage, I think there are some serious benefits to saving the pushState'ed state across sessions. Suppose I'm using a webmail client which uses this new API. I click around to a few of my folders and messages and then close the browser. If the page wants the back/forward buttons to work when the browser re-opens, it needs to store all of the state for those history entries in the URI. At the point that pages have to do that, we might as well not store a per-page state object.

> I still think we shouldn't force app developers to serialize everything to strings. Maybe we can just raise an exception if they try to set the history state to something unserializable? (I guess that's what you're already doing?)

Right now, I just serialize to JSON and throw an exception if that fails. I don't have a problem continuing to do that, at least until we get the structured clone thing sorted out.

-Justin
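The serialize-to-JSON-and-throw approach mentioned in this exchange can be sketched as follows. This is an illustration only, not the Firefox patch: `serializeStateSketch` is a hypothetical name, and treating values that JSON.stringify silently drops (such as a bare function) as errors is an added assumption.

```javascript
// Serialize a state object to a JSON string, or throw if that is impossible.
function serializeStateSketch(state) {
  let json;
  try {
    json = JSON.stringify(state);
  } catch (err) {
    // JSON.stringify throws a TypeError on circular structures.
    throw new Error("state object is not serializable: " + err.message);
  }
  if (json === undefined) {
    // JSON.stringify returns undefined for undefined and for functions.
    throw new Error("state object is not serializable: no JSON representation");
  }
  return json;
}

const webmailState = serializeStateSketch({ folder: "inbox", page: 2 });

const circularState = { name: "loop" };
circularState.self = circularState;
let circularRejected = false;
try {
  serializeStateSketch(circularState);
} catch (err) {
  circularRejected = true;
}
```

A string like `webmailState` is what could be written to disk and handed back, after JSON.parse, when the session is restored.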
Re: [whatwg] Proposed changes to the History API
> Overall, I think preserving history API information when restoring sessions is a good thing. My only concern is whether web developers will program in such a way that this works. Unless ALL state will need to be either saved in the history API or reconstructible from that information, bad things will happen. (Note that this was difficult if not impossible with the original API, but your new proposal makes this quite practical.)

Maybe the right solution is to have a pageStorage object, which works just like sessionStorage but is local to a session history entry and perhaps carries some weak promise of persistence. It might be a little confusing that in the following code

    var len1 = pageStorage.length
    history.pushState(...)
    var len2 = pageStorage.length

len1 != len2, but that doesn't seem too complicated.

> Do most web apps that use iframe hacks (for tracking history) come back cleanly from a session restore?

I don't know, but I presume it would be possible so long as form data is saved across session restore.

-Justin
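The pageStorage idea can be modelled in a few lines to show why the two lengths differ. This is a toy under stated assumptions: `ToyEntryStorage` is a hypothetical name, each history entry simply owns its own Map, and the Map's `size` plays the role of `pageStorage.length`.

```javascript
// Toy model: every session history entry owns a separate storage area.
class ToyEntryStorage {
  constructor() {
    this.entries = [new Map()];
    this.index = 0;
  }
  // The storage visible to the page is the one for the current entry.
  get pageStorage() {
    return this.entries[this.index];
  }
  pushState() {
    this.entries.length = this.index + 1; // forward entries are dropped
    this.entries.push(new Map());         // the new entry starts empty
    this.index += 1;
  }
  back() {
    if (this.index > 0) this.index -= 1;
  }
}

const tab = new ToyEntryStorage();
tab.pageStorage.set("draft", "hello");
const len1 = tab.pageStorage.size;
tab.pushState();
const len2 = tab.pageStorage.size;
tab.back();
const len3 = tab.pageStorage.size; // the first entry's data is still there
```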
[whatwg] Proposed changes to the History API
I'm in the process of implementing the HTML5 History API (History.pushState(), History.clearState(), and the PopState event) in Firefox. I'd like to discuss whether the API might benefit from some changes. To my knowledge, no other browser implements this API, so I'm assuming we have freedom to make large alterations to it.

My basic proposal is that History.pushState() be split into a function for creating new history entries and functions or a property for getting/setting an object associated with that entry.

In its current form, the History API allows us to identify session history entries by way of an arbitrary object, which we pass as the first argument to pushState() and which we receive as part of the PopState event when that history entry is activated. If the page gets a null popstate, it's supposed to use the URL to decide what state to display.

Notably absent from this API is support for pages altering their saved state. For instance, a page might want to save a text box's edit history to implement a fancy undo. It could store the edit history in a cookie or in the session storage, but then if we loaded the page twice in the same tab, those two instances would step on each other when we went back and forth between them. The page could just store its state in variables in the document, but then it would lose that state when the browser crashed or was closed, or when the browser decided to kick the document out of the history.

I think this page would be better served by a History.setStateObject() function, which does exactly what the page wants in a simple fashion. We'd still keep the history-entry-creating functionality of History.pushState() in a new History function (I'll call it createNewEntry(), but it probably needs a better name), which takes a title and URL, as pushState() does now.
The API might be more intuitive if we had a History.stateObject property, but I'm concerned that then we'd be promising the page that we'll keep around literally any objects it wants, including DOM objects. In fact, I'd be happy restricting the state object to being a string. If a page wants to store an object, it can convert it to JSON, or it can store a GUID as its state string and index into the session storage.

Pages could retrieve the state object just as they do now, in a PopState event, although we'd probably want to change the name of the event. We'd probably want to fire PopState on all loads and history navigations, since any document might have a state to pop, and even those documents which didn't call setStateObject() might store state in their URI which they need to restore when their history entry is activated.

Last, I'm not sure that we need the History.clearState() function. It's confusing (why do we end up at the last entry for the current document instead of staying at the current entry?) and I haven't been able to come up with a compelling use case.

I think the main benefit of these changes is added simplicity. There's a right and wrong way to use pushState, and setState/createNewEntry doesn't require such rules. But additionally, these changes allow pages flexibility to do things we haven't yet thought of. I don't know what those things might be, but I suspect they may be pretty cool. :)

-Justin
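The proposed split can be sketched as a toy model to show how a page would alter its saved state in place. Everything here is hypothetical: `ToySplitHistory` and `go` are invented for illustration, `createNewEntry` and `setStateObject` follow the placeholder names used in the message, and the string-only restriction is the option floated above rather than settled behavior.

```javascript
// Toy model of the proposed split: one call creates entries, another
// sets the state of the current entry. Not a real browser API.
class ToySplitHistory {
  constructor(url) {
    this.entries = [{ url, title: "", state: null }];
    this.index = 0;
  }
  // Stands in for the entry-creating half of pushState().
  createNewEntry(title, url) {
    this.entries.length = this.index + 1; // forward entries are dropped
    this.entries.push({ url, title, state: null });
    this.index += 1;
  }
  // Restricted to strings in this sketch; objects go through JSON.
  setStateObject(state) {
    if (typeof state !== "string") {
      throw new TypeError("state must be a string in this sketch");
    }
    this.entries[this.index].state = state;
  }
  // Activating an entry hands its saved state back, like a popstate event.
  go(delta) {
    const target = this.index + delta;
    if (target < 0 || target >= this.entries.length) return null;
    this.index = target;
    const { url, state } = this.entries[target];
    return { url, state };
  }
}

const editorTab = new ToySplitHistory("/editor");
editorTab.setStateObject(JSON.stringify({ undo: ["a"] }));
editorTab.setStateObject(JSON.stringify({ undo: ["a", "ab"] })); // altered in place
editorTab.createNewEntry("Preview", "/editor/preview");
const popped = editorTab.go(-1);
const restoredUndo = JSON.parse(popped.state).undo;
```

The second setStateObject call is the case pushState alone does not cover: the undo history grows without a new history entry being created.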
Re: [whatwg] Reading spec without boxes
Unbeknownst to me, I had a minimum font size of 12pt set. FWIW, I don't remember setting this, so it may have been a default.

-Justin

On Thu, Aug 6, 2009 at 2:09 PM, Ian Hickson i...@hixie.ch wrote:
> On Thu, 6 Aug 2009, Elliotte Rusty Harold wrote:
>> Same issue on Firefox 3.5.1 Mac at various font sizes. :-(
>
> On Thu, 6 Aug 2009, Justin Lebar wrote:
>> Happens to me on Ubuntu 9.04 with FF 3.5.2. Screenshot at [1]
>> http://stanford.edu/~jlebar/moz/screen1.png
>
> Do either of you have a minimum font size preference set?
>
> --
> Ian Hickson
> http://ln.hixie.ch/
> Things that are impossible just take longer.
Re: [whatwg] A New Way Forward for HTML5
> That being said, inline spec comments sound interesting. I'm not quite sure what the UI would look like, but if anyone has any ideas, feel free to e-mail me directly and we can figure something out. (This would be exceedingly useful once we're in last call in a few months.)

Ian,

Other people have probably pointed this out, but the hg book has inline comments.

http://hgbook.red-bean.com/read/preface.html

Regards,
-Justin
Re: [whatwg] Plus Signs in Signed Integers
> What does IE do in these two examples?

It appears that IE8 has the following behavior:

<ol start=+4>: start = 4
<ol start=H2SO4>: start = 1

Test at http://stanford.edu/~jlebar/moz/list.html

-Justin

On Tue, Jul 14, 2009 at 12:43 AM, Jonas Sicking jo...@sicking.cc wrote:
> On Thu, Jun 18, 2009 at 9:33 AM, Smylers smyl...@stripey.com wrote:
>> It also doesn't seem to match browser behaviour: the ol element's start attribute is an integer, so I tried this out in various browsers:
>>
>>   <ol start=+4> <li>Plus four </ol>
>>
>> All the ones I had to hand (Firefox, Opera, Konqueror, Dillo, Lynx, Links, and W3M) numbered the element with 4.
>> [snip]
>> To check that it is specifically the plus sign they are ignoring and not any non-digit character I also tried:
>>
>>   <ol start=H2SO4> <li>Acid test </ol>
>>
>> That should cause parsing an integer to abort and so the default of start=1 to be used. Opera, Links, and W3M get that right. Konqueror, Dillo, and Lynx all also seem to manage the aborting, but use a default of zero instead. Firefox parses the 2 out of H2SO4, seemingly using the first integer it can find in the attribute, so possibly isn't special-casing +.
>
> What does IE do in these two examples? It appears webkit treats the first one as start=4 and the second as start=0.
>
> / Jonas
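The behavior being compared in this thread can be sketched as a small parser. This is a simplified illustration, not the spec's algorithm verbatim: `parseIntegerSketch` and its `allowPlus` option are hypothetical, and returning `null` stands for "parsing aborts, so the caller falls back to its default".

```javascript
// Parse an optionally signed decimal integer from the start of a string.
// Returns null when no integer can be parsed (caller applies a default).
function parseIntegerSketch(input, { allowPlus = true } = {}) {
  let i = 0;
  // Skip leading whitespace.
  while (i < input.length && " \t\n\f\r".includes(input[i])) i += 1;
  let sign = 1;
  if (input[i] === "-") {
    sign = -1;
    i += 1;
  } else if (allowPlus && input[i] === "+") {
    i += 1;
  }
  // Collect a run of ASCII digits; anything after it is ignored.
  const start = i;
  while (i < input.length && input[i] >= "0" && input[i] <= "9") i += 1;
  if (i === start) return null;
  return sign * Number(input.slice(start, i));
}

// start attribute handling with a default of 1, as in the IE8 results above.
function listStartSketch(attr) {
  const parsed = parseIntegerSketch(attr);
  return parsed === null ? 1 : parsed;
}

const plusFour = listStartSketch("+4");
const acidTest = listStartSketch("H2SO4");
const strictPlus = parseIntegerSketch("+4", { allowPlus: false });
```

With `allowPlus` off, "+4" aborts like any other non-digit prefix; with it on, the result matches what the tested browsers did for start=+4.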