Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 5:57 PM, Anne van Kesteren ann...@annevk.nl wrote: On Mon, May 6, 2013 at 5:45 PM, Jonas Sicking jo...@sicking.cc wrote: On Mon, May 6, 2013 at 4:28 PM, Anne van Kesteren ann...@annevk.nl wrote: Okay. So that fails for XMLHttpRequest :-( What do you mean? Those are the steps we take for XHR requests too. So e.g. open() needs to do URL parsing (per XHR spec), send() would cause CSP to fail (per CSP spec), send() also does the fetch (per XHR spec). Overall it seems like a different model from the other APIs, but maybe I'm missing something? The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. / Jonas
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 10:52 PM, Eric U er...@google.com wrote: I'm not really sure what you're saying, here. If you want an URL to expire or otherwise be revoked, no, you can't use it multiple times after that. If you want it to work multiple times, don't revoke it or don't set oneTimeOnly. No, I'm saying that APIs *internally* may perform multiple fetches, such as if you load a URL into video and the user performs multiple pauses and seeks. This should be completely transparent to script. Similarly, if you load blob URLs into srcset, the fact that srcset might load or reload the images any number of times in the future due to changes to the environment should be completely transparent to script. There are lots of cases of this, and we should have a simple, predictable approach to dealing with it. At a high-level, my view is that (within reason) blobs should be captured at the point of entry into the native API. As soon as you say img.srcset = '...; blob URL; ...', or xhr.open(objectURL), or img.src = createObjectURL(), that point where the URL first enters the native API is what matters. That's a simple rule that's easy for developers to understand in general, without needing to care about when or how many times the fetch algorithm is run (if ever, as with srcset) on the URLs. -- Glenn Maynard
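The "capture at the point of entry" rule above can be sketched as a toy model. This is only an illustration under stated assumptions: `registry`, `createObjectURL`, `revokeObjectURL`, and `MediaElementModel` are hypothetical stand-ins written for this sketch, not the browser APIs of the same names, and the blob is represented by a plain string.

```javascript
// Toy model only: every name here is a hypothetical stand-in, not a browser API.
const registry = new Map(); // url -> blob
let nextId = 0;

function createObjectURL(blob) {
  const url = `blob:model/${nextId++}`;
  registry.set(url, blob);
  return url;
}

function revokeObjectURL(url) {
  registry.delete(url);
}

class MediaElementModel {
  constructor() {
    this.url = null;
    this.captured = null;
  }

  // The URL enters the "native API" here, so the blob is captured here.
  set src(url) {
    this.url = url;
    this.captured = registry.has(url) ? registry.get(url) : null;
  }

  // May run any number of times (seeks, srcset re-selection, and so on).
  // It reads the captured blob and never consults the registry again.
  fetch() {
    if (this.captured === null) throw new Error("network error");
    return this.captured;
  }
}

const video = new MediaElementModel();
const url = createObjectURL("video-bytes");
video.src = url;
revokeObjectURL(url);         // script revokes right after the assignment
const first = video.fetch();  // served from the captured blob
const second = video.fetch(); // a repeated internal fetch is served the same way
```

In this model the number of internal fetches is invisible to script, which is the property being argued for; an element that first sees the URL after revocation would fail instead.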
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote: The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. Fair enough. So I guess we can indeed fix this by changing http://fetch.spec.whatwg.org/#concept-fetch to get a reference to the Blob/MediaStream/... before returning early as Arun suggested. -- http://annevankesteren.nl/
Re: Blob URLs | autoRevoke, defaults, and resolutions
On May 7, 2013, at 10:45 AM, Anne van Kesteren wrote: On Mon, May 6, 2013 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote: The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. Fair enough. So I guess we can indeed fix this by changing http://fetch.spec.whatwg.org/#concept-fetch to get a reference to the Blob/MediaStream/... before returning early as Arun suggested. \o/ :-) Filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=21955 -- A*
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Tue, May 7, 2013 at 12:17 PM, Arun Ranganathan a...@mozilla.com wrote: \o/ :-) Filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=21955 So actually, after I emailed that this morning I wondered how this would work for img srcset or image() in CSS where fetch is unlikely to be sync. Most of those are new features which might explain why it has not been seen as a problem thus far. -- http://annevankesteren.nl/
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Tue, May 7, 2013 at 9:45 AM, Anne van Kesteren ann...@annevk.nl wrote: On Mon, May 6, 2013 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote: The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. Fair enough. So I guess we can indeed fix this by changing http://fetch.spec.whatwg.org/#concept-fetch to get a reference to the Blob/MediaStream/... before returning early as Arun suggested. Step 1 is resolve, step 3 is fetch. Moving it into step 1 means it would go in resolve, not fetch. Putting it in fetch wouldn't help, since fetch doesn't always start synchronously. (I'm confused, because we've talked about this distinction several times.) -- Glenn Maynard
Re: Blob URLs | autoRevoke, defaults, and resolutions
I'd be worried about letting any resolved URL hold a reference to the Blob. We are playing very fast and loose with URLs in Gecko and it's never been intended that they hold on to any resources of significant size. / Jonas On Tue, May 7, 2013 at 1:34 PM, Glenn Maynard gl...@zewt.org wrote: On Tue, May 7, 2013 at 9:45 AM, Anne van Kesteren ann...@annevk.nl wrote: On Mon, May 6, 2013 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote: The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. Fair enough. So I guess we can indeed fix this by changing http://fetch.spec.whatwg.org/#concept-fetch to get a reference to the Blob/MediaStream/... before returning early as Arun suggested. Step 1 is resolve, step 3 is fetch. Moving it into step 1 means it would go in resolve, not fetch. Putting it in fetch wouldn't help, since fetch doesn't always start synchronously. (I'm confused, because we've talked about this distinction several times.) -- Glenn Maynard
Re: Blob URLs | autoRevoke, defaults, and resolutions
Fwiw, to the extent it may be helpful when it comes to spec writing, here are some quick-n-dirty thoughts about how some approximation of the 'autorevoke' behavior could be implemented in chromium.

1) Extend the lifetime of the PublicBlobURL registration until *after* the last latchee has redeemed its ticket to ride the url.

The url registration itself is refcounted. The url registration implies the underlying data is not reclaimable. At time of coining the url registration, a microtask is scheduled to release it. Latchees may addref it safely prior to microtask execution, keeping the url registration valid until a later balancing release.

This would be a non-compliant approximation of the way the spec is shaping up... but good enough... maybe has the important characteristics of freeing memory up eventually (generally prior to doc unload) and guaranteeing a latchee's ticket to ride.

2) Extend the lifetime of the underlying BlobData until *after* the latchee has redeemed its ticket to ride the url, but revoke the URL well in advance of that.

The url registration is not refcounted. The url registration implies the underlying data is not reclaimable. There is a means to lookup a handle to the data given the url. The underlying blob data is refcounted. Latchees must take and release a ref on that. The url registration is revoked at microtask execution time after being coined. Piggyback a handle to the blob data on future Fetch (or other network requests) for the PublicBlobURL. Ignore the URL when processing those requests and refer only to the piggybacked blob data handle.

Probably a more compliant approach, but maybe more tedious to implement in chromium (since the URL is no longer useful as the identifier for what to fetch/load/whathaveyou, we need sideband data for that in addition to addref/release at url latch/redemption times).

(I'm confused, because we've talked about this distinction several times.)
lol, picture a herd of cats chasing their tails, you can call me calico :) On Tue, May 7, 2013 at 1:34 PM, Glenn Maynard gl...@zewt.org wrote: On Tue, May 7, 2013 at 9:45 AM, Anne van Kesteren ann...@annevk.nl wrote: On Mon, May 6, 2013 at 11:11 PM, Jonas Sicking jo...@sicking.cc wrote: The only thing that's different about XHR is that the first step in my list lives in one function, and the other steps live in another function. Doesn't seem to have any effect on the discussions here other than that we'd need to define which of the two functions does the step which grabs a reference to the Blob. Fair enough. So I guess we can indeed fix this by changing http://fetch.spec.whatwg.org/#concept-fetch to get a reference to the Blob/MediaStream/... before returning early as Arun suggested. Step 1 is resolve, step 3 is fetch. Moving it into step 1 means it would go in resolve, not fetch. Putting it in fetch wouldn't help, since fetch doesn't always start synchronously. (I'm confused, because we've talked about this distinction several times.) -- Glenn Maynard
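Option 1 in the message above (a refcounted URL registration whose coining reference is dropped by a microtask) can be sketched roughly as follows. This is a model under assumptions, not Chromium code: `UrlRegistrationModel` and its methods are hypothetical names, and the microtask queue is replaced by an explicit `runMicrotasks()` call so the ordering is visible.

```javascript
// Hypothetical model of option 1: the URL registration itself is refcounted.
class UrlRegistrationModel {
  constructor() {
    this.entries = new Map();  // url -> { blob, refs }
    this.pendingReleases = []; // stand-in for the microtask queue
    this.nextId = 0;
  }

  // Coining the URL takes one reference and schedules its release.
  coin(blob) {
    const url = `blob:model/${this.nextId++}`;
    this.entries.set(url, { blob, refs: 1 });
    this.pendingReleases.push(url);
    return url;
  }

  // A latchee takes a reference; this fails once the registration is gone.
  addRef(url) {
    const entry = this.entries.get(url);
    if (!entry) return false;
    entry.refs += 1;
    return true;
  }

  // Balancing release; the registration is dropped when the count hits zero.
  release(url) {
    const entry = this.entries.get(url);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) this.entries.delete(url);
  }

  lookup(url) {
    const entry = this.entries.get(url);
    return entry ? entry.blob : null;
  }

  // Stand-in for the microtask that drops the coining reference.
  runMicrotasks() {
    for (const url of this.pendingReleases.splice(0)) this.release(url);
  }
}

const reg = new UrlRegistrationModel();
const latched = reg.coin("A");
const unlatched = reg.coin("B");
reg.addRef(latched);                     // a latchee takes its ticket in time
reg.runMicrotasks();                     // coining references are released
const stillValid = reg.lookup(latched);  // registration kept alive by the latchee
const gone = reg.lookup(unlatched);      // nobody latched, so it was dropped
reg.release(latched);                    // the latchee redeems its ticket
const afterRedeem = reg.lookup(latched); // now dropped as well
```

Option 2 would differ by dropping the registration unconditionally in `runMicrotasks()` and refcounting the blob data instead, which is why it needs the sideband handle the message mentions.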
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Tue, May 7, 2013 at 2:54 PM, Jonas Sicking jo...@sicking.cc wrote: I'd be worried about letting any resolved URL hold a reference to the Blob. We are playing very fast and loose with URLs in Gecko and it's never been intended that they hold on to any resources of significant size. That said, the one way to find out if this approach works is to simply try it. / Jonas
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Tue, May 7, 2013 at 4:54 PM, Jonas Sicking jo...@sicking.cc wrote: I'd be worried about letting any resolved URL hold a reference to the Blob. We are playing very fast and loose with URLs in Gecko and it's never been intended that they hold on to any resources of significant size. Note that I'm not suggesting that every invocation of the resolve algorithm start capturing blob URLs. It'd be an explicit operation at entry points that support it, not a catch-all happening behind the scenes any time you resolve a URL anywhere. (Actually, I went a bit further--entry points that don't explicitly do this shouldn't allow autorevoke URLs at all.) The actual change required in the particular entry points might be as simple as saying "resolve URL with capture" instead of "resolve URL" to invoke a wrapper algorithm, but it lets it be introduced gradually and makes it clear exactly where it happens. -- Glenn Maynard
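The wrapper idea above might look something like the following sketch. The names `blobStore`, `resolveUrl`, and `resolveUrlWithCapture` are assumptions made for illustration; no spec or browser defines them in this form. Only the standard `URL` constructor is relied on.

```javascript
// Hypothetical names throughout; blobStore stands in for the blob URL store.
const blobStore = new Map([["blob:model/1", "payload"]]);

// Plain resolution: produces an absolute URL string and nothing else.
function resolveUrl(input, base) {
  return new URL(input, base).href;
}

// Wrapper used only by entry points that opt in to capture.
// For blob URLs it also returns a reference to the blob found at this moment.
function resolveUrlWithCapture(input, base) {
  const href = resolveUrl(input, base);
  const isBlob = href.startsWith("blob:");
  const captured = isBlob && blobStore.has(href) ? blobStore.get(href) : null;
  return { href, captured };
}
```

An entry point that keeps calling `resolveUrl` holds no blob reference at all, which is the gradual, explicit adoption described in the message; only callers of the wrapper keep data alive.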
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Sun, May 5, 2013 at 5:37 PM, Jonas Sicking jo...@sicking.cc wrote: What we do is that we 1. Resolve the URL against the current base URL 2. Perform some security checks 3. Kick off a network fetch 4. Return Okay. So that fails for XMLHttpRequest :-( But if we made it part of resolve that could work. -- http://annevankesteren.nl/
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 4:28 PM, Anne van Kesteren ann...@annevk.nl wrote: On Sun, May 5, 2013 at 5:37 PM, Jonas Sicking jo...@sicking.cc wrote: What we do is that we 1. Resolve the URL against the current base URL 2. Perform some security checks 3. Kick off a network fetch 4. Return Okay. So that fails for XMLHttpRequest :-( What do you mean? Those are the steps we take for XHR requests too. / Jonas
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 5:45 PM, Jonas Sicking jo...@sicking.cc wrote: On Mon, May 6, 2013 at 4:28 PM, Anne van Kesteren ann...@annevk.nl wrote: Okay. So that fails for XMLHttpRequest :-( What do you mean? Those are the steps we take for XHR requests too. So e.g. open() needs to do URL parsing (per XHR spec), send() would cause CSP to fail (per CSP spec), send() also does the fetch (per XHR spec). Overall it seems like a different model from the other APIs, but maybe I'm missing something? -- http://annevankesteren.nl/
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Mon, May 6, 2013 at 7:57 PM, Anne van Kesteren ann...@annevk.nl wrote: On Mon, May 6, 2013 at 5:45 PM, Jonas Sicking jo...@sicking.cc wrote: On Mon, May 6, 2013 at 4:28 PM, Anne van Kesteren ann...@annevk.nl wrote: Okay. So that fails for XMLHttpRequest :-( What do you mean? Those are the steps we take for XHR requests too. So e.g. open() needs to do URL parsing (per XHR spec), send() would cause CSP to fail (per CSP spec), send() also does the fetch (per XHR spec). Overall it seems like a different model from the other APIs, but maybe I'm missing something? XHR isn't so different from other APIs, it's just that the separation of "URL enters the API" and "the fetch is started" is more obvious, and more easily controlled from script. I think that makes it a really good test case. -- Glenn Maynard
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 5:16 PM, Glenn Maynard gl...@zewt.org wrote: On Wed, May 1, 2013 at 7:01 PM, Eric U er...@google.com wrote: Hmm...now Glenn points out another problem: if you /never/ load the image, for whatever reason, you can still leak it. How likely is that in good code, though? And is it worse than the current state in good or bad code? I think it's much too easy for well-meaning developers to mess this up. The example I gave is code that *does* use the URL, but the browser may or may not actually do anything with it. (I wouldn't even call that author error--it's an interoperability failure.) Also, the failures are both expensive and subtle (eg. lots of big blobs being silently leaked to disk), which is a pretty nasty failure mode. True. Another problem is that APIs should be able to receive a URL, then use it multiple times. For example, srcset can change the image being displayed when the environment changes. oneTimeOnly would be weird in that case. For example, it would work when you load your page on a tablet, then work again when your browser outputs the display to a TV and changes the srcset image. (The image was never used, so the URL is still valid.) But then when you go back to the tablet screen and reconfigure back to the original configuration, it suddenly breaks, since the first URL was already used and discarded. The blob capture approach can be made to work with srcset, so this would work reliably. I'm not really sure what you're saying, here. If you want an URL to expire or otherwise be revoked, no, you can't use it multiple times after that. If you want it to work multiple times, don't revoke it or don't set oneTimeOnly.
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Fri, May 3, 2013 at 6:55 AM, Anne van Kesteren ann...@annevk.nl wrote: On Thu, May 2, 2013 at 12:53 AM, Jonas Sicking jo...@sicking.cc wrote: It actually has turned out to be surprisingly easy in Gecko. But I realize the same might not be true everywhere. Can we have a description of this (and how it does not run into the problems Glenn mentioned). I feel that I have an incomplete understanding of what actually happens at the moment. img.src = url

What we do is that we

1. Resolve the URL against the current base URL
2. Perform some security checks
3. Kick off a network fetch
4. Return

Note that no actual network activity happens here. That is all being done on background threads. But what we do in step 3 is to send the signal to the network code that it should start doing all the stuff that it needs to do. Step 3 is where we inserted the code to grab a reference to the Blob such that it doesn't matter if the URL is revoked.

Some of this code will change. For example I'd like to move towards doing the security checks asynchronously. Essentially by making them part of the stuff that the network code needs to do. But we'll always need to fire off that algorithm from the main thread, and generally doing that synchronously is the simplest solution. / Jonas
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Sun, May 5, 2013 at 7:37 PM, Jonas Sicking jo...@sicking.cc wrote: What we do is that we 1. Resolve the URL against the current base URL 2. Perform some security checks 3. Kick off a network fetch 4. Return Note that no actual network activity happens here. That is all being done on background threads. But what we do in step 3 is to send the signal to the network code that it should start doing all the stuff that it needs to do. Step 3 is where we inserted the code to grab a reference to the Blob such that it doesn't matter if the URL is revoked. Some of this code will change. For example I'd like to move towards doing the security checks asynchronously. Essentially by making them part of the stuff that the network code needs to do. But we'll always need to fire off that algorithm from the main thread, and generally doing that synchronously is the simplest solution. I think the only difference between this and what I'm suggesting is that grabbing the blob happens in step #1, instead of step #3. That way, it still works if the fetch isn't actually started right away (srcset, on-demand image loading, xhr.open(), etc). -- Glenn Maynard
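The difference between grabbing the blob in step 1 and in step 3 only shows up when the fetch is deferred. A minimal sketch of that, assuming a hypothetical `beginLoad` helper and a plain `Map` standing in for the blob URL store (neither is Gecko code or spec text):

```javascript
// Hypothetical model: `store`, `lookup`, and `beginLoad` are illustrative names.
const store = new Map();

function lookup(url) {
  return store.has(url) ? store.get(url) : null;
}

// captureAt is "resolve" (step 1) or "fetch" (step 3).
function beginLoad(url, captureAt) {
  const resolved = url; // step 1: resolve (identity in this model)
  let blob = captureAt === "resolve" ? lookup(resolved) : null;
  // step 2: security checks are omitted from this model
  // step 3 is returned as a function so the caller decides when the fetch starts
  return function startFetch() {
    if (captureAt === "fetch") blob = lookup(resolved);
    return blob;
  };
}

store.set("blob:model/1", "bytes");
const early = beginLoad("blob:model/1", "resolve");
const late = beginLoad("blob:model/1", "fetch");
store.delete("blob:model/1"); // revoked before the deferred fetch starts
const earlyResult = early();  // blob was grabbed in step 1
const lateResult = late();    // step 3 ran too late to find it
```

When step 3 runs synchronously right after step 1, both variants behave the same, which matches the observation that the two approaches differ only for deferred fetches such as srcset or the xhr.open() and xhr.send() split.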
Re: Blob URLs | autoRevoke, defaults, and resolutions
Oops, forgot this was sitting here. On Fri, May 3, 2013 at 8:55 AM, Anne van Kesteren ann...@annevk.nl wrote: Glenn has at times suggested we could make a pertinent reference to the Blob object from the URL object you get from the parsing. That might work, but requires some special casing of blob URLs and soon mediastream URLs (and ...) in a thin wrapper around the URL parser which all end points would need to use. The special casing doesn't seem bad (the specs using it don't need to know anything about it). It's the need to insert something into every entry point that's annoying, but I don't see any way around that. -- Glenn Maynard
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Thu, May 2, 2013 at 12:53 AM, Jonas Sicking jo...@sicking.cc wrote: But if we can figure out this problem, then my proposal would be to add a new method which has a nicer name than createObjectURL so as to encourage authors to use that and have fewer leaks. Yeah, I've been thinking the same thing. URL.create(...) is available. Or maybe URL.from() or some such. It actually has turned out to be surprisingly easy in Gecko. But I realize the same might not be true everywhere. Can we have a description of this (and how it does not run into the problems Glenn mentioned). I feel that I have an incomplete understanding of what actually happens at the moment. img.src = url Will at least parse the URL. As you can synchronously request img.src and it will return a serialization of the parsed URL (or the same value in case parsing failed). Are any other algorithms synchronously invoked? img.style.backgroundImage = 'url(' + url + ')' Similarly seems to require (apparent) synchronous parsing. Glenn has at times suggested we could make a pertinent reference to the Blob object from the URL object you get from the parsing. That might work, but requires some special casing of blob URLs and soon mediastream URLs (and ...) in a thin wrapper around the URL parser which all end points would need to use. What am I missing? -- http://annevankesteren.nl/
Blob URLs | autoRevoke, defaults, and resolutions
At the recent TPAC for Working Groups held in San Jose, Adrian Bateman, Jonas Sicking and I spent some time taking a look at how to remedy what the spec. says today about Blob URLs, both from the perspective of default behavior and in terms of what correct autoRevoke behavior should be. This email is to summarize those discussions.

Blob URLs are used in different parts of the platform today, and are expected to work on the platform wherever URLs do. This includes CSS, MediaStream and MediaSource use cases [1], along with use of 'src='.

(Separate discussions about a v2 of the File API spec, including use of a Futures-based model in lieu of the event model, took place, but submitting a LCWD with major interoperability amongst all browsers is a good goal for this draft.)

Here's a summary of the Blob URL issues:

1. There's the relatively easy question of defaults. While the spec says that URL.createObjectURL should create a Blob URL which has autoRevoke: true by default [2], there isn't any implementation that supports this, whether that's IE's oneTimeOnly behavior (which is related but different), or Firefox's autoRevoke implementation. Chrome doesn't touch this yet :)

The spec. will roll back the default from true to false. At least this matches what implementations do; there's been resistance to changing the default due to shipping applications relying on autoRevoke being false by default, or at least implementor reluctance [1].

Switching the default to false would enable IE, Chrome, and Firefox to have interoperability with URL.createObjectURL(blobArg), though such a default places burdens on web developers to couple create* calls with revoke* calls to not leak Blobs. Jonas proposes a separate method, URL.createAutoRevokeObjectURL, which creates an autoRevoke URL. I'm lukewarm on that :-\

2. Regardless of the default, there's the hard question of what to do with Blob URL revocation. Glenn / zewt points out that this applies, though perhaps less dramatically, to *manually* revoked Blob URLs, and provides some test cases [3].

Options are:

2a. To meticulously special-case Blob URLs, per Bug 17765 [4]. This calls for a synchronous step attached to wherever URLs are used to peg Blob URL data at fetch, so that the chance of a concurrent revocation doesn't cause things to behave unpredictably. Firefox does a variation of this with keeping channels open, but solving this bug interoperably is going to be very hard, and has to be done in different places across the platform. And even within CSS. This is hard to move forward with.

2b. To adopt an 80-20 rule, and only specify what happens for some cases that seem common, but expressly disallow other cases. This might be a more muted version of Bug 17765, especially if it can't be done within fetch [5].

This could mean that the blob clause for basic fetch [5] only defines some cases where a synchronous fetch can be run (TBD) but expressly disallows others where synchronous fetching is not feasible. This would limit the use of Blob URLs pretty drastically, but might be the only solution. For instance, asynchronous calls accompanying embed, defer etc. might have to be expressly disallowed. It would be great if we do this in fetch [5] :-)

Essentially, this might be to do what Firefox does but document what dereference means [6], and be clear about what might break. Most implementors acknowledge that use of Blob URLs simply won't work in some cases (e.g. CSS cases, etc.). We should formalize that; it would involve listing what works explicitly. Anne?

2c. Re-use oneTimeOnly as in IE's behavior for autoRevoke (but call it autoRevoke). But we jettisoned this for race conditions e.g.

    // This is in IE only
    img2.src = URL.createObjectURL(fileBlob, {oneTimeOnly: true});

    // race now! then fail in IE only
    img1.src = img2.src;

will fail in IE with oneTimeOnly.
It appears to fail reliably, but again, dereference URL may not be interoperable here. This is probably not what we should do, but it was worth listing, since it carries the brute force of a shipping implementation, and shows how some % of the market has actively solved this problem :) 3. We can lift origin restrictions in v2 on Blob URL; currently, one shipping implementation (IE) actively relies on origin restrictions, but expressed willingness to phase this out. Most use cases needing Blob data across origins can be met without needing Blob URLs to not be origin restricted. Blob URLs must be unguessable for that to happen, and today, they aren't unguessable in some implementations. -- A* [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=19594 [2] http://dev.w3.org/2006/webapi/FileAPI/#creating-revoking [3] http://lists.w3.org/Archives/Public/public-webapps/2013AprJun/0294.html [4] https://www.w3.org/Bugs/Public/show_bug.cgi?id=17765 [5] http://fetch.spec.whatwg.org/#concept-fetch [6]
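The oneTimeOnly failure described in option 2c can be modelled in a few lines. This sketch assumes that each load dereferences the URL synchronously and in assignment order, which is exactly the part the thread says may not be interoperable; `oneTimeStore`, `createOneTimeUrl`, and `dereference` are hypothetical names, not IE's implementation.

```javascript
// Hypothetical model of one-time-only semantics: the first dereference
// consumes the URL, and every later dereference finds nothing.
const oneTimeStore = new Map();
let oneTimeId = 0;

function createOneTimeUrl(blob) {
  const url = `blob:model/${oneTimeId++}`;
  oneTimeStore.set(url, blob);
  return url;
}

function dereference(url) {
  if (!oneTimeStore.has(url)) return null; // already used or never created
  const blob = oneTimeStore.get(url);
  oneTimeStore.delete(url);                // revoke on first use
  return blob;
}

const oneTimeUrl = createOneTimeUrl("image-bytes");
const img2Load = dereference(oneTimeUrl); // models img2.src = <the URL>
const img1Load = dereference(oneTimeUrl); // models img1.src = img2.src
```

If the two dereferences were instead started asynchronously, which of them wins would depend on scheduling, and that is the race the message refers to.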
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 3:36 PM, Arun Ranganathan a...@mozilla.com wrote: At the recent TPAC for Working Groups held in San Jose, Adrian Bateman, Jonas Sicking and I spent some time taking a look at how to remedy what the spec. says today about Blob URLs, both from the perspective of default behavior and in terms of what correct autoRevoke behavior should be. This email is to summarize those discussions. Blob URLs are used in different parts of the platform today, and are expected to work on the platform wherever URLs do. This includes CSS, MediaStream and MediaSource use cases [1], along with use of 'src='. (Separate discussions about a v2 of the File API spec, including use of a Futures-based model in lieu of the event model, took place, but submitting a LCWD with major interoperability amongst all browsers is a good goal for this draft.) Here's a summary of the Blob URL issues: 1. There's the relatively easy question of defaults. While the spec says that URL.createObjectURL should create a Blob URL which has autoRevoke: true by default [2], there isn't any implementation that supports this, whether that's IE's oneTimeOnly behavior (which is related but different), or Firefox's autoRevoke implementation. Chrome doesn't touch this yet :) The spec. will roll back the default from true to false. At least this matches what implementations do; there's been resistance to changing the default due to shipping applications relying on autoRevoke being false by default, or at least implementor reluctance [1]. Sounds good. Let's just be consistent. Switching the default to false would enable IE, Chrome, and Firefox to have interoperability with URL.createObjectURL(blobArg), though such a default places burdens on web developers to couple create* calls with revoke* calls to not leak Blobs. Jonas proposes a separate method, URL.createAutoRevokeObjectURL, which creates an autoRevoke URL.
I'm lukewarm on that :-\ I'd support a new method with a different default, if we could figure out a reasonable thing for that new method to do. 2. Regardless of the default, there's the hard question of what to do with Blob URL revocation. Glenn / zewt points out that this applies, though perhaps less dramatically, to *manually* revoked Blob URLs, and provides some test cases [3]. Options are: 2a. To meticulously special-case Blob URLs, per Bug 17765 [4]. This calls for a synchronous step attached to wherever URLs are used to peg Blob URL data at fetch, so that the chance of a concurrent revocation doesn't cause things to behave unpredictably. Firefox does a variation of this with keeping channels open, but solving this bug interoperably is going to be very hard, and has to be done in different places across the platform. And even within CSS. This is hard to move forward with. Hard. 2b. To adopt an 80-20 rule, and only specify what happens for some cases that seem common, but expressly disallow other cases. This might be a more muted version of Bug 17765, especially if it can't be done within fetch [5]. Ugly. This could mean that the blob clause for basic fetch [5] only defines some cases where a synchronous fetch can be run (TBD) but expressly disallows others where synchronous fetching is not feasible. This would limit the use of Blob URLs pretty drastically, but might be the only solution. For instance, asynchronous calls accompanying embed, defer etc. might have to be expressly disallowed. It would be great if we do this in fetch [5] :-) Just to be clear, this would limit the use of *autoRevoke* Blob URLs, not all Blob URLs, yes? Essentially, this might be to do what Firefox does but document what dereference means [6], and be clear about what might break. Most implementors acknowledge that use of Blob URLs simply won't work in some cases (e.g. CSS cases, etc.). We should formalize that; it would involve listing what works explicitly. Anne? 2c.
Re-use oneTimeOnly as in IE's behavior for autoRevoke (but call it autoRevoke). But we jettisoned this for race conditions e.g. // This is in IE only img2.src = URL.createObjectURL(fileBlob, {oneTimeOnly: true}); // race now! then fail in IE only img1.src = img2.src; will fail in IE with oneTimeOnly. It appears to fail reliably, but again, dereference URL may not be interoperable here. This is probably not what we should do, but it was worth listing, since it carries the brute force of a shipping implementation, and shows how some % of the market has actively solved this problem :) I'm not really sure this is so bad. I know it's the case I brought up, and I must admit that I disliked the oneTimeOnly when I first heard about it, but all other proposals [including not having automatic revocation at all] now seem worse. Here you've set something to be oneTimeOnly and used it twice; if that fails in IE, that's correct. If it works some of
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 4:25 PM, Eric U er...@google.com wrote: On Wed, May 1, 2013 at 3:36 PM, Arun Ranganathan a...@mozilla.com wrote: Switching the default to false would enable IE, Chrome, and Firefox to have interoperability with URL.createObjectURL(blobArg), though such a default places burdens on web developers to couple create* calls with revoke* calls to not leak Blobs. Jonas proposes a separate method, URL.createAutoRevokeObjectURL, which creates an autoRevoke URL. I'm lukewarm on that :-\ I'd support a new method with a different default, if we could figure out a reasonable thing for that new method to do. Yeah, the if-condition here is quite important. But if we can figure out this problem, then my proposal would be to add a new method which has a nicer name than createObjectURL so as to encourage authors to use that and have fewer leaks. 2. Regardless of the default, there's the hard question of what to do with Blob URL revocation. Glenn / zewt points out that this applies, though perhaps less dramatically, to *manually* revoked Blob URLs, and provides some test cases [3]. Options are: 2a. To meticulously special-case Blob URLs, per Bug 17765 [4]. This calls for a synchronous step attached to wherever URLs are used to peg Blob URL data at fetch, so that the chance of a concurrent revocation doesn't cause things to behave unpredictably. Firefox does a variation of this with keeping channels open, but solving this bug interoperably is going to be very hard, and has to be done in different places across the platform. And even within CSS. This is hard to move forward with. Hard. It actually has turned out to be surprisingly easy in Gecko. But I realize the same might not be true everywhere. 2b. To adopt an 80-20 rule, and only specify what happens for some cases that seem common, but expressly disallow other cases. This might be a more muted version of Bug 17765, especially if it can't be done within fetch [5]. Ugly.
This could mean that the blob clause for basic fetch[5] only defines some cases where a synchronous fetch can be run (TBD) but expressly disallows others where synchronous fetching is not feasible. This would limit the use of Blob URLs pretty drastically, but might be the only solution. For instance, asynchronous calls accompanying embed, defer etc. might have to be expressly disallowed. It would be great if we do this in fetch [5] :-) Just to be clear, this would limit the use of *autoRevoke* Blob URLs, not all Blob URLs, yes? No, it would limit the use of all *revokable* Blob URLs. Since you get exactly the same issues when the page calls revokeObjectURL manually. So that means that it applies to all Blob URLs. 2c. Re-use oneTimeOnly as in IE's behavior for autoRevoke (but call it autoRevoke). But we jettisoned this for race conditions e.g. // This is in IE only img2.src = URL.createObjectURL(fileBlob, {oneTimeOnly: true}); // race now! then fail in IE only img1.src = img2.src; will fail in IE with oneTimeOnly. It appears to fail reliably, but again, dereference URL may not be interoperable here. This is probably not what we should do, but it was worth listing, since it carries the brute force of a shipping implementation, and shows how some % of the market has actively solved this problem :) I'm not really sure this is so bad. I know it's the case I brought up, and I must admit that I disliked the oneTimeOnly when I first heard about it, but all other proposals [including not having automatic revocation at all] now seem worse. Here you've set something to be oneTimeOnly and used it twice; if that fails in IE, that's correct. If it works some of the time in other browsers [after they implement oneTimeOnly], that's not good, but you did pretty much aim at your own foot. Developers that actively try to do the right thing will have consistent good results without extra code, at least. 
I realize that img1.src = img2.src failing is odd, but as [IIRC] Adrian pointed out, if it's an uncacheable image on a server that's gone away, couldn't that already happen, depending on your network stack implementation? I'm more worried that if implementations don't initiate the load synchronously, which is hard per your comment above, then it can easily be random which of the two loads succeeds and which fails. If the revoking happens at the end of the load, both loads could even succeed depending on timing and implementation details. / Jonas
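To make the ordering concern above concrete, here is a minimal sketch of a one-time-only URL that is dereferenced by two loads. It is a toy model, not browser code: BlobStore, createURL, resolve and runLoads are hypothetical names invented for illustration, and the "loads" are plain synchronous lookups whose order stands in for whatever order an implementation happens to start them in.

```javascript
// Toy model only: none of these names are real browser APIs.
class BlobStore {
  constructor() {
    this.entries = new Map();
    this.nextId = 0;
  }

  // Register a blob and hand back a made-up blob URL for it.
  createURL(blob, { oneTimeOnly = false } = {}) {
    const url = `blob:model/${this.nextId++}`;
    this.entries.set(url, { blob, oneTimeOnly });
    return url;
  }

  // Dereference a URL. A oneTimeOnly entry is revoked by its first use.
  resolve(url) {
    const entry = this.entries.get(url);
    if (!entry) return null;
    if (entry.oneTimeOnly) this.entries.delete(url);
    return entry.blob;
  }
}

// Two images share one oneTimeOnly URL; `order` is the order in which
// the modelled implementation gets around to starting each load.
function runLoads(order) {
  const store = new BlobStore();
  const url = store.createURL("image bytes", { oneTimeOnly: true });
  const results = {};
  for (const name of order) {
    results[name] = store.resolve(url) !== null ? "loaded" : "failed";
  }
  return results;
}

const img2First = runLoads(["img2", "img1"]);
const img1First = runLoads(["img1", "img2"]);
console.log(img2First); // { img2: 'loaded', img1: 'failed' }
console.log(img1First); // { img1: 'loaded', img2: 'failed' }
```

In this model exactly one load wins, and which one depends only on start order. That is the unpredictability being described when loads are not initiated synchronously at the point where the URL is assigned.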
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 5:36 PM, Arun Ranganathan a...@mozilla.com wrote: 2a. To meticulously special-case Blob URLs, per Bug 17765 [4]. This calls for a synchronous step attached to wherever URLs are used to peg Blob URL data at fetch, so that the chance of a concurrent revocation doesn't cause things to behave unpredictably. Firefox does a variation of this with keeping channels open, but solving this bug interoperably is going to be very hard, and has to be done in different places across the platform. And even within CSS. This is hard to move forward with. 2b. To adopt an 80-20 rule, and only specify what happens for some cases that seem common, but expressly disallow other cases. This might be a more muted version of Bug 17765, especially if it can't be done within fetch [5]. I'm okay with limiting this in cases where it's particularly hard to define. In particular, it seems like placing a hook in CSS in any deterministic way is hard, at least today: from what I understand, the time CSS parsing happens is unspecified. However, we probably can't break non-autorevoke blob URLs with CSS. So, I'd propose:
- Start by defining that auto-revoke blob URLs may only be used with APIs that explicitly capture the blob (putting aside the mechanics of how we do that, for now). Blob capture would still affect non-autorevoke blob URLs, since it fixes race conditions, but an uncaptured blob URL would continue to work with non-autorevoke URLs.
- Apply blob capture to one or two test cases. I think XHR is a good place for this, because it's easy to test, due to the xhr.open() and xhr.send() split. xhr.open() is where blob capture should happen, and xhr.send() is where the fetch happens.
- Once people are comfortable with how it works, start applying it to other major blob URL cases (eg. img). Whether to apply it broadly to all APIs next or not is something that could be decided at this point.
This will make autorevoke blob URLs work, gradually fix manual-revoke blob URLs as a side-effect, and leave manual-revoke URLs unspecified but functional for the remaining cases. It also doesn't require us to dive in head-first and try to apply this to every API on the platform all at once, which nobody wants to do; it lets us test it out, then apply it to more APIs at whatever pace makes sense. (I don't know any way to deal with the CSS case.) 2c. Re-use oneTimeOnly as in IE's behavior for autoRevoke (but call it autoRevoke). But we jettisoned this for race conditions e.g.
  // This is in IE only
  img2.src = URL.createObjectURL(fileBlob, {oneTimeOnly: true});
  // race now! then fail in IE only
  img1.src = img2.src;
will fail in IE with oneTimeOnly. It appears to fail reliably, but again, dereference URL may not be interoperable here. This is probably not what we should do, but it was worth listing, since it carries the brute force of a shipping implementation, and shows how some % of the market has actively solved this problem :) There are a lot of problems with oneTimeOnly. It's very easy for the URL to never actually be used, which results in a subtle and expensive blob leak. For example, this:
  setInterval(function() {
    img.src = URL.createObjectURL(createBlob(), {oneTimeOnly: true});
  }, 100);
might leak 10 blobs per second, since a browser that obtains images on demand might not fetch the blob at all, while a browser that obtains images immediately wouldn't. -- Glenn Maynard
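The open()/send() split proposed above can be sketched with a small model. This is only an illustration under stated assumptions: BlobRegistry and ModelRequest are hypothetical stand-ins (not XMLHttpRequest or any real API), and the captureAtOpen flag exists purely to compare the two behaviors side by side.

```javascript
// Toy model only: BlobRegistry and ModelRequest are hypothetical names.
class BlobRegistry {
  constructor() {
    this.blobs = new Map();
    this.nextId = 0;
  }
  create(blob) {
    const url = `blob:model/${this.nextId++}`;
    this.blobs.set(url, blob);
    return url;
  }
  revoke(url) {
    this.blobs.delete(url);
  }
  lookup(url) {
    return this.blobs.has(url) ? this.blobs.get(url) : null;
  }
}

class ModelRequest {
  constructor(registry, { captureAtOpen }) {
    this.registry = registry;
    this.captureAtOpen = captureAtOpen;
    this.url = null;
    this.captured = null;
  }

  // open(): with capture enabled, grab the blob reference right here,
  // at the point where the URL first enters the API.
  open(url) {
    this.url = url;
    this.captured = this.captureAtOpen ? this.registry.lookup(url) : null;
  }

  // send(): the modelled fetch. It uses the captured blob if there is
  // one, otherwise it resolves the URL only now.
  send() {
    const blob = this.captureAtOpen
      ? this.captured
      : this.registry.lookup(this.url);
    return blob === null ? "network error" : blob;
  }
}

// Revocation lands between open() and send().
function openRevokeSend(captureAtOpen) {
  const registry = new BlobRegistry();
  const url = registry.create("payload");
  const request = new ModelRequest(registry, { captureAtOpen });
  request.open(url);
  registry.revoke(url);
  return request.send();
}

const withCapture = openRevokeSend(true);
const withoutCapture = openRevokeSend(false);
console.log(withCapture);    // payload
console.log(withoutCapture); // network error
```

With capture at open(), a later revocation no longer affects the request, which is what makes the behavior easy to test: the outcome depends only on whether the URL was valid when open() ran.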
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 4:53 PM, Jonas Sicking jo...@sicking.cc wrote: On Wed, May 1, 2013 at 4:25 PM, Eric U er...@google.com wrote: On Wed, May 1, 2013 at 3:36 PM, Arun Ranganathan a...@mozilla.com wrote: Switching the default to false would enable IE, Chrome, and Firefox to have interoperability with URL.createObjectURL(blobArg), though such a default places burdens on web developers to couple create* calls with revoke* calls to not leak Blobs. Jonas proposes a separate method, URL.createAutoRevokeObjectURL, which creates an autoRevoke URL. I'm lukewarm on that :-\ I'd support a new method with a different default, if we could figure out a reasonable thing for that new method to do. Yeah, the if-condition here is quite important. But if we can figure out this problem, then my proposal would be to add a new method which has a nicer name than createObjectURL so as to encourage authors to use that and have fewer leaks. Heh; I wasn't even going to mention the name. 2. Regardless of the default, there's the hard question of what to do with Blob URL revocation. Glenn / zewt points out that this applies, though perhaps less dramatically, to *manually* revoked Blob URLs, and provides some test cases [3]. Options are: 2a. To meticulously special-case Blob URLs, per Bug 17765 [4]. This calls for a synchronous step attached to wherever URLs are used to peg Blob URL data at fetch, so that the chance of a concurrent revocation doesn't cause things to behave unpredictably. Firefox does a variation of this with keeping channels open, but solving this bug interoperably is going to be very hard, and has to be done in different places across the platform. And even within CSS. This is hard to move forward with. Hard. It actually has turned out to be surprisingly easy in Gecko. But I realize the same might not be true everywhere. Right, and defining just when it happens, across browsers, may also be hard. 
2b. To adopt an 80-20 rule, and only specify what happens for some cases that seem common, but expressly disallow other cases. This might be a more muted version of Bug 17765, especially if it can't be done within fetch [5]. Ugly. This could mean that the blob clause for basic fetch [5] only defines some cases where a synchronous fetch can be run (TBD) but expressly disallows others where synchronous fetching is not feasible. This would limit the use of Blob URLs pretty drastically, but might be the only solution. For instance, asynchronous calls accompanying embed, defer etc. might have to be expressly disallowed. It would be great if we do this in fetch [5] :-) Just to be clear, this would limit the use of *autoRevoke* Blob URLs, not all Blob URLs, yes? No, it would limit the use of all *revokable* Blob URLs. Since you get exactly the same issues when the page calls revokeObjectURL manually. So that means that it applies to all Blob URLs. Ah, right; all revoked Blob URLs. 2c. Re-use oneTimeOnly as in IE's behavior for autoRevoke (but call it autoRevoke). But we jettisoned this for race conditions e.g.
  // This is in IE only
  img2.src = URL.createObjectURL(fileBlob, {oneTimeOnly: true});
  // race now! then fail in IE only
  img1.src = img2.src;
will fail in IE with oneTimeOnly. It appears to fail reliably, but again, dereference URL may not be interoperable here. This is probably not what we should do, but it was worth listing, since it carries the brute force of a shipping implementation, and shows how some % of the market has actively solved this problem :) I'm not really sure this is so bad. I know it's the case I brought up, and I must admit that I disliked the oneTimeOnly when I first heard about it, but all other proposals [including not having automatic revocation at all] now seem worse. Here you've set something to be oneTimeOnly and used it twice; if that fails in IE, that's correct. 
If it works some of the time in other browsers [after they implement oneTimeOnly], that's not good, but you did pretty much aim at your own foot. Developers that actively try to do the right thing will have consistent good results without extra code, at least. I realize that img1.src = img2.src failing is odd, but as [IIRC] Adrian pointed out, if it's an uncacheable image on a server that's gone away, couldn't that already happen, depending on your network stack implementation? I'm more worried that if implementations don't initiate the load synchronously, which is hard per your comment above, then it can easily be random which of the two loads succeeds and which fails. If the revoking happens at the end of the load, both loads could even succeed depending on timing and implementation details. Yup; I'm just saying that if you get a failure here, you shouldn't be surprised, no matter which img gets it. You did something explicitly wrong. Ideally we'd give predictable behavior, but if we can't do
Re: Blob URLs | autoRevoke, defaults, and resolutions
On Wed, May 1, 2013 at 7:01 PM, Eric U er...@google.com wrote: Hmm...now Glenn points out another problem: if you /never/ load the image, for whatever reason, you can still leak it. How likely is that in good code, though? And is it worse than the current state in good or bad code? I think it's much too easy for well-meaning developers to mess this up. The example I gave is code that *does* use the URL, but the browser may or may not actually do anything with it. (I wouldn't even call that author error--it's an interoperability failure.) Also, the failures are both expensive and subtle (eg. lots of big blobs being silently leaked to disk), which is a pretty nasty failure mode. Another problem is that APIs should be able to receive a URL, then use it multiple times. For example, srcset can change the image being displayed when the environment changes. oneTimeOnly would be weird in that case. For example, it would work when you load your page on a tablet, then work again when your browser outputs the display to a TV and changes the srcset image. (The image was never used, so the URL is still valid.) But then when you go back to the tablet screen and reconfigure back to the original configuration, it suddenly breaks, since the first URL was already used and discarded. The blob capture approach can be made to work with srcset, so this would work reliably. -- Glenn Maynard
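The tablet / TV / tablet sequence above can be walked through in a small model. This is a sketch under assumptions, not a description of how any browser implements srcset: OneTimeStore and both helper functions are hypothetical, and an "environment change" is modelled as simply picking a different candidate URL.

```javascript
// Toy model only: OneTimeStore and the two helpers are hypothetical.
class OneTimeStore {
  constructor() {
    this.live = new Map();
  }
  create(url, blob) {
    this.live.set(url, blob);
  }
  // First use hands back the blob and revokes the URL.
  use(url) {
    if (!this.live.has(url)) return null;
    const blob = this.live.get(url);
    this.live.delete(url);
    return blob;
  }
}

function makeStore() {
  const store = new OneTimeStore();
  store.create("blob:model/tablet", "tablet image");
  store.create("blob:model/tv", "tv image");
  return store;
}

// oneTimeOnly semantics: every environment change dereferences the URL again.
function oneTimeOnlySrcset(environments) {
  const store = makeStore();
  return environments.map((env) => {
    const blob = store.use(`blob:model/${env}`);
    return blob === null ? "broken" : blob;
  });
}

// Blob capture semantics: both blobs are captured once, when the srcset
// value is assigned; later environment changes reuse the captured blobs.
function capturedSrcset(environments) {
  const store = makeStore();
  const captured = {
    tablet: store.use("blob:model/tablet"),
    tv: store.use("blob:model/tv"),
  };
  return {
    shown: environments.map((env) => captured[env]),
    urlsStillLive: store.live.size,
  };
}

const sequence = ["tablet", "tv", "tablet"];
const oneTime = oneTimeOnlySrcset(sequence);
const captured = capturedSrcset(sequence);
console.log(oneTime);        // [ 'tablet image', 'tv image', 'broken' ]
console.log(captured.shown); // [ 'tablet image', 'tv image', 'tablet image' ]
```

In the capture variant both URLs are already released after assignment (urlsStillLive is 0 in this model), so a candidate that is never displayed does not keep its URL alive, while the element can still switch back and forth between images.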