Re: File API oneTimeOnly is too poorly defined
On Mon, Apr 9, 2012 at 3:52 PM, Feras Moussa fer...@microsoft.com wrote:

We agree that the spec text should be updated to more clearly define what dereference means. When we were trying to solve this problem, we looked for a simple and consistent way that a developer can understand what dereferencing is. What we came up with was the following definition: revoking should happen at the first time that any of the bits of the BLOB are accessed. This is a simple concept for a developer to understand, and is not complex to spec or implement. This also helps avoid having to explicitly spec out in the File API spec the various edge cases that different APIs exhibit – such as XHR open/send versus imgtag.src load versus css link href versus when a URL gets resolved or not. Instead those behaviors will continue to be documented in their respective spec. The definition above would imply that some cases, such as a cross-site-of-origin request to a Blob URL do not revoke, but we think that is OK since it implies a developer error. If we use the above definition for dereferencing, then in the XHR example you provided, xhr.send would be responsible for revoking the URL.

Depending on how you define accessing the bits, I think this has the risk of resulting in quite a few race conditions, both in a single implementation, as well as across implementations. For example, the following code:

    url = URL.createObjectURL(blob, { oneTimeOnly: true });
    myImg.src = url;
    setTimeout(function() { myOtherImg.src = url }, 0);

Assuming that the blob is backed by an OS file, this will start reading the bits from the blob as soon as the IO thread is free to read from the requested file. When that is depends on a lot of other things, such as what happens in other tabs, what other actions the page just did, how much of a backlog the IO thread currently has, etc. In Gecko things are even worse since blobs can be backed by either OS files, or by memory, or a combination thereof.
We plan to use various optimizations for determining which backing type to use. For example if you get a blob from a WebSocket connection, it might depend on how much data was downloaded before .binaryType was set to "blob" as well as how big the WebSocket frame is. So if what you have is a memory-backed blob then accessing the bits will likely happen sooner, making it more likely that the above code snippet will fail to load the second image. I would expect other browsers to use other strategies for what backing stores to use, introducing more uncertainty.

Like Glenn points out, basically all of the situations we are talking about are error conditions. The way you should use the url after using oneTimeOnly is to load from it exactly once. Anything else is an error for some definition of an error. But we all know that people use web APIs in ways we wish they didn't. Intentionally or accidentally.

What will happen for the following three code snippets in IE?

1.  url = URL.createObjectURL(blob, { oneTimeOnly: true });
    myImg.src = url;
    setTimeout(function() { myOtherImg.src = url }, 0);

2.  url = URL.createObjectURL(blob);
    myImg.src = url;
    setTimeout(function() { URL.revokeObjectURL(url) }, 0);

3.  url = URL.createObjectURL(blob);
    myImg.src = url;
    URL.revokeObjectURL(url);

/ Jonas
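To make the race in the first snippet concrete, here is a small model that runs outside a browser. It is only a sketch under stated assumptions, not how any engine works: the registry, startLoad, and the explicit ioQueue are hypothetical stand-ins for the blob-URL table, an image load, and the IO thread, and the model assumes the URL is looked up when src is assigned while revocation happens when the bits are first read. Under that rule, the fate of the second image depends only on whether the IO task runs before or after the timer callback.

```javascript
// Hypothetical model of "revoke the URL when the bits are first accessed".
function simulate(ioRunsBeforeTimer) {
  const registry = new Map([["blob:1", "image-bytes"]]);
  const ioQueue = [];   // stands in for the IO thread's backlog
  const results = [];   // outcome of each load

  // Assumption: the URL is resolved when src is assigned,
  // but the bits are only read later, on the IO thread.
  function startLoad(name, url) {
    if (!registry.has(url)) {
      results.push(`${name}:failed`);
      return;
    }
    ioQueue.push(() => {
      registry.delete(url);            // first access to the bits revokes the URL
      results.push(`${name}:loaded`);
    });
  }

  function drainIO() {
    while (ioQueue.length > 0) ioQueue.shift()();
  }

  startLoad("myImg", "blob:1");                            // myImg.src = url
  const timer = () => startLoad("myOtherImg", "blob:1");   // setTimeout(..., 0)

  if (ioRunsBeforeTimer) {
    drainIO();   // IO thread was idle: bits are read before the timer fires
    timer();
  } else {
    timer();     // IO thread was busy: the timer fires before any read
  }
  drainIO();
  return results;
}

const fastIO = simulate(true);
const slowIO = simulate(false);
console.log(fastIO, slowIO);
```

The same page code produces two different outcomes depending purely on scheduling, which is the interoperability risk described above.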
RE: File API oneTimeOnly is too poorly defined
We agree that the spec text should be updated to more clearly define what dereference means. When we were trying to solve this problem, we looked for a simple and consistent way that a developer can understand what dereferencing is. What we came up with was the following definition: revoking should happen at the first time that any of the bits of the BLOB are accessed. This is a simple concept for a developer to understand, and is not complex to spec or implement. This also helps avoid having to explicitly spec out in the File API spec the various edge cases that different APIs exhibit – such as XHR open/send versus imgtag.src load versus css link href versus when a URL gets resolved or not. Instead those behaviors will continue to be documented in their respective spec. The definition above would imply that some cases, such as a cross-site-of-origin request to a Blob URL do not revoke, but we think that is OK since it implies a developer error. If we use the above definition for dereferencing, then in the XHR example you provided, xhr.send would be responsible for revoking the URL. Thanks, Feras From: Charles Pritchard [mailto:ch...@jumis.com] Sent: Thursday, March 29, 2012 1:03 AM To: Glenn Maynard Cc: Jonas Sicking; public-webapps WG Subject: Re: File API oneTimeOnly is too poorly defined Any feedback on what exists in modern implementations? MS seems to have the most hard-line stance when talking about this API. When it comes to it, we ought to look at what happened in the latest harvest. IE10, O12, C19, and so forth. On Mar 28, 2012, at 6:12 PM, Glenn Maynard gl...@zewt.org wrote: On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote: This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature. This is an interesting idea for sure. 
It doesn't solve any of the issues I brought up, so we still need to define when dereferencing happens. But it does solve the problem of the URL leaking if it never gets dereferenced, which is nice. Right, that's what I meant above. The dereferencing step needs to be defined no matter what you do. This just makes it easier to define (eliminating task ordering problems as a source of problems). Also, I still think that all APIs should consistently do that as soon as they first see the URL. For example, XHR should do it in open(), not in send(). That makes it easy for developers to understand when the dereferencing actually happens (in the general case, for all APIs). One other thing: dereferencing should take a reference to the underlying data of the Blob, not the Blob itself, so it's unaffected by neutering (transfers and Blob.close). That avoids a whole category of problems.
Re: File API oneTimeOnly is too poorly defined
On Mon, Apr 9, 2012 at 5:52 PM, Feras Moussa fer...@microsoft.com wrote: We agree that the spec text should be updated to more clearly define what dereference means. When we were trying to solve this problem, we looked for a simple and consistent way that a developer can understand what dereferencing is. What we came up with was the following definition: revoking should happen at the first time that any of the bits of the BLOB are accessed. No, this is hard to understand, because with many APIs it's not at all obvious when the actual access happens. It's also error-prone, as was explained before. This is a simple concept for a developer to understand, and is not complex to spec or implement. This also helps avoid having to explicitly spec out in the File API spec the various edge cases that different APIs exhibit – such as XHR open/send versus imgtag.src load versus css link href versus when a URL gets resolved or not. Instead those behaviors will continue to be documented in their respective spec. Neither the current proposal (release at stable state) nor the previous (release at first entry point) would require File API to specify cases for other APIs. Your proposal is essentially impossible to spec in an interoperable way, because the point you propose that URLs should be released--at fetch time--often happens in a queued task, and different APIs (and sometimes even the same API, eg. XHR2) perform fetches in different event queues. Some fetches happen asynchronously (eg. HTMLImageElement's update the image data), in which case the ordering is even weaker. The definition above would imply that some cases, such as a cross-site-of-origin request to a Blob URL do not revoke, but we think that is OK since it implies a developer error. Everything that can result in a blob not being revoked is a developer error. The entire point of this API is to eliminate these error-prone cases. 
If we use the above definition for dereferencing, then in the XHR example you provided, xhr.send would be responsible for revoking the URL. The current proposal, which there seems to be interest in, is to have the object URL be revoked at the next stable state. This guarantees that the URL is always revoked. It gives consistent behavior regardless of which combinations of APIs are used with it. APIs that begin fetches synchronously would not need any modification at all, and APIs that begin fetches asynchronously would essentially need to take a reference synchronously (which should be fairly simple, and would be required for any interoperable form of this API). This model is simpler, easier to understand, easier to use, and easier to spec; and makes it impossible for programming errors to cause leaked Blobs (which is the very purpose of this API). -- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com If I understand you, you find it problematic that by using weak ref, URL would for some time reference actual Blob and other time it would not? The problem is that the following code might or might not work, depending on the behavior of the browser's GC: url = createObjectURL(blob); blob = null; setTimeout(function() { img.src = url; }, 0); If the timer executes before GC collects the blob, this works, because the URL is still valid. Otherwise, it fails, because--since the Blob no longer exists--the URL is no longer valid. -- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On 30.3.2012 15:21, Glenn Maynard wrote:

2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com

If I understand you, you find it problematic that by using weak ref, URL would for some time reference actual Blob and other time it would not?

The problem is that the following code might or might not work, depending on the behavior of the browser's GC:

    url = createObjectURL(blob);
    blob = null;
    setTimeout(function() { img.src = url; }, 0);

If the timer executes before GC collects the blob, this works, because the URL is still valid. Otherwise, it fails, because--since the Blob no longer exists--the URL is no longer valid.

-- Glenn Maynard

That should not be problematic; yes, GC may actually free the memory allocated for that blob later than the timeout function triggers, but there is an explicit release of that blob (blob = null), so this must fail (the memory might still be held by the blob data, but that variable should be gone). But the following may cause the same issue:

    var url = createObjectURL(new Blob(['hello']));
    setTimeout(function() { img.src = url; }, 0);

and even this, in some aggressive GC implementation:

    img.src = createObjectURL(new Blob(['hello']));
    // Since nothing references this blob, it could be destroyed right after the function ends, but before the result is assigned.
    // Though GC most likely runs in some application idle state so as not to delay the application, theoretically this
    // applies as well, since the behavior of GC is not specified, i.e. it is implementation-specific.

Though I could live with such an issue, I see the problem (from other programmers' perspective and from the specification's perspective)... there's no light at the end of this tunnel... Either carefully treat weak refs, or have one-time URLs with dereferencing and concurrency issues, or an explicit URL release one might forget.

Brona
Re: File API oneTimeOnly is too poorly defined
2012/3/30 Bronislav Klučka bronislav.klu...@bauglir.com url = createObjectURL(blob); blob = null; setTimeout(function() { img.src = url; }, 0); That should not be problematic; yes, GC may actually free that blob allocated memory later than that timeout function triggers, but there is explicit release of that blob (blob = null), so this must fail (memory might be allocated by blob data, but that variable should be out). But following may cause the same issue blob = null is not an explicit release of anything. It's just clearing a reference. The only way to know that the blob is no longer referenced is to wait for GC--that's what garbage collection *is*. It's impossible to guarantee that the img.src = url above will fail (without placing severe constraints on how GC can be implemented, which won't happen). -- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On Wed, Mar 28, 2012 at 5:49 PM, Jonas Sicking jo...@sicking.cc wrote:

On Wed, Mar 28, 2012 at 4:36 PM, Glenn Maynard gl...@zewt.org wrote:

Here's another proposal, which is an iteration of the previous. It's based on the microtask concept, which is creeping up here and there but hasn't yet been properly defined. The idea is that microtasks can be queued (call it "queue a microtask"), and the microtask queue is executed by the event loop as soon as the current task completes, so it executes as soon as the outermost task returns to the event loop.

oneTimeOnly (a poor name in this proposal) would simply queue a microtask to revoke the URL.

This is simpler, and answers a lot of questions. It means you can use the URL as many times as you want synchronously, since it's not released until the script returns. Any cases where the ordering may not be strictly defined (eg. the <video><video> case in http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may be like this; I don't know how innerHTML works, exactly) are now defined: both video elements would get the object.

It has another nice side-effect: it's much less prone to leaks. For example, under previous approaches, the following code would leak the blob:

    function updateProgressMeter() { throw "obscure error"; }
    url = URL.createObjectURL(blob, {oneTimeOnly: true});
    updateProgressMeter();
    img.src = url; // never happens

Since the URL is never actually used, the blob reference leaks. You'd have to work around this with careful exception handling, which is precisely the sort of thing oneTimeOnly is supposed to avoid. With this proposal, the URL would always be revoked when the script returns to the event loop, whether or not it was actually used.

This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature.
oneTimeOnly would be the wrong name with this approach; it should be something like autoRelease. This has one drawback: it doesn't work nicely in long-running Workers, which may never return to the event loop at all. I think that's probably an acceptable tradeoff. This is an interesting idea for sure. It doesn't solve any of the issues I brought up, so we still need to define when dereferencing happens. But it does solve the problem of the URL leaking if it never gets dereferenced, which is nice. I've never been terribly happy with createObjectURL and the requirement for folks to remember to call revokeObjectURL. I really like that we're talking about ways to minimize this pain :-) I noticed the WeakRefs proposal: http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs It also makes use of the micro-task concept, and it does so to avoid revealing details about garbage collection. What if we were to adopt a similar approach here. Instead of introducing a second parameter to createObjectURL, what if createObjectURL returned a WeakObjectURL object instead of a String object? WeakObjectURL could be converted to a String to reveal the Blob URL. Suppose WeakObjectURL if retained would keep the Blob URL valid. Else, when WeakObjectURL gets deleted, its Blob URL would remain alive up until the next micro-task. Crazy idea? Too crazy? I agree that it is valuable to define dereference points for APIs that receive Blob URLs. -Darin
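The leak scenario quoted in this message can be modeled without a browser. The sketch below is illustrative only: UrlRegistry, runTask, use, and the autoRelease option are hypothetical names, and runTask merely approximates "the script returns to the event loop" by revoking pending URLs once the task function exits, even if it throws. It is not a model of microtasks or stable states as the HTML spec defines them.

```javascript
// Hypothetical registry comparing "revoke on first use" (oneTimeOnly)
// with "revoke when the current task ends" (autoRelease).
class UrlRegistry {
  constructor() {
    this.urls = new Map();   // url -> { blob, oneTimeOnly }
    this.pending = [];       // autoRelease URLs created during the current task
    this.nextId = 0;
  }

  createObjectURL(blob, options = {}) {
    const url = `blob:${this.nextId++}`;
    this.urls.set(url, { blob, oneTimeOnly: options.oneTimeOnly === true });
    if (options.autoRelease === true) this.pending.push(url);
    return url;
  }

  // Models an API such as the img.src setter consuming the URL.
  use(url) {
    const entry = this.urls.get(url);
    if (entry === undefined) return undefined;
    if (entry.oneTimeOnly) this.urls.delete(url);
    return entry.blob;
  }

  // Runs one task, then revokes every autoRelease URL it created,
  // whether the task finished normally or threw.
  runTask(task) {
    try {
      task();
    } catch (err) {
      // A real event loop would report the error; the sketch ignores it.
    } finally {
      for (const url of this.pending) this.urls.delete(url);
      this.pending = [];
    }
  }
}

function updateProgressMeter() {
  throw new Error("obscure error");
}

// oneTimeOnly: the exception prevents the only use, so the entry leaks.
const oneTime = new UrlRegistry();
oneTime.runTask(() => {
  const url = oneTime.createObjectURL("blob A", { oneTimeOnly: true });
  updateProgressMeter();
  oneTime.use(url);   // never reached
});

// autoRelease: usable repeatedly within the task, always revoked afterwards.
const auto = new UrlRegistry();
const seen = [];
auto.runTask(() => {
  const url = auto.createObjectURL("blob B", { autoRelease: true });
  seen.push(auto.use(url), auto.use(url));
  updateProgressMeter();
});

console.log(oneTime.urls.size, auto.urls.size, seen);
```

In this model the oneTimeOnly registry still holds its entry after the failed task, while the autoRelease registry is empty even though the task threw.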
Re: File API oneTimeOnly is too poorly defined
On 29.3.2012 9:29, Darin Fisher wrote:

I've never been terribly happy with createObjectURL and the requirement for folks to remember to call revokeObjectURL. I really like that we're talking about ways to minimize this pain :-) I noticed the WeakRefs proposal: http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs It also makes use of the micro-task concept, and it does so to avoid revealing details about garbage collection. What if we were to adopt a similar approach here. Instead of introducing a second parameter to createObjectURL, what if createObjectURL returned a WeakObjectURL object instead of a String object? WeakObjectURL could be converted to a String to reveal the Blob URL. Suppose WeakObjectURL if retained would keep the Blob URL valid. Else, when WeakObjectURL gets deleted, its Blob URL would remain alive up until the next micro-task. Crazy idea? Too crazy? I agree that it is valuable to define dereference points for APIs that receive Blob URLs. -Darin

So the WeakObjectURL would exist as long as the Blob exists?

    // So this function will automatically GC the Blob and the URL object, and the url string itself
    // would be a reference to 'nothing' (since the blob does not exist anymore)?
    function fileInputOnChange() {
      var blob = this.files[0];
      var url = blob.getWeakObjectURL();
      document.getElementById('firstImage').src = url;
      this.parentNode.removeChild(this);
    }

    // So this function will not automatically GC the Blob nor the URL object, and the url string
    // will still be pointing to that Blob?
    var blob = MyGlobalFiles[0];
    loadBlob(blob);
    function loadBlob(blob) {
      var url = blob.getWeakObjectURL();
      document.getElementById('firstImage').src = url;
    }

    // And when I remove any reference to that blob, all will be GC'd and invalidated?
    MyGlobalFiles.shift();

That sounds interesting and probably like one of the best ideas in this long discussion.

Brona
Re: File API oneTimeOnly is too poorly defined
Any feedback on what exists in modern implementations? MS seems to have the most hard-line stance when talking about this API. When it comes to it, we ought to look at what happened in the latest harvest. IE10, O12, C19, and so forth.

On Mar 28, 2012, at 6:12 PM, Glenn Maynard gl...@zewt.org wrote:

On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote:

This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature.

This is an interesting idea for sure. It doesn't solve any of the issues I brought up, so we still need to define when dereferencing happens. But it does solve the problem of the URL leaking if it never gets dereferenced, which is nice.

Right, that's what I meant above. The dereferencing step needs to be defined no matter what you do. This just makes it easier to define (eliminating task ordering problems as a source of problems). Also, I still think that all APIs should consistently do that as soon as they first see the URL. For example, XHR should do it in open(), not in send(). That makes it easy for developers to understand when the dereferencing actually happens (in the general case, for all APIs). One other thing: dereferencing should take a reference to the underlying data of the Blob, not the Blob itself, so it's unaffected by neutering (transfers and Blob.close). That avoids a whole category of problems.

-- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On Thu, Mar 29, 2012 at 12:36 PM, Glenn Maynard gl...@zewt.org wrote:

oneTimeOnly (a poor name in this proposal) would simply queue a microtask to revoke the URL. This is simpler, and answers a lot of questions. It means you can use the URL as many times as you want synchronously, since it's not released until the script returns. Any cases where the ordering may not be strictly defined (eg. the <video><video> case in http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may be like this; I don't know how innerHTML works, exactly) are now defined: both video elements would get the object.

That sounds like a pretty good idea. It might be a bit better to revoke the URL at the next stable state. Microtask checkpoints can happen during synchronous script execution.

Rob
Re: File API oneTimeOnly is too poorly defined
On Thu, Mar 29, 2012 at 2:29 AM, Darin Fisher da...@chromium.org wrote:

I've never been terribly happy with createObjectURL and the requirement for folks to remember to call revokeObjectURL. I really like that we're talking about ways to minimize this pain :-) I noticed the WeakRefs proposal: http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs

This exposes GC behavior, though. They try to reduce that by making it only collectable during the event loop, but it's still observable. For example,

    url = createObjectURL(blob);
    blob = null;
    setTimeout(function() { img.src = url; }, 0);
    return;

might or might not succeed. (IIRC, WeakMaps avoid GC exposure by being non-enumerable. That seems to make them not very useful, since it's hardly different from simply assigning properties to the object.)

On Thu, Mar 29, 2012 at 6:08 AM, Robert O'Callahan rob...@ocallahan.org wrote:

It might be a bit better to revoke the URL at the next stable state. Microtask checkpoints can happen during synchronous script execution.

I'm confused. I'd never seen microtasks actually defined (only suggested), but I see it's just not defined in the page I was viewing. It looks like Google's returning out of date spec links (http://www.whatwg.org/specs/web-apps/current-work/.w3c-html-core/webappapis.html#processing-model-2). This is the second time I've been bit recently by landing on a weird, out-of-date spec URL (ignoring the /TR/ trap, which I know to watch out for)...

So, microtask isn't what we need here (and not what IndexedDB needs, either). Stable state might be correct.

I think that has another effect, which you don't get by waiting for the event loop: blob URLs would also be freed between script execution. That is, in

    <script>url = URL.createObjectURL(blob); ...</script><script>...</script>

the blob is released before the second script is run. That seems like a plus.

getObjectURL(blob, {auto: true}) would simply do:

N+1. Return the generated url, and then continue running this algorithm asynchronously.
N+2. Await a stable state.
N+3. Revoke url.

which doesn't require any new concepts. I'd suggest "auto" as the option name; it's short, since it'd probably be used a lot.

On Thu, Mar 29, 2012 at 3:03 AM, Charles Pritchard ch...@jumis.com wrote:

Any feedback on what exists in modern implementations? MS seems to have the most hard-line stance when talking about this API.

As far as I could tell, MS implemented something behind closed doors, presented it whole, and then more or less refused to change anything, despite the serious issues pointed out in it. Web API development can't work that way in 2012.

-- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On 30.3.2012 0:19, Glenn Maynard wrote:

On Thu, Mar 29, 2012 at 2:29 AM, Darin Fisher da...@chromium.org wrote:

I've never been terribly happy with createObjectURL and the requirement for folks to remember to call revokeObjectURL. I really like that we're talking about ways to minimize this pain :-) I noticed the WeakRefs proposal: http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs

This exposes GC behavior, though. They try to reduce that by making it only collectable during the event loop, but it's still observable. For example,

    url = createObjectURL(blob);
    blob = null;
    setTimeout(function() { img.src = url; }, 0);
    return;

might or might not succeed. (IIRC, WeakMaps avoid GC exposure by being non-enumerable. That seems to make them not very useful, since it's hardly different from simply assigning properties to the object.)

On Thu, Mar 29, 2012 at 6:08 AM, Robert O'Callahan rob...@ocallahan.org wrote:

It might be a bit better to revoke the URL at the next stable state. Microtask checkpoints can happen during synchronous script execution.

I'm confused. I'd never seen microtasks actually defined (only suggested), but I see it's just not defined in the page I was viewing. It looks like Google's returning out of date spec links (http://www.whatwg.org/specs/web-apps/current-work/.w3c-html-core/webappapis.html#processing-model-2). This is the second time I've been bit recently by landing on a weird, out-of-date spec URL (ignoring the /TR/ trap, which I know to watch out for)...

So, microtask isn't what we need here (and not what IndexedDB needs, either). Stable state might be correct.

I think that has another effect, which you don't get by waiting for the event loop: blob URLs would also be freed between script execution. That is, in

    <script>url = URL.createObjectURL(blob); ...</script><script>...</script>

the blob is released before the second script is run. That seems like a plus.
getObjectURL(blob, {auto: true}) would simply do: N+1. Return the generated url, and then continue running this algorithm asynchronously. N+2. Await a stable state. N+3. Revoke url. which doesn't require any new concepts. I'd suggest auto as the option name; it's short, since it'd probably be used a lot. On Thu, Mar 29, 2012 at 3:03 AM, Charles Pritchard ch...@jumis.com mailto:ch...@jumis.com wrote: Any feedback on what exists in modern implementations? MS seems to have the most hard-line stance when talking about this API. As far as I could tell, MS implemented something behind closed doors, presented it whole, and then more or less refused to change anything, despite the serious issues pointed out in it. Web API development can't work that way in 2012. -- Glenn Maynard Sure, weak referencing is probably not well explored approach, but the underlying idea applied to blob is interesting: URL creates no reference to Blob (from GC perspective), meaning Blob is subjected to GC regardless of BlobUrl existence. This would remove the need for revoking URL, programmers would only need to maintain blobs they want to be persistent (e.g. in some global array). This seem to solve nothing, because there is still some revoking/releasing variable, but the approach is reverse (explicit keeping of reference, instead of explicit releasing), which could seriously limit the I forgot to release this cases. And what you are working with is the actual Blob, not some String thing. Brona
Re: File API oneTimeOnly is too poorly defined
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com Sure, weak referencing is probably not well explored approach, but the underlying idea applied to blob is interesting: URL creates no reference to Blob (from GC perspective), meaning Blob is subjected to GC regardless of BlobUrl existence. This would remove the need for revoking URL, programmers would only need to maintain blobs they want to be persistent (e.g. in some global array). Weak referencing is pretty well explored, I think. It's intentionally not supported for the most part in JavaScript, because most weakref features expose garbage collection behavior to scripts. Web APIs don't do that. This approach exposes GC behavior, making it possible to write code that behaves differently depending on GC. -- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On 30.3.2012 5:40, Glenn Maynard wrote: 2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com mailto:bronislav.klu...@bauglir.com Sure, weak referencing is probably not well explored approach, but the underlying idea applied to blob is interesting: URL creates no reference to Blob (from GC perspective), meaning Blob is subjected to GC regardless of BlobUrl existence. This would remove the need for revoking URL, programmers would only need to maintain blobs they want to be persistent (e.g. in some global array). Weak referencing is pretty well explored, I think. It's intentionally not supported for the most part in JavaScript, because most weakref features expose garbage collection behavior to scripts. Web APIs don't do that. This approach exposes GC behavior, making it possible to write code that behaves differently depending on GC. -- Glenn Maynard The point was not to talk about weak refs, but about not creating a GC reference from URL to Blob Brona
Re: File API oneTimeOnly is too poorly defined
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com The point was not to talk about weak refs, but about not creating a GC reference from URL to Blob If the lifetime of the URL is tied to the lifetime of the Blob, then that's what a weak reference *is*. -- Glenn Maynard
Re: File API oneTimeOnly is too poorly defined
On 30.3.2012 5:54, Glenn Maynard wrote: 2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com mailto:bronislav.klu...@bauglir.com The point was not to talk about weak refs, but about not creating a GC reference from URL to Blob If the lifetime of the URL is tied to the lifetime of the Blob, then that's what a weak reference *is*. -- Glenn Maynard If I understand you, you find it problematic that by using weak ref, URL would for some time reference actual Blob and other time it would not? Brona Klucka
Re: File API oneTimeOnly is too poorly defined
On Tue, Mar 27, 2012 at 4:59 PM, Glenn Maynard gl...@zewt.org wrote: I didn't realize this was actually added to the spec: The optional options dictionary argument contains a key, oneTimeOnly that defaults to false. If set to true, then the first time the Blob URI is dereferenced, user agents MUST automatically revoke that Blob URI without needing a call to revokeObjectURL() on the Blob URI. What does dereferenced mean? Where is it defined? What happens if two XHR calls open() a blob URL one after the other (causing fetches to be queued for it in separate task queues, whose order of execution is undefined)? What happens if two completely unrelated APIs queue tasks in different task queues (causing the same problem, but in a way that can't be worked around within any one spec)? This feature is dangerously weakly defined. It should be removed from the spec until it can be defined properly (or at least marked not ready for implementations), or we may end up with interop failures that could be hard to fix later. Again, I'm pretty sure the sanest way to approach this feature is for any API supporting it to grab a reference to the underlying resource, and revoke the URL, as soon as the string enters that API (eg. xhr.open() is called, or img.src is assigned). That ensures it's always deterministically--and synchronously--clear who will actually successfully receive the object, regardless of later complications like separate task queues across APIs. It doesn't answer all questions (eg. the issues mentioned at http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html), and the actual dereferencing action would need to be specified for every supported API (this would need work to make it easy to do), but it's a lot closer than what's in there now. I think we need to define that APIs like xhr.open(...) and the img.src setter synchronously dereference the URL before returning. This is needed even if we didn't have oneTimeOnly for at least two reasons: 1. 
    var blob = getBlob();
    var url = URL.createObjectURL(blob);
    img.src = url;
    URL.revokeObjectURL(url);

2.
    var fileEntry = getFileEntry();
    fileEntry.file(function(file) {
      fileEntry.createWriter(function(fileWriter) {
        var url = URL.createObjectURL(file);
        var xhr = new XMLHttpRequest();
        xhr.open("GET", url);
        xhr.send();
        xhr.onload = ...;
        fileWriter.write(new Blob(["hello"]));
      });
    });

In the first example the blob-url is disabled synchronously after the img.src is set. Unless it's defined when img.src dereferences the blob-url, it's undefined whether the first example works.

In the second example the file object itself is disabled when the fileWriter.write function is called. The blob-url which represents it is logically also disabled at the same time. If it's not defined when the XHR object dereferences the blob-url then it's undefined whether the second example works.

In fact, this problem isn't even blob-url specific. If you change the second example to not use blob-urls, but rather read from 'file' using a FileReader, you'll have exactly the same question of whether starting to read the Blob happens before the Blob is disabled, or after.

Generally speaking, in order to be able to precisely define when these URLs or Blobs are dereferenced we likely need to define that it happens synchronously from the various APIs that dereference URLs and Blobs. It so happens that dereferencing synchronously also is the most useful behavior for authors.

Note that no actual IO needs to happen just because you dereference the URL. So no synchronous IO is required.

We took a survey of the various points in the Gecko codebase to see if we dereference URLs and Blobs synchronously or not. The only API we found that didn't do so was the IndexedDB code for storing Blobs.

All of this will definitely be a lot of work to specify (and possibly implement). But I don't see any other options to get interoperability with Blobs and blob-URLs. It's definitely not a problem restricted to oneTimeOnly.

/ Jonas
Re: File API oneTimeOnly is too poorly defined
On Wed, 28 Mar 2012 08:19:55 +0100, Jonas Sicking jo...@sicking.cc wrote: I think we need to define that APIs like xhr.open(...) and the img.src setter synchronously dereference the URL before returning. What does dereferencing mean exactly? xhr.open() resolves URLs currently and then xhr.send() will fetch the URL. -- Anne van Kesteren http://annevankesteren.nl/
Re: File API oneTimeOnly is too poorly defined
On Wed, Mar 28, 2012 at 2:17 AM, Anne van Kesteren ann...@opera.com wrote: On Wed, 28 Mar 2012 08:19:55 +0100, Jonas Sicking jo...@sicking.cc wrote: I think we need to define that APIs like xhr.open(...) and the img.src setter synchronously dereference the URL before returning. What does dereferencing mean exactly? It means initiating the load or some such. Implementation-wise for blob URLs it would likely mean going through the blob-url hash table to find the underlying Blob object and start a read from it. So if the URL is removed from the hash using revokeObjectURL this wouldn't affect the load. Likewise for the FileSystem API it would mean that a read from the blob has started and so any writes need to be queued until after the read is finished. xhr.open() resolves URLs currently and then xhr.send() will fetch the URL. Yup, this is the stuff that needs to be defined. In Gecko we actually dereference the URL in xhr.open which means that the caller can call revokeObjectURL after the call to xhr.open but before xhr.send. But it's certainly possible to change this so that the URL is dereferenced in xhr.send instead. / Jonas
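To make the hash-table description above concrete, here is a minimal sketch in plain JavaScript. Everything in it is hypothetical: BlobUrlRegistry, FakeRequest and the blob:fake/ URL scheme are illustration names, not Gecko internals or the real XMLHttpRequest API, and blob contents are modelled as plain strings. It only aims to show that once open() has looked up the entry, removing the URL from the table does not affect that load, and that the lookup itself involves no IO.

```javascript
// Hypothetical model of a blob-URL table; not a real browser API.
class BlobUrlRegistry {
  constructor() {
    this.table = new Map(); // URL string -> underlying data
    this.nextId = 0;
  }
  createObjectURL(data) {
    const url = "blob:fake/" + this.nextId++;
    this.table.set(url, data);
    return url;
  }
  revokeObjectURL(url) {
    this.table.delete(url);
  }
  // Dereference: look the URL up and hand back the data it names.
  // No IO happens here; the caller merely ends up holding a reference.
  dereference(url) {
    return this.table.has(url) ? this.table.get(url) : null;
  }
}

// Hypothetical XHR-like consumer that dereferences in open(), which is
// what the message says Gecko does. send() only uses that reference.
class FakeRequest {
  constructor(registry) {
    this.registry = registry;
    this.source = null;
    this.response = null;
  }
  open(url) {
    this.source = this.registry.dereference(url); // synchronous lookup
  }
  send() {
    // The table is not consulted again, so a revoke between open()
    // and send() has no effect on this load.
    this.response = this.source;
  }
}

const registry = new BlobUrlRegistry();
const url = registry.createObjectURL("hello");

const req = new FakeRequest(registry);
req.open(url);
registry.revokeObjectURL(url); // revoked after open(), before send()
req.send();

const late = new FakeRequest(registry);
late.open(url); // the URL is already gone from the table
late.send();

console.log(req.response, late.response);
```

In this model the first request still receives the data and the second receives null. Moving the dereference() call from open() into send() would flip the first result, which is exactly the choice the thread says needs to be specified.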
Re: File API oneTimeOnly is too poorly defined
On Wed, 28 Mar 2012 10:51:59 +0200, Jonas Sicking jo...@sicking.cc wrote: On Wed, Mar 28, 2012 at 2:17 AM, Anne van Kesteren ann...@opera.com wrote: What does dereferencing mean exactly? It means initiating the load or some such. Implementation-wise for blob URLs it would likely mean going through the blob-url hash table to find the underlying Blob object and start a read from it. So if the URL is removed from the hash using revokeObjectURL this wouldn't affect the load. Likewise for the FileSystem API it would mean that a read from the blob has started and so any writes need to be queued until after the read is finished. xhr.open() resolves URLs currently and then xhr.send() will fetch the URL. Yup, this is the stuff that needs to be defined. In Gecko we actually dereference the URL in xhr.open which means that the caller can call revokeObjectURL after the call to xhr.open but before xhr.send. But it's certainly possible to change this so that the URL is dereferenced in xhr.send instead. Given that the start of the fetch algorithm http://www.whatwg.org/specs/web-apps/current-work/multipage/fetching-resources.html#fetch is synchronous maybe dereferencing can be defined as part of that. The change to XMLHttpRequest that would be needed then is to invoke fetch before returning from send(). Not sure how well that would work for other contexts such as img and 'background-image' though. -- Anne van Kesteren http://annevankesteren.nl/
Re: File API oneTimeOnly is too poorly defined
On Wed, Mar 28, 2012 at 2:19 AM, Jonas Sicking jo...@sicking.cc wrote: All of this will definitely be a lot of work to specify (and possibly implement). But I don't see any other options to get interoperability with Blobs and blob-URLs. It's definitely not a problem restricted to oneTimeOnly. Those are separate problems. Other uses of blob URLs (without oneTimeOnly) don't have an undefined dereference concept to begin with; they just access the URL directly. They do have other problems, though. To take the first example: var blob = getBlob(); var url = URL.createObjectURL(blob); img.src = url; URL.revokeObjectURL(url); When you assign img.src, you cause the "update the image data" algorithm to be invoked. That algorithm goes asynchronous in step 5; it then accesses img.src asynchronously. This means there's a race condition, depending on whether the revokeObjectURL call happens before or after the asynchronous fetch. The same changes needed to fix oneTimeOnly would probably fix most of these sorts of problems too, though. -- Glenn Maynard
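The race described above can be illustrated with a toy model. This is a sketch under loud assumptions: the tasks array stands in for the event loop's task queue, LazyImage mimics an element whose update algorithm resolves the URL only after going asynchronous, and EagerImage mimics one that resolves it inside the setter. None of these names come from the HTML spec, and a real browser would not fix the ordering the way this model does.

```javascript
// Toy stand-ins; none of this is real DOM or File API code.
const urls = new Map(); // blob URL -> data
const tasks = [];       // simplified task queue

function lookup(url) {
  return urls.has(url) ? urls.get(url) : null;
}

function runTasks() {
  while (tasks.length > 0) {
    const task = tasks.shift();
    task();
  }
}

// Resolves the URL only when its queued task runs, i.e. after the
// setter has already returned to the calling script.
class LazyImage {
  constructor() { this.data = undefined; }
  set src(url) {
    tasks.push(() => { this.data = lookup(url); });
  }
}

// Resolves the URL synchronously inside the setter; only the delivery
// of the data is deferred.
class EagerImage {
  constructor() { this.data = undefined; }
  set src(url) {
    const ref = lookup(url); // reference taken now
    tasks.push(() => { this.data = ref; });
  }
}

function loadThenRevoke(img) {
  const url = "blob:fake/1";
  urls.set(url, "pixels");
  img.src = url;
  urls.delete(url); // plays the role of URL.revokeObjectURL(url)
  runTasks();
  return img.data;
}

const lazyResult = loadThenRevoke(new LazyImage());
const eagerResult = loadThenRevoke(new EagerImage());
console.log(lazyResult, eagerResult);
```

Here the lazy element always loses because the model runs the revoke before the queued task. In a real implementation the outcome would depend on scheduling, which is the interoperability problem the message points at; the eager element is unaffected either way.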
Re: File API oneTimeOnly is too poorly defined
Here's another proposal, which is an iteration of the previous. It's based on the microtask concept, which is creeping up here and there but hasn't yet been properly defined. The idea is that microtasks can be queued (call it "queue a microtask"), and the microtask queue is executed by the event loop as soon as the current task completes, so it executes as soon as the outermost task returns to the event loop. oneTimeOnly (a poor name in this proposal) would simply queue a microtask to revoke the URL. This is simpler, and answers a lot of questions. It means you can use the URL as many times as you want synchronously, since it's not released until the script returns. Any cases where the ordering may not be strictly defined (eg. the <video><video> case in http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may be like this; I don't know how innerHTML works, exactly) are now defined: both video elements would get the object. It has another nice side-effect: it's much less prone to leaks. For example, under previous approaches, the following code would leak the blob: function updateProgressMeter() { throw "obscure error"; } url = URL.createObjectURL(blob, {oneTimeOnly: true}); updateProgressMeter(); img.src = url; // never happens Since the URL is never actually used, the blob reference leaks. You'd have to work around this with careful exception handling, which is precisely the sort of thing oneTimeOnly is supposed to avoid. With this proposal, the URL would always be revoked when the script returns to the event loop, whether or not it was actually used. This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature. oneTimeOnly would be the wrong name with this approach; it should be something like autoRelease. This has one drawback: it doesn't work nicely in long-running Workers, which may never return to the event loop at all. 
I think that's probably an acceptable tradeoff. -- Glenn Maynard
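A rough sketch of the microtask idea, for illustration only. It assumes a hypothetical createObjectURL over a plain Map rather than the real URL.createObjectURL, uses the autoRelease option name floated in the proposal (not a shipped API), and relies on the standard queueMicrotask() global to stand in for "queue a microtask". The leak scenario from the message is reproduced with a function that throws before the URL ever reaches a consumer.

```javascript
// Hypothetical registry; `autoRelease` is only the name suggested in
// the proposal and does not exist on the real URL.createObjectURL.
const table = new Map();
let counter = 0;

function createObjectURL(data, options = {}) {
  const url = "blob:fake/" + counter++;
  table.set(url, data);
  if (options.autoRelease) {
    // Revoke once the currently running script has finished.
    queueMicrotask(() => table.delete(url));
  }
  return url;
}

function updateProgressMeter() {
  throw new Error("obscure error");
}

let url;
const seen = [];
try {
  url = createObjectURL("image bytes", { autoRelease: true });
  seen.push(table.get(url)); // usable any number of times synchronously
  seen.push(table.get(url));
  updateProgressMeter();     // throws before the URL reaches a consumer
} catch (e) {
  // Nothing to clean up here: the revoke has already been queued.
}

const liveBefore = table.has(url); // still registered in this task

queueMicrotask(() => {
  // Queued after the revoking microtask, so it runs after it.
  const liveAfter = table.has(url);
  console.log(liveBefore, liveAfter);
});
```

The URL stays valid for the whole synchronous run and is dropped afterwards even though it was never used, which is the leak-avoidance property described above. The sketch does not address the Worker drawback: a script that never returns to the event loop would never reach the queued revoke.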
Re: File API oneTimeOnly is too poorly defined
On Wed, Mar 28, 2012 at 4:36 PM, Glenn Maynard gl...@zewt.org wrote: Here's another proposal, which is an iteration of the previous. It's based on the microtask concept, which is creeping up here and there but hasn't yet been properly defined. The idea is that microtasks can be queued (call it "queue a microtask"), and the microtask queue is executed by the event loop as soon as the current task completes, so it executes as soon as the outermost task returns to the event loop. oneTimeOnly (a poor name in this proposal) would simply queue a microtask to revoke the URL. This is simpler, and answers a lot of questions. It means you can use the URL as many times as you want synchronously, since it's not released until the script returns. Any cases where the ordering may not be strictly defined (eg. the <video><video> case in http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may be like this; I don't know how innerHTML works, exactly) are now defined: both video elements would get the object. It has another nice side-effect: it's much less prone to leaks. For example, under previous approaches, the following code would leak the blob: function updateProgressMeter() { throw "obscure error"; } url = URL.createObjectURL(blob, {oneTimeOnly: true}); updateProgressMeter(); img.src = url; // never happens Since the URL is never actually used, the blob reference leaks. You'd have to work around this with careful exception handling, which is precisely the sort of thing oneTimeOnly is supposed to avoid. With this proposal, the URL would always be revoked when the script returns to the event loop, whether or not it was actually used. This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature. oneTimeOnly would be the wrong name with this approach; it should be something like autoRelease. 
This has one drawback: it doesn't work nicely in long-running Workers, which may never return to the event loop at all. I think that's probably an acceptable tradeoff. This is an interesting idea for sure. It doesn't solve any of the issues I brought up, so we still need to define when dereferencing happens. But it does solve the problem of the URL leaking if it never gets dereferenced, which is nice. / Jonas
Re: File API oneTimeOnly is too poorly defined
On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote: This would still require work in each URL-consuming spec, to define taking a reference to the underlying blob's data when it receives an object URL. I think this is inherent to the feature. This is an interesting idea for sure. It doesn't solve any of the issues I brought up, so we still need to define when dereferencing happens. But it does solve the problem of the URL leaking if it never gets dereferenced, which is nice. Right, that's what I meant above. The dereferencing step needs to be defined no matter what you do. This just makes it easier to define (eliminating task ordering as a source of problems). Also, I still think that all APIs should consistently do that as soon as they first see the URL. For example, XHR should do it in open(), not in send(). That makes it easy for developers to understand when the dereferencing actually happens (in the general case, for all APIs). One other thing: dereferencing should take a reference to the underlying data of the Blob, not the Blob itself, so it's unaffected by neutering (transfers and Blob.close). That avoids a whole category of problems. -- Glenn Maynard
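The last point, about referencing the underlying data rather than the Blob, can be sketched as follows. FakeBlob, its store field and both dereference helpers are hypothetical names made up for this illustration, and close() is only loosely modelled on the Blob.close idea mentioned above; this is not how any engine represents Blobs.

```javascript
// Hypothetical Blob-like handle over shared underlying data.
class FakeBlob {
  constructor(bytes) {
    this.store = { bytes }; // underlying data, held by reference
  }
  // Neuter the handle. The data object itself survives for anyone
  // who already holds a reference to it.
  close() {
    this.store = null;
  }
  read() {
    if (this.store === null) throw new Error("blob is neutered");
    return this.store.bytes;
  }
}

// Option A: the consumer keeps the Blob handle and reads through it later.
function dereferenceToBlob(blob) {
  return () => blob.read();
}

// Option B: the consumer captures the underlying data synchronously.
function dereferenceToData(blob) {
  const store = blob.store;
  return () => store.bytes;
}

const blob = new FakeBlob("payload");
const readViaBlob = dereferenceToBlob(blob);
const readViaData = dereferenceToData(blob);

blob.close(); // neutering happens after dereferencing, before the read

let viaBlob;
try {
  viaBlob = readViaBlob();
} catch (e) {
  viaBlob = e.message;
}
const viaData = readViaData();
console.log(viaBlob, viaData);
```

In this model the consumer that held on to the handle fails once the Blob is neutered, while the one that captured the data at dereference time still reads it.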