Re: File API "oneTimeOnly" is too poorly defined

2012-04-10 Thread Jonas Sicking
On Mon, Apr 9, 2012 at 3:52 PM, Feras Moussa  wrote:
> We agree that the spec text should be updated to more clearly define what 
> dereference means.
> When we were trying to solve this problem, we looked for a simple and 
> consistent way that a developer can understand what dereferencing is.
> What we came up with was the following definition: revoking should happen at 
> the first time that any of the bits of the BLOB are accessed.
>
> This is a simple concept for a developer to understand, and is not complex to 
> spec or implement. This also helps avoid having to explicitly spec
> out in the File API spec the various edge cases that different APIs exhibit – 
> such as XHR open/send versus imgtag.src load versus css link href
> versus when a URL gets resolved or not. Instead those behaviors will continue 
> to be documented in their respective spec.
>
> The definition above would imply that some cases, such as a 
> cross-site-of-origin request to a Blob URL do not revoke, but we think that 
> is OK
> since it implies a developer error. If we use the above definition for 
> dereferencing, then in the XHR example you provided, xhr.send would
> be responsible for revoking the URL.

Depending on how you define "accessing the bits" I think this has the
risk of resulting in quite a few race conditions, both in a single
implementation, as well as across implementations.

For example, the following code:

url = URL.createObjectURL(blob, { oneTimeOnly: true });
myImg.src = url;
setTimeout(function() { myOtherImg.src = url }, 0);

Assuming that the blob is backed by an OS file, this will start reading
the bits from the blob as soon as the IO thread is free to read from
the requested file. When that happens depends on a lot of other things,
such as what is happening in other tabs, what other actions the page
just performed, how much of a backlog the IO thread currently has, etc.

In Gecko things are even worse since blobs can be backed by either OS
files, or by memory, or a combination thereof. We plan to use various
optimizations for determining which backing type to use. For example
if you get a blob from a WebSocket connection, it might depend on how
much data was downloaded before .binaryType was set to "blob" as well
as how big the websocket frame is.

So if what you have is a memory backed blob then "accessing the bits"
will likely happen sooner making it more likely that the above code
snippet will fail to load the second image.

I would expect other browsers to use other strategies for what backing
stores to use, introducing more uncertainty.
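
To make the race concrete, here is a minimal sketch of the "revoke at
first access to the bits" rule. It is only a model under stated
assumptions: simulate(), setSrc() and the tick-based task list are
hypothetical names invented for illustration, not browser APIs, and the
single ioLatency number stands in for everything that decides when the
IO thread actually reads the data.

```javascript
// Hypothetical model (not a browser API): a one-time URL that is revoked
// the first time the blob's bits are actually read.
function simulate(ioLatency) {
  const registry = new Map([["blob:one-time", "image bytes"]]);
  const results = {};
  const tasks = []; // pending tasks, each { time, run }
  const schedule = (time, run) => tasks.push({ time, run });

  // Assigning .src resolves the URL immediately; the read happens later.
  function setSrc(name, url, now) {
    const data = registry.get(url);
    if (data === undefined) {
      results[name] = false; // URL already revoked, so the load fails
      return;
    }
    schedule(now + ioLatency, () => {
      registry.delete(url); // revoke at first access to the bits
      results[name] = true; // this load already holds the data
    });
  }

  setSrc("myImg", "blob:one-time", 0); // myImg.src = url
  schedule(1, () => setSrc("myOtherImg", "blob:one-time", 1)); // setTimeout(..., 0)

  // Run the pending tasks in time order until none are left.
  while (tasks.length > 0) {
    tasks.sort((a, b) => a.time - b.time);
    tasks.shift().run();
  }
  return results;
}

console.log(simulate(0)); // fast read: myImg loads, myOtherImg fails
console.log(simulate(5)); // slow read: both images load
```

The script is identical in both runs; only the simulated IO latency
differs, which is exactly the kind of dependency a page author cannot
control.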

Like Glenn points out, basically all of the situations we are talking
about are error conditions. The way you should use the url after using
oneTimeOnly is to load from it exactly once. Anything else is "an
error" for some definition of "an error". But we all know that people
use web APIs in ways we wish they didn't. Intentionally or
accidentally.


What will happen for the following three code snippets in IE?

1.
url = URL.createObjectURL(blob, { oneTimeOnly: true });
myImg.src = url;
setTimeout(function() { myOtherImg.src = url }, 0);

2.
url = URL.createObjectURL(blob);
myImg.src = url;
setTimeout(function() { URL.revokeObjectURL(url) }, 0);

3.
url = URL.createObjectURL(blob);
myImg.src = url;
URL.revokeObjectURL(url);

/ Jonas



Re: File API "oneTimeOnly" is too poorly defined

2012-04-09 Thread Glenn Maynard
On Mon, Apr 9, 2012 at 5:52 PM, Feras Moussa  wrote:

> We agree that the spec text should be updated to more clearly define what
> dereference means.
> When we were trying to solve this problem, we looked for a simple and
> consistent way that a developer can understand what dereferencing is.
> What we came up with was the following definition: revoking should happen
> at the first time that any of the bits of the BLOB are accessed.
>

No, this is hard to understand, because with many APIs it's not at all
obvious when the actual access happens.  It's also error-prone, as was
explained before.

> This is a simple concept for a developer to understand, and is not complex
> to spec or implement. This also helps avoid having to explicitly spec
> out in the File API spec the various edge cases that different APIs
> exhibit – such as XHR open/send versus imgtag.src load versus css link href
> versus when a URL gets resolved or not. Instead those behaviors will
> continue to be documented in their respective spec.
>

Neither the current proposal (release at stable state) nor the previous
(release at first entry point) would require File API to specify cases for
other APIs.

Your proposal is essentially impossible to spec in an interoperable way,
because the point you propose that URLs should be released--at fetch
time--often happens in a queued task, and different APIs (and sometimes
even the same API, eg. XHR2) perform fetches in different event queues.
Some fetches happen asynchronously (eg. HTMLImageElement's "update the
image data"), in which case the ordering is even weaker.

> The definition above would imply that some cases, such as a
> cross-site-of-origin request to a Blob URL do not revoke, but we think that
> is OK
> since it implies a developer error.


Everything that can result in a blob not being revoked is a developer
error.  The entire point of this API is to eliminate these error-prone
cases.

> If we use the above definition for dereferencing, then in the XHR example
> you provided, xhr.send would
> be responsible for revoking the URL.
>

The current proposal, which there seems to be interest in, is to have the
object URL be revoked at the next stable state.  This guarantees that the
URL is always revoked.  It gives consistent behavior regardless of which
combinations of APIs are used with it.  APIs that begin fetches
synchronously would not need any modification at all, and APIs that begin
fetches asynchronously would essentially need to take a reference
synchronously (which should be fairly simple, and would be required for any
interoperable form of this API).

This model is simpler, easier to understand, easier to use, and easier to
spec; and makes it impossible for programming errors to cause leaked Blobs
(which is the very purpose of this API).
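
As a rough illustration of the stable-state model, here is a small
sketch. Everything in it is an assumption made for the example:
createAutoRevokeURL(), reachStableState() and startLoad() are
hypothetical names, and the "stable state" is triggered by hand rather
than by a real event loop.

```javascript
// Hypothetical model of "revoke at the next stable state".
// None of these functions are real browser APIs.
const registry = new Map();       // url -> blob data
const pendingRevocations = [];    // URLs to revoke at the next stable state
let nextId = 0;

function createAutoRevokeURL(blob) {
  const url = "blob:auto-" + nextId++;
  registry.set(url, blob);
  pendingRevocations.push(url);   // revocation is guaranteed, used or not
  return url;
}

// Stands in for the event loop reaching a stable state after the
// current script has finished running.
function reachStableState() {
  for (const url of pendingRevocations.splice(0)) {
    registry.delete(url);
  }
}

// A URL consumer takes its reference synchronously, so a fetch that
// runs later does not depend on the URL still being registered.
function startLoad(url) {
  const data = registry.get(url);
  return { fetch: () => (data === undefined ? null : data) };
}

const url = createAutoRevokeURL("image bytes");
const first = startLoad(url);     // same script: sees the URL
reachStableState();               // script done: URL is revoked
const second = startLoad(url);    // later task: URL is gone

console.log(first.fetch());       // "image bytes"
console.log(second.fetch());      // null
```

The outcome depends only on whether the URL was handed to a consumer
before the script returned, not on when any fetch actually runs.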

-- 
Glenn Maynard


RE: File API "oneTimeOnly" is too poorly defined

2012-04-09 Thread Feras Moussa
We agree that the spec text should be updated to more clearly define what 
dereference means. 
When we were trying to solve this problem, we looked for a simple and 
consistent way that a developer can understand what dereferencing is. 
What we came up with was the following definition: revoking should happen at 
the first time that any of the bits of the BLOB are accessed. 

This is a simple concept for a developer to understand, and is not complex to 
spec or implement. This also helps avoid having to explicitly spec 
out in the File API spec the various edge cases that different APIs exhibit – 
such as XHR open/send versus imgtag.src load versus css link href 
versus when a URL gets resolved or not. Instead those behaviors will continue 
to be documented in their respective spec.

The definition above would imply that some cases, such as a 
cross-site-of-origin request to a Blob URL do not revoke, but we think that is 
OK 
since it implies a developer error. If we use the above definition for 
dereferencing, then in the XHR example you provided, xhr.send would 
be responsible for revoking the URL.

Thanks,
Feras

>From: Charles Pritchard [mailto:ch...@jumis.com] 
>Sent: Thursday, March 29, 2012 1:03 AM
>To: Glenn Maynard
>Cc: Jonas Sicking; public-webapps WG
>Subject: Re: File API "oneTimeOnly" is too poorly defined
>
>Any feedback on what exists in modern implementations? MS seems to have the 
>most hard-line stance when talking about this API.
>
>When it comes to it, we ought to look at what happened in the latest harvest. 
>IE10, O12, C19, and so forth.
>
>
>On Mar 28, 2012, at 6:12 PM, Glenn Maynard  wrote:
>>On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking  wrote:
>> This would still require work in each URL-consuming spec, to define taking a
>> reference to the underlying blob's data when it receives an object URL.  I
>> think this is inherent to the feature.
>>This is an interesting idea for sure. It doesn't solve any of the
>>issues I brought up, so we still need to define when dereferencing
>>happens. But it does solve the problem of the URL leaking if it never
>>gets dereferenced, which is nice.
>>
>>Right, that's what I meant above.  The "dereferencing" step needs to be 
>>defined no matter what you do.  This just makes it easier to define 
>>(eliminating task ordering problems as a source of problems).
>>
>>Also, I still think that all APIs should consistently do that as soon as they 
>>first see the URL.  For example, XHR should do it in open(), not in send().  
>>That makes it easy for developers to understand when the dereferencing 
>>actually happens (in the general case, for all APIs).
>>
>>One other thing: "dereferencing" should take a reference to the underlying 
>>data of the Blob, not the Blob itself, so it's unaffected by neutering 
>>(transfers and Blob.close).  That avoids a whole category of problems.
>>
>>
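
The quoted point about dereferencing the underlying data rather than
the Blob itself can be sketched as follows. This is a toy model, not
XHR: ModelBlob, ModelRequest and the urls map are hypothetical names,
and close() merely stands in for neutering (a transfer or the proposed
Blob.close).

```javascript
// Toy model (hypothetical names): open() captures the blob's
// underlying bytes, so later neutering or revocation has no effect.
const urls = new Map(); // url -> ModelBlob

class ModelBlob {
  constructor(bytes) {
    this.bytes = bytes;
  }
  close() {
    this.bytes = null; // neutering: the Blob object loses its data
  }
}

class ModelRequest {
  open(url) {
    const blob = urls.get(url);
    // Dereference at the first entry point, keeping the data itself
    // rather than a reference to the Blob object.
    this.bytes = blob ? blob.bytes : null;
  }
  send() {
    return this.bytes;
  }
}

const blob = new ModelBlob("hello");
urls.set("blob:example", blob);

const req = new ModelRequest();
req.open("blob:example"); // dereferenced here
urls.delete("blob:example"); // URL revoked
blob.close();                // Blob neutered
console.log(req.send());     // "hello"
```

Had open() stored the ModelBlob instead of its bytes, send() would
return null after close(), which is the category of problems the
suggestion avoids.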


Re: File API "oneTimeOnly" is too poorly defined

2012-03-30 Thread Glenn Maynard
2012/3/30 Bronislav Klučka 

> url = createObjectURL(blob);
>> blob = null;
>> setTimeout(function() { img.src = url; }, 0);
>>
>  That should not be problematic; yes, GC may actually free that blob
> allocated memory later than that timeout function triggers, but there is
> explicit release of that blob (blob = null), so this must fail (memory
> might be allocated by blob data, but that variable should be out). But
> following may cause the same issue
>

"blob = null" is not an explicit release of anything.  It's just clearing a
reference.  The only way to know that the blob is no longer referenced is
to wait for GC--that's what garbage collection *is*.  It's impossible to
guarantee that the "img.src = url" above will fail (without placing severe
constraints on how GC can be implemented, which won't happen).

-- 
Glenn Maynard


Re: File API "oneTimeOnly" is too poorly defined

2012-03-30 Thread Bronislav Klučka



On 30.3.2012 15:21, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka


If I understand you, you find it problematic that by using weak
ref, URL would for some time reference actual Blob and other time
it would not?


The problem is that the following code might or might not work, 
depending on the behavior of the browser's GC:


url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);

If the timer executes before GC collects the blob, this works, because 
the URL is still valid.  Otherwise, it fails, because--since the Blob 
no longer exists--the URL is no longer valid.


--
Glenn Maynard



That should not be problematic; yes, GC may actually free that blob 
allocated memory later than that timeout function triggers, but there is 
explicit release of that blob (blob = null), so this must fail (memory 
might be allocated by blob data, but that variable should be out). But 
following may cause the same issue


var url = createObjectURL(new Blob(['hello']));
setTimeout(function() { img.src = url; }, 0);

and even this in some aggressive GC implementation

img.src = createObjectURL(new Blob(['hello']));
// Since nothing references this blob, it could be destroyed right after the
// function returns, but before the result is assigned. Though GC most likely
// runs while the application is idle, so as not to delay it, this case
// applies in theory as well, since GC behavior is not specified and is
// therefore implementation specific.


Though I could live with such an issue, I see the problem (from other 
programmers' perspective and from the specification's perspective)... 
there's no light at the end of this tunnel...
Either carefully treat weak refs, or have one-time URLs with 
dereferencing and concurrency issues, or explicit URL release that one 
might forget..


Brona



Re: File API "oneTimeOnly" is too poorly defined

2012-03-30 Thread Glenn Maynard
2012/3/29 Bronislav Klučka 

> If I understand you, you find it problematic that by using weak ref, URL
> would for some time reference actual Blob and other time it would not?
>

The problem is that the following code might or might not work, depending
on the behavior of the browser's GC:

url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);

If the timer executes before GC collects the blob, this works, because the
URL is still valid.  Otherwise, it fails, because--since the Blob no longer
exists--the URL is no longer valid.
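
For readers who want to see the dependency on GC directly, here is a
sketch using WeakRef, which was standardized in JavaScript (ES2021)
well after this thread. The registry itself is hypothetical:
createWeakURL() and resolveWeakURL() are invented names, and a plain
object stands in for a Blob.

```javascript
// Hypothetical weak URL registry: the URL does not keep the blob alive.
const weakRegistry = new Map(); // url -> WeakRef to the blob

function createWeakURL(blob) {
  const url = "blob:weak-" + weakRegistry.size;
  weakRegistry.set(url, new WeakRef(blob));
  return url;
}

// Returns the blob if it has not been collected yet, else undefined.
function resolveWeakURL(url) {
  const ref = weakRegistry.get(url);
  return ref ? ref.deref() : undefined;
}

let blob = { bytes: "hello" }; // stand-in for a real Blob
const url = createWeakURL(blob);
blob = null; // clears a reference; it does not release anything

setTimeout(function () {
  // Whether this resolves depends on whether GC has run in between,
  // so the outcome is not something a script can rely on.
  const resolved = resolveWeakURL(url);
  console.log(resolved ? "URL still valid" : "URL no longer valid");
}, 0);
```

While a strong reference to the object is still held, resolveWeakURL()
always returns it; the unpredictability only starts once the last
strong reference is gone.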

-- 
Glenn Maynard


Re: File API "oneTimeOnly" is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 5:54, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka


The point was not to talk about weak refs, but about not creating
a GC reference from URL to Blob


If the lifetime of the URL is tied to the lifetime of the Blob, then 
that's what a weak reference *is*.


--
Glenn Maynard



If I understand you, you find it problematic that by using weak ref, URL 
would for some time reference actual Blob and other time it would not?


Brona Klucka



Re: File API "oneTimeOnly" is too poorly defined

2012-03-29 Thread Glenn Maynard
2012/3/29 Bronislav Klučka 

> The point was not to talk about weak refs, but about not creating a GC
> reference from URL to Blob
>

If the lifetime of the URL is tied to the lifetime of the Blob, then that's
what a weak reference *is*.

-- 
Glenn Maynard


Re: File API "oneTimeOnly" is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 5:40, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka


Sure, weak referencing is probably not a well-explored approach, but
the underlying idea applied to blob is interesting: URL creates no
reference to Blob (from GC perspective), meaning Blob is subjected
to GC regardless of BlobUrl existence. This would remove the need
for revoking URL, programmers would only need to maintain blobs
they want to be persistent (e.g. in some global array).


Weak referencing is pretty well explored, I think.  It's intentionally 
not supported for the most part in JavaScript, because most weakref 
features expose garbage collection behavior to scripts.  Web APIs 
don't do that.


This approach exposes GC behavior, making it possible to write code 
that behaves differently depending on GC.


--
Glenn Maynard



The point was not to talk about weak refs, but about not creating a GC 
reference from URL to Blob


Brona



Re: File API "oneTimeOnly" is too poorly defined

2012-03-29 Thread Glenn Maynard
2012/3/29 Bronislav Klučka 

> Sure, weak referencing is probably not a well-explored approach, but the
> underlying idea applied to blob is interesting: URL creates no reference to
> Blob (from GC perspective), meaning Blob is subjected to GC regardless of
> BlobUrl existence. This would remove the need for revoking URL, programmers
> would only need to maintain blobs they want to be persistent (e.g. in some
> global array).
>

Weak referencing is pretty well explored, I think.  It's intentionally not
supported for the most part in JavaScript, because most weakref features
expose garbage collection behavior to scripts.  Web APIs don't do that.

This approach exposes GC behavior, making it possible to write code that
behaves differently depending on GC.

-- 
Glenn Maynard


Re: File API "oneTimeOnly" is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 0:19, Glenn Maynard wrote:
On Thu, Mar 29, 2012 at 2:29 AM, Darin Fisher wrote:


I've never been terribly happy with createObjectURL and the
requirement for
folks to remember to call revokeObjectURL.  I really like that
we're talking
about ways to minimize this pain :-)

I noticed the WeakRefs proposal:
http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs


This exposes GC behavior, though.  They try to reduce that by making 
it only collectable during the event loop, but it's still observable.  
For example,


url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);
return;

might or might not succeed.

(IIRC, WeakMaps avoid GC exposure by being non-enumerable.  That seems 
to make them not very useful, since it's hardly different from simply 
assigning properties to the object.)


On Thu, Mar 29, 2012 at 6:08 AM, Robert O'Callahan wrote:


It might be a bit better to revoke the URL at the next stable
state. Microtask checkpoints can happen during synchronous script
execution.


I'm confused.  I'd never seen microtasks actually defined (only 
suggested), but I see it's just not defined in the page I was viewing. 
 It looks like Google's returning out of date spec links 
(http://www.whatwg.org/specs/web-apps/current-work/.w3c-html-core/webappapis.html#processing-model-2). 
 This is the second time I've been bit recently by landing on a weird, 
out-of-date spec URL (ignoring the /TR/ trap, which I know to watch 
out for)...


So, "microtask" isn't what we need here (and not what IndexedDB needs, 
either).  Stable state might be correct.  I think that has another 
effect, which you don't get by waiting for the event loop: blob URLs 
would also be freed between  execution.  That is, in