Re: File API oneTimeOnly is too poorly defined

2012-04-10 Thread Jonas Sicking
On Mon, Apr 9, 2012 at 3:52 PM, Feras Moussa fer...@microsoft.com wrote:
 We agree that the spec text should be updated to more clearly define what 
 dereference means.
 When we were trying to solve this problem, we looked for a simple and 
 consistent way that a developer can understand what dereferencing is.
 What we came up with was the following definition: revoking should happen at 
 the first time that any of the bits of the BLOB are accessed.

 This is a simple concept for a developer to understand, and is not complex to 
 spec or implement. This also helps avoid having to explicitly spec
 out in the File API spec the various edge cases that different APIs exhibit – 
 such as XHR open/send versus imgtag.src load versus css link href
 versus when a URL gets resolved or not. Instead those behaviors will continue 
 to be documented in their respective spec.

 The definition above would imply that some cases, such as a 
 cross-site-of-origin request to a Blob URL do not revoke, but we think that 
 is OK
 since it implies a developer error. If we use the above definition for 
 dereferencing, then in the XHR example you provided, xhr.send would
 be responsible for revoking the URL.

Depending on how you define accessing the bits I think this has the
risk of resulting in quite a few race conditions, both in a single
implementation, as well as across implementations.

For example, the following code:

url = URL.createObjectURL(blob, { oneTimeOnly: true });
myImg.src = url;
setTimeout(function() { myOtherImg.src = url }, 0);

Assuming that the blob is backed by an OS file, this will start reading
the bits from the blob as soon as the IO thread is free to read from
the requested file. When that is depends on a lot of other things,
such as what happens in other tabs, what other actions did the page
just do, how much of a back-log does the IO thread currently have etc.

In gecko things are even worse since blobs can be backed by either OS
files, or by memory, or a combination thereof. We plan to use various
optimizations for determining which backing type to use. For example
if you get a blob from a WebSocket connection, it might depend on how
much data was downloaded before .binaryType was set to blob as well
as how big the websocket frame is.

So if what you have is a memory backed blob then accessing the bits
will likely happen sooner making it more likely that the above code
snippet will fail to load the second image.

I would expect other browsers to use other strategies for what backing
stores to use, introducing more uncertainty.
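
To make the race concrete, here is a minimal sketch in plain JavaScript. It is a toy model and not a browser API: the registry, createOneTimeURL, simulate, and the ioDelay knob are all hypothetical names, and the second load is assumed to resolve the URL at the moment the timer fires.

```javascript
// Toy model, not a browser API: a blob URL registry where a one-time URL is
// revoked when a load first touches the blob's bytes ("first access").
const registry = new Map(); // url -> blob bytes

function createOneTimeURL(bytes) {
  const url = "blob:model-" + (registry.size + 1);
  registry.set(url, bytes);
  return url;
}

// Simulates the snippet above on a discrete clock.
// ioDelay: ticks until the I/O thread first reads the bytes for the first
// image (hypothetical knob standing in for OS-file vs. memory backing).
// secondLoadAt: tick at which the setTimeout callback assigns the second src.
function simulate(ioDelay, secondLoadAt) {
  registry.clear();
  const url = createOneTimeURL("image bytes");
  const events = [
    // First image: bytes are touched at tick ioDelay, which revokes the URL.
    { tick: ioDelay, order: 0, run: () => {
        const ok = registry.has(url);
        registry.delete(url);
        return ["first", ok];
      } },
    // Second image: resolves the URL when the timer fires.
    { tick: secondLoadAt, order: 1, run: () => ["second", registry.has(url)] },
  ];
  events.sort((a, b) => a.tick - b.tick || a.order - b.order);
  const result = {};
  for (const e of events) {
    const [name, ok] = e.run();
    result[name] = ok;
  }
  return result;
}

const memoryBacked = simulate(0, 1); // bytes read before the timer fires
const fileBacked = simulate(5, 1);   // I/O thread busy; timer fires first
console.log(memoryBacked, fileBacked);
```

In this model the same script loads the second image when the I/O is slow and fails to load it when the I/O is fast, which is the kind of timing dependence described above.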

Like Glenn points out, basically all of the situations we are talking
about are error conditions. The way you should use the url after using
oneTimeOnly is to load from it exactly once. Anything else is an
error for some definition of an error. But we all know that people
use web APIs in ways we wish they didn't. Intentionally or
accidentally.


What will happen for the following three code snippets in IE?

1.
url = URL.createObjectURL(blob, { oneTimeOnly: true });
myImg.src = url;
setTimeout(function() { myOtherImg.src = url }, 0);

2.
url = URL.createObjectURL(blob);
myImg.src = url;
setTimeout(function() { URL.revokeObjectURL(url) }, 0);

3.
url = URL.createObjectURL(blob);
myImg.src = url;
URL.revokeObjectURL(url);

/ Jonas



RE: File API oneTimeOnly is too poorly defined

2012-04-09 Thread Feras Moussa
We agree that the spec text should be updated to more clearly define what 
dereference means. 
When we were trying to solve this problem, we looked for a simple and 
consistent way that a developer can understand what dereferencing is. 
What we came up with was the following definition: revoking should happen at 
the first time that any of the bits of the BLOB are accessed. 

This is a simple concept for a developer to understand, and is not complex to 
spec or implement. This also helps avoid having to explicitly spec 
out in the File API spec the various edge cases that different APIs exhibit – 
such as XHR open/send versus imgtag.src load versus css link href 
versus when a URL gets resolved or not. Instead those behaviors will continue 
to be documented in their respective spec.

The definition above would imply that some cases, such as a 
cross-site-of-origin request to a Blob URL do not revoke, but we think that is 
OK 
since it implies a developer error. If we use the above definition for 
dereferencing, then in the XHR example you provided, xhr.send would 
be responsible for revoking the URL.

Thanks,
Feras

From: Charles Pritchard [mailto:ch...@jumis.com] 
Sent: Thursday, March 29, 2012 1:03 AM
To: Glenn Maynard
Cc: Jonas Sicking; public-webapps WG
Subject: Re: File API oneTimeOnly is too poorly defined

Any feedback on what exists in modern implementations? MS seems to have the 
most hard-line stance when talking about this API.

When it comes to it, we ought to look at what happened in the latest harvest. 
IE10, O12, C19, and so forth.


On Mar 28, 2012, at 6:12 PM, Glenn Maynard gl...@zewt.org wrote:
On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote:
 This would still require work in each URL-consuming spec, to define taking a
 reference to the underlying blob's data when it receives an object URL.  I
 think this is inherent to the feature.
This is an interesting idea for sure. It doesn't solve any of the
issues I brought up, so we still need to define when dereferencing
happens. But it does solve the problem of the URL leaking if it never
gets dereferenced, which is nice.

Right, that's what I meant above.  The dereferencing step needs to be 
defined no matter what you do.  This just makes it easier to define 
(eliminating task ordering problems as a source of problems).

Also, I still think that all APIs should consistently do that as soon as they 
first see the URL.  For example, XHR should do it in open(), not in send().  
That makes it easy for developers to understand when the dereferencing 
actually happens (in the general case, for all APIs).

One other thing: dereferencing should take a reference to the underlying 
data of the Blob, not the Blob itself, so it's unaffected by neutering 
(transfers and Blob.close).  That avoids a whole category of problems.




Re: File API oneTimeOnly is too poorly defined

2012-04-09 Thread Glenn Maynard
On Mon, Apr 9, 2012 at 5:52 PM, Feras Moussa fer...@microsoft.com wrote:

 We agree that the spec text should be updated to more clearly define what
 dereference means.
 When we were trying to solve this problem, we looked for a simple and
 consistent way that a developer can understand what dereferencing is.
 What we came up with was the following definition: revoking should happen
 at the first time that any of the bits of the BLOB are accessed.


No, this is hard to understand, because with many APIs it's not at all
obvious when the actual access happens.  It's also error-prone, as was
explained before.

This is a simple concept for a developer to understand, and is not complex
 to spec or implement. This also helps avoid having to explicitly spec
 out in the File API spec the various edge cases that different APIs
 exhibit – such as XHR open/send versus imgtag.src load versus css link href
 versus when a URL gets resolved or not. Instead those behaviors will
 continue to be documented in their respective spec.


Neither the current proposal (release at stable state) nor the previous
(release at first entry point) would require File API to specify cases for
other APIs.

Your proposal is essentially impossible to spec in an interoperable way,
because the point you propose that URLs should be released--at fetch
time--often happens in a queued task, and different APIs (and sometimes
even the same API, eg. XHR2) perform fetches in different event queues.
Some fetches happen asynchronously (eg. HTMLImageElement's update the
image data), in which case the ordering is even weaker.

The definition above would imply that some cases, such as a
 cross-site-of-origin request to a Blob URL do not revoke, but we think that
 is OK
 since it implies a developer error.


Everything that can result in a blob not being revoked is a developer
error.  The entire point of this API is to eliminate these error-prone
cases.

If we use the above definition for dereferencing, then in the XHR example
 you provided, xhr.send would
 be responsible for revoking the URL.


The current proposal, which there seems to be interest in, is to have the
object URL be revoked at the next stable state.  This guarantees that the
URL is always revoked.  It gives consistent behavior regardless of which
combinations of APIs are used with it.  APIs that begin fetches
synchronously would not need any modification at all, and APIs that begin
fetches asynchronously would essentially need to take a reference
synchronously (which should be fairly simple, and would be required for any
interoperable form of this API).

This model is simpler, easier to understand, easier to use, and easier to
spec; and makes it impossible for programming errors to cause leaked Blobs
(which is the very purpose of this API).

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-30 Thread Glenn Maynard
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com

 If I understand you, you find it problematic that by using weak ref, URL
 would for some time reference actual Blob and other time it would not?


The problem is that the following code might or might not work, depending
on the behavior of the browser's GC:

url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);

If the timer executes before GC collects the blob, this works, because the
URL is still valid.  Otherwise, it fails, because--since the Blob no longer
exists--the URL is no longer valid.
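
A small sketch of why this is observable, assuming a registry that holds only a weak reference to each blob. Plain objects stand in for Blob, and createWeakObjectURL and resolve are hypothetical helpers; WeakRef itself is standard ECMAScript (ES2021).

```javascript
// Toy model: a URL registry that holds only a weak reference to each blob.
// Plain objects stand in for Blob; the helper names are hypothetical.
const weakRegistry = new Map(); // url -> WeakRef to the blob stand-in
let nextId = 0;

function createWeakObjectURL(blob) {
  const url = "blob:weak-" + (nextId++);
  weakRegistry.set(url, new WeakRef(blob));
  return url;
}

// Returns the blob if it is still alive, or undefined if it was collected.
function resolve(url) {
  const ref = weakRegistry.get(url);
  return ref ? ref.deref() : undefined;
}

let blob = { bytes: "hello" };
const url = createWeakObjectURL(blob);

// While a strong reference exists, resolution is guaranteed to succeed.
const resolvedWhileHeld = resolve(url) === blob;

blob = null; // clears one reference; it does not free anything by itself

setTimeout(() => {
  // May log either line: it depends on whether the engine ran GC in between.
  const target = resolve(url);
  console.log(target ? "URL still resolves" : "blob was collected");
}, 0);
```

Only the check made while the strong reference is held has a guaranteed result; what the timer callback sees is up to the garbage collector.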

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-30 Thread Bronislav Klučka



On 30.3.2012 15:21, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com 


If I understand you, you find it problematic that by using weak
ref, URL would for some time reference actual Blob and other time
it would not?


The problem is that the following code might or might not work, 
depending on the behavior of the browser's GC:


url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);

If the timer executes before GC collects the blob, this works, because 
the URL is still valid.  Otherwise, it fails, because--since the Blob 
no longer exists--the URL is no longer valid.


--
Glenn Maynard



That should not be problematic; yes, GC may actually free that blob's 
allocated memory later than the timeout function triggers, but there is 
an explicit release of that blob (blob = null), so this must fail (memory 
might still be allocated for the blob data, but that variable should be 
out). But the following may cause the same issue:


var url = createObjectURL(new Blob(['hello']));
setTimeout(function() { img.src = url; }, 0);

and even this in some aggressive GC implementation

img.src = createObjectURL(new Blob(['hello']));
// since nothing references this blob, it could be destroyed right after the
// function ends, but before the result is assigned;
// though GC most likely works in some application idle state so as not to
// delay the application, theoretically this applies as well, since the
// behavior of GC is not specified (it is implementation specific)


Though I could live with such an issue, I see the problem (from other 
programmers' perspective and from the specification's perspective)... 
there's no light at the end of this tunnel...
Either carefully treat weak refs, or have one-time URLs with 
dereferencing and concurrency issues, or explicit URL release that one 
might forget..


Brona



Re: File API oneTimeOnly is too poorly defined

2012-03-30 Thread Glenn Maynard
2012/3/30 Bronislav Klučka bronislav.klu...@bauglir.com

 url = createObjectURL(blob);
 blob = null;
 setTimeout(function() { img.src = url; }, 0);

  That should not be problematic; yes, GC may actually free that blob
 allocated memory later than that timeout function triggers, but there is
 explicit release of that blob (blob = null), so this must fail (memory
 might be allocated by blob data, but that variable should be out). But
 following may cause the same issue


blob = null is not an explicit release of anything.  It's just clearing a
reference.  The only way to know that the blob is no longer referenced is
to wait for GC--that's what garbage collection *is*.  It's impossible to
guarantee that the img.src = url above will fail (without placing severe
constraints on how GC can be implemented, which won't happen).

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Darin Fisher
On Wed, Mar 28, 2012 at 5:49 PM, Jonas Sicking jo...@sicking.cc wrote:

 On Wed, Mar 28, 2012 at 4:36 PM, Glenn Maynard gl...@zewt.org wrote:
  Here's another proposal, which is an iteration of the previous.  It's
 based
  on the microtask concept, which is creeping up here and there but
 hasn't
  yet been properly defined.  The idea is that microtasks can be queued
 (call
  it queue a microtask), and the microtask queue is executed by the event
  loop as soon as the current task completes, so it executes as soon as the
  outermost task returns to the event loop.
 
  oneTimeOnly (a poor name in this proposal) would simply queue a
 microtask to
  revoke the URL.
 
  This is simpler, and answers a lot of questions.  It means you can use
 the
  URL as many times as you want synchronously, since it's not released
 until
  the script returns.  Any cases where the ordering may not be strictly
  defined (eg. the <video><video> case in
  http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may
  be like this; I don't know how innerHTML works, exactly) are now defined:
  both video elements would get the object.
 
  It has another nice side-effect: it's much less prone to leaks.  For
  example, under previous approaches, the following code would leak the
 blob:
 
  function updateProgressMeter() { throw "obscure error"; }
  url = URL.createObjectURL(blob, {oneTimeOnly: true});
  updateProgressMeter();
  img.src = url; // never happens
 
  Since the URL is never actually used, the blob reference leaks.  You'd
 have
  to work around this with careful exception handling, which is precisely
 the
  sort of thing oneTimeOnly is supposed to avoid.  With this proposal, the
 URL
  would always be revoked when the script returns to the event loop,
 whether
  or not it was actually used.
 
  This would still require work in each URL-consuming spec, to define
 taking a
  reference to the underlying blob's data when it receives an object URL.
  I
  think this is inherent to the feature.
 
  oneTimeOnly would be the wrong name with this approach; it should be
  something like autoRelease.
 
  This has one drawback: it doesn't work nicely in long-running Workers,
 which
  may never return to the event loop at all.  I think that's probably an
  acceptable tradeoff.

 This is an interesting idea for sure. It doesn't solve any of the
 issues I brought up, so we still need to define when dereferencing
 happens. But it does solve the problem of the URL leaking if it never
 gets dereferenced, which is nice.


I've never been terribly happy with createObjectURL and the requirement for
folks to remember to call revokeObjectURL.  I really like that we're talking
about ways to minimize this pain :-)

I noticed the WeakRefs proposal:
http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs

It also makes use of the micro-task concept, and it does so to avoid
revealing
details about garbage collection.

What if we were to adopt a similar approach here.  Instead of introducing a
second parameter to createObjectURL, what if createObjectURL returned a
WeakObjectURL object instead of a String object?  WeakObjectURL could
be converted to a String to reveal the Blob URL.

Suppose WeakObjectURL if retained would keep the Blob URL valid.  Else,
when WeakObjectURL gets deleted, its Blob URL would remain alive up
until the next micro-task.

Crazy idea?  Too crazy?

I agree that it is valuable to define dereference points for APIs that
receive
Blob URLs.

-Darin


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 29.3.2012 9:29, Darin Fisher wrote:
I've never been terribly happy with createObjectURL and the 
requirement for
folks to remember to call revokeObjectURL.  I really like that we're 
talking

about ways to minimize this pain :-)

I noticed the WeakRefs proposal:
http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs

It also makes use of the micro-task concept, and it does so to avoid 
revealing

details about garbage collection.

What if we were to adopt a similar approach here.  Instead of 
introducing a

second parameter to createObjectURL, what if createObjectURL returned a
WeakObjectURL object instead of a String object?  WeakObjectURL could
be converted to a String to reveal the Blob URL.

Suppose WeakObjectURL if retained would keep the Blob URL valid.  Else,
when WeakObjectURL gets deleted, its Blob URL would remain alive up
until the next micro-task.

Crazy idea?  Too crazy?

I agree that it is valuable to define dereference points for APIs 
that receive

Blob URLs.

-Darin

So the WeakObjectURL would exist as long as the Blob exists?

//so this function will automatically GC the Blob and the URL object, and 
the url string itself would be a reference to 'nothing' (since the blob 
does not exist anymore)?

function fileInputOnChange()
{
   var blob = this.files[0];
   var url = blob.getWeakObjectURL();
   document.getElementById('firstImage').src = url;
   this.parentNode.removeChild(this);
}

//so this function will not automatically GC Blob nor URL object and the 
url string will be still pointing to that Blob?

var blob  = MyGlobalFiles[0];
loadBlob(blob)
function loadBlob(blob)
{
   var url = blob.getWeakObjectURL();
   document.getElementById('firstImage').src = url;
}

//and when I remove any reference to that blob, all will be GC and 
invalidated?

MyGlobalFiles.shift();


that sounds interesting and probably like one of the best ideas in this 
long discussion



Brona




Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Charles Pritchard
Any feedback on what exists in modern implementations? MS seems to have the 
most hard-line stance when talking about this API.

When it comes to it, we ought to look at what happened in the latest harvest. 
IE10, O12, C19, and so forth.



On Mar 28, 2012, at 6:12 PM, Glenn Maynard gl...@zewt.org wrote:

 On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote:
  This would still require work in each URL-consuming spec, to define taking a
  reference to the underlying blob's data when it receives an object URL.  I
  think this is inherent to the feature.
 
 This is an interesting idea for sure. It doesn't solve any of the
 issues I brought up, so we still need to define when dereferencing
 happens. But it does solve the problem of the URL leaking if it never
 gets dereferenced, which is nice.
 
 Right, that's what I meant above.  The dereferencing step needs to be 
 defined no matter what you do.  This just makes it easier to define 
 (eliminating task ordering problems as a source of problems).
 
 Also, I still think that all APIs should consistently do that as soon as they 
 first see the URL.  For example, XHR should do it in open(), not in send().  
 That makes it easy for developers to understand when the dereferencing 
 actually happens (in the general case, for all APIs).
 
 One other thing: dereferencing should take a reference to the underlying 
 data of the Blob, not the Blob itself, so it's unaffected by neutering 
 (transfers and Blob.close).  That avoids a whole category of problems.
 
 -- 
 Glenn Maynard
 


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Robert O'Callahan
On Thu, Mar 29, 2012 at 12:36 PM, Glenn Maynard gl...@zewt.org wrote:

 oneTimeOnly (a poor name in this proposal) would simply queue a microtask
 to revoke the URL.

 This is simpler, and answers a lot of questions.  It means you can use the
 URL as many times as you want synchronously, since it's not released until
 the script returns.  Any cases where the ordering may not be strictly
 defined (eg. the <video><video> case in
 http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may be 
 like this; I don't know how innerHTML works, exactly) are now
 defined: both video elements would get the object.


That sounds like a pretty good idea.

It might be a bit better to revoke the URL at the next stable state.
Microtask checkpoints can happen during synchronous script execution.
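
For reference, the microtask-based variant quoted above can be sketched with the standard queueMicrotask function. The store, createAutoReleaseURL, and load are hypothetical stand-ins for the browser's blob URL store and a consumer such as the img.src setter. In this standalone script the microtask runs only after the script finishes, so the sketch cannot show the mid-script checkpoints mentioned above.

```javascript
// Toy registry standing in for the browser's blob URL store.
const store = new Map();
let counter = 0;

// Hypothetical autoRelease behaviour: hand out the URL, then queue a
// microtask that revokes it once the current script has finished.
function createAutoReleaseURL(blob) {
  const url = "blob:auto-" + (counter++);
  store.set(url, blob);
  queueMicrotask(() => store.delete(url));
  return url;
}

// Stand-in for a consumer such as the img.src setter: it resolves the URL
// synchronously and reports whether it got the blob.
function load(url) {
  return store.has(url);
}

const url = createAutoReleaseURL({ bytes: "hello" });
const firstLoad = load(url);  // true: still inside the creating script
const secondLoad = load(url); // true: same script, URL not yet released

let laterLoad;
setTimeout(() => {
  laterLoad = load(url); // false: the microtask ran before this timer task
  console.log({ firstLoad, secondLoad, laterLoad });
}, 0);
```
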

Rob
-- 
“You have heard that it was said, ‘Love your neighbor and hate your enemy.’
But I tell you, love your enemies and pray for those who persecute you,
that you may be children of your Father in heaven. ... If you love those
who love you, what reward will you get? Are not even the tax collectors
doing that? And if you greet only your own people, what are you doing more
than others? [Matthew 5:43-47]


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Glenn Maynard
On Thu, Mar 29, 2012 at 2:29 AM, Darin Fisher da...@chromium.org wrote:

 I've never been terribly happy with createObjectURL and the requirement for
 folks to remember to call revokeObjectURL.  I really like that we're
 talking
 about ways to minimize this pain :-)

 I noticed the WeakRefs proposal:
 http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs


This exposes GC behavior, though.  They try to reduce that by making it
only collectable during the event loop, but it's still observable.  For
example,

url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);
return;

might or might not succeed.

(IIRC, WeakMaps avoid GC exposure by being non-enumerable.  That seems to
make them not very useful, since it's hardly different from simply
assigning properties to the object.)

On Thu, Mar 29, 2012 at 6:08 AM, Robert O'Callahan rob...@ocallahan.org wrote:

 It might be a bit better to revoke the URL at the next stable state.
 Microtask checkpoints can happen during synchronous script execution.


I'm confused.  I'd never seen microtasks actually defined (only suggested),
but I see it's just not defined in the page I was viewing.  It looks like
Google's returning out of date spec links (
http://www.whatwg.org/specs/web-apps/current-work/.w3c-html-core/webappapis.html#processing-model-2).
 This is the second time I've been bit recently by landing on a weird,
out-of-date spec URL (ignoring the /TR/ trap, which I know to watch out
for)...

So, microtask isn't what we need here (and not what IndexedDB needs,
either).  Stable state might be correct.  I think that has another effect,
which you don't get by waiting for the event loop: blob URLs would also be
freed between script execution.  That is, in

<script>url = URL.createObjectURL(blob); ...</script><script>...</script>

the blob is released before the second script is run.  That seems like a
plus.

getObjectURL(blob, {auto: true}) would simply do:

N+1. Return the generated url, and then continue running this algorithm
asynchronously.
N+2. Await a stable state.
N+3. Revoke url.

which doesn't require any new concepts.
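
A rough sketch of those steps, under loud assumptions: the registry, this createObjectURL, and runTask are toy stand-ins rather than browser APIs, and "stable state" is approximated as "the current task has finished", which is coarser than the HTML definition.

```javascript
// Toy model of the N+1..N+3 steps. All names here are hypothetical.
const urls = new Map();    // url -> blob stand-in
const pendingRevokes = []; // URLs to revoke at the next stable state
let serial = 0;

function createObjectURL(blob, options = {}) {
  const url = "blob:model-" + (serial++);
  urls.set(url, blob);
  if (options.auto) pendingRevokes.push(url); // N+2: await a stable state
  return url;                                 // N+1: return the URL now
}

// Runs one task, then performs the deferred revocations (N+3), whether the
// task completed normally or threw.
function runTask(task) {
  try {
    task();
  } catch (err) {
    console.log("task failed:", err.message);
  } finally {
    while (pendingRevokes.length > 0) urls.delete(pendingRevokes.pop());
  }
}

let leakedUrl;
runTask(() => {
  leakedUrl = createObjectURL({ bytes: "hello" }, { auto: true });
  throw new Error("obscure error"); // the URL is never handed to any API
});

console.log(urls.has(leakedUrl)); // false: revoked even though it was unused
```

The point of the model is that revocation no longer depends on the URL being used at all, so the exception case from earlier in the thread cannot leak the blob.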

I'd suggest auto as the option name; it's short, since it'd probably be
used a lot.

On Thu, Mar 29, 2012 at 3:03 AM, Charles Pritchard ch...@jumis.com wrote:

 Any feedback on what exists in modern implementations? MS seems to have
 the most hard-line stance when talking about this API.


As far as I could tell, MS implemented something behind closed doors,
presented it whole, and then more or less refused to change anything,
despite the serious issues pointed out in it.  Web API development can't
work that way in 2012.

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 0:19, Glenn Maynard wrote:
On Thu, Mar 29, 2012 at 2:29 AM, Darin Fisher da...@chromium.org wrote:


I've never been terribly happy with createObjectURL and the
requirement for
folks to remember to call revokeObjectURL.  I really like that
we're talking
about ways to minimize this pain :-)

I noticed the WeakRefs proposal:
http://wiki.ecmascript.org/doku.php?id=strawman:weak_refs


This exposes GC behavior, though.  They try to reduce that by making 
it only collectable during the event loop, but it's still observable.  
For example,


url = createObjectURL(blob);
blob = null;
setTimeout(function() { img.src = url; }, 0);
return;

might or might not succeed.

(IIRC, WeakMaps avoid GC exposure by being non-enumerable.  That seems 
to make them not very useful, since it's hardly different from simply 
assigning properties to the object.)


On Thu, Mar 29, 2012 at 6:08 AM, Robert O'Callahan rob...@ocallahan.org wrote:


It might be a bit better to revoke the URL at the next stable
state. Microtask checkpoints can happen during synchronous script
execution.


I'm confused.  I'd never seen microtasks actually defined (only 
suggested), but I see it's just not defined in the page I was viewing. 
 It looks like Google's returning out of date spec links 
(http://www.whatwg.org/specs/web-apps/current-work/.w3c-html-core/webappapis.html#processing-model-2). 
 This is the second time I've been bit recently by landing on a weird, 
out-of-date spec URL (ignoring the /TR/ trap, which I know to watch 
out for)...


So, microtask isn't what we need here (and not what IndexedDB needs, 
either).  Stable state might be correct.  I think that has another 
effect, which you don't get by waiting for the event loop: blob URLs 
would also be freed between script execution.  That is, in


<script>url = URL.createObjectURL(blob); ...</script><script>...</script>

the blob is released before the second script is run.  That seems 
like a plus.


getObjectURL(blob, {auto: true}) would simply do:

N+1. Return the generated url, and then continue running this 
algorithm asynchronously.

N+2. Await a stable state.
N+3. Revoke url.

which doesn't require any new concepts.

I'd suggest auto as the option name; it's short, since it'd probably 
be used a lot.


On Thu, Mar 29, 2012 at 3:03 AM, Charles Pritchard ch...@jumis.com wrote:


Any feedback on what exists in modern implementations? MS seems to
have the most hard-line stance when talking about this API.


As far as I could tell, MS implemented something behind closed doors, 
presented it whole, and then more or less refused to change anything, 
despite the serious issues pointed out in it.  Web API development 
can't work that way in 2012.


--
Glenn Maynard



Sure, weak referencing is probably not a well-explored approach, but the 
underlying idea applied to blobs is interesting: the URL creates no 
reference to the Blob (from the GC perspective), meaning the Blob is 
subject to GC regardless of the BlobUrl's existence. This would remove 
the need for revoking the URL; programmers would only need to keep the 
blobs they want to be persistent (e.g. in some global array).
This seems to solve nothing, because there is still some 
revoking/releasing of a variable, but the approach is reversed (explicitly 
keeping a reference, instead of explicitly releasing it), which could 
seriously limit the "I forgot to release this" cases. And what you are 
working with is the actual Blob, not some String thing.


Brona



Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Glenn Maynard
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com

 Sure, weak referencing is probably not well explored approach, but the
 underlying idea applied to blob is interesting: URL creates no reference to
 Blob (from GC perspective), meaning Blob is subjected to GC regardless of
 BlobUrl existence. This would remove the need for revoking URL, programmers
 would only need to maintain blobs they want to be persistent (e.g. in some
 global array).


Weak referencing is pretty well explored, I think.  It's intentionally not
supported for the most part in JavaScript, because most weakref features
expose garbage collection behavior to scripts.  Web APIs don't do that.

This approach exposes GC behavior, making it possible to write code that
behaves differently depending on GC.

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 5:40, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com 


Sure, weak referencing is probably not well explored approach, but
the underlying idea applied to blob is interesting: URL creates no
reference to Blob (from GC perspective), meaning Blob is subjected
to GC regardless of BlobUrl existence. This would remove the need
for revoking URL, programmers would only need to maintain blobs
they want to be persistent (e.g. in some global array).


Weak referencing is pretty well explored, I think.  It's intentionally 
not supported for the most part in JavaScript, because most weakref 
features expose garbage collection behavior to scripts.  Web APIs 
don't do that.


This approach exposes GC behavior, making it possible to write code 
that behaves differently depending on GC.


--
Glenn Maynard



The point was not to talk about weak refs, but about not creating a GC 
reference from URL to Blob


Brona



Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Glenn Maynard
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com

 The point was not to talk about weak refs, but about not creating a GC
 reference from URL to Blob


If the lifetime of the URL is tied to the lifetime of the Blob, then that's
what a weak reference *is*.

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-29 Thread Bronislav Klučka



On 30.3.2012 5:54, Glenn Maynard wrote:
2012/3/29 Bronislav Klučka bronislav.klu...@bauglir.com 


The point was not to talk about weak refs, but about not creating
a GC reference from URL to Blob


If the lifetime of the URL is tied to the lifetime of the Blob, then 
that's what a weak reference *is*.


--
Glenn Maynard



If I understand you, you find it problematic that by using weak ref, URL 
would for some time reference actual Blob and other time it would not?


Brona Klucka



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Jonas Sicking
On Tue, Mar 27, 2012 at 4:59 PM, Glenn Maynard gl...@zewt.org wrote:
 I didn't realize this was actually added to the spec:

 The optional options dictionary argument contains a key, oneTimeOnly that
 defaults to false. If set to true, then the first time the Blob URI is
 dereferenced, user agents MUST automatically revoke that Blob URI without
 needing a call to revokeObjectURL() on the Blob URI.

 What does dereferenced mean?  Where is it defined?  What happens if two
 XHR calls open() a blob URL one after the other (causing fetches to be
 queued for it in separate task queues, whose order of execution is
 undefined)?  What happens if two completely unrelated APIs queue tasks in
 different task queues (causing the same problem, but in a way that can't be
 worked around within any one spec)?

 This feature is dangerously weakly defined.  It should be removed from the
 spec until it can be defined properly (or at least marked not ready for
 implementations), or we may end up with interop failures that could be hard
 to fix later.

 Again, I'm pretty sure the sanest way to approach this feature is for any
 API supporting it to grab a reference to the underlying resource, and revoke
 the URL, as soon as the string enters that API (eg. xhr.open() is called, or
 img.src is assigned).  That ensures it's always deterministically--and
 synchronously--clear who will actually successfully receive the object,
 regardless of later complications like separate task queues across APIs.  It
 doesn't answer all questions (eg. the issues mentioned at
 http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html),
 and the actual dereferencing action would need to be specified for every
 supported API (this would need work to make it easy to do), but it's a lot
 closer than what's in there now.

I think we need to define that APIs like xhr.open(...) and the img.src
setter synchronously dereference the URL before returning.

This is needed even if we didn't have oneTimeOnly for at least two reasons:

1.
var blob = getBlob();
var url = URL.createObjectURL(blob);
img.src = url;
URL.revokeObjectURL(url);

2.
var fileEntry = getFileEntry();
fileEntry.file(function(file) {
  fileEntry.createWriter(function(fileWriter) {
    var url = URL.createObjectURL(file);
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.send();
    xhr.onload = ...;
    fileWriter.write(new Blob(["hello"]));
  });
});


In the first example the blob-url is revoked synchronously right after
img.src is set. Unless it's defined when img.src dereferences the
blob-url, it's undefined whether the first example works.

In the second example the file object itself is disabled when the
fileWriter.write function is called. The blob-url which represents it is
logically also disabled at the same time. If it's not defined when the
XHR object dereferences the blob-url then it's undefined whether the
second example works.

In fact, this problem isn't even blob-url specific. If you change the
second example to not use blob-urls, but rather read from 'file' using
a FileReader, you'll have exactly the same question of if starting to
read the Blob happens before the Blob is disabled, or after.


Generally speaking, in order to be able to precisely define when these
URLs or Blobs are dereferenced, we likely need to define that this
happens synchronously in the various APIs that dereference URLs and
Blobs. It so happens that dereferencing synchronously is also the most
useful behavior for authors.

Note that no actual IO needs to happen just because you dereference
the URL. So no synchronous IO is required.

We took a survey of the various points in the Gecko codebase to see if
we dereference URLs and Blobs synchronously or not. The only API we
found that didn't do so was the IndexedDB code for storing Blobs.


All of this will definitely be a lot of work to specify (and possibly
implement). But I don't see any other options to get interoperability
with Blobs and blob-URLs. It's definitely not a problem restricted to
oneTimeOnly.

/ Jonas



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Anne van Kesteren

On Wed, 28 Mar 2012 08:19:55 +0100, Jonas Sicking jo...@sicking.cc wrote:

I think we need to define that APIs like xhr.open(...) and the img.src
setter synchronously dereference the URL before returning.


What does dereferencing mean exactly? xhr.open() resolves URLs currently  
and then xhr.send() will fetch the URL.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Jonas Sicking
On Wed, Mar 28, 2012 at 2:17 AM, Anne van Kesteren ann...@opera.com wrote:
 On Wed, 28 Mar 2012 08:19:55 +0100, Jonas Sicking jo...@sicking.cc wrote:

 I think we need to define that APIs like xhr.open(...) and the img.src
 setter synchronously dereference the URL before returning.

 What does dereferencing mean exactly?

It means initiating the load or some such. Implementation-wise for
blob URLs it would likely mean going through the blob-url hash table
to find the underlying Blob object and start a read from it. So if the
URL is removed from the hash using revokeObjectURL this wouldn't
affect the load. Likewise for the FileSystem API it would mean that a
read from the blob has started and so any writes need to be queued
until after the read is finished.
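
As a rough illustration of that description, here is a toy model in plain
JavaScript. It is only a sketch, not how Gecko or any other engine is
implemented; blobUrlTable, dereference and the img object are hypothetical
names standing in for the blob-url hash table and an img.src-like setter.

```javascript
// Toy model of a blob-URL hash table. All names here are hypothetical.
const blobUrlTable = new Map();
let nextId = 0;

function createObjectURL(blob) {
  const url = "blob:toy/" + (nextId++);
  blobUrlTable.set(url, blob);
  return url;
}

function revokeObjectURL(url) {
  blobUrlTable.delete(url);
}

// "Dereference": look the URL up synchronously and return a read handle
// that holds the Blob directly, so it no longer depends on the table.
function dereference(url) {
  const blob = blobUrlTable.get(url);
  if (blob === undefined) {
    throw new Error("blob URL not found: " + url);
  }
  return { read: () => blob.data };
}

// An img.src-like setter that dereferences before returning.
const img = {
  pendingLoad: null,
  set src(url) { this.pendingLoad = dereference(url); },
};

const url = createObjectURL({ data: "pixels" });
img.src = url;          // dereferenced synchronously here
revokeObjectURL(url);   // removes the table entry, not the started load
const loaded = img.pendingLoad.read();
```

In this model the load started by the setter still reads the data after
the revoke, while any later attempt to dereference the same URL fails.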

 xhr.open() resolves URLs currently and then xhr.send() will fetch the URL.

Yup, this is the stuff that needs to be defined. In Gecko we actually
dereference the URL in xhr.open which means that the caller can call
revokeObjectURL after the call to xhr.open but before xhr.send. But
it's certainly possible to change this so that the URL is dereferenced
in xhr.send instead.

/ Jonas



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Anne van Kesteren

On Wed, 28 Mar 2012 10:51:59 +0200, Jonas Sicking jo...@sicking.cc wrote:
On Wed, Mar 28, 2012 at 2:17 AM, Anne van Kesteren ann...@opera.com  
wrote:

What does dereferencing mean exactly?


It means initiating the load or some such. Implementation-wise for
blob URLs it would likely mean going through the blob-url hash table
to find the underlying Blob object and start a read from it. So if the
URL is removed from the hash using revokeObjectURL this wouldn't
affect the load. Likewise for the FileSystem API it would mean that a
read from the blob has started and so any writes need to be queued
until after the read is finished.

xhr.open() resolves URLs currently and then xhr.send() will fetch the  
URL.


Yup, this is the stuff that needs to be defined. In Gecko we actually
dereference the URL in xhr.open which means that the caller can call
revokeObjectURL after the call to xhr.open but before xhr.send. But
it's certainly possible to change this so that the URL is dereferenced
in xhr.send instead.


Given that the start of the fetch algorithm  
http://www.whatwg.org/specs/web-apps/current-work/multipage/fetching-resources.html#fetch  
is synchronous maybe dereferencing can be defined as part of that. The  
change to XMLHttpRequest that would be needed then is to invoke fetch  
before returning from send(). Not sure how well that would work for other  
contexts such as img and 'background-image' though.



--
Anne van Kesteren
http://annevankesteren.nl/



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Glenn Maynard
On Wed, Mar 28, 2012 at 2:19 AM, Jonas Sicking jo...@sicking.cc wrote:

 All of this will definitely be a lot of work to specify (and possibly
 implement). But I don't see any other options to get interoperability
 with Blobs and blob-URLs. It's definitely not a problem restricted to
 oneTimeOnly.


Those are separate problems.  Other uses of blob URLs (without oneTimeOnly)
don't have an undefined dereference concept to begin with; they just
access the URL directly.  They do have other problems, though.  To take the
first example:

var blob = getBlob();
var url = URL.createObjectURL(blob);
img.src = url;
URL.revokeObjectURL(url);

When you assign img.src, you cause "update the image data" to be invoked.
That algorithm goes asynchronous in step 5; it then accesses img.src
asynchronously.  This means there's a race condition, depending on whether
the revokeObjectURL call happens before or after the asynchronous fetch.

The same changes needed to fix oneTimeOnly would probably fix most of these
sorts of problems too, though.
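
To make the race concrete, here is a small simulation. It is a sketch under
assumptions: table, tasks and assignSrcAsync are hypothetical names, and the
tasks array stands in for whatever work an algorithm does after it goes
asynchronous. It does not model the real "update the image data" steps.

```javascript
// Toy model only; `table`, `tasks`, and `assignSrcAsync` are hypothetical.
const table = new Map([["blob:toy/1", "image bytes"]]);
const tasks = [];   // work that runs after the algorithm goes asynchronous

function assignSrcAsync(url, result) {
  // The URL string is remembered, but the lookup itself is deferred.
  tasks.push(() => {
    result.data = table.has(url) ? table.get(url) : null;
  });
}

function runTasks() {
  while (tasks.length > 0) tasks.shift()();
}

// Order A: the deferred fetch runs before the revoke.
const a = {};
assignSrcAsync("blob:toy/1", a);
runTasks();
table.delete("blob:toy/1");   // revoke

// Order B: the revoke runs before the deferred fetch.
table.set("blob:toy/1", "image bytes");
const b = {};
assignSrcAsync("blob:toy/1", b);
table.delete("blob:toy/1");   // revoke
runTasks();

console.log(a.data, b.data);  // prints: image bytes null
```

The same script succeeds or fails depending only on the relative order of
the deferred lookup and the revoke, which is exactly what is left undefined.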

-- 
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Glenn Maynard
Here's another proposal, which is an iteration of the previous.  It's based
on the microtask concept, which is creeping up here and there but hasn't
yet been properly defined.  The idea is that microtasks can be queued (call
it "queue a microtask"), and the microtask queue is executed by the event
loop as soon as the current task completes, so it executes as soon as the
outermost task returns to the event loop.

oneTimeOnly (a poor name in this proposal) would simply queue a microtask
to revoke the URL.

This is simpler, and answers a lot of questions.  It means you can use the
URL as many times as you want synchronously, since it's not released until
the script returns.  Any cases where the ordering may not be strictly
defined (eg. the <video><video> case in
http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may
be like this; I don't know how innerHTML works, exactly) are now defined:
both video elements would get the object.

It has another nice side-effect: it's much less prone to leaks.  For
example, under previous approaches, the following code would leak the blob:

function updateProgressMeter() { throw "obscure error"; }
url = URL.createObjectURL(blob, {oneTimeOnly: true});
updateProgressMeter();
img.src = url; // never happens

Since the URL is never actually used, the blob reference leaks.  You'd have
to work around this with careful exception handling, which is precisely the
sort of thing oneTimeOnly is supposed to avoid.  With this proposal, the
URL would always be revoked when the script returns to the event loop,
whether or not it was actually used.

This would still require work in each URL-consuming spec, to define taking
a reference to the underlying blob's data when it receives an object URL.
I think this is inherent to the feature.

oneTimeOnly would be the wrong name with this approach; it should be
something like autoRelease.

This has one drawback: it doesn't work nicely in long-running Workers,
which may never return to the event loop at all.  I think that's probably
an acceptable tradeoff.
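
A minimal sketch of the idea, assuming a toy URL table rather than any real
browser API: urls, createAutoReleaseURL and the schedule parameter are
hypothetical names. The default scheduler uses queueMicrotask; the example
passes a manual scheduler instead so the "return to the event loop" step is
explicit.

```javascript
// Toy model; `urls`, `createAutoReleaseURL`, and `schedule` are hypothetical.
const urls = new Map();
let counter = 0;

function createAutoReleaseURL(blob, schedule = (fn) => queueMicrotask(fn)) {
  const url = "blob:toy/" + (counter++);
  urls.set(url, blob);
  // Revocation is queued immediately, whether or not the URL is ever used.
  schedule(() => urls.delete(url));
  return url;
}

// A manual scheduler stands in for the microtask queue in this example.
const pending = [];
const manualSchedule = (fn) => { pending.push(fn); };
const returnToEventLoop = () => {
  while (pending.length > 0) pending.shift()();
};

const u = createAutoReleaseURL({ data: "abc" }, manualSchedule);
const first = urls.get(u);    // usable
const second = urls.get(u);   // still usable within the same task

let failed = false;
try {
  throw new Error("obscure error");   // the script fails before img.src = u
} catch (e) {
  failed = true;
}

returnToEventLoop();
const stillRegistered = urls.has(u);  // the URL is gone even though unused
```

The URL can be used any number of times before the script returns, and it
is released afterwards even on the error path where it was never consumed.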

--
Glenn Maynard


Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Jonas Sicking
On Wed, Mar 28, 2012 at 4:36 PM, Glenn Maynard gl...@zewt.org wrote:
 Here's another proposal, which is an iteration of the previous.  It's based
 on the microtask concept, which is creeping up here and there but hasn't
 yet been properly defined.  The idea is that microtasks can be queued (call
 it "queue a microtask"), and the microtask queue is executed by the event
 loop as soon as the current task completes, so it executes as soon as the
 outermost task returns to the event loop.

 oneTimeOnly (a poor name in this proposal) would simply queue a microtask to
 revoke the URL.

 This is simpler, and answers a lot of questions.  It means you can use the
 URL as many times as you want synchronously, since it's not released until
 the script returns.  Any cases where the ordering may not be strictly
 defined (eg. the <video><video> case in
 http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/1265.html may
 be like this; I don't know how innerHTML works, exactly) are now defined:
 both video elements would get the object.

 It has another nice side-effect: it's much less prone to leaks.  For
 example, under previous approaches, the following code would leak the blob:

 function updateProgressMeter() { throw "obscure error"; }
 url = URL.createObjectURL(blob, {oneTimeOnly: true});
 updateProgressMeter();
 img.src = url; // never happens

 Since the URL is never actually used, the blob reference leaks.  You'd have
 to work around this with careful exception handling, which is precisely the
 sort of thing oneTimeOnly is supposed to avoid.  With this proposal, the URL
 would always be revoked when the script returns to the event loop, whether
 or not it was actually used.

 This would still require work in each URL-consuming spec, to define taking a
 reference to the underlying blob's data when it receives an object URL.  I
 think this is inherent to the feature.

 oneTimeOnly would be the wrong name with this approach; it should be
 something like autoRelease.

 This has one drawback: it doesn't work nicely in long-running Workers, which
 may never return to the event loop at all.  I think that's probably an
 acceptable tradeoff.

This is an interesting idea for sure. It doesn't solve any of the
issues I brought up, so we still need to define when dereferencing
happens. But it does solve the problem of the URL leaking if it never
gets dereferenced, which is nice.

/ Jonas



Re: File API oneTimeOnly is too poorly defined

2012-03-28 Thread Glenn Maynard
On Wed, Mar 28, 2012 at 7:49 PM, Jonas Sicking jo...@sicking.cc wrote:

  This would still require work in each URL-consuming spec, to define taking
  a reference to the underlying blob's data when it receives an object URL.
  I think this is inherent to the feature.

 This is an interesting idea for sure. It doesn't solve any of the
 issues I brought up, so we still need to define when dereferencing
 happens. But it does solve the problem of the URL leaking if it never
 gets dereferenced, which is nice.


Right, that's what I meant above.  The dereferencing step needs to be
defined no matter what you do.  This just makes it easier to define
(eliminating task ordering problems as a source of problems).

Also, I still think that all APIs should consistently do that as soon as
they first see the URL.  For example, XHR should do it in open(), not in
send().  That makes it easy for developers to understand when the
dereferencing actually happens (in the general case, for all APIs).

One other thing: dereferencing should take a reference to the underlying
data of the Blob, not the Blob itself, so it's unaffected by neutering
(transfers and Blob.close).  That avoids a whole category of problems.
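
The difference can be sketched with a toy class. This is an illustration
only: ToyBlob, dereferenceBlob and dereferenceData are hypothetical names,
and close() here merely imitates neutering.

```javascript
// Toy Blob whose close() neuters it. A hypothetical model, not the real API.
class ToyBlob {
  constructor(data) { this._data = data; }
  get data() {
    if (this._data === null) throw new Error("blob is closed");
    return this._data;
  }
  close() { this._data = null; }
}

// Holds the Blob itself: a later close() breaks the pending read.
function dereferenceBlob(blob) {
  return () => blob.data;
}

// Holds the underlying data: a later close() does not affect the read.
function dereferenceData(blob) {
  const data = blob.data;
  return () => data;
}

const blob = new ToyBlob("hello");
const readViaBlob = dereferenceBlob(blob);
const readViaData = dereferenceData(blob);
blob.close();
```

After close(), readViaData() still returns the bytes, while readViaBlob()
throws because it goes through the neutered object.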

-- 
Glenn Maynard