Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-07-07 Thread Jonas Sicking
On Tue, Jun 21, 2011 at 10:17 AM, Arun Ranganathan  wrote:
> Sorry if these have all been discussed before. I just read the File API for
> the first time and 2 random questions popped in my head.
> 1) If I'm using readAsText with a particular encoding and the data in the
> file is not actually in that encoding such that code points in the file can
> not be mapped to valid code points what happens? Is that implementation
> specific or is it specified? I can imagine at least 3 different behaviors.
>
> This should be specified better and isn't.  I'm inclined to then return the
> file in the encoding it is in rather than force an encoding (in other words,
> ignore the encoding parameter if it is determined that code points can't be
> mapped to valid code points in the encoding... also note that we say to
> "Replace bytes or sequences of bytes that are not valid according to
> the charset with a single U+FFFD character [Unicode]").  Right now, the spec
> isn't specific to this scenario ("... if the user agent cannot decode blob
> using encoding, then let charset be null" before the algorithmic steps,
> which essentially forces UTF-8).

I definitely don't think we should use some type of autodetecting of
charset if people explicitly define one. That is likely to create more
confusion and bugs than it'll solve problems.

I don't fully understand what's undefined if we say that any invalid
character should be replaced by U+FFFD? I.e. why isn't that enough?
I'm not at all doubting that it isn't enough, but I'd like to
understand how it's not enough in order to fix it.

> 2) If I'm reading using readAsText a multibyte encoding (utf-8, shift-jis,
> etc..) is it implementation dependent whether or not it can return partial
> characters when returning partial results during reading? In other words,
>  Let's say the next character in a file is a 3 byte code point but the
> reader has only read 2 of those 3 bytes so far. Is implementation dependent
> whether result includes those 2 bytes before reading the 3rd byte or not?
>
> Yes, partial results are currently implementation dependent; the spec. only
> says they SHOULD be returned.  There was reluctance to have MUST condition
> on partial file reads.  I'm open to revisiting this decision if the
> justification is a really good one.

I absolutely don't think we should return partial results. From the
page author's point of view .result should "stream" in. Once a
character has been appended to it, it should never change.

/ Jonas



Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-07-06 Thread Arun Ranganathan

On 6/30/11 6:01 PM, Gregg Tavares (wrk) wrote:



On Tue, Jun 21, 2011 at 10:17 AM, Arun Ranganathan wrote:



Sorry if these have all been discussed before. I just read the
File API for the first time and 2 random questions popped in my
head.

1) If I'm using readAsText with a particular encoding and the
data in the file is not actually in that encoding such that code
points in the file can not be mapped to valid code points what
happens? Is that implementation specific or is it specified? I
can imagine at least 3 different behaviors.


This should be specified better and isn't.  I'm inclined to then
return the file in the encoding it is in rather than force an
encoding (in other words, ignore the encoding parameter if it is
determined that code points can't be mapped to valid code points
in the encoding... also note that we say to "Replace bytes or
sequences of bytes that are not valid according to the charset with
a single U+FFFD character [Unicode]").  Right now,
the spec isn't specific to this scenario ("... if the user agent
cannot decode blob using encoding, then let charset be null"
before the algorithmic steps, which essentially forces UTF-8).

Can we list your three behaviors here, just so we get them on
record?  Which behavior do you think is ideal?  More importantly,
is substituting U+FFFD and "defaulting" to UTF-8 good enough for
your scenario above?


The 3 off the top of my head were

1) Throw an exception. (content not valid for encoding)
2) Remap bad codes to some other value (sounds like that's the one above)
3) Remove the bad character

I see you've listed a 4th, "Ignore the encoding on error, assume 
utf-8". That one seems problematic because of partial reads. If you 
are decoding as shift-jis, have returned a partial read, and then 
later hit a bad code point, the stuff you've seen previously will all 
need to change by switching to no encoding.


I'd choose #2, which it sounds like is already the case according to the spec.


This is the case in the spec. currently, but:


Regardless of what solution is chosen is there a way for me to know 
something was lost?




I don't think so, actually. And I'm not entirely sure how we can allow 
for such a way, unless we throw an error or something.




2) If I'm reading using readAsText a multibyte encoding (utf-8,
shift-jis, etc..) is it implementation dependent whether or not
it can return partial characters when returning partial results
during reading? In other words,  Let's say the next character in
a file is a 3 byte code point but the reader has only read 2 of
those 3 bytes so far. Is implementation dependent whether result
includes those 2 bytes before reading the 3rd byte or not?



Yes, partial results are currently implementation dependent; the
spec. only says they SHOULD be returned.  There was reluctance to
have MUST condition on partial file reads.  I'm open to revisiting
this decision if the justification is a really good one.


I'm assuming by "MUST condition" you mean a UA doesn't have to support 
partial reads at all, not that how partial reads work shouldn't be 
specified.


Here's an example.

Assume we stick with unknown characters get mapped to U+FFFD.
Assume my stream is utf8 and in hex the bytes are.

E3 83 91 E3 83 91

That's 2 code points of 0x30D1. Now assume the reader has only read 
the first 5 bytes.


Should the partial results be

(a) filereader.result.length == 1 where the content is 0x30D1

 or should the partial result be

(b) filereader.result.length == 2 where the content is 0x30D1, 0xFFFD 
 because at that point the E3 83 at the end of the partial result is 
not a valid codepoint


I think the spec should specify that if the UA supports partial reads 
it should follow example (a)


OK.  I think the spec. needs more bolstering here.  Thanks for your 
example.  This makes it clearer.


-- A*


Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-30 Thread Gregg Tavares (wrk)
On Tue, Jun 21, 2011 at 10:17 AM, Arun Ranganathan  wrote:

>
> Sorry if these have all been discussed before. I just read the File API for
> the first time and 2 random questions popped in my head.
>
>  1) If I'm using readAsText with a particular encoding and the data in the
> file is not actually in that encoding such that code points in the file can
> not be mapped to valid code points what happens? Is that implementation
> specific or is it specified? I can imagine at least 3 different behaviors.
>
>
> This should be specified better and isn't.  I'm inclined to then return the
> file in the encoding it is in rather than force an encoding (in other words,
> ignore the encoding parameter if it is determined that code points can't be
> mapped to valid code points in the encoding... also note that we say to
> "Replace bytes or sequences of bytes that are not valid according to the
> charset with a single U+FFFD character [Unicode]").  Right now, the spec
> isn't specific to this scenario ("... if the
> user agent cannot decode blob using encoding, then let charset be null"
> before the algorithmic steps, which essentially forces UTF-8).
>
> Can we list your three behaviors here, just so we get them on record?
>  Which behavior do you think is ideal?  More importantly, is substituting
> U+FFFD and "defaulting" to UTF-8 good enough for your scenario above?
>

The 3 off the top of my head were

1) Throw an exception. (content not valid for encoding)
2) Remap bad codes to some other value (sounds like that's the one above)
3) Remove the bad character

I see you've listed a 4th, "Ignore the encoding on error, assume utf-8".
That one seems problematic because of partial reads. If you are decoding as
shift-jis, have returned a partial read, and then later hit a bad code
point, the stuff you've seen previously will all need to change by switching
to no encoding.

I'd choose #2, which it sounds like is already the case according to the spec.

Regardless of what solution is chosen is there a way for me to know
something was lost?
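
For the record, the three behaviors above (throw, remap, remove) can be sketched with the TextDecoder interface from the later Encoding Standard. That API is not part of this draft or this thread; it is used here only for illustration, and the function names are made up for the example. The check at the end is a rough heuristic for "was something lost?": it cannot tell a substituted U+FFFD from one that was really in the file.

```typescript
// 0xFF can never appear in well-formed UTF-8, so this input is invalid.
const input = new Uint8Array([0x61, 0xff, 0x62]); // "a", bad byte, "b"

// Behavior 1: throw when the content is not valid for the encoding.
function decodeOrThrow(data: Uint8Array): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(data);
}

// Behavior 2: remap each invalid sequence to U+FFFD.
function decodeWithReplacement(data: Uint8Array): string {
  return new TextDecoder("utf-8").decode(data);
}

// Behavior 3: remove the bad character. This sketch also drops any
// U+FFFD that was genuinely present in the source text.
function decodeAndDrop(data: Uint8Array): string {
  return decodeWithReplacement(data).replace(/\uFFFD/g, "");
}

let threw = false;
try {
  decodeOrThrow(input);
} catch (_e) {
  threw = true;
}

const replaced = decodeWithReplacement(input);
const dropped = decodeAndDrop(input);

// Heuristic only: a U+FFFD in the output suggests that bytes were replaced.
const possiblyLossy = replaced.indexOf("\uFFFD") !== -1;
```

With behavior 2 the caller has to scan the result to notice a problem, which is the gap the question above is pointing at.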


>
>
>
>  2) If I'm reading using readAsText a multibyte encoding (utf-8,
> shift-jis, etc..) is it implementation dependent whether or not it can
> return partial characters when returning partial results during reading? In
> other words,  Let's say the next character in a file is a 3 byte code point
> but the reader has only read 2 of those 3 bytes so far. Is implementation
> dependent whether result includes those 2 bytes before reading the 3rd byte
> or not?
>
>
> Yes, partial results are currently implementation dependent; the spec. only
> says they SHOULD be returned.  There was reluctance to have MUST condition
> on partial file reads.  I'm open to revisiting this decision if the
> justification is a really good one.
>

I'm assuming by "MUST condition" you mean a UA doesn't have to support
partial reads at all, not that how partial reads work shouldn't be
specified.

Here's an example.

Assume we stick with unknown characters get mapped to U+FFFD.
Assume my stream is utf8 and in hex the bytes are.

E3 83 91 E3 83 91

That's 2 code points of 0x30D1. Now assume the reader has only read the
first 5 bytes.

Should the partial results be

(a) filereader.result.length == 1 where the content is 0x30D1

 or should the partial result be

(b) filereader.result.length == 2 where the content is 0x30D1, 0xFFFD
 because at that point the E3 83 at the end of the partial result is not a
valid codepoint

I think the spec should specify that if the UA supports partial reads it
should follow example (a)
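
A rough sketch of the difference between (a) and (b), using the TextDecoder interface from the later Encoding Standard. That API is not in this draft; it only stands in for whatever decoder a UA uses internally, and the variable names are made up for the example.

```typescript
// The byte sequence from the example: U+30D1 encoded twice in UTF-8.
const bytes = new Uint8Array([0xe3, 0x83, 0x91, 0xe3, 0x83, 0x91]);
const firstFive = bytes.subarray(0, 5);
const lastOne = bytes.subarray(5);

// (a) A streaming decoder holds back the incomplete trailing sequence
// (E3 83) until the rest arrives, so the partial result only ever grows.
const streaming = new TextDecoder("utf-8");
const partialA = streaming.decode(firstFive, { stream: true });
const finalA =
  partialA + streaming.decode(lastOne, { stream: true }) + streaming.decode();

// (b) Decoding the first five bytes as if they were the whole input turns
// the dangling E3 83 into U+FFFD, which would have to change later.
const partialB = new TextDecoder("utf-8").decode(firstFive);
```

Under (a) the partial result is a prefix of the final result; under (b) it is not, because the trailing U+FFFD has to be replaced once the sixth byte is read.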




>
> -- A*
>


Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-21 Thread Arun Ranganathan
Sorry if these have all been discussed before. I just read the File 
API for the first time and 2 random questions popped in my head.


1) If I'm using readAsText with a particular encoding and the data in 
the file is not actually in that encoding such that code points in the 
file can not be mapped to valid code points what happens? Is that 
implementation specific or is it specified? I can imagine at least 3 
different behaviors.


This should be specified better and isn't.  I'm inclined to then return 
the file in the encoding it is in rather than force an encoding (in 
other words, ignore the encoding parameter if it is determined that code 
points can't be mapped to valid code points in the encoding... also note 
that we say to "Replace bytes or sequences of bytes that are not valid 
according to the charset with a single U+FFFD character
[Unicode]").  Right now, the 
spec isn't specific to this scenario ("... if the user agent cannot 
decode blob using encoding, then let charset be null" before the 
algorithmic steps, which essentially forces UTF-8).


Can we list your three behaviors here, just so we get them on record? 
 Which behavior do you think is ideal?  More importantly, is 
substituting U+FFFD and "defaulting" to UTF-8 good enough for your 
scenario above?




2) If I'm reading using readAsText a multibyte encoding (utf-8, 
shift-jis, etc..) is it implementation dependent whether or not it can 
return partial characters when returning partial results during 
reading? In other words,  Let's say the next character in a file is a 
3 byte code point but the reader has only read 2 of those 3 bytes so 
far. Is implementation dependent whether result includes those 2 bytes 
before reading the 3rd byte or not?




Yes, partial results are currently implementation dependent; the spec. 
only says they SHOULD be returned.  There was reluctance to have MUST 
condition on partial file reads.  I'm open to revisiting this decision 
if the justification is a really good one.


-- A*


Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-21 Thread Arun Ranganathan

On 6/7/11 1:43 PM, Jian Li wrote:

I have a couple questions regarding abort behavior.

* If the reading is completed and the loadend event has been
  fired, do we want to fire loadend event again when abort()
  method is called?



Right now, if reading is completed (with loadend fired, etc.), abort() 
is specified to *still* fire abort and loadend.  Do you disagree here?

result will be null.

* Do we want to reset error to null or leave it intact when
  abort() method is called?

The step to fire an error event is eliminated.  Chrome seems to set an 
error to ABORT_ERR which the spec no longer says to do.  I'm inclined to 
leave ABORT_ERR around, but right now, it never really gets used.


-- A*


Thanks,

Jian

On Wed, May 11, 2011 at 3:49 PM, Arun Ranganathan wrote:


The Editor's Draft of the FileAPI --
http://dev.w3.org/2006/webapi/FileAPI/ -- has had some updates.
 These are the notable changes:

1. Blob.slice behavior has changed to more closely match
String.prototype.slice from ECMAScript (and Array.prototype.slice
semantically).  I think we're the first host object to have a
slice outside of ECMAScript primitives; some builds of browsers
have already vendor-prefixed slice till it becomes more stable
(and till the new behavior becomes more diffuse on the web -- Blob
will soon be used in the Canvas API, etc.).  I'm optimistic this
will happen soon enough.  Thanks to all the browser projects that
helped initiate the change -- the consistency is desirable.

2. The read methods on FileReader raise a new exception --
OperationNotAllowedException -- if multiple concurrent reads are
invoked.  I talked this over with Jonas; we think that rather than
reuse DOMException error codes (like INVALID_STATE_ERR), these
kinds of scenarios should throw a distinct exception.  Some things
on the web (as in life) are simply not allowed.  It may be useful
to reuse this exception in other places.

3. FileReader.abort( ) behavior has changed.

4. There is a closer integration with event loops as defined by HTML.

For browser projects with open bug databases, I'll log some bugs
based on test cases I've run on each implementation.  A few
discrepancies exist in implementations I've tested; for instance,
setting FileReader.result to the empty string vs. setting it to
null, and when exceptions are thrown vs. use of the error event.

Feedback encouraged!  Draft at http://dev.w3.org/2006/webapi/FileAPI/

-- A*








Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-21 Thread Arun Ranganathan

On 6/7/11 5:04 PM, Jonas Sicking wrote:

On Tue, Jun 7, 2011 at 11:51 AM, Jian Li  wrote:


On Tue, Jun 7, 2011 at 11:23 AM, Jonas Sicking  wrote:

On Tue, Jun 7, 2011 at 10:43 AM, Jian Li  wrote:

I have a couple questions regarding abort behavior.

If the reading is completed and the loadend event has been fired, do we
want to fire loadend event again when abort() method is called?

No


Do we want to reset error to null or leave it intact when abort() method
is called?

If called after load/abort/error has fired the calling abort() should
just throw an exception and not alter the FileReader object in any
way.

Do you mean we should throw if abort() is called after load/abort/error has
been fired but before loadend event has been fired?

Yes.


If so, what kind of exception should we throw?

I need to get updated on the status on various exceptions, so I don't
have an opinion on this at this time.


The spec only mentions that "If readyState = DONE set result to null."

I actually disagree with that sentence.


So I'm not exactly sure we should throw here.  Why do you disagree?  
Have you revisited this opinion?  What should the behavior be?



-- A*


/ Jonas





Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-14 Thread Gregg Tavares (wrk)
Sorry if these have all been discussed before. I just read the File API for
the first time and 2 random questions popped in my head.

1) If I'm using readAsText with a particular encoding and the data in the
file is not actually in that encoding such that code points in the file can
not be mapped to valid code points what happens? Is that implementation
specific or is it specified? I can imagine at least 3 different behaviors.

2) If I'm reading using readAsText a multibyte encoding (utf-8, shift-jis,
etc..) is it implementation dependent whether or not it can return partial
characters when returning partial results during reading? In other words,
 Let's say the next character in a file is a 3 byte code point but the
reader has only read 2 of those 3 bytes so far. Is implementation dependent
whether result includes those 2 bytes before reading the 3rd byte or not?


Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-08 Thread Jonas Sicking
On Wed, Jun 8, 2011 at 8:16 AM, Robin Berjon  wrote:
> On May 12, 2011, at 00:49 , Arun Ranganathan wrote:
>> 2. The read methods on FileReader raise a new exception -- 
>> OperationNotAllowedException -- if multiple concurrent reads are invoked.  I 
>> talked this over with Jonas; we think that rather than reuse DOMException 
>> error codes (like INVALID_STATE_ERR), these kinds of scenarios should throw 
>> a distinct exception.  Some things on the web (as in life) are simply not 
>> allowed.  It may be useful to reuse this exception in other places.
>
> I don't have a strong opinion on the ISSUE-182 side of this, but if we're 
> going to create a new exception type that is expected to be reused by other 
> specs can we at least learn from the past and not use numerical codes to 
> identify different variants of the same exception (I'm presuming that other 
> specs reusing this could want to be more precise about why the operation is 
> not allowed, e.g. "user said no", or "Gandalf doesn't want you to pass")? 
> Reuse of DOMException was a long and at times painful coordination effort to 
> make sure that people didn't use the same codes in their own extensions.

Yes. These should definitely not have a .code property unless needed
for backwards compatibility (which shouldn't be the case for
exceptions thrown by FileAPI, unless we want to share them with
DOM-Core methods).
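
To illustrate, here is a minimal sketch of an exception that is told apart by its name rather than by a numeric .code. The class shape and the SketchReader type are hypothetical, invented for this example; the draft only gives the exception name.

```typescript
// Hypothetical shape: no numeric .code, only a name and a message.
class OperationNotAllowedException extends Error {
  constructor(message: string) {
    super(message);
    this.name = "OperationNotAllowedException";
  }
}

// Hypothetical stand-in for a reader that refuses concurrent reads.
class SketchReader {
  private loading = false;

  startRead(): void {
    if (this.loading) {
      throw new OperationNotAllowedException("a read is already in progress");
    }
    this.loading = true;
  }

  finish(): void {
    this.loading = false;
  }
}

const reader = new SketchReader();
reader.startRead();

let caughtName = "";
try {
  reader.startRead(); // second read while the first is still pending
} catch (e) {
  // Callers branch on the name, so no code numbers need coordinating.
  caughtName = e instanceof Error ? e.name : String(e);
}
```

Another spec could reuse the same pattern with a more precise message without reserving a number anywhere.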

/ Jonas



Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-08 Thread Robin Berjon
On May 12, 2011, at 00:49 , Arun Ranganathan wrote:
> 2. The read methods on FileReader raise a new exception -- 
> OperationNotAllowedException -- if multiple concurrent reads are invoked.  I 
> talked this over with Jonas; we think that rather than reuse DOMException 
> error codes (like INVALID_STATE_ERR), these kinds of scenarios should throw a 
> distinct exception.  Some things on the web (as in life) are simply not 
> allowed.  It may be useful to reuse this exception in other places.

I don't have a strong opinion on the ISSUE-182 side of this, but if we're going 
to create a new exception type that is expected to be reused by other specs can 
we at least learn from the past and not use numerical codes to identify 
different variants of the same exception (I'm presuming that other specs 
reusing this could want to be more precise about why the operation is not 
allowed, e.g. "user said no", or "Gandalf doesn't want you to pass")? Reuse of 
DOMException was a long and at times painful coordination effort to make sure 
that people didn't use the same codes in their own extensions.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon




Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-07 Thread Jonas Sicking
On Tue, Jun 7, 2011 at 11:51 AM, Jian Li  wrote:
>
>
> On Tue, Jun 7, 2011 at 11:23 AM, Jonas Sicking  wrote:
>>
>> On Tue, Jun 7, 2011 at 10:43 AM, Jian Li  wrote:
>> > I have a couple questions regarding abort behavior.
>> >
>> > If the reading is completed and the loadend event has been fired, do we
>> > want to fire loadend event again when abort() method is called?
>>
>> No
>>
>> > Do we want to reset error to null or leave it intact when abort() method
>> > is called?
>>
>> If called after load/abort/error has fired the calling abort() should
>> just throw an exception and not alter the FileReader object in any
>> way.
>
> Do you mean we should throw if abort() is called after load/abort/error has
> been fired but before loadend event has been fired?

Yes.

> If so, what kind of exception should we throw?

I need to get updated on the status on various exceptions, so I don't
have an opinion on this at this time.

> The spec only mentions that "If readyState = DONE set result to null."

I actually disagree with that sentence.

/ Jonas



Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-07 Thread Jian Li
On Tue, Jun 7, 2011 at 11:23 AM, Jonas Sicking  wrote:

> On Tue, Jun 7, 2011 at 10:43 AM, Jian Li  wrote:
> > I have a couple questions regarding abort behavior.
> >
> > If the reading is completed and the loadend event has been fired, do we
> > want to fire loadend event again when abort() method is called?
>
> No
>
> > Do we want to reset error to null or leave it intact when abort() method
> > is called?
>
> If called after load/abort/error has fired the calling abort() should
> just throw an exception and not alter the FileReader object in any
> way.
>

Do you mean we should throw if abort() is called after load/abort/error has
been fired but before loadend event has been fired? If so, what kind of
exception should we throw?

The spec only mentions that "If readyState = DONE set result to null."

>
> / Jonas
>


Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-07 Thread Jonas Sicking
On Tue, Jun 7, 2011 at 10:43 AM, Jian Li  wrote:
> I have a couple questions regarding abort behavior.
>
> If the reading is completed and the loadend event has been fired, do we want
> to fire loadend event again when abort() method is called?

No

> Do we want to reset error to null or leave it intact when abort() method is
> called?

If called after load/abort/error has fired the calling abort() should
just throw an exception and not alter the FileReader object in any
way.
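
A rough sketch of that proposal, not of what the draft currently says. The AbortSketch class is hypothetical, the exception type is left as a plain Error because no type has been picked, and treating abort() on a reader that never started as a no-op is an assumption of the sketch.

```typescript
type ReadyState = "EMPTY" | "LOADING" | "DONE";

// Hypothetical model of a reader; events are recorded as strings.
class AbortSketch {
  readyState: ReadyState = "EMPTY";
  result: string | null = null;
  readonly fired: string[] = [];

  start(): void {
    this.readyState = "LOADING";
  }

  complete(text: string): void {
    this.result = text;
    this.readyState = "DONE";
    this.fired.push("load", "loadend");
  }

  abort(): void {
    if (this.readyState === "DONE") {
      // Proposal: throw and leave the object untouched.
      throw new Error("abort() called after the read has finished");
    }
    if (this.readyState === "EMPTY") {
      return; // assumption: nothing to abort, nothing to do
    }
    this.readyState = "DONE";
    this.fired.push("abort", "loadend");
  }
}

const done = new AbortSketch();
done.start();
done.complete("hi");

let abortThrew = false;
try {
  done.abort();
} catch (_e) {
  abortThrew = true;
}
```

After the failed abort() the result is still "hi" and no second loadend has been recorded.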

/ Jonas



Re: [FileAPI] Updates to FileAPI Editor's Draft

2011-06-07 Thread Jian Li
I have a couple questions regarding abort behavior.

   - If the reading is completed and the loadend event has been fired, do we
   want to fire loadend event again when abort() method is called?
   - Do we want to reset error to null or leave it intact when abort()
   method is called?

Thanks,

Jian

On Wed, May 11, 2011 at 3:49 PM, Arun Ranganathan  wrote:

> The Editor's Draft of the FileAPI -- http://dev.w3.org/2006/webapi/FileAPI/
> -- has had some updates.
>  These are the notable changes:
>
> 1. Blob.slice behavior has changed to more closely match
> String.prototype.slice from ECMAScript (and Array.prototype.slice
> semantically).  I think we're the first host object to have a slice outside
> of ECMAScript primitives; some builds of browsers have already
> vendor-prefixed slice till it becomes more stable (and till the new behavior
> becomes more diffuse on the web -- Blob will soon be used in the Canvas API,
> etc.).  I'm optimistic this will happen soon enough.  Thanks to all the
> browser projects that helped initiate the change -- the consistency is
> desirable.
>
> 2. The read methods on FileReader raise a new exception --
> OperationNotAllowedException -- if multiple concurrent reads are invoked.  I
> talked this over with Jonas; we think that rather than reuse DOMException
> error codes (like INVALID_STATE_ERR), these kinds of scenarios should throw
> a distinct exception.  Some things on the web (as in life) are simply not
> allowed.  It may be useful to reuse this exception in other places.
>
> 3. FileReader.abort( ) behavior has changed.
>
> 4. There is a closer integration with event loops as defined by HTML.
>
> For browser projects with open bug databases, I'll log some bugs based on
> test cases I've run on each implementation.  A few discrepancies exist in
> implementations I've tested; for instance, setting FileReader.result to the
> empty string vs. setting it to null, and when exceptions are thrown vs. use
> of the error event.
>
> Feedback encouraged!  Draft at http://dev.w3.org/2006/webapi/FileAPI/
>
> -- A*
>
>
>
>


[FileAPI] Updates to FileAPI Editor's Draft

2011-05-11 Thread Arun Ranganathan
The Editor's Draft of the FileAPI -- 
http://dev.w3.org/2006/webapi/FileAPI/ -- has had some updates.  These 
are the notable changes:


1. Blob.slice behavior has changed to more closely match 
String.prototype.slice from ECMAScript (and Array.prototype.slice 
semantically).  I think we're the first host object to have a slice 
outside of ECMAScript primitives; some builds of browsers have already 
vendor-prefixed slice till it becomes more stable (and till the new 
behavior becomes more diffuse on the web -- Blob will soon be used in 
the Canvas API, etc.).  I'm optimistic this will happen soon enough.  
Thanks to all the browser projects that helped initiate the change -- 
the consistency is desirable.


2. The read methods on FileReader raise a new exception -- 
OperationNotAllowedException -- if multiple concurrent reads are 
invoked.  I talked this over with Jonas; we think that rather than reuse 
DOMException error codes (like INVALID_STATE_ERR), these kinds of 
scenarios should throw a distinct exception.  Some things on the web (as 
in life) are simply not allowed.  It may be useful to reuse this 
exception in other places.


3. FileReader.abort( ) behavior has changed.

4. There is a closer integration with event loops as defined by HTML.
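
As a rough illustration of item 1, this is how String.prototype.slice resolves its start and end arguments. The sketch assumes Blob.slice clamps byte offsets the same way; the draft is the authority on the exact steps. The helper names are made up for the example.

```typescript
// Resolve (start, end) against a length the way String.prototype.slice
// does: negative values count back from the end, everything is clamped
// to [0, size], and an end before the start yields an empty range.
function resolveSliceRange(
  size: number,
  start: number = 0,
  end: number = size
): [number, number] {
  const clamp = (i: number): number =>
    i < 0 ? Math.max(size + i, 0) : Math.min(i, size);
  const from = clamp(start);
  const to = clamp(end);
  return [from, Math.max(to, from)];
}

// Apply the resolved range to a string so it can be compared with slice().
function sliceViaRange(s: string, start?: number, end?: number): string {
  const [from, to] = resolveSliceRange(s.length, start, end);
  return s.substring(from, to);
}

const sample = "abcdef";
const lastTwo = sliceViaRange(sample, -2);   // same as sample.slice(-2)
const middle = sliceViaRange(sample, 1, -1); // same as sample.slice(1, -1)
const empty = sliceViaRange(sample, 4, 2);   // same as sample.slice(4, 2)
```

For a Blob the same range would be read as byte offsets into the blob rather than character positions.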

For browser projects with open bug databases, I'll log some bugs based 
on test cases I've run on each implementation.  A few discrepancies 
exist in implementations I've tested; for instance, setting 
FileReader.result to the empty string vs. setting it to null, and when 
exceptions are thrown vs. use of the error event.


Feedback encouraged!  Draft at http://dev.w3.org/2006/webapi/FileAPI/

-- A*