Re: File API to separate reading from files

Arun Ranganathan Tue, 22 Sep 2009 22:11:33 -0700

Arun wrote:

There is a lot that is attractive about InputStream, and I think that it can be used in other specifications, especially when discussing Camera APIs, streaming from web apps (conferencing), etc. I also like the idea of DataHandler. When we define a byte primitive, it can be used in conjunction with the stream interface. For additional read features (fseek) this is also useful. I also appreciate that you have pointed out in a subsequent email [1] that it is possible to "sidestep the issue of dealing with bytes directly." Managing bytes properly, with the right primitives, is one reason why, despite having looked at the Java I/O APIs [2], I went with something simpler. I think that we should have streams at some point, and I'm amenable to looking at them in a subsequent iteration of the File API. It's worth saying here that the appeal of streams is for *multiple use cases* for both File API and other APIs, and *not* because the Java I/O model is one we should emulate. Programmer taste and choice about coining APIs is subjective.

Nikunj wrote in response:

I respect your point on taste; however, I am more interested in composability than in the maturity of Java I/O.

Firstly, what Jonas proposed as the Alternative File API [1] uses an event model to address use cases such as progress feedback and separating reading from file objects. I expressed reservations about complexity, but saw more posts in favor of it than against it. This model has the advantages that come with an event model (separate notifications like onprogress and onerror, allowing specific 'isolated' code, etc.) along with a signature similarity to XHR (which developers are familiar with). My caveats about the model were mainly about understanding trade-offs. I'm reconciled to having a v1 of the File API specification based on Jonas' proposal (hopefully in good shape by the upcoming TPAC), and I believe we can iterate from there.

It would be useful to see how you meet the following requirements:

1. incremental reading of a file's data

The proposal [1] reuses the FileData interface, which will still support a slice(offset, length) method that returns another FileData object within stipulated byte ranges. I hope to flesh out what happens under range mathematics errors a bit more clearly (e.g. whether an exception is raised). Along with progress events, I think this use case is addressed.
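To make the slicing and range questions concrete, here is a minimal TypeScript sketch over an in-memory buffer. `FileDataSketch` and `chunks` are hypothetical names, not part of the proposal, and throwing a `RangeError` on an out-of-range slice is just one of the options still open above:

```typescript
// Hypothetical stand-in for the FileData interface: it wraps an in-memory
// byte array, whereas the real interface would be backed by a file on disk.
class FileDataSketch {
  readonly size: number;

  constructor(private readonly bytes: Uint8Array) {
    this.size = bytes.length;
  }

  // slice(offset, length) returns a new FileDataSketch over the requested
  // byte range. This sketch throws a RangeError for out-of-range requests;
  // silently clamping the range would be the other reasonable choice.
  slice(offset: number, length: number): FileDataSketch {
    if (offset < 0 || length < 0 || offset + length > this.size) {
      throw new RangeError("invalid range: offset=" + offset + ", length=" + length);
    }
    return new FileDataSketch(this.bytes.subarray(offset, offset + length));
  }

  // Read a single byte, indexed relative to the start of this slice.
  byteAt(index: number): number {
    return this.bytes[index];
  }
}

// Incremental reading: split the data into consecutive chunkSize-byte
// slices; only the final slice may be shorter.
function chunks(data: FileDataSketch, chunkSize: number): FileDataSketch[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const out: FileDataSketch[] = [];
  for (let offset = 0; offset < data.size; offset += chunkSize) {
    out.push(data.slice(offset, Math.min(chunkSize, data.size - offset)));
  }
  return out;
}
```

Computing each chunk's length from the remaining size keeps the loop itself from ever triggering a range error; only a caller asking for bytes past the end does.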


2. concurrent access to file data

(Note that "FileRequest" and "FileReader" are used interchangeably in [1]; I personally prefer FileReader as a name.) Nothing precludes multiple FileReader objects from accessing the same file, but not all implementations need to fire notifications (events) concurrently. Do you have a specific use case in mind?

3. access to all file metadata without needing to read the file

(Note that in FileRequest, which I think should be named FileReader, the read* methods take File objects as parameters, although the email proposal [1] says that they take FileData objects. Jonas means File objects.)

The answer to your question depends on what you mean by *all* file metadata. File objects (which inherit from FileData objects) expose name and mediaType properties, along with size (from FileData). But suppose you wanted ID3 information from an MP3 file. In this case (assuming ID3v1 usage), you would *have* to read the file and look for the 128-byte chunk beginning with TAG. This can be done in two ways:

i. Using slice() and range mathematics based on the file's size to get to the end of the file and look at its last 128 bytes as a separate FileData object (since ID3v1 puts stuff at the end). Not ideal.
ii. Using read methods and working with the file format. Again, not dripping with syntactic sugar, but certainly feasible.
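Approach (i) can be sketched as follows, assuming the standard ID3v1 layout: "TAG", then 30-byte title, artist, and album fields, a 4-byte year, a 30-byte comment, and a 1-byte genre, for 128 bytes in total. The sketch works on an in-memory `Uint8Array` rather than a FileData object, and `parseId3v1`/`readField` are hypothetical helpers:

```typescript
// ID3v1 layout (the last 128 bytes of an MP3 file):
//   "TAG"(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
interface Id3v1Tag {
  title: string;
  artist: string;
  album: string;
  year: string;
  genre: number;
}

const ID3V1_SIZE = 128;

// Decode a fixed-width field, treating each byte as a Latin-1 character,
// stopping at the first NUL and dropping trailing space padding.
function readField(bytes: Uint8Array, start: number, length: number): string {
  let s = "";
  for (let i = start; i < start + length && bytes[i] !== 0; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s.replace(/ +$/, "");
}

// Use the total size to locate the trailing 128 bytes, then inspect only
// that range. Returns null when no ID3v1 tag is present.
function parseId3v1(file: Uint8Array): Id3v1Tag | null {
  if (file.length < ID3V1_SIZE) return null;
  const tag = file.subarray(file.length - ID3V1_SIZE);
  if (readField(tag, 0, 3) !== "TAG") return null;
  return {
    title: readField(tag, 3, 30),
    artist: readField(tag, 33, 30),
    album: readField(tag, 63, 30),
    year: readField(tag, 93, 4),
    genre: tag[127],
  };
}
```

Only the final 128 bytes are ever examined, which is the point of approach (i); the awkward part is that all the offsets are hand-computed from the file format.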

I agree that metadata extraction could be made better, but I'm happy with what the existing proposal has. I also don't see how any other proposal improves on this, even if you read into a stream buffer.

I am happy with the existing metadata extraction for a v1, and believe that as we work out more audio and video issues on the platform, we can get to specific metadata issues. Can you clear up what you mean by "all metadata"?

4. separation of error handling from file reading

In Jonas' proposal, this isn't done cleanly (for some definition of "clean" as separate from the reader object), but I think what *is* done is good for the majority of use cases. In Jonas' proposal, the FileReader object (named "FileRequest" in the email [1]) allows separate onerror handling (along with a separate onprogress, etc.). It's not done *within* a read method (unlike the existing proposal, which does this less well than Jonas' proposal), and the callback that handles the event can deal with the response.


This is as separate as is done with XHR.
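A minimal sketch of that separation, written in TypeScript against an in-memory byte source. `SketchReader`, `ByteSource`, and `ProgressInfo` are hypothetical names loosely modeled on the thread's FileReader/onprogress/onerror vocabulary, not the proposal's actual interface; events are dispatched synchronously here for brevity, whereas a real reader would fire them asynchronously:

```typescript
// Hypothetical names loosely modeled on the thread's vocabulary
// (FileReader, onprogress, onerror, onload); not a published interface.
interface ProgressInfo {
  loaded: number;
  total: number;
}

// Hypothetical byte source standing in for a File; readRange may throw
// to simulate an I/O failure.
interface ByteSource {
  size: number;
  readRange(offset: number, length: number): Uint8Array;
}

class SketchReader {
  onprogress: ((p: ProgressInfo) => void) | null = null;
  onload: ((result: Uint8Array) => void) | null = null;
  onerror: ((err: Error) => void) | null = null;

  constructor(private readonly chunkSize: number) {
    if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  }

  // Reads the whole source chunk by chunk. onprogress fires after each
  // chunk, then exactly one of onload or onerror fires. read() itself
  // does not throw for I/O failures; they are reported via onerror.
  read(source: ByteSource): void {
    const result = new Uint8Array(source.size);
    let loaded = 0;
    while (loaded < source.size) {
      const length = Math.min(this.chunkSize, source.size - loaded);
      const chunk = this.readChunk(source, loaded, length);
      if (chunk instanceof Error) {
        if (this.onerror) this.onerror(chunk);
        return;
      }
      result.set(chunk, loaded);
      loaded += length;
      if (this.onprogress) this.onprogress({ loaded: loaded, total: source.size });
    }
    if (this.onload) this.onload(result);
  }

  // Wraps one range read, turning exceptions and short reads into Error values.
  private readChunk(source: ByteSource, offset: number, length: number): Uint8Array | Error {
    try {
      const chunk = source.readRange(offset, length);
      return chunk.length === length ? chunk : new Error("short read at offset " + offset);
    } catch (e) {
      return e instanceof Error ? e : new Error(String(e));
    }
  }
}
```

Because failures surface only through `onerror`, the success path (`onload`) and the failure path live in separate handlers, the XHR-like shape described above.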

All things being equal, I would prefer a model that, in order of priority:
1. involves fewer steps, and

Me too! But *neither* your model *nor* Jonas' model involves fewer steps than the original proposal :) Jonas' model adds necessary complexity for the major use case (onprogress) and for an event model.

2. evolves nicely with file write and binary access, which are both likely to be next evolution directions in this area.

Agreed, but again, much of what you mean by "evolves nicely" is a question of programmer taste. For instance, I think that readAsBinary can be introduced on the FileReader object, in addition to readAsBinaryString. Furthermore, I maintain that your streams proposal can evolve later, and doesn't prevent us from proceeding with the alternative File API proposal as described in [1].
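The difference between the two read methods can be illustrated under the usual binary-string convention, where each character's code unit holds one byte value from 0 to 255. This TypeScript sketch shows the decoding step a caller of readAsBinaryString must do by hand and that a byte-oriented readAsBinary would make unnecessary; `binaryStringToBytes` is a hypothetical helper:

```typescript
// Convert a "binary string" (one byte per UTF-16 code unit, values 0-255)
// into a byte array, rejecting code units that cannot represent a byte.
function binaryStringToBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError("not a binary string: code unit " + code + " at index " + i);
    }
    out[i] = code;
  }
  return out;
}
```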

Can you provide a comparison of your proposed approach with my proposal for the above so that the WG can develop an informed opinion about the proposals?

I *think* I've done this in answering the questions above.

For a first version (which should replace http://www.w3.org/TR/file-upload/, with a more meaningful name like "File API"), I think we should address use cases around reads. Ian Fette has given us plenty of other use cases for consideration later on [3]. While my editor's draft strove to address the use cases for file access with different asynchronous data accessors, it was clear that it couldn't gracefully account for progress events. Moreover, general feedback favored a model that used events with a separate reader object that allowed for progress events, and Jonas' alternative proposal does this and also resembles XHR [4]. While I'm reluctant to sacrifice simplicity, I think moving in the direction of the "Alternative File API" [4] reconciles use cases such as progress events with calls for a reader/event model. FWIW, I disagree that resemblance to XHR should be seen as "unwanted baggage" [5]. I think it's desirable to resemble an API that has such widespread usage!
This is arguable at best, since it seems to be an opinion not shared by everyone, especially not the editor of XMLHttpRequest [1].

There are two things here that you may be confusing! Anne (the editor of the XHR2 draft) expressed support for a model based on events [2]. What he is against is "abusing XHR" by using the URL attribute of a File object as part of a request [3]. I disagree with his stance on this, but that is a bridge that we'll cross later, after we sort out details of the FileData URL.

In fact, there is no similarity to XHR in the current editor's draft, and I wonder why those benefits were considered unimportant when drafting previously.

Note: the "benefits" I considered important centered on simplicity. But others have argued in favor of a more robust model that provides progress events, rather than simply adding another callback to the existing proposal [4]. I expressed my support for simplicity [4] but also my willingness to draft a spec based on the alternative API. So far, only you are arguing *against* it, but I don't believe that the alternative approach blocks consideration of a stream-based approach later on.

While the web is inconsistent, event models are widely used, and similarity between XHR and File API, which will be used in conjunction anyway, is probably a good thing.
Can you explain, in light of the objections I raised in [2], why the "Alternative File API" is the right approach? I haven't seen any replies to my points.

I'm happy to provide more details on anything I've answered here.

-- A*

[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0485.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0571.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0576.html
