Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, Nov 17, 2009 at 9:12 PM, Julian Reschke julian.resc...@gmx.de wrote: 3. We could not directly call out a URI scheme at all. The benefit of doing this is we can specify *behavior* without actually getting into details about the actual identifier scheme used. But, the chief reason to not take this course of action is that leaving *anything* unspecified on the web platform leads to reverse engineering in ways that we can't envision currently. Developers may rely on quirks within one implementation and incompatibly use them with other implementations. Having to mimic another user agent's behavior has been a common outcome on the web, due to lack of specifications -- *many* examples of this exist on the web throughout its history. One lesson from the browser competition of the past is to avoid leaving things to guesswork. I'd like us to agree on something, and I'd like that agreement to be bolstered with concrete implementor feedback. ... Not requiring a specific scheme is not exactly the same thing as leaving it unspecified. If the *only* use of the identifier is within the API, the syntactical form of the identifier really doesn't matter as long as it's understood by the those parts of the platform that need to. Requiring a specific scheme here seems to be a case of *overspecifying*. As far as I'm concerned, the reason to specify a scheme here is for code that would stuff a bunch of URIs into an array, then walk through the array and depending on what type of URI it is do different things. Say a function that takes an array of URIs of images to pre-cache (in order to allow the images to be displayed in a user interface without ugly half-loaded images). 
The code might look something like:

    var imgCache = [];
    function cacheURIs(uris) {
      for (var i = 0; i < uris.length; i++) {
        var uri = uris[i];
        // No need to waste resources on caching local images; they load fast enough anyway
        if (uri.substr(0, 9) != "urn:uuid:") {
          var img = document.createElement("img");
          img.src = uri;
          imgCache.push(img);
        }
      }
    }

Now, we can certainly debate how likely it is that someone will write code like the above. For example, when would you have a mixture of remote and local URIs like that? But I don't think it's impossible, so I wouldn't think it's overspecifying.

/ Jonas
Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
Jonas Sicking wrote: On Tue, Nov 17, 2009 at 9:12 PM, Julian Reschke julian.resc...@gmx.de wrote: 3. We could not directly call out a URI scheme at all. The benefit of doing this is we can specify *behavior* without actually getting into details about the actual identifier scheme used. But, the chief reason to not take this course of action is that leaving *anything* unspecified on the web platform leads to reverse engineering in ways that we can't envision currently. Developers may rely on quirks within one implementation and incompatibly use them with other implementations. Having to mimic another user agent's behavior has been a common outcome on the web, due to lack of specifications -- *many* examples of this exist on the web throughout its history. One lesson from the browser competition of the past is to avoid leaving things to guesswork. I'd like us to agree on something, and I'd like that agreement to be bolstered with concrete implementor feedback. ... Not requiring a specific scheme is not exactly the same thing as leaving it unspecified. If the *only* use of the identifier is within the API, the syntactical form of the identifier really doesn't matter as long as it's understood by the those parts of the platform that need to. Requiring a specific scheme here seems to be a case of *overspecifying*. As far as I'm concerned, the reason to specify a scheme here is for code that would stuff a bunch of URIs into an array, then walk through the array and depending on what type of URI it is do different things. Say a function that takes an array of URIs of images to pre-cache (in order to allow the images to be displayed in a user interface without ugly half-loaded images). 
The code might look something like:

    var imgCache = [];
    function cacheURIs(uris) {
      for (var i = 0; i < uris.length; i++) {
        var uri = uris[i];
        // No need to waste resources on caching local images; they load fast enough anyway
        if (uri.substr(0, 9) != "urn:uuid:") {
          var img = document.createElement("img");
          img.src = uri;
          imgCache.push(img);
        }
      }
    }

Now, we can certainly debate how likely it is that someone will write code like the above. For example, when would you have a mixture of remote and local URIs like that? But I don't think it's impossible, so I wouldn't think it's overspecifying.

...

If the use case is, given an arbitrary URI, to distinguish one identifying a file object from others, then *either* a distinct scheme (1) is needed, or an API could be added that answers that question.

(1) In that case, re-using urn:uuid seems to be a bad idea, because it precludes that scheme being used for anything else.

BR, Julian
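As a rough illustration of the two options Julian names (a distinct scheme, or an API that answers the question), here is a minimal sketch. Nothing in it comes from the draft: `schemeOf`, `isFileDataByScheme`, `registerFileHandle` and `isFileDataByLookup` are hypothetical names, `filedata` is simply the scheme name floated in this thread, and the lookup table stands in for state that only a user agent would really hold.

```javascript
// Hypothetical sketch: two ways to tell a file-object URI from any other URI.

// Option 1: a distinct scheme. Extract the scheme per the generic URI syntax
// (a letter followed by letters, digits, "+", "-" or "."), lowercased.
function schemeOf(uri) {
  var match = /^([A-Za-z][A-Za-z0-9+.\-]*):/.exec(uri);
  return match ? match[1].toLowerCase() : null;
}

function isFileDataByScheme(uri) {
  // "filedata" is an assumption taken from the discussion, not a registered scheme.
  return schemeOf(uri) === "filedata";
}

// Option 2: no distinct scheme; a lookup answers the question instead.
// `knownFileHandles` is a stand-in for the user agent's internal bookkeeping.
var knownFileHandles = {};

function registerFileHandle(uri) {
  knownFileHandles[uri] = true;
}

function isFileDataByLookup(uri) {
  return Object.prototype.hasOwnProperty.call(knownFileHandles, uri);
}
```

With option 1 the pre-caching loop can decide by looking at the string alone; with option 2 the string stays opaque and `urn:uuid:` remains free for other uses, at the cost of one more API call.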
Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
On Wed, 18 Nov 2009 01:03:23 +0100, Arun Ranganathan a...@mozilla.com wrote: 1. We could coin a new scheme such as the originally proposed filedata: scheme. This has the advantages of associating behavior (and semantics) with a scheme, so that existing schemes aren't confused or co-opted inappropriately. However, actually registering a new scheme used by browsers seemed problematic (with overhead due to coordination with multiple groups). I'm willing to revisit this idea given enough feedback, but to date, haven't really received enough of it. It's not implementor feedback, but I think this is the best solution. I think it's valuable that you can tell what the URL is for just by looking at it. Jonas gave an example of how that might be used in practice. From what I've heard so far it does not matter much for Opera either, but I rather have a new scheme. It also seems safer in case something comes up down the road. -- Anne van Kesteren http://annevankesteren.nl/
Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
Robin Berjon wrote: ... Couldn't we just register a URN NID for this? It seems that one has to go through fewer hurdles, and no matter how transient, I believe that it's a useful thing to identify. ...

Yes, that's possible and would probably cause fewer eyebrows to be raised...

BR, Julian
Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
On Nov 18, 2009, at 13:13 , Julian Reschke wrote: Robin Berjon wrote: ... Couldn't we just register a URN NID for this? It seems that one has to go through fewer hurdles, and no matter how transient, I believe that it's a useful thing to identify. ... Yes, that's possible and would probably cause fewer eyebrows to be raised...

It also doesn't seem like a lot of work: registration doesn't require specifying the behaviour of the beast (which we'd leave where it is today). We could take the urn:transient-data NID, or urn:data-handle (whichever way the bike is shed today) and toss a UUID at the end of it; then declare victory.

-- Robin Berjon - http://berjon.com/
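For what "take a NID and toss a UUID at the end of it" might look like in practice, here is a minimal sketch. It assumes the `data-handle` NID mentioned above (which is not a registered namespace) and the usual 8-4-4-4-12 hexadecimal UUID layout; `makeHandleURN` and `parseHandleURN` are made-up helper names.

```javascript
// Hypothetical sketch: build and recognize identifiers of the form
// urn:<NID>:<UUID>. The NID below is an assumption taken from the thread.
var HANDLE_NID = "data-handle";
var UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function makeHandleURN(uuid) {
  if (!UUID_PATTERN.test(uuid)) {
    throw new Error("not a UUID: " + uuid);
  }
  return "urn:" + HANDLE_NID + ":" + uuid.toLowerCase();
}

// Returns the UUID part if `uri` is one of these handles, otherwise null.
function parseHandleURN(uri) {
  var prefix = "urn:" + HANDLE_NID + ":";
  // The "urn" prefix and the NID are case-insensitive, so compare lowercased.
  if (uri.slice(0, prefix.length).toLowerCase() !== prefix) {
    return null;
  }
  var rest = uri.slice(prefix.length);
  return UUID_PATTERN.test(rest) ? rest.toLowerCase() : null;
}
```

A dedicated NID keeps the "what is this identifier for" check as cheap as a prefix comparison while leaving plain `urn:uuid:` untouched.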
Re: Blob as URN was Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, Nov 17, 2009 at 6:25 PM, Arun Ranganathan a...@mozilla.com wrote: Eric, I recall you saying at TPAC that you wanted to keep the Blob interface as small as possible, since it seemed likely to get used in a lot of places. I think that's an excellent goal, but of course, having said that, I am immediately going to suggest that you add something to it. I'm definitely not averse to additions :) Actually, I'm also following the discussion initiated by Maciej about BinaryData [2] with public-script-co...@w3.org. Eventually, I believe ECMAScript will provide a Binary primitive (perhaps ByteArray), and I think Blob should expose that primitive. This would be a natural extension of what I envision Blob to be used for. I also envision natural streaming extensions to this API. How would you feel about exposing a way to produce a URN from a Blob, instead of just getting one from a File? I'm not averse to it. In fact, it was originally in the Blob interface (which at that time, was dubbed FileData). We moved it to the File interface since the understanding of use cases at the time was that all URL consumers expect a full file. [3] You've provided use cases that show that this isn't the case, and so we should revisit our earlier understanding. More on your use cases below: This seems likely to have wide-ranging uses. Pretty much anywhere you have a blob of data, you might want to hand it off to the browser, even if it didn't come from, or wasn't, a single user-supplied file. Here are a few use cases, but I'm sure more won't be hard to come up with: * Viewing a single chapter of a book in a frame. * Slicing one episode out of a DVD and handing it to the video tag, so that the player controls start and end at the episode boundaries. * Analogous to the game-asset archive I mentioned at [1], one might pack a number of small files together to speed download [using only HTTP compression], then parse them apart on the client. 
Picture a Picasa client written in the web browser; it's got to handle maybe 1+ thumbnails, and putting each in a separate file would be terribly inefficient. Pulling down a tarfile would be a lot quicker. I can understand why you'd want *partial* data exposed through a URL, and why your API may force a type on the partial data. Question: would a fragment identifier scheme [4] address any of these use cases, or is this completely orthogonal to the use cases you envision? I ask because you envision a chapter within a frame but I'm not sure what the frame data structure is. Ah--I think I wasn't clear here. I just pictured taking a file, chopping out a chapter by, say, byte offset, and writing it into an iframe for display. I don't think a fragment identifier is powerful enough, even if we spec out how to overload it. For example, say you wanted to chop a chapter out of an HTML document *and* scroll your iframe to a specific page of that chapter? One can also imagine use cases in which the Blob is constructed completely from scratch by JavaScript, in which case there's no File at all. it would seem natural to me do to something like this: interface Blob { ... DOMString getURN(in DOMString mediaType, [Optional] in DOMString contentDisposition, [Optional] in DOMString name); }; Given that a File that one gets from the user will still tell you its name and detected mediaType, and can have a constant urn, there seems to be no conflict in leaving the File interface as-is and adding something like getURN to Blob. On the off chance that you want to override the detected mediaType for a file, force a contentDisposition of attachment, or change the name, you might still use getURN there as well. To be clear: you want the File object's URN capabilities to inherit from Blob, and not be separate, correct? Thus, each Blob has an affiliated URN, and when a Blob is a File, it uses the Blob's getURN method? No, I picture both being there, although I'm open to discussion about it. 
The getURN method forces the user to specify how the data is to be presented to the UA via the URN created. In contrast, the urn property of File is inherently based on what the UA knows about the file, and the user need not specify anything in order to use it. Hmm... I guess there's no reason that we couldn't make the mediaType parameter optional as well, so that in the case of a File, getURN could just return what the UA thought was appropriate. That makes me nervous for Blob data, though... perhaps we should spec it to throw in that case?

Can you explain what a contentDisposition is a bit better? Can you write some pseudo-code showing how contentDisposition is used, perhaps to flesh out the above use cases?

Sure. Let's say that you're writing an offline version of GMail. You've got a File referring to an image attachment, and you want to offer links through which the user can either view
[FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
Julian Reschke wrote: Arun Ranganathan wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ ... 4. A suggestion to *not* have a separate scheme (filedata:) in lieu of urn:uuid:uuid[2] has been the basis of a rewrite of that feature in this version of the specification. ... Is there a particular reason why a specific URI scheme needs to be called out at all? (there are other schemes that may be more flexible, for instance because they allow using a UUID/String pair for identification). This is a useful question to answer :) I assume everyone understands use cases for this identifier. Ian's email discusses a few [1] which have been supplemented with a few more. There are a few ways to proceed: 1. We could coin a new scheme such as the originally proposed filedata: scheme. This has the advantages of associating behavior (and semantics) with a scheme, so that existing schemes aren't confused or co-opted inappropriately. However, actually registering a new scheme used by browsers seemed problematic (with overhead due to coordination with multiple groups). I'm willing to revisit this idea given enough feedback, but to date, haven't really received enough of it. 2. We could reuse an existing scheme. This seemed desirable if there was little chance of confusion and collision, and it avoids multi-group coordination. Using urn:uuid was an obvious choice given assumptions on UUID uniqueness, but it is hardly a pave the cowpaths choice since it isn't currently used on the web platform in any way I can recognize. Also, urn isn't used on the web platform as an attribute on interfaces, but url is, so we once again have a consistency argument. While consistency is desirable, I don't think we should be too hung up on it. To make a longer story shorter, urn:uuid addressed the use case fairly well, and seemed a useful starting point. [ Now between 1. and 2., I'd say the deciding factor might be implementation feedback. 
For instance, Firefox's code is such that 1. and 2. are both (pretty much) *equally* feasible. We also got some feedback from Microsoft [2], but that feedback seems to be preliminary, with more to come. I'm also not sure concern over origin issues are valid here [2], since a perusal of the issues with jar: (which urn:uuid has been compared to) doesn't reveal any cogent origin effrontery. I'd really like *more* implementor feedback on this issue. ] 3. We could not directly call out a URI scheme at all. The benefit of doing this is we can specify *behavior* without actually getting into details about the actual identifier scheme used. But, the chief reason to not take this course of action is that leaving *anything* unspecified on the web platform leads to reverse engineering in ways that we can't envision currently. Developers may rely on quirks within one implementation and incompatibly use them with other implementations. Having to mimic another user agent's behavior has been a common outcome on the web, due to lack of specifications -- *many* examples of this exist on the web throughout its history. One lesson from the browser competition of the past is to avoid leaving things to guesswork. I'd like us to agree on something, and I'd like that agreement to be bolstered with concrete implementor feedback. -- A* [1] http://lists.w3.org/Archives/Public/public-webapps/2009AprJun/1110.html [2] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0462.html
Blob as URN was Re: [FileAPI] Latest Revision of Editor's Draft
Eric, I recall you saying at TPAC that you wanted to keep the Blob interface as small as possible, since it seemed likely to get used in a lot of places. I think that's an excellent goal, but of course, having said that, I am immediately going to suggest that you add something to it. I'm definitely not averse to additions :) Actually, I'm also following the discussion initiated by Maciej about BinaryData [2] with public-script-co...@w3.org. Eventually, I believe ECMAScript will provide a Binary primitive (perhaps ByteArray), and I think Blob should expose that primitive. This would be a natural extension of what I envision Blob to be used for. I also envision natural streaming extensions to this API. How would you feel about exposing a way to produce a URN from a Blob, instead of just getting one from a File? I'm not averse to it. In fact, it was originally in the Blob interface (which at that time, was dubbed FileData). We moved it to the File interface since the understanding of use cases at the time was that all URL consumers expect a full file. [3] You've provided use cases that show that this isn't the case, and so we should revisit our earlier understanding. More on your use cases below: This seems likely to have wide-ranging uses. Pretty much anywhere you have a blob of data, you might want to hand it off to the browser, even if it didn't come from, or wasn't, a single user-supplied file. Here are a few use cases, but I'm sure more won't be hard to come up with: * Viewing a single chapter of a book in a frame. * Slicing one episode out of a DVD and handing it to the video tag, so that the player controls start and end at the episode boundaries. * Analogous to the game-asset archive I mentioned at [1], one might pack a number of small files together to speed download [using only HTTP compression], then parse them apart on the client. 
Picture a Picasa client written in the web browser; it's got to handle maybe 1+ thumbnails, and putting each in a separate file would be terribly inefficient. Pulling down a tarfile would be a lot quicker.

I can understand why you'd want *partial* data exposed through a URL, and why your API may force a type on the partial data. Question: would a fragment identifier scheme [4] address any of these use cases, or is this completely orthogonal to the use cases you envision? I ask because you envision a chapter within a frame but I'm not sure what the frame data structure is.

it would seem natural to me to do something like this:

    interface Blob {
      ...
      DOMString getURN(in DOMString mediaType,
                       [Optional] in DOMString contentDisposition,
                       [Optional] in DOMString name);
    };

Given that a File that one gets from the user will still tell you its name and detected mediaType, and can have a constant urn, there seems to be no conflict in leaving the File interface as-is and adding something like getURN to Blob. On the off chance that you want to override the detected mediaType for a file, force a contentDisposition of attachment, or change the name, you might still use getURN there as well.

To be clear: you want the File object's URN capabilities to inherit from Blob, and not be separate, correct? Thus, each Blob has an affiliated URN, and when a Blob is a File, it uses the Blob's getURN method?

Can you explain what a contentDisposition is a bit better? Can you write some pseudo-code showing how contentDisposition is used, perhaps to flesh out the above use cases?

-- A*

[1] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0424.html
[2] http://lists.w3.org/Archives/Public/public-script-coord/2009OctDec/0093.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0609.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0587.html
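To make the getURN proposal easier to reason about, here is a small in-memory model of how it could behave. This is a sketch under stated assumptions, not the draft's API: `SketchBlob`, `urnRegistry` and `sketchUUID` are invented names, `slice` is assumed to take an offset and a length, a missing mediaType is assumed to throw (one of the options raised in this thread), and `"inline"` as the default contentDisposition is purely a guess.

```javascript
// Hypothetical in-memory model of the proposed Blob.getURN(); not the draft's API.
var urnRegistry = {}; // urn -> { bytes, mediaType, contentDisposition, name }

// Non-cryptographic, UUID-shaped string; good enough for a sketch only.
function sketchUUID() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);
    var v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

function SketchBlob(bytes) {
  this.bytes = bytes; // plain array of byte values
  this.size = bytes.length;
}

// Chop out a sub-range by byte offset, e.g. one chapter of a larger file.
SketchBlob.prototype.slice = function (start, length) {
  return new SketchBlob(this.bytes.slice(start, start + length));
};

// mediaType is required here; contentDisposition and name are optional.
SketchBlob.prototype.getURN = function (mediaType, contentDisposition, name) {
  if (typeof mediaType !== "string" || mediaType === "") {
    throw new TypeError("mediaType is required for a plain Blob");
  }
  var urn = "urn:uuid:" + sketchUUID();
  urnRegistry[urn] = {
    bytes: this.bytes,
    mediaType: mediaType,
    contentDisposition: contentDisposition || "inline", // assumed default
    name: name || null
  };
  return urn;
};
```

The point of the model is that mediaType, contentDisposition and name live with the URN rather than with the Blob, so the same bytes can be handed to the browser under different presentations.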
Re: [FileAPI] URL, URI, URN | Re: [FileAPI] Latest Revision of Editor's Draft
Arun Ranganathan wrote: Is there a particular reason why a specific URI scheme needs to be called out at all? (there are other schemes that may be more flexible, for instance because they allow using a UUID/String pair for identification). This is a useful question to answer :) I assume everyone understands use cases for this identifier. Ian's email discusses a few [1] which have been supplemented with a few more. There are a few ways to proceed: ... 2. We could reuse an existing scheme. This seemed desirable if there was little chance of confusion and collision, and it avoids multi-group coordination. Using urn:uuid was an obvious choice given assumptions on UUID uniqueness, but it is hardly a pave the cowpaths choice since it isn't currently used on the web platform in any way I can recognize. I don't think that it matters at all whether it's widely used, as long as it's properly specified. That being said it is used in WebDAV (lock token URIs) and Atom (Atom:id). ... 3. We could not directly call out a URI scheme at all. The benefit of doing this is we can specify *behavior* without actually getting into details about the actual identifier scheme used. But, the chief reason to not take this course of action is that leaving *anything* unspecified on the web platform leads to reverse engineering in ways that we can't envision currently. Developers may rely on quirks within one implementation and incompatibly use them with other implementations. Having to mimic another user agent's behavior has been a common outcome on the web, due to lack of specifications -- *many* examples of this exist on the web throughout its history. One lesson from the browser competition of the past is to avoid leaving things to guesswork. I'd like us to agree on something, and I'd like that agreement to be bolstered with concrete implementor feedback. ... Not requiring a specific scheme is not exactly the same thing as leaving it unspecified. 
If the *only* use of the identifier is within the API, the syntactical form of the identifier really doesn't matter as long as it's understood by those parts of the platform that need to understand it. Requiring a specific scheme here seems to be a case of *overspecifying*.

Best regards, Julian
Re: [FileAPI] Latest Revision of Editor's Draft
Arun: I recall you saying at TPAC that you wanted to keep the Blob interface as small as possible, since it seemed likely to get used in a lot of places. I think that's an excellent goal, but of course, having said that, I am immediately going to suggest that you add something to it. How would you feel about exposing a way to produce a URN from a Blob, instead of just getting one from a File? This seems likely to have wide-ranging uses. Pretty much anywhere you have a blob of data, you might want to hand it off to the browser, even if it didn't come from, or wasn't, a single user-supplied file. Here are a few use cases, but I'm sure more won't be hard to come up with: * Viewing a single chapter of a book in a frame. * Slicing one episode out of a DVD and handing it to the video tag, so that the player controls start and end at the episode boundaries. * Analogous to the game-asset archive I mentioned at [1], one might pack a number of small files together to speed download [using only HTTP compression], then parse them apart on the client. Picture a Picasa client written in the web browser; it's got to handle maybe 1+ thumbnails, and putting each in a separate file would be terribly inefficient. Pulling down a tarfile would be a lot quicker. In order for the URN to be useful, it would have to have a mediaType associated with it, and then there's content-disposition to think about, which then wants a file name as well...boy, that's a lot of baggage. However, since these aren't really inherent properties of the Blob, just of the way you want the browser to view the Blob, it would seem natural to me do to something like this: interface Blob { ... 
DOMString getURN(in DOMString mediaType, [Optional] in DOMString contentDisposition, [Optional] in DOMString name); }; Given that a File that one gets from the user will still tell you its name and detected mediaType, and can have a constant urn, there seems to be no conflict in leaving the File interface as-is and adding something like getURN to Blob. On the off chance that you want to override the detected mediaType for a file, force a contentDisposition of attachment, or change the name, you might still use getURN there as well. This is a pure addition to your spec, so I think we can discuss it in parallel with the publication of the WD. I don't want to suggest that this slow anything else down. Eric [1] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0424.html On Mon, Oct 26, 2009 at 4:24 AM, Arun Ranganathan a...@mozilla.com wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ These changes constitute a substantial reworking of the original API along the lines of the Alternative File API proposal [1]. There are also some additional changes that are worth pointing out explicitly: 1. This editor's draft now resides in a new location in CVS. Essentially, previous repository names had FileUpload in them and were confusing, since the API in question had less to do with *uploading a file* than *reading* a file. FileAPI is shorter and more intuitive (and better describes what we're doing). Previous drafts are worth keeping as historical artifacts reflecting the decision making of the WG, and so now include a caveat on them pointing them to the draft above. 2. Interface names have changed (notably FileData -- Blob) since the underlying FileData interface had uses on the platform beyond a file read API. Blob as an interface name was first introduced by a Google Gears API, which I cite as an informative reference. 3. The event model resembles that of XHR2, with a few differences. 
Notably, the APIs differ in their use of the 'loadend' ProgressEvent. 4. A suggestion to *not* have a separate scheme (filedata:) in lieu of urn:uuid:uuid[2] has been the basis of a rewrite of that feature in this version of the specification. I don't anticipate the event model will be controversial, having seen a fair amount of discussion on the listserv. But I do anticipate feedback about 4., as well as the remaining editor's notes. Looking forward to discussion of this API on this listserv and at the upcoming TPAC :) -- A* [1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html [2] http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/0091.html
RE: [FileAPI] Latest Revision of Editor's Draft
On Monday, November 02, 2009 10:12 PM, Jonas Sicking wrote: On Mon, Nov 2, 2009 at 12:25 PM, Adrian Bateman adria...@microsoft.com wrote: On Tuesday, October 27, 2009 2:35 PM, Jonas Sicking wrote: But like Arun, I suspect that this feature is the most controversial in the spec. Apple expressed concern about having a string represent a handle to a resource, and when we talked to Microsoft they briefly mentioned that they has concerns about this feature too, though I don't know specifically what their concerns were. The main concern I had was whether the URN feature was a must have for v1 given Arun's desire that this be the simplest spec that we could then build on later. Implementing a new protocol handler is more complex than just supporting the API part, for us anyway. I am also concerned about introducing new origin semantics - in the past this has been a source of security bugs and so I question whether we need to rush into this part (I accept the use case is valuable but I'm not sure it is initially essential). I'd really like to try to keep it in version 1. One of the use cases we hear most often for this API is for uploading images. For example to photo management sites like Flickr, but also for profile pictures on sites like twitter. In both these cases it's possible to use data-uris, but that will most likely result in the several copies of a several-MB-sized data-uris living in memory. I think the situation might be even worse in IE which if recall correctly there's some fairly low limits on how big data-uris can be (is this correct?). There is a limit on the size of data-uris in IE8 (32K I think). I expect addressing this will be a higher priority than a new handler but I agree that copying around large strings is problematic. Are you concerned about security bugs in the feature design or in the implementation? Mostly in the implementation - it increases the surface area to be concerned about and there might be a different approach. 
This isn't something I feel really strongly about. I imagine that when we look at implementing this we will start with just the API part and look at the URN handling separately. Cheers, Adrian.
Re: [FileAPI] Latest Revision of Editor's Draft
Adrian Bateman wrote: On Monday, November 02, 2009 10:12 PM, Jonas Sicking wrote: On Mon, Nov 2, 2009 at 12:25 PM, Adrian Bateman adria...@microsoft.com wrote: On Tuesday, October 27, 2009 2:35 PM, Jonas Sicking wrote: But like Arun, I suspect that this feature is the most controversial in the spec. Apple expressed concern about having a string represent a handle to a resource, and when we talked to Microsoft they briefly mentioned that they has concerns about this feature too, though I don't know specifically what their concerns were. The main concern I had was whether the URN feature was a must have for v1 given Arun's desire that this be the simplest spec that we could then build on later. Implementing a new protocol handler is more complex than just supporting the API part, for us anyway. I am also concerned about introducing new origin semantics - in the past this has been a source of security bugs and so I question whether we need to rush into this part (I accept the use case is valuable but I'm not sure it is initially essential). I'd really like to try to keep it in version 1. One of the use cases we hear most often for this API is for uploading images. For example to photo management sites like Flickr, but also for profile pictures on sites like twitter. In both these cases it's possible to use data-uris, but that will most likely result in the several copies of a several-MB-sized data-uris living in memory. I think the situation might be even worse in IE which if recall correctly there's some fairly low limits on how big data-uris can be (is this correct?). There is a limit on the size of data-uris in IE8 (32K I think). I expect addressing this will be a higher priority than a new handler but I agree that copying around large strings is problematic. 
FWIW, the specification makes a provision for URL length limitations in certain user agents, so that a file (as a Data URL) that exceeds a URL-length limit will force an ENCODING_ERR when the readAsDataURL method is called on a FileReader object. Are you concerned about security bugs in the feature design or in the implementation? Mostly in the implementation - it increases the surface area to be concerned about and there might be a different approach. This feedback as a potential implementor is important :-) 1. Can you give us an example of an exploit, or expand on your concerns? 2. From an implementation perspective, do you care whether we define a scheme (such as filedata:) or reuse something like urn:uuid:[UUID] ? Are there any barriers with respect to either one? This isn't something I feel really strongly about. I imagine that when we look at implementing this we will start with just the API part and look at the URN handling separately. This is in fact pretty much the approach we have taken with Firefox 3.6 (currently in beta). -- A*
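The size concern can be checked with a little arithmetic before any data URL is built. The sketch below is only an estimate: `dataURLLength` and `fitsInDataURL` are hypothetical helpers, the 32 * 1024 limit is the IE8 figure recalled in this thread ("32K I think") rather than a verified constant, and whether a user agent would raise ENCODING_ERR at exactly this point is not something the sketch tries to model.

```javascript
// Sketch: estimate the length of a base64 data: URL before building it.
// The limit below is the figure recalled in the thread, used as an assumption.
var ASSUMED_URL_LIMIT = 32 * 1024;

function dataURLLength(byteLength, mediaType) {
  var prefix = "data:" + mediaType + ";base64,";
  // Base64 emits 4 characters for every 3 input bytes, rounded up with padding.
  var payload = 4 * Math.ceil(byteLength / 3);
  return prefix.length + payload;
}

function fitsInDataURL(byteLength, mediaType, limit) {
  return dataURLLength(byteLength, mediaType) <= (limit || ASSUMED_URL_LIMIT);
}
```

Under that assumed limit, a 2 MB photo is far too large for a data URL, which is the case where a short handle to the file is attractive.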
RE: [FileAPI] Latest Revision of Editor's Draft
On Tuesday, November 03, 2009 10:07 AM, Arun Ranganathan wrote: Adrian Bateman wrote: On Monday, November 02, 2009 10:12 PM, Jonas Sicking wrote: Are you concerned about security bugs in the feature design or in the implementation? Mostly in the implementation - it increases the surface area to be concerned about and there might be a different approach. This feedback as a potential implementor is important :-) 1. Can you give us an example of an exploit, or expand on your concerns? If you look through the bugs reported (and fixed) in the Firefox jar: scheme handler many of them revolve around mishandling origin. The file urn is obviously simpler and also currently refers to a file that the user had to select. However, in future this might be used as part of a larger API that allows certain web sites to access certain files/folders. A vulnerability might involve leaking the URN from one origin to another allowing a site to read a file it shouldn't have access to. 2. From an implementation perspective, do you care whether we define a scheme (such as filedata:) or reuse something like urn:uuid:[UUID] ? Are there any barriers with respect to either one? At first glance, I imagine filedata: would be easier for us to implement but I haven't researched this yet - I will ask the question. I wonder from a spec perspective whether reusing urn:uuid: might cause problems with this being overloaded for different uses in future. Cheers, Adrian.
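The leak scenario Adrian describes (a URN minted for one origin being used from another) suggests binding each handle to the origin that created it. The following is a minimal sketch of that idea only; `HandleStore`, `mint` and `resolve` are hypothetical names, origins are compared as plain strings, and no real user agent's behaviour is implied.

```javascript
// Hypothetical sketch of binding a handle to the origin that minted it.
function HandleStore() {
  this.entries = {}; // urn -> { origin, resource }
}

HandleStore.prototype.mint = function (urn, origin, resource) {
  this.entries[urn] = { origin: origin, resource: resource };
};

// Returns the resource only to the origin that created the handle.
HandleStore.prototype.resolve = function (urn, requestingOrigin) {
  var entry = Object.prototype.hasOwnProperty.call(this.entries, urn)
    ? this.entries[urn]
    : null;
  if (entry === null || entry.origin !== requestingOrigin) {
    // Same answer for "unknown" and "wrong origin", so a leaked URN
    // reveals nothing about whether it is valid elsewhere.
    return null;
  }
  return entry.resource;
};
```

With a check like this, a leaked string is not enough on its own to read the file; the implementation risk moves to getting the origin comparison right, which is the kind of bug mentioned above for jar:.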
RE: [FileAPI] Latest Revision of Editor's Draft
On Tuesday, October 27, 2009 2:35 PM, Jonas Sicking wrote: On Tue, Oct 27, 2009 at 12:36 AM, Ian Hickson i...@hixie.ch wrote: I would like to see implementation feedback on this. I don't understand why we would want to assign semantics to urn:uuid: URLs that are so specific -- that seems completely wrong. It also seems really awkward from an implementation perspective to forgo the normal extension mechanism (schemes) and have implementations give special (and non-trivial) semantics to a subset of another scheme. Why are we doing this? But like Arun, I suspect that this feature is the most controversial in the spec. Apple expressed concern about having a string represent a handle to a resource, and when we talked to Microsoft they briefly mentioned that they had concerns about this feature too, though I don't know specifically what their concerns were. The main concern I had was whether the URN feature was a must-have for v1 given Arun's desire that this be the simplest spec that we could then build on later. Implementing a new protocol handler is more complex than just supporting the API part, for us anyway. I am also concerned about introducing new origin semantics - in the past this has been a source of security bugs and so I question whether we need to rush into this part (I accept the use case is valuable but I'm not sure it is initially essential). Cheers, Adrian.
Re: [FileAPI] Latest Revision of Editor's Draft
On Mon, Nov 2, 2009 at 12:25 PM, Adrian Bateman adria...@microsoft.com wrote: On Tuesday, October 27, 2009 2:35 PM, Jonas Sicking wrote: On Tue, Oct 27, 2009 at 12:36 AM, Ian Hickson i...@hixie.ch wrote: I would like to see implementation feedback on this. I don't understand why we would want to assign semantics to urn:uuid: URLs that are so specific -- that seems completely wrong. It also seems really awkward from an implementation perspective to forgo the normal extension mechanism (schemes) and have implementations give special (and non-trivial) semantics to a subset of another scheme. Why are we doing this? But like Arun, I suspect that this feature is the most controversial in the spec. Apple expressed concern about having a string represent a handle to a resource, and when we talked to Microsoft they briefly mentioned that they had concerns about this feature too, though I don't know specifically what their concerns were. The main concern I had was whether the URN feature was a must-have for v1 given Arun's desire that this be the simplest spec that we could then build on later. Implementing a new protocol handler is more complex than just supporting the API part, for us anyway. I am also concerned about introducing new origin semantics - in the past this has been a source of security bugs and so I question whether we need to rush into this part (I accept the use case is valuable but I'm not sure it is initially essential). I'd really like to try to keep it in version 1. One of the use cases we hear most often for this API is for uploading images. For example to photo management sites like Flickr, but also for profile pictures on sites like Twitter. In both these cases it's possible to use data URIs, but that will most likely result in several copies of a multi-megabyte data URI living in memory. I think the situation might be even worse in IE where, if I recall correctly, there are some fairly low limits on how big data URIs can be (is this correct?).
Are you concerned about security bugs in the feature design or in the implementation? / Jonas
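To put a rough number on the data-URI concern raised above: base64 encodes every 3 bytes as 4 characters, so a data URI is at least a third longer than the file it carries, before any copies are made. A minimal sketch of that arithmetic follows; the function name and the MIME type are illustrative and do not come from the draft.

```javascript
// Estimate the length, in characters, of a base64 data: URI for a file.
// Illustrative helper only; it is not part of the File API draft.
function dataUriLength(byteLength, mimeType) {
  const prefix = "data:" + mimeType + ";base64,";
  // base64 emits 4 characters for each group of up to 3 input bytes (padded).
  const payload = 4 * Math.ceil(byteLength / 3);
  return prefix.length + payload;
}

// A 3 MB JPEG becomes a string of roughly 4 million characters.
const threeMegabytes = 3 * 1024 * 1024;
console.log(dataUriLength(threeMegabytes, "image/jpeg"));
```

How many bytes each character costs in memory, and how many copies end up alive at once, depends on the implementation, so this is only a lower bound on the string length.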
Re: [FileAPI] Latest Revision of Editor's Draft
Arun Ranganathan wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ ... 4. A suggestion to *not* have a separate scheme (filedata:) in lieu of urn:uuid:uuid[2] has been the basis of a rewrite of that feature in this version of the specification. ... Is there a particular reason why a specific URI scheme needs to be called out at all? (there are other schemes that may be more flexible, for instance because they allow using a UUID/String pair for identification). Best regards, Julian
Re: [FileAPI] Latest Revision of Editor's Draft
On Monday, 26 October 2009 at 05:24 -0700, Arun Ranganathan wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ The WebIDL checker identifies a couple of simple bugs in the draft: http://www.w3.org/2009/07/webidl-check?doc=http%3A%2F%2Fdev.w3.org%2F2006%2Fwebapi%2FFileAPI%2F&output=html (a missing “;” and an erroneous usage of “attribute” inside an Exception). Dom
Re: [FileAPI] Latest Revision of Editor's Draft
Dominique Hazael-Massieux wrote: On Monday, 26 October 2009 at 05:24 -0700, Arun Ranganathan wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ The WebIDL checker identifies a couple of simple bugs in the draft: http://www.w3.org/2009/07/webidl-check?doc=http%3A%2F%2Fdev.w3.org%2F2006%2Fwebapi%2FFileAPI%2F&output=html (a missing “;” and an erroneous usage of “attribute” inside an Exception). Dom Fixed! (This is a very useful tool :-) ) -- A*
Re: [FileAPI] Latest Revision of Editor's Draft
Jonas, When loadend was removed from media elements [2] I wished to determine whether it was event overkill to also fire at successful reads. Sounds like you want it back for successful reads as well? But the reason why loadend *and load* were removed from video does not apply here. The reason there was that they never guarantee that the whole video is downloaded such that it can be accessed without further need for network access. Additionally we should follow the intent of the Progress Events spec. Done. (I was simply wrong about loadend, and citing media elements wasn't helpful; I've changed the spec. to show that loadend should be fired, and it now matches the behavior of the XHR2 editor's draft.) Of the two names you suggest, do you feel strongly about one over the others? I'm not sure I love 'result' (it isn't intuitive as a response to a read), and 'data' is used in other contexts on the platform and so may be confusing. If you feel strongly (stronger than a 'maybe' ;-) ) about a different name, I'm happy to change it. I don't feel that strongly, no. But I think 'result' is the most correct name that I can think of. Done. *grumble grumble OK Jonas, in the interest of not bike shedding I've renamed fileData to result* Ah, ok. There was also some confusing wording in the definition of the attribute: On getting, if progress events are queued for dispatch while processing the readAsText read method, this attribute SHOULD return partial file data in the format specified by the encoding determination) However it should contain the data read so far even if there currently aren't any progress events queued for dispatch. I.e. if a progress event was just dispatched and no more data has been read so far, then .fileData should still contain the same value as when the last progress event was dispatched. Fixed; this isn't harnessed to queuing ProgressEvents for dispatch anymore. 
* I think someone had brought up a good argument for *not* throwing when slice is called with start+offset > size. One of the main use cases for slice is to slice up a file in several chunks when sending with XHR. When that is done it's easy to end up with rounding errors resulting in a slightly too large length being requested. In this case it makes sense to just clamp to size rather than throwing an error. OK -- sounds like slice should NOT throw an INDEX_SIZE_ERR at all, and only clamp on size? Yeah, I think so. Except if the 'start' attribute is bigger than size. Though possibly we could even then clamp to a zero-sized Blob. I don't really feel strongly. Fixed (I don't feel strongly about whether to throw or not, so I changed it to *clamp.* It may offer conveniences to NOT throw, but I'd like to hear from others). http://dev.w3.org/2006/webapi/FileAPI/ -- A*
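The clamping behavior agreed on above can be sketched as plain arithmetic. This is only an illustration under assumptions: the helper name is hypothetical, the `(start, length)` argument shape follows the discussion in this thread, and the choice to turn an out-of-range start into a zero-sized range follows Jonas's "possibly" rather than anything the draft mandates.

```javascript
// Hypothetical helper: compute the byte range a clamped slice(start, length)
// would cover for data of `size` bytes. Not taken from the draft text.
function clampSliceRange(size, start, length) {
  // A start beyond the end yields an empty range at the end of the data.
  const clampedStart = Math.min(Math.max(start, 0), size);
  // Never read past the end, even if the caller over-asks by a few bytes.
  const clampedLength = Math.min(Math.max(length, 0), size - clampedStart);
  return { start: clampedStart, length: clampedLength };
}

// Chunking a 10-byte file into pieces of 4: the last request over-asks by 2.
const size = 10;
const chunk = 4;
for (let start = 0; start < size; start += chunk) {
  console.log(clampSliceRange(size, start, chunk));
}
```

With clamping, the chunking loop needs no special case for the final piece, which is the XHR upload use case described in the message.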
Re: [FileAPI] Latest Revision of Editor's Draft
On Mon, Oct 26, 2009 at 5:24 AM, Arun Ranganathan a...@mozilla.com wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/

A few comments:

* loadend should fire after load/error/abort.
* I'm not sure I love the name 'fileData'. Maybe 'result' or simply 'data' is better.
* Whatever the name, I don't see why 'fileData' should only be readable while an event is being fired. That seems unnecessarily complicated, doesn't match XHR and seems less useful.
* fileData should probably be null rather than the empty string on error and before data is read.
* The second argument to 'slice' should be called 'length' rather than 'offset'.
* I think someone had brought up a good argument for *not* throwing when slice is called with start+offset > size. One of the main use cases for slice is to slice up a file in several chunks when sending with XHR. When that is done it's easy to end up with rounding errors resulting in a slightly too large length being requested. In this case it makes sense to just clamp to size rather than throwing an error.

/ Jonas
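The event ordering requested in the first bullet can be modeled with a small stand-in object. This is a sketch under assumptions, not the draft's reader: the class name is hypothetical, the read is synchronous for brevity (a real reader is asynchronous), and the `result` field stands for the attribute under discussion, held at null before a read and on error as the fourth bullet suggests.

```javascript
// Hypothetical stand-in for a reader object; it only models event ordering.
class SketchReader {
  constructor() {
    this.result = null; // null before any data is read, and on error
    this.onload = null;
    this.onerror = null;
    this.onloadend = null;
  }

  // `readFn` is an assumed synchronous function that returns data or throws.
  read(readFn) {
    let failed = false;
    let failure;
    try {
      this.result = readFn();
    } catch (err) {
      failed = true;
      failure = err;
      this.result = null;
    }
    if (!failed) {
      if (this.onload) this.onload();
    } else if (this.onerror) {
      this.onerror(failure);
    }
    // loadend fires last, whether the read succeeded or failed.
    if (this.onloadend) this.onloadend();
  }
}

const events = [];
const reader = new SketchReader();
reader.onload = () => events.push("load");
reader.onerror = () => events.push("error");
reader.onloadend = () => events.push("loadend");
reader.read(() => "hello");
reader.read(() => { throw new Error("unreadable"); });
console.log(events.join(","));
```

A caller that only needs to know the read has finished can then listen for loadend alone, which is the convenience XHR2's definition offers.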
Re: [FileAPI] Latest Revision of Editor's Draft
On Mon, 26 Oct 2009, Arun Ranganathan wrote: 2. Interface names have changed (notably FileData -> Blob) since the underlying FileData interface had uses on the platform beyond a file read API. Blob as an interface name was first introduced by a Google Gears API, which I cite as an informative reference. Updated HTML5 and related specs. 3. The event model resembles that of XHR2, with a few differences. Notably, the APIs differ in their use of the 'loadend' ProgressEvent. I think this spec needs examples. I think the examples would show that the current design requires far too many lines of code to do something that really should only need one or two statements. (I think XHR is a very poor model to follow.) 4. A suggestion to *not* have a separate scheme (filedata:) in lieu of urn:uuid:uuid[2] has been the basis of a rewrite of that feature in this version of the specification. I would like to see implementation feedback on this. I don't understand why we would want to assign semantics to urn:uuid: URLs that are so specific -- that seems completely wrong. It also seems really awkward from an implementation perspective to forgo the normal extension mechanism (schemes) and have implementations give special (and non-trivial) semantics to a subset of another scheme. Why are we doing this? -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [FileAPI] Latest Revision of Editor's Draft
Jonas Sicking wrote: On Mon, Oct 26, 2009 at 5:24 AM, Arun Ranganathan a...@mozilla.com wrote: The latest revision of the FileAPI editor's draft is available here: http://dev.w3.org/2006/webapi/FileAPI/ A few comments: * loadend should fire after load/error/abort. Currently it *only* fires when error and abort events fire. I felt that 'load' was sufficient for successful reads into memory, while 'loadend' was useful for unsuccessful ones. This differs from XHR2's definition of 'loadend'[1]: When the request has completed (either in success or failure). [1] When loadend was removed from media elements [2] I wished to determine whether it was event overkill to also fire at successful reads. Sounds like you want it back for successful reads as well? * I'm not sure I love the name 'fileData'. Maybe 'result' or simply 'data' is better. I'm happy to change it to a better name, but chose 'fileData' since in the original version of the draft, with asynchronous callbacks [3], we had an interface called FileData which represented the actual file data (in the present and current version of the editor's draft -- http://dev.w3.org/2006/webapi/FileAPI/ -- FileData is called Blob). Of the two names you suggest, do you feel strongly about one over the others? I'm not sure I love 'result' (it isn't intuitive as a response to a read), and 'data' is used in other contexts on the platform and so may be confusing. If you feel strongly (stronger than a 'maybe' ;-) ) about a different name, I'm happy to change it. * Whatever the name, I don't see why 'fileData' should only be readable while an event is being fired. That seems unnecessarily complicated, doesn't match XHR and seems less useful. Nothing in the present draft prohibits that -- I only left an editor's note as an open question. I agree with you about the desired behavior, and so I'll remove the editor's note. * fileData should probably be null rather than the empty string on error and before data is read. 
Done * The second argument to 'slice' should be called 'length' rather than 'offset'. Done * I think someone had brought up a good argument for *not* throwing when slice is called with start+offset > size. One of the main use cases for slice is to slice up a file in several chunks when sending with XHR. When that is done it's easy to end up with rounding errors resulting in a slightly too large length being requested. In this case it makes sense to just clamp to size rather than throwing an error. OK -- sounds like slice should NOT throw an INDEX_SIZE_ERR at all, and only clamp on size? / Jonas Current editor's draft: http://dev.w3.org/2006/webapi/FileAPI/ [1] http://www.w3.org/TR/XMLHttpRequest2/#loadend-event [2] http://dev.w3.org/cvsweb/html5/spec/Overview.html?r1=1.3290&r2=1.3291&f=h [3] http://dev.w3.org/2006/webapi/FileUpload/publish/FileAPI.html
Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, Oct 27, 2009 at 12:36 AM, Ian Hickson i...@hixie.ch wrote: 3. The event model resembles that of XHR2, with a few differences. Notably, the APIs differ in their use of the 'loadend' ProgressEvent. I think this spec needs examples. I think the examples would show that the current design requires far too many lines of code to do something that really should only need one or two statements. (I think XHR is a very poor model to follow.) I was as surprised as you, but the feedback we consistently received, both here and when talking directly to developers, was that using an events-based model was preferable to using callbacks. We also received the feedback that following XHR was good because authors were used to this model. I agree that especially the common simple use case results in more lines of code, but it doesn't need to be as complicated as the example in the beginning of the spec: r = new FileReader; r.readAsText(file); r.onload = fileIsRead; 4. A suggestion to *not* have a separate scheme (filedata:) in lieu of urn:uuid:uuid[2] has been the basis of a rewrite of that feature in this version of the specification. I would like to see implementation feedback on this. I don't understand why we would want to assign semantics to urn:uuid: URLs that are so specific -- that seems completely wrong. It also seems really awkward from an implementation perspective to forgo the normal extension mechanism (schemes) and have implementations give special (and non-trivial) semantics to a subset of another scheme. Why are we doing this? I'd like to hear implementation feedback here too. I'm especially worried that implementations might not be able to use the urn scheme due to limitations in the network stacks they are using. But like Arun, I suspect that this feature is the most controversial in the spec. 
Apple expressed concern about having a string represent a handle to a resource, and when we talked to Microsoft they briefly mentioned that they had concerns about this feature too, though I don't know specifically what their concerns were. I don't think we are assigning specific semantics to another scheme that aren't already there. All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. Arguably the status codes are something that urn:uuid doesn't already have. Arun mentioned that we could possibly get rid of that. / Jonas
Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, 27 Oct 2009, Jonas Sicking wrote: All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. We're also saying that urn:uuid: has special semantics in the same-origin model, and that it has an expiration model. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, Oct 27, 2009 at 2:49 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. We're also saying that urn:uuid: has special semantics in the same-origin model, and that it has an expiration model. The expiration model is just that when the Document goes away the urn:uuid is changed to no longer represent that chunk of data. The origin is something that at least in Gecko we build on top of the URI, i.e. the URI itself doesn't contain any origin information. If you consider it to be part of the URI, then why wouldn't each urn:uuid already have an origin? / Jonas
Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, 27 Oct 2009, Jonas Sicking wrote: On Tue, Oct 27, 2009 at 2:49 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. We're also saying that urn:uuid: has special semantics in the same-origin model, and that it has an expiration model. The expiration model is just that when the Document goes away the urn:uuid is changed to no longer represent that chunk of data. The origin is something that at least in Gecko we build on top of the URI, i.e. the URI itself doesn't contain any origin information. If you consider it to be part of the URI, then why wouldn't each urn:uuid already have an origin? I just mean that if someone else decides that they are going to resolve urn:uuid:s in some way or other, the origin model they would use will almost certainly be quite different to the origin model we are using here. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [FileAPI] Latest Revision of Editor's Draft
On Tue, Oct 27, 2009 at 3:26 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: On Tue, Oct 27, 2009 at 2:49 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. We're also saying that urn:uuid: has special semantics in the same-origin model, and that it has an expiration model. The expiration model is just that when the Document goes away the urn:uuid is changed to no longer represent that chunk of data. The origin is something that at least in Gecko we build on top of the URI, i.e. the URI itself doesn't contain any origin information. If you consider it to be part of the URI, then why wouldn't each urn:uuid already have an origin? I just mean that if someone else decides that they are going to resolve urn:uuid:s in some way or other, the origin model they would use will almost certainly be quite different to the origin model we are using here. This doesn't seem to be a problem as long as the two specs don't mandate the exact same uuids. But again, I'd like feedback from other implementations with different network stacks. / Jonas
Re: [FileAPI] Latest Revision of Editor's Draft
Ian Hickson wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: On Tue, Oct 27, 2009 at 2:49 PM, Ian Hickson i...@hixie.ch wrote: On Tue, 27 Oct 2009, Jonas Sicking wrote: All we're saying is that urn:uuid represents a specific chunk of data with a specific mimetype. This seems like something that's already there with urn:uuid. We're also saying that urn:uuid: has special semantics in the same-origin model, and that it has an expiration model. The expiration model is just that when the Document goes away the urn:uuid is changed to no longer represent that chunk of data. The origin is something that at least in Gecko we build on top of the URI, i.e. the URI itself doesn't contain any origin information. If you consider it to be part of the URI, then why wouldn't each urn:uuid already have an origin? I just mean that if someone else decides that they are going to resolve urn:uuid:s in some way or other, the origin model they would use will almost certainly be quite different to the origin model we are using here. Yes; that is true, and is a concern. However, my reading of: http://www.ietf.org/rfc/rfc4122 (which describes urn:uuid) suggests that namespace resolution for UUIDs, coupled with general stipulations for namespace resolution, make this a manageable problem. From RFC4122, Section 3: Process for identifier resolution: Since UUIDs are not globally resolvable, this is not applicable. Moreover, in http://www.ietf.org/rfc/rfc2141.txt (which describes URN syntax), we find that: ... Namespace registration must include guidance on how to determine functional equivalence for that namespace, i.e. when two URNs are identical within a namespace. We're unlikely to have *identical URNs* in the uuid namespace. One reason I chose UUID is because the identical URN problem is unlikely. That leaves the problem of persistence (which is also a stipulation on URNs) but I think that we are entitled to define persistence in terms of the Document's persistence. 
I'd like to hear from implementors, and of course those that disagree with my reading of these specifications. I'm amenable to dropping the HTTP responses if that helps. -- A*
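The two properties debated in this sub-thread, an origin attached to each URN and a lifetime tied to the Document, can be sketched as a lookup table. Everything here is an assumption made for illustration: the class, its method names, the document identifiers, and the UUID generator (which is not cryptographically sound) are hypothetical and do not describe any implementation or the draft itself.

```javascript
// Non-cryptographic, illustration-only generator of a version-4-shaped UUID.
function sketchUuid() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical registry; the class and method names are not from the draft.
class UrnRegistry {
  constructor() {
    this.entries = new Map(); // urn -> { origin, documentId, data }
  }

  register(documentId, origin, data) {
    const urn = "urn:uuid:" + sketchUuid();
    this.entries.set(urn, { origin, documentId, data });
    return urn;
  }

  // Returns the data only to the origin that minted the URN; otherwise null.
  resolve(urn, requestingOrigin) {
    const entry = this.entries.get(urn);
    if (!entry || entry.origin !== requestingOrigin) return null;
    return entry.data;
  }

  // Called when a Document goes away: its URNs stop resolving.
  releaseDocument(documentId) {
    for (const [urn, entry] of this.entries) {
      if (entry.documentId === documentId) this.entries.delete(urn);
    }
  }
}

const registry = new UrnRegistry();
const urn = registry.register("doc-1", "https://example.org", "file bytes");
console.log(registry.resolve(urn, "https://example.org"));  // minting origin
console.log(registry.resolve(urn, "https://evil.example")); // different origin
registry.releaseDocument("doc-1");
console.log(registry.resolve(urn, "https://example.org"));  // after release
```

In this sketch the origin lives beside the URN rather than inside it, which mirrors Jonas's description of Gecko building origin on top of the URI; a leaked URN string alone is then not enough to read the data.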