Re: File API: Blob and underlying file changes.
Going back a bit to the current spec and changing underlying files - here is an update on our thinking (and current implementation plan). We played with the File/Blob ideas a little more and talked with some of our app developers. In regard to the problem of a changing file, most folks feel the Blob is best thought of as a 'snapshot of a byte range' with a delayed promise to deliver the actual bytes in that range from the underlying data storage. It is a 'delayed promise' because all the actual 'reading' methods are async. Basically, in terms of implementation, the Blob is not a 'container of bytes' but rather a 'reference' to the byte range. As such, the async read operations may later fail, for many reasons - the file can be deleted, renamed, modified, etc. It seems developers sometimes want to be oblivious to those problems, but in other scenarios they want to handle them. Basically, it's an app-specific choice. It appears that the following implementation goes along with the current edition of the spec but also provides the ability to detect a file change:
1. File derives from Blob, so there is a File.size that performs synchronous file I/O. Not ideal, but easy to use and compatible with current forms upload.
2. File.slice() also does synchronous I/O and captures the current size and modification time of the underlying file - and caches them in the resulting Blob.
3. Subsequent Blob.slice() and Blob.size calls do not do any file I/O, but merely operate on the cached values. So the only methods that do sync I/O are those on the File object. Subsequent slicing operates on the file information captured from the File and propagates it to derived Blobs.
4. In xhr.send() and FileReader, if the UA discovers that the underlying file has changed, it behaves just as when other file errors are discovered - returning an 'error' progress event and setting the FileReader.error attribute, for example. We might need another FileError code for that if the existing ones do not feel adequate.
This way, the folks who don't care about changing files could simply ignore the error results - because they likely do not worry about other errors either (such as NOT_FOUND_ERR). At the same time, folks who worry about such things could simply process the errors already specified. It also doesn't add new exceptions to the picture, so no special code is needed in simple cases. One obvious difficulty here is the synchronous file I/O in File.size and File.slice(). Trying to eliminate it requires some complexity in the API that is not obviously better. It either leads to some strange APIs, like a getSize() with a callback that delivers the size, and/or breaks the behavior of the currently implemented File (and most developers' expectations). In any case, an attempt to completely avoid sync I/O and preserve correctness seems to call for a far more involved API. Considering that most uploaders which slice the file and send it in pieces will likely do it in a worker thread, sync I/O in these places is perhaps a lesser evil than a complicated (or dual) API... Thanks, Dmitry On Wed, Jan 27, 2010 at 4:40 AM, Juan Lanus juan.la...@gmail.com wrote: On Wed, Jan 27, 2010 at 01:16, Robert O'Callahan rob...@ocallahan.org wrote: On Wed, Jan 27, 2010 at 5:38 AM, Juan Lanus juan.la...@gmail.com wrote: Quite right Bob. But still the lock is the way to go. At least as of today. HTML5 might be mainstream for the next 10 years, starting rather soon. In the meanwhile OSs will also evolve, in a way that we can't tell now. But if there are common issues, like this one, somebody will come up with a smart solution, maybe soon. For example, feeding an image of the file as of the instant it was opened (like relational databases do to provide stable queries) by keeping a temporary map to the original disk segments that comprised the file before it was changed.
For example Apple is encouraging advisory locks http://developer.apple.com/mac/library/technotes/tn/tn2037.html#OSSolutions asking developers to design in an environment-aware mode. In my experience, almost no code uses advisory locking unless it is being explicitly designed for some kind of concurrent usage, i.e., Apple's advice is not being followed. If that's not going to suddenly change --- and I see no evidence it will --- then asking the UA to apply a mandatory lock is asking the UA to do something impossible, which is generally not a good idea. Rob Right, I'm not talking about locks any more, because that would be telling the UA HOW to do it, and what is best for the UA developers is to be told WHAT to do. We are not writing a tutorial but a specification. Let the developer find out how to do it, this year, and with the tools that will be available by 2020. Now, setting the locks subject aside, what I want to be sure of is that the specification does not specify the mutating blob, the origin of this thread. -- Juan He was pierced for our transgressions, he was crushed
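The slice-in-a-worker uploaders Dmitry mentions can be sketched as a simple loop over snapshot-style slices. This is a hedged illustration with invented names (uploadInChunks, a plain-object file, and a send callback standing in for xhr.send); the point is that the whole-file metadata is captured once, each chunk is checked against it, and the upload aborts cleanly instead of mixing file versions:

```javascript
// Chunked upload over a snapshot: capture size + mtime once, verify
// before each chunk, abort with an error if the file mutated mid-upload.
function uploadInChunks(file, chunkSize, send) {
  const snap = { size: file.size, mtime: file.mtime };  // captured once
  const sent = [];
  for (let off = 0; off < snap.size; off += chunkSize) {
    if (file.mtime !== snap.mtime || file.size !== snap.size) {
      return { ok: false, error: 'file changed during upload', sent };
    }
    const chunk = file.data.slice(off, Math.min(off + chunkSize, snap.size));
    send(chunk);            // in a real uploader: xhr.send(blobChunk)
    sent.push(chunk);
  }
  return { ok: true, sent };
}
```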
Re: File API: Blob and underlying file changes.
On Mon, Feb 1, 2010 at 12:27 PM, Dmitry Titov dim...@chromium.org wrote: Basically, it's an app-specific choice. It appears that the following implementation goes along with the current edition of the spec but also provides the ability to detect the file change: 1. File derives from Blob, so there is a File.size that performs synchronous file I/O. Not ideal, but easy to use and compatible with current forms upload. 2. File.slice() also does a synchronous IO and captures the current size and modification time of the underlying file - and caches it in the resulting Blob. Note that the sync IO is not required by the spec. You can just cache the file size when the File object is created, which always happens asynchronously. Then use that cached value through all calls to File.size and Blob.slice(). 4. In xhr.send() and FileReader, if the UA discovers that the underlying file is changed, it behaves just like when other file errors are discovered - returning 'error' progress event and setting FileReader.error attribute for example. We might need another FileError code for that if existing ones do not feel adequate. This is definitely an interesting idea, possibly even something that we should standardize. I don't really feel strongly either way, though I am curious about platform support if the file lives in NFS or samba or some such. One obvious difficulty here is the synchronous file IO on File.size and File.slice(). Trying to eliminate it requires some complexity in the API that is not obviously better. See above. / Jonas
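Jonas's suggestion - stat once when the File object is minted, then serve everything from the cache - can be sketched as follows. The names here (makeFile, statResult) are invented for illustration; in a real UA the stat happens during the async file-picker step:

```javascript
// Cache the size at File-creation time; File.size and slice() never
// touch the disk again, so no sync IO is needed after creation.
function makeFile(name, statResult) {        // statResult from the async open
  const cachedSize = statResult.size;
  return {
    name,
    get size() { return cachedSize; },       // never re-stats the disk
    slice(start, end) {
      const s = Math.min(start, cachedSize); // clamp to the cached size
      const e = Math.min(end, cachedSize);
      return { start: s, size: Math.max(e - s, 0) };
    },
  };
}
```

Any mismatch with the real file is then discovered only at read time, as discussed elsewhere in the thread.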
Re: File API: Blob and underlying file changes.
On Sun, Jan 24, 2010 at 8:04 AM, Juan Lanus juan.la...@gmail.com wrote: ** Locking What's wrong with file locking? Rob O'Callahan answered that: One problem is that mandatory locking is not supported on Mac or most Linux installs. Quite right Bob. But still the lock is the way to go. At least as of today. HTML5 might be mainstream for the next 10 years, starting rather soon. In the meanwhile OSs will also evolve, in a way that we can't tell now. But if there are common issues, like this one, somebody will come up with a smart solution, maybe soon. For example, feeding an image of the file as of the instant it was opened (like relational databases do to provide stable queries) by keeping a temporary map to the original disk segments that comprised the file before it was changed. For example Apple is encouraging advisory locks http://developer.apple.com/mac/library/technotes/tn/tn2037.html#OSSolutions asking developers to design in an environment-aware mode. Maybe, now that I think about it a bit more, specifying that the UA should get a lock is telling HOW to do things, while use-case practice teaches us that at the requirements level one should say WHAT to do. What if the specification only said that the UA has to do its best to get an integral copy of the input file, or else, failing that, it MUST raise an error? This would leave headroom for the UA designers and is also what the specification says now, isn't it? I got scared by the mutating blob solution. -- Juan Lanus
Re: File API: Blob and underlying file changes.
On Wed, Jan 27, 2010 at 5:38 AM, Juan Lanus juan.la...@gmail.com wrote: Quite right Bob. But still the lock is the way to go. At least as of today. HTML5 might be mainstream for the next 10 years, starting rather soon. In the meanwhile OSs will also evolve, in a way that we can't tell now. But if there are common issues, like this one, somebody will come up with a smart solution, maybe soon. For example, feeding an image of the file as of the instant it was opened (like relational databases do to provide stable queries) by keeping a temporary map to the original disk segments that comprised the file before it was changed. For example Apple is encouraging advisory locks http://developer.apple.com/mac/library/technotes/tn/tn2037.html#OSSolutions asking developers to design in an environment-aware mode. In my experience, almost no code uses advisory locking unless it is being explicitly designed for some kind of concurrent usage, i.e., Apple's advice is not being followed. If that's not going to suddenly change --- and I see no evidence it will --- then asking the UA to apply a mandatory lock is asking the UA to do something impossible, which is generally not a good idea. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: File API: Blob and underlying file changes.
On Sun, Jan 24, 2010 at 8:04 AM, Juan Lanus juan.la...@gmail.com wrote: ** Locking What's wrong with file locking? One problem is that mandatory locking is not supported on Mac or most Linux installs. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: File API: Blob and underlying file changes.
I'm new to this list and to all the W3C work, so I might be completely wrong. That said, here goes. Dmitry posed a simple question: should a file's blob be kept in sync with the file's content on disk, or not? He did not get a yes-or-no answer but instead triggered a nearly 30-post thread that, as I see it, denotes a certain lack of definition so far. This is what I think, after having read only the draft and this thread: ** The mutating blob The idea of keeping the disk file in sync with its working version, the mutating blob, strikes me as too risky and impractical. IMO doing so will raise a lot of issues while solving none. What is the scenario that calls for such a feature? I can't see any, but I can certainly see lots of scenarios where data stability is desirable. For example, a disk file holding the data of an active relational database. The scenario is uploading a big file while possibly many concurrent applications introduce changes anywhere in the file, every few seconds. I know that this example is contrived, but there might be many others with similar characteristics, albeit not so clear and dramatic. In this scenario the UA might be completely busy just trying to keep current with the changes, as during a DoS attack. Another requirement for a database file is that it has to be consistent, so sending a slice of one version lumped together with a slice of a later version is unacceptable. If, and only if, there is an unavoidable requirement for such a feature, then I strongly suggest that the API specify a flag informing the application that the original file changed during the operation, but without doing anything else. Let the developer decide if she wants to take any action, instead of trying to solve for her, in advance, a problem that might not exist. In one post Dmitry says that he found out that developers expect Blob to be a 'snapshot'. This is the way to go: talking with developers and also with software architects who already solved issues like this years ago.
** Locking What's wrong with file locking? Maybe it was discussed in prior sessions I didn't read, because it seems to have already been discarded. But locking is the universally accepted solution in multitasking operating systems. The API should lock the files to prevent them from being written by other applications, for a short while or for a long time. It is a must, to make the read atomic (atomicity is not merely desirable but a must):
1 the UA SHOULD lock the file (a mandatory lock preventing writes by other apps) and open it
. 1a the file refuses to be locked
.. 1a1 the operation fails with a 'file is locked' error
.. 1a2 the use case fails
2 the UA uses the file
3 the UA unlocks the file by issuing a close method
For small files this does not make a difference. But what happens if the file is huge? In this case, leave the problem to the developer, the one who knows about the environment and the particular requirements. For example, the developer could choose to swiftly copy the file into a blob and close it to release the brief lock if it is a busy file (database ...), or keep it locked during a lengthy transfer operation if the file content is static (video, or backup ...). It is not possible to solve all the developer's issues at this point; we can only provide tools, the simpler the better, for the developers to leverage. For very special cases there might be an option locking=no to open a file while allowing other applications to change it. Intuitively I perceive this as a security crack. Such a file could become a communication area between the computer's contents and the web. A trojan could repeatedly paste information into the file for the UA to send to the bad guy's server. This could be achieved by setting a trojan listener in the OS to detect when the user selected a file. As I see it, when the user allows the UA to grab a file she means what the file contains right now, and we MUST not deceive her.
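The lock/use/unlock flow above can be modeled in a few lines. This is only a control-flow sketch with invented names (an in-memory LockTable); real mandatory locking is an OS facility, which this model does not and cannot enforce:

```javascript
// In-memory model of the flow: step 1 locks-and-opens or fails the
// use case; step 3 unlocks via close(). One lock per path at a time.
class LockTable {
  constructor() { this.locked = new Set(); }
  open(path) {
    if (this.locked.has(path)) {
      throw new Error('file is locked');            // steps 1a1 / 1a2
    }
    this.locked.add(path);                          // step 1
    return { path, close: () => this.locked.delete(path) }; // step 3
  }
}
```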
** Avoid involving technology limitations in the design The File API is sort of an impedance adapter between the latency of Internet connections and the speed of disk drives (disks or whatever, think of the future). As such, it must be able to handle any speed difference. In the future the difference might even change its sign. Also, the API must consider that what today is regarded as big might be regular in the future and small after a while. For example, making a memory copy of a 300MB file is possible today but was not when computers, even mainframes, sported only a few MB of RAM. The virtual memory that most OSs have is an existing implementation of an in-memory file backed by disk storage. This issue has been solved since the seventies. A program, like the UA, can pump lots of data into RAM and the OS will use the disk to store the bytes in case of a shortage of real RAM. This way computers, like PCs, appear to have twice as much RAM as they physically have installed, at the cost of some performance loss that is completely compatible with Internet
Re: File API: Blob and underlying file changes.
Treating blobs as snapshots sounds like a reasonable approach, and it will make life easier for chunked uploads and other scenarios. Now the problem is: how do we get the blob (snapshot) out of the file? 1) We can still keep the current relationship between File and Blob. When we slice a file by calling File.slice, a new blob that captures the current file size and modification time is returned. The following Blob operations, like slice, will simply inherit the cached size and modification time. When we access the underlying file data in XHR.send() or FileReader, the modification time will be verified and an exception could be thrown. 2) We can remove the inheritance of Blob from File and introduce File.getAsBlob() as dimich suggested. This seems to be more elegant. However, it requires changing the File API spec a lot. On Wed, Jan 20, 2010 at 3:44 PM, Eric Uhrhane er...@google.com wrote: On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov dim...@chromium.org wrote: On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane er...@google.com wrote: I think it could. Here's a third option: Make all blobs, file-based or not, just as async as the blobs in option 2. They never do sync IO, but could potentially fail future read operations if their metadata is out of date [e.g. reading beyond EOF]. However, expose the modification time on File via an async method and allow the user to pass it in to a read call to enforce 'fail if changed since this time'. This keeps all file accesses async, but still allows for chunked uploads without mixing files accidentally. If we allow users to refresh the modification time asynchronously, it also allows for adding a file to a form, changing the file on disk, and then uploading the new file. The user would look up the mod time when starting the upload, rather than when the file's selected. It would be great to avoid sync file I/O on calls like Blob.size. They would simply return a cached value.
An actual mismatch would be detected during the actual read operation. However, I'm not sure how to keep File derived from Blob then, since: 1) Currently, in FF and WebKit, File.fileSize is a sync I/O that returns the current file size. The current spec says File is derived from Blob, and Blob has a Blob.size property that is likely going to co-exist with File.fileSize for a while, for compat reasons. It's weird for file.size and file.fileSize to return different things. True, but we'd probably want to deprecate file.fileSize anyway and then get rid of it, since it's synchronous. 2) Currently, xhr.send(file) does not fail and sends the version of the file that exists around the time the xhr.send(file) call was issued. Since File is also a Blob, xhr.send(blob) would behave the same, which means that if we want to preserve this behavior the Blob cannot fail an async read operation if the file has changed. There is a contradiction here. One way to resolve it would be to break 'File is a Blob' and to be able to capture the File as a Blob by having file.getAsBlob(). The latter would make a snapshot of the state of the file, to be able to fail subsequent async read operations if the file has been changed. I've asked a few people around in a non-scientific poll and it seems developers expect Blob to be a 'snapshot', reflecting the state of the file (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation. Since it's obviously bad to actually copy data, it seems acceptable to capture enough information (like mod time) so the read operations later can fail if the underlying storage has been changed. It feels really strange if reading the Blob can yield some data from one version of a file (or Canvas) mixed with some data from a newer version, without any indication that this is happening. All that means there is an option 3: 3. Treat all Blobs as 'snapshots' that refer to the range of underlying data at the moment of creation of the Blob.
Blobs produced further by the Blob.slice() operation inherit the captured state w/o actually verifying it against 'live' underlying objects like files. All Blobs can be 'read' (or 'sent') via operations that can fail if the underlying content has changed. Optionally, expose a snapshotTime property and perhaps a 'read if not changed since' parameter to read operations. Do not derive File from Blob; rather, have File.getAsBlob() that produces a Blob which is a snapshot of the file at the moment of the call. The advantage here is that it removes the need for sync operations from Blob and provides a mechanism to ensure that a change in the underlying storage is detectable. The disadvantage is a bit more complexity and a bigger change to the File spec. That sounds good to me. If we're treating blobs as snapshots, I retract my suggestion of the read-if-not-changed-since parameter. All reads after the data has changed should fail. If you want to do a chunked upload, don't snapshot your
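Option 3 can be modeled roughly as below. The names getAsBlob, readBlob, and the ifNotModifiedSince guard are invented for this sketch, following the snapshotTime / 'read if not changed since' ideas in the thread; a plain object with a numeric mtime stands in for a real file:

```javascript
// File.getAsBlob() captures a snapshot (size + snapshotTime); reads
// take an optional ifNotModifiedSince guard and fail when the backing
// store has moved on past that time.
function getAsBlob(file) {
  return { file, size: file.size, snapshotTime: file.mtime };
}

function readBlob(blob, ifNotModifiedSince) {
  const limit = ifNotModifiedSince ?? blob.snapshotTime;
  if (blob.file.mtime > limit) {
    return { error: 'underlying file changed' };   // failed read
  }
  return { data: blob.file.data.slice(0, blob.size) };
}
```

A caller that refreshes the mod time before an upload (per Eric's scenario) would pass the fresher time as ifNotModifiedSince.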
Re: File API: Blob and underlying file changes.
One thing to remember here is that if we require snapshotting, that will mean paying potentially very high costs every time the snapshotting operation is used. Potentially copying hundreds of megabytes of data (think video). But if we don't require snapshotting, things will only break if the user takes the action to modify a file after giving the page access to it. Also, in general, snapshotting is something that UAs can experiment with without requiring changes to the spec. Even though File.slice is a synchronous function, the UA can implement snapshotting without using synchronous IO. The UA could simply do an asynchronous file copy in the background. If any read operations are performed on the slice, those could simply be stalled until the copy is finished, since reads are always asynchronous. / Jonas On Thu, Jan 21, 2010 at 11:22 AM, Eric Uhrhane er...@google.com wrote: On Thu, Jan 21, 2010 at 11:15 AM, Jian Li jia...@chromium.org wrote: Treating blobs as snapshots sounds like a reasonable approach and it will make the life of the chunked upload and other scenarios easier. Now the problem is: how do we get the blob (snapshot) out of the file? 1) We can still keep the current relationship between File and Blob. When we slice a file by calling File.slice, a new blob that captures the current file size and modification time is returned. The following Blob operations, like slice, will simply inherit the cached size and modification time. When we access the underlying file data in XHR.send() or FileReader, the modification time will be verified and an exception could be thrown. This would require File.slice to do synchronous file IO, whereas Blob.slice doesn't do that. 2) We can remove the inheritance of Blob from File and introduce File.getAsBlob() as dimich suggested. This seems to be more elegant. However, it requires changing the File API spec a lot.
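Jonas's background-copy idea - slice() returns synchronously while the UA copies in the background, and async reads simply stall until the copy completes - can be sketched as follows. sliceWithCopy is an invented name, and a later-resolving promise stands in for real async file IO:

```javascript
// slice() returns immediately; the bytes are "copied" asynchronously.
// read() is async anyway, so it just awaits the pending copy.
function sliceWithCopy(sourceBytes, start, end) {
  const copy = new Promise(resolve =>
    setImmediate(() => resolve(sourceBytes.slice(start, end))));
  return {
    read() { return copy; },   // stalls until the background copy is done
  };
}
```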
Re: File API: Blob and underlying file changes.
On Thu, Jan 21, 2010 at 12:49 PM, Jonas Sicking jo...@sicking.cc wrote: One thing to remember here is that if we require snapshotting, that will mean paying potentially very high costs every time the snapshotting operation is used. Potentially copying hundreds of megabytes of data (think video). I was thinking of different semantics. If the underlying bits change sometime after a 'snapshot' is taken, the 'snapshot' becomes invalid and you cannot access the underlying bits. If an application wants guaranteed access to the 'snapshot', it would have to explicitly save a copy somewhere (sandboxed file system / coin a new transient 'Blob' via a new blob.copy() method) and refer to the copy. So no costly copies are made w/o explicit direction to do so from the app. But if we don't require snapshotting, things will only break if the user takes the action to modify a file after giving the page access to it. Also, in general, snapshotting is something that UAs can experiment with without requiring changes to the spec. Even though File.slice is a synchronous function, the UA can implement snapshotting without using synchronous IO. The UA could simply do an asynchronous file copy in the background. If any read operations are performed on the slice those could simply be stalled until the copy is finished since reads are always asynchronous. / Jonas On Thu, Jan 21, 2010 at 11:22 AM, Eric Uhrhane er...@google.com wrote: On Thu, Jan 21, 2010 at 11:15 AM, Jian Li jia...@chromium.org wrote: Treating blobs as snapshots sounds like a reasonable approach and it will make the life of the chunked upload and other scenarios easier. Now the problem is: how do we get the blob (snapshot) out of the file? 1) We can still keep the current relationship between File and Blob. When we slice a file by calling File.slice, a new blob that captures the current file size and modification time is returned.
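The invalidate-unless-copied semantics Dmitry describes might look like the sketch below. The names snapshot and copy are invented, and a version counter stands in for a real modification time; the point is that no bytes are copied unless the app explicitly asks:

```javascript
// A snapshot becomes invalid once the underlying bits change; only an
// explicit copy() pays the storage cost and survives later mutation.
function snapshot(file) {
  const version = file.version;           // cheap metadata capture only
  return {
    read() {
      if (file.version !== version) throw new Error('snapshot invalidated');
      return file.data;
    },
    copy() {                              // explicit, app-directed copy
      const data = this.read();           // must still be valid at copy time
      return { read: () => data };
    },
  };
}
```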
Re: File API: Blob and underlying file changes.
I think the 'snapshotting' discussed above does not imply an actual copy of the data, sync or async. The proposal seems to be to 'snapshot' enough information (in the case of a file on disk, the modification time is enough) so that later read operations can fail reliably if the Blob is out of sync with the underlying storage. Making copies of large video files will probably never be a feasible option, for size/time issues and for the potentially quite complicated lifetime of such copies... We might provide a separate API for file manipulation that can be used to make temporary copies of files in cases where it is a good idea, and that could be used in conjunction with the Blob API perhaps, but it seems to be separate functionality. It is also interesting to think of Blobs backed by some other objects, Canvas for example. Perhaps 'snapshotting' is not an ideal name, but I think the discussion above means it as 'capture the state of the underlying object so the data can be read in the future, but w/o a guarantee that the read operation will actually succeed' - since there can not be a guarantee that the underlying object is still there. On Thu, Jan 21, 2010 at 12:49 PM, Jonas Sicking jo...@sicking.cc wrote: One thing to remember here is that if we require snapshotting, that will mean paying potentially very high costs every time the snapshotting operation is used. Potentially copying hundreds of megabytes of data (think video). But if we don't require snapshotting, things will only break if the user takes the action to modify a file after giving the page access to it. Also, in general, snapshotting is something that UAs can experiment with without requiring changes to the spec. Even though File.slice is a synchronous function, the UA can implement snapshotting without using synchronous IO. The UA could simply do an asynchronous file copy in the background. If any read operations are performed on the slice those could simply be stalled until the copy is finished since reads are always asynchronous.
/ Jonas
Re: File API: Blob and underlying file changes.
What we mean by snapshotting here is not to copy all the underlying data. Instead, we only intend to capture the least information needed in order to verify whether the underlying data has been changed. I agreed with Eric that the first option could cause inconsistent semantics between File.slice and Blob.slice. But how are we going to address the synchronous call to get the file size for Blob.size if the blob is a file? On Thu, Jan 21, 2010 at 12:49 PM, Jonas Sicking jo...@sicking.cc wrote: One thing to remember here is that if we require snapshotting, that will mean paying potentially very high costs every time the snapshotting operation is used. Potentially copying hundreds of megabytes of data (think video). But if we don't require snapshotting, things will only break if the user takes the action to modify a file after giving the page access to it. Also, in general snapshotting is something that UAs can experiment with without requiring changes to the spec. Even though File.slice is a synchronous function, the UA can implement snapshotting without using synchronous IO. The UA could simply do an asynchronous file copy in the background. If any read operations are performed on the slice, those could simply be stalled until the copy is finished, since reads are always asynchronous. / Jonas On Thu, Jan 21, 2010 at 11:22 AM, Eric Uhrhane er...@google.com wrote: On Thu, Jan 21, 2010 at 11:15 AM, Jian Li jia...@chromium.org wrote: Treating blobs as snapshots sounds like a reasonable approach and it will make the life of the chunked upload and other scenarios easier. Now the problem is: how do we get the blob (snapshot) out of the file? 1) We can still keep the current relationship between File and Blob. When we slice a file by calling File.slice, a new blob that captures the current file size and modification time is returned. The following Blob operations, like slice, will simply inherit the cached size and modification time. 
When we access the underlying file data in XHR.send() or FileReader, the modification time will be verified and an exception could be thrown. This would require File.slice to do synchronous file IO, whereas Blob.slice doesn't do that. 2) We can remove the inheritance of Blob from File and introduce File.getAsBlob() as dimich suggested. This seems to be more elegant. However, it requires changing the File API spec a lot. On Wed, Jan 20, 2010 at 3:44 PM, Eric Uhrhane er...@google.com wrote: On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov dim...@chromium.org wrote: On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane er...@google.com wrote: I think it could. Here's a third option: Make all blobs, file-based or not, just as async as the blobs in option 2. They never do sync IO, but could potentially fail future read operations if their metadata is out of date [e.g. reading beyond EOF]. However, expose the modification time on File via an async method and allow the user to pass it in to a read call to enforce fail if changed since this time. This keeps all file accesses async, but still allows for chunked uploads without mixing files accidentally. If we allow users to refresh the modification time asynchronously, it also allows for adding a file to a form, changing the file on disk, and then uploading the new file. The user would look up the mod time when starting the upload, rather than when the file's selected. It would be great to avoid sync file I/O on calls like Blob.size. They would simply return cached value. Actual mismatch would be detected during actual read operation. However then I'm not sure how to keep File derived from Blob, since: 1) Currently, in FF and WebKit File.fileSize is a sync I/O that returns current file size. The current spec says File is derived from Blob and Blob has Blob.size property that is likely going to co-exist with File.fileSize for a while, for compat reasons. It's weird for file.size and file.fileSize to return different things. 
True, but we'd probably want to deprecate file.fileSize anyway and then get rid of it, since it's synchronous. 2) Currently, xhr.send(file) does not fail and sends the version of the file that exists around the time the xhr.send(file) call was issued. Since File is also a Blob, xhr.send(blob) would behave the same, which means that if we want to preserve this behavior the Blob cannot fail an async read operation if the file has changed. There is a contradiction here. One way to resolve it would be to break the 'File is a Blob' relationship and to be able to capture the File as a Blob by having file.getAsBlob(). The latter would make a snapshot of the state of the file, to be able to fail subsequent async read operations if the file has been changed. I've asked a few people around in a non-scientific poll and it seems developers expect Blob to be
Re: File API: Blob and underlying file changes.
So it seems there are two ideas on how to handle the underlying file changes in the case of File and Blob objects, nicely captured by Arun above: 1. Keep all Blobs 'mutating', following the underlying file change. In particular, it means that Blob.size and similar properties may change from query to query, reflecting the current file state. In case the Blob was sliced and the corresponding portion of the file does not exist anymore, it would be clamped, potentially to 0, as currently specified. Read operations would simply read the clamped portion. That would provide similar behavior of all Blobs regardless of whether they are Files or obtained via slice(). It also has a slight disadvantage that every access to Blob.size or Blob.slice() will incur synchronous file I/O. Note that the current File.fileSize is already implemented like that in FF and WebKit and uses sync file I/O. 2. Treat Blobs that are Files and Blobs that are produced by slice() as different blobs, semantically. While the former would 'mutate' with the file on the disk (to keep compat with form submission), the latter would simply 'inherit' the file information and never do sync IO. Instead, they would fail later during async read operations. This has the disadvantage of Blob behaving differently in some cases, making it hard for web developers to produce correct code. The synchronous file IO would be reduced but not completely eliminated, because the Blobs that are Files would continue to 'sync' with the underlying file stats during sync JS calls. One benefit is that it allows detection of file content change, via checks of the modification time captured when the first slice() operation is performed and verified during async read operations, which provides a way to implement reliable file operations in the face of changing files, if the developer wants to spend the effort to do so. It seems folks on the thread do not like the dual behavior of Blobs (hard to program and debug), and there is also a desire to avoid synchronous file IO. 
It seems the spec today leans more toward #1. The only problem with it is that it's hard to implement some scenarios, like a big file upload in chunks - in case the file changes, the result of the upload may actually be a mix of new and old file contents and there is no way to check... Perhaps we can expose File.modificationTime? It still does not get rid of sync I/O... Dmitry On Fri, Jan 15, 2010 at 12:10 PM, Dmitry Titov dim...@chromium.org wrote: On Fri, Jan 15, 2010 at 11:50 AM, Jonas Sicking jo...@sicking.cc wrote: This doesn't address the problem that authors are unlikely to even attempt to deal with this situation, given how rare it is. And even less likely to deal with it successfully given how hard the situation is to reproduce while testing. I don't know how rare the case is. It might become less rare if there is an uploader of big movie files and it's easy to overwrite the big movie file by hitting the 'save' button in a movie editor while it is still uploading... Perhaps such an uploader can use other means to detect the file change though... It would be nice to spell out *some* behavior though, or we can end up with some incompatible implementations. Speaking about Blob.slice(), what is the recommended behavior of the resultant Blobs on the underlying file change? / Jonas
Re: File API: Blob and underlying file changes.
On Wed, Jan 20, 2010 at 1:45 PM, Dmitry Titov dim...@chromium.org wrote: So it seems there are two ideas on how to handle the underlying file changes in the case of File and Blob objects, nicely captured by Arun above: 1. Keep all Blobs 'mutating', following the underlying file change. In particular, it means that Blob.size and similar properties may change from query to query, reflecting the current file state. In case the Blob was sliced and the corresponding portion of the file does not exist anymore, it would be clamped, potentially to 0, as currently specified. Read operations would simply read the clamped portion. That would provide similar behavior of all Blobs regardless of whether they are Files or obtained via slice(). It also has a slight disadvantage that every access to Blob.size or Blob.slice() will incur synchronous file I/O. Note that the current File.fileSize is already implemented like that in FF and WebKit and uses sync file I/O. 2. Treat Blobs that are Files and Blobs that are produced by slice() as different blobs, semantically. While the former would 'mutate' with the file on the disk (to keep compat with form submission), the latter would simply 'inherit' the file information and never do sync IO. Instead, they would fail later during async read operations. This has the disadvantage of Blob behaving differently in some cases, making it hard for web developers to produce correct code. The synchronous file IO would be reduced but not completely eliminated, because the Blobs that are Files would continue to 'sync' with the underlying file stats during sync JS calls. One benefit is that it allows detection of file content change, via checks of the modification time captured when the first slice() operation is performed and verified during async read operations, which provides a way to implement reliable file operations in the face of changing files, if the developer wants to spend the effort to do so. 
It seems folks on the thread do not like the dual behavior of Blobs (hard to program and debug), and there is also a desire to avoid synchronous file IO. It seems the spec today leans more toward #1. The only problem with it is that it's hard to implement some scenarios, like a big file upload in chunks - in case the file changes, the result of the upload may actually be a mix of new and old file contents and there is no way to check... Perhaps we can expose File.modificationTime? It still does not get rid of sync I/O... I think it could. Here's a third option: Make all blobs, file-based or not, just as async as the blobs in option 2. They never do sync IO, but could potentially fail future read operations if their metadata is out of date [e.g. reading beyond EOF]. However, expose the modification time on File via an async method and allow the user to pass it in to a read call to enforce fail if changed since this time. This keeps all file accesses async, but still allows for chunked uploads without mixing files accidentally. If we allow users to refresh the modification time asynchronously, it also allows for adding a file to a form, changing the file on disk, and then uploading the new file. The user would look up the mod time when starting the upload, rather than when the file's selected. Eric Dmitry On Fri, Jan 15, 2010 at 12:10 PM, Dmitry Titov dim...@chromium.org wrote: On Fri, Jan 15, 2010 at 11:50 AM, Jonas Sicking jo...@sicking.cc wrote: This doesn't address the problem that authors are unlikely to even attempt to deal with this situation, given how rare it is. And even less likely to deal with it successfully given how hard the situation is to reproduce while testing. I don't know how rare the case is. It might become less rare if there is an uploader of big movie files and it's easy to overwrite the big movie file by hitting the 'save' button in a movie editor while it is still uploading... Perhaps such an uploader can use other means to detect the file change though... 
It would be nice to spell out some behavior though, or we can end up with some incompatible implementations. Speaking about Blob.slice(), what is recommended behavior of resultant Blobs on the underlying file change? / Jonas
Re: File API: Blob and underlying file changes.
On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane er...@google.com wrote: I think it could. Here's a third option: Make all blobs, file-based or not, just as async as the blobs in option 2. They never do sync IO, but could potentially fail future read operations if their metadata is out of date [e.g. reading beyond EOF]. However, expose the modification time on File via an async method and allow the user to pass it in to a read call to enforce fail if changed since this time. This keeps all file accesses async, but still allows for chunked uploads without mixing files accidentally. If we allow users to refresh the modification time asynchronously, it also allows for adding a file to a form, changing the file on disk, and then uploading the new file. The user would look up the mod time when starting the upload, rather than when the file's selected. It would be great to avoid sync file I/O on calls like Blob.size. They would simply return a cached value. An actual mismatch would be detected during the actual read operation. However, then I'm not sure how to keep File derived from Blob, since: 1) Currently, in FF and WebKit File.fileSize is a sync I/O call that returns the current file size. The current spec says File is derived from Blob, and Blob has a Blob.size property that is likely going to co-exist with File.fileSize for a while, for compat reasons. It's weird for file.size and file.fileSize to return different things. 2) Currently, xhr.send(file) does not fail and sends the version of the file that exists around the time the xhr.send(file) call was issued. Since File is also a Blob, xhr.send(blob) would behave the same, which means that if we want to preserve this behavior the Blob cannot fail an async read operation if the file has changed. There is a contradiction here. One way to resolve it would be to break the 'File is a Blob' relationship and to be able to capture the File as a Blob by having file.getAsBlob(). 
The latter would make a snapshot of the state of the file, to be able to fail subsequent async read operations if the file has been changed. I've asked a few people around in a non-scientific poll and it seems developers expect Blob to be a 'snapshot', reflecting the state of the file (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation. Since it's obviously bad to actually copy data, it seems acceptable to capture enough information (like mod time) so the read operations later can fail if the underlying storage has been changed. It feels really strange if reading the Blob can yield some data from one version of a file (or Canvas) mixed with some data from a newer version, without any indication that this is happening. All that means there is an option 3: 3. Treat all Blobs as 'snapshots' that refer to the range of underlying data at the moment of creation of the Blob. Blobs produced further by the Blob.slice() operation inherit the captured state w/o actually verifying it against 'live' underlying objects like files. All Blobs can be 'read' (or 'sent') via operations that can fail if the underlying content has changed. Optionally, expose a snapshotTime property and perhaps a 'read if not changed since' parameter to read operations. Do not derive File from Blob; rather, have File.getAsBlob() that produces a Blob which is a snapshot of the file at the moment of the call. The advantage here is that it removes the need for sync operations from Blob and provides a mechanism to ensure the changing underlying storage is detectable. The disadvantage is a bit more complexity and a bigger change to the File spec.
Re: File API: Blob and underlying file changes.
On Wed, Jan 20, 2010 at 3:23 PM, Dmitry Titov dim...@chromium.org wrote: On Wed, Jan 20, 2010 at 2:30 PM, Eric Uhrhane er...@google.com wrote: I think it could. Here's a third option: Make all blobs, file-based or not, just as async as the blobs in option 2. They never do sync IO, but could potentially fail future read operations if their metadata is out of date [e.g. reading beyond EOF]. However, expose the modification time on File via an async method and allow the user to pass it in to a read call to enforce fail if changed since this time. This keeps all file accesses async, but still allows for chunked uploads without mixing files accidentally. If we allow users to refresh the modification time asynchronously, it also allows for adding a file to a form, changing the file on disk, and then uploading the new file. The user would look up the mod time when starting the upload, rather than when the file's selected. It would be great to avoid sync file I/O on calls like Blob.size. They would simply return a cached value. An actual mismatch would be detected during the actual read operation. However, then I'm not sure how to keep File derived from Blob, since: 1) Currently, in FF and WebKit File.fileSize is a sync I/O call that returns the current file size. The current spec says File is derived from Blob, and Blob has a Blob.size property that is likely going to co-exist with File.fileSize for a while, for compat reasons. It's weird for file.size and file.fileSize to return different things. True, but we'd probably want to deprecate file.fileSize anyway and then get rid of it, since it's synchronous. 2) Currently, xhr.send(file) does not fail and sends the version of the file that exists around the time the xhr.send(file) call was issued. Since File is also a Blob, xhr.send(blob) would behave the same, which means that if we want to preserve this behavior the Blob cannot fail an async read operation if the file has changed. There is a contradiction here. 
One way to resolve it would be to break the 'File is a Blob' relationship and to be able to capture the File as a Blob by having file.getAsBlob(). The latter would make a snapshot of the state of the file, to be able to fail subsequent async read operations if the file has been changed. I've asked a few people around in a non-scientific poll and it seems developers expect Blob to be a 'snapshot', reflecting the state of the file (or Canvas if we get Canvas.getBlob(...)) at the moment of Blob creation. Since it's obviously bad to actually copy data, it seems acceptable to capture enough information (like mod time) so the read operations later can fail if the underlying storage has been changed. It feels really strange if reading the Blob can yield some data from one version of a file (or Canvas) mixed with some data from a newer version, without any indication that this is happening. All that means there is an option 3: 3. Treat all Blobs as 'snapshots' that refer to the range of underlying data at the moment of creation of the Blob. Blobs produced further by the Blob.slice() operation inherit the captured state w/o actually verifying it against 'live' underlying objects like files. All Blobs can be 'read' (or 'sent') via operations that can fail if the underlying content has changed. Optionally, expose a snapshotTime property and perhaps a 'read if not changed since' parameter to read operations. Do not derive File from Blob; rather, have File.getAsBlob() that produces a Blob which is a snapshot of the file at the moment of the call. The advantage here is that it removes the need for sync operations from Blob and provides a mechanism to ensure the changing underlying storage is detectable. The disadvantage is a bit more complexity and a bigger change to the File spec. That sounds good to me. If we're treating blobs as snapshots, I retract my suggestion of the read-if-not-changed-since parameter. All reads after the data has changed should fail. 
If you want to do a chunked upload, don't snapshot your file into a blob until you're ready to start. Once you've done that, just slice off parts of the blob, not the file.
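The "slice off parts of the blob, not the file" pattern is easy to show with an in-memory Blob (available in modern runtimes; note that the Blob.slice that was eventually specified takes start/end offsets, while the thread discusses a start/length form). The transport step is omitted - each chunk would be handed to xhr.send in turn.

```javascript
// Chunking pattern: snapshot once, then slice the blob, never the file.
// Blob.slice(start, end) is end-exclusive in the eventual spec.
function chunkBlob(blob, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    chunks.push(blob.slice(offset, Math.min(offset + chunkSize, blob.size)));
  }
  return chunks; // each chunk would then be sent via xhr.send
}
```

Because every chunk derives from the same snapshot blob, a change to the underlying file either fails all remaining reads or affects none of them - the upload can never silently mix file versions.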
Re: File API: Blob and underlying file changes.
I don't think we should worry about underlying file changes. If the app wants to cut a file into parts and copy them separately, then perhaps the app should first copy the file into a private area. (I'm presuming that one day, we'll have the concept of a chroot'd private file storage area for a web app.) I think we should avoid solutions that involve file locking since it is bad for the user (loss of control) if their files are locked by the browser on behalf of a web app. It might be reasonable, however, to lock a file while sending it. -Darin On Thu, Jan 14, 2010 at 2:41 PM, Jian Li jia...@chromium.org wrote: It seems that we feel that when a File object is sent via either Form or XHR, the latest underlying version should be used. When we get a slice via Blob.slice, we assume that the underlying file data is stable from then on. So for the uploader scenario, we need to cut a big file into multiple pieces. With the current File API spec, we will have to do something like the following to make sure that all pieces are cut from a stable file. var file = myInputElement.files[0]; var blob = file.slice(0, file.size); var piece1 = blob.slice(0, 1000); var piece2 = blob.slice(1000, 1000); ... The above seems a bit ugly. If we want to make it clean, what Dmitry proposed above seems to be reasonable. But it would require a non-trivial spec change. On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov dim...@chromium.org wrote: Atomic read is obviously a nice thing - it would be hard to program against an API that behaves as unpredictably as a single read operation that reads half of the old content and half of the new content. On the same note, it would likely be very hard to program against Blob objects if they could change underneath unpredictably. Imagine that we need to build an uploader that cuts a big file into multiple pieces and sends those pieces to the servers so they will be stitched together later. 
If during this operation the underlying file changes, and this changes all the pieces that the Blobs refer to (due to clamping and just silent change of content), all the slicing/stitching assumptions are invalid and it's hard to even notice, since blobs are simply 'clamped' silently. Some degree of mess is possible then. Another use case could be a JPEG image processor that uses slice() to cut the headers from the image file and then uses info from the headers to cut further JFIF fields from the file (reading EXIF and populating a local database of images, for example). Changing the file in the middle of that is bad. It seems the typical use cases that will need Blob.slice() functionality form 'units of work' where Blob.slice() is used with the likely assumption that the underlying data is stable and does not change silently. Such a 'unit of work' should fail as a whole if the underlying file changes. One way to achieve that is to reliably fail operations with 'derived' Blobs and even perhaps have an 'isValid' property on them. 'Derived' Blobs are those obtained via slice(), as opposed to 'original' Blobs that are also Files. One disadvantage of this approach is that it implies that the same Blob has 2 possible behaviors - when it is obtained via Blob.slice() (or other methods) vs when it is a File. It all could be a bit cleaner if File did not derive from Blob, but instead had a getAsBlob() method - then it would be possible to say that Blobs are always immutable but may become 'invalid' over time if the underlying data changes. The FileReader could then be just a BlobReader and have cleaner semantics. If that was the case, then xhr.send(file) would capture the state of the file at the moment of sending, while xhr.send(blob) would fail with an exception if the blob is 'invalid' at the moment of the send() operation. This would keep compatibility with the current behavior and avoid the dual behavior of Blob. Quite a change to the spec though... 
Dmitry On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking jo...@sicking.cc wrote: On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince cpri...@google.com wrote: For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here. I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception. I'm talking about during the actual read. I.e. not related to the lifetime of the File object, just related to the time between the first 'progress' event, and the 'loadend' event. If the file changes during this time there is no way to fake atomicity since the partial file has already been returned. / Jonas
Re: File API: Blob and underlying file changes.
On Thu, Jan 14, 2010 at 11:58 PM, Darin Fisher da...@chromium.org wrote: I don't think we should worry about underlying file changes. If the app wants to cut a file into parts and copy them separately, then perhaps the app should first copy the file into a private area. (I'm presuming that one day, we'll have the concept of a chroot'd private file storage area for a web app.) I think we should avoid solutions that involve file locking since it is bad for the user (loss of control) if their files are locked by the browser on behalf of a web app. It might be reasonable, however, to lock a file while sending it. I largely agree. Though I think it'd be reasonable to lock the file while reading it too. / Jonas On Thu, Jan 14, 2010 at 2:41 PM, Jian Li jia...@chromium.org wrote: It seems that we feel that when a File object is sent via either Form or XHR, the latest underlying version should be used. When we get a slice via Blob.slice, we assume that the underlying file data is stable from then on. So for the uploader scenario, we need to cut a big file into multiple pieces. With the current File API spec, we will have to do something like the following to make sure that all pieces are cut from a stable file. var file = myInputElement.files[0]; var blob = file.slice(0, file.size); var piece1 = blob.slice(0, 1000); var piece2 = blob.slice(1000, 1000); ... The above seems a bit ugly. If we want to make it clean, what Dmitry proposed above seems to be reasonable. But it would require a non-trivial spec change. On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov dim...@chromium.org wrote: Atomic read is obviously a nice thing - it would be hard to program against an API that behaves as unpredictably as a single read operation that reads half of the old content and half of the new content. On the same note, it would likely be very hard to program against Blob objects if they could change underneath unpredictably. 
Imagine that we need to build an uploader that cuts a big file into multiple pieces and sends those pieces to the servers so they will be stitched together later. If during this operation the underlying file changes, and this changes all the pieces that the Blobs refer to (due to clamping and just silent change of content), all the slicing/stitching assumptions are invalid and it's hard to even notice, since blobs are simply 'clamped' silently. Some degree of mess is possible then. Another use case could be a JPEG image processor that uses slice() to cut the headers from the image file and then uses info from the headers to cut further JFIF fields from the file (reading EXIF and populating a local database of images, for example). Changing the file in the middle of that is bad. It seems the typical use cases that will need Blob.slice() functionality form 'units of work' where Blob.slice() is used with the likely assumption that the underlying data is stable and does not change silently. Such a 'unit of work' should fail as a whole if the underlying file changes. One way to achieve that is to reliably fail operations with 'derived' Blobs and even perhaps have an 'isValid' property on them. 'Derived' Blobs are those obtained via slice(), as opposed to 'original' Blobs that are also Files. One disadvantage of this approach is that it implies that the same Blob has 2 possible behaviors - when it is obtained via Blob.slice() (or other methods) vs when it is a File. It all could be a bit cleaner if File did not derive from Blob, but instead had a getAsBlob() method - then it would be possible to say that Blobs are always immutable but may become 'invalid' over time if the underlying data changes. The FileReader could then be just a BlobReader and have cleaner semantics. If that was the case, then xhr.send(file) would capture the state of the file at the moment of sending, while xhr.send(blob) would fail with an exception if the blob is 'invalid' at the moment of the send() operation. 
This would keep compatibility with the current behavior and avoid the dual behavior of Blob. Quite a change to the spec though... Dmitry On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking jo...@sicking.cc wrote: On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince cpri...@google.com wrote: For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here. I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception. I'm talking about during the actual read. I.e. not related to the lifetime of the File object, just related to the time between the first 'progress' event, and the 'loadend' event. If the file changes during this time there is no way to fake atomicity since the partial file has already been returned.
Re: File API: Blob and underlying file changes.
On Fri, Jan 15, 2010 at 10:19 AM, Dmitry Titov dim...@chromium.org wrote: Nobody proposed locking the file. Sorry for being unclear if it sounded like that. Basically it's all about timestamps. As Chris proposed earlier, a read operation can grab the timestamp of the file before and after reading its content and throw an exception if the timestamps do not match. This is a pretty good approximation of an 'atomic' read - although it cannot guarantee success, it can at least reliably detect failure. but doesn't that imply some degree of unpredictability for web developers? must they always handle that exception even though it is an extremely rare occurrence? also, what about normal form submission, in which the file reading happens asynchronously to form.submit()? Same thing with the Blob - the slice() may capture the timestamp of the content it's based on. The Blob can throw an exception later if the modification timestamp of the underlying data has changed since the time of the Blob's creation. also note that we MUST NOT design APIs that involve synchronous file access. no stat calls allowed on the main UI thread please! (remember the network filesystem case.) in other words, assuming detection of file changes happens asynchronously, we'll have trouble producing exceptions as you describe. Both actual OS locking and requiring copying files to a safe location before slice() seem to be problematic, for different reasons. A good example is a youtube uploader that needs to slice and send a 1Gb file, while having a way to reliably detect the change of the underlying file, terminate the current upload and potentially request another one. Copying is hard because of size, and locking, even if provided by the OS, may get in the way of the user's workflow. Dmitry On Thu, Jan 14, 2010 at 11:58 PM, Darin Fisher da...@chromium.org wrote: I don't think we should worry about underlying file changes. 
If the app wants to cut a file into parts and copy them separately, then perhaps the app should first copy the file into a private area. (I'm presuming that one day, we'll have the concept of a chroot'd private file storage area for a web app.) I think we should avoid solutions that involve file locking, since it is bad for the user (loss of control) if their files are locked by the browser on behalf of a web app. It might be reasonable, however, to lock a file while sending it.

-Darin

On Thu, Jan 14, 2010 at 2:41 PM, Jian Li jia...@chromium.org wrote:

It seems that we feel that when a File object is sent via either Form or XHR, the latest underlying version should be used. When we get a slice via Blob.slice, we assume that the underlying file data is stable from then on. So for the uploader scenario, we need to cut a big file into multiple pieces. With the current File API spec, we will have to do something like the following to make sure that all pieces are cut from a stable file:

var file = myInputElement.files[0];
var blob = file.slice(0, file.size);
var piece1 = blob.slice(0, 1000);
var piece2 = blob.slice(1000, 1000);
...

The above seems a bit ugly. If we want to make it clean, what Dmitry proposed above seems reasonable. But it would require a non-trivial spec change.

On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov dim...@chromium.org wrote:

An atomic read is obviously a nice thing - it would be hard to program against an API that behaves as unpredictably as a single read operation that reads half of the old content and half of the new content.

On the same note, it would likely be very hard to program against Blob objects if they could change underneath unpredictably. Imagine that we need to build an uploader that cuts a big file into multiple pieces and sends those pieces to the servers so they can be stitched together later. If during this operation the underlying file changes, and this changes all the pieces that the Blobs refer to (due to clamping and just silent change of content), all the slicing/stitching assumptions become invalid, and it's hard to even notice, since blobs are simply 'clamped' silently. Some degree of mess is possible then.

Another use case could be a JPEG image processor that uses slice() to cut the headers from the image file and then uses info from the headers to cut further JFIF fields from the file (reading EXIF and populating a local database of images, for example). Changing the file in the middle of that is bad.

It seems the typical use cases that need Blob.slice() functionality form 'units of work', where Blob.slice() is used with the likely assumption that the underlying data is stable and does not change silently. Such a 'unit of work' should fail as a whole if the underlying file changes. One way to achieve that is to reliably fail operations with 'derived' Blobs, and perhaps even have an 'isValid' property on them. 'Derived' Blobs are those obtained via slice(), as opposed to 'original' Blobs that are also Files. One disadvantage of this approach is that it implies that the same Blob has two possible behaviors - when it is obtained via Blob.slice() (or other methods) vs. when it is a File.
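The multi-piece slicing shown above can be written generically. The sketch below is illustrative only: cutIntoPieces and pieceSize are made-up names, and it assumes the slice(offset, length) signature used in this thread. Note that contiguous pieces must start at offsets 0, 1000, 2000, and so on, so the second piece starts at offset 1000, not 1001 (otherwise one byte is skipped).

```javascript
// Sketch: cut a blob into fixed-size contiguous pieces using the
// slice(offset, length) signature discussed in this thread.
// The blob argument only needs .size and .slice, so a plain object
// can stand in for a Blob when experimenting.
function cutIntoPieces(blob, pieceSize) {
  var pieces = [];
  for (var offset = 0; offset < blob.size; offset += pieceSize) {
    // Each piece starts exactly where the previous one ended.
    pieces.push(blob.slice(offset, Math.min(pieceSize, blob.size - offset)));
  }
  return pieces;
}
```

Under the 'unit of work' model discussed above, an uploader built on this helper would abort the whole transfer if any piece later fails to read because the underlying file changed.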
Re: File API: Blob and underlying file changes.
On Fri, Jan 15, 2010 at 10:19 AM, Dmitry Titov dim...@chromium.org wrote:

Nobody proposed locking the file. Sorry for being unclear if it sounded like that. Basically it's all about timestamps. As Chris proposed earlier, a read operation can grab the timestamp of the file before and after reading its content and throw an exception if the timestamps do not match. This is a pretty good approximation of an 'atomic' read - although it cannot guarantee success, it can at least provide reliable detection of failure.

I don't understand how you intend to use the timestamp. Consider the following scenario:

1. User drops a 10MB File onto the page.
2. Page requests to read the file using FileReader.readAsBinaryString and installs a 'progress' event listener.
3. Implementation grabs the current timestamp and then starts reading the file.
4. After 2MB of data is read, the implementation updates FileReader.result with the partial read and fires a 'progress' event.
5. Page grabs the partial result and processes it.
6. After another 1MB of data is read, but before another 'progress' event has been fired, the user modifies the file such that the timestamp changes.
7. The implementation detects that the timestamp has changed. Now what?

You can't throw an exception, since part of the file has already been delivered. You could raise an error event, but that's unlikely to be treated correctly by the page, as this is a very rare condition and hard to test for, so the page author has likely not written correct code to deal with it. It's additionally not atomic, since the read started but was interrupted.

/ Jonas
Re: File API: Blob and underlying file changes.
On Fri, Jan 15, 2010 at 11:42 AM, Dmitry Titov dim...@chromium.org wrote:

On Fri, Jan 15, 2010 at 10:36 AM, Jonas Sicking jo...@sicking.cc wrote:

On Fri, Jan 15, 2010 at 10:19 AM, Dmitry Titov dim...@chromium.org wrote:

Nobody proposed locking the file. Sorry for being unclear if it sounded like that. Basically it's all about timestamps. As Chris proposed earlier, a read operation can grab the timestamp of the file before and after reading its content and throw an exception if the timestamps do not match. This is a pretty good approximation of an 'atomic' read - although it cannot guarantee success, it can at least provide reliable detection of failure.

I don't understand how you intend to use the timestamp. Consider the following scenario:

1. User drops a 10MB File onto the page.
2. Page requests to read the file using FileReader.readAsBinaryString and installs a 'progress' event listener.
3. Implementation grabs the current timestamp and then starts reading the file.
4. After 2MB of data is read, the implementation updates FileReader.result with the partial read and fires a 'progress' event.
5. Page grabs the partial result and processes it.
6. After another 1MB of data is read, but before another 'progress' event has been fired, the user modifies the file such that the timestamp changes.
7. The implementation detects that the timestamp has changed. Now what?

You can't throw an exception, since part of the file has already been delivered. You could raise an error event, but that's unlikely to be treated correctly by the page, as this is a very rare condition and hard to test for, so the page author has likely not written correct code to deal with it.

FileReader has both 'error' and 'abort' events, in addition to 'progress'. It seems we can just use those? There is always a possibility that an async operation that comes with partial results may fail as a whole - the only real way to ensure its atomicity would be to reliably lock the file and/or make a copy, which, as this thread indicates, are both not always possible. So yes, in case FileReader returned 2MB and the file suddenly changed to be only 1MB, the next event the page should get is 'error'. What would be the other possibility?

This doesn't address the problem that authors are unlikely to even attempt to deal with this situation, given how rare it is. And even less likely to deal with it successfully, given how hard the situation is to reproduce while testing.

/ Jonas
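The "next event the page should get is 'error'" behavior can be centralized in one place on the page side. The sketch below is illustrative, not from any spec: readWholeBlob and the callback names are made up, and the reader argument only needs the FileReader-shaped properties used here (onprogress/onerror/onload, error, result, readAsBinaryString), so a stub can stand in for a real FileReader.

```javascript
// Sketch: treat an 'error' after partial 'progress' events as failure
// of the whole unit of work, instead of keeping the partial result.
function readWholeBlob(reader, blob, callbacks) {
  var received = 0;
  reader.onprogress = function (e) {
    received = e.loaded;                 // partial result already delivered
    if (callbacks.progress) callbacks.progress(e.loaded);
  };
  reader.onerror = function () {
    // Underlying file changed (or other I/O failure): discard the
    // partial data and report how much had been delivered.
    callbacks.fail(reader.error, received);
  };
  reader.onload = function () {
    callbacks.done(reader.result);       // complete, consistent read
  };
  reader.readAsBinaryString(blob);
}
```

This matches the scenario in the quoted message: 2MB arrives via 'progress', then the file changes, and the page's only coherent move is to treat the read as failed.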
Re: File API: Blob and underlying file changes.
It seems that we feel that when a File object is sent via either Form or XHR, the latest underlying version should be used. When we get a slice via Blob.slice, we assume that the underlying file data is stable from then on. So for the uploader scenario, we need to cut a big file into multiple pieces. With the current File API spec, we will have to do something like the following to make sure that all pieces are cut from a stable file:

var file = myInputElement.files[0];
var blob = file.slice(0, file.size);
var piece1 = blob.slice(0, 1000);
var piece2 = blob.slice(1000, 1000);
...

The above seems a bit ugly. If we want to make it clean, what Dmitry proposed above seems reasonable. But it would require a non-trivial spec change.

On Wed, Jan 13, 2010 at 11:28 AM, Dmitry Titov dim...@chromium.org wrote:

An atomic read is obviously a nice thing - it would be hard to program against an API that behaves as unpredictably as a single read operation that reads half of the old content and half of the new content.

On the same note, it would likely be very hard to program against Blob objects if they could change underneath unpredictably. Imagine that we need to build an uploader that cuts a big file into multiple pieces and sends those pieces to the servers so they can be stitched together later. If during this operation the underlying file changes, and this changes all the pieces that the Blobs refer to (due to clamping and just silent change of content), all the slicing/stitching assumptions become invalid, and it's hard to even notice, since blobs are simply 'clamped' silently. Some degree of mess is possible then.

Another use case could be a JPEG image processor that uses slice() to cut the headers from the image file and then uses info from the headers to cut further JFIF fields from the file (reading EXIF and populating a local database of images, for example). Changing the file in the middle of that is bad.

It seems the typical use cases that need Blob.slice() functionality form 'units of work', where Blob.slice() is used with the likely assumption that the underlying data is stable and does not change silently. Such a 'unit of work' should fail as a whole if the underlying file changes. One way to achieve that is to reliably fail operations with 'derived' Blobs, and perhaps even have an 'isValid' property on them. 'Derived' Blobs are those obtained via slice(), as opposed to 'original' Blobs that are also Files. One disadvantage of this approach is that it implies that the same Blob has two possible behaviors - when it is obtained via Blob.slice() (or other methods) vs. when it is a File.

It all could be a bit cleaner if File did not derive from Blob, but instead had a getAsBlob() method - then it would be possible to say that Blobs are always immutable but may become 'invalid' over time if the underlying data changes. The FileReader could then be just a BlobReader and have cleaner semantics. If that were the case, then xhr.send(file) would capture the state of the file at the moment of sending, while xhr.send(blob) would fail with an exception if the blob is 'invalid' at the moment of the send() operation. This would keep compatibility with current behavior and avoid the duality of Blob behavior. Quite a change to the spec, though...

Dmitry

On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking jo...@sicking.cc wrote:

On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince cpri...@google.com wrote:

For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here.

I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception.

I'm talking about during the actual read, i.e. not related to the lifetime of the File object, just related to the time between the first 'progress' event and the 'loadend' event. If the file changes during this time there is no way to fake atomicity, since the partial file has already been returned.

/ Jonas
Re: File API: Blob and underlying file changes.
On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince cpri...@google.com wrote:

For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here.

I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception.

I'm talking about during the actual read, i.e. not related to the lifetime of the File object, just related to the time between the first 'progress' event and the 'loadend' event. If the file changes during this time there is no way to fake atomicity, since the partial file has already been returned.

/ Jonas
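The check Chris describes (and Jonas critiques) can be sketched independently of any host API. Everything here is hypothetical: statMtime and readBytes are made-up hooks standing in for the implementation's file I/O, and the error string is illustrative.

```javascript
// Sketch: compare the file's modification time before and after the
// read, and fail the whole read if it changed. statMtime and readBytes
// are hypothetical hooks, not part of any real API.
function readWithChangeCheck(hooks, path) {
  var before = hooks.statMtime(path);
  var bytes = hooks.readBytes(path);
  if (hooks.statMtime(path) !== before) {
    // The file changed while we were reading it: fail rather than
    // return a mix of old and new content.
    throw new Error('file modified during read');
  }
  return bytes;
}
```

As Jonas points out, this detects a change only after the fact - it does not make the read atomic, and once partial results have been delivered to the page the failure can only surface as an error event, not an exception.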
Re: File API: Blob and underlying file changes.
An atomic read is obviously a nice thing - it would be hard to program against an API that behaves as unpredictably as a single read operation that reads half of the old content and half of the new content.

On the same note, it would likely be very hard to program against Blob objects if they could change underneath unpredictably. Imagine that we need to build an uploader that cuts a big file into multiple pieces and sends those pieces to the servers so they can be stitched together later. If during this operation the underlying file changes, and this changes all the pieces that the Blobs refer to (due to clamping and just silent change of content), all the slicing/stitching assumptions become invalid, and it's hard to even notice, since blobs are simply 'clamped' silently. Some degree of mess is possible then.

Another use case could be a JPEG image processor that uses slice() to cut the headers from the image file and then uses info from the headers to cut further JFIF fields from the file (reading EXIF and populating a local database of images, for example). Changing the file in the middle of that is bad.

It seems the typical use cases that need Blob.slice() functionality form 'units of work', where Blob.slice() is used with the likely assumption that the underlying data is stable and does not change silently. Such a 'unit of work' should fail as a whole if the underlying file changes. One way to achieve that is to reliably fail operations with 'derived' Blobs, and perhaps even have an 'isValid' property on them. 'Derived' Blobs are those obtained via slice(), as opposed to 'original' Blobs that are also Files. One disadvantage of this approach is that it implies that the same Blob has two possible behaviors - when it is obtained via Blob.slice() (or other methods) vs. when it is a File.

It all could be a bit cleaner if File did not derive from Blob, but instead had a getAsBlob() method - then it would be possible to say that Blobs are always immutable but may become 'invalid' over time if the underlying data changes. The FileReader could then be just a BlobReader and have cleaner semantics. If that were the case, then xhr.send(file) would capture the state of the file at the moment of sending, while xhr.send(blob) would fail with an exception if the blob is 'invalid' at the moment of the send() operation. This would keep compatibility with current behavior and avoid the duality of Blob behavior. Quite a change to the spec, though...

Dmitry

On Wed, Jan 13, 2010 at 2:38 AM, Jonas Sicking jo...@sicking.cc wrote:

On Tue, Jan 12, 2010 at 5:28 PM, Chris Prince cpri...@google.com wrote:

For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here.

I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception.

I'm talking about during the actual read, i.e. not related to the lifetime of the File object, just related to the time between the first 'progress' event and the 'loadend' event. If the file changes during this time there is no way to fake atomicity, since the partial file has already been returned.

/ Jonas
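The "immutable Blob that may become invalid" model Dmitry describes can be illustrated with a small plain-JavaScript simulation. Everything here is hypothetical and for illustration only: 'disk' stands in for the file system, read() stands in for an async read operation, and slice(offset, length) follows the signature used in this thread.

```javascript
// Simulation of the semantics discussed in this thread (not a real API):
// slicing a File captures its size and modification time once; derived
// blobs do no I/O of their own and fail to read if the underlying
// "file" has changed since the slice was taken.
function makeFile(disk, name) {
  return {
    size: disk[name].bytes.length,            // sync "stat" - File only
    slice: function (offset, length) {
      var mtimeAtSlice = disk[name].mtime;    // captured once at slice time
      var sizeAtSlice = disk[name].bytes.length;
      function makeBlob(start, len) {
        return {
          size: len,                           // cached, no file I/O
          isValid: function () {
            return disk[name].mtime === mtimeAtSlice;
          },
          slice: function (o, l) {             // derived blob: cached values only
            return makeBlob(start + o, Math.min(l, len - o));
          },
          read: function () {                  // stand-in for an async read
            if (disk[name].mtime !== mtimeAtSlice) {
              throw new Error('blob is invalid: underlying file changed');
            }
            return disk[name].bytes.slice(start, start + len);
          }
        };
      }
      return makeBlob(offset, Math.min(length, sizeAtSlice - offset));
    }
  };
}
```

The point of the sketch is that the snapshot is taken exactly once, when slicing the File; every derived blob shares it, so a change to the underlying data reliably fails the whole 'unit of work' instead of silently clamping.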
Re: File API: Blob and underlying file changes.
For the record, I'd like to make the read atomic, such that you can never get half a file before a change, and half after. But it likely depends on what OSs can enforce here.

I think *enforcing* atomicity is difficult across all OSes. But implementations can get nearly the same effect by checking the file's last modification time at the start + end of the API call. If it has changed, the read operation can throw an exception.
File API: Blob and underlying file changes.
Hi,

Does the Blob, which is obtained as a File (so it refers to an actual file on disk), track changes in the underlying file and 'mutate', or does it represent a 'snapshot' of the file, or does it become 'invalid'?

Today, if a user selects a file using input type=file, and then the file on the disk changes before 'submit' is clicked, the form will submit the latest version of the file. This may be a surprisingly popular use case, where the user submits a file via a form and wants to make 'last moment' changes to the file after partially pre-populating the form. It works 'intuitively' today.

Now, if the page decides to use XHR to upload the file, I think

var file = myInputElement.files[0];
var xhr = ...
xhr.send(file);

should also send the version of the file that exists at the moment of xhr.send(file), not when the user picked the file (for consistency with the form action). Assuming this is the desired behavior, what should the following do:

var file = myInputElement.files[0];
var blob = file.slice(0, file.size);
// ... now the file on the disk changes ...
xhr.send(blob);

Will it:
- send the new version of the whole file (and update blob.size?)
- send the captured number of bytes from the new version of the file (perhaps truncated, since the file may be shorter now)
- send the original bytes from the previous version of the file that existed when the Blob was created (sort of 'copy on write')
- throw an exception?

Thanks,
Dmitry
Re: File API: Blob and underlying file changes.
Adding a reply from Jonas Sicking from another list (which I used first by mistake :( )

Technically, you should send this email to the webapps mailing list, since that is where this spec is being developed. That said, this is a really hard problem, and one that is hard to test. One thing that we decided when we did the security review on this stuff at Mozilla is that if a File object is ever passed cross-origin using postMessage, then the File object that the other origin has should not work if the file is changed on disk. For some definition of not work.

On Fri, Jan 8, 2010 at 2:21 PM, Dmitry Titov dim...@chromium.org wrote:

Hi,

Does the Blob, which is obtained as a File (so it refers to an actual file on disk), track changes in the underlying file and 'mutate', or does it represent a 'snapshot' of the file, or does it become 'invalid'?

Today, if a user selects a file using input type=file, and then the file on the disk changes before 'submit' is clicked, the form will submit the latest version of the file. This may be a surprisingly popular use case, where the user submits a file via a form and wants to make 'last moment' changes to the file after partially pre-populating the form. It works 'intuitively' today.

Now, if the page decides to use XHR to upload the file, I think

var file = myInputElement.files[0];
var xhr = ...
xhr.send(file);

should also send the version of the file that exists at the moment of xhr.send(file), not when the user picked the file (for consistency with the form action). Assuming this is the desired behavior, what should the following do:

var file = myInputElement.files[0];
var blob = file.slice(0, file.size);
// ... now the file on the disk changes ...
xhr.send(blob);

Will it:
- send the new version of the whole file (and update blob.size?)
- send the captured number of bytes from the new version of the file (perhaps truncated, since the file may be shorter now)
- send the original bytes from the previous version of the file that existed when the Blob was created (sort of 'copy on write')
- throw an exception?

Thanks,
Dmitry