Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/24/2011 11:49 AM, Jamie Lokier wrote: My problem with FIND_* is that we are messing with the well understood semantics of lseek(). fcntl() looks a better fit for FIND_HOLE/DATA anyway. With fcntl(), it would have to be something like: off_t offset = start; if (fcntl (fd, F_FIND_HOLE, offset) != 0) ; // find failed // offset is now set to the next location after start That is, offset has to be passed as an input _and_ output parameter, since we can't use the int return value of fcntl to convey off_t information, and since the optional third argument of fcntl has traditionally been no wider than a pointer which is not the case for off_t on 32-bit platforms. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
Eric Blake wrote: On 04/24/2011 11:49 AM, Jamie Lokier wrote: My problem with FIND_* is that we are messing with the well understood semantics of lseek(). fcntl() looks a better fit for FIND_HOLE/DATA anyway. With fcntl(), it would have to be something like: off_t offset = start; if (fcntl (fd, F_FIND_HOLE, offset) != 0) ; // find failed // offset is now set to the next location after start Yes that, or a pointer-to-struct in the style of other fcntl() operations. A struct offers more flexibiliy such as a limit on search distance (may be useful on filesystems where searching reads a lot of metadata and you don't really want to scan all of a large file), and whether to include unwritten prealloc space or written-known-zeros space. I thought of fcntl() because historically it's often how you get quirky information about files and how to read them, on many OSes. -- Jamie -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
Hi Eric, On 2011-04-22 07:06 -0600, Eric Blake wrote: I've created a request to standardize SEEK_HOLE and SEEK_DATA in the next revision of POSIX; comments are welcome to make sure that everyone is happy with wording: http://austingroupbugs.net/view.php?id=415 Reading through your proposal, I think there is one thing that could be clarified: the meaning of the last hole in the file. Consider the following two file layouts -- in the diagrams, a bar (|) indicates a boundary between holes and non-hole data, and a double bar (||) indicates end-of-file. * File A (sparse file created by lseek/write beyond end-of-file): data | hole 0 | data || hole 1 (virtual) * File B (sparse file created by truncate beyond end-of-file): data | hole 0 || hole 1 (virtual) Excluding the error description, the term the last hole is used in two places in your proposal: * (for SEEK_HOLE): if offset falls within the last hole, then the file offset may be set to the file size instead. * (for SEEK_DATA): it shall be an error ... if offset falls within the last hole. I imagine that both of these conditions are intended to address the case where the offset falls within hole 0 in File B, that is, when there is no non-hole data beyond the specified offset but the offset is nevertheless less than the file size. However, this looks (to me) like the penultimate hole in the file, not the last hole. Furthermore, these conditions are presumably *not* intended to apply to the penultimate hole in File A, which has data after it. I think my confusion can be avoided by talking about the last non-hole data byte in the file (which is unambigious), instead of by talking about the last hole. For instance, the SEEK_HOLE/SEEK_DATA descriptions could be written as follows: If whence is SEEK_HOLE, the file offset shall be set to the smallest location of a byte within a hole and not less than offset, except that if offset falls beyond the last byte not within a hole, then the file offset may be set to the file size instead. It shall be an error if offset is greater or equal to the size of the file. If whence is SEEK_DATA, the file offset shall be set to the smallest location of a byte not within a hole and not less than offset. It shall be an error if no such byte exists. plus a corresponding update to the ENXIO description: ... or the whence argument is SEEK_DATA and the offset falls beyond the last byte not within a hole. -- Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/25/2011 09:02 AM, Nick Bowler wrote: Hi Nick, * File A (sparse file created by lseek/write beyond end-of-file): data | hole 0 | data || hole 1 (virtual) * File B (sparse file created by truncate beyond end-of-file): data | hole 0 || hole 1 (virtual) Excluding the error description, the term the last hole is used in two places in your proposal: * (for SEEK_HOLE): if offset falls within the last hole, then the file offset may be set to the file size instead. * (for SEEK_DATA): it shall be an error ... if offset falls within the last hole. I imagine that both of these conditions are intended to address the case where the offset falls within hole 0 in File B, that is, when there is no non-hole data beyond the specified offset but the offset is nevertheless less than the file size. Correct. However, this looks (to me) like the penultimate hole in the file, not the last hole. Furthermore, these conditions are presumably *not* intended to apply to the penultimate hole in File A, which has data after it. Good catch. I think my confusion can be avoided by talking about the last non-hole data byte in the file (which is unambigious), instead of by talking about the last hole. For instance, the SEEK_HOLE/SEEK_DATA descriptions could be written as follows: If whence is SEEK_HOLE, the file offset shall be set to the smallest location of a byte within a hole and not less than offset, except that if offset falls beyond the last byte not within a hole, then the file offset may be set to the file size instead. It shall be an error if offset is greater or equal to the size of the file. If whence is SEEK_DATA, the file offset shall be set to the smallest location of a byte not within a hole and not less than offset. It shall be an error if no such byte exists. plus a corresponding update to the ENXIO description: ... or the whence argument is SEEK_DATA and the offset falls beyond the last byte not within a hole. I've added your improved wording as a comment at http://austingroupbugs.net/view.php?id=415 -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
Sunil Mushran wrote: On 04/22/2011 04:50 AM, Eric Blake wrote: That blog also mentioned the useful idea of adding FIND_HOLE and FIND_DATA, not implemented in Solaris, but which could easily be provided as additional lseek constants in Linux to locate the start of the next chunk without repositioning and which could ease application programmer's life a bit. After all, cp wants to know where data ends without repositioning (FIND_HOLE), read() that much data which repositions in the process, then skip to the next chunk of data (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants, but 3 per iteration if we only have SEEK_HOLE and have to manually rewind. while(1) { read(block); if (block_all_zeroes) lseek(SEEK_DATA); } What's wrong with the above? If this is the case, even SEEK_HOLE is not needed but should be added as it is already in Solaris. Apart from the obvious waste of effort (scanning *all* data for zeros is cheap but not free if the file is mostly non-hole zeros), you can't do a pread() version of the above in parallel over different parts of the same file/device. My problem with FIND_* is that we are messing with the well understood semantics of lseek(). fcntl() looks a better fit for FIND_HOLE/DATA anyway. -- Jamie -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On Thu, 21 Apr 2011 15:42:33 EDT, Josef Bacik said: -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the given position. Do we want the semantic to be the nearest hole? Or did we really want the next hole? Loops like a bullet loaded in the chamber and pointed at the programmer's foot if they aren't allowing for the fact that this can go *backwards* in the file if the closest hole is towards the beginning. Good way to end up in an infinite loop or other messy... Consider the obvious implementation of skip over a hole - lseek(SEEK_HOLE), lseek(SEEK_DATA). and start reading data because we've skipped over the hole. Wrong - the second seek may have gone backwards if the data was only 4K away but the hole was 64K in size... pgpN3MtjIn6I1.pgp Description: PGP signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 2011.04.22 at 00:50 -0400, Christoph Hellwig wrote: [Eric: please don't drop the Cc list, thanks!] On Thu, Apr 21, 2011 at 09:22:55PM -0400, Josef Bacik wrote: since all files have a virtual hole at the end, but leaves the position unchanged). ??I'd have to write a test program on Solaris to see whether that definition is actually true, or if it is a bug in the Solaris man page. lseek's purpose is to reposition the file position, so I'd imagine this is just a bug in the man page. I would be surprised if the bug is around for such a long time, but otherwise I concur. It's a bug. Let me quote what Jeff Bonwick wrote on his blog: »I'm not sure where you got the impression that either SEEK_HOLE or SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that would be weird, and we'd call it out explicitly.)« http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data -- Markus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 05:28 AM, Markus Trippelsdorf wrote: On 2011.04.22 at 00:50 -0400, Christoph Hellwig wrote: [Eric: please don't drop the Cc list, thanks!] That's the fault of gmane. My apologies for not being subscribed in the first place, and for gmane's refusal to cc all. Now that I'm on the cc-chain, it won't happen again for this thread. lseek's purpose is to reposition the file position, so I'd imagine this is just a bug in the man page. I would be surprised if the bug is around for such a long time, but otherwise I concur. It's a bug. Let me quote what Jeff Bonwick wrote on his blog: »I'm not sure where you got the impression that either SEEK_HOLE or SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that would be weird, and we'd call it out explicitly.)« http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data That blog also mentioned the useful idea of adding FIND_HOLE and FIND_DATA, not implemented in Solaris, but which could easily be provided as additional lseek constants in Linux to locate the start of the next chunk without repositioning and which could ease application programmer's life a bit. After all, cp wants to know where data ends without repositioning (FIND_HOLE), read() that much data which repositions in the process, then skip to the next chunk of data (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants, but 3 per iteration if we only have SEEK_HOLE and have to manually rewind. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 05:28 AM, Markus Trippelsdorf wrote: I would be surprised if the bug is around for such a long time, but otherwise I concur. It's a bug. Let me quote what Jeff Bonwick wrote on his blog: »I'm not sure where you got the impression that either SEEK_HOLE or SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that would be weird, and we'd call it out explicitly.)« http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data I've created a request to standardize SEEK_HOLE and SEEK_DATA in the next revision of POSIX; comments are welcome to make sure that everyone is happy with wording: http://austingroupbugs.net/view.php?id=415 -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 04:50 AM, Eric Blake wrote: That blog also mentioned the useful idea of adding FIND_HOLE and FIND_DATA, not implemented in Solaris, but which could easily be provided as additional lseek constants in Linux to locate the start of the next chunk without repositioning and which could ease application programmer's life a bit. After all, cp wants to know where data ends without repositioning (FIND_HOLE), read() that much data which repositions in the process, then skip to the next chunk of data (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants, but 3 per iteration if we only have SEEK_HOLE and have to manually rewind. while(1) { read(block); if (block_all_zeroes) lseek(SEEK_DATA); } What's wrong with the above? If this is the case, even SEEK_HOLE is not needed but should be added as it is already in Solaris. My problem with FIND_* is that we are messing with the well understood semantics of lseek(). And if generic_file_llseek_unlocked() treats SEEK_DATA as SEEK_CUR and SEEK_HOLE as SEEK_END (both with zero offset) then we don't even have to bother with the finding the correct error code. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 10:28 AM, Sunil Mushran wrote: On 04/22/2011 04:50 AM, Eric Blake wrote: That blog also mentioned the useful idea of adding FIND_HOLE and FIND_DATA, not implemented in Solaris, but which could easily be provided as additional lseek constants in Linux to locate the start of the next chunk without repositioning and which could ease application programmer's life a bit. After all, cp wants to know where data ends without repositioning (FIND_HOLE), read() that much data which repositions in the process, then skip to the next chunk of data (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants, but 3 per iteration if we only have SEEK_HOLE and have to manually rewind. while(1) { read(block); if (block_all_zeroes) lseek(SEEK_DATA); } What's wrong with the above? If this is the case, even SEEK_HOLE is not needed but should be added as it is already in Solaris. Because you don't know if the block is the same size as the minimum hole, and because some systems require rather large holes (my Solaris testing on a zfs system didn't have holes until 128k), that's a rather large amount of reading just to prove that the block has all zeros to know that it is even worth trying the lseek(SEEK_DATA). My gut feel is that doing the lseek(SEEK_HOLE) up front coupled with seeking back to the same position is more efficient than manually checking for a run of zeros (less cache pollution, works with 4k read buffers without having to know filesystem hole size). My problem with FIND_* is that we are messing with the well understood semantics of lseek(). You'll notice I didn't propose any FIND_* constants for POSIX. And if generic_file_llseek_unlocked() treats SEEK_DATA as SEEK_CUR and You meant SEEK_SET not SEEK_CUR, but... SEEK_HOLE as SEEK_END (both with zero offset) then we don't even have to bother with the finding the correct error code. ...that's still not compatible with Solaris. On a file with size of 0 bytes, lseek(fd, 1, SEEK_SET) and lseek(fd, 0, SEEK_END) will both succeed, but lseek(fd, 1, SEEK_DATA) and lseek(fd, 0, SEEK_HOLE) must fail with ENXIO (the offset was at or beyond the size of the file). For a file with no holes, Solaris semantics behave as if: off_t lseek(int fildes, off_t offset, int whence) { off_t cur, end; switch (whence) { case SEEK_HOLE: case SEEK_DATA: cur = lseek(fildes, 0, SEEK_CUR); if (cur 0) return cur; end = lseek(fildes, 0, SEEK_END); if (end 0) return end; if (offset end) return whence == SEEK_HOLE ? end : lseek(fildes, offset, SEEK_SET); lseek(fildes, cur, SEEK_SET); errno = ENXIO; return -1; default: ... /* Existing implementation */ } } -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 09:40 AM, Eric Blake wrote: On 04/22/2011 10:28 AM, Sunil Mushran wrote: while(1) { read(block); if (block_all_zeroes) lseek(SEEK_DATA); } What's wrong with the above? If this is the case, even SEEK_HOLE is not needed but should be added as it is already in Solaris. Because you don't know if the block is the same size as the minimum hole, and because some systems require rather large holes (my Solaris testing on a zfs system didn't have holes until 128k), that's a rather large amount of reading just to prove that the block has all zeros to know that it is even worth trying the lseek(SEEK_DATA). My gut feel is that doing the lseek(SEEK_HOLE) up front coupled with seeking back to the same position is more efficient than manually checking for a run of zeros (less cache pollution, works with 4k read buffers without having to know filesystem hole size). Holes are an implementation detail. cp can read whatever blocksize it chooses. If that block contains zero, it would signal cp that maybe it should SEEK_DATA and skip reading all those blocks. That's all. We are not trying to achieve perfection. We are just trying to reduce cpu waste. If the fs supports SEEK_*, then great. If it does not, then it is no worse than before. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 10:57 AM, Sunil Mushran wrote: On 04/22/2011 09:40 AM, Eric Blake wrote: On 04/22/2011 10:28 AM, Sunil Mushran wrote: while(1) { read(block); if (block_all_zeroes) lseek(SEEK_DATA); } What's wrong with the above? If this is the case, even SEEK_HOLE is not needed but should be added as it is already in Solaris. Holes are an implementation detail. Nobody's arguing that. And on Solaris, a file system with no holes support tells you that up front - lseek(fd, 0, SEEK_HOLE) returns the end of the file (unless the file is 0 bytes, then it fails with ENXIO). cp can read whatever blocksize it chooses. If that block contains zero, it would signal cp that maybe it should SEEK_DATA and skip reading all those blocks. That's all. We are not trying to achieve perfection. We are just trying to reduce cpu waste. If the fs supports SEEK_*, then great. If it does not, then it is no worse than before. But providing just SEEK_DATA _is_ worse than before if you don't provide the correct SEEK_HOLE everywhere. Because then your algorithm of trying lseek(SEEK_DATA) after every run of zeros in the hopes of an optimization is a wasted syscall, since it will just return your current offset every time, so you end up with more syscalls than if you had used the single lseek(SEEK_DATA) that returns the end of the file up front, and known that the remainder of the file has no holes to even try seeking past. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 10:03 AM, Eric Blake wrote: cp can read whatever blocksize it chooses. If that block contains zero, it would signal cp that maybe it should SEEK_DATA and skip reading all those blocks. That's all. We are not trying to achieve perfection. We are just trying to reduce cpu waste. If the fs supports SEEK_*, then great. If it does not, then it is no worse than before. But providing just SEEK_DATA _is_ worse than before if you don't provide the correct SEEK_HOLE everywhere. Because then your algorithm of trying lseek(SEEK_DATA) after every run of zeros in the hopes of an optimization is a wasted syscall, since it will just return your current offset every time, so you end up with more syscalls than if you had used the single lseek(SEEK_DATA) that returns the end of the file up front, and known that the remainder of the file has no holes to even try seeking past. You are over-optimizing. strace any process on your box and you will find numerous wasted syscalls. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 2011-04-22, at 11:08 AM, Sunil Mushran sunil.mush...@oracle.com wrote: On 04/22/2011 10:03 AM, Eric Blake wrote: cp can read whatever blocksize it chooses. If that block contains zero, it would signal cp that maybe it should SEEK_DATA and skip reading all those blocks. That's all. We are not trying to achieve perfection. We are just trying to reduce cpu waste. If the fs supports SEEK_*, then great. If it does not, then it is no worse than before. But providing just SEEK_DATA _is_ worse than before if you don't provide the correct SEEK_HOLE everywhere. Because then your algorithm of trying lseek(SEEK_DATA) after every run of zeros in the hopes of an optimization is a wasted syscall, since it will just return your current offset every time, so you end up with more syscalls than if you had used the single lseek(SEEK_DATA) that returns the end of the file up front, and known that the remainder of the file has no holes to even try seeking past. You are over-optimizing. strace any process on your box and you will find numerous wasted syscalls. Sure, there are lots of wasted syscalls, but in this case the cost of doing extra SEEK_DATA and SEEK_HOLE operations may be fairly costly. This involves scanning the whole disk layout, possibly over a network, which might need tens or hundreds of disk seeks to read the metadata, unlike regular SEEK_SET. Even SEEK_END is somewhat costly on a network filesystem, since the syscall needs to lock the file size in order to determine the end of the file, which can block other threads from writing to the file. So while I agree that the Linux mantra that syscalls are cheap is true if those syscalls don't actually do any work, I think it also ignores the cost of what some syscalls need to do behind the scenes. Cheers, Andreas-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
Hi, Josef Bacik wrote: This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp cause more problems than it solves, so lets try and give userspace an interface that doesn't suck. So we have That's easy to believe, but could you elaborate? What problem does using fiemap cause? I assume the answer is somewhere in somewhere in the thread http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895/focus=24404 but it would be nice to have a summary in the commit log for posterity. Thanks for working on this. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 01:10 PM, Jonathan Nieder wrote: Hi, Josef Bacik wrote: This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp cause more problems than it solves, so lets try and give userspace an interface that doesn't suck. So we have That's easy to believe, but could you elaborate? What problem does using fiemap cause? I assume the answer is somewhere in somewhere in the thread http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895/focus=24404 but it would be nice to have a summary in the commit log for posterity. http://article.gmane.org/gmane.comp.file-systems.ext4/24465 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/22/2011 11:06 AM, Andreas Dilger wrote: Sure, there are lots of wasted syscalls, but in this case the cost of doing extra SEEK_DATA and SEEK_HOLE operations may be fairly costly. This involves scanning the whole disk layout, possibly over a network, which might need tens or hundreds of disk seeks to read the metadata, unlike regular SEEK_SET. Even SEEK_END is somewhat costly on a network filesystem, since the syscall needs to lock the file size in order to determine the end of the file, which can block other threads from writing to the file. So while I agree that the Linux mantra that syscalls are cheap is true if those syscalls don't actually do any work, I think it also ignores the cost of what some syscalls need to do behind the scenes. You have a point. I was just reviewing the possible patch for ocfs2 and it looks heavy. One option is to scrap SEEK_* and make another syscall llfind() with FIND_*. Or, do both. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. Turns out using fiemap in things like cp cause more problems than it solves, so lets try and give userspace an interface that doesn't suck. So we have -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the given position. If the given position is a hole then pos won't move. A hole is defined by whatever the fs feels like defining it to be. In simple things like ext2/3 it will simplly mean an unallocated space in the file. For more complex things where you have preallocated space then that is left up to the filesystem. Since preallocated space is supposed to return all 0's it is perfectly legitimate to have SEEK_HOLE dump you out at the start of a preallocated extent, but then again if this is not something you can do and be sure the extent isn't in the middle of being converted to a real extent then it is also perfectly legitimate to skip preallocated extents and only park f_pos at a truly unallocated section. -SEEK_DATA: this is obviously a little more self-explanatory. Again the only ambiguity comes in with preallocated extents. If you have an fs that can't reliably tell that the preallocated extent is in the process of turning into a real extent, it is correct for SEEK_DATA to park you at a preallocated extent. Thanks, Signed-off-by: Josef Bacik jo...@redhat.com --- fs/read_write.c|3 +++ include/linux/fs.h |4 +++- 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 5520f8a..5446d61 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -64,6 +64,9 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin) return file-f_pos; offset += file-f_pos; break; + case SEEK_DATA: + case SEEK_HOLE: + return -EINVAL; } if (offset 0 !unsigned_offsets(file)) diff --git a/include/linux/fs.h b/include/linux/fs.h index dbd860a..1b72e0c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -31,7 +31,9 @@ #define SEEK_SET 0 /* seek relative to beginning of file */ #define SEEK_CUR 1 /* seek relative to current file position */ #define SEEK_END 2 /* seek relative to end of file */ -#define SEEK_MAX SEEK_END +#define SEEK_HOLE 3 /* seek to the closest hole */ +#define SEEK_DATA 4 /* seek to the closest data */ +#define SEEK_MAX SEEK_DATA struct fstrim_range { __u64 start; -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote: + case SEEK_DATA: + case SEEK_HOLE: + return -EINVAL; } Maybe we should be returning EOPNOTSUPP? -- Ted -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On 04/21/2011 01:45 PM, Theodore Tso wrote: On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote: + case SEEK_DATA: + case SEEK_HOLE: + return -EINVAL; } Maybe we should be returning EOPNOTSUPP? -- Ted For SEEK_HOLE yes. But we could let the default SEEK_DATA be a null-op. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
On Thu, Apr 21, 2011 at 04:45:54PM -0400, Theodore Tso wrote: On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote: + case SEEK_DATA: + case SEEK_HOLE: + return -EINVAL; } Maybe we should be returning EOPNOTSUPP? That's definitely wrong ... -ENOSYS might be better! -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
We'll also need: - a patch to lseek(2) to document the new flags - some testcases for xfstests, specially dealing with things that were problematic in FIEMAP, e.g. data in delalloc extents, making sure stuff in unwrittent extents partially converted actually gets copied, etc. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags
[Eric: please don't drop the Cc list, thanks!] On Thu, Apr 21, 2011 at 09:22:55PM -0400, Josef Bacik wrote: since all files have a virtual hole at the end, but leaves the position unchanged). ??I'd have to write a test program on Solaris to see whether that definition is actually true, or if it is a bug in the Solaris man page. lseek's purpose is to reposition the file position, so I'd imagine this is just a bug in the man page. I would be surprised if the bug is around for such a long time, but otherwise I concur. Except you can have blocks allocated past i_size, so returning i_size isn't necessarily correct. Obviously the idea is to have most of the filesystems doing the correct thing, but for my first go around I just was going to do one since I expect quite a bit of repainting for this bikeshed is yet to be done. But I guess if you have data past i_size doing this would encourage the various fs people to fix it. Trying to be smart about our semantics is not helpful here. The interface has been out in Solaris for a while, and we'll have to either stick to it or use different names for the interface. And staying compatible is far more useful for userspace programmers. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html