Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-25 Thread Eric Blake
On 04/24/2011 11:49 AM, Jamie Lokier wrote:
 My problem with FIND_* is that we are messing with the well understood
 semantics of lseek().
 
 fcntl() looks a better fit for FIND_HOLE/DATA anyway.

With fcntl(), it would have to be something like:

off_t offset = start;
if (fcntl (fd, F_FIND_HOLE, offset) != 0)
; // find failed
// offset is now set to the next location after start

That is, offset has to be passed as an input _and_ output parameter,
since we can't use the int return value of fcntl to convey off_t
information, and since the optional third argument of fcntl has
traditionally been no wider than a pointer which is not the case for
off_t on 32-bit platforms.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-25 Thread Jamie Lokier
Eric Blake wrote:
 On 04/24/2011 11:49 AM, Jamie Lokier wrote:
  My problem with FIND_* is that we are messing with the well understood
  semantics of lseek().
  
  fcntl() looks a better fit for FIND_HOLE/DATA anyway.
 
 With fcntl(), it would have to be something like:
 
 off_t offset = start;
 if (fcntl (fd, F_FIND_HOLE, offset) != 0)
 ; // find failed
 // offset is now set to the next location after start

Yes that, or a pointer-to-struct in the style of other fcntl()
operations.

A struct offers more flexibiliy such as a limit on search distance
(may be useful on filesystems where searching reads a lot of metadata
and you don't really want to scan all of a large file), and whether to
include unwritten prealloc space or written-known-zeros space.

I thought of fcntl() because historically it's often how you get quirky
information about files and how to read them, on many OSes.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-25 Thread Nick Bowler
Hi Eric,

On 2011-04-22 07:06 -0600, Eric Blake wrote:
 I've created a request to standardize SEEK_HOLE and SEEK_DATA in the
 next revision of POSIX; comments are welcome to make sure that everyone
 is happy with wording:
 http://austingroupbugs.net/view.php?id=415

Reading through your proposal, I think there is one thing that could be
clarified: the meaning of the last hole in the file.  Consider the
following two file layouts -- in the diagrams, a bar (|) indicates a
boundary between holes and non-hole data, and a double bar (||)
indicates end-of-file.

* File A (sparse file created by lseek/write beyond end-of-file):

data | hole 0 | data || hole 1 (virtual)

* File B (sparse file created by truncate beyond end-of-file):

data | hole 0 || hole 1 (virtual)

Excluding the error description, the term the last hole is used in
two places in your proposal:

  * (for SEEK_HOLE): if offset falls within the last hole, then the
file offset may be set to the file size instead.

  * (for SEEK_DATA): it shall be an error ... if offset falls within the
last hole.

I imagine that both of these conditions are intended to address the
case where the offset falls within hole 0 in File B, that is, when
there is no non-hole data beyond the specified offset but the offset
is nevertheless less than the file size.  However, this looks (to me)
like the penultimate hole in the file, not the last hole.  Furthermore,
these conditions are presumably *not* intended to apply to the
penultimate hole in File A, which has data after it.

I think my confusion can be avoided by talking about the last non-hole
data byte in the file (which is unambigious), instead of by talking
about the last hole.  For instance, the SEEK_HOLE/SEEK_DATA descriptions
could be written as follows:

  If whence is SEEK_HOLE, the file offset shall be set to the smallest
  location of a byte within a hole and not less than offset, except that
  if offset falls beyond the last byte not within a hole, then the file
  offset may be set to the file size instead.  It shall be an error if
  offset is greater or equal to the size of the file.

  If whence is SEEK_DATA, the file offset shall be set to the smallest
  location of a byte not within a hole and not less than offset.  It shall
  be an error if no such byte exists.

plus a corresponding update to the ENXIO description:

  ... or the whence argument is SEEK_DATA and the offset falls beyond
  the last byte not within a hole.

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-25 Thread Eric Blake
On 04/25/2011 09:02 AM, Nick Bowler wrote:

Hi Nick,
 * File A (sparse file created by lseek/write beyond end-of-file):
 
 data | hole 0 | data || hole 1 (virtual)
 
 * File B (sparse file created by truncate beyond end-of-file):
 
 data | hole 0 || hole 1 (virtual)
 
 Excluding the error description, the term the last hole is used in
 two places in your proposal:
 
   * (for SEEK_HOLE): if offset falls within the last hole, then the
 file offset may be set to the file size instead.
 
   * (for SEEK_DATA): it shall be an error ... if offset falls within the
 last hole.
 
 I imagine that both of these conditions are intended to address the
 case where the offset falls within hole 0 in File B, that is, when
 there is no non-hole data beyond the specified offset but the offset
 is nevertheless less than the file size.

Correct.

  However, this looks (to me)
 like the penultimate hole in the file, not the last hole.  Furthermore,
 these conditions are presumably *not* intended to apply to the
 penultimate hole in File A, which has data after it.

Good catch.

 
 I think my confusion can be avoided by talking about the last non-hole
 data byte in the file (which is unambigious), instead of by talking
 about the last hole.  For instance, the SEEK_HOLE/SEEK_DATA descriptions
 could be written as follows:
 
   If whence is SEEK_HOLE, the file offset shall be set to the smallest
   location of a byte within a hole and not less than offset, except that
   if offset falls beyond the last byte not within a hole, then the file
   offset may be set to the file size instead.  It shall be an error if
   offset is greater or equal to the size of the file.
 
   If whence is SEEK_DATA, the file offset shall be set to the smallest
   location of a byte not within a hole and not less than offset.  It shall
   be an error if no such byte exists.
 
 plus a corresponding update to the ENXIO description:
 
   ... or the whence argument is SEEK_DATA and the offset falls beyond
   the last byte not within a hole.

I've added your improved wording as a comment at
http://austingroupbugs.net/view.php?id=415

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-24 Thread Jamie Lokier
Sunil Mushran wrote:
 On 04/22/2011 04:50 AM, Eric Blake wrote:
 That blog also mentioned the useful idea of adding FIND_HOLE and
 FIND_DATA, not implemented in Solaris, but which could easily be
 provided as additional lseek constants in Linux to locate the start of
 the next chunk without repositioning and which could ease application
 programmer's life a bit.  After all, cp wants to know where data ends
 without repositioning (FIND_HOLE), read() that much data which
 repositions in the process, then skip to the next chunk of data
 (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants,
 but 3 per iteration if we only have SEEK_HOLE and have to manually rewind.
 
 while(1) {
 read(block);
 if (block_all_zeroes)
 lseek(SEEK_DATA);
 }
 
 What's wrong with the above? If this is the case, even SEEK_HOLE
 is not needed but should be added as it is already in Solaris.

Apart from the obvious waste of effort (scanning *all* data for zeros
is cheap but not free if the file is mostly non-hole zeros), you can't
do a pread() version of the above in parallel over different parts of
the same file/device.

 My problem with FIND_* is that we are messing with the well understood
 semantics of lseek().

fcntl() looks a better fit for FIND_HOLE/DATA anyway.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-24 Thread Valdis . Kletnieks
On Thu, 21 Apr 2011 15:42:33 EDT, Josef Bacik said:

 -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
 given position. 

Do we want the semantic to be the nearest hole?  Or did we really want the
next hole?  Loops like a bullet loaded in the chamber and pointed at the
programmer's foot if they aren't allowing for the fact that this can go
*backwards* in the file if the closest hole is towards the beginning.  Good way
to end up in an infinite loop or other messy...

Consider the obvious implementation of skip over a hole - lseek(SEEK_HOLE),
lseek(SEEK_DATA).  and start reading data because we've skipped over the hole.
Wrong - the second seek may have gone backwards if the data was only 4K away
but the hole was 64K in size...



pgpN3MtjIn6I1.pgp
Description: PGP signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Markus Trippelsdorf
On 2011.04.22 at 00:50 -0400, Christoph Hellwig wrote:
 [Eric: please don't drop the Cc list, thanks!]
 
 On Thu, Apr 21, 2011 at 09:22:55PM -0400, Josef Bacik wrote:
   since all files have a virtual hole at the end, but leaves the position
   unchanged). ??I'd have to write a test program on Solaris to see whether 
   that
   definition is actually true, or if it is a bug in the Solaris man page.
  
  
  lseek's purpose is to reposition the file position, so I'd imagine
  this is just a bug in the man page.
 
 I would be surprised if the bug is around for such a long time, but
 otherwise I concur.

It's a bug. Let me quote what Jeff Bonwick wrote on his blog:

»I'm not sure where you got the impression that either SEEK_HOLE or
SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that
would be weird, and we'd call it out explicitly.)«

http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Eric Blake
On 04/22/2011 05:28 AM, Markus Trippelsdorf wrote:
 On 2011.04.22 at 00:50 -0400, Christoph Hellwig wrote:
 [Eric: please don't drop the Cc list, thanks!]

That's the fault of gmane.  My apologies for not being subscribed in the
first place, and for gmane's refusal to cc all.  Now that I'm on the
cc-chain, it won't happen again for this thread.

 lseek's purpose is to reposition the file position, so I'd imagine
 this is just a bug in the man page.

 I would be surprised if the bug is around for such a long time, but
 otherwise I concur.
 
 It's a bug. Let me quote what Jeff Bonwick wrote on his blog:
 
 »I'm not sure where you got the impression that either SEEK_HOLE or
 SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that
 would be weird, and we'd call it out explicitly.)«
 
 http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data

That blog also mentioned the useful idea of adding FIND_HOLE and
FIND_DATA, not implemented in Solaris, but which could easily be
provided as additional lseek constants in Linux to locate the start of
the next chunk without repositioning and which could ease application
programmer's life a bit.  After all, cp wants to know where data ends
without repositioning (FIND_HOLE), read() that much data which
repositions in the process, then skip to the next chunk of data
(SEEK_DATA) - two lseek() calls per iteration if we have 4 constants,
but 3 per iteration if we only have SEEK_HOLE and have to manually rewind.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Eric Blake
On 04/22/2011 05:28 AM, Markus Trippelsdorf wrote:
 I would be surprised if the bug is around for such a long time, but
 otherwise I concur.
 
 It's a bug. Let me quote what Jeff Bonwick wrote on his blog:
 
 »I'm not sure where you got the impression that either SEEK_HOLE or
 SEEK_DATA doesn't set the file pointer. They do. (If they didn't, that
 would be weird, and we'd call it out explicitly.)«
 
 http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data

I've created a request to standardize SEEK_HOLE and SEEK_DATA in the
next revision of POSIX; comments are welcome to make sure that everyone
is happy with wording:
http://austingroupbugs.net/view.php?id=415

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Sunil Mushran

On 04/22/2011 04:50 AM, Eric Blake wrote:

That blog also mentioned the useful idea of adding FIND_HOLE and
FIND_DATA, not implemented in Solaris, but which could easily be
provided as additional lseek constants in Linux to locate the start of
the next chunk without repositioning and which could ease application
programmer's life a bit.  After all, cp wants to know where data ends
without repositioning (FIND_HOLE), read() that much data which
repositions in the process, then skip to the next chunk of data
(SEEK_DATA) - two lseek() calls per iteration if we have 4 constants,
but 3 per iteration if we only have SEEK_HOLE and have to manually rewind.


while(1) {
read(block);
if (block_all_zeroes)
lseek(SEEK_DATA);
}

What's wrong with the above? If this is the case, even SEEK_HOLE
is not needed but should be added as it is already in Solaris.

My problem with FIND_* is that we are messing with the well understood
semantics of lseek().

And if generic_file_llseek_unlocked() treats SEEK_DATA as SEEK_CUR and
SEEK_HOLE as SEEK_END (both with zero offset) then we don't even
have to bother with the finding the correct error code.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Eric Blake
On 04/22/2011 10:28 AM, Sunil Mushran wrote:
 On 04/22/2011 04:50 AM, Eric Blake wrote:
 That blog also mentioned the useful idea of adding FIND_HOLE and
 FIND_DATA, not implemented in Solaris, but which could easily be
 provided as additional lseek constants in Linux to locate the start of
 the next chunk without repositioning and which could ease application
 programmer's life a bit.  After all, cp wants to know where data ends
 without repositioning (FIND_HOLE), read() that much data which
 repositions in the process, then skip to the next chunk of data
 (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants,
 but 3 per iteration if we only have SEEK_HOLE and have to manually
 rewind.
 
 while(1) {
 read(block);
 if (block_all_zeroes)
 lseek(SEEK_DATA);
 }
 
 What's wrong with the above? If this is the case, even SEEK_HOLE
 is not needed but should be added as it is already in Solaris.

Because you don't know if the block is the same size as the minimum
hole, and because some systems require rather large holes (my Solaris
testing on a zfs system didn't have holes until 128k), that's a rather
large amount of reading just to prove that the block has all zeros to
know that it is even worth trying the lseek(SEEK_DATA).  My gut feel is
that doing the lseek(SEEK_HOLE) up front coupled with seeking back to
the same position is more efficient than manually checking for a run of
zeros (less cache pollution, works with 4k read buffers without having
to know filesystem hole size).

 
 My problem with FIND_* is that we are messing with the well understood
 semantics of lseek().

You'll notice I didn't propose any FIND_* constants for POSIX.

 And if generic_file_llseek_unlocked() treats SEEK_DATA as SEEK_CUR and

You meant SEEK_SET not SEEK_CUR, but...

 SEEK_HOLE as SEEK_END (both with zero offset) then we don't even
 have to bother with the finding the correct error code.

...that's still not compatible with Solaris.  On a file with size of 0
bytes, lseek(fd, 1, SEEK_SET) and lseek(fd, 0, SEEK_END) will both
succeed, but lseek(fd, 1, SEEK_DATA) and lseek(fd, 0, SEEK_HOLE) must
fail with ENXIO (the offset was at or beyond the size of the file).

For a file with no holes, Solaris semantics behave as if:

off_t lseek(int fildes, off_t offset, int whence)
{
off_t cur, end;
switch (whence)
{
case SEEK_HOLE:
case SEEK_DATA:
cur = lseek(fildes, 0, SEEK_CUR);
if (cur  0)
return cur;
end = lseek(fildes, 0, SEEK_END);
if (end  0)
return end;
if (offset  end)
return whence == SEEK_HOLE ? end : lseek(fildes, offset,
SEEK_SET);
lseek(fildes, cur, SEEK_SET);
errno = ENXIO;
return -1;
default:
... /* Existing implementation */
}
}

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Sunil Mushran

On 04/22/2011 09:40 AM, Eric Blake wrote:

On 04/22/2011 10:28 AM, Sunil Mushran wrote:

while(1) {
 read(block);
 if (block_all_zeroes)
 lseek(SEEK_DATA);
}

What's wrong with the above? If this is the case, even SEEK_HOLE
is not needed but should be added as it is already in Solaris.

Because you don't know if the block is the same size as the minimum
hole, and because some systems require rather large holes (my Solaris
testing on a zfs system didn't have holes until 128k), that's a rather
large amount of reading just to prove that the block has all zeros to
know that it is even worth trying the lseek(SEEK_DATA).  My gut feel is
that doing the lseek(SEEK_HOLE) up front coupled with seeking back to
the same position is more efficient than manually checking for a run of
zeros (less cache pollution, works with 4k read buffers without having
to know filesystem hole size).


Holes are an implementation detail.

cp can read whatever blocksize it chooses. If that block contains
zero, it would signal cp that maybe it should SEEK_DATA and skip
reading all those blocks. That's all. We are not trying to achieve
perfection. We are just trying to reduce cpu waste.

If the fs supports SEEK_*, then great. If it does not, then it is no
worse than before.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Eric Blake
On 04/22/2011 10:57 AM, Sunil Mushran wrote:
 On 04/22/2011 09:40 AM, Eric Blake wrote:
 On 04/22/2011 10:28 AM, Sunil Mushran wrote:
 while(1) {
  read(block);
  if (block_all_zeroes)
  lseek(SEEK_DATA);
 }

 What's wrong with the above? If this is the case, even SEEK_HOLE
 is not needed but should be added as it is already in Solaris.
 
 Holes are an implementation detail.

Nobody's arguing that.  And on Solaris, a file system with no holes
support tells you that up front - lseek(fd, 0, SEEK_HOLE) returns the
end of the file (unless the file is 0 bytes, then it fails with ENXIO).

 
 cp can read whatever blocksize it chooses. If that block contains
 zero, it would signal cp that maybe it should SEEK_DATA and skip
 reading all those blocks. That's all. We are not trying to achieve
 perfection. We are just trying to reduce cpu waste.
 
 If the fs supports SEEK_*, then great. If it does not, then it is no
 worse than before.

But providing just SEEK_DATA _is_ worse than before if you don't provide
the correct SEEK_HOLE everywhere.  Because then your algorithm of trying
lseek(SEEK_DATA) after every run of zeros in the hopes of an
optimization is a wasted syscall, since it will just return your current
offset every time, so you end up with more syscalls than if you had used
the single lseek(SEEK_DATA) that returns the end of the file up front,
and known that the remainder of the file has no holes to even try
seeking past.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Sunil Mushran

On 04/22/2011 10:03 AM, Eric Blake wrote:

cp can read whatever blocksize it chooses. If that block contains
zero, it would signal cp that maybe it should SEEK_DATA and skip
reading all those blocks. That's all. We are not trying to achieve
perfection. We are just trying to reduce cpu waste.

If the fs supports SEEK_*, then great. If it does not, then it is no
worse than before.

But providing just SEEK_DATA _is_ worse than before if you don't provide
the correct SEEK_HOLE everywhere.  Because then your algorithm of trying
lseek(SEEK_DATA) after every run of zeros in the hopes of an
optimization is a wasted syscall, since it will just return your current
offset every time, so you end up with more syscalls than if you had used
the single lseek(SEEK_DATA) that returns the end of the file up front,
and known that the remainder of the file has no holes to even try
seeking past.


You are over-optimizing. strace any process on your box and you
will find numerous wasted syscalls.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Andreas Dilger
On 2011-04-22, at 11:08 AM, Sunil Mushran sunil.mush...@oracle.com wrote:
 On 04/22/2011 10:03 AM, Eric Blake wrote:
 cp can read whatever blocksize it chooses. If that block contains
 zero, it would signal cp that maybe it should SEEK_DATA and skip
 reading all those blocks. That's all. We are not trying to achieve
 perfection. We are just trying to reduce cpu waste.
 
 If the fs supports SEEK_*, then great. If it does not, then it is no
 worse than before.
 But providing just SEEK_DATA _is_ worse than before if you don't provide
 the correct SEEK_HOLE everywhere.  Because then your algorithm of trying
 lseek(SEEK_DATA) after every run of zeros in the hopes of an
 optimization is a wasted syscall, since it will just return your current
 offset every time, so you end up with more syscalls than if you had used
 the single lseek(SEEK_DATA) that returns the end of the file up front,
 and known that the remainder of the file has no holes to even try
 seeking past.
 
 You are over-optimizing. strace any process on your box and you
 will find numerous wasted syscalls.

Sure, there are lots of wasted syscalls, but in this case the cost of doing 
extra SEEK_DATA and SEEK_HOLE operations may be fairly costly. This involves 
scanning the whole disk layout, possibly over a network, which might need tens 
or hundreds of disk seeks to read the metadata, unlike regular SEEK_SET.

Even SEEK_END is somewhat costly on a network filesystem, since the syscall 
needs to lock the file size in order to determine the end of the file, which 
can block other threads from writing to the file. 

So while I agree that the Linux mantra that syscalls are cheap is true if 
those syscalls don't actually do any work, I think it also ignores the cost of 
what some syscalls need to do behind the scenes. 

Cheers, Andreas--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Jonathan Nieder
Hi,

Josef Bacik wrote:

 This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns 
 out
 using fiemap in things like cp cause more problems than it solves, so lets try
 and give userspace an interface that doesn't suck.  So we have

That's easy to believe, but could you elaborate?  What problem does
using fiemap cause?  I assume the answer is somewhere in somewhere in
the thread

 http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895/focus=24404

but it would be nice to have a summary in the commit log for
posterity.

Thanks for working on this.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Sunil Mushran

On 04/22/2011 01:10 PM, Jonathan Nieder wrote:

Hi,

Josef Bacik wrote:


This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp cause more problems than it solves, so lets try
and give userspace an interface that doesn't suck.  So we have

That's easy to believe, but could you elaborate?  What problem does
using fiemap cause?  I assume the answer is somewhere in somewhere in
the thread

  http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895/focus=24404

but it would be nice to have a summary in the commit log for
posterity.


http://article.gmane.org/gmane.comp.file-systems.ext4/24465
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-22 Thread Sunil Mushran

On 04/22/2011 11:06 AM, Andreas Dilger wrote:

Sure, there are lots of wasted syscalls, but in this case the cost of doing 
extra SEEK_DATA and SEEK_HOLE operations may be fairly costly. This involves 
scanning the whole disk layout, possibly over a network, which might need tens 
or hundreds of disk seeks to read the metadata, unlike regular SEEK_SET.

Even SEEK_END is somewhat costly on a network filesystem, since the syscall 
needs to lock the file size in order to determine the end of the file, which 
can block other threads from writing to the file.

So while I agree that the Linux mantra that syscalls are cheap is true if 
those syscalls don't actually do any work, I think it also ignores the cost of what some 
syscalls need to do behind the scenes.


You have a point. I was just reviewing the possible patch for ocfs2
and it looks heavy.

One option is to scrap SEEK_* and make another syscall llfind() with FIND_*.
Or, do both.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Josef Bacik
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp cause more problems than it solves, so lets try
and give userspace an interface that doesn't suck.  So we have

-SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
given position.  If the given position is a hole then pos won't move.  A hole
is defined by whatever the fs feels like defining it to be.  In simple things
like ext2/3 it will simplly mean an unallocated space in the file.  For more
complex things where you have preallocated space then that is left up to the
filesystem.  Since preallocated space is supposed to return all 0's it is
perfectly legitimate to have SEEK_HOLE dump you out at the start of a
preallocated extent, but then again if this is not something you can do and be
sure the extent isn't in the middle of being converted to a real extent then it
is also perfectly legitimate to skip preallocated extents and only park f_pos at
a truly unallocated section.

-SEEK_DATA: this is obviously a little more self-explanatory.  Again the only
ambiguity comes in with preallocated extents.  If you have an fs that can't
reliably tell that the preallocated extent is in the process of turning into a
real extent, it is correct for SEEK_DATA to park you at a preallocated extent.

Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/read_write.c|3 +++
 include/linux/fs.h |4 +++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5520f8a..5446d61 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -64,6 +64,9 @@ generic_file_llseek_unlocked(struct file *file, loff_t 
offset, int origin)
return file-f_pos;
offset += file-f_pos;
break;
+   case SEEK_DATA:
+   case SEEK_HOLE:
+   return -EINVAL;
}
 
if (offset  0  !unsigned_offsets(file))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dbd860a..1b72e0c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -31,7 +31,9 @@
 #define SEEK_SET   0   /* seek relative to beginning of file */
 #define SEEK_CUR   1   /* seek relative to current file position */
 #define SEEK_END   2   /* seek relative to end of file */
-#define SEEK_MAX   SEEK_END
+#define SEEK_HOLE  3   /* seek to the closest hole */
+#define SEEK_DATA  4   /* seek to the closest data */
+#define SEEK_MAX   SEEK_DATA
 
 struct fstrim_range {
__u64 start;
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Theodore Tso

On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote:

 + case SEEK_DATA:
 + case SEEK_HOLE:
 + return -EINVAL;
   }

Maybe we should be returning EOPNOTSUPP?

-- Ted

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Sunil Mushran

On 04/21/2011 01:45 PM, Theodore Tso wrote:

On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote:

+   case SEEK_DATA:
+   case SEEK_HOLE:
+   return -EINVAL;
}


Maybe we should be returning EOPNOTSUPP?

-- Ted


For SEEK_HOLE yes. But we could let the default SEEK_DATA be a null-op.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Matthew Wilcox
On Thu, Apr 21, 2011 at 04:45:54PM -0400, Theodore Tso wrote:
 
 On Apr 21, 2011, at 3:42 PM, Josef Bacik wrote:
 
  +   case SEEK_DATA:
  +   case SEEK_HOLE:
  +   return -EINVAL;
  }
 
 Maybe we should be returning EOPNOTSUPP?

That's definitely wrong ... -ENOSYS might be better!

-- 
Matthew Wilcox  Intel Open Source Technology Centre
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Christoph Hellwig
We'll also need:

 - a patch to lseek(2) to document the new flags
 - some testcases for xfstests, specially dealing with things that
   were problematic in FIEMAP, e.g. data in delalloc extents, making
   sure stuff in unwrittent extents partially converted actually gets
   copied, etc.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

2011-04-21 Thread Christoph Hellwig
[Eric: please don't drop the Cc list, thanks!]

On Thu, Apr 21, 2011 at 09:22:55PM -0400, Josef Bacik wrote:
  since all files have a virtual hole at the end, but leaves the position
  unchanged). ??I'd have to write a test program on Solaris to see whether 
  that
  definition is actually true, or if it is a bug in the Solaris man page.
 
 
 lseek's purpose is to reposition the file position, so I'd imagine
 this is just a bug in the man page.

I would be surprised if the bug is around for such a long time, but
otherwise I concur.

 Except you can have blocks allocated past i_size, so returning i_size
 isn't necessarily correct.  Obviously the idea is to have most of the
 filesystems doing the correct thing, but for my first go around I just
 was going to do one since I expect quite a bit of repainting for this
 bikeshed is yet to be done.  But I guess if you have data past i_size
 doing this would encourage the various fs people to fix it.

Trying to be smart about our semantics is not helpful here.  The
interface has been out in Solaris for a while, and we'll have to either
stick to it or use different names for the interface.  And staying
compatible is far more useful for userspace programmers.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html