Re: [PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-25 Thread Christoph Hellwig
On Sat, Oct 24, 2015 at 11:52:37AM -0500, Eric Biggers wrote:
> A few comments:
> 
> >   if (!(file_in->f_mode & FMODE_READ) ||
> >   !(file_out->f_mode & FMODE_WRITE) ||
> >   (file_out->f_flags & O_APPEND) ||
> >   !file_out->f_op)
> >   return -EBADF;
> 
> Isn't 'f_op' always non-NULL?

Yes, its is.

> If the destination file cannot be append-only, shouldn't this be documented?

Yes.

> > if (inode_in->i_sb != inode_out->i_sb ||
> > file_in->f_path.mnt != file_out->f_path.mnt)
> > return -EXDEV;
> 
> Doesn't the same mount already imply the same superblock?

It does.

> > /*
> >  * copy_file_range() differs from regular file read and write in that it
> >  * specifically allows return partial success.  When it does so is up to
> >  * the copy_file_range method.
> >  */
> 
> What does this mean?  I thought that read() and write() can also return
> partial success.

The syscalls are allow to return short from the standards perspective,
but if you actually do that for regualr fiels hell will break loose as
applications don't expect it.  That's why we can't actually ever do it.

> Should FMODE_PREAD or FMODE_PWRITE access be checked if the user specifies 
> their
> own 'off_in' or 'off_out', respectively?

Maybe.

> What is supposed to happen if the user passes provides a file descriptor to a
> non-regular file, such as a block device or char device?

If they implement the proper method I see no reason why we can't support
it.  For block device we only have one file_ops instance and mapping
that to the bio-level XCOPY abstraction that's been posted a couple of
times would seem sensible.  For character devices that's entirely up to
the driver.

> If the 'in' file has fewer than 'len' bytes remaining until EOF, what is the
> expected behavior?  It looks like the btrfs implementation has different
> behavior from the pagecache implementation.

Good question.  I'd say failure is the right way to handle a mismatching
length.

> It appears the btrfs implementation has alignment restrictions --- where is 
> this
> documented and how will users know what alignment to use?

For actual clones we're limited to the file system block size (NFS adds
an extra attribute for the clone block size), but for regaulr copies we
probably should fall back to the dumb implementation if we don't match
it.

> Are copies within the same file permitted and can the ranges overlap?  The man
> page doesn't say.

For clones we defintively want to support it, but for copies I'd be
tempted to say no.  Does anyone else have an opinion?

> It looks like the initial patch defines __NR_copy_file_range for the ARM
> architecture but doesn't actually hook that system call up for ARM; why is 
> that?

Looks like that should be dropped.  I really wish we had a way to just
wire up syscalls everywhere.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-24 Thread Andreas Dilger

> On Oct 24, 2015, at 10:52 AM, Eric Biggers  wrote:
> 
> A few comments:
> 
>>  if (!(file_in->f_mode & FMODE_READ) ||
>>  !(file_out->f_mode & FMODE_WRITE) ||
>>  (file_out->f_flags & O_APPEND) ||
>>  !file_out->f_op)
>>  return -EBADF;
> 
> Isn't 'f_op' always non-NULL?
> 
> If the destination file cannot be append-only, shouldn't this be documented?

Actually, wouldn't O_APPEND only be a problem if the target file wasn't
being appended to?  In other words, if the target i_size == start offset
then it should be possible to use the copy syscall on an O_APPEND file.

Cheers, Andreas

>>  if (inode_in->i_sb != inode_out->i_sb ||
>>  file_in->f_path.mnt != file_out->f_path.mnt)
>>  return -EXDEV;
> 
> Doesn't the same mount already imply the same superblock?
> 
>> /*
>> * copy_file_range() differs from regular file read and write in that it
>> * specifically allows return partial success.  When it does so is up to
>> * the copy_file_range method.
>> */
> 
> What does this mean?  I thought that read() and write() can also return 
> partial
> success.
> 
>>  f_out = fdget(fd_out);
>>  if (!f_in.file || !f_out.file) {
>>  ret = -EBADF;
>>  goto out;
>>  }
> 
> This looked wrong at first because it may call fdput() on a 'struct fd' that 
> was
> not successfully acquired, but it looks like it works currently because of how
> the FDPUT_FPUT flag is used.  It may be a good idea to write it the "obvious"
> way, though (use separate labels depending on which fdput()s need to happen).
> 
> 
> Other questions:
> 
> Should FMODE_PREAD or FMODE_PWRITE access be checked if the user specifies 
> their
> own 'off_in' or 'off_out', respectively?
> 
> What is supposed to happen if the user passes provides a file descriptor to a
> non-regular file, such as a block device or char device?
> 
> If the 'in' file has fewer than 'len' bytes remaining until EOF, what is the
> expected behavior?  It looks like the btrfs implementation has different
> behavior from the pagecache implementation.
> 
> It appears the btrfs implementation has alignment restrictions --- where is 
> this
> documented and how will users know what alignment to use?
> 
> Are copies within the same file permitted and can the ranges overlap?  The man
> page doesn't say.
> 
> It looks like the initial patch defines __NR_copy_file_range for the ARM
> architecture but doesn't actually hook that system call up for ARM; why is 
> that?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-24 Thread Eric Biggers
A few comments:

>   if (!(file_in->f_mode & FMODE_READ) ||
>   !(file_out->f_mode & FMODE_WRITE) ||
>   (file_out->f_flags & O_APPEND) ||
>   !file_out->f_op)
>   return -EBADF;

Isn't 'f_op' always non-NULL?

If the destination file cannot be append-only, shouldn't this be documented?

>   if (inode_in->i_sb != inode_out->i_sb ||
>   file_in->f_path.mnt != file_out->f_path.mnt)
>   return -EXDEV;

Doesn't the same mount already imply the same superblock?

> /*
>  * copy_file_range() differs from regular file read and write in that it
>  * specifically allows return partial success.  When it does so is up to
>  * the copy_file_range method.
>  */

What does this mean?  I thought that read() and write() can also return partial
success.

>   f_out = fdget(fd_out);
>   if (!f_in.file || !f_out.file) {
>   ret = -EBADF;
>   goto out;
>   }

This looked wrong at first because it may call fdput() on a 'struct fd' that was
not successfully acquired, but it looks like it works currently because of how
the FDPUT_FPUT flag is used.  It may be a good idea to write it the "obvious"
way, though (use separate labels depending on which fdput()s need to happen).


Other questions:

Should FMODE_PREAD or FMODE_PWRITE access be checked if the user specifies their
own 'off_in' or 'off_out', respectively?

What is supposed to happen if the user passes provides a file descriptor to a
non-regular file, such as a block device or char device?

If the 'in' file has fewer than 'len' bytes remaining until EOF, what is the
expected behavior?  It looks like the btrfs implementation has different
behavior from the pagecache implementation.

It appears the btrfs implementation has alignment restrictions --- where is this
documented and how will users know what alignment to use?

Are copies within the same file permitted and can the ranges overlap?  The man
page doesn't say.

It looks like the initial patch defines __NR_copy_file_range for the ARM
architecture but doesn't actually hook that system call up for ARM; why is that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-23 Thread Christoph Hellwig
Thanks Anna,

the whole series looks good to me,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-23 Thread Anna Schumaker
Copy system calls came up during Plumbers a while ago, mostly because several
filesystems (including NFS and XFS) are currently working on copy acceleration
implementations.  We haven't heard from Zach Brown in a while, so I volunteered
to push his patches upstream so individual filesystems don't need to keep
writing their own ioctls.

This posting removes the COPY_FR_REFLINK flag.  Patch 3 still adds btrfs
support for copy_file_range() as a reflink, so if this behavior is undesireable
then the patch can be dropped.

Changes in v7:
- Remove COPY_FR_REFLINK flag.
- Fix build warning on ARM devices.
- Meniton sparse file expansion in the man page.


Anna Schumaker (1):
  vfs: Add vfs_copy_file_range() support for pagecache copies

Zach Brown (3):
  vfs: add copy_file_range syscall and vfs helper
  x86: add sys_copy_file_range to syscall tables
  btrfs: add .copy_file_range file operation

 arch/arm/include/uapi/asm/unistd.h |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/btrfs/ctree.h   |   3 +
 fs/btrfs/file.c|   1 +
 fs/btrfs/ioctl.c   |  91 --
 fs/read_write.c| 133 +
 include/linux/fs.h |   3 +
 include/linux/syscalls.h   |   3 +
 include/uapi/asm-generic/unistd.h  |   4 +-
 kernel/sys_ni.c|   1 +
 11 files changed, 202 insertions(+), 40 deletions(-)

-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html