Re: New copyfile system call - discuss before LSF?

Hi!

> >>User wants to test for a file with name "foo.txt"
> >>
> >>* create "foo.txt~" (or whatever)
> >>* write contents into "foo.txt~"
> >>* rename "foo.txt~" to "foo.txt"
> >>
> >>Until rename is done, the file does not exist and is not complete.
> >>You will potentially have a garbage file to clean up if the program
> >>(or system) crashes, but that is not racy in a classic sense, right?
> >Well. If people rsync from you, they will start fetching incomplete
> >foo.txt~. Plus the garbage issue.
> 
> That is not racy, just garbage (not trying to be pedantic, just
> trying to understand). I can see that the "~" file is annoying, but
> we have dealt with it for a *long* time :)

Ok, so let's keep it at "~" is annoying :-).
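For reference, the create/write/rename sequence quoted above can be sketched in C roughly as follows. This is a minimal sketch that assumes a POSIX system. The name write_file_via_rename() and the "~" suffix are illustrative choices, not an existing API. The fsync() before rename() is what lets the final name point at complete data after a crash on common local filesystems. A crash before rename() leaves the "~" file behind, which is the garbage issue being discussed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to "<path>~", flush them to disk,
 * then rename the temporary file over <path>. Readers of <path> see either
 * the old file or the complete new one, never a partial write.
 * Returns 0 on success, -1 on error (errno is set). */
int write_file_via_rename(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    assert(path != NULL);
    if (snprintf(tmp, sizeof tmp, "%s~", path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Make the contents durable before the final name can point at them. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    /* rename() atomically replaces any existing <path>. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    {
        int saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```

If the program crashes between open() and rename(), "<path>~" remains on disk. That leftover file is the thing rsync users would see.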

[But... I was wrong. openat(..., AT_UNLINKED) is not enough to solve
this: we do not have flink(), and it is not easily possible to link a
deleted file "back to life" from /proc/self/fd:

pavel@amd:/tmp$ > delme
pavel@amd:/tmp$ bash 3< delme &
[2] 32667
[2]+  Stopped bash 3< delme
pavel@amd:/tmp$ fg
bash 3< delme
pavel@amd:/tmp$ ls -al delme
-rw-r--r-- 1 pavel pavel 0 Apr  1 01:36 delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3 
lr-x-- 1 pavel pavel 64 Apr  1 01:37 /proc/self/fd/3 -> /tmp/delme
pavel@amd:/tmp$ rm delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x-- 1 pavel pavel 64 Apr  1 01:37 /proc/self/fd/3 -> /tmp/delme (deleted)
pavel@amd:/tmp$ ln /proc/self/fd/3 delme2
ln: creating hard link `delme2' => `/proc/self/fd/3': Invalid cross-device link
]
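The same experiment can be written in C with linkat(2). This is a sketch that assumes Linux with /proc mounted, and try_relink_deleted() is a hypothetical helper.

By default, plain ln, like link(2), does not follow the /proc/self/fd symlink. It most likely tries to hard-link the procfs symlink itself, which lives on a different filesystem, hence EXDEV. Asking linkat() to follow the link with AT_SYMLINK_FOLLOW gets past that step. Linux still refuses to give a new name to an inode whose link count has already dropped to zero, typically failing with ENOENT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a file in `dir`, unlink it while keeping it
 * open (like "rm delme" above), then try to give it the name `newpath`
 * through /proc/self/fd. Returns linkat()'s result; errno says why on -1. */
int try_relink_deleted(const char *dir, const char *newpath)
{
    char tmpl[4096];
    assert(dir != NULL && newpath != NULL);
    if (snprintf(tmpl, sizeof tmpl, "%s/relinkXXXXXX", dir) >= (int)sizeof tmpl)
        return -1;

    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    unlink(tmpl);                 /* the file is now open but "(deleted)" */

    char proc[64];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    /* AT_SYMLINK_FOLLOW: link the file the magic symlink points to,
     * not the procfs symlink itself. */
    int rc = linkat(AT_FDCWD, proc, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```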

> >>This is more of a garbage clean up issue?
> >Also. Plus sometimes you want temporary "file" that is
> >deleted. Terminals use it for history, etc...
> 
> There you would have a race, you can create a file and unlink it of
> course and still write to it, but you would have a potential empty
> file issue?

Yes. openat(..., AT_UNLINKED) solves that -- you'll no longer get
those files. (I'm not sure they would always be empty. How do you ensure
the rm hits the disk? fsync() on the parent directory? That sounds expensive.)
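The "classical" create-then-unlink temporary file that AT_UNLINKED is meant to replace looks roughly like the sketch below. It assumes POSIX, and open_anon_tmp() is a hypothetical name.

The window between mkstemp() and unlink() is the race under discussion: a crash there leaves a named file behind. The optional fsync() of the parent directory is one possible answer to the question of how to make the unlink itself reach the disk. It costs an extra flush, and the durability it gives is filesystem-dependent rather than guaranteed by POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return an open fd for a file that no longer has a
 * name. If `durable` is non-zero, also fsync() the directory so that the
 * removal of the name is flushed before the caller starts writing. */
int open_anon_tmp(const char *dir, int durable)
{
    char path[4096];
    assert(dir != NULL);
    if (snprintf(path, sizeof path, "%s/.tmpXXXXXX", dir) >= (int)sizeof path)
        return -1;

    int fd = mkstemp(path);       /* the file is visible under `path`... */
    if (fd < 0)
        return -1;
    /* ...until here: a crash in this window leaves garbage behind. */
    if (unlink(path) < 0) {
        close(fd);
        return -1;
    }

    if (durable) {
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0 || fsync(dfd) < 0) {
            if (dfd >= 0)
                close(dfd);
            close(fd);
            return -1;
        }
        close(dfd);
    }
    return fd;
}
```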
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


On 03/31/2013 07:18 PM, Pavel Machek wrote:
> Hi!
>
>> Take a look at how many actively used filesystems out there that have
>> some variant of sillyrename(), and explain what you want to do in those
>> cases.
>>>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>>>> with silly files on them, and this will not be different.
>>>> So this would be a local POSIX filesystem only solution to a problem
>>>> that has yet to be formulated?
>>> Problem is "classical create temp file then delete it" is racy. See the
>>> archives. That is a useful & common operation.
>>
>> Which race are you concerned with exactly?
>>
>> User wants to test for a file with name "foo.txt"
>>
>> * create "foo.txt~" (or whatever)
>> * write contents into "foo.txt~"
>> * rename "foo.txt~" to "foo.txt"
>>
>> Until rename is done, the file does not exist and is not complete.
>> You will potentially have a garbage file to clean up if the program
>> (or system) crashes, but that is not racy in a classic sense, right?
>
> Well. If people rsync from you, they will start fetching incomplete
> foo.txt~. Plus the garbage issue.

That is not racy, just garbage (not trying to be pedantic, just trying to
understand). I can see that the "~" file is annoying, but we have dealt with it
for a *long* time :)

Until it has the right name (on either the source or target system for rsync),
it is not the file you are looking for.

>> This is more of a garbage clean up issue?
>
> Also. Plus sometimes you want temporary "file" that is
> deleted. Terminals use it for history, etc...

There you would have a race, you can create a file and unlink it of course and
still write to it, but you would have a potential empty file issue?

Ric



Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

Hi!

> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.
> >>>Well. Yes, there are non-unix filesystems around. You have to deal
> >>>with silly files on them, and this will not be different.
> >>So this would be a local POSIX filesystem only solution to a problem
> >>that has yet to be formulated?
> >Problem is "classical create temp file then delete it" is racy. See the
> >archives. That is a useful & common operation.
> 
> Which race are you concerned with exactly?
> 
> User wants to test for a file with name "foo.txt"
> 
> * create "foo.txt~" (or whatever)
> * write contents into "foo.txt~"
> * rename "foo.txt~" to "foo.txt"
> 
> Until rename is done, the file does not exist and is not complete.
> You will potentially have a garbage file to clean up if the program
> (or system) crashes, but that is not racy in a classic sense, right?

Well. If people rsync from you, they will start fetching incomplete
foo.txt~. Plus the garbage issue.

> This is more of a garbage clean up issue?

Also. Plus sometimes you want temporary "file" that is
deleted. Terminals use it for history, etc...
Pavel

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


On 03/31/2013 06:50 PM, Pavel Machek wrote:
> On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
>> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>>> be acceptable interface?
>>>>>> ...and what's the big plan to make this work on anything other than ext4 and
>>>>>> btrfs?
>>>>> Deleted but open files are from original unix, so it should work on
>>>>> anything unixy (minix, ext, ext2, ...).
>>>> minix, ext, ext2... are not under active development and haven't been
>>>> for more than a decade.
>>>>
>>>> Take a look at how many actively used filesystems out there that have
>>>> some variant of sillyrename(), and explain what you want to do in those
>>>> cases.
>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>> with silly files on them, and this will not be different.
>> So this would be a local POSIX filesystem only solution to a problem
>> that has yet to be formulated?
> Problem is "classical create temp file then delete it" is racy. See the
> archives. That is a useful & common operation.

Which race are you concerned with exactly?

User wants to test for a file with name "foo.txt"

* create "foo.txt~" (or whatever)
* write contents into "foo.txt~"
* rename "foo.txt~" to "foo.txt"

Until rename is done, the file does not exist and is not complete. You will
potentially have a garbage file to clean up if the program (or system) crashes,
but that is not racy in a classic sense, right?

This is more of a garbage clean up issue?

Regards,

Ric

> Problem is "atomically create file at target location with guaranteed
> right content". That's also in the archives. Looks useful if someone
> does rsync from your directory.
>
> Non-POSIX filesystems have problems handling deleted files, but that
> was always the case. That's one of the reasons they are seldom used
> for root filesystems.
>
> Pavel



Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > > be acceptable interface?
> > > > > 
> > > > > ...and what's the big plan to make this work on anything other than 
> > > > > ext4 and btrfs?
> > > > 
> > > > Deleted but open files are from original unix, so it should work on
> > > > anything unixy (minix, ext, ext2, ...).
> > > 
> > > minix, ext, ext2... are not under active development and haven't been
> > > for more than a decade.
> > > 
> > > Take a look at how many actively used filesystems out there that have
> > > some variant of sillyrename(), and explain what you want to do in those
> > > cases.
> > 
> > Well. Yes, there are non-unix filesystems around. You have to deal
> > with silly files on them, and this will not be different.
> 
> So this would be a local POSIX filesystem only solution to a problem
> that has yet to be formulated?

Problem is "classical create temp file then delete it" is racy. See the
archives. That is a useful & common operation.

Problem is "atomically create file at target location with guaranteed
right content". That's also in the archives. Looks useful if someone
does rsync from your directory.

Non-POSIX filesystems have problems handling deleted files, but that
was always the case. That's one of the reasons they are seldom used
for root filesystems.
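A facility close to the open_deleted_file() idea later appeared in Linux as the O_TMPFILE flag to open(2), in Linux 3.11. The open(2) man page documents giving such a file a name with linkat() on /proc/self/fd/N plus AT_SYMLINK_FOLLOW.

Using that real interface, not the one proposed in this thread, "atomically create file at target location with guaranteed right content" can be sketched as follows. The sketch assumes Linux 3.11 or later, a filesystem that supports O_TMPFILE, and /proc mounted. publish_file() is a hypothetical name. Unlike rename(), linkat() never replaces an existing target; it fails with EEXIST instead.

```c
#define _GNU_SOURCE                 /* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed file in `dir`, fill it with `len`
 * bytes, flush it, and only then give it the name `dir`/`name`.
 * Nobody (rsync included) can observe a partially written file. */
int publish_file(const char *dir, const char *name, const void *data, size_t len)
{
    char proc[64], target[4096];
    assert(dir != NULL && name != NULL);

    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return -1;        /* e.g. EOPNOTSUPP if the fs lacks O_TMPFILE */

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)
        goto fail;

    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    if (snprintf(target, sizeof target, "%s/%s", dir, name) >= (int)sizeof target) {
        errno = ENAMETOOLONG;
        goto fail;
    }
    /* The name appears atomically, already pointing at complete data. */
    if (linkat(AT_FDCWD, proc, AT_FDCWD, target, AT_SYMLINK_FOLLOW) < 0)
        goto fail;

    close(fd);
    return 0;

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```

To make the new directory entry itself durable, the directory would still need an fsync() afterwards, just as in the rename-based pattern.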

Pavel

Re: New copyfile system call - discuss before LSF?

Eric Wong, 2013-03-31

Pavel Machek  wrote:
> Eric Wong wrote:
> > [1] my splice() annoyances:
> > * need to create/manage a pipe
> > * copy size limited by pipe size
> > * doesn't reduce userspace syscalls (just data copy overhead)
> > * easy to misuse and starve with blocking sockets + big buffers
> > * not many users, so bugs creep in (v3.7.8 was the first usable
> >   version of the 3.7 series for TCP sockets)
> 
> Could a library be created to make it less annoying to use, and harder
> to misuse?

Maybe, but getting people to use the library would be hard, too.
And a library would not reduce syscalls in the common case.

We already have current->splice_pipe for sendfile, so maybe splice can
be taught to transparently use that when neither FD is a pipe.
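Until splice() does something like that, copying between two non-pipe descriptors means creating a pipe yourself and looping, because each call moves at most what fits in the pipe. A minimal sketch, assuming Linux and glibc with _GNU_SOURCE; splice_copy() is a hypothetical helper, not an existing library function. It makes two splice() calls per chunk, which illustrates the point that splice saves data copies rather than syscalls.

```c
#define _GNU_SOURCE                 /* for splice() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd (neither needs
 * to be a pipe) by bouncing the data through a private pipe.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t splice_copy(int in_fd, int out_fd)
{
    int p[2];
    ssize_t total = 0;
    assert(in_fd >= 0 && out_fd >= 0);

    if (pipe(p) < 0)                /* annoyance: create/manage a pipe */
        return -1;

    for (;;) {
        /* annoyance: each round moves at most one pipe's worth of data */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n == 0)
            break;                  /* EOF on the input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        /* Drain what was just queued into the destination. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m < 0) {
                if (errno == EINTR)
                    continue;
                goto fail;
            }
            if (m == 0) {           /* should not happen with data queued */
                errno = EIO;
                goto fail;
            }
            n -= m;
            total += m;
        }
    }
    close(p[0]);
    close(p[1]);
    return total;

fail:
    {
        int saved = errno;
        close(p[0]);
        close(p[1]);
        errno = saved;
    }
    return -1;
}
```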

I also think a SPLICE_F_DONTWAIT flag might be necessary.  It would be a
superset of SPLICE_F_NONBLOCK, but also act like MSG_DONTWAIT for the
non-pipe socket.

> The splice man page does not mention the pipe size limit...

It probably should.  I think I discovered it by using it many years ago
and burned it into my mind.
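On Linux, the limit in question is the pipe's capacity. Since Linux 2.6.35 it can be inspected and raised with the fcntl() commands F_GETPIPE_SZ and F_SETPIPE_SZ. Unprivileged processes can raise it only up to /proc/sys/fs/pipe-max-size. A small sketch, assuming glibc with _GNU_SOURCE; grow_pipe() is a hypothetical name.

```c
#define _GNU_SOURCE                 /* for F_GETPIPE_SZ / F_SETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to enlarge the pipe behind `fd` to `want` bytes
 * and return the capacity actually in effect (the kernel may round it up),
 * or -1 on error. */
int grow_pipe(int fd, int want)
{
    assert(want > 0);
    /* May fail (e.g. EPERM above pipe-max-size for unprivileged users);
     * in that case the previous capacity simply stays in effect. */
    (void)fcntl(fd, F_SETPIPE_SZ, want);
    return fcntl(fd, F_GETPIPE_SZ);
}
```

A splice()-based copy loop could use the returned value as its chunk size instead of a hard-coded constant.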

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > be acceptable interface?
> > > > 
> > > > ...and what's the big plan to make this work on anything other than 
> > > > ext4 and btrfs?
> > > 
> > > Deleted but open files are from original unix, so it should work on
> > > anything unixy (minix, ext, ext2, ...).
> > 
> > minix, ext, ext2... are not under active development and haven't been
> > for more than a decade.
> > 
> > Take a look at how many actively used filesystems out there that have
> > some variant of sillyrename(), and explain what you want to do in those
> > cases.
> 
> Well. Yes, there are non-unix filesystems around. You have to deal
> with silly files on them, and this will not be different.

So this would be a local POSIX filesystem only solution to a problem
that has yet to be formulated?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


> > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > be acceptable interface?
> > > 
> > > ...and what's the big plan to make this work on anything other than ext4 
> > > and btrfs?
> > 
> > Deleted but open files are from original unix, so it should work on
> > anything unixy (minix, ext, ext2, ...).
> 
> minix, ext, ext2... are not under active development and haven't been
> for more than a decade.
> 
> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.

Well. Yes, there are non-unix filesystems around. You have to deal
with silly files on them, and this will not be different.

Pavel

Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 09:36 +0200, Pavel Machek wrote:
> Hi!
> 
> > >>> Hmm, really? AFAICT it would be simple to provide an
> > >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >>> copy source file into it, then fsync(), then link it into filesystem.
> > >>> 
> > >>> That should have atomicity properties reflected.
> > >> 
> > >> Actually, the open_deleted_file() syscall is quite useful for many
> > >> different things all by itself.  Lots of applications need to create
> > >> temporary files that are unlinked at application failure (without a
> > >> race if app crashes after creating the file, but before unlinking).
> > >> It also avoids exposing temporary files into the namespace if other
> > >> applications are accessing the directory.
> > > 
> > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > be acceptable interface?
> > 
> > ...and what's the big plan to make this work on anything other than ext4 
> > and btrfs?
> 
> Deleted but open files are from original unix, so it should work on
> anything unixy (minix, ext, ext2, ...).
>   Pavel

minix, ext, ext2... are not under active development and haven't been
for more than a decade.

Take a look at how many actively used filesystems out there that have
some variant of sillyrename(), and explain what you want to do in those
cases.


Re: New copyfile system call - discuss before LSF?

Pádraig Brady, 2013-03-31

On 03/30/2013 08:08 PM, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> Hmm, really? AFAICT it would be simple to provide an
>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> copy source file into it, then fsync(), then link it into filesystem.
>>
>> That should have atomicity properties reflected.
> 
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself.  Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.
> 
> We've added a library routine that does this for Lustre in a hackish
> way (magical filename created in target directory) for being able to
> migrate files between data servers, HSM, defragmentation, rsync, etc.
> 
> Cheers, Andreas

This reminds me of the flink() discussion:
http://marc.info/?l=linux-kernel&m=104965452917349

Also kinda related is the exchangedata() OSX system call to
"atomically exchange data between two files"

thanks,
Pádraig.

Re: New copyfile system call - discuss before LSF?

Hi!
On Sat 2013-03-30 22:38:35, AEDilger Gmail wrote:
> On 2013-03-30, at 14:45, Pavel Machek  wrote:
> > On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>> 
> >>> That should have atomicity properties reflected.
> >> 
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself.  Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> > 
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
> 
> Yes, that would be reasonable, and/or possibly openat(fd, NULL, 
> AT_FDCWD|AT_UNLINKED)?

openat() is a better interface for this, I'd say.
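The proposed openat(fd, NULL, AT_FDCWD|AT_UNLINKED) does not exist in Linux as such. For comparison, the dirfd-relative form with the O_TMPFILE flag that Linux gained in 3.11 looks roughly like the sketch below. It assumes glibc with _GNU_SOURCE. open_unlinked_at() and is_unlinked() are hypothetical names. The file starts life with a link count of zero, so there is no window in which it has a name.

```c
#define _GNU_SOURCE                 /* for O_TMPFILE */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open an anonymous, never-named file in the directory
 * referred to by `dirfd` (or AT_FDCWD). Returns the fd, or -1 with errno
 * set, e.g. EOPNOTSUPP (filesystem lacks support) or EISDIR (kernel too
 * old to know O_TMPFILE). */
int open_unlinked_at(int dirfd)
{
    return openat(dirfd, ".", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
}

/* Returns 1 if `fd` refers to a file with no remaining names. */
int is_unlinked(int fd)
{
    struct stat st;
    assert(fd >= 0);
    return fstat(fd, &st) == 0 && st.st_nlink == 0;
}
```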

BTW... I don't think this has to be done at the same time as splice()
[or how it ends up being called] changes...

Pavel

Re: New copyfile system call - discuss before LSF?

Hi!

> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>> 
> >>> That should have atomicity properties reflected.
> >> 
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself.  Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> > 
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
> 
> ...and what's the big plan to make this work on anything other than ext4 and 
> btrfs?

Deleted but open files are from original unix, so it should work on
anything unixy (minix, ext, ext2, ...).
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

Hi!

  Hmm, really? AFAICT it would be simple to provide an
  open_deleted_file(directory) syscall. You'd open_deleted_file(),
  copy source file into it, then fsync(), then link it into filesystem.
  
  That should have atomicity properties reflected.
  
  Actually, the open_deleted_file() syscall is quite useful for many
  different things all by itself.  Lots of applications need to create
  temporary files that are unlinked at application failure (without a
  race if app crashes after creating the file, but before unlinking).
  It also avoids exposing temporary files into the namespace if other
  applications are accessing the directory.
  
  Hmm. open_deleted_file() will still need to get a directory... so it
  will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would
  be acceptable interface?
 
 ...and what's the big plan to make this work on anything other than ext4 and 
 btrfs?

Deleted but open files are from original unix, so it should work on
anything unixy (minix, ext, ext2, ...).
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

Hi!
On Sat 2013-03-30 22:38:35, AEDilger Gmail wrote:
 On 2013-03-30, at 14:45, Pavel Machek pa...@ucw.cz wrote:
  On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
  On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
  Hmm, really? AFAICT it would be simple to provide an
  open_deleted_file(directory) syscall. You'd open_deleted_file(),
  copy source file into it, then fsync(), then link it into filesystem.
  
  That should have atomicity properties reflected.
  
  Actually, the open_deleted_file() syscall is quite useful for many
  different things all by itself.  Lots of applications need to create
  temporary files that are unlinked at application failure (without a
  race if app crashes after creating the file, but before unlinking).
  It also avoids exposing temporary files into the namespace if other
  applications are accessing the directory.
  
  Hmm. open_deleted_file() will still need to get a directory... so it
  will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would
  be acceptable interface?
 
 Yes, that would be reasonable, and/or possibly openat(fd, NULL, 
 AT_FDCWD|AT_UNLINKED)?

openat() is better interface for this, I'd say.

BTW... I don't think this has to be done at the same time as splice()
[or how it ends up being called] changes...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-03-31 Thread Pádraig Brady

On 03/30/2013 08:08 PM, Andreas Dilger wrote:
 On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
 Hmm, really? AFAICT it would be simple to provide an
 open_deleted_file(directory) syscall. You'd open_deleted_file(),
 copy source file into it, then fsync(), then link it into filesystem.

 That should have atomicity properties reflected.
 
 Actually, the open_deleted_file() syscall is quite useful for many
 different things all by itself.  Lots of applications need to create
 temporary files that are unlinked at application failure (without a
 race if app crashes after creating the file, but before unlinking).
 It also avoids exposing temporary files into the namespace if other
 applications are accessing the directory.
 
 We've added a library routine that does this for Lustre in a hackish
 way (magical filename created in target directory) for being able to
 migrate files between data servers, HSM, defragmentation, rsync, etc.
 
 Cheers, Andreas

This reminds me of the flink() discussion:
http://marc.info/?l=linux-kernelm=104965452917349

Also kinda related is the exchangedata() OSX system call to
atomically exchange data between two files

thanks,
Pádraig.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 09:36 +0200, Pavel Machek wrote:
 Hi!
 
   Hmm, really? AFAICT it would be simple to provide an
   open_deleted_file(directory) syscall. You'd open_deleted_file(),
   copy source file into it, then fsync(), then link it into filesystem.
   
   That should have atomicity properties reflected.
   
   Actually, the open_deleted_file() syscall is quite useful for many
   different things all by itself.  Lots of applications need to create
   temporary files that are unlinked at application failure (without a
   race if app crashes after creating the file, but before unlinking).
   It also avoids exposing temporary files into the namespace if other
   applications are accessing the directory.
   
   Hmm. open_deleted_file() will still need to get a directory... so it
   will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would
   be acceptable interface?
  
  ...and what's the big plan to make this work on anything other than ext4 
  and btrfs?
 
 Deleted but open files are from original unix, so it should work on
 anything unixy (minix, ext, ext2, ...).
   Pavel

minix, ext, ext2... are not under active development and haven't been
for more than a decade.

Take a look at how many actively used filesystems out there that have
some variant of sillyrename(), and explain what you want to do in those
cases.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


Hmm. open_deleted_file() will still need to get a directory... so it
will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would
be acceptable interface?
   
   ...and what's the big plan to make this work on anything other than ext4 
   and btrfs?
  
  Deleted but open files are from original unix, so it should work on
  anything unixy (minix, ext, ext2, ...).
 
 minix, ext, ext2... are not under active development and haven't been
 for more than a decade.
 
 Take a look at how many actively used filesystems out there that have
 some variant of sillyrename(), and explain what you want to do in those
 cases.

Well. Yes, there are non-unix filesystems around. You have to deal
with silly files on them, and this will not be different.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
 Hmm. open_deleted_file() will still need to get a directory... so it
 will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would
 be acceptable interface?

...and what's the big plan to make this work on anything other than 
ext4 and btrfs?
   
   Deleted but open files are from original unix, so it should work on
   anything unixy (minix, ext, ext2, ...).
  
  minix, ext, ext2... are not under active development and haven't been
  for more than a decade.
  
  Take a look at how many actively used filesystems out there that have
  some variant of sillyrename(), and explain what you want to do in those
  cases.
 
 Well. Yes, there are non-unix filesystems around. You have to deal
 with silly files on them, and this will not be different.

So this would be a local POSIX filesystem only solution to a problem
that has yet to be formulated?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-03-31 Thread Eric Wong

Pavel Machek pa...@ucw.cz wrote:
 Eric Wong wrote:
  [1] my splice() annoyances:
  * need to create/manage a pipe
  * copy size limited by pipe size
  * doesn't reduce userspace syscalls (just data copy overhead)
  * easy to misuse and starve with blocking sockets + big buffers
  * not many users, so bugs creep in (v3.7.8 was the first usable
version of the 3.7 series for TCP sockets)
 
 Could library be created to make it less annoying to use, and harder
 to misuse?

Maybe, but getting people to use the library would be the hard, too.
And a library would not reduce syscalls in the common case.

We already have current-splice_pipe for sendfile, so maybe splice can
be taught to transparently use that when neither FD is a pipe.

I also think a SPLICE_F_DONTWAIT flag might be necessary.  It would be a
superset of SPLICE_F_NONBLOCK, but also act like MSG_DONTWAIT for the
non-pipe socket.

 splice man page does not mention pipe size limit...

It probably should.  I think I discovered it by using it many years ago
and burned it into my mind.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>> be acceptable interface?
>>>>>
>>>>> ...and what's the big plan to make this work on anything other than
>>>>> ext4 and btrfs?
>>>>
>>>> Deleted but open files are from original unix, so it should work on
>>>> anything unixy (minix, ext, ext2, ...).
>>>
>>> minix, ext, ext2... are not under active development and haven't been
>>> for more than a decade.
>>>
>>> Take a look at how many actively used filesystems out there that have
>>> some variant of sillyrename(), and explain what you want to do in those
>>> cases.
>>
>> Well. Yes, there are non-unix filesystems around. You have to deal
>> with silly files on them, and this will not be different.
>
> So this would be a local POSIX filesystem only solution to a problem
> that has yet to be formulated?

Problem: the classical "create temp file, then delete it" is racy. See the
archives. That is a useful and common operation.

Problem: atomically create a file at the target location with guaranteed
right content. That's also in the archives. Looks useful if someone
does rsync from your directory.

Non-POSIX filesystems have problems handling deleted files, but that
was always the case. That's one of the reasons they are seldom used
for root filesystems.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


On 03/31/2013 06:50 PM, Pavel Machek wrote:

> On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
>> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>>> be acceptable interface?
>>>>>>
>>>>>> ...and what's the big plan to make this work on anything other than
>>>>>> ext4 and btrfs?
>>>>>
>>>>> Deleted but open files are from original unix, so it should work on
>>>>> anything unixy (minix, ext, ext2, ...).
>>>>
>>>> minix, ext, ext2... are not under active development and haven't been
>>>> for more than a decade.
>>>>
>>>> Take a look at how many actively used filesystems out there that have
>>>> some variant of sillyrename(), and explain what you want to do in those
>>>> cases.
>>>
>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>> with silly files on them, and this will not be different.
>>
>> So this would be a local POSIX filesystem only solution to a problem
>> that has yet to be formulated?
>
> Problem: the classical "create temp file, then delete it" is racy. See the
> archives. That is a useful and common operation.


Which race are you concerned with exactly?

User wants to test for a file with name "foo.txt"

* create "foo.txt~" (or whatever)
* write contents into "foo.txt~"
* rename "foo.txt~" to "foo.txt"

Until rename is done, the file does not exist and is not complete. You will
potentially have a garbage file to clean up if the program (or system) crashes, 
but that is not racy in a classic sense, right?
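The write-then-rename steps Ric lists can be sketched in C as below. This is a minimal illustration assuming a POSIX system; `write_then_rename()` is a hypothetical name, and the "~" suffix and the fixed 4096-byte name buffer are assumptions taken from the example rather than from any standard.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `path` with `len` bytes of `data` by
 * writing "<path>~", fsync()ing it, then rename()ing it over `path`.
 * Openers of `path` see either the old file or the complete new one.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_then_rename(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;

    assert(path != NULL);
    int n = snprintf(tmp, sizeof tmp, "%s~", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0 && errno == EINTR)
            continue;
        if (w < 0)
            goto fail;
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) < 0)              /* data on disk before the name moves */
        goto fail;
    if (close(fd) < 0) {
        int saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }
    if (rename(tmp, path) < 0) {    /* atomically replaces `path` */
        int saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }
    return 0;

fail:
    {
        int saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```

Readers of the final name never see a partial file, but the "~" name is visible (to rsync, for example) while it is being written and can be left behind by a crash. Making the rename itself durable would additionally need an fsync() of the containing directory, which is left out here.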


This is more of a garbage clean up issue?

Regards,

Ric



> Problem: atomically create a file at the target location with guaranteed
> right content. That's also in the archives. Looks useful if someone
> does rsync from your directory.
>
> Non-POSIX filesystems have problems handling deleted files, but that
> was always the case. That's one of the reasons they are seldom used
> for root filesystems.
>
> Pavel



Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

Hi!

>>>>> Take a look at how many actively used filesystems out there that have
>>>>> some variant of sillyrename(), and explain what you want to do in those
>>>>> cases.
>>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>>> with silly files on them, and this will not be different.
>>> So this would be a local POSIX filesystem only solution to a problem
>>> that has yet to be formulated?
>> Problem: the classical "create temp file, then delete it" is racy. See the
>> archives. That is a useful and common operation.
>
> Which race are you concerned with exactly?
>
> User wants to test for a file with name "foo.txt"
>
> * create "foo.txt~" (or whatever)
> * write contents into "foo.txt~"
> * rename "foo.txt~" to "foo.txt"
>
> Until rename is done, the file does not exist and is not complete.
> You will potentially have a garbage file to clean up if the program
> (or system) crashes, but that is not racy in a classic sense, right?

Well. If people rsync from you, they will start fetching incomplete
foo.txt~. Plus the garbage issue.

> This is more of a garbage clean up issue?

Also. Plus sometimes you want temporary "file" that is
deleted. Terminals use it for history, etc...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?


On 03/31/2013 07:18 PM, Pavel Machek wrote:

> Hi!
>
>>>>>> Take a look at how many actively used filesystems out there that have
>>>>>> some variant of sillyrename(), and explain what you want to do in those
>>>>>> cases.
>>>>>
>>>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>>>> with silly files on them, and this will not be different.
>>>>
>>>> So this would be a local POSIX filesystem only solution to a problem
>>>> that has yet to be formulated?
>>>
>>> Problem: the classical "create temp file, then delete it" is racy. See the
>>> archives. That is a useful and common operation.
>>
>> Which race are you concerned with exactly?
>>
>> User wants to test for a file with name "foo.txt"
>>
>> * create "foo.txt~" (or whatever)
>> * write contents into "foo.txt~"
>> * rename "foo.txt~" to "foo.txt"
>>
>> Until rename is done, the file does not exist and is not complete.
>> You will potentially have a garbage file to clean up if the program
>> (or system) crashes, but that is not racy in a classic sense, right?
>
> Well. If people rsync from you, they will start fetching incomplete
> foo.txt~. Plus the garbage issue.


That is not racy, just garbage (not trying to be pedantic, just trying to
understand). I can see that the "~" file is annoying, but we have dealt with it
for a *long* time :)


Until it has the right name (on either the source or target system for rsync), 
it is not the file you are looking for.



>> This is more of a garbage clean up issue?
>
> Also. Plus sometimes you want temporary "file" that is
> deleted. Terminals use it for history, etc...


There you would have a race, you can create a file and unlink it of course and 
still write to it, but you would have a potential empty file issue?


Ric



Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?

Hi!

> >>User wants to test for a file with name "foo.txt"
> >>
> >>* create "foo.txt~" (or whatever)
> >>* write contents into "foo.txt~"
> >>* rename "foo.txt~" to "foo.txt"
> >>
> >>Until rename is done, the file does not exist and is not complete.
> >>You will potentially have a garbage file to clean up if the program
> >>(or system) crashes, but that is not racy in a classic sense, right?
> >Well. If people rsync from you, they will start fetching incomplete
> >foo.txt~. Plus the garbage issue.
> 
> That is not racy, just garbage (not trying to be pedantic, just
> trying to understand). I can see that the "~" file is annoying, but
> we have dealt with it for a *long* time :)

Ok, so let's keep it at "~" is annoying :-).

[But... I was wrong. openat(..., AT_UNLINKED) is not enough to solve
this: we do not have flink() and it is not easily possible to link
deleted file "back to life" from /proc/self/fd:

pavel@amd:/tmp$ > delme
pavel@amd:/tmp$ bash 3< delme &
[2] 32667
[2]+  Stopped bash 3< delme
pavel@amd:/tmp$ fg
bash 3< delme
pavel@amd:/tmp$ ls -al delme
-rw-r--r-- 1 pavel pavel 0 Apr  1 01:36 delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x-- 1 pavel pavel 64 Apr  1 01:37 /proc/self/fd/3 -> /tmp/delme
pavel@amd:/tmp$ rm delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x-- 1 pavel pavel 64 Apr  1 01:37 /proc/self/fd/3 -> /tmp/delme (deleted)
pavel@amd:/tmp$ ln /proc/self/fd/3 delme2
ln: creating hard link `delme2' => `/proc/self/fd/3': Invalid cross-device link
]

> >>This is more of a garbage clean up issue?
> >Also. Plus sometimes you want temporary "file" that is
> >deleted. Terminals use it for history, etc...
> 
> There you would have a race, you can create a file and unlink it of
> course and still write to it, but you would have a potential empty
> file issue?

Yes. openat(..., AT_UNLINKED) solves that -- you'll no longer get
those files. (Not sure they'd be always empty. How do you ensure rm
hits the disk? fsync() on parent directory? Sounds expensive.)
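The create-then-unlink pattern discussed here, together with the directory fsync() Pavel asks about, might look like the sketch below. `open_anon_tmp()` and its `sync_dir` switch are hypothetical; the window between mkstemp() and unlink() is exactly where a crash leaves a stray name behind. Whether fsync() on the parent directory is required on a given filesystem (and how much it costs) is an assumption to check, not something this thread settles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an fd for an anonymous temporary file in
 * `dir` using the classic create-then-unlink sequence. Between mkstemp()
 * and unlink() the file has a visible name, and a crash in that window
 * leaves it behind. If `sync_dir` is set, the parent directory is
 * fsync()ed so the unlink itself is persisted. Returns -1 on error.
 */
static int open_anon_tmp(const char *dir, int sync_dir)
{
    char path[4096];

    assert(dir != NULL);
    int n = snprintf(path, sizeof path, "%s/tmp.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = mkstemp(path);             /* name now visible in `dir` */
    if (fd < 0)
        return -1;
    if (unlink(path) < 0) {             /* name gone; inode lives on via fd */
        close(fd);
        return -1;
    }
    if (sync_dir) {
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0) {
            close(fd);
            return -1;
        }
        int rc = fsync(dfd);            /* the possibly expensive step */
        close(dfd);
        if (rc < 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```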
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: New copyfile system call - discuss before LSF?

2013-03-30 Thread AEDilger Gmail

On 2013-03-30, at 14:45, Pavel Machek  wrote:
> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>> 
>>> That should have atomicity properties reflected.
>> 
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself.  Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
> 
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?

Yes, that would be reasonable, and/or possibly openat(fd, NULL, 
AT_FDCWD|AT_UNLINKED)?

Cheers, Andreas

Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 00:36 -0400, Trond Myklebust wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> > On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
> >  wrote:
> > > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> > >> On 2013-03-30, at 16:21, Ric Wheeler  wrote:
> > >>
> > >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> > >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek 
> > >> >>  wrote:
> > >> >>
> > >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> > >> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > >> >>>>> Hmm, really? AFAICT it would be simple to provide an
> > >> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >> >>>>> copy source file into it, then fsync(), then link it into
> > >> >>>>> filesystem.
> > >> >>>>>
> > >> >>>>> That should have atomicity properties reflected.
> > >> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> > >> >>>> different things all by itself.  Lots of applications need to create
> > >> >>>> temporary files that are unlinked at application failure (without a
> > >> >>>> race if app crashes after creating the file, but before unlinking).
> > >> >>>> It also avoids exposing temporary files into the namespace if other
> > >> >>>> applications are accessing the directory.
> > >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> > >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > >> >>> be acceptable interface?
> > >> >>>Pavel
> > >> >> ...and what's the big plan to make this work on anything other than 
> > >> >> ext4 and btrfs?
> > >> >>
> > >> >> Cheers,
> > >> >>   Trond
> > >> >
> > >> > I know that change can be a good thing, but are we really solving a 
> > >> > pressing problem given that application developers have dealt with 
> > >> > open/rename as the way to get "atomic" file creation for several 
> > >> > decades now ?
> > >>
> > >> Using open()+rename() has side effects:
> > >> - changes ctime/mtime on parent directory
> > >> - leaves temporary file in path during creation
> > >> - leaves temporary file in namespace during operations, and after crash
> > >
> > > So what is the actual problem that is being solved? Yes, the above may
> > > be disadvantages, but none of them have proven to be show-stoppers so
> > > far.
> > >
> > > So far, I've seen no justification for Andy's atomicity requirement
> > > other than "it would be nice if...". That's not enough IMO...
> > 
> > ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> > files plus a way to relink deleted files gives atomic copies.  Perhaps
> > this is less efficient than would be ideal for OCFS2, though.
> 
> What real-life problem does the atomicity requirement solve? None of our
> customers have ever asked for it. They don't care...
> 
BTW: before you do answer, please note that the current NFSv4.2 solution
_does_ allow you to lock the file before you copy.

IOW: the same atomicity rules apply to offloaded copy as apply to
standard copy: there is no requirement anywhere to apply stronger
semantics. Surprisingly enough, that works for most people...
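As a local illustration of "lock the file before you copy", here is a sketch using POSIX fcntl() record locks. It does not model the NFSv4.2 COPY operation; `whole_file_lock()` and `locked_copy()` are hypothetical helpers, and the locks are advisory, so they only order the copy against other processes that also take locks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Take (F_RDLCK / F_WRLCK) or drop (F_UNLCK) a whole-file POSIX lock. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,                 /* 0 means "through end of file" */
    };
    while (fcntl(fd, F_SETLKW, &fl) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/*
 * Copy src -> dst while holding a read lock on src and a write lock on
 * dst, so cooperating lockers see the copy as a single step.
 * src must be open for reading, dst for writing. Returns 0 or -1.
 */
static int locked_copy(int src, int dst)
{
    char buf[65536];
    off_t off = 0;
    int rc = -1;

    assert(src >= 0 && dst >= 0);
    if (whole_file_lock(src, F_RDLCK) < 0)
        return -1;
    if (whole_file_lock(dst, F_WRLCK) < 0)
        goto unlock_src;

    for (;;) {
        ssize_t n = pread(src, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto unlock_all;
        if (n == 0)
            break;
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), off + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                goto unlock_all;
            done += w;
        }
        off += n;
    }
    rc = ftruncate(dst, off);       /* drop any stale tail in dst */

unlock_all:
    whole_file_lock(dst, F_UNLCK);
unlock_src:
    whole_file_lock(src, F_UNLCK);
    return rc;
}
```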

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
>  wrote:
> > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> >> On 2013-03-30, at 16:21, Ric Wheeler  wrote:
> >>
> >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek 
> >> >>  wrote:
> >> >>
> >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >> >>>>> Hmm, really? AFAICT it would be simple to provide an
> >> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> >> >>>>>
> >> >>>>> That should have atomicity properties reflected.
> >> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> >> >>>> different things all by itself.  Lots of applications need to create
> >> >>>> temporary files that are unlinked at application failure (without a
> >> >>>> race if app crashes after creating the file, but before unlinking).
> >> >>>> It also avoids exposing temporary files into the namespace if other
> >> >>>> applications are accessing the directory.
> >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> >> >>> be acceptable interface?
> >> >>>Pavel
> >> >> ...and what's the big plan to make this work on anything other than 
> >> >> ext4 and btrfs?
> >> >>
> >> >> Cheers,
> >> >>   Trond
> >> >
> >> > I know that change can be a good thing, but are we really solving a 
> >> > pressing problem given that application developers have dealt with 
> >> > open/rename as the way to get "atomic" file creation for several decades 
> >> > now ?
> >>
> >> Using open()+rename() has side effects:
> >> - changes ctime/mtime on parent directory
> >> - leaves temporary file in path during creation
> >> - leaves temporary file in namespace during operations, and after crash
> >
> > So what is the actual problem that is being solved? Yes, the above may
> > be disadvantages, but none of them have proven to be show-stoppers so
> > far.
> >
> > So far, I've seen no justification for Andy's atomicity requirement
> > other than "it would be nice if...". That's not enough IMO...
> 
> ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> files plus a way to relink deleted files gives atomic copies.  Perhaps
> this is less efficient than would be ideal for OCFS2, though.

What real-life problem does the atomicity requirement solve? None of our
customers have ever asked for it. They don't care...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
 wrote:
> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>> On 2013-03-30, at 16:21, Ric Wheeler  wrote:
>>
>> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek 
>> >>  wrote:
>> >>
>> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> >>>>> Hmm, really? AFAICT it would be simple to provide an
>> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> >>>>> copy source file into it, then fsync(), then link it into filesystem.
>> >>>>>
>> >>>>> That should have atomicity properties reflected.
>> >>>> Actually, the open_deleted_file() syscall is quite useful for many
>> >>>> different things all by itself.  Lots of applications need to create
>> >>>> temporary files that are unlinked at application failure (without a
>> >>>> race if app crashes after creating the file, but before unlinking).
>> >>>> It also avoids exposing temporary files into the namespace if other
>> >>>> applications are accessing the directory.
>> >>> Hmm. open_deleted_file() will still need to get a directory... so it
>> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> >>> be acceptable interface?
>> >>>Pavel
>> >> ...and what's the big plan to make this work on anything other than ext4 
>> >> and btrfs?
>> >>
>> >> Cheers,
>> >>   Trond
>> >
>> > I know that change can be a good thing, but are we really solving a 
>> > pressing problem given that application developers have dealt with 
>> > open/rename as the way to get "atomic" file creation for several decades 
>> > now ?
>>
>> Using open()+rename() has side effects:
>> - changes ctime/mtime on parent directory
>> - leaves temporary file in path during creation
>> - leaves temporary file in namespace during operations, and after crash
>
> So what is the actual problem that is being solved? Yes, the above may
> be disadvantages, but none of them have proven to be show-stoppers so
> far.
>
> So far, I've seen no justification for Andy's atomicity requirement
> other than "it would be nice if...". That's not enough IMO...

ISTM vpsendfile (or whatever it's called) plus a way to create deleted
files plus a way to relink deleted files gives atomic copies.  Perhaps
this is less efficient than would be ideal for OCFS2, though.

--Andy

Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> On 2013-03-30, at 16:21, Ric Wheeler  wrote:
> 
> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek 
> >>  wrote:
> >> 
> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >>>>> Hmm, really? AFAICT it would be simple to provide an
> >>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>>>> copy source file into it, then fsync(), then link it into filesystem.
> >>>>>
> >>>>> That should have atomicity properties reflected.
> >>>> Actually, the open_deleted_file() syscall is quite useful for many
> >>>> different things all by itself.  Lots of applications need to create
> >>>> temporary files that are unlinked at application failure (without a
> >>>> race if app crashes after creating the file, but before unlinking).
> >>>> It also avoids exposing temporary files into the namespace if other
> >>>> applications are accessing the directory.
> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> >>> be acceptable interface?
> >>>Pavel
> >> ...and what's the big plan to make this work on anything other than ext4 
> >> and btrfs?
> >> 
> >> Cheers,
> >>   Trond
> > 
> > I know that change can be a good thing, but are we really solving a 
> > pressing problem given that application developers have dealt with 
> > open/rename as the way to get "atomic" file creation for several decades 
> > now ?
> 
> Using open()+rename() has side effects:
> - changes ctime/mtime on parent directory
> - leaves temporary file in path during creation
> - leaves temporary file in namespace during operations, and after crash

So what is the actual problem that is being solved? Yes, the above may
be disadvantages, but none of them have proven to be show-stoppers so
far.

So far, I've seen no justification for Andy's atomicity requirement
other than "it would be nice if...". That's not enough IMO...


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 16:21, Ric Wheeler  wrote:

> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> On Mar 30, 2013, at 5:45 PM, Pavel Machek 
>>  wrote:
>> 
>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>
>>>>> That should have atomicity properties reflected.
>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>> different things all by itself.  Lots of applications need to create
>>>> temporary files that are unlinked at application failure (without a
>>>> race if app crashes after creating the file, but before unlinking).
>>>> It also avoids exposing temporary files into the namespace if other
>>>> applications are accessing the directory.
>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>> be acceptable interface?
>>>Pavel
>> ...and what's the big plan to make this work on anything other than ext4 and 
>> btrfs?
>> 
>> Cheers,
>>   Trond
> 
> I know that change can be a good thing, but are we really solving a pressing 
> problem given that application developers have dealt with open/rename as the 
> way to get "atomic" file creation for several decades now ?

Using open()+rename() has side effects:
- changes ctime/mtime on parent directory
- leaves temporary file in path during creation
- leaves temporary file in namespace during operations, and after crash

Cheers, Andreas

Re: New copyfile system call - discuss before LSF?

2013-03-30 Thread Ric Wheeler


On 03/30/2013 05:57 PM, Myklebust, Trond wrote:

> On Mar 30, 2013, at 5:45 PM, Pavel Machek
>   wrote:
>
>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>
>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>
>>>> Hmm, really? AFAICT it would be simple to provide an
>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>
>>>> That should have atomicity properties reflected.
>>>
>>> Actually, the open_deleted_file() syscall is quite useful for many
>>> different things all by itself.  Lots of applications need to create
>>> temporary files that are unlinked at application failure (without a
>>> race if app crashes after creating the file, but before unlinking).
>>> It also avoids exposing temporary files into the namespace if other
>>> applications are accessing the directory.
>>
>> Hmm. open_deleted_file() will still need to get a directory... so it
>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> be acceptable interface?
>> Pavel
>
> ...and what's the big plan to make this work on anything other than ext4 and
> btrfs?
>
> Cheers,
>    Trond


I know that change can be a good thing, but are we really solving a pressing 
problem given that application developers have dealt with open/rename as the way 
to get "atomic" file creation for several decades now ?


Regards,

Ric


Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 12:49 PM, Pavel Machek  wrote:
> Hi!
>
>> > I thought the first thing people would ask for is to atomically create a
>> > new file and copy the old file into it (at least on local file systems).
>> >  The idea is that nothing should see an empty destination file, either
>> > by race or by crash.  (This feature would perhaps be described as a
>> > pony, but it should be implementable.)
>>
>> Having already wasted many weeks trying to implement your pony, I would
>> consider it about as possible as winning the lottery three times in a
>> row.  It clearly is in theory and yet,...
>
> Hmm, really? AFAICT it would be simple to provide 
> open_deleted_file("directory")
> syscall. You'd open_deleted_file(), copy source file into it, then
> fsync(), then link it into filesystem.

Isn't linking a deleted file back into the filesystem explicitly
forbidden?  I'm pretty sure that linking from /proc/fd/whatever
doesn't work.  (I've often wanted a flink system call that takes a
file descriptor and links it somewhere.  If it came with an option to
control whether it would overwrite an existing file, even better.)
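Something close to the flink() Andy describes can be approximated on Linux by pointing linkat() at the /proc/self/fd entry, as sketched below. `flink_fd()` is a hypothetical name; the sketch assumes /proc is mounted, and it only works while the file still has at least one name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical flink()-like helper: give the file behind `fd` another
 * name. AT_SYMLINK_FOLLOW makes linkat() follow the /proc/self/fd magic
 * link to the open file itself instead of trying to hard-link the /proc
 * symlink (the EXDEV seen with plain ln). It never overwrites an
 * existing `newpath` (EEXIST), and the kernel refuses (ENOENT) to relink
 * an ordinary file whose link count has already dropped to zero.
 */
static int flink_fd(int fd, const char *newpath)
{
    char proc[64];

    assert(fd >= 0 && newpath != NULL);
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    return linkat(AT_FDCWD, proc, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW);
}
```

Because linkat() never replaces an existing target, the "overwrite" option Andy mentions would still have to be built from a link to a temporary name plus rename(). linkat(fd, "", AT_FDCWD, newpath, AT_EMPTY_PATH) is the more direct form, but Linux has historically restricted it to callers with CAP_DAC_READ_SEARCH.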

--Andy

Re: New copyfile system call - discuss before LSF?


On Mar 30, 2013, at 5:45 PM, Pavel Machek 
 wrote:

> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>> 
>>> That should have atomicity properties reflected.
>> 
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself.  Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
> 
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?
>   Pavel

...and what's the big plan to make this work on anything other than ext4 and 
btrfs?

Cheers,
  Trond

Re: New copyfile system call - discuss before LSF?

On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > Hmm, really? AFAICT it would be simple to provide an
> > open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > copy source file into it, then fsync(), then link it into filesystem.
> > 
> > That should have atomicity properties reflected.
> 
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself.  Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.

Hmm. open_deleted_file() will still need to get a directory... so it
will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
be acceptable interface?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> Hmm, really? AFAICT it would be simple to provide an
> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> copy source file into it, then fsync(), then link it into filesystem.
> 
> That should have atomicity properties reflected.

Actually, the open_deleted_file() syscall is quite useful for many
different things all by itself.  Lots of applications need to create
temporary files that are unlinked at application failure (without a
race if app crashes after creating the file, but before unlinking).
It also avoids exposing temporary files into the namespace if other
applications are accessing the directory.

We've added a library routine that does this for Lustre in a hackish
way (magical filename created in target directory) for being able to
migrate files between data servers, HSM, defragmentation, rsync, etc.

Cheers, Andreas


Re: New copyfile system call - discuss before LSF?

Hi!

> > I thought the first thing people would ask for is to atomically create a
> > new file and copy the old file into it (at least on local file systems).
> >  The idea is that nothing should see an empty destination file, either
> > by race or by crash.  (This feature would perhaps be described as a
> > pony, but it should be implementable.)
> 
> Having already wasted many weeks trying to implement your pony, I would
> consider it about as possible as winning the lottery three times in a
> row.  It clearly is in theory and yet,...

Hmm, really? AFAICT it would be simple to provide open_deleted_file("directory")
syscall. You'd open_deleted_file(), copy source file into it, then
fsync(), then link it into filesystem.

That should have atomicity properties reflected.
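Pavel's sequence can be sketched with O_TMPFILE, the open() flag Linux later gained (3.11) for this purpose: it creates an unnamed inode in a given directory, which then plays the role of the proposed open_deleted_file(). The helper name atomic_copy_into() is hypothetical. The sketch assumes /proc is mounted and that the filesystem supports O_TMPFILE; otherwise open() fails, typically with EOPNOTSUPP (or EISDIR on kernels that predate the flag).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path into a new, unnamed file created in
 * dir, fsync() it, and only then give it the name dst_path.  Until
 * linkat() succeeds nobody can see a partial destination, and a crash
 * leaves no stray name behind.  Returns 0, or -1 with errno set. */
int atomic_copy_into(const char *src_path, const char *dir,
                     const char *dst_path)
{
    char buf[65536];
    char proc_path[64];
    int saved;

    int in = open(src_path, O_RDONLY | O_CLOEXEC);
    if (in < 0)
        return -1;

    /* Stand-in for open_deleted_file(dir): the inode has no name yet.
     * O_TMPFILE requires O_WRONLY or O_RDWR. */
    int out = open(dir, O_TMPFILE | O_WRONLY | O_CLOEXEC, 0644);
    if (out < 0)
        goto fail_in;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto fail_out;
        if (n == 0)
            break;                      /* EOF: whole source copied */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                goto fail_out;
            off += w;
        }
    }

    if (fsync(out) < 0)                 /* data is durable before it is named */
        goto fail_out;

    /* Link the unnamed inode into the namespace.  linkat() never
     * overwrites: it fails with EEXIST if dst_path already exists. */
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", out);
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, dst_path,
               AT_SYMLINK_FOLLOW) < 0)
        goto fail_out;

    close(out);
    close(in);
    return 0;

fail_out:
    saved = errno;
    close(out);
    errno = saved;
fail_in:
    saved = errno;
    close(in);
    errno = saved;
    return -1;
}
```

A fully crash-safe version would also fsync() the directory after linkat(), so that the new name is as durable as the data.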
Pavel
(who has too 
many (*)
ponies 
around)
(*) 1 is sometimes too many when we talk about big mammals.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: New copyfile system call - discuss before LSF?

Hi!

> > > If I'm guessing correctly, sendfile64()+flags would be annoying because
> > > it's missing an out_fd_offset.  The host will want to offload the
> > > guest's copies by calling sendfile on block ranges of a guest disk
> > > image file that correspond to the mappings of the in and out files in
> > > the guest.
> > > 
> > > You could make it work with some locking and out_fd seeking to set the
> > > write offset before calling sendfile64()+flags, but ugh.
> > > 
> > >  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> > >   out_offset, size_t count, int flags);
> > > 
> > > That seems closer.
> > 
> > psendfile() ?
> > 
> > I fully agree that sounds reasonable... Just being an ass. :-)
> 
> splice() already has offset for both fds and a flags arg:
> 
> ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
>                loff_t *off_out, size_t len, unsigned int flags);
> 
> The current downside is it requires one fd to be a pipe, so it's
> just not very easy to use from my perspective[1].
...
> [1] my splice() annoyances:
> * need to create/manage a pipe
> * copy size limited by pipe size
> * doesn't reduce userspace syscalls (just data copy overhead)
> * easy to misuse and starve with blocking sockets + big buffers
> * not many users, so bugs creep in (v3.7.8 was the first usable
>   version of the 3.7 series for TCP sockets)

Could a library be created to make it less annoying to use, and harder
to misuse?

splice man page does not mention pipe size limit... 
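One possible shape for such a library, sketched under the assumption that both ends are regular files: hide the pipe inside a helper, size each transfer to the pipe's capacity (65536 bytes by default since Linux 2.6.11; fcntl(F_GETPIPE_SZ), available since 2.6.35, reports the actual value), and loop until the pipe is drained so short transfers are handled. The name splice_copy() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: copy up to len bytes from fd_in at *off_in to
 * fd_out at *off_out through a private pipe, advancing both offsets.
 * Intended for regular files.  Returns bytes copied (less than len only
 * at EOF of fd_in) or -1 with errno set. */
ssize_t splice_copy(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
                    size_t len)
{
    int p[2];
    if (pipe2(p, O_CLOEXEC) < 0)
        return -1;

    /* Each splice() into the pipe is bounded by the pipe's capacity. */
    int cap = fcntl(p[1], F_GETPIPE_SZ);
    size_t chunk = cap > 0 ? (size_t)cap : 65536;
    size_t done = 0;
    int err = 0;

    while (done < len) {
        size_t want = len - done < chunk ? len - done : chunk;
        ssize_t in = splice(fd_in, off_in, p[1], NULL, want, 0);
        if (in < 0 && errno == EINTR)
            continue;
        if (in < 0) {
            err = errno;
            break;
        }
        if (in == 0)                    /* EOF on fd_in */
            break;

        /* Drain everything just put into the pipe before refilling it. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, fd_out, off_out, (size_t)in, 0);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                err = out < 0 ? errno : EIO;
                goto cleanup;
            }
            in -= out;
            done += (size_t)out;
        }
    }

cleanup:
    close(p[0]);
    close(p[1]);
    if (err) {
        errno = err;
        return -1;
    }
    return (ssize_t)done;
}
```

Because each chunk is drained before the next one is spliced in, the helper never waits on a full pipe; it does not address the socket or syscall-count complaints.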
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: New copyfile system call - discuss before LSF?


On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:

> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>>
>>> That should have atomicity properties reflected.
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself.  Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?
> Pavel

...and what's the big plan to make this work on anything other than ext4 and 
btrfs?

Cheers,
  Trond

Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 12:49 PM, Pavel Machek <pa...@ucw.cz> wrote:
> Hi!
>
>>> I thought the first thing people would ask for is to atomically create a
>>> new file and copy the old file into it (at least on local file systems).
>>> The idea is that nothing should see an empty destination file, either
>>> by race or by crash.  (This feature would perhaps be described as a
>>> pony, but it should be implementable.)
>>
>> Having already wasted many weeks trying to implement your pony, I would
>> consider it about as possible as winning the lottery three times in a
>> row.  It clearly is in theory and yet,...
>
> Hmm, really? AFAICT it would be simple to provide
> open_deleted_file("directory") syscall. You'd open_deleted_file(), copy
> source file into it, then fsync(), then link it into filesystem.

Isn't linking a deleted file back into the filesystem explicitly
forbidden?  I'm pretty sure that linking from /proc/fd/whatever
doesn't work.  (I've often wanted a flink system call that takes a
file descriptor and links it somewhere.  If it came with an option to
control whether it would overwrite an existing file, even better.)
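That matches Linux behavior for ordinary files. A flink()-like helper can be emulated with linkat() on the /proc/self/fd magic link, but once unlink() has dropped a file's link count to zero the kernel refuses to give it a name again (ENOENT); files opened with O_TMPFILE and without O_EXCL, added later in 3.11, are the exception. There is also no overwrite option: linkat() fails with EEXIST if the new name exists. A sketch, with a hypothetical helper name and /proc assumed to be mounted:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flink() emulation: give the file behind fd the name newpath.
 * AT_SYMLINK_FOLLOW makes linkat() resolve the /proc magic link to the open
 * file itself rather than trying to hard-link the symlink (which fails with
 * EXDEV, since /proc is a different filesystem).  Returns 0 or -1. */
int flink_emul(int fd, const char *newpath)
{
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    return linkat(AT_FDCWD, proc_path, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW);
}
```

Used on a file that still has a name, this simply adds a second hard link; used after unlink(), it fails with ENOENT.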

--Andy

Re: New copyfile system call - discuss before LSF?

2013-03-30 Thread Ric Wheeler


On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>
>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>> Hmm, really? AFAICT it would be simple to provide an
>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>
>>>> That should have atomicity properties reflected.
>>>
>>> Actually, the open_deleted_file() syscall is quite useful for many
>>> different things all by itself.  Lots of applications need to create
>>> temporary files that are unlinked at application failure (without a
>>> race if app crashes after creating the file, but before unlinking).
>>> It also avoids exposing temporary files into the namespace if other
>>> applications are accessing the directory.
>>
>> Hmm. open_deleted_file() will still need to get a directory... so it
>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> be acceptable interface?
>> Pavel
>
> ...and what's the big plan to make this work on anything other than
> ext4 and btrfs?
>
> Cheers,
>    Trond


I know that change can be a good thing, but are we really solving a pressing 
problem given that application developers have dealt with open/rename as the way 
to get atomic file creation for several decades now ?


Regards,

Ric


Re: New copyfile system call - discuss before LSF?

On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:

> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>
>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>
>>>>> That should have atomicity properties reflected.
>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>> different things all by itself.  Lots of applications need to create
>>>> temporary files that are unlinked at application failure (without a
>>>> race if app crashes after creating the file, but before unlinking).
>>>> It also avoids exposing temporary files into the namespace if other
>>>> applications are accessing the directory.
>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>> be acceptable interface?
>>> Pavel
>> ...and what's the big plan to make this work on anything other than
>> ext4 and btrfs?
>>
>> Cheers,
>>   Trond
>>
> I know that change can be a good thing, but are we really solving a
> pressing problem given that application developers have dealt with
> open/rename as the way to get atomic file creation for several decades
> now ?

Using open()+rename() has side effects:
- changes ctime/mtime on parent directory
- leaves temporary file in path during creation
- leaves temporary file in namespace during operations, and after crash

Cheers, Andreas

Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>
>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>
>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>>
>>>>>> That should have atomicity properties reflected.
>>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>>> different things all by itself.  Lots of applications need to create
>>>>> temporary files that are unlinked at application failure (without a
>>>>> race if app crashes after creating the file, but before unlinking).
>>>>> It also avoids exposing temporary files into the namespace if other
>>>>> applications are accessing the directory.
>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>> be acceptable interface?
>>>> Pavel
>>> ...and what's the big plan to make this work on anything other than
>>> ext4 and btrfs?
>>>
>>> Cheers,
>>>   Trond
>>>
>> I know that change can be a good thing, but are we really solving a
>> pressing problem given that application developers have dealt with
>> open/rename as the way to get atomic file creation for several decades
>> now ?
>
> Using open()+rename() has side effects:
> - changes ctime/mtime on parent directory
> - leaves temporary file in path during creation
> - leaves temporary file in namespace during operations, and after crash

So what is the actual problem that is being solved? Yes, the above may
be disadvantages, but none of them have proven to be show-stoppers so
far.

So far, I've seen no justification for Andy's atomicity requirement
other than "it would be nice if". That's not enough IMO...


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
<trond.mykleb...@netapp.com> wrote:
> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>
>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>
>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>>>
>>>>>>> That should have atomicity properties reflected.
>>>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>>>> different things all by itself.  Lots of applications need to create
>>>>>> temporary files that are unlinked at application failure (without a
>>>>>> race if app crashes after creating the file, but before unlinking).
>>>>>> It also avoids exposing temporary files into the namespace if other
>>>>>> applications are accessing the directory.
>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>> be acceptable interface?
>>>>> Pavel
>>>> ...and what's the big plan to make this work on anything other than
>>>> ext4 and btrfs?
>>>>
>>>> Cheers,
>>>>   Trond
>>>>
>>> I know that change can be a good thing, but are we really solving a
>>> pressing problem given that application developers have dealt with
>>> open/rename as the way to get atomic file creation for several decades
>>> now ?
>>
>> Using open()+rename() has side effects:
>> - changes ctime/mtime on parent directory
>> - leaves temporary file in path during creation
>> - leaves temporary file in namespace during operations, and after crash
>
> So what is the actual problem that is being solved? Yes, the above may
> be disadvantages, but none of them have proven to be show-stoppers so
> far.
>
> So far, I've seen no justification for Andy's atomicity requirement
> other than "it would be nice if". That's not enough IMO...

ISTM vpsendfile (or whatever it's called) plus a way to create deleted
files plus a way to relink deleted files gives atomic copies.  Perhaps
this is less efficient than would be ideal for OCFS2, though.

--Andy

Re: New copyfile system call - discuss before LSF?

On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
> <trond.mykleb...@netapp.com> wrote:
>> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>>
>>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>>
>>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>>>>> copy source file into it, then fsync(), then link it into filesystem.
>>>>>>>>
>>>>>>>> That should have atomicity properties reflected.
>>>>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>>>>> different things all by itself.  Lots of applications need to create
>>>>>>> temporary files that are unlinked at application failure (without a
>>>>>>> race if app crashes after creating the file, but before unlinking).
>>>>>>> It also avoids exposing temporary files into the namespace if other
>>>>>>> applications are accessing the directory.
>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>> be acceptable interface?
>>>>>> Pavel
>>>>> ...and what's the big plan to make this work on anything other than
>>>>> ext4 and btrfs?
>>>>>
>>>>> Cheers,
>>>>>   Trond
>>>>>
>>>> I know that change can be a good thing, but are we really solving a
>>>> pressing problem given that application developers have dealt with
>>>> open/rename as the way to get atomic file creation for several decades
>>>> now ?
>>>
>>> Using open()+rename() has side effects:
>>> - changes ctime/mtime on parent directory
>>> - leaves temporary file in path during creation
>>> - leaves temporary file in namespace during operations, and after crash
>>
>> So what is the actual problem that is being solved? Yes, the above may
>> be disadvantages, but none of them have proven to be show-stoppers so
>> far.
>>
>> So far, I've seen no justification for Andy's atomicity requirement
>> other than "it would be nice if". That's not enough IMO...
>
> ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> files plus a way to relink deleted files gives atomic copies.  Perhaps
> this is less efficient than would be ideal for OCFS2, though.

What real-life problem does the atomicity requirement solve? None of our
customers have ever asked for it. They don't care...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Sun, 2013-03-31 at 00:36 -0400, Trond Myklebust wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
>> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
>> <trond.mykleb...@netapp.com> wrote:
>>> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>>>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>>>
>>>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>>>
>>>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>>>> Hmm, really? AFAICT it would be simple to provide an
>>>>>>>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>>>>>>>> copy source file into it, then fsync(), then link it into
>>>>>>>>> filesystem.
>>>>>>>>>
>>>>>>>>> That should have atomicity properties reflected.
>>>>>>>> Actually, the open_deleted_file() syscall is quite useful for many
>>>>>>>> different things all by itself.  Lots of applications need to create
>>>>>>>> temporary files that are unlinked at application failure (without a
>>>>>>>> race if app crashes after creating the file, but before unlinking).
>>>>>>>> It also avoids exposing temporary files into the namespace if other
>>>>>>>> applications are accessing the directory.
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>>> be acceptable interface?
>>>>>>> Pavel
>>>>>> ...and what's the big plan to make this work on anything other than
>>>>>> ext4 and btrfs?
>>>>>>
>>>>>> Cheers,
>>>>>>   Trond
>>>>>>
>>>>> I know that change can be a good thing, but are we really solving a
>>>>> pressing problem given that application developers have dealt with
>>>>> open/rename as the way to get atomic file creation for several
>>>>> decades now ?
>>>>
>>>> Using open()+rename() has side effects:
>>>> - changes ctime/mtime on parent directory
>>>> - leaves temporary file in path during creation
>>>> - leaves temporary file in namespace during operations, and after crash
>>>
>>> So what is the actual problem that is being solved? Yes, the above may
>>> be disadvantages, but none of them have proven to be show-stoppers so
>>> far.
>>>
>>> So far, I've seen no justification for Andy's atomicity requirement
>>> other than "it would be nice if". That's not enough IMO...
>>
>> ISTM vpsendfile (or whatever it's called) plus a way to create deleted
>> files plus a way to relink deleted files gives atomic copies.  Perhaps
>> this is less efficient than would be ideal for OCFS2, though.
>
> What real-life problem does the atomicity requirement solve? None of our
> customers have ever asked for it. They don't care...
 
BTW: before you do answer, please note that the current NFSv4.2 solution
_does_ allow you to lock the file before you copy.

IOW: the same atomicity rules apply to offloaded copy as apply to
standard copy: there is no requirement anywhere to apply stronger
semantics. Surprisingly enough, that works for most people...
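The point that ordinary copy rules apply can be illustrated with POSIX record locks: a copier that holds a read lock on the source and a write lock on the destination keeps cooperating readers, which take the same fcntl() locks (NFS supports these byte-range locks too), from observing a half-copied file. A minimal local sketch; the function names are hypothetical, and this is not the NFSv4.2 COPY mechanism itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Block until a whole-file POSIX record lock of the given type is held
 * (F_RDLCK, F_WRLCK) or released (F_UNLCK). */
static int lock_whole(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };  /* l_len 0: whole file */
    int r;
    do
        r = fcntl(fd, F_SETLKW, &fl);
    while (r < 0 && errno == EINTR);
    return r;
}

/* Hypothetical helper: copy all of src over dst while holding a read lock
 * on src (must be open for reading) and a write lock on dst (must be open
 * for writing).  The locks are advisory: only processes that take them
 * too are kept from seeing a half-written dst.  Returns 0 or -1. */
int locked_copy(int src, int dst)
{
    char buf[65536];
    off_t off = 0;
    int ret = -1, saved;

    if (lock_whole(src, F_RDLCK) < 0)
        return -1;
    if (lock_whole(dst, F_WRLCK) < 0)
        goto unlock_src;
    if (ftruncate(dst, 0) < 0)
        goto unlock_dst;

    for (;;) {
        ssize_t n = pread(src, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto unlock_dst;
        if (n == 0) {
            ret = 0;                    /* reached EOF of src */
            break;
        }
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), off + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                goto unlock_dst;
            done += w;
        }
        off += n;
    }

unlock_dst:
    saved = errno;
    lock_whole(dst, F_UNLCK);
    errno = saved;
unlock_src:
    saved = errno;
    lock_whole(src, F_UNLCK);
    errno = saved;
    return ret;
}
```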

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

2013-03-30 Thread AEDilger Gmail

On 2013-03-30, at 14:45, Pavel Machek <pa...@ucw.cz> wrote:
> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>>
>>> That should have atomicity properties reflected.
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself.  Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?

Yes, that would be reasonable, and/or possibly openat(fd, NULL, 
AT_FDCWD|AT_UNLINKED)?
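Linux eventually merged this idea (in 3.11) as the O_TMPFILE flag: open() or openat() is given the directory, close to the open(dir, O_DELETED) form Pavel suggests, and returns a descriptor for an unnamed file. Adding O_EXCL makes the file permanently unlinkable, which fits the pure scratch-file case where the data should never appear in the namespace. A sketch; the helper name open_scratch() and its mkostemp() fallback are illustrative assumptions:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return a read/write descriptor for a file in dir
 * that has no name.  With O_TMPFILE | O_EXCL the file can never be linked
 * into the namespace, its storage is freed on the last close(), and
 * nothing is left in dir after a crash.  Returns the fd, or -1. */
int open_scratch(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR | O_EXCL | O_CLOEXEC, 0600);
    if (fd >= 0 || (errno != EOPNOTSUPP && errno != EISDIR))
        return fd;

    /* Fallback when the kernel or filesystem lacks O_TMPFILE: create, then
     * unlink at once.  This reintroduces the crash window: a crash between
     * the two calls leaves a stray file in dir. */
    char path[4096];
    if (snprintf(path, sizeof path, "%s/.scratch.XXXXXX", dir) >= (int)sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = mkostemp(path, O_CLOEXEC);
    if (fd >= 0)
        unlink(path);
    return fd;
}
```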

Cheers, Andreas

Re: New copyfile system call - discuss before LSF?

2013-03-11 Thread Joel Becker

On Mon, Feb 25, 2013 at 04:03:01PM -0800, Zach Brown wrote:
> > >   I think it would be neat if it couldn't
> > > corrupt data.
> > 
> > It would also be neat if the moon were made of cheese...
> 
> And there we have the lsf2013 t-shirt slogan.  I think we're done here!
> 
> - z

Hey Everyone,
So, of course, this thread happened while I was celebrating my
10-year anniversary on a warm, sunny island.  I won't trade.  But let me
drop my $0.02 in here.
First, we have our T-shirt slogan.  That overrides every other
concern.
Second, I agree that moving forward on anything is better than
not.  I haven't delivered the updated fastcopy(2) patch I promised two
years ago, and I have to admit that I can't promise code on any sane
timeframe.
Back when I was working on this, I thought that link(2) was a
good model for a full-file copy.  Thus I came up with reflink(2).  This
eventually became the fastcopy(2) proposal discussed two years ago.  I
did not think, and I still don't think, that we should conflate the API
for "copy/clone this file in some way" (ala fastcopy(2)) with
"duplicate/link this range of bytes" (ala BTRFS_IOC_CLONE_RANGE).  I
thought that splice(2) or something like it was a better fit for ranges;
this thread has already had the same thought.
fastcopy(2) had a provision for CoW for atomicity, including
metadata.  This is because ocfs2 reflinks *can* provide atomic clones
with metadata included.  I would like any new proposal to allow for
that.  If it does not, of course, callers can continue to use
OCFS2_IOC_REFLINK, but I'd rather make it part of the generic behavior,
so that generic tools come with it.
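For comparison, the generic interface Linux later grew is the FICLONE ioctl (Linux 4.5, lifted from btrfs's BTRFS_IOC_CLONE and supported by filesystems such as btrfs and XFS). It shares data extents only; the destination keeps its own inode metadata, so it does not give the atomic clone-with-metadata described for OCFS2 reflinks. A clone-or-copy sketch, where the helper name and the set of errors treated as "no reflink support" are assumptions:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FICLONE                 /* older userspace headers */
#define FICLONE _IOW(0x94, 9, int)
#endif

/* Hypothetical helper: make dst (a new, empty file open for writing) share
 * src's data extents when the filesystem supports reflinks, otherwise copy
 * the bytes.  Only data is cloned; dst keeps its own inode metadata.
 * Returns 1 if cloned, 0 if copied, -1 on error. */
int clone_or_copy(int src, int dst)
{
    char buf[65536];
    off_t off = 0;

    if (ioctl(dst, FICLONE, src) == 0)
        return 1;
    if (errno != EOPNOTSUPP && errno != EXDEV && errno != EINVAL &&
        errno != ENOTTY)
        return -1;

    for (;;) {                  /* fallback: ordinary copy from offset 0 */
        ssize_t n = pread(src, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            return 0;
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), off + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return -1;
            done += w;
        }
        off += n;
    }
}
```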

Joel

-- 

"You don't make the poor richer by making the rich poorer."
- Sir Winston Churchill

http://www.jlbec.org/
jl...@evilplan.org

Re: New copyfile system call - discuss before LSF?

2013-02-26 Thread Andy Lutomirski

On Tue, Feb 26, 2013 at 1:02 PM, Jörn Engel  wrote:
> On Mon, 25 February 2013 13:14:52 -0800, Andy Lutomirski wrote:
>>
>> I thought the first thing people would ask for is to atomically create a
>> new file and copy the old file into it (at least on local file systems).
>>  The idea is that nothing should see an empty destination file, either
>> by race or by crash.  (This feature would perhaps be described as a
>> pony, but it should be implementable.)
>
> Having already wasted many weeks trying to implement your pony, I would
> consider it about as possible as winning the lottery three times in a
> row.  It clearly is in theory and yet,...
>
> If you take a filesystem like ext[34] you are out of luck.  In those
> filesystems it may not even be theoretically possible to get the
> cleanup right for pathological cases.  And if you ignore pathological
> cases and depend on userspace to do the cleanup for you, you have to
> do ABI extensions that I don't want to mention with Al on Cc:.  My
> personal notebook ran such a kernel for several years until hardware
> improved to a point that I no longer wanted to forward-port the
> patches.  It worked but it was far from pretty.
>
> If you have a filesystem where you can simply bump a reference count
> to copy the file content, implementation is fairly straightforward.
> But having a system call that is effectively limited to btrfs means
> pretty much no one will use it - besides the people looking for
> potential kernel exploits.

:)

>
> So my vote clearly goes to some variant of sendfile or splice.

Don't get me wrong -- the vpsendfile (or whatever it's called) idea
sounds extremely useful too.

--Andy

Re: New copyfile system call - discuss before LSF?

2013-02-26 Thread Jörn Engel

On Mon, 25 February 2013 13:14:52 -0800, Andy Lutomirski wrote:
> 
> I thought the first thing people would ask for is to atomically create a
> new file and copy the old file into it (at least on local file systems).
>  The idea is that nothing should see an empty destination file, either
> by race or by crash.  (This feature would perhaps be described as a
> pony, but it should be implementable.)

Having already wasted many weeks trying to implement your pony, I would
consider it about as possible as winning the lottery three times in a
row.  It clearly is in theory and yet,...

If you take a filesystem like ext[34] you are out of luck.  In those
filesystems it may not even be theoretically possible to get the
cleanup right for pathological cases.  And if you ignore pathological
cases and depend on userspace to do the cleanup for you, you have to
do ABI extensions that I don't want to mention with Al on Cc:.  My
personal notebook ran such a kernel for several years until hardware
improved to a point that I no longer wanted to forward-port the
patches.  It worked but it was far from pretty.

If you have a filesystem where you can simply bump a reference count
to copy the file content, implementation is fairly straightforward.
But having a system call that is effectively limited to btrfs means
pretty much no one will use it - besides the people looking for
potential kernel exploits.

So my vote clearly goes to some variant of sendfile or splice.
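
A minimal sketch of the sendfile(2) route, assuming Linux 2.6.33 or later, where out_fd may be a regular file rather than only a socket. copy_with_sendfile() is a hypothetical helper, not an existing API; it copies all of in_fd to out_fd's current position and returns the number of bytes copied.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Copy in_fd (a regular file) to out_fd's current position with sendfile(2).
 * Returns the number of bytes copied, or -1 with errno set.
 */
off_t copy_with_sendfile(int out_fd, int in_fd)
{
    struct stat st;
    off_t offset = 0;   /* read position; in_fd's own file offset is untouched */

    if (fstat(in_fd, &st) < 0)
        return -1;

    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* nothing was transferred: retry */
            return -1;
        }
        if (n == 0)
            break;              /* source shrank while we were copying */
        /* sendfile() has already advanced offset by n; short counts loop */
    }
    return offset;
}
```

The limitation shows in the signature: sendfile(2) takes an offset for the input only and always writes at out_fd's current file position.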

Jörn

--
We must not confuse what seems improbable and unnatural to us with
what is absolutely impossible.
-- Carl Friedrich Gauß

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Zach Brown

> >   I think it would be neat if it couldn't
> > corrupt data.
> 
> It would also be neat if the moon were made of cheese...

And there we have the lsf2013 t-shirt slogan.  I think we're done here!

- z

Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 15:35 -0800, Andy Lutomirski wrote:
> On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond
>  wrote:
> > On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
> >> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
> >>  wrote:
> >> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> >> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> >> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >> >>  On 21/02/2013 15:57, Ric Wheeler wrote:
> >> >> >> sendfile64() pretty much already has the right arguments for a
> >> >> >> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> >> >> NFSv4.2 version would use that to specify whether or not to copy file
> >> >> >> metadata.
> >> >> > That would seem to be enough to me and has the advantage that it is a
> >> >> > relatively obvious extension to something that is at least not totally
> >> >> > unknown to developers.
> >> >> >
> >> >> > Do we need more than that for non-NFS paths I wonder? What does reflink
> >> >> > need or the SCSI mechanism?
> >> >>  For virt we would like to be able to specify arbitrary block ranges.
> >> >>  Copying an entire file helps some copy operations like storage
> >> >>  migration.  However, it is not enough to convert the guest's offloaded
> >> >>  copies to host-side offloaded copies.
> >> >> >>> So how would a system call based on sendfile64() plus my flag parameter
> >> >> >>> prevent an underlying implementation from meeting your criterion?
> >> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> >> >> it's missing an out_fd_offset.  The host will want to offload the
> >> >> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> >> >> file that correspond to the mappings of the in and out files in the
> >> >> >> guest.
> >> >> >>
> >> >> >> You could make it work with some locking and out_fd seeking to set the
> >> >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >> >>
> >> >> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
> >> >> >>                    off_t out_offset, size_t count, int flags);
> >> >> >>
> >> >> >> That seems closer.
> >> >> >>
> >> >> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> >> >> because that's the very first thing that's going to be requested after
> >> >> >> people prototype having to iterate calling sendfile() for each
> >> >> >> contiguous copy region.
> >> >> > I thought the first thing people would ask for is to atomically create a
> >> >> > new file and copy the old file into it (at least on local file systems).
> >> >> >   The idea is that nothing should see an empty destination file, either
> >> >> > by race or by crash.  (This feature would perhaps be described as a
> >> >> > pony, but it should be implementable.)
> >> >> >
> >> >> > This would be like a better link(2).
> >> >> >
> >> >> > --Andy
> >> >>
> >> >> Why would this need to be atomic? That would seem to be a very difficult
> >> >> property to provide across all target types with multi-GB sized files...
> >> >
> >> > Right. It may sound cool, but what's the real-life use case?
> >> >
> >>
> >> Download file from some source and then verify it.  Now copyfile it
> >> into my repository of known-good files.
> >>
> >> Admittedly I could link + unlink or rename it there, but I consider
> >> hard links to be rather evil, especially when cow links are available.
> >
> > Rename is the right way to do that as it can't corrupt the data after
> > you have verified it. copyfile can...
> 
> ...copyfile doesn't exist.

Wrong! The underlying NFS and SCSI copy offload protocols are fully
defined at this time, and will constrain any implementation that you may
dream up.

>   I think it would be neat if it couldn't
> corrupt data.

It would also be neat if the moon were made of cheese... The underlying
NFS and SCSI protocols do not guarantee perfect copies; the copy may,
for instance, be interrupted due to external circumstances.

> In any case, this may be a bad idea -- presumably you'd have to fsync
> the file you're copying *from* first to avoid a massive performance
> hit.

You have to do that anyway.
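
A sketch of what a careful caller might do with both points: flush the source first, then treat the copy as something that can stop short and has to be resumed or reported. It assumes copy_file_range(2), which appeared after this thread (Linux 4.5, glibc 2.27), and falls back to pread(2)/pwrite(2) when the kernel or filesystem cannot offload; copy_span() and careful_copy() are hypothetical names.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy len bytes from in_fd@in_off to out_fd@out_off, resuming after short
 * copies. Returns bytes copied (less than len only if the source hit EOF),
 * or -1 with errno set.
 */
static off_t copy_span(int out_fd, int in_fd, off_t in_off, off_t out_off,
                       off_t len)
{
    char buf[65536];
    off_t done = 0;
    int offload = 1;

    while (done < len) {
        size_t want = (size_t)(len - done);
        ssize_t n;

        if (offload) {
            /* copy_file_range() advances in_off and out_off itself */
            n = copy_file_range(in_fd, &in_off, out_fd, &out_off, want, 0);
            if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                          errno == EOPNOTSUPP || errno == EINVAL)) {
                offload = 0;    /* no offload available: bounce-buffer copy */
                continue;
            }
        } else {
            if (want > sizeof buf)
                want = sizeof buf;
            n = pread(in_fd, buf, want, in_off);
            if (n > 0) {
                n = pwrite(out_fd, buf, (size_t)n, out_off);
                if (n > 0) {    /* a short write is re-read next time round */
                    in_off += n;
                    out_off += n;
                }
            }
        }

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* source EOF: caller sees a short count */
        done += n;
    }
    return done;
}

/* Flush the source, copy it whole, flush the destination. 0 or -1/errno. */
int careful_copy(int out_fd, int in_fd)
{
    struct stat st;

    /* An offloaded copy (NFS server, SCSI target) reads the data stored
     * there, not dirty pages cached on this host. */
    if (fsync(in_fd) < 0 || fstat(in_fd, &st) < 0)
        return -1;

    off_t n = copy_span(out_fd, in_fd, 0, 0, st.st_size);
    if (n < 0)
        return -1;
    if (n != st.st_size) {      /* interrupted or truncated: report it */
        errno = EIO;
        return -1;
    }
    return fsync(out_fd);
}
```

Whether the leading fsync() is strictly required depends on the filesystem and protocol; the sketch simply does it unconditionally.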

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond
 wrote:
> On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
>> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
>>  wrote:
>> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
>> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
>> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
>> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
>> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
>> >>  On 21/02/2013 15:57, Ric Wheeler wrote:
>> >> >> sendfile64() pretty much already has the right arguments for a
>> >> >> "copyfile", however it would be nice to add a 'flags' parameter: the
>> >> >> NFSv4.2 version would use that to specify whether or not to copy file
>> >> >> metadata.
>> >> > That would seem to be enough to me and has the advantage that it is a
>> >> > relatively obvious extension to something that is at least not totally
>> >> > unknown to developers.
>> >> >
>> >> > Do we need more than that for non-NFS paths I wonder? What does reflink
>> >> > need or the SCSI mechanism?
>> >>  For virt we would like to be able to specify arbitrary block ranges.
>> >>  Copying an entire file helps some copy operations like storage
>> >>  migration.  However, it is not enough to convert the guest's offloaded
>> >>  copies to host-side offloaded copies.
>> >> >>> So how would a system call based on sendfile64() plus my flag parameter
>> >> >>> prevent an underlying implementation from meeting your criterion?
>> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
>> >> >> it's missing an out_fd_offset.  The host will want to offload the
>> >> >> guest's copies by calling sendfile on block ranges of a guest disk image
>> >> >> file that correspond to the mappings of the in and out files in the
>> >> >> guest.
>> >> >>
>> >> >> You could make it work with some locking and out_fd seeking to set the
>> >> >> write offset before calling sendfile64()+flags, but ugh.
>> >> >>
>> >> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
>> >> >>                    off_t out_offset, size_t count, int flags);
>> >> >>
>> >> >> That seems closer.
>> >> >>
>> >> >> We might also want to pre-emptively offer iovs instead of offsets,
>> >> >> because that's the very first thing that's going to be requested after
>> >> >> people prototype having to iterate calling sendfile() for each
>> >> >> contiguous copy region.
>> >> > I thought the first thing people would ask for is to atomically create a
>> >> > new file and copy the old file into it (at least on local file systems).
>> >> >   The idea is that nothing should see an empty destination file, either
>> >> > by race or by crash.  (This feature would perhaps be described as a
>> >> > pony, but it should be implementable.)
>> >> >
>> >> > This would be like a better link(2).
>> >> >
>> >> > --Andy
>> >>
>> >> Why would this need to be atomic? That would seem to be a very difficult
>> >> property to provide across all target types with multi-GB sized files...
>> >
>> > Right. It may sound cool, but what's the real-life use case?
>> >
>>
>> Download file from some source and then verify it.  Now copyfile it
>> into my repository of known-good files.
>>
>> Admittedly I could link + unlink or rename it there, but I consider
>> hard links to be rather evil, especially when cow links are available.
>
> Rename is the right way to do that as it can't corrupt the data after
> you have verified it. copyfile can...

...copyfile doesn't exist.  I think it would be neat if it couldn't
corrupt data.

In any case, this may be a bad idea -- presumably you'd have to fsync
the file you're copying *from* first to avoid a massive performance
hit.

--Andy

Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
>  wrote:
> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >>  On 21/02/2013 15:57, Ric Wheeler wrote:
> >> >> sendfile64() pretty much already has the right arguments for a
> >> >> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> >> NFSv4.2 version would use that to specify whether or not to copy file
> >> >> metadata.
> >> > That would seem to be enough to me and has the advantage that it is a
> >> > relatively obvious extension to something that is at least not totally
> >> > unknown to developers.
> >> >
> >> > Do we need more than that for non-NFS paths I wonder? What does reflink
> >> > need or the SCSI mechanism?
> >>  For virt we would like to be able to specify arbitrary block ranges.
> >>  Copying an entire file helps some copy operations like storage
> >>  migration.  However, it is not enough to convert the guest's offloaded
> >>  copies to host-side offloaded copies.
> >> >>> So how would a system call based on sendfile64() plus my flag parameter
> >> >>> prevent an underlying implementation from meeting your criterion?
> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> >> it's missing an out_fd_offset.  The host will want to offload the
> >> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> >> file that correspond to the mappings of the in and out files in the
> >> >> guest.
> >> >>
> >> >> You could make it work with some locking and out_fd seeking to set the
> >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >>
> >> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
> >> >>                    off_t out_offset, size_t count, int flags);
> >> >>
> >> >> That seems closer.
> >> >>
> >> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> >> because that's the very first thing that's going to be requested after
> >> >> people prototype having to iterate calling sendfile() for each
> >> >> contiguous copy region.
> >> > I thought the first thing people would ask for is to atomically create a
> >> > new file and copy the old file into it (at least on local file systems).
> >> >   The idea is that nothing should see an empty destination file, either
> >> > by race or by crash.  (This feature would perhaps be described as a
> >> > pony, but it should be implementable.)
> >> >
> >> > This would be like a better link(2).
> >> >
> >> > --Andy
> >>
> >> Why would this need to be atomic? That would seem to be a very difficult
> >> property to provide across all target types with multi-GB sized files...
> >
> > Right. It may sound cool, but what's the real-life use case?
> >
> 
> Download file from some source and then verify it.  Now copyfile it
> into my repository of known-good files.
> 
> Admittedly I could link + unlink or rename it there, but I consider
> hard links to be rather evil, especially when cow links are available.

Rename is the right way to do that as it can't corrupt the data after
you have verified it. copyfile can...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
 wrote:
> On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
>> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
>> > On 02/21/2013 02:24 PM, Zach Brown wrote:
>> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
>> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
>>  On 21/02/2013 15:57, Ric Wheeler wrote:
>> >> sendfile64() pretty much already has the right arguments for a
>> >> "copyfile", however it would be nice to add a 'flags' parameter: the
>> >> NFSv4.2 version would use that to specify whether or not to copy file
>> >> metadata.
>> > That would seem to be enough to me and has the advantage that it is a
>> > relatively obvious extension to something that is at least not totally
>> > unknown to developers.
>> >
>> > Do we need more than that for non-NFS paths I wonder? What does reflink
>> > need or the SCSI mechanism?
>>  For virt we would like to be able to specify arbitrary block ranges.
>>  Copying an entire file helps some copy operations like storage
>>  migration.  However, it is not enough to convert the guest's offloaded
>>  copies to host-side offloaded copies.
>> >>> So how would a system call based on sendfile64() plus my flag parameter
>> >>> prevent an underlying implementation from meeting your criterion?
>> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
>> >> it's missing an out_fd_offset.  The host will want to offload the
>> >> guest's copies by calling sendfile on block ranges of a guest disk image
>> >> file that correspond to the mappings of the in and out files in the
>> >> guest.
>> >>
>> >> You could make it work with some locking and out_fd seeking to set the
>> >> write offset before calling sendfile64()+flags, but ugh.
>> >>
>> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
>> >>                    off_t out_offset, size_t count, int flags);
>> >>
>> >> That seems closer.
>> >>
>> >> We might also want to pre-emptively offer iovs instead of offsets,
>> >> because that's the very first thing that's going to be requested after
>> >> people prototype having to iterate calling sendfile() for each
>> >> contiguous copy region.
>> > I thought the first thing people would ask for is to atomically create a
>> > new file and copy the old file into it (at least on local file systems).
>> >   The idea is that nothing should see an empty destination file, either
>> > by race or by crash.  (This feature would perhaps be described as a
>> > pony, but it should be implementable.)
>> >
>> > This would be like a better link(2).
>> >
>> > --Andy
>>
>> Why would this need to be atomic? That would seem to be a very difficult
>> property to provide across all target types with multi-GB sized files...
>
> Right. It may sound cool, but what's the real-life use case?
>

Download file from some source and then verify it.  Now copyfile it
into my repository of known-good files.

Admittedly I could link + unlink or rename it there, but I consider
hard links to be rather evil, especially when cow links are available.


--Andy

Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
>  On 21/02/2013 15:57, Ric Wheeler wrote:
> >> sendfile64() pretty much already has the right arguments for a
> >> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> NFSv4.2 version would use that to specify whether or not to copy file
> >> metadata.
> > That would seem to be enough to me and has the advantage that it is a
> > relatively obvious extension to something that is at least not totally
> > unknown to developers.
> >
> > Do we need more than that for non-NFS paths I wonder? What does reflink
> > need or the SCSI mechanism?
>  For virt we would like to be able to specify arbitrary block ranges.
>  Copying an entire file helps some copy operations like storage
>  migration.  However, it is not enough to convert the guest's offloaded
>  copies to host-side offloaded copies.
> >>> So how would a system call based on sendfile64() plus my flag parameter
> >>> prevent an underlying implementation from meeting your criterion?
> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> it's missing an out_fd_offset.  The host will want to offload the
> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> file that correspond to the mappings of the in and out files in the
> >> guest.
> >>
> >> You could make it work with some locking and out_fd seeking to set the
> >> write offset before calling sendfile64()+flags, but ugh.
> >>
> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
> >>                    off_t out_offset, size_t count, int flags);
> >>
> >> That seems closer.
> >>
> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> because that's the very first thing that's going to be requested after
> >> people prototype having to iterate calling sendfile() for each
> >> contiguous copy region.
> > I thought the first thing people would ask for is to atomically create a
> > new file and copy the old file into it (at least on local file systems).
> >   The idea is that nothing should see an empty destination file, either
> > by race or by crash.  (This feature would perhaps be described as a
> > pony, but it should be implementable.)
> >
> > This would be like a better link(2).
> >
> > --Andy
> 
> Why would this need to be atomic? That would seem to be a very difficult 
> property to provide across all target types with multi-GB sized files...

Right. It may sound cool, but what's the real-life use case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Ric Wheeler


On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> On 02/21/2013 02:24 PM, Zach Brown wrote:
>> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
>>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
>>>> On 21/02/2013 15:57, Ric Wheeler wrote:
>>>>>> sendfile64() pretty much already has the right arguments for a
>>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
>>>>>> NFSv4.2 version would use that to specify whether or not to copy file
>>>>>> metadata.
>>>>>
>>>>> That would seem to be enough to me and has the advantage that it is a
>>>>> relatively obvious extension to something that is at least not totally
>>>>> unknown to developers.
>>>>>
>>>>> Do we need more than that for non-NFS paths I wonder? What does reflink
>>>>> need or the SCSI mechanism?
>>>>
>>>> For virt we would like to be able to specify arbitrary block ranges.
>>>> Copying an entire file helps some copy operations like storage
>>>> migration.  However, it is not enough to convert the guest's offloaded
>>>> copies to host-side offloaded copies.
>>>
>>> So how would a system call based on sendfile64() plus my flag parameter
>>> prevent an underlying implementation from meeting your criterion?
>>
>> If I'm guessing correctly, sendfile64()+flags would be annoying because
>> it's missing an out_fd_offset.  The host will want to offload the
>> guest's copies by calling sendfile on block ranges of a guest disk image
>> file that correspond to the mappings of the in and out files in the
>> guest.
>>
>> You could make it work with some locking and out_fd seeking to set the
>> write offset before calling sendfile64()+flags, but ugh.
>>
>>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset,
>>                    off_t out_offset, size_t count, int flags);
>>
>> That seems closer.
>>
>> We might also want to pre-emptively offer iovs instead of offsets,
>> because that's the very first thing that's going to be requested after
>> people prototype having to iterate calling sendfile() for each
>> contiguous copy region.
>
> I thought the first thing people would ask for is to atomically create a
> new file and copy the old file into it (at least on local file systems).
> The idea is that nothing should see an empty destination file, either
> by race or by crash.  (This feature would perhaps be described as a
> pony, but it should be implementable.)
>
> This would be like a better link(2).
>
> --Andy

Why would this need to be atomic? That would seem to be a very difficult
property to provide across all target types with multi-GB sized files...


Ric



Re: New copyfile system call - discuss before LSF?

On 02/21/2013 02:24 PM, Zach Brown wrote:
> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
>>> On 21/02/2013 15:57, Ric Wheeler wrote:
>>>>> sendfile64() pretty much already has the right arguments for a
>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
>>>>> NFSv4.2 version would use that to specify whether or not to copy file
>>>>> metadata.
>>>>
>>>> That would seem to be enough to me and has the advantage that it is a
>>>> relatively obvious extension to something that is at least not totally
>>>> unknown to developers.
>>>>
>>>> Do we need more than that for non-NFS paths I wonder? What does reflink
>>>> need or the SCSI mechanism?
>>>
>>> For virt we would like to be able to specify arbitrary block ranges.
>>> Copying an entire file helps some copy operations like storage
>>> migration.  However, it is not enough to convert the guest's offloaded
>>> copies to host-side offloaded copies.
>>
>> So how would a system call based on sendfile64() plus my flag parameter
>> prevent an underlying implementation from meeting your criterion?
> 
> If I'm guessing correctly, sendfile64()+flags would be annoying because
> it's missing an out_fd_offset.  The host will want to offload the
> guest's copies by calling sendfile on block ranges of a guest disk image
> file that correspond to the mappings of the in and out files in the
> guest.
> 
> You could make it work with some locking and out_fd seeking to set the
> write offset before calling sendfile64()+flags, but ugh.
> 
>  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
>   out_offset, size_t count, int flags);
> 
> That seems closer.
> 
> We might also want to pre-emptively offer iovs instead of offsets,
> because that's the very first thing that's going to be requested after
> people prototype having to iterate calling sendfile() for each
> contiguous copy region. 
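The "locking and out_fd seeking" workaround can be sketched with the existing sendfile(2). This is a minimal illustration, not a proposal: sendfile_at() and its process-wide mutex are hypothetical, and it assumes Linux 2.6.33 or later, where out_fd may be a regular file. sendfile() takes the input offset by pointer (leaving in_fd's file position alone) but always writes at out_fd's current position, which is why the lseek() has to happen under a lock.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: one lock shared by every caller of sendfile_at() in this
 * process, so another thread cannot move out_fd's file position between
 * our lseek() and sendfile(). */
static pthread_mutex_t out_pos_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: copy up to count bytes from in_fd at in_off to
 * out_fd at out_off.  Returns the number of bytes copied, or -1 with
 * errno set if nothing was copied.  in_fd's file position is unchanged. */
ssize_t sendfile_at(int out_fd, off_t out_off, int in_fd, off_t in_off,
                    size_t count)
{
    ssize_t total = 0;

    pthread_mutex_lock(&out_pos_lock);
    if (lseek(out_fd, out_off, SEEK_SET) == (off_t)-1) {
        pthread_mutex_unlock(&out_pos_lock);
        return -1;
    }
    while (count > 0) {
        /* Reads from in_fd at in_off and advances in_off; writes at
         * out_fd's current position and advances that position. */
        ssize_t n = sendfile(out_fd, in_fd, &in_off, count);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            if (total == 0)
                total = -1;
            break;
        }
        if (n == 0)
            break;                      /* end of input file */
        total += n;
        count -= (size_t)n;
    }
    pthread_mutex_unlock(&out_pos_lock);
    return total;
}
```

The lock only serializes callers of this helper within one process; anything else that moves out_fd's offset can still interleave, which is a large part of the "ugh".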

I thought the first thing people would ask for is to atomically create a
new file and copy the old file into it (at least on local file systems).
The idea is that nothing should see an empty destination file, either
by race or by crash.  (This feature would perhaps be described as a
pony, but it should be implementable.)

This would be like a better link(2).

--Andy
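Without kernel help, the usual userspace approximation of "nothing should see an empty destination file" is to copy into a temporary file in the destination directory, fsync() it, and rename(2) it over the final name, since rename() replaces the name atomically. A minimal sketch under POSIX assumptions; copy_into_place() and the ".tmpXXXXXX" suffix are illustrative, and a crash can still leave a stray temporary file behind.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy all of src_fd into dst_path so that dst_path
 * never exists in an empty or half-written state.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_into_place(int src_fd, const char *dst_path)
{
    char tmp[4096];
    char buf[65536];
    off_t off = 0;
    int tmp_fd;

    /* Same directory as the target, so rename() stays on one filesystem. */
    if (snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", dst_path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* mkstemp() creates the file with mode 0600; fchmod() it if others
     * need to read the result. */
    tmp_fd = mkstemp(tmp);
    if (tmp_fd < 0)
        return -1;

    for (;;) {
        ssize_t n = pread(src_fd, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto fail;
        if (n == 0)
            break;                      /* end of source */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(tmp_fd, buf + done, (size_t)(n - done));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                goto fail;
            done += w;
        }
        off += n;
    }

    /* Data must be on stable storage before the name points at it. */
    if (fsync(tmp_fd) < 0)
        goto fail;
    if (close(tmp_fd) < 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, dst_path) < 0) {
        unlink(tmp);
        return -1;
    }
    /* For full crash safety, also fsync() the containing directory. */
    return 0;

fail:
    {
        int saved = errno;
        close(tmp_fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```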

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Ric Wheeler


On 02/25/2013 04:14 PM, Andy Lutomirski wrote:

> I thought the first thing people would ask for is to atomically create a
> new file and copy the old file into it (at least on local file systems).
> The idea is that nothing should see an empty destination file, either
> by race or by crash.  (This feature would perhaps be described as a
> pony, but it should be implementable.)
>
> This would be like a better link(2).
>
> --Andy


Why would this need to be atomic? That would seem to be a very difficult 
property to provide across all target types with multi-GB sized files...


Ric



Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> Why would this need to be atomic? That would seem to be a very difficult
> property to provide across all target types with multi-GB sized files...

Right. It may sound cool, but what's the real-life use case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
trond.mykleb...@netapp.com wrote:
>> Why would this need to be atomic? That would seem to be a very difficult
>> property to provide across all target types with multi-GB sized files...
>
> Right. It may sound cool, but what's the real-life use case?


Download file from some source and then verify it.  Now copyfile it
into my repository of known-good files.

Admittedly I could link + unlink or rename it there, but I consider
hard links to be rather evil, especially when cow links are available.


--Andy
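For comparison, the link + unlink route can be sketched as below (publish_no_clobber() is an illustrative name). Neither link(2) nor rename(2) copies data, so the bytes that were verified are the bytes that get published; the difference is that link() fails with EEXIST rather than replacing an existing name. Both require the two paths to be on the same filesystem (otherwise EXDEV).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: move a verified file from tmp_path to final_path
 * without ever overwriting an existing final_path.
 * Returns 0, or -1 with errno set (EEXIST if final_path already exists,
 * EXDEV if the paths are on different filesystems). */
int publish_no_clobber(const char *tmp_path, const char *final_path)
{
    if (link(tmp_path, final_path) < 0)
        return -1;
    /* The data is now reachable under final_path.  If removing the old
     * name fails, the file simply keeps a second name. */
    (void)unlink(tmp_path);
    return 0;
}
```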

Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
>> Right. It may sound cool, but what's the real-life use case?
>
> Download file from some source and then verify it.  Now copyfile it
> into my repository of known-good files.
>
> Admittedly I could link + unlink or rename it there, but I consider
> hard links to be rather evil, especially when cow links are available.

Rename is the right way to do that as it can't corrupt the data after
you have verified it. copyfile can...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
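The distinction being drawn is that a copy produces new bytes that were never themselves verified. One hedged userspace mitigation is to compare the destination against the source after copying; files_identical() and pread_full() are illustrative helpers, and a real tool would more likely re-check the same checksum it used to verify the download.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes at off, stopping early only at EOF.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t pread_full(int fd, char *buf, size_t len, off_t off)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = pread(fd, buf + got, len - got, off + (off_t)got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Hypothetical helper: returns 1 if both files have identical contents,
 * 0 if they differ, -1 on a read error.  File positions are unchanged. */
int files_identical(int a_fd, int b_fd)
{
    char a[65536], b[65536];
    off_t off = 0;

    for (;;) {
        ssize_t na = pread_full(a_fd, a, sizeof a, off);
        ssize_t nb = pread_full(b_fd, b, sizeof b, off);
        if (na < 0 || nb < 0)
            return -1;
        if (na != nb || memcmp(a, b, (size_t)na) != 0)
            return 0;
        if (na == 0)
            return 1;                   /* both reached EOF together */
        off += na;
    }
}
```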

Re: New copyfile system call - discuss before LSF?

On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond
trond.mykleb...@netapp.com wrote:
>> Admittedly I could link + unlink or rename it there, but I consider
>> hard links to be rather evil, especially when cow links are available.
>
> Rename is the right way to do that as it can't corrupt the data after
> you have verified it. copyfile can...

...copyfile doesn't exist.  I think it would be neat if it couldn't
corrupt data.

In any case, this may be a bad idea -- presumably you'd have to fsync
the file you're copying *from* first to avoid a massive performance
hit.

--Andy

Re: New copyfile system call - discuss before LSF?

On Mon, 2013-02-25 at 15:35 -0800, Andy Lutomirski wrote:
>> Rename is the right way to do that as it can't corrupt the data after
>> you have verified it. copyfile can...
>
> ...copyfile doesn't exist.

Wrong! The underlying NFS and SCSI copy offload protocols are fully
defined at this time, and will constrain any implementation that you may
dream up.

> I think it would be neat if it couldn't
> corrupt data.

It would also be neat if the moon were made of cheese... The underlying
NFS and SCSI protocols do not guarantee perfect copies; the copy may,
for instance, be interrupted due to external circumstances.

> In any case, this may be a bad idea -- presumably you'd have to fsync
> the file you're copying *from* first to avoid a massive performance
> hit.

You have to do that anyway.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
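The ordering under discussion (flush the source, then ask something below the page cache to copy it) can be sketched as follows. Assumptions: sendfile(2) merely stands in for a real offload, which would copy from the server's or array's storage rather than from this host's page cache, and copy_after_flush() is an illustrative name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: flush in_fd's dirty data, copy count bytes from
 * in_off to out_fd's current position, then flush the destination.
 * Returns bytes copied or -1 with errno set. */
ssize_t copy_after_flush(int out_fd, int in_fd, off_t in_off, size_t count)
{
    ssize_t total = 0;

    /* An offload engine copies what is on storage, not what sits in this
     * host's page cache, so dirty source pages must reach storage first.
     * sendfile() below reads through the page cache and would not need
     * this; it is only a stand-in for the offloaded copy. */
    if (fdatasync(in_fd) < 0)
        return -1;

    while (count > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &in_off, count);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* end of input file */
        total += n;
        count -= (size_t)n;
    }

    /* Make the copy itself durable before reporting success. */
    if (fdatasync(out_fd) < 0)
        return -1;
    return total;
}
```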

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Zach Brown

>> I think it would be neat if it couldn't
>> corrupt data.
>
> It would also be neat if the moon were made of cheese...

And there we have the lsf2013 t-shirt slogan.  I think we're done here!

- z

Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Eric Wong

"Myklebust, Trond"  wrote:
> > -Original Message-
> > From: Zach Brown [mailto:z...@redhat.com]
> > Sent: Thursday, February 21, 2013 5:25 PM
> > To: Myklebust, Trond
> > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; 
> > linux-kernel@vger.kernel.org;
> > Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen;
> > Hannes Reinecke; Joel Becker
> > Subject: Re: New copyfile system call - discuss before LSF?
> > 
> > On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> > > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > > > On 21/02/2013 15:57, Ric Wheeler wrote:
> > > > >>>
> > > > >> sendfile64() pretty much already has the right arguments for a
> > > > >> "copyfile", however it would be nice to add a 'flags' parameter:
> > > > >> the
> > > > >> NFSv4.2 version would use that to specify whether or not to copy
> > > > >> file metadata.
> > > > >
> > > > > That would seem to be enough to me and has the advantage that it
> > > > > is a relatively obvious extension to something that is at least
> > > > > not totally unknown to developers.
> > > > >
> > > > > Do we need more than that for non-NFS paths I wonder? What does
> > > > > reflink need or the SCSI mechanism?
> > > >
> > > > For virt we would like to be able to specify arbitrary block ranges.
> > > > Copying an entire file helps some copy operations like storage
> > > > migration.  However, it is not enough to convert the guest's
> > > > offloaded copies to host-side offloaded copies.
> > >
> > > So how would a system call based on sendfile64() plus my flag
> > > parameter prevent an underlying implementation from meeting your
> > > criterion?
> > 
> > If I'm guessing correctly, sendfile64()+flags would be annoying because
> > it's missing an out_fd_offset.  The host will want to offload the guest's
> > copies by calling sendfile on block ranges of a guest disk image file that
> > correspond to the mappings of the in and out files in the guest.
> > 
> > You could make it work with some locking and out_fd seeking to set the
> > write offset before calling sendfile64()+flags, but ugh.
> > 
> >  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >   out_offset, size_t count, int flags);
> > 
> > That seems closer.
> 
> psendfile() ?
> 
> I fully agree that sounds reasonable... Just being an ass. :-)

splice() already has offset for both fds and a flags arg:

   ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
  loff_t *off_out, size_t len, unsigned int flags);

The current downside is it requires one fd to be a pipe, so it's
just not very easy to use from my perspective[1].
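For reference, copying a file range with splice(2) today looks roughly like this, including the pipe bookkeeping objected to above. A minimal sketch assuming Linux and _GNU_SOURCE; splice_copy() is an illustrative name. Each round moves at most one pipe's worth of data (64 KiB by default), and a careful implementation would also deal with data left in the pipe when the second splice() fails.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from in_fd at in_off to out_fd at
 * out_off through a pipe.  Neither file position is changed.  Returns
 * bytes copied, or -1 with errno set if nothing could be copied. */
ssize_t splice_copy(int in_fd, loff_t in_off, int out_fd, loff_t out_off,
                    size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* File -> pipe: moves at most the pipe's capacity per call. */
        ssize_t in = splice(in_fd, &in_off, p[1], NULL, len, SPLICE_F_MOVE);
        if (in < 0 && errno == EINTR)
            continue;
        if (in < 0) {
            if (total == 0)
                total = -1;
            break;
        }
        if (in == 0)
            break;                      /* end of input file */

        /* Pipe -> file: drain everything that was just queued. */
        for (ssize_t left = in; left > 0; ) {
            ssize_t out = splice(p[0], NULL, out_fd, &out_off,
                                 (size_t)left, SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                if (total == 0)
                    total = -1;
                goto done;
            }
            left -= out;
            total += out;
        }
        len -= (size_t)in;
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```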

> > We might also want to pre-emptively offer iovs instead of offsets, because
> > that's the very first thing that's going to be requested after people
> > prototype having to iterate calling sendfile() for each contiguous copy
> > region.
> 
> vpsendfile() then? I agree that might be a little more future-proof. 
> Particularly given that the underlying protocols tend to be fully 
> asynchronous, and so it makes sense to queue up more than one copy at a 
> time...

splicev() might be nice to have in that case, too.
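The vectored shape being proposed (a list of source offset, destination offset and length triples in one call) can be emulated in userspace to make the interface concrete. struct copy_range and copy_ranges() are hypothetical names rather than a proposed ABI, and plain pread(2)/pwrite(2) stand in for whatever a kernel version would hand to an asynchronous offload engine.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one contiguous region to copy. */
struct copy_range {
    off_t  in_off;   /* offset in the source file */
    off_t  out_off;  /* offset in the destination file */
    size_t len;      /* bytes to copy */
};

/* Hypothetical helper: copy each region in order.  Returns the total
 * number of bytes copied, or -1 (errno set) if an error occurs before
 * anything was copied.  Hitting source EOF ends the current region. */
ssize_t copy_ranges(int out_fd, int in_fd, const struct copy_range *r,
                    size_t nr)
{
    char buf[65536];
    ssize_t total = 0;

    for (size_t i = 0; i < nr; i++) {
        off_t in_off = r[i].in_off, out_off = r[i].out_off;
        size_t left = r[i].len;

        while (left > 0) {
            size_t want = left < sizeof buf ? left : sizeof buf;
            ssize_t n = pread(in_fd, buf, want, in_off);
            if (n < 0 && errno == EINTR)
                continue;
            if (n < 0)
                return total ? total : -1;
            if (n == 0)
                break;                  /* source EOF */
            for (ssize_t done = 0; done < n; ) {
                ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done),
                                   out_off + done);
                if (w < 0 && errno == EINTR)
                    continue;
                if (w < 0)
                    return total ? total : -1;
                done += w;
                total += w;
            }
            in_off += n;
            out_off += n;
            left -= (size_t)n;
        }
    }
    return total;
}
```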



[1] my splice() annoyances:
* need to create/manage a pipe
* copy size limited by pipe size
* doesn't reduce userspace syscalls (just data copy overhead)
* easy to misuse and starve with blocking sockets + big buffers
* not many users, so bugs creep in (v3.7.8 was the first usable
  version of the 3.7 series for TCP sockets)

RE: New copyfile system call - discuss before LSF?

2013-02-22 Thread Myklebust, Trond

> -Original Message-
> From: Zach Brown [mailto:z...@redhat.com]
> Sent: Friday, February 22, 2013 1:22 PM
> To: Ric Wheeler
> Cc: Paolo Bonzini; Myklebust, Trond; Linux FS Devel; linux-
> ker...@vger.kernel.org; Chris L. Mason; Christoph Hellwig; Alexander Viro;
> Martin K. Petersen; Hannes Reinecke; Joel Becker
> Subject: Re: New copyfile system call - discuss before LSF?
> 
> > This seems to be suspiciously close to a clear consensus on how to
> > move forward after many years of spinning our wheels. Anyone want to
> > promote an actual patch before we change our collective minds?
> 
> It seems like we'd want to start with the existing (presumably
> bitrotten) prototypes that Trond has for nfs and that Martin has for
> block->scsi.  Mash the new syscall on top of them and get them working in
> current mainline.
> 
> I'd be happy to take responsibility for making forward progress if no one else
> has the bandwidth.
> 
> Trond, Martin, would that make sense?  Are the most recent versions of the
> prototypes available somewhere?

Hi Zach,

The wildly bitrotten NFS copyfile prototype can be found on


ftp://ftp.netapp.com/frm-ntap/opensource/linux_copyfileat/v2/linux_copyfileat_v2.tgz

Please open with extreme caution and apply the resulting patches to a Linux 
2.6.34.2 kernel...

Cheers
   Trond

Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Zach Brown

> This seems to be suspiciously close to a clear consensus on how to
> move forward after many years of spinning our wheels. Anyone want to
> promote an actual patch before we change our collective minds?

It seems like we'd want to start with the existing (presumably
bitrotten) prototypes that Trond has for nfs and that Martin has for
block->scsi.  Mash the new syscall on top of them and get them working in
current mainline.

I'd be happy to take responsibility for making forward progress if no
one else has the bandwidth.

Trond, Martin, would that make sense?  Are the most recent versions of
the prototypes available somewhere?

- z

Re: New copyfile system call - discuss before LSF?


On 02/22/2013 10:47 AM, Paolo Bonzini wrote:
> On 21/02/2013 23:24, Zach Brown wrote:
>> You could make it work with some locking and out_fd seeking to set the
>> write offset before calling sendfile64()+flags, but ugh.
>>
>>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
>>    out_offset, size_t count, int flags);
>>
>> That seems closer.
>>
>> We might also want to pre-emptively offer iovs instead of offsets,
>> because that's the very first thing that's going to be requested after
>> people prototype having to iterate calling sendfile() for each
>> contiguous copy region.
>
> Indeed, I was about to propose exactly that.  So that would be
> psendfilev.  I don't think psendfile is useful; it can easily be
> provided at the libc level.
>
> Paolo


This seems to be suspiciously close to a clear consensus on how to move forward 
after many years of spinning our wheels. Anyone want to promote an actual patch 
before we change our collective minds?


Ric


Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Paolo Bonzini

On 21/02/2013 23:24, Zach Brown wrote:
> You could make it work with some locking and out_fd seeking to set the
> write offset before calling sendfile64()+flags, but ugh.
> 
>  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
>   out_offset, size_t count, int flags);
> 
> That seems closer.
> 
> We might also want to pre-emptively offer iovs instead of offsets,
> because that's the very first thing that's going to be requested after
> people prototype having to iterate calling sendfile() for each
> contiguous copy region. 

Indeed, I was about to propose exactly that.  So that would be
psendfilev.  I don't think psendfile is useful, and can be easily
provided at the libc level.

Paolo
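Paolo's remark that a single-range psendfile() "can be easily provided at the libc level" once a vectored psendfilev() exists can be illustrated as follows. The struct copy_region layout and both function names are hypothetical (no such interface existed when this was written), and the pread()/pwrite() body is only a stand-in for whatever the kernel would actually offload.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-region descriptor: copy len bytes from in_offset in
 * the source to out_offset in the destination. */
struct copy_region {
    off_t in_offset;
    off_t out_offset;
    size_t len;
};

/* Hypothetical libc-level psendfilev(): walk the region array and copy
 * each region with pread()/pwrite(). This bounces data through a user
 * buffer, so it offloads nothing; it only fixes the calling convention.
 * Returns the total bytes copied, or -1 if an error occurred before
 * anything was copied. A short source file ends the copy early. */
ssize_t psendfilev_fallback(int out_fd, int in_fd,
                            const struct copy_region *regions, int nregions)
{
    char buf[65536];
    ssize_t total = 0;

    for (int i = 0; i < nregions; i++) {
        off_t in_off = regions[i].in_offset;
        off_t out_off = regions[i].out_offset;
        size_t left = regions[i].len;

        while (left > 0) {
            size_t want = left < sizeof(buf) ? left : sizeof(buf);
            ssize_t got = pread(in_fd, buf, want, in_off);
            if (got < 0)
                return total > 0 ? total : -1;
            if (got == 0)
                return total;                  /* source EOF */
            for (ssize_t done = 0; done < got; ) {
                ssize_t put = pwrite(out_fd, buf + done, (size_t)(got - done),
                                     out_off + done);
                if (put < 0)
                    return total + done > 0 ? total + done : -1;
                done += put;
            }
            in_off += got;
            out_off += got;
            left -= (size_t)got;
            total += got;
        }
    }
    return total;
}

/* The single-range call is then a one-line wrapper. */
ssize_t psendfile_fallback(int out_fd, int in_fd, off_t in_offset,
                           off_t out_offset, size_t count)
{
    struct copy_region r = { in_offset, out_offset, count };
    return psendfilev_fallback(out_fd, in_fd, &r, 1);
}
```

The design point is only the direction of the wrapping: the vectored call is the primitive, and the one-range form costs nothing extra to provide on top of it.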

Re: New copyfile system call - discuss before LSF?

On 02/21/2013 11:13 PM, Myklebust, Trond wrote:
> On Thu, 2013-02-21 at 23:05 +0100, Ric Wheeler wrote:
>> On 02/21/2013 09:00 PM, Paolo Bonzini wrote:
>>> On 21/02/2013 15:57, Ric Wheeler wrote:
>>>>> sendfile64() pretty much already has the right arguments for a
>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
>>>>> NFSv4.2 version would use that to specify whether or not to copy file
>>>>> metadata.
>>>>
>>>> That would seem to be enough to me and has the advantage that it is a
>>>> relatively obvious extension to something that is at least not totally
>>>> unknown to developers.
>>>>
>>>> Do we need more than that for non-NFS paths I wonder? What does reflink
>>>> need or the SCSI mechanism?
>>>
>>> For virt we would like to be able to specify arbitrary block ranges.
>>> Copying an entire file helps some copy operations like storage
>>> migration. However, it is not enough to convert the guest's offloaded
>>> copies to host-side offloaded copies.
>>>
>>> Paolo
>>
>> I don't think that the NFS protocol allows arbitrary ranges, but the SCSI
>> commands are range based.
>>
>> If I remember what the Windows people said at a SNIA event a few years back,
>> they have a requirement that the target file be pre-allocated (at least for the
>> SCSI based copy). Not clear to me where they iterate over that target file to do
>> the block range copies, but I suspect it is in their kernel.
>
> The NFSv4.2 copy offload protocol _does_ allow the copying of arbitrary
> byte ranges. The main target for that functionality is indeed
> virtualisation and thin provisioning of virtual machines.

For background, here is a pointer to Fred Knight's SNIA talk on the SCSI support
for offload:

https://snia.org/sites/default/files2/SDC2011/presentations/monday/FrederickKnight_Storage_Data_Movement_Offload.pdf

and a talk from Spencer Shepler that gives some detail on the NFS spec,
including the "server side copy" bits:

https://snia.org/sites/default/files2/SDC2011/presentations/wednesday/SpencerShepler_IETF_NFSv4_Working_Group_v4.pdf

The talks both have references to the actual specs for the gory details.

Ric


RE: New copyfile system call - discuss before LSF?

2013-02-22 Thread Myklebust, Trond

> -----Original Message-----
> From: Zach Brown [mailto:z...@redhat.com]
> Sent: Friday, February 22, 2013 1:22 PM
> To: Ric Wheeler
> Cc: Paolo Bonzini; Myklebust, Trond; Linux FS Devel; linux-ker...@vger.kernel.org;
> Chris L. Mason; Christoph Hellwig; Alexander Viro;
> Martin K. Petersen; Hannes Reinecke; Joel Becker
> Subject: Re: New copyfile system call - discuss before LSF?
>
> > This seems to be suspiciously close to a clear consensus on how to
> > move forward after many years of spinning our wheels. Anyone want to
> > promote an actual patch before we change our collective minds?
>
> It seems like we'd want to start with the existing (presumably
> bitrotten) prototypes that Trond has for nfs and that Martin has for
> block->scsi.  Mash the new syscall on top of them and get them working in
> current mainline.
>
> I'd be happy to take responsibility for making forward progress if no one else
> has the bandwidth.
>
> Trond, Martin, would that make sense?  Are the most recent versions of the
> prototypes available somewhere?

Hi Zach,

The wildly bitrotten NFS copyfile prototype can be found on

ftp://ftp.netapp.com/frm-ntap/opensource/linux_copyfileat/v2/linux_copyfileat_v2.tgz

Please open with extreme caution and apply the resulting patches to a Linux 
2.6.34.2 kernel...

Cheers
   Trond

Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Eric Wong

Myklebust, Trond <trond.mykleb...@netapp.com> wrote:
> > -----Original Message-----
> > From: Zach Brown [mailto:z...@redhat.com]
> > Sent: Thursday, February 21, 2013 5:25 PM
> > To: Myklebust, Trond
> > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org;
> > Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen;
> > Hannes Reinecke; Joel Becker
> > Subject: Re: New copyfile system call - discuss before LSF?
> >
> > On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> > > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > > > On 21/02/2013 15:57, Ric Wheeler wrote:
> > > > >>>
> > > > >> sendfile64() pretty much already has the right arguments for a
> > > > >> "copyfile", however it would be nice to add a 'flags' parameter:
> > > > >> the NFSv4.2 version would use that to specify whether or not to
> > > > >> copy file metadata.
> > > > >
> > > > > That would seem to be enough to me and has the advantage that it
> > > > > is a relatively obvious extension to something that is at least
> > > > > not totally unknown to developers.
> > > > >
> > > > > Do we need more than that for non-NFS paths I wonder? What does
> > > > > reflink need or the SCSI mechanism?
> > > >
> > > > For virt we would like to be able to specify arbitrary block ranges.
> > > > Copying an entire file helps some copy operations like storage
> > > > migration.  However, it is not enough to convert the guest's
> > > > offloaded copies to host-side offloaded copies.
> > >
> > > So how would a system call based on sendfile64() plus my flag
> > > parameter prevent an underlying implementation from meeting your
> > > criterion?
> >
> > If I'm guessing correctly, sendfile64()+flags would be annoying because it's
> > missing an out_fd_offset.  The host will want to offload the guest's copies by
> > calling sendfile on block ranges of a guest disk image file that correspond to
> > the mappings of the in and out files in the guest.
> >
> > You could make it work with some locking and out_fd seeking to set the
> > write offset before calling sendfile64()+flags, but ugh.
> >
> >  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >   out_offset, size_t count, int flags);
> >
> > That seems closer.
>
> psendfile() ?
>
> I fully agree that sounds reasonable... Just being an ass. :-)

splice() already has offsets for both fds and a flags arg:

   ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
                  loff_t *off_out, size_t len, unsigned int flags);

The current downside is it requires one fd to be a pipe, so it's
just not very easy to use from my perspective[1].

> > We might also want to pre-emptively offer iovs instead of offsets, because
> > that's the very first thing that's going to be requested after people
> > prototype having to iterate calling sendfile() for each contiguous copy region.
>
> vpsendfile() then? I agree that might be a little more future-proof.
> Particularly given that the underlying protocols tend to be fully
> asynchronous, and so it makes sense to queue up more than one copy at a
> time...

splicev() might be nice to have in that case, too.

[1] my splice() annoyances:
* need to create/manage a pipe
* copy size limited by pipe size
* doesn't reduce userspace syscalls (just data copy overhead)
* easy to misuse and starve with blocking sockets + big buffers
* not many users, so bugs creep in (v3.7.8 was the first usable
  version of the 3.7 series for TCP sockets)
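For comparison, this is roughly what a file-to-file range copy with today's splice(2) looks like, pipe and all, which is the first of the annoyances listed above. splice_copy is a hypothetical helper written for this illustration; it assumes both descriptors refer to regular files and creates one pipe per call.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy len bytes from fd_in at *off_in to fd_out at *off_out with
 * splice(2). splice() needs a pipe on one side, so the data moves
 * file -> pipe -> file. Both offsets advance by the bytes moved (on an
 * output error *off_in may run ahead of *off_out). Returns the number of
 * bytes written, or -1 if an error occurred before anything was written. */
ssize_t splice_copy(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
                    size_t len)
{
    int pipefd[2];
    ssize_t total = 0;
    int failed = 0;

    if (pipe(pipefd) < 0)
        return -1;

    while (!failed && (size_t)total < len) {
        /* Each call moves at most one pipe's worth of data. */
        ssize_t in = splice(fd_in, off_in, pipefd[1], NULL,
                            len - (size_t)total, SPLICE_F_MOVE);
        if (in < 0) {
            failed = 1;
            break;
        }
        if (in == 0)
            break;                          /* EOF on fd_in */
        /* Drain what was just queued in the pipe into fd_out. */
        while (in > 0) {
            ssize_t out = splice(pipefd[0], NULL, fd_out, off_out,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                failed = 1;
                break;
            }
            in -= out;
            total += out;
        }
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return (failed && total == 0) ? -1 : total;
}
```

Note the pipe setup and the two splice() calls per pipe-full of data: the bytes stay in the kernel, but, as said above, the number of userspace syscalls does not go down.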

RE: New copyfile system call - discuss before LSF?

> -----Original Message-----
> From: Zach Brown [mailto:z...@redhat.com]
> Sent: Thursday, February 21, 2013 5:25 PM
> To: Myklebust, Trond
> Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org;
> Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen;
> Hannes Reinecke; Joel Becker
> Subject: Re: New copyfile system call - discuss before LSF?
> 
> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > > On 21/02/2013 15:57, Ric Wheeler wrote:
> > > >>>
> > > >> sendfile64() pretty much already has the right arguments for a
> > > >> "copyfile", however it would be nice to add a 'flags' parameter:
> > > >> the NFSv4.2 version would use that to specify whether or not to
> > > >> copy file metadata.
> > > >
> > > > That would seem to be enough to me and has the advantage that it
> > > > is a relatively obvious extension to something that is at least
> > > > not totally unknown to developers.
> > > >
> > > > Do we need more than that for non-NFS paths I wonder? What does
> > > > reflink need or the SCSI mechanism?
> > >
> > > For virt we would like to be able to specify arbitrary block ranges.
> > > Copying an entire file helps some copy operations like storage
> > > migration.  However, it is not enough to convert the guest's
> > > offloaded copies to host-side offloaded copies.
> >
> > So how would a system call based on sendfile64() plus my flag
> > parameter prevent an underlying implementation from meeting your
> > criterion?
> 
> If I'm guessing correctly, sendfile64()+flags would be annoying because it's
> missing an out_fd_offset.  The host will want to offload the guest's copies by
> calling sendfile on block ranges of a guest disk image file that correspond to
> the mappings of the in and out files in the guest.
> 
> You could make it work with some locking and out_fd seeking to set the
> write offset before calling sendfile64()+flags, but ugh.
> 
>  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
>   out_offset, size_t count, int flags);
> 
> That seems closer.

psendfile() ?

I fully agree that sounds reasonable... Just being an ass. :-)

> We might also want to pre-emptively offer iovs instead of offsets, because
> that's the very first thing that's going to be requested after people
> prototype having to iterate calling sendfile() for each contiguous copy region.

vpsendfile() then? I agree that might be a little more future-proof. 
Particularly given that the underlying protocols tend to be fully asynchronous, 
and so it makes sense to queue up more than one copy at a time...

Cheers,
  Trond

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Eric Wong

Jeremy Allison  wrote:
> On Thu, Feb 21, 2013 at 01:51:53PM +, Myklebust, Trond wrote:
> > On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
> > > We have debated the need to have a system call to allow for offloading copy
> > > operations, for example to an NFS server (part of the new NFS 4.2
> > > specification), SCSI target device (two different SCSI commands do this),
> > > local file systems (reflink, etc) and I suspect many other possible parts
> > > of the stack could implement this.
> > 
> > sendfile64() pretty much already has the right arguments for a
> > "copyfile", however it would be nice to add a 'flags' parameter: the
> > NFSv4.2 version would use that to specify whether or not to copy file
> > metadata.
> 
> What would be really nice is if sendfile allowed zero-copy
> from network socket to a file descriptor. That would help
> a *lot* of my small system OEMs (and no splice() just doesn't
> cut it :-).

I've often wished the pipe requirement of splice() could be dropped,
to allow copying between arbitrary FDs.  Perhaps this can be done?

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Zach Brown

On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > On 21/02/2013 15:57, Ric Wheeler wrote:
> > >>>
> > >> sendfile64() pretty much already has the right arguments for a
> > >> "copyfile", however it would be nice to add a 'flags' parameter: the
> > >> NFSv4.2 version would use that to specify whether or not to copy file
> > >> metadata.
> > > 
> > > That would seem to be enough to me and has the advantage that it is a
> > > relatively obvious extension to something that is at least not totally
> > > unknown to developers.
> > > 
> > > Do we need more than that for non-NFS paths I wonder? What does reflink
> > > need or the SCSI mechanism?
> > 
> > For virt we would like to be able to specify arbitrary block ranges.
> > Copying an entire file helps some copy operations like storage
> > migration.  However, it is not enough to convert the guest's offloaded
> > copies to host-side offloaded copies.
> 
> So how would a system call based on sendfile64() plus my flag parameter
> prevent an underlying implementation from meeting your criterion?

If I'm guessing correctly, sendfile64()+flags would be annoying because
it's missing an out_fd_offset.  The host will want to offload the
guest's copies by calling sendfile on block ranges of a guest disk image
file that correspond to the mappings of the in and out files in the
guest.

You could make it work with some locking and out_fd seeking to set the
write offset before calling sendfile64()+flags, but ugh.

 ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
  out_offset, size_t count, int flags);

That seems closer.

We might also want to pre-emptively offer iovs instead of offsets,
because that's the very first thing that's going to be requested after
people prototype having to iterate calling sendfile() for each
contiguous copy region. 

- z
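The point about guest copies is that one logical copy inside the guest usually becomes several host-side ranges, because the source and destination files are each scattered across the disk image. Assuming the host already knows both files' extent maps (how it obtains them is out of scope here), splitting the copy into ranges that are contiguous on both sides might look like the sketch below; struct extent, struct region and map_copy_regions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: file bytes [logical, logical + len) are stored at
 * image bytes [physical, physical + len). */
struct extent { uint64_t logical, physical, len; };

/* One host-side copy: len bytes from image offset in_off to out_off. */
struct region { uint64_t in_off, out_off, len; };

static const struct extent *find_extent(const struct extent *map, int n,
                                        uint64_t off)
{
    for (int i = 0; i < n; i++)
        if (off >= map[i].logical && off - map[i].logical < map[i].len)
            return &map[i];
    return NULL;
}

/* Turn a guest copy of len bytes (source file offset src_off to
 * destination file offset dst_off) into image-relative regions, each
 * contiguous in both extent maps. Returns the number of regions, or -1
 * if part of either range is unmapped or more than max regions are needed. */
int map_copy_regions(const struct extent *src, int nsrc, uint64_t src_off,
                     const struct extent *dst, int ndst, uint64_t dst_off,
                     uint64_t len, struct region *out, int max)
{
    int n = 0;

    while (len > 0) {
        const struct extent *s = find_extent(src, nsrc, src_off);
        const struct extent *d = find_extent(dst, ndst, dst_off);
        if (s == NULL || d == NULL || n == max)
            return -1;
        uint64_t s_skip = src_off - s->logical;
        uint64_t d_skip = dst_off - d->logical;
        /* The chunk ends at whichever extent boundary comes first. */
        uint64_t chunk = len;
        if (s->len - s_skip < chunk)
            chunk = s->len - s_skip;
        if (d->len - d_skip < chunk)
            chunk = d->len - d_skip;
        out[n].in_off = s->physical + s_skip;
        out[n].out_off = d->physical + d_skip;
        out[n].len = chunk;
        n++;
        src_off += chunk;
        dst_off += chunk;
        len -= chunk;
    }
    return n;
}
```

Each resulting region is exactly one call of the (in_offset, out_offset, count) form, and the array as a whole is the kind of input a vectored interface would take.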

Re: New copyfile system call - discuss before LSF?

On Thu, 2013-02-21 at 23:05 +0100, Ric Wheeler wrote:
> On 02/21/2013 09:00 PM, Paolo Bonzini wrote:
> > On 21/02/2013 15:57, Ric Wheeler wrote:
> >>> sendfile64() pretty much already has the right arguments for a
> >>> "copyfile", however it would be nice to add a 'flags' parameter: the
> >>> NFSv4.2 version would use that to specify whether or not to copy file
> >>> metadata.
> >> That would seem to be enough to me and has the advantage that it is a
> >> relatively obvious extension to something that is at least not totally
> >> unknown to developers.
> >>
> >> Do we need more than that for non-NFS paths I wonder? What does reflink
> >> need or the SCSI mechanism?
> > For virt we would like to be able to specify arbitrary block ranges.
> > Copying an entire file helps some copy operations like storage
> > migration.  However, it is not enough to convert the guest's offloaded
> > copies to host-side offloaded copies.
> >
> > Paolo
> 
> I don't think that the NFS protocol allows arbitrary ranges, but the SCSI
> commands are range based.
> 
> If I remember what the Windows people said at a SNIA event a few years back,
> they have a requirement that the target file be pre-allocated (at least for
> the SCSI based copy). Not clear to me where they iterate over that target
> file to do the block range copies, but I suspect it is in their kernel.

The NFSv4.2 copy offload protocol _does_ allow the copying of arbitrary
byte ranges. The main target for that functionality is indeed
virtualisation and thin provisioning of virtual machines.
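To make the byte-range versus block-range distinction concrete: a block-addressed offload can only name whole logical blocks, so a byte range of the kind NFSv4.2 allows would need its unaligned head and tail handled some other way. The helper and struct below are hypothetical, block_size is assumed to be non-zero, and the 512-byte block size used in the example is an assumption rather than anything taken from the SCSI or NFS specs.

```c
#include <stdint.h>

/* A byte range split around block boundaries: the aligned middle can be
 * named as whole logical blocks; head and tail cannot. */
struct block_split {
    uint64_t head_len;     /* bytes before the first block boundary */
    uint64_t body_lba;     /* first whole block */
    uint64_t body_blocks;  /* number of whole blocks */
    uint64_t tail_len;     /* bytes after the last block boundary */
};

struct block_split split_byte_range(uint64_t off, uint64_t len,
                                    uint64_t block_size)
{
    struct block_split s = { 0, 0, 0, 0 };
    uint64_t end = off + len;
    /* First boundary at or after off, last boundary at or before end. */
    uint64_t first = (off + block_size - 1) / block_size * block_size;
    uint64_t last = end / block_size * block_size;

    if (first >= last) {        /* no whole block inside the range */
        s.head_len = len;
        return s;
    }
    s.head_len = first - off;
    s.body_lba = first / block_size;
    s.body_blocks = (last - first) / block_size;
    s.tail_len = end - last;
    return s;
}
```

For example, with 512-byte blocks the range of 2000 bytes starting at byte 100 splits into a 412-byte head, blocks 1 to 3, and a 52-byte tail.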

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Ric Wheeler


On 02/21/2013 09:00 PM, Paolo Bonzini wrote:
> On 21/02/2013 15:57, Ric Wheeler wrote:
>>> sendfile64() pretty much already has the right arguments for a
>>> "copyfile", however it would be nice to add a 'flags' parameter: the
>>> NFSv4.2 version would use that to specify whether or not to copy file
>>> metadata.
>>
>> That would seem to be enough to me and has the advantage that it is a
>> relatively obvious extension to something that is at least not totally
>> unknown to developers.
>>
>> Do we need more than that for non-NFS paths I wonder? What does reflink
>> need or the SCSI mechanism?
>
> For virt we would like to be able to specify arbitrary block ranges.
> Copying an entire file helps some copy operations like storage
> migration.  However, it is not enough to convert the guest's offloaded
> copies to host-side offloaded copies.
>
> Paolo

I don't think that the NFS protocol allows arbitrary ranges, but the SCSI
commands are range based.

If I remember what the Windows people said at a SNIA event a few years back,
they have a requirement that the target file be pre-allocated (at least for the
SCSI based copy). Not clear to me where they iterate over that target file to do
the block range copies, but I suspect it is in their kernel.


Ric
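Taking the recollection above at face value (that the Windows SCSI-based copy requires a pre-allocated target), a host-side tool might reserve the whole destination up front and then fill it range by range, in whatever order the ranges complete. In the sketch below pread()/pwrite() stand in for the offloaded range copy and the ranges are deliberately issued last-first; copy_preallocated is hypothetical and is not a description of how Windows does it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve the whole destination first, then fill it one block range at a
 * time, last range first, to show that the ranges may land in any order
 * once the target is allocated. pread()/pwrite() stand in for an offloaded
 * range copy. Returns 0 on success, -1 on failure. */
int copy_preallocated(int in_fd, int out_fd, off_t size, size_t block_size)
{
    char buf[4096];

    if (size < 0 || block_size == 0 || block_size > sizeof(buf))
        return -1;
    /* posix_fallocate() returns an error number instead of setting errno;
     * a zero-length request is rejected, hence the size check. */
    if (size > 0 && posix_fallocate(out_fd, 0, size) != 0)
        return -1;

    off_t bs = (off_t)block_size;
    for (off_t off = (size - 1) / bs * bs; size > 0 && off >= 0; off -= bs) {
        size_t len = (size_t)(size - off < bs ? size - off : bs);
        if (pread(in_fd, buf, len, off) != (ssize_t)len ||
            pwrite(out_fd, buf, len, off) != (ssize_t)len)
            return -1;
    }
    return 0;
}
```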


Re: New copyfile system call - discuss before LSF?

On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> On 21/02/2013 15:57, Ric Wheeler wrote:
> >>>
> >> sendfile64() pretty much already has the right arguments for a
> >> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> NFSv4.2 version would use that to specify whether or not to copy file
> >> metadata.
> > 
> > That would seem to be enough to me and has the advantage that it is a
> > relatively obvious extension to something that is at least not totally
> > unknown to developers.
> > 
> > Do we need more than that for non-NFS paths I wonder? What does reflink
> > need or the SCSI mechanism?
> 
> For virt we would like to be able to specify arbitrary block ranges.
> Copying an entire file helps some copy operations like storage
> migration.  However, it is not enough to convert the guest's offloaded
> copies to host-side offloaded copies.

So how would a system call based on sendfile64() plus my flag parameter
prevent an underlying implementation from meeting your criterion?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Paolo Bonzini

On 21/02/2013 15:57, Ric Wheeler wrote:
>>>
>> sendfile64() pretty much already has the right arguments for a
>> "copyfile", however it would be nice to add a 'flags' parameter: the
>> NFSv4.2 version would use that to specify whether or not to copy file
>> metadata.
> 
> That would seem to be enough to me and has the advantage that it is a
> relatively obvious extension to something that is at least not totally
> unknown to developers.
> 
> Do we need more than that for non-NFS paths I wonder? What does reflink
> need or the SCSI mechanism?

For virt we would like to be able to specify arbitrary block ranges.
Copying an entire file helps some copy operations like storage
migration.  However, it is not enough to convert the guest's offloaded
copies to host-side offloaded copies.

Paolo

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Jeremy Allison

On Thu, Feb 21, 2013 at 01:51:53PM +, Myklebust, Trond wrote:
> On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
> > We have debated the need to have a system call to allow for offloading copy
> > operations, for example to an NFS server (part of the new NFS 4.2
> > specification), SCSI target device (two different SCSI commands do this),
> > local file systems (reflink, etc) and I suspect many other possible parts
> > of the stack could implement this.
> 
> sendfile64() pretty much already has the right arguments for a
> "copyfile", however it would be nice to add a 'flags' parameter: the
> NFSv4.2 version would use that to specify whether or not to copy file
> metadata.

What would be really nice is if sendfile allowed zero-copy
from network socket to a file descriptor. That would help
a *lot* of my small system OEMs (and no, splice() just doesn't
cut it :-).

Jeremy.

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Andreas Dilger

On 2013-02-21, at 7:57 AM, Ric Wheeler wrote:
> On 02/21/2013 02:51 PM, Myklebust, Trond wrote:
>> On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
>>> We have debated the need to have a system call to allow for offloading copy
>>> operations, for example to an NFS server (part of the new NFS 4.2
>>> specification), SCSI target device (two different SCSI commands do this),
>>> local file systems (reflink, etc) and I suspect many other possible parts
>>> of the stack could implement this.
>> sendfile64() pretty much already has the right arguments for a
>> "copyfile", however it would be nice to add a 'flags' parameter: the
>> NFSv4.2 version would use that to specify whether or not to copy file
>> metadata.
> 
> That would seem to be enough to me and has the advantage that it is a 
> relatively obvious extension to something that is at least not totally 
> unknown to developers.
> 
> Do we need more than that for non-NFS paths I wonder? What does reflink need 
> or the SCSI mechanism?

IMHO, the critical part about a copy syscall is avoiding the data
copy to/from userspace.  Copying file attributes opens up a huge
morass of issues related to which attrs/xattrs/ACLs are copied,
yet those don't cost nearly so much as the data copies.

We definitely want the API to be flexible enough to do server-side
copies (e.g. NFS and CIFS), but we also need to allow data copies
for regular files between different local and/or network filesystems
within the VFS.

Cheers, Andreas
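The priority stated above (keep the data out of userspace, and still work between arbitrary local and network filesystems) suggests a try-then-fall-back structure. The sketch below uses sendfile(2) as the in-kernel copy and drops to a read/write bounce buffer only when sendfile() reports EINVAL or ENOSYS; copy_data and bounce_copy are hypothetical helpers, and treating those two errno values as "unsupported for this pair of files" is an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace bounce copy: every byte crosses the user/kernel boundary
 * twice, which is the cost a copy syscall is meant to avoid. */
static ssize_t bounce_copy(int out_fd, int in_fd, off_t *offset, size_t count)
{
    char buf[65536];
    size_t done = 0;

    while (done < count) {
        size_t want = count - done < sizeof(buf) ? count - done : sizeof(buf);
        ssize_t got = pread(in_fd, buf, want, *offset);
        if (got <= 0)                       /* error or EOF */
            return got < 0 && done == 0 ? -1 : (ssize_t)done;
        ssize_t w = 0;
        while (w < got) {
            ssize_t n = write(out_fd, buf + w, (size_t)(got - w));
            if (n < 0)
                break;
            w += n;
        }
        *offset += w;
        done += (size_t)w;
        if (w < got)                        /* write failed part-way */
            return done > 0 ? (ssize_t)done : -1;
    }
    return (ssize_t)done;
}

/* Copy count bytes from in_fd at *offset to out_fd's current position.
 * sendfile(2) keeps the data inside the kernel; only if it reports that
 * this descriptor pair is unsupported do we fall back to the bounce copy.
 * *offset is advanced by the bytes copied. */
ssize_t copy_data(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                          /* EOF */
        if (errno == EINTR)
            continue;
        if (errno != EINVAL && errno != ENOSYS)
            return done > 0 ? (ssize_t)done : -1;
        ssize_t rest = bounce_copy(out_fd, in_fd, offset, count - done);
        if (rest < 0)
            return done > 0 ? (ssize_t)done : -1;
        done += (size_t)rest;
        break;
    }
    return (ssize_t)done;
}
```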

>>> The earliest discussion of such a system call I saw happened back in 2001, I
>>> know we had another more recent flurry (2-3 years back?) as well that got
>>> tangled up and died away.
>>>
>>> Given the new popularity of this in storage devices and the use case for
>>> virt guests, any chance to get a proposal floated this year that might be
>>> able to land upstream in our lifetimes :) ?
>> I'm planning on soon dusting off the NFS prototype that NetApp wrote 3
>> years ago and converting at least the client implementation into
>> something that can go upstream. We do also have a server prototype for
>> Linux, but the copy offload between 2 different servers is a hack and
>> would need significant work.
>> 
> 
> That would be really interesting, thanks!

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Ric Wheeler


On 02/21/2013 02:51 PM, Myklebust, Trond wrote:
> On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
>> We have debated the need to have a system call to allow for offloading copy
>> operations, for example to an NFS server (part of the new NFS 4.2
>> specification), SCSI target device (two different SCSI commands do this), local
>> file systems (reflink, etc) and I suspect many other possible parts of the stack
>> could implement this.
>
> sendfile64() pretty much already has the right arguments for a
> "copyfile", however it would be nice to add a 'flags' parameter: the
> NFSv4.2 version would use that to specify whether or not to copy file
> metadata.

That would seem to be enough to me and has the advantage that it is a
relatively obvious extension to something that is at least not totally unknown
to developers.

Do we need more than that for non-NFS paths I wonder? What does reflink need or
the SCSI mechanism?

>> The earliest discussion of such a system call I saw happened back in 2001, I
>> know we had another more recent flurry (2-3 years back?) as well that got
>> tangled up and died away.
>>
>> Given the new popularity of this in storage devices and the use case for virt
>> guests, any chance to get a proposal floated this year that might be able to
>> land upstream in our lifetimes :) ?
>
> I'm planning on soon dusting off the NFS prototype that NetApp wrote 3
> years ago and converting at least the client implementation into
> something that can go upstream. We do also have a server prototype for
> Linux, but the copy offload between 2 different servers is a hack and
> would need significant work.

That would be really interesting, thanks!



Re: New copyfile system call - discuss before LSF?

On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
> We have debated the need to have a system call to allow for offloading copy
> operations, for example to an NFS server (part of the new NFS 4.2
> specification), SCSI target device (two different SCSI commands do this),
> local file systems (reflink, etc) and I suspect many other possible parts of
> the stack could implement this.

sendfile64() pretty much already has the right arguments for a
"copyfile", however it would be nice to add a 'flags' parameter: the
NFSv4.2 version would use that to specify whether or not to copy file
metadata.
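For illustration, here is a minimal C sketch of copying a whole file with sendfile() as it exists today. It assumes Linux 2.6.33 or later, where sendfile()'s output descriptor may be a regular file rather than only a socket. The helper name copy_with_sendfile() and its flags argument are hypothetical. sendfile() itself takes no flags, which is exactly the gap the proposed 'flags' parameter would fill.

```c
/*
 * Minimal sketch: copy a regular file with sendfile(2).
 * Assumes Linux >= 2.6.33, where out_fd may be a regular file.
 * copy_with_sendfile() and its 'flags' argument are HYPOTHETICAL;
 * they only mark where a copyfile-style flags parameter would go.
 */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * A HYPOTHETICAL copyfile() along the lines suggested above might be:
 *   ssize_t copyfile(int out_fd, int in_fd, off_t *offset,
 *                    size_t count, unsigned int flags);
 * i.e. sendfile64()'s arguments plus flags (e.g. "copy metadata too").
 */

/* Copy all of src_path into dst_path. Returns 0 on success, -1 on error. */
static int copy_with_sendfile(const char *src_path, const char *dst_path,
                              unsigned int flags)
{
    (void)flags; /* sendfile() has no flags argument to pass this to */

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }

    /* Create the destination with the source's permission bits. */
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }

    off_t offset = 0; /* sendfile() advances this after each transfer */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out, in, &offset, (size_t)(st.st_size - offset));
        if (n < 0) {
            close(in);
            close(out);
            return -1;
        }
        if (n == 0) /* source hit EOF early (e.g. it was truncated) */
            break;
    }

    close(in);
    return close(out);
}
```

The loop is needed because a single sendfile() call may move fewer bytes than requested. On Linux, each call transfers at most 0x7ffff000 bytes. For that reason, the offset is advanced until the whole file has been copied.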

> The earliest discussion of such a system call I saw happened back in 2001, I 
> know we had another more recent flurry (2-3 years back?) as well that got 
> tangled up and died away.
>
> Given the new popularity of this in storage devices and the use case for virt 
> guests, any chance to get a proposal floated this year that might be able to 
> land upstream in our life times :) ?

I'm planning on soon dusting off the NFS prototype that NetApp wrote 3
years ago and converting at least the client implementation into
something that can go upstream. We do also have a server prototype for
Linux, but the copy offload between 2 different servers is a hack and
would need significant work.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Hannes Reinecke


On 02/21/2013 12:37 PM, Ric Wheeler wrote:
> We have debated the need to have a system call to allow for
> offloading copy operations, for example to an NFS server (part of
> the new NFS 4.2 specification), SCSI target device (two different
> SCSI commands do this), local file systems (reflink, etc) and I
> suspect many other possible parts of the stack could implement this.
>
> The earliest discussion of such a system call I saw happened back in
> 2001, I know we had another more recent flurry (2-3 years back?) as
> well that got tangled up and died away.

Yeah, I remember. I talked to Mkp about it, who (as usual :-) had a 
patchset stashed away for this.

Or a preliminary attempt, anyway.
However, this was waiting for the DISCARD merging patches to go in, 
which in turn were waiting for the WRITE SAME patches IIRC.


Or something.

Martin?


> Given the new popularity of this in storage devices and the use case
> for virt guests, any chance to get a proposal floated this year that
> might be able to land upstream in our life times :) ?


Oh, most definitely.
Now that I finally have an array capable of doing ROD token copy
we should be reevaluating things.

I'll see to it that the sg_xcopy program is updated to do ROD copy; then we
will have some real-world data.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: New copyfile system call - discuss before LSF?