Re: New copyfile system call - discuss before LSF?
On Sun, Mar 31, 2013 at 04:36:59AM +, Myklebust, Trond wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> > On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond
> > wrote:
> > > On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> > >> On 2013-03-30, at 16:21, Ric Wheeler wrote:
> > >>
> > >> > On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> > >> >> On Mar 30, 2013, at 5:45 PM, Pavel Machek
> > >> >> wrote:
> > >> >>
> > >> >>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> > >> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > >> > Hmm, really? AFAICT it would be simple to provide an
> > >> > open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >> > copy source file into it, then fsync(), then link it into
> > >> > filesystem.
> > >> >
> > >> > That should have atomicity properties reflected.
> > >> Actually, the open_deleted_file() syscall is quite useful for many
> > >> different things all by itself. Lots of applications need to create
> > >> temporary files that are unlinked at application failure (without a
> > >> race if app crashes after creating the file, but before unlinking).
> > >> It also avoids exposing temporary files into the namespace if other
> > >> applications are accessing the directory.
> > >> >>> Hmm. open_deleted_file() will still need to get a directory... so it
> > >> >>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > >> >>> be acceptable interface?
> > >> >>> Pavel
> > >> >> ...and what's the big plan to make this work on anything other than
> > >> >> ext4 and btrfs?
> > >> >>
> > >> >> Cheers,
> > >> >> Trond
> > >> >
> > >> > I know that change can be a good thing, but are we really solving a
> > >> > pressing problem given that application developers have dealt with
> > >> > open/rename as the way to get "atomic" file creation for several
> > >> > decades now ?
> > >>
> > >> Using open()+rename() has side effects:
> > >> - changes ctime/mtime on parent directory
> > >> - leaves temporary file in path during creation
> > >> - leaves temporary file in namespace during operations, and after crash
> > >
> > > So what is the actual problem that is being solved? Yes, the above may
> > > be disadvantages, but none of them have proven to be show-stoppers so
> > > far.
> > >
> > > So far, I've seen no justification for Andy's atomicity requirement
> > > other than "it would be nice if...". That's not enough IMO...
> >
> > ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> > files plus a way to relink deleted files gives atomic copies. Perhaps
> > this is less efficient than would be ideal for OCFS2, though.
>
> What real-life problem does the atomicity requirement solve?

I've occasionally wondered whether something like that would help an
nfs server implement atomic v4 open (which can acquire share locks and
set attributes): create an anonymous file, get the locks and set the
attributes, then link it in only once all that's succeeded.

I don't know if that actually works--among other problems, I'm not sure
how you'd implement O_CREAT and O_EXCL.

Probably it would make more sense just to add a new open system call
that does what we want. (If we decide we even care that much about
perfect atomicity for v4 open semantics that few clients actually use.)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
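Bruce's sequence (create an anonymous file, set it up, then link it in only once that has succeeded) can be approximated with O_TMPFILE, a flag Linux gained later (3.11) for unnamed files that can be linked in afterwards. What follows is a minimal sketch under that assumption. create_excl_prepared() is a hypothetical helper, a mode change stands in for NFSv4 attributes, and share locks are not modelled. On the O_CREAT/O_EXCL question, linkat() never overwrites an existing name and fails with EEXIST, which gives exclusive-create behaviour. A non-exclusive open that may replace a file would need something else.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: exclusive create where the file only becomes
 * visible after its attributes are set.
 *   1. open an unnamed inode in `dir` (O_TMPFILE, Linux 3.11+);
 *   2. set attributes while no other process can look the file up;
 *   3. name it `path` with linkat(), which fails with EEXIST if `path`
 *      already exists (O_CREAT|O_EXCL-like behaviour).
 * Returns an fd, or -1 with errno set (EEXIST; EOPNOTSUPP or EISDIR
 * when the filesystem or kernel lacks O_TMPFILE support).
 */
int create_excl_prepared(const char *dir, const char *path, mode_t mode)
{
#ifdef O_TMPFILE
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* linkat() needs a path for the unnamed inode; following the
     * /proc/self/fd magic link needs no extra privileges. */
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

    if (fchmod(fd, mode) == 0 &&
        linkat(AT_FDCWD, proc_path, AT_FDCWD, path, AT_SYMLINK_FOLLOW) == 0)
        return fd;

    int err = errno;
    close(fd);
    errno = err;
    return -1;
#else
    (void)dir;
    (void)path;
    (void)mode;
    errno = EOPNOTSUPP;
    return -1;
#endif
}
```

Until linkat() succeeds, no other process can find the file by name, so a failure at any earlier step leaves nothing behind in the directory.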
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
Hi!

> >>User wants to test for a file with name "foo.txt"
> >>
> >>* create "foo.txt~" (or whatever)
> >>* write contents into "foo.txt~"
> >>* rename "foo.txt~" to "foo.txt"
> >>
> >>Until rename is done, the file does not exist and is not complete.
> >>You will potentially have a garbage file to clean up if the program
> >>(or system) crashes, but that is not racy in a classic sense, right?
> >Well. If people rsync from you, they will start fetching incomplete
> >foo.txt~. Plus the garbage issue.
>
> That is not racy, just garbage (not trying to be pedantic, just
> trying to understand). I can see that the "~" file is annoying, but
> we have dealt with it for a *long* time :)

Ok, so let's keep it at "~" is annoying :-).

[But... I was wrong. openat(..., AT_UNLINKED) is not enough to solve
this: we do not have flink() and it is not easily possible to link a
deleted file "back to life" from /proc/self/fd:

pavel@amd:/tmp$ > delme
pavel@amd:/tmp$ bash 3< delme &
[2] 32667
[2]+ Stopped bash 3< delme
pavel@amd:/tmp$ fg
bash 3< delme
pavel@amd:/tmp$ ls -al delme
-rw-r--r-- 1 pavel pavel 0 Apr 1 01:36 delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme
pavel@amd:/tmp$ rm delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme (deleted)
pavel@amd:/tmp$ ln /proc/self/fd/3 delme2
ln: creating hard link `delme2' => `/proc/self/fd/3': Invalid cross-device link
]

> >>This is more of a garbage clean up issue?
> >Also. Plus sometimes you want temporary "file" that is
> >deleted. Terminals use it for history, etc...
>
> There you would have a race, you can create a file and unlink it of
> course and still write to it, but you would have a potential empty
> file issue?

Yes. openat(..., AT_UNLINKED) solves that -- you'll no longer get those
files.

(Not sure they'd always be empty. How do you ensure rm hits the disk?
fsync() on the parent directory? Sounds expensive.)

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
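The transcript above shows `ln` failing with EXDEV because it tries to hard-link the procfs symlink itself. Below is a minimal C sketch of the same experiment, assuming Linux with /proc mounted. Unlike `ln`, it passes AT_SYMLINK_FOLLOW to linkat(), so the magic link is resolved to the inode. As far as I know, Linux still refuses the link, typically with ENOENT, because the inode's link count is already zero. That supports the point that an unlinked file cannot easily be brought back to life. The function and file names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical experiment: create `path`, keep it open, unlink it, then
 * try to give the open (now nameless) inode the name `newpath` through
 * its /proc/self/fd/N magic link.
 * Returns 0 if the link was created, otherwise the errno of the call
 * that failed.
 */
int relink_deleted(const char *path, const char *newpath)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    if (unlink(path) < 0) {              /* link count drops to zero */
        int err = errno;
        close(fd);
        return err;
    }

    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

    /* Follow the magic link to the inode instead of linking the procfs
     * symlink itself (the latter is what produced EXDEV for ln). */
    int err = 0;
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, newpath, AT_SYMLINK_FOLLOW) < 0)
        err = errno;

    close(fd);
    return err;
}
```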
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On 03/31/2013 07:18 PM, Pavel Machek wrote:
> Hi!
>
>>>>>> Take a look at how many actively used filesystems out there that have
>>>>>> some variant of sillyrename(), and explain what you want to do in those
>>>>>> cases.
>>>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>>>> with silly files on them, and this will not be different.
>>>> So this would be a local POSIX filesystem only solution to a problem
>>>> that has yet to be formulated?
>>> Problem is "classical create temp file then delete it" is racy. See the
>>> archives. That is useful & common operation.
>> Which race are you concerned with exactly?
>>
>> User wants to test for a file with name "foo.txt"
>>
>> * create "foo.txt~" (or whatever)
>> * write contents into "foo.txt~"
>> * rename "foo.txt~" to "foo.txt"
>>
>> Until rename is done, the file does not exist and is not complete.
>> You will potentially have a garbage file to clean up if the program
>> (or system) crashes, but that is not racy in a classic sense, right?
> Well. If people rsync from you, they will start fetching incomplete
> foo.txt~. Plus the garbage issue.

That is not racy, just garbage (not trying to be pedantic, just trying to
understand). I can see that the "~" file is annoying, but we have dealt
with it for a *long* time :)

Until it has the right name (on either the source or target system for
rsync), it is not the file you are looking for.

>> This is more of a garbage clean up issue?
> Also. Plus sometimes you want temporary "file" that is
> deleted. Terminals use it for history, etc...

There you would have a race, you can create a file and unlink it of course
and still write to it, but you would have a potential empty file issue?

Ric
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
Hi!

> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.
> >>>Well. Yes, there are non-unix filesystems around. You have to deal
> >>>with silly files on them, and this will not be different.
> >>So this would be a local POSIX filesystem only solution to a problem
> >>that has yet to be formulated?
> >Problem is "classical create temp file then delete it" is racy. See the
> >archives. That is useful & common operation.
>
> Which race are you concerned with exactly?
>
> User wants to test for a file with name "foo.txt"
>
> * create "foo.txt~" (or whatever)
> * write contents into "foo.txt~"
> * rename "foo.txt~" to "foo.txt"
>
> Until rename is done, the file does not exist and is not complete.
> You will potentially have a garbage file to clean up if the program
> (or system) crashes, but that is not racy in a classic sense, right?

Well. If people rsync from you, they will start fetching incomplete
foo.txt~. Plus the garbage issue.

> This is more of a garbage clean up issue?

Also. Plus sometimes you want temporary "file" that is
deleted. Terminals use it for history, etc...

Pavel
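The "create temp file then delete it" pattern that Pavel calls racy (the one terminals use for scrollback history) looks roughly like this in C. It is a minimal sketch, and open_scratch_file() is a hypothetical name. The file has a visible name from mkstemp() until unlink(). If the process crashes in that window, the file is left behind, and other processes can see it while it exists. An AT_UNLINKED-style open would never create the name at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * The classic "create a temporary file, then delete it" pattern.
 * `tmpl` must end in "XXXXXX"; mkstemp() replaces those characters with
 * a unique suffix.  Between mkstemp() and unlink() the file has a
 * visible name, and a crash in that window leaves it behind.
 * Returns a read/write fd to a nameless file, or -1 with errno set.
 */
int open_scratch_file(char *tmpl)
{
    int fd = mkstemp(tmpl);              /* the name appears here ... */
    if (fd < 0)
        return -1;

    if (unlink(tmpl) < 0) {              /* ... and disappears here   */
        int err = errno;
        close(fd);
        errno = err;
        return -1;
    }
    return fd;   /* the data stays reachable through fd until close() */
}
```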
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On 03/31/2013 06:50 PM, Pavel Machek wrote:
> On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
>> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>>>>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>>>>>> be acceptable interface?
>>>>>> ...and what's the big plan to make this work on anything other than
>>>>>> ext4 and btrfs?
>>>>> Deleted but open files are from original unix, so it should work on
>>>>> anything unixy (minix, ext, ext2, ...).
>>>> minix, ext, ext2... are not under active development and haven't been
>>>> for more than a decade.
>>>>
>>>> Take a look at how many actively used filesystems out there that have
>>>> some variant of sillyrename(), and explain what you want to do in those
>>>> cases.
>>> Well. Yes, there are non-unix filesystems around. You have to deal
>>> with silly files on them, and this will not be different.
>> So this would be a local POSIX filesystem only solution to a problem
>> that has yet to be formulated?
> Problem is "classical create temp file then delete it" is racy. See the
> archives. That is useful & common operation.

Which race are you concerned with exactly?

User wants to test for a file with name "foo.txt"

* create "foo.txt~" (or whatever)
* write contents into "foo.txt~"
* rename "foo.txt~" to "foo.txt"

Until rename is done, the file does not exist and is not complete.
You will potentially have a garbage file to clean up if the program
(or system) crashes, but that is not racy in a classic sense, right?

This is more of a garbage clean up issue?

Regards,

Ric

> Problem is "atomically create file at target location with guaranteed
> right content". That's also in the archives. Looks useful if someone
> does rsync from your directory.
>
> Non-POSIX filesystems have problems handling deleted files, but that
> was always the case. That's one of the reasons they are seldom used
> for root filesystems.
> Pavel
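Ric's three steps map onto ordinary POSIX calls. What follows is a minimal sketch assuming a POSIX system. replace_file() and write_all() are hypothetical names, and the two fsync() calls are additions not spelled out in Ric's list. The first makes the data durable before the rename. The second fsyncs the parent directory, which is the common way to make a change to a directory entry reach the disk. Readers of the final name see either the old or the new contents. The "~" file remains visible (for example to rsync) while it is being written.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Replace dir/name with `data` using the open()+rename() idiom:
 * write "name~", fsync it, rename it over "name", then fsync the
 * directory so the rename itself is durable.  Returns 0 or -1.
 */
int replace_file(const char *dir, const char *name, const char *data)
{
    char tmp[4096], dst[4096];
    snprintf(tmp, sizeof tmp, "%s/%s~", dir, name);
    snprintf(dst, sizeof dst, "%s/%s", dir, name);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);                 /* the garbage Ric mentions */
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, dst) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Persist the directory entry change. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```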
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > > be acceptable interface?
> > > > >
> > > > > ...and what's the big plan to make this work on anything other than
> > > > > ext4 and btrfs?
> > > >
> > > > Deleted but open files are from original unix, so it should work on
> > > > anything unixy (minix, ext, ext2, ...).
> > >
> > > minix, ext, ext2... are not under active development and haven't been
> > > for more than a decade.
> > >
> > > Take a look at how many actively used filesystems out there that have
> > > some variant of sillyrename(), and explain what you want to do in those
> > > cases.
> >
> > Well. Yes, there are non-unix filesystems around. You have to deal
> > with silly files on them, and this will not be different.
>
> So this would be a local POSIX filesystem only solution to a problem
> that has yet to be formulated?

Problem is "classical create temp file then delete it" is racy. See the
archives. That is useful & common operation.

Problem is "atomically create file at target location with guaranteed
right content". That's also in the archives. Looks useful if someone
does rsync from your directory.

Non-POSIX filesystems have problems handling deleted files, but that
was always the case. That's one of the reasons they are seldom used
for root filesystems.

Pavel
Re: New copyfile system call - discuss before LSF?
Pavel Machek wrote:
> Eric Wong wrote:
> > [1] my splice() annoyances:
> > * need to create/manage a pipe
> > * copy size limited by pipe size
> > * doesn't reduce userspace syscalls (just data copy overhead)
> > * easy to misuse and starve with blocking sockets + big buffers
> > * not many users, so bugs creep in (v3.7.8 was the first usable
> >   version of the 3.7 series for TCP sockets)
>
> Could library be created to make it less annoying to use, and harder
> to misuse?

Maybe, but getting people to use the library would be hard, too.
And a library would not reduce syscalls in the common case.

We already have current->splice_pipe for sendfile, so maybe splice can
be taught to transparently use that when neither FD is a pipe.

I also think a SPLICE_F_DONTWAIT flag might be necessary. It would be
a superset of SPLICE_F_NONBLOCK, but also act like MSG_DONTWAIT for the
non-pipe socket.

> splice man page does not mention pipe size limit...

It probably should. I think I discovered it by using it many years
ago and burned it into my mind.
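Eric's first three complaints show up in a straightforward splice()-based file copy. What follows is a minimal sketch, assuming Linux with glibc and _GNU_SOURCE (needed for splice() and F_GETPIPE_SZ). splice_copy() is a hypothetical name, and the 64 KiB fallback size is an assumption. Neither descriptor is a pipe, so the function has to create one, move data in chunks no larger than the pipe's capacity, and make two system calls per chunk.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy up to `len` bytes from in_fd to out_fd with splice().  Data goes
 * file -> pipe -> file through a pipe we create ourselves, in chunks
 * capped by the pipe capacity.  Each chunk costs two system calls; only
 * the user-space data copy is avoided.
 * Returns the number of bytes copied (short on EOF) or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

#ifdef F_GETPIPE_SZ
    int cap = fcntl(p[0], F_GETPIPE_SZ);
#else
    int cap = -1;
#endif
    if (cap <= 0)
        cap = 65536;                      /* assumed fallback size */

    size_t copied = 0;
    int failed = 0;
    while (copied < len && !failed) {
        size_t want = len - copied;
        if (want > (size_t)cap)
            want = (size_t)cap;

        /* source file -> pipe */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, want, SPLICE_F_MOVE);
        if (in < 0 && errno == EINTR)
            continue;
        if (in < 0) {
            failed = 1;
            break;
        }
        if (in == 0)                      /* EOF on the source */
            break;

        /* pipe -> destination file; drain everything just queued */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                                 SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                failed = 1;
                break;
            }
            left -= out;
        }
        if (!failed)
            copied += (size_t)in;
    }

    int err = errno;
    close(p[0]);
    close(p[1]);
    errno = err;
    return failed ? -1 : (ssize_t)copied;
}
```

For comparison, sendfile(2) has accepted a regular file as its output since Linux 2.6.33 and keeps the pipe internal. That internal pipe is the current->splice_pipe Eric refers to.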
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > > be acceptable interface?
> > > >
> > > > ...and what's the big plan to make this work on anything other than
> > > > ext4 and btrfs?
> > >
> > > Deleted but open files are from original unix, so it should work on
> > > anything unixy (minix, ext, ext2, ...).
> >
> > minix, ext, ext2... are not under active development and haven't been
> > for more than a decade.
> >
> > Take a look at how many actively used filesystems out there that have
> > some variant of sillyrename(), and explain what you want to do in those
> > cases.
>
> Well. Yes, there are non-unix filesystems around. You have to deal
> with silly files on them, and this will not be different.

So this would be a local POSIX filesystem only solution to a problem
that has yet to be formulated?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
> > > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > > be acceptable interface?
> > >
> > > ...and what's the big plan to make this work on anything other than ext4
> > > and btrfs?
> >
> > Deleted but open files are from original unix, so it should work on
> > anything unixy (minix, ext, ext2, ...).
>
> minix, ext, ext2... are not under active development and haven't been
> for more than a decade.
>
> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.

Well. Yes, there are non-unix filesystems around. You have to deal
with silly files on them, and this will not be different.

Pavel
Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 09:36 +0200, Pavel Machek wrote:
> Hi!
>
> > >>> Hmm, really? AFAICT it would be simple to provide an
> > >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > >>> copy source file into it, then fsync(), then link it into filesystem.
> > >>>
> > >>> That should have atomicity properties reflected.
> > >>
> > >> Actually, the open_deleted_file() syscall is quite useful for many
> > >> different things all by itself. Lots of applications need to create
> > >> temporary files that are unlinked at application failure (without a
> > >> race if app crashes after creating the file, but before unlinking).
> > >> It also avoids exposing temporary files into the namespace if other
> > >> applications are accessing the directory.
> > >
> > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > be acceptable interface?
> >
> > ...and what's the big plan to make this work on anything other than ext4
> > and btrfs?
>
> Deleted but open files are from original unix, so it should work on
> anything unixy (minix, ext, ext2, ...).
> Pavel

minix, ext, ext2... are not under active development and haven't been
for more than a decade.

Take a look at how many actively used filesystems out there that have
some variant of sillyrename(), and explain what you want to do in those
cases.

--
Trond Myklebust
Re: New copyfile system call - discuss before LSF?
On 03/30/2013 08:08 PM, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>> Hmm, really? AFAICT it would be simple to provide an
>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>> copy source file into it, then fsync(), then link it into filesystem.
>>
>> That should have atomicity properties reflected.
>
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself. Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.
>
> We've added a library routine that does this for Lustre in a hackish
> way (magical filename created in target directory) for being able to
> migrate files between data servers, HSM, defragmentation, rsync, etc.
>
> Cheers, Andreas

This reminds me of the flink() discussion:
http://marc.info/?l=linux-kernel&m=104965452917349

Also kinda related is the exchangedata() OSX system call to
"atomically exchange data between two files"

thanks,
Pádraig.
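Linux later gained something related to exchangedata(): renameat2() with the RENAME_EXCHANGE flag (Linux 3.15), which atomically swaps two directory entries. The two are not identical. exchangedata() swaps the data of two files while each file keeps its identity, whereas RENAME_EXCHANGE swaps the names. What follows is a minimal sketch that calls the raw system call through syscall(), so it builds even without a libc renameat2() wrapper. swap_names() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>        /* glibc >= 2.28 defines RENAME_EXCHANGE here */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)   /* same value as in <linux/fs.h> */
#endif

/*
 * Hypothetical helper: atomically swap the directory entries `a` and
 * `b`, both of which must exist.  Uses renameat2(RENAME_EXCHANGE).
 * Returns 0, or -1 with errno set (e.g. EINVAL if the filesystem does
 * not support the flag, ENOSYS on kernels without renameat2).
 */
int swap_names(const char *a, const char *b)
{
#ifdef SYS_renameat2
    return (int)syscall(SYS_renameat2, AT_FDCWD, a, AT_FDCWD, b,
                        RENAME_EXCHANGE);
#else
    (void)a;
    (void)b;
    errno = ENOSYS;
    return -1;
#endif
}
```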
Re: New copyfile system call - discuss before LSF?
Hi!

On Sat 2013-03-30 22:38:35, AEDilger Gmail wrote:
> On 2013-03-30, at 14:45, Pavel Machek wrote:
> > On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> >> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>>
> >>> That should have atomicity properties reflected.
> >>
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself. Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> >
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
>
> Yes, that would be reasonable, and/or possibly openat(fd, NULL,
> AT_FDCWD|AT_UNLINKED)?

openat() is a better interface for this, I'd say.

BTW... I don't think this has to be done at the same time as splice()
[or how it ends up being called] changes...

Pavel
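The proposed sequence (open an unnamed file relative to a directory, copy the source into it, fsync(), then link it in) can be sketched with openat() and O_TMPFILE. O_TMPFILE, added later in Linux 3.11, stands in here for the hypothetical AT_UNLINKED. copy_then_link() and copy_fd() are hypothetical names. The sketch only creates and never replaces, because linkat() fails with EEXIST if the name is taken.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy everything from `in` to `out` with plain read()/write(). */
static int copy_fd(int in, int out)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;                        /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/*
 * Hypothetical helper: make `name` in directory `dirfd` appear only
 * with the complete contents of src_fd.
 *   1. open an unnamed file in the directory (O_TMPFILE);
 *   2. copy the data into it and fsync() it;
 *   3. link it into the namespace with linkat().
 * Returns 0, or -1 with errno set (EEXIST if `name` already exists).
 */
int copy_then_link(int src_fd, int dirfd, const char *name)
{
#ifdef O_TMPFILE
    int fd = openat(dirfd, ".", O_TMPFILE | O_RDWR, 0644);
    if (fd < 0)
        return -1;

    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

    if (copy_fd(src_fd, fd) == 0 && fsync(fd) == 0 &&
        linkat(AT_FDCWD, proc_path, dirfd, name, AT_SYMLINK_FOLLOW) == 0) {
        close(fd);
        return 0;
    }

    int err = errno;
    close(fd);
    errno = err;
    return -1;
#else
    (void)src_fd;
    (void)dirfd;
    (void)name;
    errno = EOPNOTSUPP;
    return -1;
#endif
}
```

If the process crashes before linkat(), the unnamed inode is simply freed, so nothing has to be cleaned up. Making the new name itself durable would additionally need an fsync() on the directory.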
Re: New copyfile system call - discuss before LSF?
Hi!

> >>> Hmm, really? AFAICT it would be simple to provide an
> >>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> >>> copy source file into it, then fsync(), then link it into filesystem.
> >>>
> >>> That should have atomicity properties reflected.
> >>
> >> Actually, the open_deleted_file() syscall is quite useful for many
> >> different things all by itself. Lots of applications need to create
> >> temporary files that are unlinked at application failure (without a
> >> race if app crashes after creating the file, but before unlinking).
> >> It also avoids exposing temporary files into the namespace if other
> >> applications are accessing the directory.
> >
> > Hmm. open_deleted_file() will still need to get a directory... so it
> > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > be acceptable interface?
>
> ...and what's the big plan to make this work on anything other than ext4 and
> btrfs?

Deleted but open files are from original unix, so it should work on
anything unixy (minix, ext, ext2, ...).

Pavel
Re: New copyfile system call - discuss before LSF?
Hi! Hmm, really? AFAICT it would be simple to provide an open_deleted_file(directory) syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem. That should have atomicity properties reflected. Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory. Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would be acceptable interface? ...and what's the big plan to make this work on anything other than ext4 and btrfs? Deleted but open files are from original unix, so it should work on anything unixy (minix, ext, ext2, ...). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: New copyfile system call - discuss before LSF?
Hi! On Sat 2013-03-30 22:38:35, AEDilger Gmail wrote: On 2013-03-30, at 14:45, Pavel Machek pa...@ucw.cz wrote: On Sat 2013-03-30 13:08:39, Andreas Dilger wrote: On 2013-03-30, at 12:49 PM, Pavel Machek wrote: Hmm, really? AFAICT it would be simple to provide an open_deleted_file(directory) syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem. That should have atomicity properties reflected. Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory. Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open(/foo/bar/mnt, O_DELETED) would be acceptable interface? Yes, that would be reasonable, and/or possibly openat(fd, NULL, AT_FDCWD|AT_UNLINKED)? openat() is better interface for this, I'd say. BTW... I don't think this has to be done at the same time as splice() [or how it ends up being called] changes... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: New copyfile system call - discuss before LSF?
On 03/30/2013 08:08 PM, Andreas Dilger wrote: On 2013-03-30, at 12:49 PM, Pavel Machek wrote: Hmm, really? AFAICT it would be simple to provide an open_deleted_file(directory) syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem. That should have atomicity properties reflected. Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory. We've added a library routine that does this for Lustre in a hackish way (magical filename created in target directory) for being able to migrate files between data servers, HSM, defragmentation, rsync, etc. Cheers, Andreas This reminds me of the flink() discussion: http://marc.info/?l=linux-kernelm=104965452917349 Also kinda related is the exchangedata() OSX system call to atomically exchange data between two files thanks, Pádraig. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 09:36 +0200, Pavel Machek wrote:
> Hi!
>
> > > [...]
> > > Hmm. open_deleted_file() will still need to get a directory... so it
> > > will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> > > be acceptable interface?
> >
> > ...and what's the big plan to make this work on anything other than
> > ext4 and btrfs?
>
> Deleted but open files are from original unix, so it should work on
> anything unixy (minix, ext, ext2, ...).
> Pavel

minix, ext, ext2... are not under active development and haven't been for more than a decade.

Take a look at how many actively used filesystems out there that have some variant of sillyrename(), and explain what you want to do in those cases.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
> > > [...]
> > > ...and what's the big plan to make this work on anything other than
> > > ext4 and btrfs?
> >
> > Deleted but open files are from original unix, so it should work on
> > anything unixy (minix, ext, ext2, ...).
>
> minix, ext, ext2... are not under active development and haven't been
> for more than a decade.
>
> Take a look at how many actively used filesystems out there that have
> some variant of sillyrename(), and explain what you want to do in those
> cases.

Well. Yes, there are non-unix filesystems around. You have to deal with silly files on them, and this will not be different.

Pavel
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > [...]
> > Take a look at how many actively used filesystems out there that have
> > some variant of sillyrename(), and explain what you want to do in those
> > cases.
>
> Well. Yes, there are non-unix filesystems around. You have to deal with
> silly files on them, and this will not be different.

So this would be a local POSIX filesystem only solution to a problem that has yet to be formulated?

-- 
Trond Myklebust
Linux NFS client maintainer
Re: New copyfile system call - discuss before LSF?
Pavel Machek pa...@ucw.cz wrote:
> Eric Wong wrote:
> > [1] my splice() annoyances:
> >
> > * need to create/manage a pipe
> > * copy size limited by pipe size
> > * doesn't reduce userspace syscalls (just data copy overhead)
> > * easy to misuse and starve with blocking sockets + big buffers
> > * not many users, so bugs creep in (v3.7.8 was the first usable
> >   version of the 3.7 series for TCP sockets)
>
> Could a library be created to make it less annoying to use, and harder
> to misuse?

Maybe, but getting people to use the library would be hard, too. And a library would not reduce syscalls in the common case.

We already have current->splice_pipe for sendfile, so maybe splice can be taught to transparently use that when neither FD is a pipe.

I also think a SPLICE_F_DONTWAIT flag might be necessary. It would be a superset of SPLICE_F_NONBLOCK, but also act like MSG_DONTWAIT for the non-pipe socket.

> splice man page does not mention pipe size limit...

It probably should. I think I discovered it by using it many years ago and burned it into my mind.
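To make the first two annoyances concrete, here is a minimal sketch, assuming Linux and glibc, of copying one regular file to another with splice(2). Because neither descriptor is a pipe, the caller has to create and manage its own pipe and move the data in chunks no larger than the pipe's capacity, queried here with F_GETPIPE_SZ. The helper name splice_copy and the 64 KiB fallback are illustrative assumptions, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd (from its current offset)
 * to out_fd with splice(2). Neither fd is a pipe, so splice() cannot connect
 * them directly; an intermediate pipe is required. Returns 0 or -1. */
static int splice_copy(int in_fd, int out_fd)
{
    int pfd[2];
    if (pipe(pfd) < 0)
        return -1;

    /* One splice() into the pipe moves at most the pipe's capacity. */
    int chunk = fcntl(pfd[1], F_GETPIPE_SZ);
    if (chunk <= 0)
        chunk = 65536;        /* assumed fallback: the usual default size */

    int ret = 0;
    for (;;) {
        ssize_t n = splice(in_fd, NULL, pfd[1], NULL, (size_t)chunk, SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            ret = -1;
            break;
        }
        if (n == 0)
            break;            /* end of input */

        /* Drain what was just queued in the pipe into the destination. */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                ret = -1;
                goto out;
            }
            n -= m;
        }
    }
out:
    close(pfd[0]);
    close(pfd[1]);
    return ret;
}
```

Every chunk costs two splice() calls plus the pipe setup, which is the syscall overhead Eric describes; sendfile(2) hides the pipe inside the kernel instead.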
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
> On Sun, 2013-03-31 at 20:32 +0200, Pavel Machek wrote:
> > [...]
> > Well. Yes, there are non-unix filesystems around. You have to deal with
> > silly files on them, and this will not be different.
>
> So this would be a local POSIX filesystem only solution to a problem
> that has yet to be formulated?

The problem is that the classical "create temp file, then delete it" is racy. See the archives. That is a useful, common operation.

The problem is "atomically create a file at the target location with guaranteed right content". That's also in the archives. Looks useful if someone does rsync from your directory.

Non-POSIX filesystems have problems handling deleted files, but that was always the case. That's one of the reasons they are seldom used for root filesystems.

Pavel
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On 03/31/2013 06:50 PM, Pavel Machek wrote:
> On Sun 2013-03-31 18:44:53, Myklebust, Trond wrote:
>> [...]
>> So this would be a local POSIX filesystem only solution to a problem
>> that has yet to be formulated?
>
> The problem is that the classical "create temp file, then delete it"
> is racy. See the archives. That is a useful, common operation.

Which race are you concerned with exactly?

User wants to test for a file with name foo.txt

* create foo.txt~ (or whatever)
* write contents into foo.txt~
* rename foo.txt~ to foo.txt

Until the rename is done, the file does not exist and is not complete. You will potentially have a garbage file to clean up if the program (or system) crashes, but that is not racy in a classic sense, right?

This is more of a garbage clean up issue?

Regards,

Ric
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
Hi!

> > [...]
> > The problem is that the classical "create temp file, then delete it"
> > is racy. See the archives. That is a useful, common operation.
>
> Which race are you concerned with exactly?
>
> User wants to test for a file with name foo.txt
>
> * create foo.txt~ (or whatever)
> * write contents into foo.txt~
> * rename foo.txt~ to foo.txt
>
> Until the rename is done, the file does not exist and is not complete.
> You will potentially have a garbage file to clean up if the program
> (or system) crashes, but that is not racy in a classic sense, right?

Well. If people rsync from you, they will start fetching incomplete foo.txt~. Plus the garbage issue.

> This is more of a garbage clean up issue?

Also. Plus sometimes you want a temporary file that is deleted. Terminals use it for history, etc...

Pavel
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
On 03/31/2013 07:18 PM, Pavel Machek wrote:
> Hi!
>
> > > [...]
> > > The problem is that the classical "create temp file, then delete it"
> > > is racy. See the archives. That is a useful, common operation.
> >
> > Which race are you concerned with exactly?
> >
> > User wants to test for a file with name foo.txt
> >
> > * create foo.txt~ (or whatever)
> > * write contents into foo.txt~
> > * rename foo.txt~ to foo.txt
> >
> > Until the rename is done, the file does not exist and is not complete.
> > You will potentially have a garbage file to clean up if the program
> > (or system) crashes, but that is not racy in a classic sense, right?
>
> Well. If people rsync from you, they will start fetching incomplete
> foo.txt~. Plus the garbage issue.

That is not racy, just garbage (not trying to be pedantic, just trying to understand). I can see that the ~ file is annoying, but we have dealt with it for a *long* time :)

Until it has the right name (on either the source or target system for rsync), it is not the file you are looking for.

> > This is more of a garbage clean up issue?
>
> Also. Plus sometimes you want a temporary file that is deleted.
> Terminals use it for history, etc...

There you would have a race: you can create a file and unlink it, of course, and still write to it, but you would have a potential empty file issue?

Ric
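The traditional way to get the deleted temporary file Pavel mentions is to create a uniquely named file and unlink it straight away, as in the sketch below (anon_tmpfile is a hypothetical helper, assuming a POSIX system). The comment marks the window Ric describes: if the process dies between mkstemp() and unlink(), an empty file is left behind in the directory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file in dir and remove its name
 * at once. The data stays reachable through the returned fd and is freed
 * when the last descriptor is closed. Returns the fd, or -1 on error. */
static int anon_tmpfile(const char *dir)
{
    char path[4096];
    if (snprintf(path, sizeof path, "%s/tmpXXXXXX", dir) >= (int)sizeof path)
        return -1;

    int fd = mkstemp(path);   /* the name is now visible in dir */
    if (fd < 0)
        return -1;

    /* Race window: a crash here leaves an empty tmpXXXXXX-style file. */
    if (unlink(path) < 0) {
        close(fd);
        return -1;
    }
    return fd;                /* link count is now 0, data still usable */
}
```

Closing this window is what the proposed open-a-deleted-file interface is meant to do: the file would never have a name in the first place.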
Re: openat(..., AT_UNLINKED) was Re: New copyfile system call - discuss before LSF?
Hi!

> > > User wants to test for a file with name foo.txt
> > > [...]
> > > Until the rename is done, the file does not exist and is not complete.
> > > You will potentially have a garbage file to clean up if the program
> > > (or system) crashes, but that is not racy in a classic sense, right?
> >
> > Well. If people rsync from you, they will start fetching incomplete
> > foo.txt~. Plus the garbage issue.
>
> That is not racy, just garbage (not trying to be pedantic, just trying
> to understand). I can see that the ~ file is annoying, but we have dealt
> with it for a *long* time :)

Ok, so let's keep it at "~ is annoying" :-).

[But... I was wrong. openat(..., AT_UNLINKED) is not enough to solve this: we do not have flink(), and it is not easily possible to link a deleted file back to life from /proc/self/fd:

pavel@amd:/tmp$ > delme
pavel@amd:/tmp$ bash 3< delme &
[2] 32667
[2]+ Stopped bash 3< delme
pavel@amd:/tmp$ fg
bash 3< delme
pavel@amd:/tmp$ ls -al delme
-rw-r--r-- 1 pavel pavel 0 Apr 1 01:36 delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme
pavel@amd:/tmp$ rm delme
pavel@amd:/tmp$ ls -al /proc/self/fd/3
lr-x------ 1 pavel pavel 64 Apr 1 01:37 /proc/self/fd/3 -> /tmp/delme (deleted)
pavel@amd:/tmp$ ln /proc/self/fd/3 delme2
ln: creating hard link `delme2' => `/proc/self/fd/3': Invalid cross-device link
]

> > > This is more of a garbage clean up issue?
> >
> > Also. Plus sometimes you want a temporary file that is deleted.
> > Terminals use it for history, etc...
>
> There you would have a race: you can create a file and unlink it, of
> course, and still write to it, but you would have a potential empty
> file issue?

Yes. openat(..., AT_UNLINKED) solves that -- you'll no longer get those files. (Not sure they'd always be empty. How do you ensure the rm hits the disk? fsync() on the parent directory? Sounds expensive.)

Pavel
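Pavel's shell transcript can be reproduced from C. The sketch below (relink_deleted is a hypothetical helper, and it assumes Linux with /proc mounted) calls linkat(2) on the /proc/self/fd entry of an unlinked file. Without AT_SYMLINK_FOLLOW, linkat() tries to link the procfs symlink itself, which lives on another mount, so it should fail with EXDEV just as ln did; with AT_SYMLINK_FOLLOW it reaches the real inode, but current Linux kernels refuse to link an inode whose link count has dropped to zero, and the call should fail with ENOENT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: try to give the open-but-unlinked file behind fd a
 * new name through /proc/self/fd. Returns 0 if linkat() succeeded,
 * otherwise the errno it reported. */
static int relink_deleted(int fd, const char *newpath, int flags)
{
    char proc[64];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, proc, AT_FDCWD, newpath, flags) == 0)
        return 0;
    return errno;
}
```

This is the gap Andy's flink() wish would fill: there is no plain way to turn a descriptor for a nameless file back into a directory entry.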
Re: New copyfile system call - discuss before LSF?
On 2013-03-30, at 14:45, Pavel Machek wrote:
> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> [...]
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself. [...]
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be acceptable interface?

Yes, that would be reasonable, and/or possibly openat(fd, NULL, AT_FDCWD|AT_UNLINKED)?

Cheers, Andreas
Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 00:36 -0400, Trond Myklebust wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> > On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond wrote:
> > > [...]
> > > So far, I've seen no justification for Andy's atomicity requirement
> > > other than "it would be nice if...". That's not enough IMO...
> >
> > ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> > files plus a way to relink deleted files gives atomic copies. Perhaps
> > this is less efficient than would be ideal for OCFS2, though.
>
> What real-life problem does the atomicity requirement solve? None of our
> customers have ever asked for it. They don't care...

BTW: before you do answer, please note that the current NFSv4.2 solution _does_ allow you to lock the file before you copy.

IOW: the same atomicity rules apply to offloaded copy as apply to standard copy: there is no requirement anywhere to apply stronger semantics. Surprisingly enough, that works for most people...

-- 
Trond Myklebust
Linux NFS client maintainer
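For a local file, the "lock, then copy" approach Trond mentions looks roughly like the sketch below: take a whole-file POSIX read lock on the source with fcntl(F_SETLKW), copy with plain read()/write(), then unlock. This is only an illustration of ordinary lock-and-copy semantics, not the NFSv4.2 copy offload itself, and locked_copy is a hypothetical name. POSIX record locks are advisory, so only writers that also take fcntl() locks are held off.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: hold a whole-file read lock on src while copying it
 * to dst. src must be open for reading. Returns 0 or -1. */
static int locked_copy(int src, int dst)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_RDLCK;      /* shared: others cannot take a write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */

    while (fcntl(src, F_SETLKW, &fl) < 0)
        if (errno != EINTR)
            return -1;

    char buf[65536];
    int ret = 0;
    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            ret = -1;
            break;
        }
        if (n == 0)
            break;            /* end of file */
        for (ssize_t off = 0; off < n; ) {
            ssize_t m = write(dst, buf + off, (size_t)(n - off));
            if (m < 0 && errno == EINTR)
                continue;
            if (m < 0) {
                ret = -1;
                goto unlock;
            }
            off += m;
        }
    }
unlock:
    fl.l_type = F_UNLCK;
    fcntl(src, F_SETLK, &fl);
    return ret;
}
```

The destination is still visible while it is being filled, so this gives consistency of the source, not the "nobody sees a partial destination" property Andy asked for.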
Re: New copyfile system call - discuss before LSF?
On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond wrote:
> > [...]
> > So what is the actual problem that is being solved? Yes, the above may
> > be disadvantages, but none of them have proven to be show-stoppers so
> > far.
> >
> > So far, I've seen no justification for Andy's atomicity requirement
> > other than "it would be nice if...". That's not enough IMO...
>
> ISTM vpsendfile (or whatever it's called) plus a way to create deleted
> files plus a way to relink deleted files gives atomic copies. Perhaps
> this is less efficient than would be ideal for OCFS2, though.

What real-life problem does the atomicity requirement solve? None of our customers have ever asked for it. They don't care...

-- 
Trond Myklebust
Linux NFS client maintainer
Re: New copyfile system call - discuss before LSF?
On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond wrote:
> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>> [...]
>> Using open()+rename() has side effects:
>> - changes ctime/mtime on parent directory
>> - leaves temporary file in path during creation
>> - leaves temporary file in namespace during operations, and after crash
>
> So what is the actual problem that is being solved? Yes, the above may
> be disadvantages, but none of them have proven to be show-stoppers so
> far.
>
> So far, I've seen no justification for Andy's atomicity requirement
> other than "it would be nice if...". That's not enough IMO...

ISTM vpsendfile (or whatever it's called) plus a way to create deleted files plus a way to relink deleted files gives atomic copies. Perhaps this is less efficient than would be ideal for OCFS2, though.

--Andy
Re: New copyfile system call - discuss before LSF?
On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> On 2013-03-30, at 16:21, Ric Wheeler wrote:
> > [...]
> > I know that change can be a good thing, but are we really solving a
> > pressing problem given that application developers have dealt with
> > open/rename as the way to get "atomic" file creation for several
> > decades now?
>
> Using open()+rename() has side effects:
> - changes ctime/mtime on parent directory
> - leaves temporary file in path during creation
> - leaves temporary file in namespace during operations, and after crash

So what is the actual problem that is being solved? Yes, the above may be disadvantages, but none of them have proven to be show-stoppers so far.

So far, I've seen no justification for Andy's atomicity requirement other than "it would be nice if...". That's not enough IMO...

-- 
Trond Myklebust
Linux NFS client maintainer
Re: New copyfile system call - discuss before LSF?
On 2013-03-30, at 16:21, Ric Wheeler wrote:
> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> On Mar 30, 2013, at 5:45 PM, Pavel Machek wrote:
>>> [...]
>>> Hmm. open_deleted_file() will still need to get a directory... so it
>>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>>> be acceptable interface?
>>> Pavel
>>
>> ...and what's the big plan to make this work on anything other than
>> ext4 and btrfs?
>>
>> Cheers,
>> Trond
>
> I know that change can be a good thing, but are we really solving a
> pressing problem given that application developers have dealt with
> open/rename as the way to get "atomic" file creation for several
> decades now?

Using open()+rename() has side effects:
- changes ctime/mtime on parent directory
- leaves temporary file in path during creation
- leaves temporary file in namespace during operations, and after crash

Cheers, Andreas
Re: New copyfile system call - discuss before LSF?
On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
> On Mar 30, 2013, at 5:45 PM, Pavel Machek wrote:
>> [...]
>> Hmm. open_deleted_file() will still need to get a directory... so it
>> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
>> be acceptable interface?
>> Pavel
>
> ...and what's the big plan to make this work on anything other than
> ext4 and btrfs?
>
> Cheers,
> Trond

I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?

Regards,

Ric
Re: New copyfile system call - discuss before LSF?
On Sat, Mar 30, 2013 at 12:49 PM, Pavel Machek wrote:
> Hi!
>
>> > I thought the first thing people would ask for is to atomically create a
>> > new file and copy the old file into it (at least on local file systems).
>> > The idea is that nothing should see an empty destination file, either
>> > by race or by crash. (This feature would perhaps be described as a
>> > pony, but it should be implementable.)
>>
>> Having already wasted many weeks trying to implement your pony, I would
>> consider it about as possible as winning the lottery three times in a
>> row. It clearly is in theory and yet,...
>
> Hmm, really? AFAICT it would be simple to provide
> open_deleted_file("directory")
> syscall. You'd open_deleted_file(), copy source file into it, then
> fsync(), then link it into filesystem.

Isn't linking a deleted file back into the filesystem explicitly forbidden? I'm pretty sure that linking from /proc/fd/whatever doesn't work.

(I've often wanted a flink system call that takes a file descriptor and links it somewhere. If it came with an option to control whether it would overwrite an existing file, even better.)

--Andy
Re: New copyfile system call - discuss before LSF?
On Mar 30, 2013, at 5:45 PM, Pavel Machek wrote:

> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an
>>> open_deleted_file("directory") syscall. You'd open_deleted_file(),
>>> copy source file into it, then fsync(), then link it into filesystem.
>>>
>>> That should have atomicity properties reflected.
>>
>> Actually, the open_deleted_file() syscall is quite useful for many
>> different things all by itself. Lots of applications need to create
>> temporary files that are unlinked at application failure (without a
>> race if app crashes after creating the file, but before unlinking).
>> It also avoids exposing temporary files into the namespace if other
>> applications are accessing the directory.
>
> Hmm. open_deleted_file() will still need to get a directory... so it
> will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
> be an acceptable interface?
> Pavel

...and what's the big plan to make this work on anything other than ext4 and btrfs?

Cheers,
  Trond
Re: New copyfile system call - discuss before LSF?
On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> > Hmm, really? AFAICT it would be simple to provide an
> > open_deleted_file("directory") syscall. You'd open_deleted_file(),
> > copy source file into it, then fsync(), then link it into filesystem.
> >
> > That should have atomicity properties reflected.
>
> Actually, the open_deleted_file() syscall is quite useful for many
> different things all by itself. Lots of applications need to create
> temporary files that are unlinked at application failure (without a
> race if app crashes after creating the file, but before unlinking).
> It also avoids exposing temporary files into the namespace if other
> applications are accessing the directory.

Hmm. open_deleted_file() will still need to get a directory... so it
will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would
be an acceptable interface?
                                                                Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
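Pavel's open("/foo/bar/mnt", O_DELETED) is close to what Linux 3.11 later merged as the O_TMPFILE flag: opening a directory with O_TMPFILE creates an unnamed regular file on that directory's filesystem. A minimal sketch, assuming Linux with glibc; `open_anonymous` is a hypothetical wrapper. Filesystems without support fail with EOPNOTSUPP, and kernels that predate the flag report EISDIR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper: open an unnamed regular file on the filesystem that
 * holds `dir`. The file never appears in the directory, and its storage is
 * released when the last descriptor is closed, including when the process
 * crashes. Returns an fd, or -1 with errno set (EOPNOTSUPP if the filesystem
 * lacks O_TMPFILE support, EISDIR on kernels that predate the flag). */
static int open_anonymous(const char *dir)
{
    return open(dir, O_TMPFILE | O_RDWR, 0600);
}
```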
Re: New copyfile system call - discuss before LSF?
On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
> Hmm, really? AFAICT it would be simple to provide an
> open_deleted_file("directory") syscall. You'd open_deleted_file(),
> copy source file into it, then fsync(), then link it into filesystem.
>
> That should have atomicity properties reflected.

Actually, the open_deleted_file() syscall is quite useful for many
different things all by itself. Lots of applications need to create
temporary files that are unlinked at application failure (without a
race if app crashes after creating the file, but before unlinking).
It also avoids exposing temporary files into the namespace if other
applications are accessing the directory.

We've added a library routine that does this for Lustre in a hackish
way (magical filename created in target directory) for being able to
migrate files between data servers, HSM, defragmentation, rsync, etc.

Cheers, Andreas
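The race Andreas mentions is built into the traditional create-then-unlink idiom, sketched here under POSIX assumptions (`make_unlinked_tmp` is a hypothetical helper): between mkstemp() and unlink() the file has a visible name, so a crash in that window leaves it behind, and other processes scanning the directory can see it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper showing the classic create-then-unlink idiom.
 * Returns an fd to a file with no name, or -1 with errno set. */
static int make_unlinked_tmp(const char *dir)
{
    char path[4096];
    int fd, saved;

    if (snprintf(path, sizeof path, "%s/.tmpXXXXXX", dir) >= (int)sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = mkstemp(path);        /* the file now has a visible name */
    if (fd < 0)
        return -1;
    /* Race window: a crash here leaves `path` behind in `dir`, and other
     * processes listing `dir` can see (and open) it. */
    if (unlink(path) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;                 /* storage is freed on the last close() */
}
```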
Re: New copyfile system call - discuss before LSF?
Hi!

> > I thought the first thing people would ask for is to atomically create a
> > new file and copy the old file into it (at least on local file systems).
> > The idea is that nothing should see an empty destination file, either
> > by race or by crash. (This feature would perhaps be described as a
> > pony, but it should be implementable.)
>
> Having already wasted many weeks trying to implement your pony, I would
> consider it about as possible as winning the lottery three times in a
> row. It clearly is in theory and yet,...

Hmm, really? AFAICT it would be simple to provide
open_deleted_file("directory") syscall. You'd open_deleted_file(), copy
source file into it, then fsync(), then link it into filesystem.

That should have atomicity properties reflected.
                                                                Pavel
(who has too many (*) ponies around)
(*) 1 is sometimes too many when we talk about big mammals.
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
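On kernels with O_TMPFILE (Linux 3.11 and later), Pavel's four steps map directly onto existing calls: create the unnamed file in the destination directory, copy the data, fsync(), then name it with linkat() through /proc/self/fd. This sketch assumes such a kernel, a filesystem that supports O_TMPFILE, and a mounted /proc; `copy_then_link` is a hypothetical helper. linkat() never overwrites, so the call fails with EEXIST if the destination name is taken, giving create rather than replace semantics.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy `src` to `dst_dir/dst_name` so that the
 * destination name only ever refers to a complete, fsync()ed file.
 * Returns 0, or -1 with errno set. */
static int copy_then_link(const char *src, const char *dst_dir,
                          const char *dst_name)
{
    char buf[65536], proc[64], dst[4096];
    int in = -1, out = -1, saved;

    if (snprintf(dst, sizeof dst, "%s/%s", dst_dir, dst_name) >= (int)sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    /* Step 1: unnamed file on the destination filesystem. */
    if ((out = open(dst_dir, O_TMPFILE | O_WRONLY, 0644)) < 0)
        goto fail;

    /* Step 2: copy the data; nobody can see the file yet. */
    for (;;) {
        ssize_t r = read(in, buf, sizeof buf);
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0)
            goto fail;
        if (r == 0)
            break;
        for (ssize_t off = 0; off < r;) {
            ssize_t w = write(out, buf + off, (size_t)(r - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                goto fail;
            off += w;
        }
    }
    /* Step 3: make the contents durable before they get a name. */
    if (fsync(out) < 0)
        goto fail;
    /* Step 4: link the finished file in; fails with EEXIST if taken. */
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", out);
    if (linkat(AT_FDCWD, proc, AT_FDCWD, dst, AT_SYMLINK_FOLLOW) < 0)
        goto fail;

    close(in);
    close(out);
    return 0;

fail:
    saved = errno;
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);   /* an unnamed O_TMPFILE file simply vanishes */
    errno = saved;
    return -1;
}
```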
Re: New copyfile system call - discuss before LSF?
Hi!

> > > If I'm guessing correctly, sendfile64()+flags would be annoying because it's
> > > missing an out_fd_offset. The host will want to offload the guest's copies by
> > > calling sendfile on block ranges of a guest disk image file that correspond to
> > > the mappings of the in and out files in the guest.
> > >
> > > You could make it work with some locking and out_fd seeking to set the
> > > write offset before calling sendfile64()+flags, but ugh.
> > >
> > > ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> > > out_offset, size_t count, int flags);
> > >
> > > That seems closer.
> >
> > psendfile() ?
> >
> > I fully agree that sounds reasonable... Just being an ass. :-)
>
> splice() already has offset for both fds and a flags arg:
>
>    ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
>                   loff_t *off_out, size_t len, unsigned int flags);
>
> The current downside is it requires one fd to be a pipe, so it's
> just not very easy to use from my perspective[1].

...

> [1] my splice() annoyances:
> * need to create/manage a pipe
> * copy size limited by pipe size
> * doesn't reduce userspace syscalls (just data copy overhead)
> * easy to misuse and starve with blocking sockets + big buffers
> * not many users, so bugs creep in (v3.7.8 was the first usable
>   version of the 3.7 series for TCP sockets)

Could a library be created to make it less annoying to use, and harder
to misuse?

The splice man page does not mention the pipe size limit...
                                                                Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
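As one possible answer to Pavel's question, a small wrapper can hide the pipe and loop past the pipe-size limit. The sketch below is hypothetical (`psendfile_splice` is not an existing API): it takes in and out offsets like the psendfile() idea quoted above, moves data file to pipe to file with splice(2), and leaves both file offsets untouched. It assumes Linux with glibc and does nothing about the syscall-count complaint, since each chunk still costs two splice() calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper with psendfile()-style arguments: copy `count` bytes
 * from in_fd at in_off to out_fd at out_off via splice(2). Neither file's
 * own offset moves. Returns bytes copied (short only at EOF) or -1. */
static ssize_t psendfile_splice(int out_fd, int in_fd, off_t in_off,
                                off_t out_off, size_t count)
{
    loff_t in_pos = in_off, out_pos = out_off;
    size_t done = 0;
    int p[2], saved;

    if (pipe2(p, O_CLOEXEC) < 0)      /* the pipe the caller never sees */
        return -1;

    while (done < count) {
        /* File -> pipe: moves at most one pipe's worth per call. */
        ssize_t n = splice(in_fd, &in_pos, p[1], NULL, count - done,
                           SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto fail;
        if (n == 0)                   /* EOF on the source */
            break;
        /* Pipe -> file: drain everything that was just queued. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, &out_pos, (size_t)n,
                               SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m < 0)
                goto fail;
            if (m == 0) {
                errno = EIO;
                goto fail;
            }
            n -= m;
            done += (size_t)m;
        }
    }
    close(p[0]);
    close(p[1]);
    return (ssize_t)done;

fail:
    saved = errno;
    close(p[0]);
    close(p[1]);
    errno = saved;
    return -1;
}
```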
Re: New copyfile system call - discuss before LSF?
On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>>>> That should have atomicity properties reflected.
>>>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
>>> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?
>>> Pavel
>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>> Cheers, Trond
> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?

Using open()+rename() has side effects:
- changes ctime/mtime on parent directory
- leaves temporary file in path during creation
- leaves temporary file in namespace during operations, and after crash

Cheers, Andreas
Re: New copyfile system call - discuss before LSF?
On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>>>>> That should have atomicity properties reflected.
>>>>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
>>>> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?
>>>> Pavel
>>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>> Cheers, Trond
>> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?
> Using open()+rename() has side effects:
> - changes ctime/mtime on parent directory
> - leaves temporary file in path during creation
> - leaves temporary file in namespace during operations, and after crash

So what is the actual problem that is being solved? Yes, the above may be disadvantages, but none of them have proven to be show-stoppers so far.

So far, I've seen no justification for Andy's atomicity requirement other than "it would be nice if...". That's not enough IMO...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: New copyfile system call - discuss before LSF?
On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond <trond.mykleb...@netapp.com> wrote:
> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>>>>>> That should have atomicity properties reflected.
>>>>>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
>>>>> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?
>>>>> Pavel
>>>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>>> Cheers, Trond
>>> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?
>> Using open()+rename() has side effects:
>> - changes ctime/mtime on parent directory
>> - leaves temporary file in path during creation
>> - leaves temporary file in namespace during operations, and after crash
> So what is the actual problem that is being solved? Yes, the above may be disadvantages, but none of them have proven to be show-stoppers so far.
> So far, I've seen no justification for Andy's atomicity requirement other than "it would be nice if...". That's not enough IMO...

ISTM vpsendfile (or whatever it's called) plus a way to create deleted files plus a way to relink deleted files gives atomic copies. Perhaps this is less efficient than would be ideal for OCFS2, though.

--Andy
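The offset-taking, in-kernel copy call discussed in this thread later landed as copy_file_range(2) (Linux 4.5; glibc wrapper since 2.27). A sketch assuming those versions follows; `copy_range` is a hypothetical helper that tries copy_file_range() and, if the kernel or filesystem declines (the errno values it treats as "unsupported" are an assumption), redoes the work with a pread()/pwrite() loop. Like the psendfile() idea, it leaves both file offsets untouched. It covers only the copy step; the atomicity Andy describes still needs an unnamed file that is linked in afterwards.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes from in_fd@in_off to out_fd@out_off
 * without moving either file offset. Returns bytes copied (short only at
 * EOF) or -1 with errno set. */
static ssize_t copy_range(int in_fd, off_t in_off, int out_fd, off_t out_off,
                          size_t len)
{
    loff_t ip = in_off, op = out_off;
    bool try_kernel = true;
    size_t done = 0;
    char buf[65536];

    while (done < len) {
        ssize_t n;

        if (try_kernel) {
            /* In-kernel copy; the filesystem may offload or share extents. */
            n = copy_file_range(in_fd, &ip, out_fd, &op, len - done, 0);
            if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                          errno == EOPNOTSUPP || errno == EPERM)) {
                /* Assumed "unsupported" cases (old kernel, cross-filesystem,
                 * sandbox filtering): redo this chunk in userspace. */
                try_kernel = false;
                continue;
            }
        } else {
            size_t want = len - done < sizeof buf ? len - done : sizeof buf;
            n = pread(in_fd, buf, want, ip);
            for (ssize_t w = 0; n > 0 && w < n;) {
                ssize_t k = pwrite(out_fd, buf + w, (size_t)(n - w), op + w);
                if (k < 0 && errno == EINTR)
                    continue;
                if (k <= 0) {
                    if (k == 0)
                        errno = EIO;
                    return -1;
                }
                w += k;
            }
            if (n > 0) {
                ip += n;
                op += n;
            }
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)                   /* source EOF */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```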
Re: New copyfile system call - discuss before LSF?
On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond <trond.mykleb...@netapp.com> wrote:
>> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>>>>>>> That should have atomicity properties reflected.
>>>>>>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?
>>>>>> Pavel
>>>>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>>>> Cheers, Trond
>>>> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?
>>> Using open()+rename() has side effects:
>>> - changes ctime/mtime on parent directory
>>> - leaves temporary file in path during creation
>>> - leaves temporary file in namespace during operations, and after crash
>> So what is the actual problem that is being solved? Yes, the above may be disadvantages, but none of them have proven to be show-stoppers so far.
>> So far, I've seen no justification for Andy's atomicity requirement other than "it would be nice if...". That's not enough IMO...
> ISTM vpsendfile (or whatever it's called) plus a way to create deleted files plus a way to relink deleted files gives atomic copies. Perhaps this is less efficient than would be ideal for OCFS2, though.

What real-life problem does the atomicity requirement solve? None of our customers have ever asked for it. They don't care...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: New copyfile system call - discuss before LSF?
On Sun, 2013-03-31 at 00:36 -0400, Trond Myklebust wrote:
> On Sat, 2013-03-30 at 21:18 -0700, Andy Lutomirski wrote:
>> On Sat, Mar 30, 2013 at 8:52 PM, Myklebust, Trond <trond.mykleb...@netapp.com> wrote:
>>> On Sat, 2013-03-30 at 19:53 -0700, Andreas Dilger wrote:
>>>> On 2013-03-30, at 16:21, Ric Wheeler <rwhee...@redhat.com> wrote:
>>>>> On 03/30/2013 05:57 PM, Myklebust, Trond wrote:
>>>>>> On Mar 30, 2013, at 5:45 PM, Pavel Machek <pa...@ucw.cz> wrote:
>>>>>>> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>>>>>>>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>>>>>>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>>>>>>>> That should have atomicity properties reflected.
>>>>>>>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
>>>>>>> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?
>>>>>>> Pavel
>>>>>> ...and what's the big plan to make this work on anything other than ext4 and btrfs?
>>>>>> Cheers, Trond
>>>>> I know that change can be a good thing, but are we really solving a pressing problem given that application developers have dealt with open/rename as the way to get "atomic" file creation for several decades now?
>>>> Using open()+rename() has side effects:
>>>> - changes ctime/mtime on parent directory
>>>> - leaves temporary file in path during creation
>>>> - leaves temporary file in namespace during operations, and after crash
>>> So what is the actual problem that is being solved? Yes, the above may be disadvantages, but none of them have proven to be show-stoppers so far.
>>> So far, I've seen no justification for Andy's atomicity requirement other than "it would be nice if...". That's not enough IMO...
>> ISTM vpsendfile (or whatever it's called) plus a way to create deleted files plus a way to relink deleted files gives atomic copies. Perhaps this is less efficient than would be ideal for OCFS2, though.
> What real-life problem does the atomicity requirement solve? None of our customers have ever asked for it. They don't care...

BTW: before you do answer, please note that the current NFSv4.2 solution _does_ allow you to lock the file before you copy. IOW: the same atomicity rules apply to offloaded copy as apply to standard copy: there is no requirement anywhere to apply stronger semantics.

Surprisingly enough, that works for most people...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
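For comparison on a local system, the lock-then-copy pattern Trond describes can be sketched with POSIX advisory record locks taken with fcntl(F_SETLKW): a shared lock on the source and an exclusive one on the destination for the duration of the copy. `locked_copy` and `lock_whole` are hypothetical helpers. Advisory locks only order processes that also take them, and closing any descriptor of a file drops the process's POSIX locks on that file.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: set (F_RDLCK/F_WRLCK) or drop (F_UNLCK) a POSIX
 * advisory lock covering all of `fd`, waiting for conflicting locks. */
static int lock_whole(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "through end of file" */
    while (fcntl(fd, F_SETLKW, &fl) < 0)
        if (errno != EINTR)
            return -1;
    return 0;
}

/* Hypothetical helper: replace dst's contents with src's while holding a
 * shared lock on src and an exclusive lock on dst. Returns 0 or -1. */
static int locked_copy(int src, int dst)
{
    char buf[65536];
    off_t off = 0;
    int rc = -1;

    if (lock_whole(src, F_RDLCK) < 0)
        return -1;
    if (lock_whole(dst, F_WRLCK) < 0)
        goto unlock_src;
    if (ftruncate(dst, 0) < 0)
        goto unlock_dst;
    for (;;) {
        ssize_t n = pread(src, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            goto unlock_dst;
        if (n == 0)
            break;
        for (ssize_t w = 0; w < n;) {
            ssize_t k = pwrite(dst, buf + w, (size_t)(n - w), off + w);
            if (k < 0 && errno == EINTR)
                continue;
            if (k <= 0)
                goto unlock_dst;
            w += k;
        }
        off += n;
    }
    rc = 0;
unlock_dst:
    lock_whole(dst, F_UNLCK);
unlock_src:
    lock_whole(src, F_UNLCK);
    return rc;
}
```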
Re: New copyfile system call - discuss before LSF?
On 2013-03-30, at 14:45, Pavel Machek <pa...@ucw.cz> wrote:
> On Sat 2013-03-30 13:08:39, Andreas Dilger wrote:
>> On 2013-03-30, at 12:49 PM, Pavel Machek wrote:
>>> Hmm, really? AFAICT it would be simple to provide an open_deleted_file("directory") syscall. You'd open_deleted_file(), copy source file into it, then fsync(), then link it into filesystem.
>>> That should have atomicity properties reflected.
>> Actually, the open_deleted_file() syscall is quite useful for many different things all by itself. Lots of applications need to create temporary files that are unlinked at application failure (without a race if app crashes after creating the file, but before unlinking). It also avoids exposing temporary files into the namespace if other applications are accessing the directory.
> Hmm. open_deleted_file() will still need to get a directory... so it will still need a path. Perhaps open("/foo/bar/mnt", O_DELETED) would be an acceptable interface?

Yes, that would be reasonable, and/or possibly openat(fd, NULL, AT_FDCWD|AT_UNLINKED)?

Cheers, Andreas
Re: New copyfile system call - discuss before LSF?
On Mon, Feb 25, 2013 at 04:03:01PM -0800, Zach Brown wrote:
> > > I think it would be neat if it couldn't
> > > corrupt data.
> >
> > It would also be neat if the moon were made of cheese...
>
> And there we have the lsf2013 t-shirt slogan. I think we're done here!
>
> - z

Hey Everyone,

So, of course, this thread happened while I was celebrating my 10-year anniversary on a warm, sunny island. I won't trade. But let me drop my $0.02 in here.

First, we have our T-shirt slogan. That overrides every other concern.

Second, I agree that moving forward on anything is better than not. I haven't delivered the updated fastcopy(2) patch I promised two years ago, and I have to admit that I can't promise code on any sane timeframe.

Back when I was working on this, I thought that link(2) was a good model for a full-file copy. Thus I came up with reflink(2). This eventually became the fastcopy(2) proposal discussed two years ago. I did not think, and I still don't think, that we should conflate the API for "copy/clone this file in some way" (à la fastcopy(2)) with "duplicate/link this range of bytes" (à la BTRFS_IOC_CLONE_RANGE). I thought that splice(2) or something like it was a better fit for ranges; this thread has already had the same thought.

fastcopy(2) had a provision for CoW for atomicity, including metadata. This is because ocfs2 reflinks *can* provide atomic clones with metadata included. I would like any new proposal to allow for that. If it does not, of course, callers can continue to use OCFS2_IOC_REFLINK, but I'd rather make it part of the generic behavior, so that generic tools come with it.

Joel

-- 

"You don't make the poor richer by making the rich poorer."
	- Sir Winston Churchill

http://www.jlbec.org/
jl...@evilplan.org
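The range clone Joel cites was later generalized: since Linux 4.5 the VFS offers FICLONE and FICLONERANGE (the former btrfs-only BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE), which share extents copy-on-write on filesystems that support it, for example btrfs or XFS with reflink enabled. The sketch below tries a whole-file FICLONE and falls back to a byte copy; `clone_or_copy` is a hypothetical helper, and the set of errno values it treats as "cannot clone here" is an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)   /* value from <linux/fs.h>, Linux >= 4.5 */
#endif

/* Hypothetical helper: make dst_fd a copy-on-write clone of src_fd when the
 * filesystem can share extents, else copy the bytes. Returns 1 if cloned,
 * 0 if copied, -1 on error. */
static int clone_or_copy(int src_fd, int dst_fd)
{
    char buf[65536];
    off_t off = 0;

    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 1;                    /* extents now shared, no data moved */
    /* Assumed "cannot clone here" errors; anything else is a real failure. */
    if (errno != EOPNOTSUPP && errno != ENOTTY && errno != EXDEV &&
        errno != EINVAL)
        return -1;

    if (ftruncate(dst_fd, 0) < 0)
        return -1;
    for (;;) {
        ssize_t n = pread(src_fd, buf, sizeof buf, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            return 0;
        for (ssize_t w = 0; w < n;) {
            ssize_t k = pwrite(dst_fd, buf + w, (size_t)(n - w), off + w);
            if (k < 0 && errno == EINTR)
                continue;
            if (k <= 0) {
                if (k == 0)
                    errno = EIO;
                return -1;
            }
            w += k;
        }
        off += n;
    }
}
```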
Re: New copyfile system call - discuss before LSF?
On Tue, Feb 26, 2013 at 1:02 PM, Jörn Engel wrote: > On Mon, 25 February 2013 13:14:52 -0800, Andy Lutomirski wrote: >> >> I thought the first thing people would ask for is to atomically create a >> new file and copy the old file into it (at least on local file systems). >> The idea is that nothing should see an empty destination file, either >> by race or by crash. (This feature would perhaps be described as a >> pony, but it should be implementable.) > > Having already wasted many weeks trying to implement your pony, I would > consider it about as possible as winning the lottery three times in a > row. It clearly is in theory and yet,... > > If you take a filesystem like ext[34] you are out of luck. In those > filesystems it may not even be theoretically possible to get the > cleanup right for pathological cases. And if you ignore pathological > cases and depend on userspace to do the cleanup for you, you have to > do ABI extensions that I don't want to mention with Al on Cc:. My > personal notebook ran such a kernel for several years until hardware > improved to a point that I no longer wanted to forward-port the > patches. It worked but it was far from pretty. > > If you have a filesystem where you can simply bump a reference count > to copy the file content, implementation is fairly straightforward. > But having a system call that is effectively limited to btrfs means > pretty much no one will use it - besides the people looking for > potential kernel exploits. :) > > So my vote clearly goes to some variant of sendfile or splice. Don't get me wrong -- the vpsendfile (or whatever it's called) idea sounds extremely useful too. --Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 25 February 2013 13:14:52 -0800, Andy Lutomirski wrote: > > I thought the first thing people would ask for is to atomically create a > new file and copy the old file into it (at least on local file systems). > The idea is that nothing should see an empty destination file, either > by race or by crash. (This feature would perhaps be described as a > pony, but it should be implementable.) Having already wasted many weeks trying to implement your pony, I would consider it about as possible as winning the lottery three times in a row. It clearly is in theory and yet,... If you take a filesystem like ext[34] you are out of luck. In those filesystems it may not even be theoretically possible to get the cleanup right for pathological cases. And if you ignore pathological cases and depend on userspace to do the cleanup for you, you have to do ABI extensions that I don't want to mention with Al on Cc:. My personal notebook ran such a kernel for several years until hardware improved to a point that I no longer wanted to forward-port the patches. It worked but it was far from pretty. If you have a filesystem where you can simply bump a reference count to copy the file content, implementation is fairly straightforward. But having a system call that is effectively limited to btrfs means pretty much no one will use it - besides the people looking for potential kernel exploits. :) So my vote clearly goes to some variant of sendfile or splice. Jörn -- One must not confuse what seems improbable and unnatural to us with what is absolutely impossible. -- Carl Friedrich Gauß
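For a sense of what the existing splice(2) primitive Jörn votes for already offers, here is a rough sketch of a range copy between two regular files. It assumes Linux with glibc's splice() wrapper. Because splice(2) needs a pipe on one side, the data goes file to pipe to file inside the kernel. The helper copy_via_splice is hypothetical, and nothing here offloads the copy to storage or makes it atomic.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* splice(), SPLICE_F_MOVE */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `len` bytes from in_fd at in_off to out_fd
 * at out_off. splice(2) needs a pipe on one side, so data is moved
 * file -> pipe -> file without passing through a userspace buffer.
 * Returns bytes copied (short only if the source ends) or -1 with errno set. */
static ssize_t copy_via_splice(int out_fd, int in_fd, loff_t in_off,
                               loff_t out_off, size_t len)
{
    int pfd[2];
    ssize_t total = 0;

    if (pipe(pfd) < 0)
        return -1;

    while ((size_t)total < len) {
        /* Fill the pipe from the source; at most one pipe's worth per call. */
        ssize_t in = splice(in_fd, &in_off, pfd[1], NULL,
                            len - (size_t)total, SPLICE_F_MOVE);
        if (in < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (in == 0)                    /* end of the source file */
            break;

        /* Drain everything just queued into the destination. */
        while (in > 0) {
            ssize_t out = splice(pfd[0], NULL, out_fd, &out_off,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                goto fail;
            }
            in -= out;
            total += out;
        }
    }
    close(pfd[0]);
    close(pfd[1]);
    return total;

fail:
    close(pfd[0]);
    close(pfd[1]);
    return -1;
}
```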
Re: New copyfile system call - discuss before LSF?
> > I think it would be neat if it couldn't > > corrupt data. > > It would also be neat if the moon were made of cheese... And there we have the lsf2013 t-shirt slogan. I think we're done here! - z
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 15:35 -0800, Andy Lutomirski wrote: > On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond > wrote: > > On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: > >> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond > >> wrote: > >> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: > >> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote: > >> >> > On 02/21/2013 02:24 PM, Zach Brown wrote: > >> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > >> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > >> >> On 21/02/2013 15:57, Ric Wheeler wrote: > >> >> >> sendfile64() pretty much already has the right arguments for a > >> >> >> "copyfile", however it would be nice to add a 'flags' parameter: > >> >> >> the > >> >> >> NFSv4.2 version would use that to specify whether or not to copy > >> >> >> file > >> >> >> metadata. > >> >> > That would seem to be enough to me and has the advantage that it > >> >> > is a > >> >> > relatively obvious extension to something that is at least not > >> >> > totally > >> >> > unknown to developers. > >> >> > > >> >> > Do we need more than that for non-NFS paths I wonder? What does > >> >> > reflink > >> >> > need or the SCSI mechanism? > >> >> For virt we would like to be able to specify arbitrary block > >> >> ranges. > >> >> Copying an entire file helps some copy operations like storage > >> >> migration. However, it is not enough to convert the guest's > >> >> offloaded > >> >> copies to host-side offloaded copies. > >> >> >>> So how would a system call based on sendfile64() plus my flag > >> >> >>> parameter > >> >> >>> prevent an underlying implementation from meeting your criterion? > >> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying > >> >> >> because > >> >> >> it's missing an out_fd_offset.
The host will want to offload the > >> >> >> guest's copies by calling sendfile on block ranges of a guest disk > >> >> >> image > >> >> >> file that correspond to the mappings of the in and out files in the > >> >> >> guest. > >> >> >> > >> >> >> You could make it work with some locking and out_fd seeking to set > >> >> >> the > >> >> >> write offset before calling sendfile64()+flags, but ugh. > >> >> >> > >> >> >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > >> >> >>out_offset, size_t count, int flags); > >> >> >> > >> >> >> That seems closer. > >> >> >> > >> >> >> We might also want to pre-emptively offer iovs instead of offsets, > >> >> >> because that's the very first thing that's going to be requested > >> >> >> after > >> >> >> people prototype having to iterate calling sendfile() for each > >> >> >> contiguous copy region. > >> >> > I thought the first thing people would ask for is to atomically > >> >> > create a > >> >> > new file and copy the old file into it (at least on local file > >> >> > systems). > >> >> > The idea is that nothing should see an empty destination file, > >> >> > either > >> >> > by race or by crash. (This feature would perhaps be described as a > >> >> > pony, but it should be implementable.) > >> >> > > >> >> > This would be like a better link(2). > >> >> > > >> >> > --Andy > >> >> > >> >> Why would this need to be atomic? That would seem to be a very difficult > >> >> property to provide across all target types with multi-GB sized files... > >> > > >> > Right. It may sound cool, but what's the real-life use case? > >> > > >> > >> Download file from some source and then verify it. Now copyfile it > >> into my repository of known-good files. > >> > >> Admittedly I could link + unlink or rename it there, but I consider > >> hard links to be rather evil, especially when cow links are available. > > > > Rename is the right way to do that as it can't corrupt the data after > > you have verified it. copyfile can... 
> > ...copyfile doesn't exist. Wrong! The underlying NFS and SCSI copy offload protocols are fully defined at this time, and will constrain any implementation that you may dream up. > I think it would be neat if it couldn't > corrupt data. It would also be neat if the moon were made of cheese... The underlying NFS and SCSI protocols do not guarantee perfect copies; the copy may, for instance, be interrupted due to external circumstances. > In any case, this may be a bad idea -- presumably you'd have to fsync > the file you're copying *from* first to avoid a massive performance > hit. You have to do that anyway. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
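Trond's point that the underlying protocols may stop a copy partway implies that callers must treat each copy call as possibly short and resume from where it stopped. Here is a hedged sketch of that loop, assuming Linux 4.5+ and glibc 2.27+ for copy_file_range(2), a system call merged well after this thread and not one of the proposals discussed here. It falls back to pread(2)/pwrite(2) when the kernel or filesystem declines. copy_all and copy_chunk_buffered are hypothetical names.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* copy_file_range(), glibc >= 2.27 */

/* Userspace fallback: copy one bounce-buffer's worth at *in_off/*out_off and
 * advance both offsets. Returns bytes copied, 0 at end of source, -1 on error. */
static ssize_t copy_chunk_buffered(int in_fd, off_t *in_off,
                                   int out_fd, off_t *out_off, size_t len)
{
    char buf[65536];
    if (len > sizeof(buf))
        len = sizeof(buf);

    ssize_t n = pread(in_fd, buf, len, *in_off);
    if (n <= 0)
        return n;
    ssize_t w = pwrite(out_fd, buf, (size_t)n, *out_off);
    if (w < 0)
        return -1;
    /* Advance only by what was written; any remainder is re-read next time. */
    *in_off += w;
    *out_off += w;
    return w;
}

/* Hypothetical helper: copy len bytes, treating every short return as
 * "resume from here" rather than as an error. Returns bytes copied (less
 * than len only if the source ends) or -1 with errno set. */
static ssize_t copy_all(int in_fd, off_t in_off, int out_fd, off_t out_off,
                        size_t len)
{
    size_t done = 0;
    int use_kernel = 1;

    while (done < len) {
        ssize_t n;
        if (use_kernel) {
            /* copy_file_range() advances in_off/out_off by what it copied. */
            n = copy_file_range(in_fd, &in_off, out_fd, &out_off,
                                len - done, 0);
            if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                          errno == EOPNOTSUPP || errno == EINVAL)) {
                use_kernel = 0;         /* switch to the userspace path */
                continue;
            }
        } else {
            n = copy_chunk_buffered(in_fd, &in_off, out_fd, &out_off,
                                    len - done);
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)                     /* end of the source file */
            break;
        done += (size_t)n;              /* short copy: loop and resume */
    }
    return (ssize_t)done;
}
```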
Re: New copyfile system call - discuss before LSF?
On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond wrote: > On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: >> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond >> wrote: >> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: >> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote: >> >> > On 02/21/2013 02:24 PM, Zach Brown wrote: >> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: >> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: >> >> On 21/02/2013 15:57, Ric Wheeler wrote: >> >> >> sendfile64() pretty much already has the right arguments for a >> >> >> "copyfile", however it would be nice to add a 'flags' parameter: >> >> >> the >> >> >> NFSv4.2 version would use that to specify whether or not to copy >> >> >> file >> >> >> metadata. >> >> > That would seem to be enough to me and has the advantage that it is >> >> > a >> >> > relatively obvious extension to something that is at least not >> >> > totally >> >> > unknown to developers. >> >> > >> >> > Do we need more than that for non-NFS paths I wonder? What does >> >> > reflink >> >> > need or the SCSI mechanism? >> >> For virt we would like to be able to specify arbitrary block ranges. >> >> Copying an entire file helps some copy operations like storage >> >> migration. However, it is not enough to convert the guest's >> >> offloaded >> >> copies to host-side offloaded copies. >> >> >>> So how would a system call based on sendfile64() plus my flag >> >> >>> parameter >> >> >>> prevent an underlying implementation from meeting your criterion? >> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because >> >> >> it's missing an out_fd_offset. The host will want to offload the >> >> >> guest's copies by calling sendfile on block ranges of a guest disk >> >> >> image >> >> >> file that correspond to the mappings of the in and out files in the >> >> >> guest.
>> >> >> >> >> >> You could make it work with some locking and out_fd seeking to set the >> >> >> write offset before calling sendfile64()+flags, but ugh. >> >> >> >> >> >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t >> >> >>out_offset, size_t count, int flags); >> >> >> >> >> >> That seems closer. >> >> >> >> >> >> We might also want to pre-emptively offer iovs instead of offsets, >> >> >> because that's the very first thing that's going to be requested after >> >> >> people prototype having to iterate calling sendfile() for each >> >> >> contiguous copy region. >> >> > I thought the first thing people would ask for is to atomically create a >> >> > new file and copy the old file into it (at least on local file systems). >> >> > The idea is that nothing should see an empty destination file, either >> >> > by race or by crash. (This feature would perhaps be described as a >> >> > pony, but it should be implementable.) >> >> > >> >> > This would be like a better link(2). >> >> > >> >> > --Andy >> >> >> >> Why would this need to be atomic? That would seem to be a very difficult >> >> property to provide across all target types with multi-GB sized files... >> > >> > Right. It may sound cool, but what's the real-life use case? >> > >> >> Download file from some source and then verify it. Now copyfile it >> into my repository of known-good files. >> >> Admittedly I could link + unlink or rename it there, but I consider >> hard links to be rather evil, especially when cow links are available. > > Rename is the right way to do that as it can't corrupt the data after > you have verified it. copyfile can... ...copyfile doesn't exist. I think it would be neat if it couldn't corrupt data. In any case, this may be a bad idea -- presumably you'd have to fsync the file you're copying *from* first to avoid a massive performance hit. 
--Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: > On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond > wrote: > > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: > >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote: > >> > On 02/21/2013 02:24 PM, Zach Brown wrote: > >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > >> On 21/02/2013 15:57, Ric Wheeler wrote: > >> >> sendfile64() pretty much already has the right arguments for a > >> >> "copyfile", however it would be nice to add a 'flags' parameter: the > >> >> NFSv4.2 version would use that to specify whether or not to copy > >> >> file > >> >> metadata. > >> > That would seem to be enough to me and has the advantage that it is > >> > a > >> > relatively obvious extension to something that is at least not > >> > totally > >> > unknown to developers. > >> > > >> > Do we need more than that for non-NFS paths I wonder? What does > >> > reflink > >> > need or the SCSI mechanism? > >> For virt we would like to be able to specify arbitrary block ranges. > >> Copying an entire file helps some copy operations like storage > >> migration. However, it is not enough to convert the guest's offloaded > >> copies to host-side offloaded copies. > >> >>> So how would a system call based on sendfile64() plus my flag parameter > >> >>> prevent an underlying implementation from meeting your criterion? > >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because > >> >> it's missing an out_fd_offset. The host will want to offload the > >> >> guest's copies by calling sendfile on block ranges of a guest disk image > >> >> file that correspond to the mappings of the in and out files in the > >> >> guest. > >> >> > >> >> You could make it work with some locking and out_fd seeking to set the > >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >> > >> >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > >> >>out_offset, size_t count, int flags); > >> >> > >> >> That seems closer. > >> >> > >> >> We might also want to pre-emptively offer iovs instead of offsets, > >> >> because that's the very first thing that's going to be requested after > >> >> people prototype having to iterate calling sendfile() for each > >> >> contiguous copy region. > >> > I thought the first thing people would ask for is to atomically create a > >> > new file and copy the old file into it (at least on local file systems). > >> > The idea is that nothing should see an empty destination file, either > >> > by race or by crash. (This feature would perhaps be described as a > >> > pony, but it should be implementable.) > >> > > >> > This would be like a better link(2). > >> > > >> > --Andy > >> > >> Why would this need to be atomic? That would seem to be a very difficult > >> property to provide across all target types with multi-GB sized files... > > > > Right. It may sound cool, but what's the real-life use case? > > > > Download file from some source and then verify it. Now copyfile it > into my repository of known-good files. > > Admittedly I could link + unlink or rename it there, but I consider > hard links to be rather evil, especially when cow links are available. Rename is the right way to do that as it can't corrupt the data after you have verified it. copyfile can... -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
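The rename(2) idiom Trond recommends for publishing a verified file can be sketched as follows, assuming a POSIX system. The temporary file is created next to the destination because rename(2) is only atomic within a single filesystem. The final directory fsync is a common durability practice, not something the atomicity depends on. publish_file is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: make `dest` contain exactly `len` bytes of `data`,
 * so that other processes see either the old file or the complete new one,
 * never a partially written file. Returns 0 on success, -1 on failure
 * (the temporary file is removed). */
static int publish_file(const char *dest, const void *data, size_t len)
{
    char tmp[4096];
    /* Same directory as dest, so rename() stays within one filesystem. */
    int r = snprintf(tmp, sizeof(tmp), "%s.tmpXXXXXX", dest);
    if (r < 0 || (size_t)r >= sizeof(tmp))
        return -1;

    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0)
            goto fail;
        p += n;
        left -= (size_t)n;
    }
    if (fsync(fd) < 0)              /* data on disk before it gets the name */
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, dest) < 0)      /* the atomic step */
        goto fail;

    /* Also persist the new directory entry (durability, not atomicity). */
    char dir[4096];
    snprintf(dir, sizeof(dir), "%s", dest);
    int dfd = open(dirname(dir), O_RDONLY | O_DIRECTORY);
    if (dfd >= 0) {
        fsync(dfd);
        close(dfd);
    }
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

Other processes opening dest see either the previous file or the complete new one. What they can still observe is the temporary name in the directory while the write is in progress.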
Re: New copyfile system call - discuss before LSF?
On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond wrote: > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote: >> > On 02/21/2013 02:24 PM, Zach Brown wrote: >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: >> On 21/02/2013 15:57, Ric Wheeler wrote: >> >> sendfile64() pretty much already has the right arguments for a >> >> "copyfile", however it would be nice to add a 'flags' parameter: the >> >> NFSv4.2 version would use that to specify whether or not to copy file >> >> metadata. >> > That would seem to be enough to me and has the advantage that it is a >> > relatively obvious extension to something that is at least not totally >> > unknown to developers. >> > >> > Do we need more than that for non-NFS paths I wonder? What does reflink >> > need or the SCSI mechanism? >> For virt we would like to be able to specify arbitrary block ranges. >> Copying an entire file helps some copy operations like storage >> migration. However, it is not enough to convert the guest's offloaded >> copies to host-side offloaded copies. >> >>> So how would a system call based on sendfile64() plus my flag parameter >> >>> prevent an underlying implementation from meeting your criterion? >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because >> >> it's missing an out_fd_offset. The host will want to offload the >> >> guest's copies by calling sendfile on block ranges of a guest disk image >> >> file that correspond to the mappings of the in and out files in the >> >> guest. >> >> >> >> You could make it work with some locking and out_fd seeking to set the >> >> write offset before calling sendfile64()+flags, but ugh. >> >> >> >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t >> >>out_offset, size_t count, int flags); >> >> >> >> That seems closer.
>> >> >> >> We might also want to pre-emptively offer iovs instead of offsets, >> >> because that's the very first thing that's going to be requested after >> >> people prototype having to iterate calling sendfile() for each >> >> contiguous copy region. >> > I thought the first thing people would ask for is to atomically create a >> > new file and copy the old file into it (at least on local file systems). >> > The idea is that nothing should see an empty destination file, either >> > by race or by crash. (This feature would perhaps be described as a >> > pony, but it should be implementable.) >> > >> > This would be like a better link(2). >> > >> > --Andy >> >> Why would this need to be atomic? That would seem to be a very difficult >> property to provide across all target types with multi-GB sized files... > > Right. It may sound cool, but what's the real-life use case? > Download file from some source and then verify it. Now copyfile it into my repository of known-good files. Admittedly I could link + unlink or rename it there, but I consider hard links to be rather evil, especially when cow links are available. --Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: > On 02/25/2013 04:14 PM, Andy Lutomirski wrote: > > On 02/21/2013 02:24 PM, Zach Brown wrote: > >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > On 21/02/2013 15:57, Ric Wheeler wrote: > >> sendfile64() pretty much already has the right arguments for a > >> "copyfile", however it would be nice to add a 'flags' parameter: the > >> NFSv4.2 version would use that to specify whether or not to copy file > >> metadata. > > That would seem to be enough to me and has the advantage that it is a > > relatively obvious extension to something that is at least not totally > > unknown to developers. > > > > Do we need more than that for non-NFS paths I wonder? What does reflink > > need or the SCSI mechanism? > For virt we would like to be able to specify arbitrary block ranges. > Copying an entire file helps some copy operations like storage > migration. However, it is not enough to convert the guest's offloaded > copies to host-side offloaded copies. > >>> So how would a system call based on sendfile64() plus my flag parameter > >>> prevent an underlying implementation from meeting your criterion? > >> If I'm guessing correctly, sendfile64()+flags would be annoying because > >> it's missing an out_fd_offset. The host will want to offload the > >> guest's copies by calling sendfile on block ranges of a guest disk image > >> file that correspond to the mappings of the in and out files in the > >> guest. > >> > >> You could make it work with some locking and out_fd seeking to set the > >> write offset before calling sendfile64()+flags, but ugh. > >> > >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > >>out_offset, size_t count, int flags); > >> > >> That seems closer.
> >> > >> We might also want to pre-emptively offer iovs instead of offsets, > >> because that's the very first thing that's going to be requested after > >> people prototype having to iterate calling sendfile() for each > >> contiguous copy region. > > I thought the first thing people would ask for is to atomically create a > > new file and copy the old file into it (at least on local file systems). > > The idea is that nothing should see an empty destination file, either > > by race or by crash. (This feature would perhaps be described as a > > pony, but it should be implementable.) > > > > This would be like a better link(2). > > > > --Andy > > Why would this need to be atomic? That would seem to be a very difficult > property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: New copyfile system call - discuss before LSF?
On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer. We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region.
I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Ric
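Zach's proposed prototype passes both offsets and a flags word explicitly, and he predicts callers will immediately loop over contiguous regions. The sketch below only emulates that calling convention in userspace with pread(2)/pwrite(2) and a bounce buffer. struct copy_region, copy_one and copy_regions are hypothetical, flags is accepted but ignored, and no offload happens.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one contiguous region to copy. */
struct copy_region {
    off_t in_offset;
    off_t out_offset;
    size_t count;
};

/* Userspace stand-in for the proposed
 *   sendfile(out_fd, in_fd, in_offset, out_offset, count, flags)
 * Neither file position is used or changed; flags is ignored here.
 * Returns bytes copied (short only at end of source) or -1 with errno set. */
static ssize_t copy_one(int out_fd, int in_fd, off_t in_offset,
                        off_t out_offset, size_t count, int flags)
{
    char buf[65536];
    size_t done = 0;
    (void)flags;

    while (done < count) {
        size_t want = count - done < sizeof(buf) ? count - done : sizeof(buf);
        ssize_t n = pread(in_fd, buf, want, in_offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)                     /* source ended early */
            break;
        for (ssize_t w = 0; w < n;) {
            ssize_t m = pwrite(out_fd, buf + w, (size_t)(n - w),
                               out_offset + (off_t)done + w);
            if (m < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            w += m;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* The loop Zach expects callers to write by hand: one call per region. */
static ssize_t copy_regions(int out_fd, int in_fd,
                            const struct copy_region *regions, size_t nr,
                            int flags)
{
    ssize_t total = 0;

    for (size_t i = 0; i < nr; i++) {
        ssize_t n = copy_one(out_fd, in_fd, regions[i].in_offset,
                             regions[i].out_offset, regions[i].count, flags);
        if (n < 0)
            return -1;
        total += n;
        if ((size_t)n < regions[i].count)
            break;                      /* source exhausted */
    }
    return total;
}
```

An iovec-style kernel interface would move the loop in copy_regions across the system call boundary, which is the point of Zach's suggestion.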
Re: New copyfile system call - discuss before LSF?
On 02/21/2013 02:24 PM, Zach Brown wrote: > On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: >> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: >>> On 21/02/2013 15:57, Ric Wheeler wrote: >> > sendfile64() pretty much already has the right arguments for a > "copyfile", however it would be nice to add a 'flags' parameter: the > NFSv4.2 version would use that to specify whether or not to copy file > metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? >>> >>> For virt we would like to be able to specify arbitrary block ranges. >>> Copying an entire file helps some copy operations like storage >>> migration. However, it is not enough to convert the guest's offloaded >>> copies to host-side offloaded copies. >> >> So how would a system call based on sendfile64() plus my flag parameter >> prevent an underlying implementation from meeting your criterion? > > If I'm guessing correctly, sendfile64()+flags would be annoying because > it's missing an out_fd_offset. The host will want to offload the > guest's copies by calling sendfile on block ranges of a guest disk image > file that correspond to the mappings of the in and out files in the > guest. > > You could make it work with some locking and out_fd seeking to set the > write offset before calling sendfile64()+flags, but ugh. > > ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > out_offset, size_t count, int flags); > > That seems closer. > > We might also want to pre-emptively offer iovs instead of offsets, > because that's the very first thing that's going to be requested after > people prototype having to iterate calling sendfile() for each > contiguous copy region.
I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer. We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region.
I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: New copyfile system call - discuss before LSF?
On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer.
We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? Download file from some source and then verify it. Now copyfile it into my repository of known-good files. Admittedly I could link + unlink or rename it there, but I consider hard links to be rather evil, especially when cow links are available. --Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer.
We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? Download file from some source and then verify it. Now copyfile it into my repository of known-good files. Admittedly I could link + unlink or rename it there, but I consider hard links to be rather evil, especially when cow links are available. Rename is the right way to do that as it can't corrupt the data after you have verified it. copyfile can... -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
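Short of a new syscall, the usual way to get the property Andy asks for (nothing ever sees an empty or partially written destination) is the rename pattern Trond endorses: write the copy under a temporary name in the destination directory, fsync() it, then rename() it into place. A minimal POSIX sketch under stated assumptions: `copy_then_rename`, `copy_all` and the `.tmpXXXXXX` suffix are illustrative names, the parent directory is fsync()ed so the rename itself survives a crash, and the source's mode and ownership are not preserved (mkstemp() creates the file with mode 0600).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd with plain read()/write(). */
static int copy_all(int in_fd, int out_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            return 0;                       /* EOF: done */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                return -1;
            off += w;
        }
    }
}

/*
 * Hypothetical helper: make dst a copy of src such that other processes
 * see either the old dst or the complete new file, never a partial one.
 * Returns 0 on success, -1 on error (dst is left untouched on error).
 */
int copy_then_rename(const char *src, const char *dst)
{
    char tmp[4096], dir[4096];
    int in_fd, out_fd, dir_fd, rc = -1;

    /* Temporary name next to dst, so rename() stays on one filesystem. */
    if (snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", dst) >= (int)sizeof tmp)
        return -1;
    if ((in_fd = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out_fd = mkstemp(tmp)) < 0) {
        close(in_fd);
        return -1;
    }
    /* The data must be on stable storage before the name points at it. */
    if (copy_all(in_fd, out_fd) == 0 && fsync(out_fd) == 0 &&
        rename(tmp, dst) == 0)
        rc = 0;
    close(in_fd);
    close(out_fd);
    if (rc != 0) {
        unlink(tmp);
        return -1;
    }
    /* Persist the directory entry change as well. */
    snprintf(dir, sizeof dir, "%s", dst);
    if ((dir_fd = open(dirname(dir), O_RDONLY | O_DIRECTORY)) >= 0) {
        fsync(dir_fd);
        close(dir_fd);
    }
    return 0;
}
```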
Re: New copyfile system call - discuss before LSF?
On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer.
We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? Download file from some source and then verify it. Now copyfile it into my repository of known-good files. Admittedly I could link + unlink or rename it there, but I consider hard links to be rather evil, especially when cow links are available. Rename is the right way to do that as it can't corrupt the data after you have verified it. copyfile can... ...copyfile doesn't exist. I think it would be neat if it couldn't corrupt data. In any case, this may be a bad idea -- presumably you'd have to fsync the file you're copying *from* first to avoid a massive performance hit. --Andy
Re: New copyfile system call - discuss before LSF?
On Mon, 2013-02-25 at 15:35 -0800, Andy Lutomirski wrote: On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote: On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote: On 02/25/2013 04:14 PM, Andy Lutomirski wrote: On 02/21/2013 02:24 PM, Zach Brown wrote: On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile", however it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer.
We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. I thought the first thing people would ask for is to atomically create a new file and copy the old file into it (at least on local file systems). The idea is that nothing should see an empty destination file, either by race or by crash. (This feature would perhaps be described as a pony, but it should be implementable.) This would be like a better link(2). --Andy Why would this need to be atomic? That would seem to be a very difficult property to provide across all target types with multi-GB sized files... Right. It may sound cool, but what's the real-life use case? Download file from some source and then verify it. Now copyfile it into my repository of known-good files. Admittedly I could link + unlink or rename it there, but I consider hard links to be rather evil, especially when cow links are available. Rename is the right way to do that as it can't corrupt the data after you have verified it. copyfile can... ...copyfile doesn't exist. Wrong! The underlying NFS and SCSI copy offload protocols are fully defined at this time, and will constrain any implementation that you may dream up. I think it would be neat if it couldn't corrupt data. It would also be neat if the moon were made of cheese... The underlying NFS and SCSI protocols do not guarantee perfect copies; the copy may, for instance, be interrupted due to external circumstances. In any case, this may be a bad idea -- presumably you'd have to fsync the file you're copying *from* first to avoid a massive performance hit. You have to do that anyway. 
-- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: New copyfile system call - discuss before LSF?
I think it would be neat if it couldn't corrupt data. It would also be neat if the moon were made of cheese... And there we have the lsf2013 t-shirt slogan. I think we're done here! - z
Re: New copyfile system call - discuss before LSF?
"Myklebust, Trond" wrote: > > -----Original Message----- > > From: Zach Brown [mailto:z...@redhat.com] > > Sent: Thursday, February 21, 2013 5:25 PM > > To: Myklebust, Trond > > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; > > linux-kernel@vger.kernel.org; > > Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen; > > Hannes Reinecke; Joel Becker > > Subject: Re: New copyfile system call - discuss before LSF? > > > > On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > > > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > > > > On 21/02/2013 15:57, Ric Wheeler wrote: > > > > >>> > > > > >> sendfile64() pretty much already has the right arguments for a > > > > >> "copyfile", however it would be nice to add a 'flags' parameter: > > > > >> the > > > > >> NFSv4.2 version would use that to specify whether or not to copy > > > > >> file metadata. > > > > > > > > > > That would seem to be enough to me and has the advantage that it > > > > > is a relatively obvious extension to something that is at least > > > > > not totally unknown to developers. > > > > > > > > > > Do we need more than that for non-NFS paths I wonder? What does > > > > > reflink need or the SCSI mechanism? > > > > > > > > For virt we would like to be able to specify arbitrary block ranges. > > > > Copying an entire file helps some copy operations like storage > > > > migration. However, it is not enough to convert the guest's > > > > offloaded copies to host-side offloaded copies. > > > > > > So how would a system call based on sendfile64() plus my flag > > > parameter prevent an underlying implementation from meeting your > > criterion? > > > > If I'm guessing correctly, sendfile64()+flags would be annoying because it's > > missing an out_fd_offset. The host will want to offload the guest's copies > > by > > calling sendfile on block ranges of a guest disk image file that correspond > > to > > the mappings of the in and out files in the guest.
> > > > You could make it work with some locking and out_fd seeking to set the > > write offset before calling sendfile64()+flags, but ugh. > > > > ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > > out_offset, size_t count, int flags); > > > > That seems closer. > > psendfile() ? > > I fully agree that sounds reasonable... Just being an ass. :-) splice() already has offset for both fds and a flags arg: ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags); The current downside is it requires one fd to be a pipe, so it's just not very easy to use from my perspective[1]. > > We might also want to pre-emptively offer iovs instead of offsets, because > > that's the very first thing that's going to be requested after people > > prototype > > having to iterate calling sendfile() for each contiguous copy region. > > vpsendfile() then? I agree that might be a little more future-proof. > Particularly given that the underlying protocols tend to be fully > asynchronous, and so it makes sense to queue up more than one copy at a > time... splicev() might be nice to have in that case, too. [1] my splice() annoyances: * need to create/manage a pipe * copy size limited by pipe size * doesn't reduce userspace syscalls (just data copy overhead) * easy to misuse and starve with blocking sockets + big buffers * not many users, so bugs creep in (v3.7.8 was the first usable version of the 3.7 series for TCP sockets)
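The pipe requirement described above is easiest to see in code. A minimal sketch, assuming Linux with glibc; `splice_copy` is a hypothetical helper built on the real splice(2) prototype quoted in the message. Because neither descriptor is a pipe, it has to create one and bounce every chunk through it, moving at most one pipe's worth of data (typically 64 KiB) per pair of calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes from fd_in at off_in to fd_out
 * at off_out with splice(2), via an intermediate pipe. Neither file's
 * own offset is used or changed. Returns bytes copied (short if fd_in
 * hits EOF), or -1 on error.
 */
ssize_t splice_copy(int fd_in, loff_t off_in, int fd_out, loff_t off_out,
                    size_t len)
{
    int p[2];
    size_t done = 0;
    ssize_t result = -1;

    if (pipe(p) < 0)          /* annoyance #1: we must manage a pipe */
        return -1;

    while (done < len) {
        /* file -> pipe: reads at off_in and advances it; limited by
         * the pipe's capacity (annoyance #2). */
        ssize_t in = splice(fd_in, &off_in, p[1], NULL, len - done, 0);
        if (in < 0 && errno == EINTR)
            continue;
        if (in < 0)
            goto out;
        if (in == 0)
            break;            /* EOF on fd_in: short copy */

        /* pipe -> file: drain exactly what was just queued. */
        for (ssize_t left = in; left > 0; ) {
            ssize_t n = splice(p[0], NULL, fd_out, &off_out,
                               (size_t)left, 0);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0)
                goto out;
            left -= n;
        }
        done += (size_t)in;
    }
    result = (ssize_t)done;
out:
    close(p[0]);
    close(p[1]);
    return result;
}
```

Every chunk costs two splice() calls plus the pipe setup, which is the "doesn't reduce userspace syscalls" point in the list of annoyances.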
RE: New copyfile system call - discuss before LSF?
> -----Original Message----- > From: Zach Brown [mailto:z...@redhat.com] > Sent: Friday, February 22, 2013 1:22 PM > To: Ric Wheeler > Cc: Paolo Bonzini; Myklebust, Trond; Linux FS Devel; linux-ker...@vger.kernel.org; Chris L. Mason; Christoph Hellwig; Alexander Viro; > Martin K. Petersen; Hannes Reinecke; Joel Becker > Subject: Re: New copyfile system call - discuss before LSF? > > > This seems to be suspiciously close to a clear consensus on how to > > move forward after many years of spinning our wheels. Anyone want to > > promote an actual patch before we change our collective minds? > > It seems like we'd want to start with the existing (presumably > bitrotten) prototypes that Trond has for nfs and that Martin has for > block->scsi. Mash the new syscall on top of them and get them working in > current mainline. > > I'd be happy to take responsibility for making forward progress if no one else > has the bandwidth. > > Trond, Martin, would that make sense? Are the most recent versions of the > prototypes available somewhere? Hi Zach, The wildly bitrotten NFS copyfile prototype can be found on ftp://ftp.netapp.com/frm-ntap/opensource/linux_copyfileat/v2/linux_copyfileat_v2.tgz Please open with extreme caution and apply the resulting patches to a Linux 2.6.34.2 kernel... Cheers Trond
Re: New copyfile system call - discuss before LSF?
> This seems to be suspiciously close to a clear consensus on how to > move forward after many years of spinning our wheels. Anyone want to > promote an actual patch before we change our collective minds? It seems like we'd want to start with the existing (presumably bitrotten) prototypes that Trond has for nfs and that Martin has for block->scsi. Mash the new syscall on top of them and get them working in current mainline. I'd be happy to take responsibility for making forward progress if no one else has the bandwidth. Trond, Martin, would that make sense? Are the most recent versions of the prototypes available somewhere? - z
Re: New copyfile system call - discuss before LSF?
On 02/22/2013 10:47 AM, Paolo Bonzini wrote: On 21/02/2013 23:24, Zach Brown wrote: You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer. We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. Indeed, I was about to propose that exactly. So that would be psendfilev. I don't think psendfile is useful, and it can be easily provided at the libc level. Paolo This seems to be suspiciously close to a clear consensus on how to move forward after many years of spinning our wheels. Anyone want to promote an actual patch before we change our collective minds? Ric
Re: New copyfile system call - discuss before LSF?
On 21/02/2013 23:24, Zach Brown wrote: > You could make it work with some locking and out_fd seeking to set the > write offset before calling sendfile64()+flags, but ugh. > > ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > out_offset, size_t count, int flags); > > That seems closer. > > We might also want to pre-emptively offer iovs instead of offsets, > because that's the very first thing that's going to be requested after > people prototype having to iterate calling sendfile() for each > contiguous copy region. Indeed, I was about to propose exactly that. So that would be psendfilev. I don't think psendfile is useful, and it can be easily provided at the libc level. Paolo
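To make Paolo's point concrete, here is a hypothetical userspace fallback for a vectored copy call. The struct copy_range layout and the names psendfilev_emul and psendfile_single are assumptions for illustration only, not a proposed ABI. A real syscall could hand every range to the filesystem or storage target at once; this sketch simply bounces each range through a buffer with pread()/pwrite(), which leave both file positions untouched. The single-range form then falls out as a trivial wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one contiguous copy region. */
struct copy_range {
    off_t  in_offset;
    off_t  out_offset;
    size_t count;
};

/* Hypothetical vectored copy: copies each range in order and returns the
 * total number of bytes copied. Stops early at EOF on in_fd; returns -1
 * only if an error occurs before any byte was copied. */
ssize_t psendfilev_emul(int out_fd, int in_fd,
                        const struct copy_range *ranges, int nr, int flags)
{
    char buf[65536];
    ssize_t total = 0;
    (void)flags;                /* reserved; ignored by this sketch */

    for (int i = 0; i < nr; i++) {
        off_t in_off = ranges[i].in_offset;
        off_t out_off = ranges[i].out_offset;
        size_t left = ranges[i].count;

        while (left > 0) {
            size_t chunk = left < sizeof buf ? left : sizeof buf;
            ssize_t n = pread(in_fd, buf, chunk, in_off);
            if (n == 0)
                return total;                    /* EOF */
            if (n < 0)
                return total ? total : -1;
            /* pwrite() may write less than asked; loop until the chunk is out. */
            for (ssize_t done = 0; done < n; ) {
                ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done),
                                   out_off + done);
                if (w < 0)
                    return (total + done) ? total + done : -1;
                done += w;
            }
            in_off += n;
            out_off += n;
            left -= (size_t)n;
            total += n;
        }
    }
    return total;
}

/* The single-range call is then just a one-element vector. */
ssize_t psendfile_single(int out_fd, int in_fd, off_t in_offset,
                         off_t out_offset, size_t count, int flags)
{
    struct copy_range r = { in_offset, out_offset, count };
    return psendfilev_emul(out_fd, in_fd, &r, 1, flags);
}
```

Passing an array of ranges also fits Trond's later remark that the underlying protocols are asynchronous, so queuing several copies per call makes sense.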
Re: New copyfile system call - discuss before LSF?
On 02/21/2013 11:13 PM, Myklebust, Trond wrote: On Thu, 2013-02-21 at 23:05 +0100, Ric Wheeler wrote: On 02/21/2013 09:00 PM, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile"; however, it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. Paolo I don't think that the NFS protocol allows arbitrary ranges, but the SCSI commands are range based. If I remember what the Windows people said at a SNIA event a few years back, they have a requirement that the target file be pre-allocated (at least for the SCSI based copy). Not clear to me where they iterate over that target file to do the block range copies, but I suspect it is in their kernel. The NFSv4.2 copy offload protocol _does_ allow the copying of arbitrary byte ranges. The main target for that functionality is indeed virtualisation and thin provisioning of virtual machines.
For background, here is a pointer to Fred Knight's SNIA talk on the SCSI support for offload: https://snia.org/sites/default/files2/SDC2011/presentations/monday/FrederickKnight_Storage_Data_Movement_Offload.pdf and a talk from Spencer Shepler that gives some detail on the NFS spec, including the "server side copy" bits: https://snia.org/sites/default/files2/SDC2011/presentations/wednesday/SpencerShepler_IETF_NFSv4_Working_Group_v4.pdf The talks both have references to the actual specs for the gory details. Ric
RE: New copyfile system call - discuss before LSF?
-----Original Message----- From: Zach Brown [mailto:z...@redhat.com] Sent: Friday, February 22, 2013 1:22 PM To: Ric Wheeler Cc: Paolo Bonzini; Myklebust, Trond; Linux FS Devel; linux-ker...@vger.kernel.org; Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen; Hannes Reinecke; Joel Becker Subject: Re: New copyfile system call - discuss before LSF? This seems to be suspiciously close to a clear consensus on how to move forward after many years of spinning our wheels. Anyone want to promote an actual patch before we change our collective minds? It seems like we'd want to start with the existing (presumably bitrotten) prototypes that Trond has for nfs and that Martin has for block->scsi. Mash the new syscall on top of them and get them working in current mainline. I'd be happy to take responsibility for making forward progress if no one else has the bandwidth. Trond, Martin, would that make sense? Are the most recent versions of the prototypes available somewhere? Hi Zach, The wildly bitrotten NFS copyfile prototype can be found at ftp://ftp.netapp.com/frm-ntap/opensource/linux_copyfileat/v2/linux_copyfileat_v2.tgz Please open with extreme caution and apply the resulting patches to a Linux 2.6.34.2 kernel... Cheers Trond
Re: New copyfile system call - discuss before LSF?
Myklebust, Trond <trond.mykleb...@netapp.com> wrote: -----Original Message----- From: Zach Brown [mailto:z...@redhat.com] Sent: Thursday, February 21, 2013 5:25 PM To: Myklebust, Trond Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org; Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen; Hannes Reinecke; Joel Becker Subject: Re: New copyfile system call - discuss before LSF? On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile"; however, it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer. psendfile() ?
I fully agree that sounds reasonable... Just being an ass. :-) splice() already has offsets for both fds and a flags arg: ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags); The current downside is it requires one fd to be a pipe, so it's just not very easy to use from my perspective[1]. We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region. vpsendfile() then? I agree that might be a little more future-proof. Particularly given that the underlying protocols tend to be fully asynchronous, and so it makes sense to queue up more than one copy at a time... splicev() might be nice to have in that case, too.
[1] my splice() annoyances:
* need to create/manage a pipe
* copy size limited by pipe size
* doesn't reduce userspace syscalls (just data copy overhead)
* easy to misuse and starve with blocking sockets + big buffers
* not many users, so bugs creep in (v3.7.8 was the first usable version of the 3.7 series for TCP sockets)
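A minimal sketch of a file-to-file copy with the existing splice(2) interface, assuming Linux and _GNU_SOURCE, to illustrate the first annoyances in the footnote: since splice() requires one end to be a pipe, the data must be staged through an intermediate pipe, and each pass moves at most one pipe's worth. The helper name splice_copy is illustrative, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative helper: copy up to count bytes from in_fd@in_off to
 * out_fd@out_off with splice(2), staging the data through a pipe.
 * Returns bytes copied, or -1 if an error occurs before any byte moved. */
ssize_t splice_copy(int out_fd, int in_fd, loff_t in_off, loff_t out_off,
                    size_t count)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while ((size_t)total < count) {
        /* file -> pipe: advances in_off, not in_fd's file position;
         * the amount moved is capped by the pipe's capacity */
        ssize_t n = splice(in_fd, &in_off, p[1], NULL,
                           count - (size_t)total, SPLICE_F_MOVE);
        if (n == 0)
            break;                      /* EOF on in_fd */
        if (n < 0) {
            if (total == 0)
                total = -1;
            break;
        }
        /* pipe -> file: drain everything queued in this pass */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, &out_off, (size_t)n,
                               SPLICE_F_MOVE);
            if (w <= 0) {
                if (total == 0)
                    total = -1;
                goto done;
            }
            n -= w;
            total += w;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

Each pass still costs two splice() calls from userspace, which is the third item in the list above.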
RE: New copyfile system call - discuss before LSF?
> -----Original Message----- > From: Zach Brown [mailto:z...@redhat.com] > Sent: Thursday, February 21, 2013 5:25 PM > To: Myklebust, Trond > Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org; > Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen; > Hannes Reinecke; Joel Becker > Subject: Re: New copyfile system call - discuss before LSF? > > On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > > > On 21/02/2013 15:57, Ric Wheeler wrote: > > > >>> > > > >> sendfile64() pretty much already has the right arguments for a > > > >> "copyfile"; however, it would be nice to add a 'flags' parameter: > > > >> the > > > >> NFSv4.2 version would use that to specify whether or not to copy > > > >> file metadata. > > > > > > > > That would seem to be enough to me and has the advantage that it > > > > is a relatively obvious extension to something that is at least > > > > not totally unknown to developers. > > > > > > > > Do we need more than that for non-NFS paths I wonder? What does > > > > reflink need or the SCSI mechanism? > > > > > > For virt we would like to be able to specify arbitrary block ranges. > > > Copying an entire file helps some copy operations like storage > > > migration. However, it is not enough to convert the guest's > > > offloaded copies to host-side offloaded copies. > > > > So how would a system call based on sendfile64() plus my flag > > parameter prevent an underlying implementation from meeting your > criterion? > > If I'm guessing correctly, sendfile64()+flags would be annoying because it's > missing an out_fd_offset. The host will want to offload the guest's copies by > calling sendfile on block ranges of a guest disk image file that correspond to > the mappings of the in and out files in the guest. > > You could make it work with some locking and out_fd seeking to set the > write offset before calling sendfile64()+flags, but ugh.
> > ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t > out_offset, size_t count, int flags); > > That seems closer. psendfile() ? I fully agree that sounds reasonable... Just being an ass. :-) > We might also want to pre-emptively offer iovs instead of offsets, because > that's the very first thing that's going to be requested after people > prototype > having to iterate calling sendfile() for each contiguous copy region. vpsendfile() then? I agree that might be a little more future-proof. Particularly given that the underlying protocols tend to be fully asynchronous, and so it makes sense to queue up more than one copy at a time... Cheers, Trond
Re: New copyfile system call - discuss before LSF?
Jeremy Allison wrote: > On Thu, Feb 21, 2013 at 01:51:53PM +, Myklebust, Trond wrote: > > On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: > > > We have debated the need to have a system call to allow for offloading > > > copy > > > operations, for example to an NFS server (part of the new NFS 4.2 > > > specification), SCSI target device (two different SCSI commands do this), > > > local > > > file systems (reflink, etc) and I suspect many other possible parts of > > > the stack > > > could implement this. > > > > sendfile64() pretty much already has the right arguments for a > > "copyfile"; however, it would be nice to add a 'flags' parameter: the > > NFSv4.2 version would use that to specify whether or not to copy file > > metadata. > > What would be really nice is if sendfile allowed zero-copy > from network socket to a file descriptor. That would help > a *lot* of my small system OEMs (and no, splice() just doesn't > cut it :-). I've often wished the pipe requirement of splice() could be dropped, to allow copying between arbitrary FDs. Perhaps this can be done?
Re: New copyfile system call - discuss before LSF?
On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > > On 21/02/2013 15:57, Ric Wheeler wrote: > > >>> > > >> sendfile64() pretty much already has the right arguments for a > > >> "copyfile"; however, it would be nice to add a 'flags' parameter: the > > >> NFSv4.2 version would use that to specify whether or not to copy file > > >> metadata. > > > > > > That would seem to be enough to me and has the advantage that it is a > > > relatively obvious extension to something that is at least not totally > > > unknown to developers. > > > > > > Do we need more than that for non-NFS paths I wonder? What does reflink > > > need or the SCSI mechanism? > > > > For virt we would like to be able to specify arbitrary block ranges. > > Copying an entire file helps some copy operations like storage > > migration. However, it is not enough to convert the guest's offloaded > > copies to host-side offloaded copies. > > So how would a system call based on sendfile64() plus my flag parameter > prevent an underlying implementation from meeting your criterion? If I'm guessing correctly, sendfile64()+flags would be annoying because it's missing an out_fd_offset. The host will want to offload the guest's copies by calling sendfile on block ranges of a guest disk image file that correspond to the mappings of the in and out files in the guest. You could make it work with some locking and out_fd seeking to set the write offset before calling sendfile64()+flags, but ugh. ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t out_offset, size_t count, int flags); That seems closer. We might also want to pre-emptively offer iovs instead of offsets, because that's the very first thing that's going to be requested after people prototype having to iterate calling sendfile() for each contiguous copy region.
- z
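A minimal sketch of the "locking and out_fd seeking" workaround Zach mentions, assuming Linux 2.6.33 or later, where sendfile(2) may write to a regular file. It emulates the proposed positional signature in userspace on top of today's sendfile(). The name psendfile_emul and the process-wide lock are illustrative assumptions, not an existing API; flags is accepted and ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Serializes seek+copy only among callers of this helper; any other code
 * that moves out_fd's file position can still race with it. */
static pthread_mutex_t out_pos_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical emulation of the proposed positional sendfile():
 * copy count bytes from in_fd@in_offset to out_fd@out_offset.
 * Returns bytes copied, or -1 if nothing was copied before an error. */
ssize_t psendfile_emul(int out_fd, int in_fd, off_t in_offset,
                       off_t out_offset, size_t count, int flags)
{
    ssize_t total = 0;
    (void)flags;

    pthread_mutex_lock(&out_pos_lock);
    /* sendfile(2) writes at out_fd's current position, so move it first. */
    if (lseek(out_fd, out_offset, SEEK_SET) == (off_t)-1) {
        pthread_mutex_unlock(&out_pos_lock);
        return -1;
    }
    while ((size_t)total < count) {
        /* With a non-NULL offset pointer, sendfile reads from *in_offset
         * and advances that variable, not in_fd's own file position. */
        ssize_t n = sendfile(out_fd, in_fd, &in_offset, count - (size_t)total);
        if (n < 0) {
            if (total == 0)
                total = -1;
            break;
        }
        if (n == 0)             /* EOF on in_fd */
            break;
        total += n;
    }
    pthread_mutex_unlock(&out_pos_lock);
    return total;
}
```

The need for a shared lock around a shared file position is exactly what an explicit out_offset argument would remove.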
Re: New copyfile system call - discuss before LSF?
On Thu, 2013-02-21 at 23:05 +0100, Ric Wheeler wrote: > On 02/21/2013 09:00 PM, Paolo Bonzini wrote: > > On 21/02/2013 15:57, Ric Wheeler wrote: > >>> sendfile64() pretty much already has the right arguments for a > >>> "copyfile"; however, it would be nice to add a 'flags' parameter: the > >>> NFSv4.2 version would use that to specify whether or not to copy file > >>> metadata. > >> That would seem to be enough to me and has the advantage that it is a > >> relatively obvious extension to something that is at least not totally > >> unknown to developers. > >> > >> Do we need more than that for non-NFS paths I wonder? What does reflink > >> need or the SCSI mechanism? > > For virt we would like to be able to specify arbitrary block ranges. > > Copying an entire file helps some copy operations like storage > > migration. However, it is not enough to convert the guest's offloaded > > copies to host-side offloaded copies. > > > > Paolo > > I don't think that the NFS protocol allows arbitrary ranges, but the SCSI > commands are range based. > > If I remember what the Windows people said at a SNIA event a few years back, > they have a requirement that the target file be pre-allocated (at least for the > SCSI based copy). Not clear to me where they iterate over that target file to do > the block range copies, but I suspect it is in their kernel. The NFSv4.2 copy offload protocol _does_ allow the copying of arbitrary byte ranges. The main target for that functionality is indeed virtualisation and thin provisioning of virtual machines. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: New copyfile system call - discuss before LSF?
On 02/21/2013 09:00 PM, Paolo Bonzini wrote: On 21/02/2013 15:57, Ric Wheeler wrote: sendfile64() pretty much already has the right arguments for a "copyfile"; however, it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. Paolo I don't think that the NFS protocol allows arbitrary ranges, but the SCSI commands are range based. If I remember what the Windows people said at a SNIA event a few years back, they have a requirement that the target file be pre-allocated (at least for the SCSI based copy). Not clear to me where they iterate over that target file to do the block range copies, but I suspect it is in their kernel. Ric
Re: New copyfile system call - discuss before LSF?
On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > On 21/02/2013 15:57, Ric Wheeler wrote: > >>> > >> sendfile64() pretty much already has the right arguments for a > >> "copyfile"; however, it would be nice to add a 'flags' parameter: the > >> NFSv4.2 version would use that to specify whether or not to copy file > >> metadata. > > > > That would seem to be enough to me and has the advantage that it is a > > relatively obvious extension to something that is at least not totally > > unknown to developers. > > > > Do we need more than that for non-NFS paths I wonder? What does reflink > > need or the SCSI mechanism? > > For virt we would like to be able to specify arbitrary block ranges. > Copying an entire file helps some copy operations like storage > migration. However, it is not enough to convert the guest's offloaded > copies to host-side offloaded copies. So how would a system call based on sendfile64() plus my flag parameter prevent an underlying implementation from meeting your criterion? -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: New copyfile system call - discuss before LSF?
On 21/02/2013 15:57, Ric Wheeler wrote: >>> >> sendfile64() pretty much already has the right arguments for a >> "copyfile"; however, it would be nice to add a 'flags' parameter: the >> NFSv4.2 version would use that to specify whether or not to copy file >> metadata. > > That would seem to be enough to me and has the advantage that it is a > relatively obvious extension to something that is at least not totally > unknown to developers. > > Do we need more than that for non-NFS paths I wonder? What does reflink > need or the SCSI mechanism? For virt we would like to be able to specify arbitrary block ranges. Copying an entire file helps some copy operations like storage migration. However, it is not enough to convert the guest's offloaded copies to host-side offloaded copies. Paolo
Re: New copyfile system call - discuss before LSF?
On Thu, Feb 21, 2013 at 01:51:53PM +, Myklebust, Trond wrote: > On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: > > We have debated the need to have a system call to allow for offloading copy > > operations, for example to an NFS server (part of the new NFS 4.2 > > specification), SCSI target device (two different SCSI commands do this), > > local > > file systems (reflink, etc) and I suspect many other possible parts of the > > stack > > could implement this. > > sendfile64() pretty much already has the right arguments for a > "copyfile"; however, it would be nice to add a 'flags' parameter: the > NFSv4.2 version would use that to specify whether or not to copy file > metadata. What would be really nice is if sendfile allowed zero-copy from network socket to a file descriptor. That would help a *lot* of my small system OEMs (and no, splice() just doesn't cut it :-). Jeremy.
Re: New copyfile system call - discuss before LSF?
On 2013-02-21, at 7:57 AM, Ric Wheeler wrote: > On 02/21/2013 02:51 PM, Myklebust, Trond wrote: >> On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: >>> We have debated the need to have a system call to allow for offloading copy >>> operations, for example to an NFS server (part of the new NFS 4.2 >>> specification), SCSI target device (two different SCSI commands do this), >>> local >>> file systems (reflink, etc) and I suspect many other possible parts of the >>> stack >>> could implement this. >> sendfile64() pretty much already has the right arguments for a >> "copyfile"; however, it would be nice to add a 'flags' parameter: the >> NFSv4.2 version would use that to specify whether or not to copy file >> metadata. > > That would seem to be enough to me and has the advantage that it is a > relatively obvious extension to something that is at least not totally unknown to developers. > > Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? IMHO, the critical part about a copy syscall is avoiding the data copy to/from userspace. Copying file attributes opens up a huge morass of issues related to which attrs/xattrs/ACLs are copied, yet those don't cost nearly so much as the data copies. We definitely want the API to be flexible enough to do server-side copies (e.g. NFS and CIFS), but we also need to allow data copies for regular files between different local and/or network filesystems within the VFS. Cheers, Andreas >>> The earliest discussion of such a system call I saw happened back in 2001, I >>> know we had another more recent flurry (2-3 years back?) as well that got >>> tangled up and died away. >>> >>> Given the new popularity of this in storage devices and the use case for >>> virt >>> guests, any chance to get a proposal floated this year that might be able to >>> land upstream in our life times :) ?
>> I'm planning on soon dusting off the NFS prototype that NetApp wrote 3 >> years ago and converting at least the client implementation into >> something that can go upstream. We do also have a server prototype for >> Linux, but the copy offload between 2 different servers is a hack and >> would need significant work. >> > > That would be really interesting, thanks!
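For contrast, a sketch of the plain read()/write() loop that a copy syscall is meant to replace (the helper name bounce_copy is illustrative). Every chunk is copied from the kernel into buf and back out again, which is the userspace round trip Andreas identifies as the critical cost; file attributes are not touched at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Naive copy of everything from in_fd's current position to out_fd.
 * Each chunk crosses kernel -> buf -> kernel. Returns bytes copied or -1. */
ssize_t bounce_copy(int out_fd, int in_fd)
{
    char buf[65536];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            return -1;
        }
        if (n == 0)
            return total;               /* EOF */
        /* write() may be short; loop until the whole chunk is written. */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                return -1;
            }
            done += w;
        }
        total += n;
    }
}
```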
Re: New copyfile system call - discuss before LSF?
On 02/21/2013 02:51 PM, Myklebust, Trond wrote: On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: We have debated the need to have a system call to allow for offloading copy operations, for example to an NFS server (part of the new NFS 4.2 specification), SCSI target device (two different SCSI commands do this), local file systems (reflink, etc) and I suspect many other possible parts of the stack could implement this. sendfile64() pretty much already has the right arguments for a "copyfile"; however, it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. That would seem to be enough to me and has the advantage that it is a relatively obvious extension to something that is at least not totally unknown to developers. Do we need more than that for non-NFS paths I wonder? What does reflink need or the SCSI mechanism? The earliest discussion of such a system call I saw happened back in 2001, I know we had another more recent flurry (2-3 years back?) as well that got tangled up and died away. Given the new popularity of this in storage devices and the use case for virt guests, any chance to get a proposal floated this year that might be able to land upstream in our life times :) ? I'm planning on soon dusting off the NFS prototype that NetApp wrote 3 years ago and converting at least the client implementation into something that can go upstream. We do also have a server prototype for Linux, but the copy offload between 2 different servers is a hack and would need significant work. That would be really interesting, thanks!
Re: New copyfile system call - discuss before LSF?
On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote: > We have debated the need to have a system call to allow for offloading copy operations, for example to an NFS server (part of the new NFS 4.2 specification), SCSI target device (two different SCSI commands do this), local file systems (reflink, etc.) and I suspect many other possible parts of the stack could implement this. sendfile64() pretty much already has the right arguments for a "copyfile"; however, it would be nice to add a 'flags' parameter: the NFSv4.2 version would use that to specify whether or not to copy file metadata. > The earliest discussion of such a system call I saw happened back in 2001; I know we had another more recent flurry (2-3 years back?) as well that got tangled up and died away. > > Given the new popularity of this in storage devices and the use case for virt guests, any chance to get a proposal floated this year that might be able to land upstream in our lifetimes :) ? I'm planning on soon dusting off the NFS prototype that NetApp wrote 3 years ago and converting at least the client implementation into something that can go upstream. We do also have a server prototype for Linux, but the copy offload between 2 different servers is a hack and would need significant work. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
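Trond's point is that sendfile64() already takes (out_fd, in_fd, offset, count), so a copyfile call would mostly need an extra flags word. The following is a minimal userspace sketch of that shape under stated assumptions, not a proposed kernel implementation: `copyfile_sketch()` and `COPYFILE_SKETCH_METADATA` are hypothetical names, "metadata" is reduced to permission bits purely for illustration, and it assumes Linux 2.6.33 or later, where sendfile() accepts a regular file as out_fd. A real offload would hand the range to the NFS server, SCSI target or file system rather than moving the bytes on the local host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag: also copy permission bits (a stand-in for "metadata"). */
#define COPYFILE_SKETCH_METADATA 0x1u

/*
 * Hypothetical copyfile(): sendfile()'s arguments plus a flags word.
 * Copies up to `count` bytes from in_fd, starting at *offset, to the
 * current position of out_fd. sendfile() may transfer fewer bytes than
 * requested, so loop until done or EOF. Returns bytes copied, or -1.
 */
static ssize_t copyfile_sketch(int out_fd, int in_fd, off_t *offset,
                               size_t count, unsigned int flags)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) /* EOF on in_fd before `count` bytes */
            break;
        total += (size_t)n;
    }

    if (flags & COPYFILE_SKETCH_METADATA) {
        struct stat st;
        if (fstat(in_fd, &st) < 0 || fchmod(out_fd, st.st_mode & 07777) < 0)
            return -1;
    }
    return (ssize_t)total;
}

/* Usage demo: copy a small temp file (data plus mode bits) and verify it. */
static int copyfile_demo(void)
{
    char src[] = "/tmp/copyfile-srcXXXXXX";
    char dst[] = "/tmp/copyfile-dstXXXXXX";
    const char msg[] = "copy offload test";
    const size_t len = sizeof msg - 1;
    char buf[64] = {0};
    struct stat st;
    off_t off = 0;
    int ok = 0;

    int in_fd = mkstemp(src);
    int out_fd = mkstemp(dst);
    assert(in_fd >= 0 && out_fd >= 0);

    if (write(in_fd, msg, len) == (ssize_t)len &&
        fchmod(in_fd, 0640) == 0 &&
        copyfile_sketch(out_fd, in_fd, &off, len,
                        COPYFILE_SKETCH_METADATA) == (ssize_t)len &&
        pread(out_fd, buf, sizeof buf, 0) == (ssize_t)len &&
        memcmp(buf, msg, len) == 0 &&
        fstat(out_fd, &st) == 0 &&
        (st.st_mode & 07777) == 0640)
        ok = 1;

    unlink(src);
    unlink(dst);
    close(in_fd);
    close(out_fd);
    return ok ? 0 : -1;
}
```

Calling `copyfile_demo()` creates two temporary files under /tmp, copies one into the other with the hypothetical metadata flag set, and returns 0 if both the contents and the mode bits match. Passing a non-NULL offset means sendfile() reads from *offset and advances it, leaving in_fd's own file position untouched.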
Re: New copyfile system call - discuss before LSF?
On 02/21/2013 12:37 PM, Ric Wheeler wrote: > We have debated the need to have a system call to allow for offloading copy operations, for example to an NFS server (part of the new NFS 4.2 specification), SCSI target device (two different SCSI commands do this), local file systems (reflink, etc.) and I suspect many other possible parts of the stack could implement this. > > The earliest discussion of such a system call I saw happened back in 2001; I know we had another more recent flurry (2-3 years back?) as well that got tangled up and died away. Yeah, I remember. I talked to Mkp about it, who (as usual :-) had a patchset stashed away for this. Or a preliminary attempt, anyway. However, this was waiting for the DISCARD merging patches to go in, which in turn were waiting for the WRITE SAME patches, IIRC. Or something. Martin? > Given the new popularity of this in storage devices and the use case for virt guests, any chance to get a proposal floated this year that might be able to land upstream in our lifetimes :) ? Oh, most definitely. Now that I finally have an array capable of doing ROD token copy, we should be reevaluating things. I'll see to having the sg_xcopy program updated to do ROD copy; then we will have some real-world data. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)