On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote:
> On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> > Why would anybody want to hide it at all? Why even the libc hiding?
> >
> > Nobody is going to use this except for special apps. Let them see what
> > they can do, in all its glory.
>
>     I expect everyone will use this through cp(1), so that cp(1) can
> try to get server-side copy on the network filesystems.
>     Speaking of "all its glory", what we have now is:
>
> int sys_copyfileat(int oldfd, const char *oldname, int newfd,
>                    const char *newname, int flags, int atflags)
Would it be worthwhile to consider adding an offset and length? Then we
get dd as well (potentially).

Best,
-PWM

>
> > So I'd suggest something like having two system calls: one to start the
> > operation, and one to control it. And for a filesystem that does atomic
> > copies, the 'start' one obviously would also finish it, so the 'control'
> > one would be a no-op, because there would never be any outstanding ones.
> >
> > See what I'm saying? It wouldn't complicate _your_ life, but it would
> > allow for filesystems that can't do it atomically (or even quickly).
> >
> > So the first one would be something like
> >
> >     int copyfile(const char *src, const char *dest, unsigned long flags);
> >
> > which would return:
> >
> >  - zero on success
> >  - negative (with errno) on error
> >  - positive cookie on "I started it, here's my cookie". For extra bonus
> >    points, maybe the cookie would actually be a file descriptor (for
> >    poll/select users), but it would _not_ be a file descriptor to the
> >    resulting _file_, it would literally be a "cookie" to the actual
> >    copyfile event.
>
>     Actually, if the cookie is a magic file descriptor, you don't
> need ctl. You can play tricks like polling for completion,
> read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
> for cancel. Might be a bit overloaded, though.
>
> > and then for ocfs2 you'd never return positive cookies. You'd never have
> > to worry about it.
>
>     I suspect we'll later take advantage of copyfile's other
> modes. I did reflink as reflink only for the simple fact of doing one
> thing and doing it well, not because I think copyfile isn't good.
>
> > Then the second interface would be something like
> >
> >     int copyfile_ctrl(long cookie, unsigned long cmd);
> >
> > where you'd just have some way to wait for completion and ask how much has
> > been copied.
> > The 'cmd' would be some set of 'cancel', 'status' or
> > 'uninterruptible wait' or whatever, and the return value would again be
> >
> >  - negative (with errno) for errors (copy failed) - cookie released
> >  - zero for 'done' - cookie released
> >  - positive for 'percent remaining' or whatever - cookie still valid
> >
> > and this would be another callback into the filesystem code, but you'd
> > never have to worry about it, since you'd never see it (just leave it
> > NULL).
>
>     I was going to ask about how to fit both calls into one inode
> operation, but I see you're giving this as an additional inode
> operation.
>     This leaves us with a similar-to-reflink inode copyfile op and a
> control op:
>
>     ->copyfile(old_dentry, dir_inode, new_dentry, flags)
>     ->copyfile_ctl(int cookie, unsigned int cmd)
>
>     I have to change the flags a little, as my original proposal
> didn't handle backoff correctly.
>
> #define COPYFILE_WAIT           0x0001  /* Block until complete */
> #define COPYFILE_ATOMIC         0x0002  /* Things copied must be
>                                            point-in-time and it must
>                                            fail or succeed completely. */
> #define COPYFILE_ALLOW_COW      0x0004  /* The filesystem may share data
>                                            extents between the source
>                                            and target in a Copy-on-Write
>                                            fashion. If neither
>                                            COPYFILE_ALLOW_COW nor
>                                            COPYFILE_REQUIRE_COW is
>                                            specified, data extents must
>                                            NOT be shared.
>                                            When neither COW flag is
>                                            provided, most filesystems
>                                            should return -ENOTSUPP, as
>                                            userspace can do read-write
>                                            looping itself */
> #define COPYFILE_REQUIRE_COW    0x0008  /* Data extents MUST be shared
>                                            between the source and target
>                                            in a Copy-on-Write fashion */
> #define COPYFILE_UNPRIV_ATTRS   0x0010  /* Unprivileged attributes
>                                            should be copied from the
>                                            source to the target */
> #define COPYFILE_PRIV_ATTRS     0x0020  /* Privileged attributes should
>                                            be copied from the source to
>                                            the target if the caller has
>                                            the necessary privileges */
> #define COPYFILE_REQUIRE_ATTRS  0x0040  /* Combined with the other
>                                            attribute flags, the call
>                                            MUST fail if the caller lacks
>                                            the necessary privileges to
>                                            copy every attribute
>                                            requested */
>
> #define COPYFILE_SNAPSHOT_ASYNC         (COPYFILE_REQUIRE_COW |  \
>                                          COPYFILE_UNPRIV_ATTRS | \
>                                          COPYFILE_PRIV_ATTRS |   \
>                                          COPYFILE_ATOMIC)
> #define COPYFILE_SNAPSHOT_STRICT_ASYNC  (COPYFILE_SNAPSHOT_ASYNC | \
>                                          COPYFILE_REQUIRE_ATTRS)
> #define COPYFILE_SNAPSHOT               (COPYFILE_SNAPSHOT_ASYNC | \
>                                          COPYFILE_WAIT)
> #define COPYFILE_SNAPSHOT_STRICT        (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
>                                          COPYFILE_WAIT)
>
> > I dunno. The above seems like a fairly simple and powerful interface, and
> > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole
> > "background copy" ends up being used a lot, maybe even a local filesystem
> > would implement it just to get easy overlapping IO - even if it would just
> > be a trivial common wrapper function that says "start a thread to do a
> > trivial manual copy".
>
>     NFS and CIFS folks, please speak up.
>
> Joel
>
_______________________________________________
Ocfs2-devel mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
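
For reference on the offset/length question, here is roughly what the "read-write looping" fallback mentioned in the flag comments looks like in userspace once a range is involved. This is only a sketch: copy_range() and COPY_CHUNK are hypothetical names made up for illustration, not anything proposed in this thread, and a real dd or cp would add more error handling.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK 65536   /* hypothetical buffer size, chosen arbitrarily */

/*
 * Hypothetical helper: copy up to len bytes from src_fd at src_off to
 * dst_fd at dst_off using pread/pwrite. Returns the number of bytes
 * copied (short if the source hits EOF), or -1 with errno set.
 */
ssize_t copy_range(int src_fd, off_t src_off,
                   int dst_fd, off_t dst_off, size_t len)
{
    size_t done = 0;
    int failed = 0;
    int saved_errno;
    char *buf;

    assert(src_fd >= 0 && dst_fd >= 0);

    buf = malloc(COPY_CHUNK);
    if (buf == NULL)
        return -1;

    while (done < len && !failed) {
        size_t want = (len - done < COPY_CHUNK) ? len - done : COPY_CHUNK;
        ssize_t got = pread(src_fd, buf, want, src_off + (off_t)done);
        ssize_t put = 0;

        if (got < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same read */
            failed = 1;
            break;
        }
        if (got == 0)
            break;                 /* EOF on the source: stop early */

        /* pwrite may be short, so loop until the chunk is written. */
        while (put < got) {
            ssize_t n = pwrite(dst_fd, buf + put, (size_t)(got - put),
                               dst_off + (off_t)done + put);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                failed = 1;
                break;
            }
            put += n;
        }
        if (!failed)
            done += (size_t)got;
    }

    saved_errno = errno;           /* keep errno intact across free() */
    free(buf);
    errno = saved_errno;
    return failed ? -1 : (ssize_t)done;
}
```

A caller would pass two open descriptors plus the source offset, destination offset and length, much as dd does with skip, seek and count. An in-kernel interface taking the same triple could presumably let the filesystem or server do the work instead of this loop.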

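
To make the flag arithmetic in the quoted proposal concrete, the sketch below repeats the flag values and shows how a filesystem that can only do CoW copies might screen them. The flag values and the composite macros are taken from Joel's mail; cow_only_check_flags() is a hypothetical helper, and EOPNOTSUPP stands in for the kernel-internal ENOTSUPP because this is written as plain userspace C.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as listed in the quoted proposal. */
#define COPYFILE_WAIT           0x0001
#define COPYFILE_ATOMIC         0x0002
#define COPYFILE_ALLOW_COW      0x0004
#define COPYFILE_REQUIRE_COW    0x0008
#define COPYFILE_UNPRIV_ATTRS   0x0010
#define COPYFILE_PRIV_ATTRS     0x0020
#define COPYFILE_REQUIRE_ATTRS  0x0040

#define COPYFILE_SNAPSHOT_ASYNC         (COPYFILE_REQUIRE_COW |  \
                                         COPYFILE_UNPRIV_ATTRS | \
                                         COPYFILE_PRIV_ATTRS |   \
                                         COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC  (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT               (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT        (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
                                         COPYFILE_WAIT)

/*
 * Hypothetical screening helper for a filesystem that only implements
 * CoW copies. If neither COW flag is set, extents must not be shared,
 * so it declines and leaves the read-write loop to userspace.
 * Returns 0 if the request can be attempted, or a negative errno.
 */
int cow_only_check_flags(int flags)
{
    assert(flags >= 0);

    if (!(flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW)))
        return -EOPNOTSUPP;    /* userspace stand-in for -ENOTSUPP */
    return 0;
}
```

With these values COPYFILE_SNAPSHOT works out to 0x3b and COPYFILE_SNAPSHOT_STRICT to 0x7b, so the four composites differ only in the WAIT and REQUIRE_ATTRS bits.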