On Thu, 2009-09-17 at 18:43 -0700, Joel Becker wrote:
> On Thu, Sep 17, 2009 at 09:29:14AM -0700, Linus Torvalds wrote:
> > Why would anybody want to hide it at all? Why even the libc hiding?
> > 
> > Nobody is going to use this except for special apps. Let them see what 
> > they can do, in all its glory. 
> 
>       I expect everyone will use this through cp(1), so that cp(1) can
> try to get server-side copy on the network filesystems.
>       Speaking of "all its glory", what we have now is:
> 
> int sys_copyfileat(int oldfd, const char *oldname, int newfd,
>                    const char *newname, int flags, int atflags)


Would it be worthwhile to consider adding an offset and a length?

Then we would (potentially) get dd as well.
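
To make the offset/length idea concrete, here is a minimal sketch of the userspace loop that a ranged copy call would replace. The name copy_range_fallback and its argument order are made up for illustration; nothing in this thread defines them. It uses only pread(2) and pwrite(2).

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace fallback for a ranged copy: copy up to `len`
 * bytes from `src_off` in src_fd to `dst_off` in dst_fd.
 * Returns the number of bytes copied (short if the source hits EOF),
 * or -1 with errno set.
 */
ssize_t copy_range_fallback(int src_fd, off_t src_off,
                            int dst_fd, off_t dst_off, size_t len)
{
    char buf[65536];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t n = pread(src_fd, buf, want, src_off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)             /* EOF on the source */
            break;

        /* pwrite may write less than asked; loop until the chunk is out */
        ssize_t written = 0;
        while (written < n) {
            ssize_t w = pwrite(dst_fd, buf + written, (size_t)(n - written),
                               dst_off + (off_t)done + written);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            written += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

With an offset and length in the syscall itself, a filesystem that can share or server-side copy extents could satisfy the same request without the data passing through userspace.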


Best,
-PWM

> 
> > So I'd suggest something like having two system calls: one to start the 
> > operation, and one to control it. And for a filesystem that does atomic 
> > copies, the 'start' one obviously would also finish it, so the 'control' 
> > one would be a no-op, because there would never be any outstanding ones.
> > 
> > See what I'm saying? It wouldn't complicate _your_ life, but it would 
> > allow for filesystems that can't do it atomically (or even quickly).
> > 
> > So the first one would be something like
> > 
> >     int copyfile(const char *src, const char *dest, unsigned long flags);
> > 
> > which would return:
> > 
> >  - zero on success
> >  - negative (with errno) on error
> >  - positive cookie on "I started it, here's my cookie". For extra bonus 
> >    points, maybe the cookie would actually be a file descriptor (for 
> >    poll/select users), but it would _not_ be a file descriptor to the 
> >    resulting _file_, it would literally be a "cookie" to the actual 
> >    copyfile event.
> 
>       Actually, if the cookie is a magic file descriptor, you don't
> need ctl.  You can play tricks like polling for completion,
> read(magic_fd, &remain, sizeof(loff_t)) for status, and close(magic_fd)
> for cancel.  Might be a bit overloaded, though.
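
A rough sketch of how the magic-descriptor variant might look from userspace, assuming the descriptor becomes readable when status is available and that each read returns one offset-sized "bytes remaining" value. Both of those are assumptions; the thread only floats the idea. copy_status is a made-up helper name, and off_t stands in for the loff_t mentioned above. poll(2) and read(2) are the only real calls used.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the (hypothetical) copyfile descriptor to
 * become readable, then read the remaining byte count from it.
 * Returns 1 and fills *remain on a status read, 0 on timeout,
 * -1 with errno set on error.
 */
int copy_status(int magic_fd, int timeout_ms, off_t *remain)
{
    struct pollfd pfd = { .fd = magic_fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;              /* 0: timed out, -1: poll failed */

    ssize_t n = read(magic_fd, remain, sizeof(*remain));
    if (n != (ssize_t)sizeof(*remain)) {
        if (n >= 0)
            errno = EIO;        /* short read: not a full status word */
        return -1;
    }
    return 1;
}
```

Cancel would then be a plain close(magic_fd), which is where the "a bit overloaded" worry comes in: closing the descriptor after a successful copy and closing it to abort would look identical to the kernel unless completion is tracked separately.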
> 
> > and then for ocfs2 you'd never return positive cookies. You'd never have 
> > to worry about it.
> 
>       I suspect we'll later take advantage of copyfile's other
> modes.  I did reflink as reflink only for the simple fact of doing one
> thing and doing it well, not because I think copyfile isn't good.
> 
> > Then the second interface would be something like
> > 
> >     int copyfile_ctrl(long cookie, unsigned long cmd);
> > 
> > where you'd just have some way to wait for completion and ask how much has 
> > been copied. The 'cmd' would be some set of 'cancel', 'status' or 
> > 'uninterruptible wait' or whatever, and the return value would again be
> > 
> >  - negative (with errno) for errors (copy failed) - cookie released
> >  - zero for 'done' - cookie released
> >  - positive for 'percent remaining' or whatever - cookie still valid
> > 
> > and this would be another callback into the filesystem code, but you'd 
> > never have to worry about it, since you'd never see it (just leave it 
> > NULL).
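
The return convention described above (negative: failed, zero: done, positive: still running) lends itself to a simple caller loop. The sketch below is only an illustration: the command numbers, the copyfile_drive helper, and the fake_ctrl stand-in backend are all invented here, and the control call is passed in as a function pointer because no such syscall exists.

```c
/* Placeholder command values; the thread names the commands but not numbers. */
enum {
    COPYFILE_CMD_STATUS = 1,
    COPYFILE_CMD_CANCEL = 2,
    COPYFILE_CMD_WAIT   = 3
};

typedef int (*copyfile_ctrl_fn)(long cookie, unsigned long cmd);

/*
 * Ask for status up to max_polls times.
 * Returns 0 when the copy completed, a negative value if it failed,
 * and -1 after cancelling a copy that was still running when we gave up.
 */
int copyfile_drive(copyfile_ctrl_fn ctrl, long cookie, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        int rc = ctrl(cookie, COPYFILE_CMD_STATUS);
        if (rc <= 0)
            return rc;          /* 0: done, <0: failed; cookie released */
        /* rc > 0: work remaining, cookie still valid, ask again */
    }
    /* How cancel reports its own result is not specified; ignore it here. */
    (void)ctrl(cookie, COPYFILE_CMD_CANCEL);
    return -1;
}

/* Stand-in backend for illustration: reports 50, then 10, then done. */
int fake_step;
int fake_ctrl(long cookie, unsigned long cmd)
{
    static const int remaining[] = { 50, 10, 0 };

    (void)cookie;
    if (cmd == COPYFILE_CMD_CANCEL)
        return 0;
    return remaining[fake_step < 2 ? fake_step++ : 2];
}
```

A real caller would sleep or use the 'uninterruptible wait' command between status checks rather than spinning; the loop only shows which return values release the cookie and which keep it alive.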
> 
>       I was going to ask about how to fit both calls into one inode
> operation, but I see you're giving this as an additional inode
> operation.
>       This leaves us with a similar-to-reflink inode copyfile op and a
> control op:
> 
>     ->copyfile(old_dentry, dir_inode, new_dentry, flags)
>     ->copyfile_ctl(int cookie, unsigned int cmd)
> 
>       I have to change the flags a little, as my original proposal
> didn't handle backoff correctly.
> 
> #define COPYFILE_WAIT           0x0001  /* Block until complete */
> #define COPYFILE_ATOMIC         0x0002  /* Things copied must be
>                                            point-in-time and it must
>                                            fail or succeed completely. */
> #define COPYFILE_ALLOW_COW      0x0004  /* The filesystem may share data
>                                            extents between the source
>                                            and target in a Copy-on-Write
>                                            fashion.  If neither
>                                            COPYFILE_ALLOW_COW nor
>                                            COPYFILE_REQUIRE_COW is
>                                            specified, data extents must
>                                            NOT be shared.  When neither
>                                            COW flag is provided, most
>                                            filesystems should return
>                                            -ENOTSUPP, as userspace can
>                                            do read-write looping
>                                            itself */
> #define COPYFILE_REQUIRE_COW    0x0008  /* Data extents MUST be shared
>                                            between the source and target
>                                            in a Copy-on-Write fashion */
> #define COPYFILE_UNPRIV_ATTRS   0x0010  /* Unprivileged attributes
>                                            should be copied from the
>                                            source to the target */
> #define COPYFILE_PRIV_ATTRS     0x0020  /* Privileged attributes should
>                                            be copied from the source to
>                                            the target if the caller has
>                                            the necessary privileges */
> #define COPYFILE_REQUIRE_ATTRS  0x0040  /* Combined with the other
>                                            attribute flags, the call
>                                            MUST fail if the caller lacks
>                                            the necessary privileges to
>                                            copy every attribute
>                                            requested */
> 
> #define COPYFILE_SNAPSHOT_ASYNC         (COPYFILE_REQUIRE_COW | \
>                                          COPYFILE_UNPRIV_ATTRS | \
>                                          COPYFILE_PRIV_ATTRS | \
>                                          COPYFILE_ATOMIC)
> #define COPYFILE_SNAPSHOT_STRICT_ASYNC  (COPYFILE_SNAPSHOT_ASYNC | \
>                                          COPYFILE_REQUIRE_ATTRS)
> #define COPYFILE_SNAPSHOT               (COPYFILE_SNAPSHOT_ASYNC | \
>                                          COPYFILE_WAIT)
> #define COPYFILE_SNAPSHOT_STRICT        (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
>                                          COPYFILE_WAIT)
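
For anyone trying to picture how a filesystem might interpret these flags, here is a small sketch. The flag values are the ones proposed above. The copyfile_check_flags helper, its fs_can_cow parameter, and the choice of -EINVAL for a REQUIRE_ATTRS without any attribute flag are my assumptions, not part of the proposal. Userspace has no ENOTSUPP, so the sketch returns -EOPNOTSUPP in its place.

```c
#include <errno.h>

#define COPYFILE_WAIT           0x0001
#define COPYFILE_ATOMIC         0x0002
#define COPYFILE_ALLOW_COW      0x0004
#define COPYFILE_REQUIRE_COW    0x0008
#define COPYFILE_UNPRIV_ATTRS   0x0010
#define COPYFILE_PRIV_ATTRS     0x0020
#define COPYFILE_REQUIRE_ATTRS  0x0040

#define COPYFILE_SNAPSHOT_ASYNC         (COPYFILE_REQUIRE_COW | \
                                         COPYFILE_UNPRIV_ATTRS | \
                                         COPYFILE_PRIV_ATTRS | \
                                         COPYFILE_ATOMIC)
#define COPYFILE_SNAPSHOT_STRICT_ASYNC  (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_REQUIRE_ATTRS)
#define COPYFILE_SNAPSHOT               (COPYFILE_SNAPSHOT_ASYNC | \
                                         COPYFILE_WAIT)
#define COPYFILE_SNAPSHOT_STRICT        (COPYFILE_SNAPSHOT_STRICT_ASYNC | \
                                         COPYFILE_WAIT)

/*
 * Hypothetical up-front check a filesystem could run on the flags.
 * fs_can_cow is nonzero if the filesystem can share extents.
 * Returns 0 if the request is acceptable, or a negative errno.
 */
int copyfile_check_flags(unsigned int flags, int fs_can_cow)
{
    unsigned int cow   = flags & (COPYFILE_ALLOW_COW | COPYFILE_REQUIRE_COW);
    unsigned int attrs = flags & (COPYFILE_UNPRIV_ATTRS | COPYFILE_PRIV_ATTRS);

    /* Assumption: REQUIRE_ATTRS is meaningless without an attribute flag. */
    if ((flags & COPYFILE_REQUIRE_ATTRS) && !attrs)
        return -EINVAL;

    /* No COW flag at all: a plain data copy, which userspace can do itself. */
    if (!cow)
        return -EOPNOTSUPP;

    /* Sharing is mandatory but this filesystem cannot share extents. */
    if ((flags & COPYFILE_REQUIRE_COW) && !fs_can_cow)
        return -EOPNOTSUPP;

    return 0;
}
```

Under these assumptions COPYFILE_SNAPSHOT is accepted only by a filesystem that can share extents, while COPYFILE_ALLOW_COW alone is accepted everywhere because it leaves the filesystem free to fall back to a real copy.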
> 
> > I dunno. The above seems like a fairly simple and powerful interface, and 
> > I _think_ it would be ok for NFS and CIFS. And in fact, if that whole 
> > "background copy" ends up being used a lot, maybe even a local filesystem 
> > would implement it just to get easy overlapping IO - even if it would just 
> > be a trivial common wrapper function that says "start a thread to do a 
> > trivial manual copy".
> 
>       NFS and CIFS folks, please speak up.
> 
> Joel
> 


