Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Mon, Jun 4, 2018 at 7:35 PM, Al Viro  wrote:
> On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
>> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
>> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, 
>> > flags)
>> > +{
>> > +   struct file *file;
>> > +   struct path path;
>> > +   int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
>> > +   bool detached = flags & OPEN_TREE_CLONE;
>> > +   int error;
>> > +   int fd;
>> > +
>> > +   BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
>>
>> Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
>> the fd returned by open_tree implicitly close-on-exec?  I can think of
>> no good reason for these file descriptors to be inherited across exec()
>
> How are they different from any file descriptor?  It's not as if it was
> something usable only for mounting stuff - again, you can use them
> with any ...at() syscalls.

Defaulting to close on exec helps keep out clutter from the API.

Is there a disadvantage to needing an explicit fcntl(F_SETFD) call to
disable close on exec?

Thanks,
Miklos


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Mon, Jun 4, 2018 at 7:35 PM, Al Viro  wrote:
> On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
>> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
>> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, 
>> > flags)
>> > +{
>> > +   struct file *file;
>> > +   struct path path;
>> > +   int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
>> > +   bool detached = flags & OPEN_TREE_CLONE;
>> > +   int error;
>> > +   int fd;
>> > +
>> > +   BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
>>
>> Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
>> the fd returned by open_tree implicitly close-on-exec?  I can think of
>> no good reason for these file descriptors to be inherited across exec()
>
> How are they different from any file descriptor?  It's not as if it was
> something usable only for mounting stuff - again, you can use them
> with any ...at() syscalls.

Defaulting to close on exec helps keep out clutter from the API.

Is there a disadvantage to needing an explicit fcntl(F_SETFD) call to
disable close on exec?

Thanks,
Miklos


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Mon, Jun 4, 2018 at 5:52 PM, Al Viro  wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
>
>> fsopen = create fsfd
>> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
>> fspick = path -> fsfd
>> move_mount = attach mountfd or move existing
>> fsinfo = info from path
>> open_tree = new mountfd from path or clone
>> mount_setattr = set attr on mountfd
>>
>> Notice that fsmount() encompasses mount_setattr() + move_mount()
>> functionality.   Split those out and leave fsmount() to actually do
>> the "fsfd ->mountfd" translation?
>
> Might make sense.

> FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
> attach it to the tree.

Ah, that leaves the mount_setattr() functionality to split out.   I'd
be more happy to rid this new API of all the old MS_* crap and have
have a new set of attributes, that just apply to mounts.   It will
also need  two args: a bitmap of new attributes and a mask to tell us
which attributes to change.

>  Commit message definitely needs updating - as it
> is, it's
>
> +SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, 
> ms_flags,
> +   void *, reserved4, void *, reserved5)
>
> PS: IMO these reserved... arguments are in bad taste; if anyone has good 
> reasons
> for that practice in ABI design, I'd like to hear those.

Agreed.  A flags argument is often wise to add even if currently
unused (and should be checked for undefined flags), but adding a
random number of pointers doesn't seem to make a lot of sense.

>
>> fsinfo() name suggests it's in the same class as
>> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
>> think that's slightly confusing.
>>
>> Rename move_mount() -> mount_move()?
>
> mount_move_bikeshed_bikeshed_bikeshed(), surely?

Consistent naming for related functions... not unheard of in API
design.  The above set definitely does not qualify.

>> Also does it make sense to make the cloning behavior of open_tree()
>> optional?  Without cloning it's just a plain open(O_PATH).  That  way
>> it could be renamed mount_clone().
>
> Umm...  I'm not sure about that one.  If nothing else, OPEN_TREE_DETACH
> might be a good idea, in which case cloning is not the primary effect;
> hell knows.

So conceptually we have the following distinct mount tree operations:

treefd = clone(path);
treefd = detach(path);
attach(treefd, path);
move(path1, path2);

The detach/move/attach trio are more related in functionality, while
clone and detach have the same signature. I'm not sure either.

Thanks,
Miklos


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Mon, Jun 4, 2018 at 5:52 PM, Al Viro  wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
>
>> fsopen = create fsfd
>> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
>> fspick = path -> fsfd
>> move_mount = attach mountfd or move existing
>> fsinfo = info from path
>> open_tree = new mountfd from path or clone
>> mount_setattr = set attr on mountfd
>>
>> Notice that fsmount() encompasses mount_setattr() + move_mount()
>> functionality.   Split those out and leave fsmount() to actually do
>> the "fsfd ->mountfd" translation?
>
> Might make sense.

> FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
> attach it to the tree.

Ah, that leaves the mount_setattr() functionality to split out.   I'd
be more happy to rid this new API of all the old MS_* crap and have
have a new set of attributes, that just apply to mounts.   It will
also need  two args: a bitmap of new attributes and a mask to tell us
which attributes to change.

>  Commit message definitely needs updating - as it
> is, it's
>
> +SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, 
> ms_flags,
> +   void *, reserved4, void *, reserved5)
>
> PS: IMO these reserved... arguments are in bad taste; if anyone has good 
> reasons
> for that practice in ABI design, I'd like to hear those.

Agreed.  A flags argument is often wise to add even if currently
unused (and should be checked for undefined flags), but adding a
random number of pointers doesn't seem to make a lot of sense.

>
>> fsinfo() name suggests it's in the same class as
>> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
>> think that's slightly confusing.
>>
>> Rename move_mount() -> mount_move()?
>
> mount_move_bikeshed_bikeshed_bikeshed(), surely?

Consistent naming for related functions... not unheard of in API
design.  The above set definitely does not qualify.

>> Also does it make sense to make the cloning behavior of open_tree()
>> optional?  Without cloning it's just a plain open(O_PATH).  That  way
>> it could be renamed mount_clone().
>
> Umm...  I'm not sure about that one.  If nothing else, OPEN_TREE_DETACH
> might be a good idea, in which case cloning is not the primary effect;
> hell knows.

So conceptually we have the following distinct mount tree operations:

treefd = clone(path);
treefd = detach(path);
attach(treefd, path);
move(path1, path2);

The detach/move/attach trio are more related in functionality, while
clone and detach have the same signature. I'm not sure either.

Thanks,
Miklos


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, 
> > flags)
> > +{
> > +   struct file *file;
> > +   struct path path;
> > +   int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> > +   bool detached = flags & OPEN_TREE_CLONE;
> > +   int error;
> > +   int fd;
> > +
> > +   BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> 
> Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
> the fd returned by open_tree implicitly close-on-exec?  I can think of
> no good reason for these file descriptors to be inherited across exec()

How are they different from any file descriptor?  It's not as if it was
something usable only for mounting stuff - again, you can use them
with any ...at() syscalls.

> and if someone comes up with such a reason, fcntl(F_SETFD) is not an
> expensive call to make.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 10:16:30AM -0700, Matthew Wilcox wrote:
> On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> > +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, 
> > flags)
> > +{
> > +   struct file *file;
> > +   struct path path;
> > +   int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> > +   bool detached = flags & OPEN_TREE_CLONE;
> > +   int error;
> > +   int fd;
> > +
> > +   BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
> 
> Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
> the fd returned by open_tree implicitly close-on-exec?  I can think of
> no good reason for these file descriptors to be inherited across exec()

How are they different from any file descriptor?  It's not as if it was
something usable only for mounting stuff - again, you can use them
with any ...at() syscalls.

> and if someone comes up with such a reason, fcntl(F_SETFD) is not an
> expensive call to make.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Matthew Wilcox
On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
> +{
> + struct file *file;
> + struct path path;
> + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> + bool detached = flags & OPEN_TREE_CLONE;
> + int error;
> + int fd;
> +
> + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);

Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
the fd returned by open_tree implicitly close-on-exec?  I can think of
no good reason for these file descriptors to be inherited across exec()
and if someone comes up with such a reason, fcntl(F_SETFD) is not an
expensive call to make.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Matthew Wilcox
On Sun, Jun 03, 2018 at 01:55:37AM +0100, Al Viro wrote:
> +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
> +{
> + struct file *file;
> + struct path path;
> + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
> + bool detached = flags & OPEN_TREE_CLONE;
> + int error;
> + int fd;
> +
> + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);

Why do we need OPEN_TREE_CLOEXEC?  Wouldn't we be better off just making
the fd returned by open_tree implicitly close-on-exec?  I can think of
no good reason for these file descriptors to be inherited across exec()
and if someone comes up with such a reason, fcntl(F_SETFD) is not an
expensive call to make.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 04:52:05PM +0100, Al Viro wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
> 
> > fsopen = create fsfd
> > fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> > fspick = path -> fsfd
> > move_mount = attach mountfd or move existing
> > fsinfo = info from path
> > open_tree = new mountfd from path or clone
> > mount_setattr = set attr on mountfd
> > 
> > Notice that fsmount() encompasses mount_setattr() + move_mount()
> > functionality.   Split those out and leave fsmount() to actually do
> > the "fsfd ->mountfd" translation?
> 
> Might make sense.

FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
attach it to the tree.  Commit message definitely needs updating - as it
is, it's

+SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, 
ms_flags,
+   void *, reserved4, void *, reserved5)

PS: IMO these reserved... arguments are in bad taste; if anyone has good reasons
for that practice in ABI design, I'd like to hear those.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 04:52:05PM +0100, Al Viro wrote:
> On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:
> 
> > fsopen = create fsfd
> > fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> > fspick = path -> fsfd
> > move_mount = attach mountfd or move existing
> > fsinfo = info from path
> > open_tree = new mountfd from path or clone
> > mount_setattr = set attr on mountfd
> > 
> > Notice that fsmount() encompasses mount_setattr() + move_mount()
> > functionality.   Split those out and leave fsmount() to actually do
> > the "fsfd ->mountfd" translation?
> 
> Might make sense.

FWIW, to make it clear: fsmount(2) in this series actually does *NOT*
attach it to the tree.  Commit message definitely needs updating - as it
is, it's

+SYSCALL_DEFINE5(fsmount, int, fs_fd, unsigned int, flags, unsigned int, 
ms_flags,
+   void *, reserved4, void *, reserved5)

PS: IMO these reserved... arguments are in bad taste; if anyone has good reasons
for that practice in ABI design, I'd like to hear those.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:

> fsopen = create fsfd
> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> fspick = path -> fsfd
> move_mount = attach mountfd or move existing
> fsinfo = info from path
> open_tree = new mountfd from path or clone
> mount_setattr = set attr on mountfd
> 
> Notice that fsmount() encompasses mount_setattr() + move_mount()
> functionality.   Split those out and leave fsmount() to actually do
> the "fsfd ->mountfd" translation?

Might make sense.

> fsinfo() name suggests it's in the same class as
> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
> think that's slightly confusing.
> 
> Rename move_mount() -> mount_move()?

mount_move_bikeshed_bikeshed_bikeshed(), surely?

> Also does it make sense to make the cloning behavior of open_tree()
> optional?  Without cloning it's just a plain open(O_PATH).  That  way
> it could be renamed mount_clone().

Umm...  I'm not sure about that one.  If nothing else, OPEN_TREE_DETACH
might be a good idea, in which case cloning is not the primary effect;
hell knows.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Al Viro
On Mon, Jun 04, 2018 at 12:34:44PM +0200, Miklos Szeredi wrote:

> fsopen = create fsfd
> fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
> fspick = path -> fsfd
> move_mount = attach mountfd or move existing
> fsinfo = info from path
> open_tree = new mountfd from path or clone
> mount_setattr = set attr on mountfd
> 
> Notice that fsmount() encompasses mount_setattr() + move_mount()
> functionality.   Split those out and leave fsmount() to actually do
> the "fsfd ->mountfd" translation?

Might make sense.

> fsinfo() name suggests it's in the same class as
> fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
> think that's slightly confusing.
> 
> Rename move_mount() -> mount_move()?

mount_move_bikeshed_bikeshed_bikeshed(), surely?

> Also does it make sense to make the cloning behavior of open_tree()
> optional?  Without cloning it's just a plain open(O_PATH).  That  way
> it could be renamed mount_clone().

Umm...  I'm not sure about that one.  If nothing else, OPEN_TREE_DETACH
might be a good idea, in which case cloning is not the primary effect;
hell knows.


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread David Howells
Miklos Szeredi  wrote:

> fsinfo = info from path

Actually, I was thinking of making fsinfo() detect if it's been given an fsfd
and go through an fs_context operation instead in that case.

David


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread David Howells
Miklos Szeredi  wrote:

> fsinfo = info from path

Actually, I was thinking of making fsinfo() detect if it's been given an fsfd
and go through an fs_context operation instead in that case.

David


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Sun, Jun 3, 2018 at 2:55 AM, Al Viro  wrote:
> On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:
>
>> > > Hell, might even add AT_UMOUNT for "open root and detach, to be 
>> > > dissolved on
>> > > close", incompatible with AT_CLONE.
>> >
>> > Cute.  Guess you could do:
>> >
>> > fd = open_mount(..., OPEN_TREE_DETACH);
>> > mount_setattr(fd, "",
>> >   MOUNT_SETATTR_EMPTY_PATH,
>> >   MOUNT_SETATTR_NOSUID, NULL);
>> > move_mount(fd, "", ...);
>
> Hadn't added that yet, but as for the rest of open_tree() - see
> vfs.git#mount-open_tree.  open() and its flags are not touched at all.
> Other changes compared to the previous:
> * may_mount() is required for OPEN_TREE_CLONE
> * sys_ni.c cruft is dropped - those make no sense until and unless
> those syscalls become conditional.
>
> Appears to work, combined with move_mount() it yields eqiuvalents of
> mount --{move,bind,rbind}.  Combined with mount_setattr(2) (when that
> gets added) we'll get mount -o remount,bind,nodev et.al.

fsopen = create fsfd
fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
fspick = path -> fsfd
move_mount = attach mountfd or move existing
fsinfo = info from path
open_tree = new mountfd from path or clone
mount_setattr = set attr on mountfd

Notice that fsmount() encompasses mount_setattr() + move_mount()
functionality.   Split those out and leave fsmount() to actually do
the "fsfd ->mountfd" translation?

fsinfo() name suggests it's in the same class as
fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
think that's slightly confusing.

Rename move_mount() -> mount_move()?

Also does it make sense to make the cloning behavior of open_tree()
optional?  Without cloning it's just a plain open(O_PATH).  That  way
it could be renamed mount_clone().

Thanks,
Miklos


Re: [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-04 Thread Miklos Szeredi
On Sun, Jun 3, 2018 at 2:55 AM, Al Viro  wrote:
> On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:
>
>> > > Hell, might even add AT_UMOUNT for "open root and detach, to be 
>> > > dissolved on
>> > > close", incompatible with AT_CLONE.
>> >
>> > Cute.  Guess you could do:
>> >
>> > fd = open_mount(..., OPEN_TREE_DETACH);
>> > mount_setattr(fd, "",
>> >   MOUNT_SETATTR_EMPTY_PATH,
>> >   MOUNT_SETATTR_NOSUID, NULL);
>> > move_mount(fd, "", ...);
>
> Hadn't added that yet, but as for the rest of open_tree() - see
> vfs.git#mount-open_tree.  open() and its flags are not touched at all.
> Other changes compared to the previous:
> * may_mount() is required for OPEN_TREE_CLONE
> * sys_ni.c cruft is dropped - those make no sense until and unless
> those syscalls become conditional.
>
> Appears to work, combined with move_mount() it yields eqiuvalents of
> mount --{move,bind,rbind}.  Combined with mount_setattr(2) (when that
> gets added) we'll get mount -o remount,bind,nodev et.al.

fsopen = create fsfd
fsmount = fsfd -> mountfd & set attr on mountfd & attach mountfd
fspick = path -> fsfd
move_mount = attach mountfd or move existing
fsinfo = info from path
open_tree = new mountfd from path or clone
mount_setattr = set attr on mountfd

Notice that fsmount() encompasses mount_setattr() + move_mount()
functionality.   Split those out and leave fsmount() to actually do
the "fsfd ->mountfd" translation?

fsinfo() name suggests it's in the same class as
fsopen/fsmount/fspick, operating on fsfd object, but's it's not and I
think that's slightly confusing.

Rename move_mount() -> mount_move()?

Also does it make sense to make the cloning behavior of open_tree()
optional?  Without cloning it's just a plain open(O_PATH).  That  way
it could be renamed mount_clone().

Thanks,
Miklos


[PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-02 Thread Al Viro
On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:

> > > Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved 
> > > on
> > > close", incompatible with AT_CLONE.
> > 
> > Cute.  Guess you could do:
> > 
> > fd = open_mount(..., OPEN_TREE_DETACH);
> > mount_setattr(fd, "",
> >   MOUNT_SETATTR_EMPTY_PATH,
> >   MOUNT_SETATTR_NOSUID, NULL);
> > move_mount(fd, "", ...);

Hadn't added that yet, but as for the rest of open_tree() - see
vfs.git#mount-open_tree.  open() and its flags are not touched at all.
Other changes compared to the previous:
* may_mount() is required for OPEN_TREE_CLONE
* sys_ni.c cruft is dropped - those make no sense until and unless
those syscalls become conditional.

Appears to work, combined with move_mount() it yields eqiuvalents of
mount --{move,bind,rbind}.  Combined with mount_setattr(2) (when that
gets added) we'll get mount -o remount,bind,nodev et.al.
(including the currently absent whole-subtree versions) and
mount --make-{r,}{shared,slave,private,unbindable}

It also can be used to get an isolated subtree usable for at()
stuff.

The addition of syscall itself is done by the following and I'd really
like linux-abi folks to comment on that puppy

commit 6cfba4dd99b10278c2156c8d4fced2eddedf167f
Author: Al Viro 
Date:   Sat Jun 2 19:42:22 2018 -0400

new syscall: open_tree(2)

open_tree(dfd, pathname, flags)

Returns an O_PATH-opened file descriptor or an error.
dfd and pathname specify the location to open, in usual
fashion (see e.g. fstatat(2)).  flags should be an OR of
some of the following:
* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
same meanings as usual
* OPEN_TREE_CLOEXEC - make the resulting descriptor
close-on-exec
* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
instead of opening the location in question, create a detached
mount tree matching the subtree rooted at location specified by
dfd/pathname.  With AT_RECURSIVE the entire subtree is cloned,
without it - only the part within in the mount containing the
location in question.  In other words, the same as mount --rbind
or mount --bind would've taken.  The detached tree will be
dissolved on the final close of obtained file.  Creation of such
detached trees requires the same capabilities as doing mount --bind.

Signed-off-by: Al Viro 

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 14a2f996e543..b2b44ecd2b17 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -397,3 +397,4 @@
 383i386statx   sys_statx   
__ia32_sys_statx
 384i386arch_prctl  sys_arch_prctl  
__ia32_compat_sys_arch_prctl
 385i386io_pgetevents   sys_io_pgetevents   
__ia32_compat_sys_io_pgetevents
+391i386open_tree   sys_open_tree   
__ia32_sys_open_tree
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index cd36232ab62f..d6f4949378e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -342,6 +342,7 @@
 331common  pkey_free   __x64_sys_pkey_free
 332common  statx   __x64_sys_statx
 333common  io_pgetevents   __x64_sys_io_pgetevents
+339common  open_tree   __x64_sys_open_tree
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/file_table.c b/fs/file_table.c
index 7ec0b3e5f05d..7480271a0d21 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -189,6 +189,7 @@ static void __fput(struct file *file)
struct dentry *dentry = file->f_path.dentry;
struct vfsmount *mnt = file->f_path.mnt;
struct inode *inode = file->f_inode;
+   fmode_t mode = file->f_mode;
 
might_sleep();
 
@@ -209,14 +210,14 @@ static void __fput(struct file *file)
file->f_op->release(inode, file);
security_file_free(file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
-!(file->f_mode & FMODE_PATH))) {
+!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev);
}
fops_put(file->f_op);
put_pid(file->f_owner.pid);
-   if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+   if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
i_readcount_dec(inode);
-   if (file->f_mode & FMODE_WRITER) {
+   if (mode & FMODE_WRITER) {
put_write_access(inode);
__mnt_drop_write(mnt);
}
@@ -224,6 +225,8 @@ static void __fput(struct file *file)
file->f_path.mnt = NULL;
file->f_inode = NULL;

[PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8])

2018-06-02 Thread Al Viro
On Sat, Jun 02, 2018 at 06:49:58PM +0100, Al Viro wrote:

> > > Hell, might even add AT_UMOUNT for "open root and detach, to be dissolved 
> > > on
> > > close", incompatible with AT_CLONE.
> > 
> > Cute.  Guess you could do:
> > 
> > fd = open_mount(..., OPEN_TREE_DETACH);
> > mount_setattr(fd, "",
> >   MOUNT_SETATTR_EMPTY_PATH,
> >   MOUNT_SETATTR_NOSUID, NULL);
> > move_mount(fd, "", ...);

Hadn't added that yet, but as for the rest of open_tree() - see
vfs.git#mount-open_tree.  open() and its flags are not touched at all.
Other changes compared to the previous:
* may_mount() is required for OPEN_TREE_CLONE
* sys_ni.c cruft is dropped - those make no sense until and unless
those syscalls become conditional.

Appears to work, combined with move_mount() it yields eqiuvalents of
mount --{move,bind,rbind}.  Combined with mount_setattr(2) (when that
gets added) we'll get mount -o remount,bind,nodev et.al.
(including the currently absent whole-subtree versions) and
mount --make-{r,}{shared,slave,private,unbindable}

It also can be used to get an isolated subtree usable for at()
stuff.

The addition of syscall itself is done by the following and I'd really
like linux-abi folks to comment on that puppy

commit 6cfba4dd99b10278c2156c8d4fced2eddedf167f
Author: Al Viro 
Date:   Sat Jun 2 19:42:22 2018 -0400

new syscall: open_tree(2)

open_tree(dfd, pathname, flags)

Returns an O_PATH-opened file descriptor or an error.
dfd and pathname specify the location to open, in usual
fashion (see e.g. fstatat(2)).  flags should be an OR of
some of the following:
* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
same meanings as usual
* OPEN_TREE_CLOEXEC - make the resulting descriptor
close-on-exec
* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
instead of opening the location in question, create a detached
mount tree matching the subtree rooted at location specified by
dfd/pathname.  With AT_RECURSIVE the entire subtree is cloned,
without it - only the part within in the mount containing the
location in question.  In other words, the same as mount --rbind
or mount --bind would've taken.  The detached tree will be
dissolved on the final close of obtained file.  Creation of such
detached trees requires the same capabilities as doing mount --bind.

Signed-off-by: Al Viro 

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 14a2f996e543..b2b44ecd2b17 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -397,3 +397,4 @@
 383i386statx   sys_statx   
__ia32_sys_statx
 384i386arch_prctl  sys_arch_prctl  
__ia32_compat_sys_arch_prctl
 385i386io_pgetevents   sys_io_pgetevents   
__ia32_compat_sys_io_pgetevents
+391i386open_tree   sys_open_tree   
__ia32_sys_open_tree
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index cd36232ab62f..d6f4949378e7 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -342,6 +342,7 @@
 331common  pkey_free   __x64_sys_pkey_free
 332common  statx   __x64_sys_statx
 333common  io_pgetevents   __x64_sys_io_pgetevents
+339common  open_tree   __x64_sys_open_tree
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/file_table.c b/fs/file_table.c
index 7ec0b3e5f05d..7480271a0d21 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -189,6 +189,7 @@ static void __fput(struct file *file)
struct dentry *dentry = file->f_path.dentry;
struct vfsmount *mnt = file->f_path.mnt;
struct inode *inode = file->f_inode;
+   fmode_t mode = file->f_mode;
 
might_sleep();
 
@@ -209,14 +210,14 @@ static void __fput(struct file *file)
file->f_op->release(inode, file);
security_file_free(file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
-!(file->f_mode & FMODE_PATH))) {
+!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev);
}
fops_put(file->f_op);
put_pid(file->f_owner.pid);
-   if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+   if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
i_readcount_dec(inode);
-   if (file->f_mode & FMODE_WRITER) {
+   if (mode & FMODE_WRITER) {
put_write_access(inode);
__mnt_drop_write(mnt);
}
@@ -224,6 +225,8 @@ static void __fput(struct file *file)
file->f_path.mnt = NULL;
file->f_inode = NULL;