Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts

2018-03-12 Thread Linus Torvalds
On Mon, Mar 12, 2018 at 1:10 PM, Christian Brauner
 wrote:
>>
>> So let's just change the behavior of devpts_mntget error if a mount with
>> the pty underneath it can not be found.
>
> I'm sort of torn but I think I'd prefer changing the behavior too. The
> /dev/ptmx, /dev/pts/ptmx schisma is really a special case and I would
> like it to be the only one. I really don't want to support bind-mounting
> the ptmx character device under the devpts mounts across the system.
> That seems like unnecessary complexity that is also unused. So if Linus
> is fine with this change as well I don't mind sending a v4.

I'm fine with whatever error handling - I cannot imagine that it matters.

And if it does, and somebody finds somebody who depends on insane
behavior, we can revisit it then,

But fundamentally. I think there are only two valid situations:

 - people use devpts as-is (so the ptmx node is inherently in the same
subdirectory as the pts nodes, and is fundamentally the same
filesystem)

   NOTE! In this case, the ptmx node is *never* a bind-mount. It is
simply part of the pts directory as exported by devpts.

   This is the "people actually use /dev/pts/ptmx" case. It's fairly
rare, I think.

 - the parent directory ptmx node is an arbitrary ptmx node (symlink,
bind mount, or mknod)

   NOTE! In this case, the ptmx node is not necessarily associated
with the pts directory in any way, and the only thing that matters is
that it's the right special character device.

Honestly, I personally detest case #1, even if some people think it's
the "clean" case. It isn't particularly clean, since it fundamentally
is not how /dev/ptmx has traditionally ever worked. It would have been
clean had that been how ptmx worked traditionally, but that' ssimply
not the case.

But if something really odd happens, and we cannot find a pts
subdirectory, or there is not a pts node, or not a ptmx node in that
subdirectory, I think all bets are pretty much off path-wise.
Returning an error is probably better than returning a bogus path.

 Linus


Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts

2018-03-12 Thread Eric W. Biederman
Christian Brauner  writes:

> On Mon, Mar 12, 2018 at 02:52:53PM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > Most libcs will still look at /dev/ptmx when opening the master fd of a pty
>> > device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
>> > ioctl() is used to safely retrieve a file descriptor for the slave side of
>> > the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
>> > point to /.
>> >
>> > When the kernel tries to look up the root mount of the dentry for the slave
>> > file descriptor it will detect that the dentry is escaping its bind-mount
>> > since the root mount of the dentry is /dev/pts where the devpts is mounted
>> > but the root mount of /dev/ptmx is /dev.
>> >
>> > Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
>> > regression. In addition, it is also a fairly common scenario in containers
>> > employing user namespaces.
>> >
>> > To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
>> > walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
>> > contents of /proc//fd/ symlinks attached to the slave side of a
>> > file descriptor will always point to a path under the devpts mount we need
>> > to try and ensure that the kernel doesn't falsely get the impression that a
>> > pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
>> > file descriptor opened via a bind-mount of the ptmx device escapes its
>> > bind-mount. To clarify in pseudo code:
>> >
>> > * bind-mount /dev/pts/ptmx to /dev/ptmx
>> > * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
>> > * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
>> >
>> > would cause the kernel to think that slave is escaping its bind-mount. The
>> > reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
>> > /dev as its parent mount:
>> >
>> > 21 -- -- / /dev
>> > -- 21 -- / /dev/pts
>> >
>> > they are on different devices
>> >
>> > -- -- 0:6  / /dev
>> > -- -- 0:20 / /dev/pts
>> >
>> > which has the consequence that the pathname of the directory which forms
>> > the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
>> > /dev/ptmx we will end up on the same device as the devtmpfs mount at
>> > /dev/pts
>> >
>> > -- -- 0:20 /ptmx /dev/ptmx
>> >
>> > Without the bind-mount resolution patch here the kernel will now perform
>> > the bind-mount escape check directly on /dev/ptmx. When it hits
>> > devpts_ptmx_path()  calls pts_path() which in turn calls
>> > path_parent_directory(). While one would expect that
>> > path_parent_directory() *should* yield /dev it will yield / since
>> > /dev and /dev/pts are on different devices. This will cause path_pts() to
>> > fail finding a "pts" directory since there is none under /. Thus, the
>> > kernel detects that /dev/ptmx is escaping its bind-mount and will set
>> > /proc//fd/ to /.
>> >
>> > This patch changes the logic to do bind-mount resolution and after the
>> > bind-mount has been resolved (i.e. we have traced it back to the devpts
>> > mount) we can safely perform devpts_ptmx_path() and check whether we find a
>> > "pts" directory in the parent directory of the devpts mount. Since
>> > path_parent_directory() will now correctly yield /dev as parent directory
>> > for the devpts mount at /dev/pts.
>> >
>> > However, we can only perform devpts_ptmx_path() devpts_mntget() if we
>> > either did resolve a bind-mount or did not find a suitable devpts
>> > filesystem. The reason is that we want and need to support non-standard
>> > mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
>> > although we did already find a devpts filesystem and did not resolve
>> > bind-mounts we will fail on devpts mounts such as:
>> >
>> > mount -t devpts devpts /mnt
>> >
>> > where no "pts" directory will be under /. So change the logic to account
>> > for this.
>> >
>> > Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
>> > its openpty() implementation:
>> >
>> > unshare --mount
>> > mount --bind /dev/pts/ptmx /dev/ptmx
>> > chmod 666 /dev/ptmx
>> > script
>> > ls -al /proc/self/fd/0
>> >
>> > with output:
>> >
>> > lrwx-- 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
>> >
>> > Signed-off-by: Christian Brauner 
>> > Suggested-by: Eric Biederman 
>> > Suggested-by: Linus Torvalds 
>> > ---
>> > ChangeLog v2->v3:
>> > * rework logic to account for non-standard devpts mounts such as
>> >   mount -t devpts devpts /mnt
>> > ChangeLog v1->v2:
>> > * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>> >   condition to separate patch with non-functional changes
>> > ChangeLog v0->v1:
>> > * remove
>> > /* Has the devpts filesystem already been found? */
>> > if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>> >return 0
>> >   from devpts_ptmx_path()
>> > * check superblock after devpts_ptmx_path() return

Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts

2018-03-12 Thread Christian Brauner
On Mon, Mar 12, 2018 at 02:52:53PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> > device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> > ioctl() is used to safely retrieve a file descriptor for the slave side of
> > the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> > point to /.
> >
> > When the kernel tries to look up the root mount of the dentry for the slave
> > file descriptor it will detect that the dentry is escaping its bind-mount
> > since the root mount of the dentry is /dev/pts where the devpts is mounted
> > but the root mount of /dev/ptmx is /dev.
> >
> > Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> > regression. In addition, it is also a fairly common scenario in containers
> > employing user namespaces.
> >
> > To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
> > walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
> > contents of /proc//fd/ symlinks attached to the slave side of a
> > file descriptor will always point to a path under the devpts mount we need
> > to try and ensure that the kernel doesn't falsely get the impression that a
> > pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
> > file descriptor opened via a bind-mount of the ptmx device escapes its
> > bind-mount. To clarify in pseudo code:
> >
> > * bind-mount /dev/pts/ptmx to /dev/ptmx
> > * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
> > * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
> >
> > would cause the kernel to think that slave is escaping its bind-mount. The
> > reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
> > /dev as its parent mount:
> >
> > 21 -- -- / /dev
> > -- 21 -- / /dev/pts
> >
> > they are on different devices
> >
> > -- -- 0:6  / /dev
> > -- -- 0:20 / /dev/pts
> >
> > which has the consequence that the pathname of the directory which forms
> > the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
> > /dev/ptmx we will end up on the same device as the devtmpfs mount at
> > /dev/pts
> >
> > -- -- 0:20 /ptmx /dev/ptmx
> >
> > Without the bind-mount resolution patch here the kernel will now perform
> > the bind-mount escape check directly on /dev/ptmx. When it hits
> > devpts_ptmx_path()  calls pts_path() which in turn calls
> > path_parent_directory(). While one would expect that
> > path_parent_directory() *should* yield /dev it will yield / since
> > /dev and /dev/pts are on different devices. This will cause path_pts() to
> > fail finding a "pts" directory since there is none under /. Thus, the
> > kernel detects that /dev/ptmx is escaping its bind-mount and will set
> > /proc//fd/ to /.
> >
> > This patch changes the logic to do bind-mount resolution and after the
> > bind-mount has been resolved (i.e. we have traced it back to the devpts
> > mount) we can safely perform devpts_ptmx_path() and check whether we find a
> > "pts" directory in the parent directory of the devpts mount. Since
> > path_parent_directory() will now correctly yield /dev as parent directory
> > for the devpts mount at /dev/pts.
> >
> > However, we can only perform devpts_ptmx_path() devpts_mntget() if we
> > either did resolve a bind-mount or did not find a suitable devpts
> > filesystem. The reason is that we want and need to support non-standard
> > mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
> > although we did already find a devpts filesystem and did not resolve
> > bind-mounts we will fail on devpts mounts such as:
> >
> > mount -t devpts devpts /mnt
> >
> > where no "pts" directory will be under /. So change the logic to account
> > for this.
> >
> > Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
> > its openpty() implementation:
> >
> > unshare --mount
> > mount --bind /dev/pts/ptmx /dev/ptmx
> > chmod 666 /dev/ptmx
> > script
> > ls -al /proc/self/fd/0
> >
> > with output:
> >
> > lrwx-- 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
> >
> > Signed-off-by: Christian Brauner 
> > Suggested-by: Eric Biederman 
> > Suggested-by: Linus Torvalds 
> > ---
> > ChangeLog v2->v3:
> > * rework logic to account for non-standard devpts mounts such as
> >   mount -t devpts devpts /mnt
> > ChangeLog v1->v2:
> > * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> >   condition to separate patch with non-functional changes
> > ChangeLog v0->v1:
> > * remove
> > /* Has the devpts filesystem already been found? */
> > if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> > return 0
> >   from devpts_ptmx_path()
> > * check superblock after devpts_ptmx_path() returned
> > ---
> >  fs/devpts/inode.c | 24 
> >  1 file changed, 16 insertions(+), 8 deletions(-)
> >
> > diff --git 

Re: [PATCH 2/3 v3] devpts: resolve devpts bind-mounts

2018-03-12 Thread Eric W. Biederman
Christian Brauner  writes:

> Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> ioctl() is used to safely retrieve a file descriptor for the slave side of
> the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> point to /.
>
> When the kernel tries to look up the root mount of the dentry for the slave
> file descriptor it will detect that the dentry is escaping its bind-mount
> since the root mount of the dentry is /dev/pts where the devpts is mounted
> but the root mount of /dev/ptmx is /dev.
>
> Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> regression. In addition, it is also a fairly common scenario in containers
> employing user namespaces.
>
> To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
> walk up the bind-mounts for /dev/ptmx in devpts_mntget(). Since the
> contents of /proc//fd/ symlinks attached to the slave side of a
> file descriptor will always point to a path under the devpts mount we need
> to try and ensure that the kernel doesn't falsely get the impression that a
> pty slave file descriptor retrieved via TIOCGPTPEER based on a pty master
> file descriptor opened via a bind-mount of the ptmx device escapes its
> bind-mount. To clarify in pseudo code:
>
> * bind-mount /dev/pts/ptmx to /dev/ptmx
> * master = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_CLOEXEC);
> * slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY | O_CLOEXEC);
>
> would cause the kernel to think that slave is escaping its bind-mount. The
> reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at
> /dev as its parent mount:
>
> 21 -- -- / /dev
> -- 21 -- / /dev/pts
>
> they are on different devices
>
> -- -- 0:6  / /dev
> -- -- 0:20 / /dev/pts
>
> which has the consequence that the pathname of the directory which forms
> the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
> /dev/ptmx we will end up on the same device as the devtmpfs mount at
> /dev/pts
>
> -- -- 0:20 /ptmx /dev/ptmx
>
> Without the bind-mount resolution patch here the kernel will now perform
> the bind-mount escape check directly on /dev/ptmx. When it hits
> devpts_ptmx_path()  calls pts_path() which in turn calls
> path_parent_directory(). While one would expect that
> path_parent_directory() *should* yield /dev it will yield / since
> /dev and /dev/pts are on different devices. This will cause path_pts() to
> fail finding a "pts" directory since there is none under /. Thus, the
> kernel detects that /dev/ptmx is escaping its bind-mount and will set
> /proc//fd/ to /.
>
> This patch changes the logic to do bind-mount resolution and after the
> bind-mount has been resolved (i.e. we have traced it back to the devpts
> mount) we can safely perform devpts_ptmx_path() and check whether we find a
> "pts" directory in the parent directory of the devpts mount. Since
> path_parent_directory() will now correctly yield /dev as parent directory
> for the devpts mount at /dev/pts.
>
> However, we can only perform devpts_ptmx_path() devpts_mntget() if we
> either did resolve a bind-mount or did not find a suitable devpts
> filesystem. The reason is that we want and need to support non-standard
> mountpoints for the devpts filesystem. If we call devpts_ptmx_path()
> although we did already find a devpts filesystem and did not resolve
> bind-mounts we will fail on devpts mounts such as:
>
> mount -t devpts devpts /mnt
>
> where no "pts" directory will be under /. So change the logic to account
> for this.
>
> Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
> its openpty() implementation:
>
> unshare --mount
> mount --bind /dev/pts/ptmx /dev/ptmx
> chmod 666 /dev/ptmx
> script
> ls -al /proc/self/fd/0
>
> with output:
>
> lrwx-- 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /
>
> Signed-off-by: Christian Brauner 
> Suggested-by: Eric Biederman 
> Suggested-by: Linus Torvalds 
> ---
> ChangeLog v2->v3:
> * rework logic to account for non-standard devpts mounts such as
>   mount -t devpts devpts /mnt
> ChangeLog v1->v2:
> * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>   condition to separate patch with non-functional changes
> ChangeLog v0->v1:
> * remove
> /* Has the devpts filesystem already been found? */
> if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>   return 0
>   from devpts_ptmx_path()
> * check superblock after devpts_ptmx_path() returned
> ---
>  fs/devpts/inode.c | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index d3ddbb74..b31362c6c548 100644
> --- a/fs/devpts/inode.c
> +++ b/fs/devpts/inode.c
> @@ -154,27 +154,35 @@ static int devpts_ptmx_path(struct path *path)

There is a question I asked in my original version and I haven't
seen it answered.

W