Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-12 Thread Christian Brauner
On Sun, Mar 11, 2018 at 02:46:26PM -0700, Linus Torvalds wrote:
> On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner
>  wrote:
> >
> > This is the second iteration of this patch.
> 
> This looks good to me. Just wondering how this should be merged, and
> whether we should have a Cc: stable for it?
> 
> .. and, just in case, maybe Al can verify that there's nothing subtle
> about follow_up() that we need to worry about. That said, NFS already

Right, sorry I forgot to CC him again after the first version of the
patch.

> has that exact same loop for follow_to_parent(), just syntactically
> slightly different version.

Once we got devpts worked out I can send a follow-up patch that adds a
helper to namei.c that walks up bind-mounts. Seems it would make sense
as I can see this being useful in general. To be honest, before this I
had no real concept what resolving a bind-mount inside the kernel means
which was why I was worried to just bang out a patch without discussing
it first. Anyway, that should be a small follow-up.

> 
> In fact, I wonder if we even need to do that
> 
> if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
> (path.mnt->mnt_root == fsi->ptmx_dentry)) {
> 
> and maybe we could do the follow_up() loop unconditionally?
> 
> Because if the ptmx dentry is *not* a bind mount, then the loop will
> be a no-op, and if it *is* a bind-mount, then I'm not convinced we
> should even try to just limit it to the devpts case - maybe somebody
> did a bind-mount on just a legacy ptmx device node?

Hm, I think we want to keep the condition and the reasoning points to a
snag in the current patch I think:
So in case that e.g. /dev/ptmx or /some/other/place is indeed a
bind-mount of a devpts mounted somewhere else we want to give userspace
a chance and check whether we find a suitable "pts" directory in the
same directory where the bind-mount is. This ensures that the paths in
/proc//fd/ are correct and that operations like
readlink("/proc//fd/", buf, sizeof(buf))
chown(buf, 0, 0)
are actually sane and don't suddenly chown() / instead of
//. This also let's us easily support libcs still going
through /dev/ptmx instead of /dev/pts/ptmx [1].
But in case the passed in path is not a bind-mount we actually don't
want to call devpts_ptmx_path() without the

/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC)

check being performed because if (path->mnt->mnt_sb->s_magic == 
DEVPTS_SUPER_MAGIC)
holds and we still call path_pts() it will try to look for a "pts" mount
in the parent directory but there's no requirement that I know of for a
devpts to only be mounted at /dev/pts or //pts. This means
that the current logic would cause us to e.g. break stuff like
mount -t devpts devpts /wherever which we shouldn't do. (That's a
discussion we had in the previous round of devpts fixes.) So I think we
actually want devpts_mntget() to do:

struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
{
bool unwind;
struct path path;
int err = 0;

path = filp->f_path;
path_get();

unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
 (path.mnt->mnt_root == fsi->ptmx_dentry);
/* Walk upward while the start point is a bind mount of
 * a single file.
 */
while (path.mnt->mnt_root == path.dentry && unwind)
if (follow_up() == 0)
break;

if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
err = devpts_ptmx_path();
dput(path.dentry);
if (err) {
mntput(path.mnt);
return ERR_PTR(err);
}

if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
mntput(path.mnt);
return ERR_PTR(-ENODEV);
}

return path.mnt;
}

Does that look sane?

I'll send a new version of the patch today and will add an additional
test for devpts being mounted at a different location.

Christian

> 
> So that "if()" actually seems to be to be superfluous, and only limit
> the "follow bind mounts' case unnecessarily. Hmm?


Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-12 Thread Christian Brauner
On Sun, Mar 11, 2018 at 02:46:26PM -0700, Linus Torvalds wrote:
> On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner
>  wrote:
> >
> > This is the second iteration of this patch.
> 
> This looks good to me. Just wondering how this should be merged, and
> whether we should have a Cc: stable for it?
> 
> .. and, just in case, maybe Al can verify that there's nothing subtle
> about follow_up() that we need to worry about. That said, NFS already

Right, sorry I forgot to CC him again after the first version of the
patch.

> has that exact same loop for follow_to_parent(), just syntactically
> slightly different version.

Once we got devpts worked out I can send a follow-up patch that adds a
helper to namei.c that walks up bind-mounts. Seems it would make sense
as I can see this being useful in general. To be honest, before this I
had no real concept what resolving a bind-mount inside the kernel means
which was why I was worried to just bang out a patch without discussing
it first. Anyway, that should be a small follow-up.

> 
> In fact, I wonder if we even need to do that
> 
> if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
> (path.mnt->mnt_root == fsi->ptmx_dentry)) {
> 
> and maybe we could do the follow_up() loop unconditionally?
> 
> Because if the ptmx dentry is *not* a bind mount, then the loop will
> be a no-op, and if it *is* a bind-mount, then I'm not convinced we
> should even try to just limit it to the devpts case - maybe somebody
> did a bind-mount on just a legacy ptmx device node?

Hm, I think we want to keep the condition and the reasoning points to a
snag in the current patch I think:
So in case that e.g. /dev/ptmx or /some/other/place is indeed a
bind-mount of a devpts mounted somewhere else we want to give userspace
a chance and check whether we find a suitable "pts" directory in the
same directory where the bind-mount is. This ensures that the paths in
/proc//fd/ are correct and that operations like
readlink("/proc//fd/", buf, sizeof(buf))
chown(buf, 0, 0)
are actually sane and don't suddenly chown() / instead of
//. This also let's us easily support libcs still going
through /dev/ptmx instead of /dev/pts/ptmx [1].
But in case the passed in path is not a bind-mount we actually don't
want to call devpts_ptmx_path() without the

/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC)

check being performed because if (path->mnt->mnt_sb->s_magic == 
DEVPTS_SUPER_MAGIC)
holds and we still call path_pts() it will try to look for a "pts" mount
in the parent directory but there's no requirement that I know of for a
devpts to only be mounted at /dev/pts or //pts. This means
that the current logic would cause us to e.g. break stuff like
mount -t devpts devpts /wherever which we shouldn't do. (That's a
discussion we had in the previous round of devpts fixes.) So I think we
actually want devpts_mntget() to do:

struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi)
{
bool unwind;
struct path path;
int err = 0;

path = filp->f_path;
path_get();

unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
 (path.mnt->mnt_root == fsi->ptmx_dentry);
/* Walk upward while the start point is a bind mount of
 * a single file.
 */
while (path.mnt->mnt_root == path.dentry && unwind)
if (follow_up() == 0)
break;

if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind)
err = devpts_ptmx_path();
dput(path.dentry);
if (err) {
mntput(path.mnt);
return ERR_PTR(err);
}

if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) {
mntput(path.mnt);
return ERR_PTR(-ENODEV);
}

return path.mnt;
}

Does that look sane?

I'll send a new version of the patch today and will add an additional
test for devpts being mounted at a different location.

Christian

> 
> So that "if()" actually seems to be to be superfluous, and only limit
> the "follow bind mounts' case unnecessarily. Hmm?


Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-11 Thread Linus Torvalds
On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner
 wrote:
>
> This is the second iteration of this patch.

This looks good to me. Just wondering how this should be merged, and
whether we should have a Cc: stable for it?

.. and, just in case, maybe Al can verify that there's nothing subtle
about follow_up() that we need to worry about. That said, NFS already
has that exact same loop for follow_to_parent(), just syntactically
slightly different version.

In fact, I wonder if we even need to do that

if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
(path.mnt->mnt_root == fsi->ptmx_dentry)) {

and maybe we could do the follow_up() loop unconditionally?

Because if the ptmx dentry is *not* a bind mount, then the loop will
be a no-op, and if it *is* a bind-mount, then I'm not convinced we
should even try to just limit it to the devpts case - maybe somebody
did a bind-mount on just a legacy ptmx device node?

So that "if()" actually seems to be to be superfluous, and only limit
the "follow bind mounts' case unnecessarily. Hmm?

   Linus


Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-11 Thread Linus Torvalds
On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner
 wrote:
>
> This is the second iteration of this patch.

This looks good to me. Just wondering how this should be merged, and
whether we should have a Cc: stable for it?

.. and, just in case, maybe Al can verify that there's nothing subtle
about follow_up() that we need to worry about. That said, NFS already
has that exact same loop for follow_to_parent(), just syntactically
slightly different version.

In fact, I wonder if we even need to do that

if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) &&
(path.mnt->mnt_root == fsi->ptmx_dentry)) {

and maybe we could do the follow_up() loop unconditionally?

Because if the ptmx dentry is *not* a bind mount, then the loop will
be a no-op, and if it *is* a bind-mount, then I'm not convinced we
should even try to just limit it to the devpts case - maybe somebody
did a bind-mount on just a legacy ptmx device node?

So that "if()" actually seems to be to be superfluous, and only limit
the "follow bind mounts' case unnecessarily. Hmm?

   Linus


[PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-11 Thread Christian Brauner
Hey everyone,

This is the second iteration of this patch. Please note, that I added a
selftest for correct /proc//fd/{0,1,2} symlinks and added it to
tools/testing/selftests/filesystem because the root cause of the problem is
located in the devpts filesystem. But I'm not sure if this is the right place.
It might also be argued that this should be under
tools/testing/selftests/drivers/pty.

Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /.

When the kernel tries to look up the root mount of the dentry for the slave
file descriptor it will detect that the dentry is escaping its bind-mount
since the root mount of the dentry is /dev/pts where the devpts is mounted
but the root mount of /dev/ptmx is /dev.

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
walk up the bind-mounts for /dev/ptmx in devpts_mntget(). This also means
we need to remove the
"if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" check from
devpts_mntget(). However, devpts_acquire() needs to perform that check.
The reason is that while the devpts mounted at /dev/pts has devtmpfs
mounted at /dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

they are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

which has the consequence that the pathname of the directory which forms
the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
/dev/ptmx we will end up on the same device as the devtmpfs mount on
/dev/pts

-- -- 0:20 /ptmx /dev/ptmx

but given that / is the pathname of the directory which forms the root of
the /dev/pts bind-mount, the resulting source will be /ptmx. So in
devpts_acquire() we still need to check if
"(path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" and exit early before
we try to call path_pts(). If we call path_pts() in the bind-mount case we
will hit path_parent_directory() which will **not** yield /dev as parent
directory but rather / which will cause path_pts() to fail since it will
not find a "pts" directory underneath /. We should still perform the
path_pts() check in devpts_acquire() if we haven't found a suitable devpts
filesystem at open time but we should always try to exit early in
devpts_acquire() and defer resolving bind-mounts until devpts_mntget(). If
we fail in devpts_mntget() we will end up with a wrong
/proc//fd/{0,1,2} symlink, if we fail in devpts_acquire() we will end
up with a failed open("/dev/ptmx") which is more troublesome.

Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
its openpty() implementation:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

with output:

lrwx-- 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /

Christian Brauner (3):
  devpts: hoist out check for DEVPTS_SUPER_MAGIC
  devpts: resolve devpts bind-mounts
  selftests: add devpts selftests

 fs/devpts/inode.c|  32 ++--
 tools/testing/selftests/Makefile |   1 +
 tools/testing/selftests/filesystems/.gitignore   |   1 +
 tools/testing/selftests/filesystems/Makefile |   3 +-
 tools/testing/selftests/filesystems/devpts_pts.c | 191 +++
 5 files changed, 217 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c

---
ChangeLog v1->v2:
* patch for non-functional changes added
* patch for tests added
ChangeLog v0->v1:
* remove
/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
return 0
  from devpts_ptmx_path()
* check superblock after devpts_ptmx_path() returned

-- 
2.15.1



[PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts

2018-03-11 Thread Christian Brauner
Hey everyone,

This is the second iteration of this patch. Please note, that I added a
selftest for correct /proc//fd/{0,1,2} symlinks and added it to
tools/testing/selftests/filesystem because the root cause of the problem is
located in the devpts filesystem. But I'm not sure if this is the right place.
It might also be argued that this should be under
tools/testing/selftests/drivers/pty.

Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /.

When the kernel tries to look up the root mount of the dentry for the slave
file descriptor it will detect that the dentry is escaping its bind-mount
since the root mount of the dentry is /dev/pts where the devpts is mounted
but the root mount of /dev/ptmx is /dev.

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to
walk up the bind-mounts for /dev/ptmx in devpts_mntget(). This also means
we need to remove the
"if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" check from
devpts_mntget(). However, devpts_acquire() needs to perform that check.
The reason is that while the devpts mounted at /dev/pts has devtmpfs
mounted at /dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

they are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

which has the consequence that the pathname of the directory which forms
the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to
/dev/ptmx we will end up on the same device as the devtmpfs mount on
/dev/pts

-- -- 0:20 /ptmx /dev/ptmx

but given that / is the pathname of the directory which forms the root of
the /dev/pts bind-mount, the resulting source will be /ptmx. So in
devpts_acquire() we still need to check if
"(path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" and exit early before
we try to call path_pts(). If we call path_pts() in the bind-mount case we
will hit path_parent_directory() which will **not** yield /dev as parent
directory but rather / which will cause path_pts() to fail since it will
not find a "pts" directory underneath /. We should still perform the
path_pts() check in devpts_acquire() if we haven't found a suitable devpts
filesystem at open time but we should always try to exit early in
devpts_acquire() and defer resolving bind-mounts until devpts_mntget(). If
we fail in devpts_mntget() we will end up with a wrong
/proc//fd/{0,1,2} symlink, if we fail in devpts_acquire() we will end
up with a failed open("/dev/ptmx") which is more troublesome.

Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in
its openpty() implementation:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

with output:

lrwx-- 1 chb chb 64 Mar  7 16:41 /proc/self/fd/0 -> /

Christian Brauner (3):
  devpts: hoist out check for DEVPTS_SUPER_MAGIC
  devpts: resolve devpts bind-mounts
  selftests: add devpts selftests

 fs/devpts/inode.c|  32 ++--
 tools/testing/selftests/Makefile |   1 +
 tools/testing/selftests/filesystems/.gitignore   |   1 +
 tools/testing/selftests/filesystems/Makefile |   3 +-
 tools/testing/selftests/filesystems/devpts_pts.c | 191 +++
 5 files changed, 217 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c

---
ChangeLog v1->v2:
* patch for non-functional changes added
* patch for tests added
ChangeLog v0->v1:
* remove
/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
return 0
  from devpts_ptmx_path()
* check superblock after devpts_ptmx_path() returned

-- 
2.15.1