Re: [PATCH 2/4 v5] devpts: resolve devpts bind-mounts

2018-03-13 Thread Eric W. Biederman
Christian Brauner  writes:

Reviewed-by: "Eric W. Biederman" 

It would have been more readable if Linus's suggested cleanup were
in a separate patch.  But unless you need to respin I would not worry
about that.

> Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> ioctl() is used to safely retrieve a file descriptor for the slave side of
> the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> point to /. A very simply reproducer for this issue presupposing a libc
> that uses TIOCGPTPEER in its openpty() implementation is:
>
> unshare --mount
> mount --bind /dev/pts/ptmx /dev/ptmx
> chmod 666 /dev/ptmx
> script
> ls -al /proc/self/fd/0
>
> Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> regression. In addition, it is also a fairly common scenario in containers
> employing user namespaces.
>
> The reason for the current failure is that the kernel tries to verify the
> useability of the devpts filesystem without resolving the /dev/ptmx
> bind-mount first. This will lead it to detect that the dentry is escaping
> its bind-mount. The reason is that while the devpts filesystem mounted at
> /dev/pts has the devtmpfs mounted at /dev as its parent mount:
>
> 21 -- -- / /dev
> -- 21 -- / /dev/pts
>
> devtmpfs and devpts are on different devices
>
> -- -- 0:6  / /dev
> -- -- 0:20 / /dev/pts
>
> This has the consequence that the pathname of the parent directory of the
> devpts filesystem mount at /dev/pts is /. So if /dev/ptmx is a bind-mount
> of /dev/pts/ptmx then the /dev/ptmx bind-mount and the devpts mount at
> /dev/pts will end up being located on the same device which is recorded in
> the superblock of their vfsmount. This means the parent directory of the
> /dev/ptmx bind-mount will be /ptmx:
>
> -- --  /ptmx /dev/ptmx
>
> Without the bind-mount resolution patch the kernel will now perform the
> bind-mount escape check directly on /dev/ptmx. The function responsible for
> this is devpts_ptmx_path() which calls pts_path() which in turn calls
> path_parent_directory(). Based on the above explanation,
> path_parent_directory() will yield / as the parent directory for the
> /dev/ptmx bind-mount and not the expected /dev. Thus, the kernel detects
> that /dev/ptmx is escaping its bind-mount and will set /proc//fd/
> to /.
>
> This patch changes the logic to first resolve any bind-mounts. After the
> bind-mounts have been resolved (i.e. we have traced it back to the
> associated devpts mount) devpts_ptmx_path() can be called. In order to
> guarantee correct path generation for the slave file descriptor the kernel
> now requires that a pts directory is found in the parent directory of the
> ptmx bind-mount. This implies that when doing bind-mounts the ptmx
> bind-mount and the devpts mount should have a common parent directory. A
> valid example is:
>
> mount -t devpts devpts /dev/pts
> mount --bind /dev/pts/ptmx /dev/ptmx
>
> an invalid example is:
>
> mount -t devpts devpts /dev/pts
> mount --bind /dev/pts/ptmx /ptmx
>
> This allows us to support:
> - calling open on ptmx devices located inside non-standard devpts mounts:
>   mount -t devpts devpts /mnt
>   master = open("/mnt/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
> - calling open on ptmx devices located outside the devpts mount with a
>   common ancestor directory:
>   mount -t devpts devpts /dev/pts
>   mount --bind /dev/pts/ptmx /dev/ptmx
>   master = open("/dev/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
>
> while failing on ptmx devices located outside the devpts mount without a
> common ancestor directory:
>   mount -t devpts devpts /dev/pts
>   mount --bind /dev/pts/ptmx /ptmx
>   master = open("/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
>
> in which case save path generation cannot be guaranteed.
>
> Signed-off-by: Christian Brauner 
> Suggested-by: Eric Biederman 
> Suggested-by: Linus Torvalds 
> ---
> ChangeLog v4->v5:
> * reverse error handling logic to further simplify (Linus)
> ChangeLog v3->v4:
> * simplify if condition (Eric)
> ChangeLog v2->v3:
> * rework logic to account for non-standard devpts mounts such as
>   mount -t devpts devpts /mnt (Christian)
> ChangeLog v1->v2:
> * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>   condition to separate patch with non-functional changes (Linus)
> ChangeLog v0->v1:
> * remove
> /* Has the devpts filesystem already been found? */
> if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> return 0
>   from devpts_ptmx_path() (Eric)
> * check superblock after devpts_ptmx_path() returned (Christian)
> ---
>  fs/devpts/inode.c | 26 --
>  1 file changed, 16 insertions(+), 10 

Re: [PATCH 2/4 v5] devpts: resolve devpts bind-mounts

2018-03-13 Thread Eric W. Biederman
Christian Brauner  writes:

Reviewed-by: "Eric W. Biederman" 

It would have been more readable if Linus's suggested cleanup were
in a separate patch.  But unless you need to respin I would not worry
about that.

> Most libcs will still look at /dev/ptmx when opening the master fd of a pty
> device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
> ioctl() is used to safely retrieve a file descriptor for the slave side of
> the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
> point to /. A very simply reproducer for this issue presupposing a libc
> that uses TIOCGPTPEER in its openpty() implementation is:
>
> unshare --mount
> mount --bind /dev/pts/ptmx /dev/ptmx
> chmod 666 /dev/ptmx
> script
> ls -al /proc/self/fd/0
>
> Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
> regression. In addition, it is also a fairly common scenario in containers
> employing user namespaces.
>
> The reason for the current failure is that the kernel tries to verify the
> useability of the devpts filesystem without resolving the /dev/ptmx
> bind-mount first. This will lead it to detect that the dentry is escaping
> its bind-mount. The reason is that while the devpts filesystem mounted at
> /dev/pts has the devtmpfs mounted at /dev as its parent mount:
>
> 21 -- -- / /dev
> -- 21 -- / /dev/pts
>
> devtmpfs and devpts are on different devices
>
> -- -- 0:6  / /dev
> -- -- 0:20 / /dev/pts
>
> This has the consequence that the pathname of the parent directory of the
> devpts filesystem mount at /dev/pts is /. So if /dev/ptmx is a bind-mount
> of /dev/pts/ptmx then the /dev/ptmx bind-mount and the devpts mount at
> /dev/pts will end up being located on the same device which is recorded in
> the superblock of their vfsmount. This means the parent directory of the
> /dev/ptmx bind-mount will be /ptmx:
>
> -- --  /ptmx /dev/ptmx
>
> Without the bind-mount resolution patch the kernel will now perform the
> bind-mount escape check directly on /dev/ptmx. The function responsible for
> this is devpts_ptmx_path() which calls pts_path() which in turn calls
> path_parent_directory(). Based on the above explanation,
> path_parent_directory() will yield / as the parent directory for the
> /dev/ptmx bind-mount and not the expected /dev. Thus, the kernel detects
> that /dev/ptmx is escaping its bind-mount and will set /proc//fd/
> to /.
>
> This patch changes the logic to first resolve any bind-mounts. After the
> bind-mounts have been resolved (i.e. we have traced it back to the
> associated devpts mount) devpts_ptmx_path() can be called. In order to
> guarantee correct path generation for the slave file descriptor the kernel
> now requires that a pts directory is found in the parent directory of the
> ptmx bind-mount. This implies that when doing bind-mounts the ptmx
> bind-mount and the devpts mount should have a common parent directory. A
> valid example is:
>
> mount -t devpts devpts /dev/pts
> mount --bind /dev/pts/ptmx /dev/ptmx
>
> an invalid example is:
>
> mount -t devpts devpts /dev/pts
> mount --bind /dev/pts/ptmx /ptmx
>
> This allows us to support:
> - calling open on ptmx devices located inside non-standard devpts mounts:
>   mount -t devpts devpts /mnt
>   master = open("/mnt/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
> - calling open on ptmx devices located outside the devpts mount with a
>   common ancestor directory:
>   mount -t devpts devpts /dev/pts
>   mount --bind /dev/pts/ptmx /dev/ptmx
>   master = open("/dev/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
>
> while failing on ptmx devices located outside the devpts mount without a
> common ancestor directory:
>   mount -t devpts devpts /dev/pts
>   mount --bind /dev/pts/ptmx /ptmx
>   master = open("/ptmx", ...);
>   slave = ioctl(master, TIOCGPTPEER, ...);
>
> in which case save path generation cannot be guaranteed.
>
> Signed-off-by: Christian Brauner 
> Suggested-by: Eric Biederman 
> Suggested-by: Linus Torvalds 
> ---
> ChangeLog v4->v5:
> * reverse error handling logic to further simplify (Linus)
> ChangeLog v3->v4:
> * simplify if condition (Eric)
> ChangeLog v2->v3:
> * rework logic to account for non-standard devpts mounts such as
>   mount -t devpts devpts /mnt (Christian)
> ChangeLog v1->v2:
> * move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
>   condition to separate patch with non-functional changes (Linus)
> ChangeLog v0->v1:
> * remove
> /* Has the devpts filesystem already been found? */
> if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
> return 0
>   from devpts_ptmx_path() (Eric)
> * check superblock after devpts_ptmx_path() returned (Christian)
> ---
>  fs/devpts/inode.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> index 71b901936113..542364bf923e 100644
> --- a/fs/devpts/inode.c
> +++ 

[PATCH 2/4 v5] devpts: resolve devpts bind-mounts

2018-03-13 Thread Christian Brauner
Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /. A very simply reproducer for this issue presupposing a libc
that uses TIOCGPTPEER in its openpty() implementation is:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

The reason for the current failure is that the kernel tries to verify the
useability of the devpts filesystem without resolving the /dev/ptmx
bind-mount first. This will lead it to detect that the dentry is escaping
its bind-mount. The reason is that while the devpts filesystem mounted at
/dev/pts has the devtmpfs mounted at /dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

devtmpfs and devpts are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

This has the consequence that the pathname of the parent directory of the
devpts filesystem mount at /dev/pts is /. So if /dev/ptmx is a bind-mount
of /dev/pts/ptmx then the /dev/ptmx bind-mount and the devpts mount at
/dev/pts will end up being located on the same device which is recorded in
the superblock of their vfsmount. This means the parent directory of the
/dev/ptmx bind-mount will be /ptmx:

-- --  /ptmx /dev/ptmx

Without the bind-mount resolution patch the kernel will now perform the
bind-mount escape check directly on /dev/ptmx. The function responsible for
this is devpts_ptmx_path() which calls pts_path() which in turn calls
path_parent_directory(). Based on the above explanation,
path_parent_directory() will yield / as the parent directory for the
/dev/ptmx bind-mount and not the expected /dev. Thus, the kernel detects
that /dev/ptmx is escaping its bind-mount and will set /proc//fd/
to /.

This patch changes the logic to first resolve any bind-mounts. After the
bind-mounts have been resolved (i.e. we have traced it back to the
associated devpts mount) devpts_ptmx_path() can be called. In order to
guarantee correct path generation for the slave file descriptor the kernel
now requires that a pts directory is found in the parent directory of the
ptmx bind-mount. This implies that when doing bind-mounts the ptmx
bind-mount and the devpts mount should have a common parent directory. A
valid example is:

mount -t devpts devpts /dev/pts
mount --bind /dev/pts/ptmx /dev/ptmx

an invalid example is:

mount -t devpts devpts /dev/pts
mount --bind /dev/pts/ptmx /ptmx

This allows us to support:
- calling open on ptmx devices located inside non-standard devpts mounts:
  mount -t devpts devpts /mnt
  master = open("/mnt/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);
- calling open on ptmx devices located outside the devpts mount with a
  common ancestor directory:
  mount -t devpts devpts /dev/pts
  mount --bind /dev/pts/ptmx /dev/ptmx
  master = open("/dev/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);

while failing on ptmx devices located outside the devpts mount without a
common ancestor directory:
  mount -t devpts devpts /dev/pts
  mount --bind /dev/pts/ptmx /ptmx
  master = open("/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);

in which case save path generation cannot be guaranteed.

Signed-off-by: Christian Brauner 
Suggested-by: Eric Biederman 
Suggested-by: Linus Torvalds 
---
ChangeLog v4->v5:
* reverse error handling logic to further simplify (Linus)
ChangeLog v3->v4:
* simplify if condition (Eric)
ChangeLog v2->v3:
* rework logic to account for non-standard devpts mounts such as
  mount -t devpts devpts /mnt (Christian)
ChangeLog v1->v2:
* move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
  condition to separate patch with non-functional changes (Linus)
ChangeLog v0->v1:
* remove
/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
return 0
  from devpts_ptmx_path() (Eric)
* check superblock after devpts_ptmx_path() returned (Christian)
---
 fs/devpts/inode.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 71b901936113..542364bf923e 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -160,21 +160,27 @@ struct vfsmount *devpts_mntget(struct file *filp, struct 
pts_fs_info *fsi)
path = filp->f_path;
path_get();
 
-   /* Has the devpts filesystem already been found? */
-   if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC)
+   /* Walk upward while the start point is a bind 

[PATCH 2/4 v5] devpts: resolve devpts bind-mounts

2018-03-13 Thread Christian Brauner
Most libcs will still look at /dev/ptmx when opening the master fd of a pty
device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER
ioctl() is used to safely retrieve a file descriptor for the slave side of
the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will
point to /. A very simply reproducer for this issue presupposing a libc
that uses TIOCGPTPEER in its openpty() implementation is:

unshare --mount
mount --bind /dev/pts/ptmx /dev/ptmx
chmod 666 /dev/ptmx
script
ls -al /proc/self/fd/0

Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a
regression. In addition, it is also a fairly common scenario in containers
employing user namespaces.

The reason for the current failure is that the kernel tries to verify the
useability of the devpts filesystem without resolving the /dev/ptmx
bind-mount first. This will lead it to detect that the dentry is escaping
its bind-mount. The reason is that while the devpts filesystem mounted at
/dev/pts has the devtmpfs mounted at /dev as its parent mount:

21 -- -- / /dev
-- 21 -- / /dev/pts

devtmpfs and devpts are on different devices

-- -- 0:6  / /dev
-- -- 0:20 / /dev/pts

This has the consequence that the pathname of the parent directory of the
devpts filesystem mount at /dev/pts is /. So if /dev/ptmx is a bind-mount
of /dev/pts/ptmx then the /dev/ptmx bind-mount and the devpts mount at
/dev/pts will end up being located on the same device which is recorded in
the superblock of their vfsmount. This means the parent directory of the
/dev/ptmx bind-mount will be /ptmx:

-- --  /ptmx /dev/ptmx

Without the bind-mount resolution patch the kernel will now perform the
bind-mount escape check directly on /dev/ptmx. The function responsible for
this is devpts_ptmx_path() which calls pts_path() which in turn calls
path_parent_directory(). Based on the above explanation,
path_parent_directory() will yield / as the parent directory for the
/dev/ptmx bind-mount and not the expected /dev. Thus, the kernel detects
that /dev/ptmx is escaping its bind-mount and will set /proc//fd/
to /.

This patch changes the logic to first resolve any bind-mounts. After the
bind-mounts have been resolved (i.e. we have traced it back to the
associated devpts mount) devpts_ptmx_path() can be called. In order to
guarantee correct path generation for the slave file descriptor the kernel
now requires that a pts directory is found in the parent directory of the
ptmx bind-mount. This implies that when doing bind-mounts the ptmx
bind-mount and the devpts mount should have a common parent directory. A
valid example is:

mount -t devpts devpts /dev/pts
mount --bind /dev/pts/ptmx /dev/ptmx

an invalid example is:

mount -t devpts devpts /dev/pts
mount --bind /dev/pts/ptmx /ptmx

This allows us to support:
- calling open on ptmx devices located inside non-standard devpts mounts:
  mount -t devpts devpts /mnt
  master = open("/mnt/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);
- calling open on ptmx devices located outside the devpts mount with a
  common ancestor directory:
  mount -t devpts devpts /dev/pts
  mount --bind /dev/pts/ptmx /dev/ptmx
  master = open("/dev/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);

while failing on ptmx devices located outside the devpts mount without a
common ancestor directory:
  mount -t devpts devpts /dev/pts
  mount --bind /dev/pts/ptmx /ptmx
  master = open("/ptmx", ...);
  slave = ioctl(master, TIOCGPTPEER, ...);

in which case save path generation cannot be guaranteed.

Signed-off-by: Christian Brauner 
Suggested-by: Eric Biederman 
Suggested-by: Linus Torvalds 
---
ChangeLog v4->v5:
* reverse error handling logic to further simplify (Linus)
ChangeLog v3->v4:
* simplify if condition (Eric)
ChangeLog v2->v3:
* rework logic to account for non-standard devpts mounts such as
  mount -t devpts devpts /mnt (Christian)
ChangeLog v1->v2:
* move removal of if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
  condition to separate patch with non-functional changes (Linus)
ChangeLog v0->v1:
* remove
/* Has the devpts filesystem already been found? */
if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)
return 0
  from devpts_ptmx_path() (Eric)
* check superblock after devpts_ptmx_path() returned (Christian)
---
 fs/devpts/inode.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 71b901936113..542364bf923e 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -160,21 +160,27 @@ struct vfsmount *devpts_mntget(struct file *filp, struct 
pts_fs_info *fsi)
path = filp->f_path;
path_get();
 
-   /* Has the devpts filesystem already been found? */
-   if (path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC)
+   /* Walk upward while the start point is a bind mount of
+* a single file.
+*/
+   while (path.mnt->mnt_root ==