Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
On Sun, Mar 11, 2018 at 02:46:26PM -0700, Linus Torvalds wrote: > On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner >wrote: > > > > This is the second iteration of this patch. > > This looks good to me. Just wondering how this should be merged, and > whether we should have a Cc: stable for it? > > .. and, just in case, maybe Al can verify that there's nothing subtle > about follow_up() that we need to worry about. That said, NFS already Right, sorry I forgot to CC him again after the first version of the patch. > has that exact same loop for follow_to_parent(), just syntactically > slightly different version. Once we got devpts worked out I can send a follow-up patch that adds a helper to namei.c that walks up bind-mounts. Seems it would make sense as I can see this being useful in general. To be honest, before this I had no real concept what resolving a bind-mount inside the kernel means which was why I was worried to just bang out a patch without discussing it first. Anyway, that should be a small follow-up. > > In fact, I wonder if we even need to do that > > if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) && > (path.mnt->mnt_root == fsi->ptmx_dentry)) { > > and maybe we could do the follow_up() loop unconditionally? > > Because if the ptmx dentry is *not* a bind mount, then the loop will > be a no-op, and if it *is* a bind-mount, then I'm not convinced we > should even try to just limit it to the devpts case - maybe somebody > did a bind-mount on just a legacy ptmx device node? Hm, I think we want to keep the condition and the reasoning points to a snag in the current patch I think: So in case that e.g. /dev/ptmx or /some/other/place is indeed a bind-mount of a devpts mounted somewhere else we want to give userspace a chance and check whether we find a suitable "pts" directory in the same directory where the bind-mount is. This ensures that the paths in /proc//fd/ are correct and that operations like readlink("/proc//fd/", buf, sizeof(buf)) chown(buf, 0, 0) are actually sane and don't suddenly chown() / instead of //. This also let's us easily support libcs still going through /dev/ptmx instead of /dev/pts/ptmx [1]. But in case the passed in path is not a bind-mount we actually don't want to call devpts_ptmx_path() without the /* Has the devpts filesystem already been found? */ if (path->mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) check being performed because if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) holds and we still call path_pts() it will try to look for a "pts" mount in the parent directory but there's no requirement that I know of for a devpts to only be mounted at /dev/pts or //pts. This means that the current logic would cause us to e.g. break stuff like mount -t devpts devpts /wherever which we shouldn't do. (That's a discussion we had in the previous round of devpts fixes.) So I think we actually want devpts_mntget() to do: struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi) { bool unwind; struct path path; int err = 0; path = filp->f_path; path_get(); unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) && (path.mnt->mnt_root == fsi->ptmx_dentry); /* Walk upward while the start point is a bind mount of * a single file. */ while (path.mnt->mnt_root == path.dentry && unwind) if (follow_up() == 0) break; if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind) err = devpts_ptmx_path(); dput(path.dentry); if (err) { mntput(path.mnt); return ERR_PTR(err); } if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) { mntput(path.mnt); return ERR_PTR(-ENODEV); } return path.mnt; } Does that look sane? I'll send a new version of the patch today and will add an additional test for devpts being mounted at a different location. Christian > > So that "if()" actually seems to be to be superfluous, and only limit > the "follow bind mounts' case unnecessarily. Hmm?
Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
On Sun, Mar 11, 2018 at 02:46:26PM -0700, Linus Torvalds wrote: > On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner > wrote: > > > > This is the second iteration of this patch. > > This looks good to me. Just wondering how this should be merged, and > whether we should have a Cc: stable for it? > > .. and, just in case, maybe Al can verify that there's nothing subtle > about follow_up() that we need to worry about. That said, NFS already Right, sorry I forgot to CC him again after the first version of the patch. > has that exact same loop for follow_to_parent(), just syntactically > slightly different version. Once we got devpts worked out I can send a follow-up patch that adds a helper to namei.c that walks up bind-mounts. Seems it would make sense as I can see this being useful in general. To be honest, before this I had no real concept what resolving a bind-mount inside the kernel means which was why I was worried to just bang out a patch without discussing it first. Anyway, that should be a small follow-up. > > In fact, I wonder if we even need to do that > > if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) && > (path.mnt->mnt_root == fsi->ptmx_dentry)) { > > and maybe we could do the follow_up() loop unconditionally? > > Because if the ptmx dentry is *not* a bind mount, then the loop will > be a no-op, and if it *is* a bind-mount, then I'm not convinced we > should even try to just limit it to the devpts case - maybe somebody > did a bind-mount on just a legacy ptmx device node? Hm, I think we want to keep the condition and the reasoning points to a snag in the current patch I think: So in case that e.g. /dev/ptmx or /some/other/place is indeed a bind-mount of a devpts mounted somewhere else we want to give userspace a chance and check whether we find a suitable "pts" directory in the same directory where the bind-mount is. This ensures that the paths in /proc//fd/ are correct and that operations like readlink("/proc//fd/", buf, sizeof(buf)) chown(buf, 0, 0) are actually sane and don't suddenly chown() / instead of //. This also let's us easily support libcs still going through /dev/ptmx instead of /dev/pts/ptmx [1]. But in case the passed in path is not a bind-mount we actually don't want to call devpts_ptmx_path() without the /* Has the devpts filesystem already been found? */ if (path->mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) check being performed because if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) holds and we still call path_pts() it will try to look for a "pts" mount in the parent directory but there's no requirement that I know of for a devpts to only be mounted at /dev/pts or //pts. This means that the current logic would cause us to e.g. break stuff like mount -t devpts devpts /wherever which we shouldn't do. (That's a discussion we had in the previous round of devpts fixes.) So I think we actually want devpts_mntget() to do: struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi) { bool unwind; struct path path; int err = 0; path = filp->f_path; path_get(); unwind = (DEVPTS_SB(path.mnt->mnt_sb) == fsi) && (path.mnt->mnt_root == fsi->ptmx_dentry); /* Walk upward while the start point is a bind mount of * a single file. */ while (path.mnt->mnt_root == path.dentry && unwind) if (follow_up() == 0) break; if ((path.mnt->mnt_sb->s_magic != DEVPTS_SUPER_MAGIC) || unwind) err = devpts_ptmx_path(); dput(path.dentry); if (err) { mntput(path.mnt); return ERR_PTR(err); } if (DEVPTS_SB(path.mnt->mnt_sb) != fsi) { mntput(path.mnt); return ERR_PTR(-ENODEV); } return path.mnt; } Does that look sane? I'll send a new version of the patch today and will add an additional test for devpts being mounted at a different location. Christian > > So that "if()" actually seems to be to be superfluous, and only limit > the "follow bind mounts' case unnecessarily. Hmm?
Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
On Sun, Mar 11, 2018 at 2:05 PM, Christian Braunerwrote: > > This is the second iteration of this patch. This looks good to me. Just wondering how this should be merged, and whether we should have a Cc: stable for it? .. and, just in case, maybe Al can verify that there's nothing subtle about follow_up() that we need to worry about. That said, NFS already has that exact same loop for follow_to_parent(), just syntactically slightly different version. In fact, I wonder if we even need to do that if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) && (path.mnt->mnt_root == fsi->ptmx_dentry)) { and maybe we could do the follow_up() loop unconditionally? Because if the ptmx dentry is *not* a bind mount, then the loop will be a no-op, and if it *is* a bind-mount, then I'm not convinced we should even try to just limit it to the devpts case - maybe somebody did a bind-mount on just a legacy ptmx device node? So that "if()" actually seems to be to be superfluous, and only limit the "follow bind mounts' case unnecessarily. Hmm? Linus
Re: [PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
On Sun, Mar 11, 2018 at 2:05 PM, Christian Brauner wrote: > > This is the second iteration of this patch. This looks good to me. Just wondering how this should be merged, and whether we should have a Cc: stable for it? .. and, just in case, maybe Al can verify that there's nothing subtle about follow_up() that we need to worry about. That said, NFS already has that exact same loop for follow_to_parent(), just syntactically slightly different version. In fact, I wonder if we even need to do that if ((DEVPTS_SB(path.mnt->mnt_sb) == fsi) && (path.mnt->mnt_root == fsi->ptmx_dentry)) { and maybe we could do the follow_up() loop unconditionally? Because if the ptmx dentry is *not* a bind mount, then the loop will be a no-op, and if it *is* a bind-mount, then I'm not convinced we should even try to just limit it to the devpts case - maybe somebody did a bind-mount on just a legacy ptmx device node? So that "if()" actually seems to be to be superfluous, and only limit the "follow bind mounts' case unnecessarily. Hmm? Linus
[PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
Hey everyone, This is the second iteration of this patch. Please note, that I added a selftest for correct /proc//fd/{0,1,2} symlinks and added it to tools/testing/selftests/filesystem because the root cause of the problem is located in the devpts filesystem. But I'm not sure if this is the right place. It might also be argued that this should be under tools/testing/selftests/drivers/pty. Most libcs will still look at /dev/ptmx when opening the master fd of a pty device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER ioctl() is used to safely retrieve a file descriptor for the slave side of the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will point to /. When the kernel tries to look up the root mount of the dentry for the slave file descriptor it will detect that the dentry is escaping its bind-mount since the root mount of the dentry is /dev/pts where the devpts is mounted but the root mount of /dev/ptmx is /dev. Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a regression. In addition, it is also a fairly common scenario in containers employing user namespaces. To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to walk up the bind-mounts for /dev/ptmx in devpts_mntget(). This also means we need to remove the "if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" check from devpts_mntget(). However, devpts_acquire() needs to perform that check. The reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at /dev as its parent mount: 21 -- -- / /dev -- 21 -- / /dev/pts they are on different devices -- -- 0:6 / /dev -- -- 0:20 / /dev/pts which has the consequence that the pathname of the directory which forms the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to /dev/ptmx we will end up on the same device as the devtmpfs mount on /dev/pts -- -- 0:20 /ptmx /dev/ptmx but given that / is the pathname of the directory which forms the root of the /dev/pts bind-mount, the resulting source will be /ptmx. So in devpts_acquire() we still need to check if "(path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" and exit early before we try to call path_pts(). If we call path_pts() in the bind-mount case we will hit path_parent_directory() which will **not** yield /dev as parent directory but rather / which will cause path_pts() to fail since it will not find a "pts" directory underneath /. We should still perform the path_pts() check in devpts_acquire() if we haven't found a suitable devpts filesystem at open time but we should always try to exit early in devpts_acquire() and defer resolving bind-mounts until devpts_mntget(). If we fail in devpts_mntget() we will end up with a wrong /proc//fd/{0,1,2} symlink, if we fail in devpts_acquire() we will end up with a failed open("/dev/ptmx") which is more troublesome. Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in its openpty() implementation: unshare --mount mount --bind /dev/pts/ptmx /dev/ptmx chmod 666 /dev/ptmx script ls -al /proc/self/fd/0 with output: lrwx-- 1 chb chb 64 Mar 7 16:41 /proc/self/fd/0 -> / Christian Brauner (3): devpts: hoist out check for DEVPTS_SUPER_MAGIC devpts: resolve devpts bind-mounts selftests: add devpts selftests fs/devpts/inode.c| 32 ++-- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/filesystems/.gitignore | 1 + tools/testing/selftests/filesystems/Makefile | 3 +- tools/testing/selftests/filesystems/devpts_pts.c | 191 +++ 5 files changed, 217 insertions(+), 11 deletions(-) create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c --- ChangeLog v1->v2: * patch for non-functional changes added * patch for tests added ChangeLog v0->v1: * remove /* Has the devpts filesystem already been found? */ if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) return 0 from devpts_ptmx_path() * check superblock after devpts_ptmx_path() returned -- 2.15.1
[PATCH 0/3 v2] devpts: handle /dev/ptmx bind-mounts
Hey everyone, This is the second iteration of this patch. Please note, that I added a selftest for correct /proc//fd/{0,1,2} symlinks and added it to tools/testing/selftests/filesystem because the root cause of the problem is located in the devpts filesystem. But I'm not sure if this is the right place. It might also be argued that this should be under tools/testing/selftests/drivers/pty. Most libcs will still look at /dev/ptmx when opening the master fd of a pty device. When /dev/ptmx is a bind-mount of /dev/pts/ptmx and the TIOCGPTPEER ioctl() is used to safely retrieve a file descriptor for the slave side of the pty based on the master fd, the /proc/self/fd/{0,1,2} symlinks will point to /. When the kernel tries to look up the root mount of the dentry for the slave file descriptor it will detect that the dentry is escaping its bind-mount since the root mount of the dentry is /dev/pts where the devpts is mounted but the root mount of /dev/ptmx is /dev. Having bind-mounts of /dev/pts/ptmx to /dev/ptmx not working correctly is a regression. In addition, it is also a fairly common scenario in containers employing user namespaces. To handle bind-mounts of /dev/pts/ptmx to /dev/ptmx correctly we need to walk up the bind-mounts for /dev/ptmx in devpts_mntget(). This also means we need to remove the "if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" check from devpts_mntget(). However, devpts_acquire() needs to perform that check. The reason is that while the devpts mounted at /dev/pts has devtmpfs mounted at /dev as its parent mount: 21 -- -- / /dev -- 21 -- / /dev/pts they are on different devices -- -- 0:6 / /dev -- -- 0:20 / /dev/pts which has the consequence that the pathname of the directory which forms the root of the /dev/pts mount is /. So if we bind-mount /dev/pts/ptmx to /dev/ptmx we will end up on the same device as the devtmpfs mount on /dev/pts -- -- 0:20 /ptmx /dev/ptmx but given that / is the pathname of the directory which forms the root of the /dev/pts bind-mount, the resulting source will be /ptmx. So in devpts_acquire() we still need to check if "(path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC)" and exit early before we try to call path_pts(). If we call path_pts() in the bind-mount case we will hit path_parent_directory() which will **not** yield /dev as parent directory but rather / which will cause path_pts() to fail since it will not find a "pts" directory underneath /. We should still perform the path_pts() check in devpts_acquire() if we haven't found a suitable devpts filesystem at open time but we should always try to exit early in devpts_acquire() and defer resolving bind-mounts until devpts_mntget(). If we fail in devpts_mntget() we will end up with a wrong /proc//fd/{0,1,2} symlink, if we fail in devpts_acquire() we will end up with a failed open("/dev/ptmx") which is more troublesome. Here's a little reproducer that presupposes a libc that uses TIOCGPTPEER in its openpty() implementation: unshare --mount mount --bind /dev/pts/ptmx /dev/ptmx chmod 666 /dev/ptmx script ls -al /proc/self/fd/0 with output: lrwx-- 1 chb chb 64 Mar 7 16:41 /proc/self/fd/0 -> / Christian Brauner (3): devpts: hoist out check for DEVPTS_SUPER_MAGIC devpts: resolve devpts bind-mounts selftests: add devpts selftests fs/devpts/inode.c| 32 ++-- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/filesystems/.gitignore | 1 + tools/testing/selftests/filesystems/Makefile | 3 +- tools/testing/selftests/filesystems/devpts_pts.c | 191 +++ 5 files changed, 217 insertions(+), 11 deletions(-) create mode 100644 tools/testing/selftests/filesystems/devpts_pts.c --- ChangeLog v1->v2: * patch for non-functional changes added * patch for tests added ChangeLog v0->v1: * remove /* Has the devpts filesystem already been found? */ if (path->mnt->mnt_sb->s_magic == DEVPTS_SUPER_MAGIC) return 0 from devpts_ptmx_path() * check superblock after devpts_ptmx_path() returned -- 2.15.1