Re: [PATCH v22 08/12] landlock: Add syscall implementations
On 30/10/2020 04:07, Jann Horn wrote: > On Thu, Oct 29, 2020 at 12:30 PM Mickaël Salaün wrote: >> On 29/10/2020 02:06, Jann Horn wrote: >>> On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün wrote: These 3 system calls are designed to be used by unprivileged processes to sandbox themselves: > [...] + /* +* Similar checks as for seccomp(2), except that an -EPERM may be +* returned. +*/ + if (!task_no_new_privs(current)) { + err = security_capable(current_cred(), current_user_ns(), + CAP_SYS_ADMIN, CAP_OPT_NOAUDIT); >>> >>> I think this should be ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN)? >> >> Right. The main difference is that ns_capable*() set PF_SUPERPRIV in >> current->flags. I guess seccomp should use ns_capable_noaudit() as well? > > Yeah. That seccomp code is from commit e2cfabdfd0756, with commit date > in April 2012, while ns_capable_noaudit() was introduced in commit > 98f368e9e263, with commit date in June 2016; the seccomp code predates > the availability of that API. > > Do you want to send a patch to Kees for that, or should I? > I found another case of this inconsistency in ptrace. I sent patches: https://lore.kernel.org/lkml/20201030123849.770769-1-...@digikod.net/
Re: [PATCH v22 08/12] landlock: Add syscall implementations
On Thu, Oct 29, 2020 at 12:30 PM Mickaël Salaün wrote: > On 29/10/2020 02:06, Jann Horn wrote: > > On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün wrote: > >> These 3 system calls are designed to be used by unprivileged processes > >> to sandbox themselves: [...] > >> + /* > >> +* Similar checks as for seccomp(2), except that an -EPERM may be > >> +* returned. > >> +*/ > >> + if (!task_no_new_privs(current)) { > >> + err = security_capable(current_cred(), current_user_ns(), > >> + CAP_SYS_ADMIN, CAP_OPT_NOAUDIT); > > > > I think this should be ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN)? > > Right. The main difference is that ns_capable*() set PF_SUPERPRIV in > current->flags. I guess seccomp should use ns_capable_noaudit() as well? Yeah. That seccomp code is from commit e2cfabdfd0756, with commit date in April 2012, while ns_capable_noaudit() was introduced in commit 98f368e9e263, with commit date in June 2016; the seccomp code predates the availability of that API. Do you want to send a patch to Kees for that, or should I?
Re: [PATCH v22 08/12] landlock: Add syscall implementations
On 29/10/2020 02:06, Jann Horn wrote: > On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün wrote: >> These 3 system calls are designed to be used by unprivileged processes >> to sandbox themselves: >> * landlock_create_ruleset(2): Creates a ruleset and returns its file >> descriptor. >> * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a >> ruleset, identified by the dedicated file descriptor. >> * landlock_enforce_ruleset_current(2): Enforces a ruleset on the current >> thread and its future children (similar to seccomp). This syscall has >> the same usage restrictions as seccomp(2): the caller must have the >> no_new_privs attribute set or have CAP_SYS_ADMIN in the current user >> namespace. > [...] >> Cc: Arnd Bergmann >> Cc: James Morris >> Cc: Jann Horn >> Cc: Kees Cook >> Cc: Serge E. Hallyn >> Signed-off-by: Mickaël Salaün > [...] >> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h > [...] >> +/** >> + * struct landlock_path_beneath_attr - Path hierarchy definition >> + * >> + * Argument of sys_landlock_add_rule(). >> + */ >> +struct landlock_path_beneath_attr { >> + /** >> +* @allowed_access: Bitmask of allowed actions for this file >> hierarchy >> +* (cf. `Filesystem flags`_). >> +*/ >> + __u64 allowed_access; >> + /** >> +* @parent_fd: File descriptor, open with ``O_PATH``, which identify > > nit: "identifies" OK > >> +* the parent directory of a file hierarchy, or just a file. >> +*/ >> + __s32 parent_fd; >> + /* >> +* This struct is packed to avoid trailing reserved members. >> +* Cf. security/landlock/syscall.c:build_check_abi() >> +*/ >> +} __attribute__((packed)); > [...] >> diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c > [...] >> +static int copy_min_struct_from_user(void *const dst, const size_t ksize, >> + const size_t ksize_min, const void __user *const src, >> + const size_t usize) >> +{ >> + /* Checks buffer inconsistencies. */ >> + BUILD_BUG_ON(!dst); >> + if (!src) >> + return -EFAULT; >> + >> + /* Checks size ranges. */ >> + BUILD_BUG_ON(ksize <= 0); >> + BUILD_BUG_ON(ksize < ksize_min); > > To make these checks work reliably, you should add __always_inline to > the function. Done. > >> + if (usize < ksize_min) >> + return -EINVAL; >> + if (usize > PAGE_SIZE) >> + return -E2BIG; >> + >> + /* Copies user buffer and fills with zeros. */ >> + return copy_struct_from_user(dst, ksize, src, usize); >> +} > [...] >> +static int get_path_from_fd(const s32 fd, struct path *const path) >> +{ >> + struct fd f; >> + int err = 0; >> + >> + BUILD_BUG_ON(!__same_type(fd, >> + ((struct landlock_path_beneath_attr *)NULL)->parent_fd)); >> + >> + /* Handles O_PATH. */ >> + f = fdget_raw(fd); >> + if (!f.file) >> + return -EBADF; >> + /* >> +* Only allows O_PATH file descriptor: enables to restrict ambient >> +* filesystem access without requiring to open and risk leaking or >> +* misusing a file descriptor. Forbid internal filesystems (e.g. >> +* nsfs), including pseudo filesystems that will never be mountable >> +* (e.g. sockfs, pipefs). >> +*/ >> + if (!(f.file->f_mode & FMODE_PATH) || >> + (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) || >> + (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) || >> + d_is_negative(f.file->f_path.dentry) || >> + IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) { >> + err = -EBADFD; >> + goto out_fdput; >> + } >> + path->mnt = f.file->f_path.mnt; >> + path->dentry = f.file->f_path.dentry; > > those two lines can be replaced with "*path = f.file->f_path" Done. > >> + path_get(path); >> + >> +out_fdput: >> + fdput(f); >> + return err; >> +} > [...] >> +/** >> + * sys_landlock_enforce_ruleset_current - Enforce a ruleset on the current >> task >> + * >> + * @ruleset_fd: File descriptor tied to the ruleset to merge with the >> target. >> + * @flags: Must be 0. >> + * >> + * This system call enables to enforce a Landlock ruleset on the current >> + * thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in >> its >> + * namespace or be running with no_new_privs. This avoids scenarios where > > s/be/is/ OK. > >> + * unprivileged tasks can affect the behavior of privileged children. >> + * >> + * Possible returned errors are: >> + * >> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot >> time; >> + * - EINVAL: @flags is not 0. >> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread; >> + * - EBADFD:
Re: [PATCH v22 08/12] landlock: Add syscall implementations
On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün wrote: > These 3 system calls are designed to be used by unprivileged processes > to sandbox themselves: > * landlock_create_ruleset(2): Creates a ruleset and returns its file > descriptor. > * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a > ruleset, identified by the dedicated file descriptor. > * landlock_enforce_ruleset_current(2): Enforces a ruleset on the current > thread and its future children (similar to seccomp). This syscall has > the same usage restrictions as seccomp(2): the caller must have the > no_new_privs attribute set or have CAP_SYS_ADMIN in the current user > namespace. [...] > Cc: Arnd Bergmann > Cc: James Morris > Cc: Jann Horn > Cc: Kees Cook > Cc: Serge E. Hallyn > Signed-off-by: Mickaël Salaün [...] > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h [...] > +/** > + * struct landlock_path_beneath_attr - Path hierarchy definition > + * > + * Argument of sys_landlock_add_rule(). > + */ > +struct landlock_path_beneath_attr { > + /** > +* @allowed_access: Bitmask of allowed actions for this file hierarchy > +* (cf. `Filesystem flags`_). > +*/ > + __u64 allowed_access; > + /** > +* @parent_fd: File descriptor, open with ``O_PATH``, which identify nit: "identifies" > +* the parent directory of a file hierarchy, or just a file. > +*/ > + __s32 parent_fd; > + /* > +* This struct is packed to avoid trailing reserved members. > +* Cf. security/landlock/syscall.c:build_check_abi() > +*/ > +} __attribute__((packed)); [...] > diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c [...] > +static int copy_min_struct_from_user(void *const dst, const size_t ksize, > + const size_t ksize_min, const void __user *const src, > + const size_t usize) > +{ > + /* Checks buffer inconsistencies. */ > + BUILD_BUG_ON(!dst); > + if (!src) > + return -EFAULT; > + > + /* Checks size ranges. */ > + BUILD_BUG_ON(ksize <= 0); > + BUILD_BUG_ON(ksize < ksize_min); To make these checks work reliably, you should add __always_inline to the function. > + if (usize < ksize_min) > + return -EINVAL; > + if (usize > PAGE_SIZE) > + return -E2BIG; > + > + /* Copies user buffer and fills with zeros. */ > + return copy_struct_from_user(dst, ksize, src, usize); > +} [...] > +static int get_path_from_fd(const s32 fd, struct path *const path) > +{ > + struct fd f; > + int err = 0; > + > + BUILD_BUG_ON(!__same_type(fd, > + ((struct landlock_path_beneath_attr *)NULL)->parent_fd)); > + > + /* Handles O_PATH. */ > + f = fdget_raw(fd); > + if (!f.file) > + return -EBADF; > + /* > +* Only allows O_PATH file descriptor: enables to restrict ambient > +* filesystem access without requiring to open and risk leaking or > +* misusing a file descriptor. Forbid internal filesystems (e.g. > +* nsfs), including pseudo filesystems that will never be mountable > +* (e.g. sockfs, pipefs). > +*/ > + if (!(f.file->f_mode & FMODE_PATH) || > + (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) || > + (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) || > + d_is_negative(f.file->f_path.dentry) || > + IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) { > + err = -EBADFD; > + goto out_fdput; > + } > + path->mnt = f.file->f_path.mnt; > + path->dentry = f.file->f_path.dentry; those two lines can be replaced with "*path = f.file->f_path" > + path_get(path); > + > +out_fdput: > + fdput(f); > + return err; > +} [...] > +/** > + * sys_landlock_enforce_ruleset_current - Enforce a ruleset on the current > task > + * > + * @ruleset_fd: File descriptor tied to the ruleset to merge with the target. > + * @flags: Must be 0. > + * > + * This system call enables to enforce a Landlock ruleset on the current > + * thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in > its > + * namespace or be running with no_new_privs. This avoids scenarios where s/be/is/ > + * unprivileged tasks can affect the behavior of privileged children. > + * > + * Possible returned errors are: > + * > + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot > time; > + * - EINVAL: @flags is not 0. > + * - EBADF: @ruleset_fd is not a file descriptor for the current thread; > + * - EBADFD: @ruleset_fd is not a ruleset file descriptor; > + * - EPERM: @ruleset_fd has no read access to the underlying ruleset, or the > + * current thread is not running with no_new_privs (or doesn't have > + * CAP_SYS_ADMIN in its
[PATCH v22 08/12] landlock: Add syscall implementations
From: Mickaël Salaün These 3 system calls are designed to be used by unprivileged processes to sandbox themselves: * landlock_create_ruleset(2): Creates a ruleset and returns its file descriptor. * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor. * landlock_enforce_ruleset_current(2): Enforces a ruleset on the current thread and its future children (similar to seccomp). This syscall has the same usage restrictions as seccomp(2): the caller must have the no_new_privs attribute set or have CAP_SYS_ADMIN in the current user namespace. All these syscalls have a "flags" argument (not currently used) to enable extensibility. Here are the motivations for these new syscalls: * A sandboxed process may not have access to file systems, including /dev, /sys or /proc, but it should still be able to add more restrictions to itself. * Neither prctl(2) nor seccomp(2) (which was used in a previous version) fit well with the current definition of a Landlock security policy. All passed structs (attributes) are checked at build time to ensure that they don't contain holes and that they are aligned the same way for each architecture. See the user and kernel documentation for more details (provided by a following commit): * Documentation/userspace-api/landlock.rst * Documentation/security/landlock.rst Cc: Arnd Bergmann Cc: James Morris Cc: Jann Horn Cc: Kees Cook Cc: Serge E. Hallyn Signed-off-by: Mickaël Salaün --- Changes since v21: * Fix and improve comments. Changes since v20: * Remove two arguments to landlock_enforce_ruleset(2) (requested by Arnd Bergmann) and rename it to landlock_enforce_ruleset_current(2): remove the enum landlock_target_type and the target file descriptor (not used for now). A ruleset can only be enforced on the current thread. * Remove the size argument in landlock_add_rule() (requested by Arnd Bergmann). * Remove landlock_get_features(2) (suggested by Arnd Bergmann). * Simplify and rename copy_struct_if_any_from_user() to copy_min_struct_from_user(). * Rename "options" to "flags" to allign with current syscalls. * Rename some types and variables in a more consistent way. * Fix missing type declarations in syscalls.h . Changes since v19: * Replace the landlock(2) syscall with 4 syscalls (one for each command): landlock_get_features(2), landlock_create_ruleset(2), landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by Arnd Bergmann). https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23...@digikod.net/ * Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled. * Add two new fields to landlock_attr_features to fit with the new syscalls: last_rule_type and last_target_type. This enable to easily identify which types are supported. * Pack landlock_attr_path_beneath struct because of the removed ruleset_fd. * Update documentation and fix spelling. Changes since v18: * Remove useless include. * Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning"). Changes since v17: * Synchronize syscall declaration. * Fix comment. Changes since v16: * Add a size_attr_features field to struct landlock_attr_features for self-introspection, and move the access_fs field to be more consistent. * Replace __aligned_u64 types of attribute fields with __u16, __s32, __u32 and __u64, and check at build time that these structures does not contain hole and that they are aligned the same way (8-bits) on all architectures. This shrinks the size of the userspace ABI, which may be appreciated especially for struct landlock_attr_features which could grow a lot in the future. For instance, struct landlock_attr_features shrinks from 72 bytes to 32 bytes. This change also enables to remove 64-bits to 32-bits conversion checks. * Switch syscall attribute pointer and size arguments to follow similar syscall argument order (e.g. bpf, clone3, openat2). * Set LANDLOCK_OPT_* types to 32-bits. * Allow enforcement of empty ruleset, which enables deny-all policies. * Fix documentation inconsistency. Changes since v15: * Do not add file descriptors referring to internal filesystems (e.g. nsfs) in a ruleset. * Replace is_user_mountable() with in-place clean checks. * Replace EBADR with EBADFD in get_ruleset_from_fd() and get_path_from_fd(). * Remove ruleset's show_fdinfo() for now. Changes since v14: * Remove the security_file_open() check in get_path_from_fd(): an opened FD should not be restricted here, and even less with this hook. As a result, it is now allowed to add a path to a ruleset even if the access to this path is not allowed (without O_PATH). This doesn't change the fact that enforcing a ruleset can't grant any right, only remove some rights. The new layer levels add more consistent restrictions. * Check minimal landlock_attr_* size/content. This