Re: [PATCH v22 08/12] landlock: Add syscall implementations

2020-10-30 Thread Mickaël Salaün


On 30/10/2020 04:07, Jann Horn wrote:
> On Thu, Oct 29, 2020 at 12:30 PM Mickaël Salaün  wrote:
>> On 29/10/2020 02:06, Jann Horn wrote:
>>> On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün  wrote:
 These 3 system calls are designed to be used by unprivileged processes
 to sandbox themselves:
> [...]
 +   /*
 +* Similar checks as for seccomp(2), except that an -EPERM may be
 +* returned.
 +*/
 +   if (!task_no_new_privs(current)) {
 +   err = security_capable(current_cred(), current_user_ns(),
 +   CAP_SYS_ADMIN, CAP_OPT_NOAUDIT);
>>>
>>> I think this should be ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN)?
>>
>> Right. The main difference is that ns_capable*() set PF_SUPERPRIV in
>> current->flags. I guess seccomp should use ns_capable_noaudit() as well?
> 
> Yeah. That seccomp code is from commit e2cfabdfd0756, with commit date
> in April 2012, while ns_capable_noaudit() was introduced in commit
> 98f368e9e263, with commit date in June 2016; the seccomp code predates
> the availability of that API.
> 
> Do you want to send a patch to Kees for that, or should I?
> 

I found another case of this inconsistency in ptrace. I sent patches:
https://lore.kernel.org/lkml/20201030123849.770769-1-...@digikod.net/


Re: [PATCH v22 08/12] landlock: Add syscall implementations

2020-10-29 Thread Jann Horn
On Thu, Oct 29, 2020 at 12:30 PM Mickaël Salaün  wrote:
> On 29/10/2020 02:06, Jann Horn wrote:
> > On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün  wrote:
> >> These 3 system calls are designed to be used by unprivileged processes
> >> to sandbox themselves:
[...]
> >> +   /*
> >> +* Similar checks as for seccomp(2), except that an -EPERM may be
> >> +* returned.
> >> +*/
> >> +   if (!task_no_new_privs(current)) {
> >> +   err = security_capable(current_cred(), current_user_ns(),
> >> +   CAP_SYS_ADMIN, CAP_OPT_NOAUDIT);
> >
> > I think this should be ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN)?
>
> Right. The main difference is that ns_capable*() set PF_SUPERPRIV in
> current->flags. I guess seccomp should use ns_capable_noaudit() as well?

Yeah. That seccomp code is from commit e2cfabdfd0756, with commit date
in April 2012, while ns_capable_noaudit() was introduced in commit
98f368e9e263, with commit date in June 2016; the seccomp code predates
the availability of that API.

Do you want to send a patch to Kees for that, or should I?


Re: [PATCH v22 08/12] landlock: Add syscall implementations

2020-10-29 Thread Mickaël Salaün


On 29/10/2020 02:06, Jann Horn wrote:
> On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün  wrote:
>> These 3 system calls are designed to be used by unprivileged processes
>> to sandbox themselves:
>> * landlock_create_ruleset(2): Creates a ruleset and returns its file
>>   descriptor.
>> * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
>>   ruleset, identified by the dedicated file descriptor.
>> * landlock_enforce_ruleset_current(2): Enforces a ruleset on the current
>>   thread and its future children (similar to seccomp).  This syscall has
>>   the same usage restrictions as seccomp(2): the caller must have the
>>   no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
>>   namespace.
> [...]
>> Cc: Arnd Bergmann 
>> Cc: James Morris 
>> Cc: Jann Horn 
>> Cc: Kees Cook 
>> Cc: Serge E. Hallyn 
>> Signed-off-by: Mickaël Salaün 
> [...]
>> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> [...]
>> +/**
>> + * struct landlock_path_beneath_attr - Path hierarchy definition
>> + *
>> + * Argument of sys_landlock_add_rule().
>> + */
>> +struct landlock_path_beneath_attr {
>> +   /**
>> +* @allowed_access: Bitmask of allowed actions for this file 
>> hierarchy
>> +* (cf. `Filesystem flags`_).
>> +*/
>> +   __u64 allowed_access;
>> +   /**
>> +* @parent_fd: File descriptor, open with ``O_PATH``, which identify
> 
> nit: "identifies"

OK

> 
>> +* the parent directory of a file hierarchy, or just a file.
>> +*/
>> +   __s32 parent_fd;
>> +   /*
>> +* This struct is packed to avoid trailing reserved members.
>> +* Cf. security/landlock/syscall.c:build_check_abi()
>> +*/
>> +} __attribute__((packed));
> [...]
>> diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c
> [...]
>> +static int copy_min_struct_from_user(void *const dst, const size_t ksize,
>> +   const size_t ksize_min, const void __user *const src,
>> +   const size_t usize)
>> +{
>> +   /* Checks buffer inconsistencies. */
>> +   BUILD_BUG_ON(!dst);
>> +   if (!src)
>> +   return -EFAULT;
>> +
>> +   /* Checks size ranges. */
>> +   BUILD_BUG_ON(ksize <= 0);
>> +   BUILD_BUG_ON(ksize < ksize_min);
> 
> To make these checks work reliably, you should add __always_inline to
> the function.

Done.

> 
>> +   if (usize < ksize_min)
>> +   return -EINVAL;
>> +   if (usize > PAGE_SIZE)
>> +   return -E2BIG;
>> +
>> +   /* Copies user buffer and fills with zeros. */
>> +   return copy_struct_from_user(dst, ksize, src, usize);
>> +}
> [...]
>> +static int get_path_from_fd(const s32 fd, struct path *const path)
>> +{
>> +   struct fd f;
>> +   int err = 0;
>> +
>> +   BUILD_BUG_ON(!__same_type(fd,
>> +   ((struct landlock_path_beneath_attr *)NULL)->parent_fd));
>> +
>> +   /* Handles O_PATH. */
>> +   f = fdget_raw(fd);
>> +   if (!f.file)
>> +   return -EBADF;
>> +   /*
>> +* Only allows O_PATH file descriptor: enables to restrict ambient
>> +* filesystem access without requiring to open and risk leaking or
>> +* misusing a file descriptor.  Forbid internal filesystems (e.g.
>> +* nsfs), including pseudo filesystems that will never be mountable
>> +* (e.g. sockfs, pipefs).
>> +*/
>> +   if (!(f.file->f_mode & FMODE_PATH) ||
>> +   (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
>> +   (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
>> +   d_is_negative(f.file->f_path.dentry) ||
>> +   IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
>> +   err = -EBADFD;
>> +   goto out_fdput;
>> +   }
>> +   path->mnt = f.file->f_path.mnt;
>> +   path->dentry = f.file->f_path.dentry;
> 
> those two lines can be replaced with "*path = f.file->f_path"

Done.

> 
>> +   path_get(path);
>> +
>> +out_fdput:
>> +   fdput(f);
>> +   return err;
>> +}
> [...]
>> +/**
>> + * sys_landlock_enforce_ruleset_current - Enforce a ruleset on the current 
>> task
>> + *
>> + * @ruleset_fd: File descriptor tied to the ruleset to merge with the 
>> target.
>> + * @flags: Must be 0.
>> + *
>> + * This system call enables to enforce a Landlock ruleset on the current
>> + * thread.  Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in 
>> its
>> + * namespace or be running with no_new_privs.  This avoids scenarios where
> 
> s/be/is/

OK.

> 
>> + * unprivileged tasks can affect the behavior of privileged children.
>> + *
>> + * Possible returned errors are:
>> + *
>> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot 
>> time;
>> + * - EINVAL: @flags is not 0.
>> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
>> + * - EBADFD: 

Re: [PATCH v22 08/12] landlock: Add syscall implementations

2020-10-28 Thread Jann Horn
On Tue, Oct 27, 2020 at 9:04 PM Mickaël Salaün  wrote:
> These 3 system calls are designed to be used by unprivileged processes
> to sandbox themselves:
> * landlock_create_ruleset(2): Creates a ruleset and returns its file
>   descriptor.
> * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
>   ruleset, identified by the dedicated file descriptor.
> * landlock_enforce_ruleset_current(2): Enforces a ruleset on the current
>   thread and its future children (similar to seccomp).  This syscall has
>   the same usage restrictions as seccomp(2): the caller must have the
>   no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
>   namespace.
[...]
> Cc: Arnd Bergmann 
> Cc: James Morris 
> Cc: Jann Horn 
> Cc: Kees Cook 
> Cc: Serge E. Hallyn 
> Signed-off-by: Mickaël Salaün 
[...]
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
[...]
> +/**
> + * struct landlock_path_beneath_attr - Path hierarchy definition
> + *
> + * Argument of sys_landlock_add_rule().
> + */
> +struct landlock_path_beneath_attr {
> +   /**
> +* @allowed_access: Bitmask of allowed actions for this file hierarchy
> +* (cf. `Filesystem flags`_).
> +*/
> +   __u64 allowed_access;
> +   /**
> +* @parent_fd: File descriptor, open with ``O_PATH``, which identify

nit: "identifies"

> +* the parent directory of a file hierarchy, or just a file.
> +*/
> +   __s32 parent_fd;
> +   /*
> +* This struct is packed to avoid trailing reserved members.
> +* Cf. security/landlock/syscall.c:build_check_abi()
> +*/
> +} __attribute__((packed));
[...]
> diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c
[...]
> +static int copy_min_struct_from_user(void *const dst, const size_t ksize,
> +   const size_t ksize_min, const void __user *const src,
> +   const size_t usize)
> +{
> +   /* Checks buffer inconsistencies. */
> +   BUILD_BUG_ON(!dst);
> +   if (!src)
> +   return -EFAULT;
> +
> +   /* Checks size ranges. */
> +   BUILD_BUG_ON(ksize <= 0);
> +   BUILD_BUG_ON(ksize < ksize_min);

To make these checks work reliably, you should add __always_inline to
the function.

> +   if (usize < ksize_min)
> +   return -EINVAL;
> +   if (usize > PAGE_SIZE)
> +   return -E2BIG;
> +
> +   /* Copies user buffer and fills with zeros. */
> +   return copy_struct_from_user(dst, ksize, src, usize);
> +}
[...]
> +static int get_path_from_fd(const s32 fd, struct path *const path)
> +{
> +   struct fd f;
> +   int err = 0;
> +
> +   BUILD_BUG_ON(!__same_type(fd,
> +   ((struct landlock_path_beneath_attr *)NULL)->parent_fd));
> +
> +   /* Handles O_PATH. */
> +   f = fdget_raw(fd);
> +   if (!f.file)
> +   return -EBADF;
> +   /*
> +* Only allows O_PATH file descriptor: enables to restrict ambient
> +* filesystem access without requiring to open and risk leaking or
> +* misusing a file descriptor.  Forbid internal filesystems (e.g.
> +* nsfs), including pseudo filesystems that will never be mountable
> +* (e.g. sockfs, pipefs).
> +*/
> +   if (!(f.file->f_mode & FMODE_PATH) ||
> +   (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
> +   (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
> +   d_is_negative(f.file->f_path.dentry) ||
> +   IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
> +   err = -EBADFD;
> +   goto out_fdput;
> +   }
> +   path->mnt = f.file->f_path.mnt;
> +   path->dentry = f.file->f_path.dentry;

those two lines can be replaced with "*path = f.file->f_path"

> +   path_get(path);
> +
> +out_fdput:
> +   fdput(f);
> +   return err;
> +}
[...]
> +/**
> + * sys_landlock_enforce_ruleset_current - Enforce a ruleset on the current 
> task
> + *
> + * @ruleset_fd: File descriptor tied to the ruleset to merge with the target.
> + * @flags: Must be 0.
> + *
> + * This system call enables to enforce a Landlock ruleset on the current
> + * thread.  Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in 
> its
> + * namespace or be running with no_new_privs.  This avoids scenarios where

s/be/is/

> + * unprivileged tasks can affect the behavior of privileged children.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot 
> time;
> + * - EINVAL: @flags is not 0.
> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
> + * - EBADFD: @ruleset_fd is not a ruleset file descriptor;
> + * - EPERM: @ruleset_fd has no read access to the underlying ruleset, or the
> + *   current thread is not running with no_new_privs (or doesn't have
> + *   CAP_SYS_ADMIN in its 

[PATCH v22 08/12] landlock: Add syscall implementations

2020-10-27 Thread Mickaël Salaün
From: Mickaël Salaün 

These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
  descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
  ruleset, identified by the dedicated file descriptor.
* landlock_enforce_ruleset_current(2): Enforces a ruleset on the current
  thread and its future children (similar to seccomp).  This syscall has
  the same usage restrictions as seccomp(2): the caller must have the
  no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
  namespace.

All these syscalls have a "flags" argument (not currently used) to
enable extensibility.

Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
  /dev, /sys or /proc, but it should still be able to add more
  restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
  fit well with the current definition of a Landlock security policy.

All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.

See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst

Cc: Arnd Bergmann 
Cc: James Morris 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
---

Changes since v21:
* Fix and improve comments.

Changes since v20:
* Remove two arguments to landlock_enforce_ruleset(2) (requested by Arnd
  Bergmann) and rename it to landlock_enforce_ruleset_current(2): remove
  the enum landlock_target_type and the target file descriptor (not used
  for now).  A ruleset can only be enforced on the current thread.
* Remove the size argument in landlock_add_rule() (requested by Arnd
  Bergmann).
* Remove landlock_get_features(2) (suggested by Arnd Bergmann).
* Simplify and rename copy_struct_if_any_from_user() to
  copy_min_struct_from_user().
* Rename "options" to "flags" to allign with current syscalls.
* Rename some types and variables in a more consistent way.
* Fix missing type declarations in syscalls.h .

Changes since v19:
* Replace the landlock(2) syscall with 4 syscalls (one for each
  command): landlock_get_features(2), landlock_create_ruleset(2),
  landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by
  Arnd Bergmann).
  https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23...@digikod.net/
* Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled.
* Add two new fields to landlock_attr_features to fit with the new
  syscalls: last_rule_type and last_target_type.  This enable to easily
  identify which types are supported.
* Pack landlock_attr_path_beneath struct because of the removed
  ruleset_fd.
* Update documentation and fix spelling.

Changes since v18:
* Remove useless include.
* Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit
  bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").

Changes since v17:
* Synchronize syscall declaration.
* Fix comment.

Changes since v16:
* Add a size_attr_features field to struct landlock_attr_features for
  self-introspection, and move the access_fs field to be more
  consistent.
* Replace __aligned_u64 types of attribute fields with __u16, __s32,
  __u32 and __u64, and check at build time that these structures does
  not contain hole and that they are aligned the same way (8-bits) on
  all architectures.  This shrinks the size of the userspace ABI, which
  may be appreciated especially for struct landlock_attr_features which
  could grow a lot in the future.  For instance, struct
  landlock_attr_features shrinks from 72 bytes to 32 bytes.  This change
  also enables to remove 64-bits to 32-bits conversion checks.
* Switch syscall attribute pointer and size arguments to follow similar
  syscall argument order (e.g. bpf, clone3, openat2).
* Set LANDLOCK_OPT_* types to 32-bits.
* Allow enforcement of empty ruleset, which enables deny-all policies.
* Fix documentation inconsistency.

Changes since v15:
* Do not add file descriptors referring to internal filesystems (e.g.
  nsfs) in a ruleset.
* Replace is_user_mountable() with in-place clean checks.
* Replace EBADR with EBADFD in get_ruleset_from_fd() and
  get_path_from_fd().
* Remove ruleset's show_fdinfo() for now.

Changes since v14:
* Remove the security_file_open() check in get_path_from_fd(): an
  opened FD should not be restricted here, and even less with this hook.
  As a result, it is now allowed to add a path to a ruleset even if the
  access to this path is not allowed (without O_PATH). This doesn't
  change the fact that enforcing a ruleset can't grant any right, only
  remove some rights.  The new layer levels add more consistent
  restrictions.
* Check minimal landlock_attr_* size/content. This