From: Mickaël Salaün
These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
ruleset, identified by the dedicated file descriptor.
* landlock_enforce_ruleset_current(2): Enforces a ruleset on the current
thread and its future children (similar to seccomp). This syscall has
the same usage restrictions as seccomp(2): the caller must have the
no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
namespace.
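
The no_new_privs requirement above can be met from unprivileged code with
prctl(2) before enforcing a ruleset. The following is only a minimal sketch:
the helper names set_no_new_privs() and has_no_new_privs() are hypothetical
and are not part of this patch; only the prctl(2) calls themselves are
existing kernel API.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the no_new_privs attribute on the calling
 * thread. Returns 0 on success, -1 (with errno set) on failure. Once
 * set, the attribute is inherited by children and cannot be cleared.
 */
int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Hypothetical helper: return 1 if no_new_privs is set on the calling
 * thread, 0 if it is not.
 */
int has_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A caller without CAP_SYS_ADMIN would be expected to call set_no_new_privs()
first and only then enforce the ruleset.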
All these syscalls have a "flags" argument (not currently used) to
enable extensibility.
Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
/dev, /sys or /proc, but it should still be able to add more
restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
fits well with the current definition of a Landlock security policy.
All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.
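
As an illustration of what such build-time layout checks can look like, here
is a userspace sketch using C11 _Static_assert. It is not the code from this
patch (which relies on BUILD_BUG_ON()), and struct example_attr with its
fields is a hypothetical stand-in for a real attribute struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute struct, for illustration only. */
struct example_attr {
	uint64_t allowed_access;
	int32_t parent_fd;
	uint32_t reserved;
};

/* No holes: the struct size must equal the sum of its member sizes. */
_Static_assert(sizeof(struct example_attr) ==
	       sizeof(uint64_t) + sizeof(int32_t) + sizeof(uint32_t),
	       "struct example_attr contains padding");

/* Same layout everywhere: every member sits at a fixed offset. */
_Static_assert(offsetof(struct example_attr, allowed_access) == 0,
	       "unexpected offset for allowed_access");
_Static_assert(offsetof(struct example_attr, parent_fd) == 8,
	       "unexpected offset for parent_fd");
_Static_assert(offsetof(struct example_attr, reserved) == 12,
	       "unexpected offset for reserved");
```

If a field were later added or reordered so that the compiler had to insert
padding, the build would fail instead of silently changing the ABI.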
See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst
Cc: Arnd Bergmann
Cc: James Morris
Cc: Jann Horn
Cc: Kees Cook
Cc: Serge E. Hallyn
Signed-off-by: Mickaël Salaün
---
Changes since v23:
* Rewrite get_ruleset_from_fd() to please the 0-DAY CI Kernel Test
Service that reported an uninitialized variable (false positive):
https://lore.kernel.org/linux-security-module/202011101854.zgbwwusk-...@intel.com/
Anyway, it is cleaner like this.
* Add a comment about E2BIG which can be returned by
landlock_enforce_ruleset_current(2) when there is no more room for
another stacked ruleset (i.e. domain).
Changes since v22:
* Replace security_capable() with ns_capable_noaudit() (suggested by
Jann Horn) and explicitly return EPERM.
* Fix landlock_enforce_ruleset_current(2)'s out_put_creds (spotted by
Jann Horn).
* Add __always_inline to copy_min_struct_from_user() to make its
BUILD_BUG_ON() checks reliable (suggested by Jann Horn).
* Simplify path assignment in get_path_from_fd() (suggested by Jann
Horn).
* Fix spelling (spotted by Jann Horn).
Changes since v21:
* Fix and improve comments.
Changes since v20:
* Remove two arguments to landlock_enforce_ruleset(2) (requested by Arnd
Bergmann) and rename it to landlock_enforce_ruleset_current(2): remove
the enum landlock_target_type and the target file descriptor (not used
for now). A ruleset can only be enforced on the current thread.
* Remove the size argument in landlock_add_rule() (requested by Arnd
Bergmann).
* Remove landlock_get_features(2) (suggested by Arnd Bergmann).
* Simplify and rename copy_struct_if_any_from_user() to
copy_min_struct_from_user().
* Rename "options" to "flags" to align with current syscalls.
* Rename some types and variables in a more consistent way.
* Fix missing type declarations in syscalls.h.
Changes since v19:
* Replace the landlock(2) syscall with 4 syscalls (one for each
command): landlock_get_features(2), landlock_create_ruleset(2),
landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by
Arnd Bergmann).
https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23...@digikod.net/
* Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled.
* Add two new fields to landlock_attr_features to fit with the new
syscalls: last_rule_type and last_target_type. This makes it easy to
identify which types are supported.
* Pack landlock_attr_path_beneath struct because of the removed
ruleset_fd.
* Update documentation and fix spelling.
Changes since v18:
* Remove useless include.
* Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit
bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").
Changes since v17:
* Synchronize syscall declaration.
* Fix comment.
Changes since v16:
* Add a size_attr_features field to struct landlock_attr_features for
self-introspection, and move the access_fs field to be more
consistent.
* Replace __aligned_u64 types of attribute fields with __u16, __s32,
__u32 and __u64, and check at build time that these structures do
not contain holes and that they are aligned the same way (8-bits) on
all architectures. This shrinks the size of the userspace ABI, which
may be appreciated especially for struct landlock_attr_features which
could grow a lot in the future. For instance, struct
landlock_attr_features shrinks from 72 bytes to 32 bytes. This change
also makes it possible to remove the 64-bit to 32-bit conversion checks.
* Switch syscall attribute pointer and size arguments to follow similar
syscall argument order (e.g. bpf, clone3, openat2).
* Set LANDLOCK_OPT_* types to 32-bits.
*