From: Mickaël Salaün
Thanks to the Landlock objects and ruleset, it is possible to identify
inodes according to a process's domain. To enable an unprivileged
process to express a file hierarchy, it first needs to open a directory
(or a file) and pass this file descriptor to the kernel through
landlock_add_rule(2). When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers. The access rights tied to each "tagged"
inode are collected according to their rule layer level, and ANDed to
compute the access to the requested file hierarchy. This makes it
possible to identify a lot of files without tagging every inode or
modifying the filesystem, while still following the view and
understanding the user has of the filesystem.
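The walk described above can be sketched as a small user-space model.
This is only an illustration, not the kernel implementation: the names
walk_node, unmask_layers_model(), check_access_model(), the ACCESS_*
bits and MAX_LAYERS are all hypothetical, and the model assumes that a
layer is satisfied only when a single rule grants the whole request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits for this sketch (not the kernel's values). */
#define ACCESS_READ  (1u << 0)
#define ACCESS_WRITE (1u << 1)

/* Hypothetical limit for this sketch; must stay below 32. */
#define MAX_LAYERS 4

/*
 * One step of the walk: the rights granted by each layer's rule on this
 * inode (0 means that the layer has no rule tagging this inode).
 */
struct walk_node {
	const char *name;
	uint32_t layer_access[MAX_LAYERS];
};

/* Clears the bit of every layer whose rule on @node grants the request. */
static uint32_t unmask_layers_model(const struct walk_node *node,
				    uint32_t access_request,
				    uint32_t layer_mask)
{
	for (size_t i = 0; i < MAX_LAYERS; i++) {
		if ((node->layer_access[i] & access_request) == access_request)
			layer_mask &= ~(1u << i);
	}
	return layer_mask;
}

/*
 * @walk is ordered from the requested file up to the real root.  The
 * request is granted only if every layer was satisfied by at least one
 * rule met on the way, which is the AND across layers.
 */
static bool check_access_model(const struct walk_node *walk, size_t len,
			       size_t num_layers, uint32_t access_request)
{
	uint32_t layer_mask = (1u << num_layers) - 1;

	for (size_t i = 0; i < len && layer_mask; i++)
		layer_mask = unmask_layers_model(&walk[i], access_request,
						 layer_mask);
	return layer_mask == 0;
}

/*
 * Example walk for a request on /a/b/c with two layers: layer 0 allows
 * read and write beneath /a, layer 1 allows read beneath /a/b.
 */
static const struct walk_node example_walk[] = {
	{ "c", { 0, 0 } },
	{ "b", { 0, ACCESS_READ } },
	{ "a", { ACCESS_READ | ACCESS_WRITE, 0 } },
	{ "/", { 0, 0 } },
};
```

With example_walk, a read request is granted because both layers have a
rule on the way to the root that allows it, whereas a write request is
denied because layer 1 never grants it. Tracking pending layers in a
bitmask lets the walk stop as soon as every layer is satisfied.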
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inode for the same inode while that inode is in
use.
This commit adds a minimal set of supported filesystem access controls,
which does not make it possible to restrict all file-related actions.
This is the result of multiple discussions to minimize the code of
Landlock to ease review. Thanks to the Landlock design, extending this
access control without breaking user space will not be a problem.
Moreover, seccomp filters can be used to restrict the use of syscall
families which may not be currently handled by Landlock.
Cc: Al Viro
Cc: Anton Ivanov
Cc: James Morris
Cc: Jann Horn
Cc: Jeff Dike
Cc: Kees Cook
Cc: Richard Weinberger
Cc: Serge E. Hallyn
Signed-off-by: Mickaël Salaün
---
Changes since v27:
* Fix domains with layers of non-overlapping access rights (cf.
layout1.non_overlapping_accesses test) thanks to a stack of access
rights per layer (replacing ORed access rights). This avoids
too-restrictive domains.
* Cosmetic fixes and updates in comments and Kconfig.
Changes since v26:
* Check each rule of a path to enable a more permissive and pragmatic
access control per layer. Suggested by Jann Horn:
https://lore.kernel.org/lkml/cag48ez1o0vtweird3kqexof78wr+cmp5bgk5kh5cs7apepi...@mail.gmail.com/
* Rename check_access_path_continue() to unmask_layers() and make it
return the new layer mask.
* Avoid double domain check in hook_file_open().
* In the documentation, add utime(2) as another example of unhandled
syscalls. Indeed, using `touch` to test write access may be tempting.
* Remove outdated comment about OverlayFS.
* Rename the landlock.h ifdef to align with most similar files.
* Fix spelling.
Changes since v25:
* Move build_check_layer() to ruleset.c, and add build-time checks for
the fs_access_mask and access variables according to
_LANDLOCK_ACCESS_FS_MASK.
* Move limits to a dedicated file and rename them:
_LANDLOCK_ACCESS_FS_LAST and _LANDLOCK_ACCESS_FS_MASK.
* Set build_check_layer() as non-inline to trigger a warning if it is
not called.
* Use BITS_PER_TYPE() macro.
* Rename function to landlock_add_fs_hooks().
* Cosmetic variable renames.
Changes since v24:
* Use the new struct landlock_rule and landlock_layer to not mix
accesses from different layers. Revert "Enforce deterministic
interleaved path rules" from v24, and fix the layer check. This makes
it possible to follow a sane semantic: an access is granted if, for each
policy layer, at least one rule encountered on the pathwalk grants the
access, regardless of their position in the layer stack (suggested by
Jann Horn). See layout1.interleaved_masked_accesses tests from
tools/testing/selftests/landlock/fs_test.c for corner cases.
* Add build-time checks for layers.
* Use the new landlock_insert_rule() API.
Changes since v23:
* Enforce deterministic interleaved path rules. To have consistent
layered rules, granting access to a path implies that all accesses
tied to inodes, from the requested file to the real root, must be
checked. Otherwise, stacked rules may result in overzealous
restrictions. By excluding the ability to add exceptions in the same
layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
deterministic interleaved path rules. This removes an optimization
which could be replaced by a proper cache mechanism. This also
further simplifies and explains check_access_path_continue().
* Fix memory allocation error handling in landlock_create_object()
calls. This prevents inadvertently holding an inode.
* In get_inode_object(), improve comments, make code more readable and
move kfree() call out of the lock window.
* Use the simplified landlock_insert_rule() API.
Changes since v22:
* Simplify check_access_path_continue() (suggested by Jann Horn).
* Remove prefetch() call for now (suggested by Jann Horn).
* Fix spelling and remove superfluous comment (spotted by Jann Horn).
* Cosmetic variable renaming.
Changes since v21:
* Rename ARCH_EPHEMERAL_STATES to ARCH_EPHEMERAL_INODES (suggested by
James Morris).
* Remove the LANDLOCK_ACCESS_FS_CHROOT right because chroot(2) (which
requires CAP