[PATCH v1] cgroup,bpf: Add access check for cgroup_get_from_fd()
Add security access check for cgroup backed FD. The "cgroup.procs" file
of the corresponding cgroup should be readable to identify the cgroup,
and writable to prove that the current process can manage this cgroup
(e.g. through delegation). This is similar to the check done by
cgroup_procs_write_permission().

Fixes: 4ed8ec521ed5 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: James Morris
Cc: Kees Cook
Cc: Martin KaFai Lau
Cc: Tejun Heo
---
 include/linux/cgroup.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/syscall.c   |  1 +
 kernel/cgroup.c        | 34 +++++++++++++++++++++++++++-------
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c4688742ddc4..5767d471e292 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,7 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
 						       struct cgroup_subsys *ss);

 struct cgroup *cgroup_get_from_path(const char *path);
-struct cgroup *cgroup_get_from_fd(int fd);
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask);

 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a2ac051c342f..3d97c70134a0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -543,7 +543,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 				     struct file *map_file /* not used */,
 				     int fd)
 {
-	return cgroup_get_from_fd(fd);
+	return cgroup_get_from_fd(fd, MAY_READ);
 }

 static void cgroup_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962447a5..cc7270eadcf7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include

 DEFINE_PER_CPU(int, bpf_prog_active);

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b0d727d26fc7..e02e0a531be9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6236,34 +6236,54 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 /**
  * cgroup_get_from_fd - get a cgroup pointer from a fd
  * @fd: fd obtained by open(cgroup2_dir)
+ * @access_mask: contains the permission mask
  *
  * Find the cgroup from a fd which should be obtained
  * by opening a cgroup directory.  Returns a pointer to the
  * cgroup on success. ERR_PTR is returned if the cgroup
- * cannot be found.
+ * cannot be found or its access is denied.
  */
-struct cgroup *cgroup_get_from_fd(int fd)
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
 {
 	struct cgroup_subsys_state *css;
 	struct cgroup *cgrp;
 	struct file *f;
+	struct inode *inode;
+	int ret;

 	f = fget_raw(fd);
 	if (!f)
 		return ERR_PTR(-EBADF);

 	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
-	fput(f);
-	if (IS_ERR(css))
-		return ERR_CAST(css);
+	if (IS_ERR(css)) {
+		ret = PTR_ERR(css);
+		goto put_f;
+	}

 	cgrp = css->cgroup;
 	if (!cgroup_on_dfl(cgrp)) {
-		cgroup_put(cgrp);
-		return ERR_PTR(-EBADF);
+		ret = -EBADF;
+		goto put_cgrp;
+	}
+
+	ret = -ENOMEM;
+	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
+	if (inode) {
+		ret = inode_permission(inode, access_mask);
+		iput(inode);
 	}
+	if (ret)
+		goto put_cgrp;

+	fput(f);
 	return cgrp;
+
+put_cgrp:
+	cgroup_put(cgrp);
+put_f:
+	fput(f);
+	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);

-- 
2.9.3
Re: [PATCH v1] cgroup,bpf: Add access check for cgroup_get_from_fd()
On 20/09/2016 02:30, Alexei Starovoitov wrote:
> On Tue, Sep 20, 2016 at 12:49:13AM +0200, Mickaël Salaün wrote:
>> Add security access check for cgroup backed FD. The "cgroup.procs" file
>> of the corresponding cgroup should be readable to identify the cgroup,
>> and writable to prove that the current process can manage this cgroup
>> (e.g. through delegation). This is similar to the check done by
>> cgroup_procs_write_permission().
>>
>> Fixes: 4ed8ec521ed5 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
>
> I don't understand what 'fixes' is about.
> Looks like new feature or tightening?
> Since cgroup was opened by the process and it got an fd,
> it had an access, so extra check here looks unnecessary.

It may not be a "fix", but this patch tightens the access control. The
current cgroup_get_from_fd() only relies on the access check done on the
passed FD. However, this FD comes from a cgroup directory, not from the
"cgroup.procs" file in this directory. The "cgroup.procs" file is used
for cgroup delegation by cgroup_procs_write_permission(). Checking
"cgroup.procs" is then more consistent with the access checks done by
other parts of the cgroup code.

Being able to open a cgroup directory only means that the current
process is able to list the cgroup hierarchy, not necessarily to list
the tasks in this cgroup. A BPF_MAP_TYPE_CGROUP_ARRAY should then only
contain cgroups readable by the process that filled the map. It is
currently possible to call bpf_skb_in_cgroup() and know whether a packet
comes from a task in a cgroup, even though the loading process may not
be able to list these tasks.

Write access to a cgroup directory means being able to create
sub-cgroups, not to add or remove tasks from that cgroup. This will be
important for future uses like Daniel Mack's patch (attach an eBPF
program to a cgroup). Indeed, with the current code, a process with
CAP_NET_ADMIN (but without the right to manage a cgroup) would be able
to attach programs to a cgroup. The same goes for Landlock.
>
>> -struct cgroup *cgroup_get_from_fd(int fd)
>> +struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
>>  {
>>  	struct cgroup_subsys_state *css;
>>  	struct cgroup *cgrp;
>>  	struct file *f;
>> +	struct inode *inode;
>> +	int ret;
>>
>>  	f = fget_raw(fd);
>>  	if (!f)
>>  		return ERR_PTR(-EBADF);
>>
>>  	css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
>> -	fput(f);
>
> why move it down?

Because f is still used afterwards, by the call to kernfs_get_inode().

>
>> -	if (IS_ERR(css))
>> -		return ERR_CAST(css);
>> +	if (IS_ERR(css)) {
>> +		ret = PTR_ERR(css);
>> +		goto put_f;
>> +	}
>>
>>  	cgrp = css->cgroup;
>>  	if (!cgroup_on_dfl(cgrp)) {
>> -		cgroup_put(cgrp);
>> -		return ERR_PTR(-EBADF);
>> +		ret = -EBADF;
>> +		goto put_cgrp;
>> +	}
>> +
>> +	ret = -ENOMEM;
>> +	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
>> +	if (inode) {
>> +		ret = inode_permission(inode, access_mask);
>> +		iput(inode);
>>  	}
>> +	if (ret)
>> +		goto put_cgrp;
>>
>> +	fput(f);
>>  	return cgrp;
>> +
>> +put_cgrp:
>> +	cgroup_put(cgrp);
>> +put_f:
>> +	fput(f);
>> +	return ERR_PTR(ret);
>>  }
>>  EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
>>
>> --
>> 2.9.3
>>
>
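The difference discussed in this thread (holding an FD on the cgroup directory versus having rights on its "cgroup.procs" file) can be observed from userspace. What follows is only a rough sketch and not the kernel code from the patch: the helper name cgroup_procs_access() is hypothetical, and it uses the standard faccessat(2) call relative to the directory FD as an approximation of the inode_permission() check.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): given an FD opened on a
 * cgroup v2 directory, check whether the caller may access the
 * "cgroup.procs" file of that cgroup with the requested mode (R_OK,
 * W_OK, or both OR-ed together).
 *
 * AT_EACCESS asks for the check to use the effective IDs, which is
 * closer to what a kernel-side permission check would see.
 *
 * Returns 0 if access would be granted, -1 with errno set otherwise.
 */
int cgroup_procs_access(int cgroup_dir_fd, int mode)
{
	return faccessat(cgroup_dir_fd, "cgroup.procs", mode, AT_EACCESS);
}
```

A process can hold a valid directory FD and still get -1 here, which is the case the patch wants cgroup_get_from_fd() to reject. Under this sketch, a caller asking for R_OK mirrors the MAY_READ used for BPF_MAP_TYPE_CGROUP_ARRAY, while R_OK | W_OK mirrors the delegation case.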
Re: lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons
On 20/09/2016 03:10, Sargun Dhillon wrote:
> I'm fine giving up the Checmate name. Landlock seems easy enough to
> Google. I haven't gotten a chance to look through the entire patchset
> yet, but it does seem like they are somewhat similar.

Excellent! I'm looking forward to your review.

>
> On Mon, Sep 19, 2016 at 5:12 PM, Alexei Starovoitov wrote:
>> On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
>>>>> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>>>>> should be able to sit on top of Landlock.
>>>>
>>>> I think neither of them should be called fancy names for no technical
>>>> reason.
>>>> We will have only one bpf based lsm. That's it and it doesn't
>>>> need an obscure name. Directory name can be security/bpf/..stuff.c
>>>
>>> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
>>> name (first RFC) but I later realized that it is confusing because
>>> seccomp is associated to its syscall and the underlying features. Same
>>> thing goes for BPF. It is also artificially hard to grep on a name too
>>> used in the kernel source tree.
>>> Making an association between the generic eBPF mechanism and a security
>>> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
>>> the seccomp interface [1] can still be used.
>>
>> agree with above.
>>
>>> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
>>> landlocked country/state). I want to keep this name, which is simple,
>>> express the goal of Landlock nicely and is comparable to other sandbox
>>> mechanisms as Seatbelt or Pledge.
>>> Landlock should not be confused with the underlying eBPF implementation.
>>> Landlock could use more than only eBPF in the future and eBPF could be
>>> used in other LSM as well.
>>
>> there will not be two bpf based LSMs.
>> Therefore unless you can convince Sargun to give up his 'checmate' name,
>> nothing goes in.
>> The features you both need are 90% the same, so they must be done
>> as part of single LSM whatever you both agree to call it.
>>
>
Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
On 20/09/2016 06:37, Sargun Dhillon wrote:
> On Thu, Sep 15, 2016 at 09:41:33PM +0200, Mickaël Salaün wrote:
>>
>> On 15/09/2016 06:48, Alexei Starovoitov wrote:
>>> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov wrote:
>>>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov wrote:
>>>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>>>>>>>
>>>>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar
>>>>>>>>>>> way. I don't see why building on top of cgroup v2 is a problem.
>>>>>>>>>>> Is there security issues with delegation?
>>>>>>>>>>
>>>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>>>> Tejun says [1]:
>>>>>>>>>>
>>>>>>>>>> We haven't had to face this decision because cgroup has never
>>>>>>>>>> properly supported delegating to applications and the in-use
>>>>>>>>>> setups where this happens are custom configurations where there
>>>>>>>>>> is no boundary between system and applications and adhoc
>>>>>>>>>> trial-and-error is good enough a way to find a working solution.
>>>>>>>>>> That wiggle room goes away once we officially open this up to
>>>>>>>>>> individual applications.
>>>>>>>>>>
>>>>>>>>>> Unless and until that changes, I think that landlock should stay
>>>>>>>>>> away from cgroups. Others could reasonably disagree with me.
>>>>>>>>>
>>>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>>>>>>>>> and not for sandboxing. So the above doesn't matter in such
>>>>>>>>> contexts.
>>>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry
>>>>>>>>> points.
>>>>>>>>> Please see checmate examples how it's used.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>>>> bpf+lsm+cgroup integration. I'm arguing that the unprivileged
>>>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>>>> until the cgroup situation settles down a lot.
>>>>>>>
>>>>>>> ahh. yes. we're perfectly in agreement here.
>>>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>>>> don't have one to one relationship, so mixing them up is only
>>>>>>> asking for trouble further down the road.
>>>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>>>> with passing whatever information.
>>>>>>>
>>>>>>
>>>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>>>> of the boxes for safely letting unprivileged programs sandbox
>>>>>> themselves.
>>>>>
>>>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>>>> sure, that's reusable.
>>>>>
>>>>>> Furthermore, to the extent that there are use cases for
>>>>>> unprivileged bpf+ls
Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
On 15/09/2016 11:19, Pavel Machek wrote:
> Hi!
>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the
>> syscall level. This brought a working PoC but with some (mitigated) ToCToU
>> race conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
>
> Long and nice description follows. Should it go to Documentation/
> somewhere?
>
> Because some documentation would be useful...
> 									Pavel

Right, but I was looking for feedback before investing in documentation. :)

>
>>  include/linux/bpf.h                   |  41 +
>>  include/linux/lsm_hooks.h             |   5 +
>>  include/linux/seccomp.h               |  54 ++-
>>  include/uapi/asm-generic/errno-base.h |   1 +
>>  include/uapi/linux/bpf.h              | 103
>>  include/uapi/linux/seccomp.h          |   2 +
>>  kernel/bpf/arraymap.c                 | 222 +
>>  kernel/bpf/syscall.c                  |  18 ++-
>>  kernel/bpf/verifier.c                 |  32 +++-
>>  kernel/fork.c                         |  41 -
>>  kernel/seccomp.c                      | 211 +++-
>>  samples/Makefile                      |   2 +-
>>  samples/landlock/.gitignore           |   1 +
>>  samples/landlock/Makefile             |  16 ++
>>  samples/landlock/sandbox.c            | 295 ++
>>  security/Kconfig                      |   1 +
>>  security/Makefile                     |   2 +
>>  security/landlock/Kconfig             |  19 +++
>>  security/landlock/Makefile            |   3 +
>>  security/landlock/checker_cgroup.c    |  96 +++
>>  security/landlock/checker_cgroup.h    |  18 +++
>>  security/landlock/checker_fs.c        | 183 +
>>  security/landlock/checker_fs.h        |  20 +++
>>  security/landlock/lsm.c               | 228 ++
>>  security/security.c                   |   1 +
>>  25 files changed, 1592 insertions(+), 23 deletions(-)
>>  create mode 100644 samples/landlock/.gitignore
>>  create mode 100644 samples/landlock/Makefile
>>  create mode 100644 samples/landlock/sandbox.c
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/checker_cgroup.c
>>  create mode 100644 security/landlock/checker_cgroup.h
>>  create mode 100644 security/landlock/checker_fs.c
>>  create mode 100644 security/landlock/checker_fs.h
>>  create mode 100644 security/landlock/lsm.c
>>
>
[PATCH v1] seccomp: Fix documentation
Fix the documentation of struct seccomp_filter and of the
seccomp_run_filters() signature.

Signed-off-by: Mickaël Salaün
Cc: Andy Lutomirski
Cc: James Morris
Cc: Kees Cook
Cc: Will Drewry
---
 kernel/seccomp.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a2afe2..494cba230ca0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,8 +41,7 @@
  *         outside of a lifetime-guarded section.  In general, this
  *         is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insnsi: the BPF program instructions to evaluate
+ * @prog: the BPF program to evaluate
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
@@ -168,8 +167,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 }

 /**
- * seccomp_run_filters - evaluates all seccomp filters against @syscall
- * @syscall: number of the current system call
+ * seccomp_run_filters - evaluates all seccomp filters against @sd
+ * @sd: optional seccomp data to be passed to filters
  *
  * Returns valid seccomp BPF response codes.
  */
-- 
2.9.3
[PATCH v1] bpf: Set register type according to is_valid_access()
This fixes a pointer leak when an unprivileged eBPF program reads a
pointer value from the context. Even if is_valid_access() returns a
pointer type, the eBPF verifier replaces it with UNKNOWN_VALUE. The
register value containing an address is then allowed to leak. Moreover,
this prevented unprivileged eBPF programs from using functions with
(legitimate) pointer arguments.

This bug is not an issue for now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE. However, this fix is important for
future unprivileged eBPF program types which could use pointers in their
context.

Signed-off-by: Mickaël Salaün
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Kees Cook
Acked-by: Sargun Dhillon
---
 kernel/bpf/verifier.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..0698ccd67715 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		}
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}

 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3
Re: [PATCH v1] bpf: Set register type according to is_valid_access()
On 22/09/2016 21:41, Daniel Borkmann wrote:
> On 09/22/2016 08:35 PM, Mickaël Salaün wrote:
>> This fix a pointer leak when an unprivileged eBPF program read a pointer
>> value from the context. Even if is_valid_access() returns a pointer
>> type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
>> value containing an address is then allowed to leak. Moreover, this
>> prevented unprivileged eBPF programs to use functions with (legitimate)
>> pointer arguments.
>>
>> This bug is not an issue for now because the only unprivileged eBPF
>> program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
>> from its context are UNKNOWN_VALUE. However, this fix is important for
>> future unprivileged eBPF program types which could use pointers in their
>> context.
>>
>> Signed-off-by: Mickaël Salaün
>> Fixes: 969bf05eb3ce ("bpf: direct packet access")
>> Cc: Alexei Starovoitov
>> Cc: Andy Lutomirski
>> Cc: Daniel Borkmann
>> Cc: Kees Cook
>> Acked-by: Sargun Dhillon
>> ---
>>  kernel/bpf/verifier.c | 6 ++
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index daea765d72e6..0698ccd67715 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>>  		}
>>  		err = check_ctx_access(env, off, size, t, &reg_type);
>>  		if (!err && t == BPF_READ && value_regno >= 0) {
>> -			mark_reg_unknown_value(state->regs, value_regno);
>> -			if (env->allow_ptr_leaks)
>> -				/* note that reg.[id|off|range] == 0 */
>> -				state->regs[value_regno].type = reg_type;
>> +			/* note that reg.[id|off|range] == 0 */
>> +			state->regs[value_regno].type = reg_type;
>
> True that it's not an issue currently, since reg_type is only set for
> PTR_TO_PACKET/PTR_TO_PACKET_END in xdp and tc programs that can only be
> loaded as privileged. So not an issue for BPF_PROG_TYPE_SOCKET_FILTER.
>
> One thing I don't quite follow is why you remove the
> mark_reg_unknown_value() as this also clears imm? I think this could
> result in an actual verifier bug when it would reuse previous tracked
> imm value of that dst register?

Good catch, I missed the imm initialization. I'm going to send a new patch.

>
>>  		}
>>
>>  	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
>>
>
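The review point above (dropping mark_reg_unknown_value() leaves a stale imm behind) can be illustrated with a toy model. This is a simplified sketch, not the verifier's real data structures: the toy_* types and functions are hypothetical stand-ins, and the only behavior taken from the thread is that mark_reg_unknown_value() clears imm in addition to setting the type.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the verifier's register types. */
enum toy_reg_type {
	TOY_NOT_INIT = 0,
	TOY_UNKNOWN_VALUE,
	TOY_CONST_IMM,
	TOY_PTR_TO_PACKET,
};

/* Hypothetical, reduced per-register state: a type and a tracked imm. */
struct toy_reg_state {
	enum toy_reg_type type;
	int64_t imm;
};

/* Models mark_reg_unknown_value(): resets the type and clears imm. */
static void toy_mark_reg_unknown_value(struct toy_reg_state *regs, int regno)
{
	regs[regno].type = TOY_UNKNOWN_VALUE;
	regs[regno].imm = 0;
}

/*
 * Models the v1 hunk: only the type is overwritten, so whatever imm was
 * tracked for the destination register before the load is kept.
 */
void toy_ctx_load_v1(struct toy_reg_state *regs, int regno,
		     enum toy_reg_type reg_type)
{
	regs[regno].type = reg_type;
}

/*
 * Models the shape announced for the follow-up patch: reset the
 * register first, then apply the type reported for the context access.
 */
void toy_ctx_load_v2(struct toy_reg_state *regs, int regno,
		     enum toy_reg_type reg_type)
{
	toy_mark_reg_unknown_value(regs, regno);
	regs[regno].type = reg_type;
}
```

With a destination register that previously held a tracked constant, toy_ctx_load_v1() leaves that constant in imm next to the new type, whereas toy_ctx_load_v2() ends with imm cleared. In this model that is the whole difference between the two variants.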
[PATCH v2] bpf: Set register type according to is_valid_access()
This fixes a pointer leak when an unprivileged eBPF program reads a
pointer value from the context. Even if is_valid_access() returns a
pointer type, the eBPF verifier replaces it with UNKNOWN_VALUE. The
register value containing an address is then allowed to leak. Moreover,
this prevented unprivileged eBPF programs from using functions with
(legitimate) pointer arguments.

This bug is not an issue for now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE. However, this fix is important for
future unprivileged eBPF program types which could use pointers in their
context.

Signed-off-by: Mickaël Salaün
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Kees Cook
Acked-by: Sargun Dhillon
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..adbc7c161ba5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -795,9 +795,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}

 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3
[PATCH v3] bpf: Set register type according to is_valid_access()
This prevents potential future pointer leaks when an unprivileged eBPF
program reads a pointer value from its context. Even if
is_valid_access() returns a pointer type, the eBPF verifier replaces it
with UNKNOWN_VALUE. The register value that contains a kernel address is
then allowed to leak. Moreover, this fix allows unprivileged eBPF
programs to use functions with (legitimate) pointer arguments.

This is not an issue currently, since reg_type is only set for
PTR_TO_PACKET or PTR_TO_PACKET_END in XDP and TC programs, which can
only be loaded as privileged. For now, the only unprivileged eBPF
program allowed is for socket filtering and all the types from its
context are UNKNOWN_VALUE. However, this fix is important for future
unprivileged eBPF programs which could use pointers in their context.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..adbc7c161ba5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -795,9 +795,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}

 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3
Re: [PATCH v2 0/3] Fix seccomp for UM
Hi,

It seems that some of the fixes from linux-security have landed in
Linus' tree, but some seccomp fixes are still missing. They fix bugs
that were introduced in Linux v4.8 and are still present in v4.8-rc5.
Could you please push this series before the final 4.8 release?

Regards,
 Mickaël

On 09/08/2016 02:35, James Morris wrote:
> On Mon, 1 Aug 2016, Mickaël Salaün wrote:
>
>> Hi,
>>
>> This series fix the recent seccomp update for the User-mode Linux
>> architecture (32-bit and 64-bit) since commit 26703c636c1f ("um/ptrace:
>> run seccomp after ptrace") which close the hole where ptrace can change
>> a syscall out from under seccomp.
>>
>> Changes since v1:
>> * fix commit message typo [2/3]
>> * add Kees Cook's Acked-by
>> * rebased on commit 7616ac70d1bb ("apparmor: fix
>>   SECURITY_APPARMOR_HASH_DEFAULT parameter handling")
>
> All applied to
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next
>
>
[RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h
Make struct seccomp_filter public ahead of the upcoming use of the new
thread_prev field added for the Landlock LSM.

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
---
 include/linux/seccomp.h | 27 ++++++++++++++++++++++++++-
 kernel/seccomp.c        | 26 --------------------------
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ecc296c137cd..a0459a7315ce 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,32 @@
 #include
 #include

-struct seccomp_filter;
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ *         get/put helpers should be used when accessing an instance
+ *         outside of a lifetime-guarded section.  In general, this
+ *         is only needed for handling filters shared across tasks.
+ * @prev: points to a previously installed, or inherited, filter
+ * @prog: the BPF program to evaluate
+ *
+ * seccomp_filter objects are organized in a tree linked via the @prev
+ * pointer.  For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.filter, the most recently attached or inherited filter.
+ * However, multiple filters may share a @prev node, by way of fork(), which
+ * results in a unidirectional tree existing in memory.  This is similar to
+ * how namespaces work.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+	atomic_t usage;
+	struct seccomp_filter *prev;
+	struct bpf_prog *prog;
+};
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index dccfc05cb3ec..1867bbfa7c6c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -33,32 +33,6 @@
 #include
 #include

-/**
- * struct seccomp_filter - container for seccomp BPF programs
- *
- * @usage: reference count to manage the object lifetime.
- *         get/put helpers should be used when accessing an instance
- *         outside of a lifetime-guarded section.  In general, this
- *         is only needed for handling filters shared across tasks.
- * @prev: points to a previously installed, or inherited, filter
- * @prog: the BPF program to evaluate
- *
- * seccomp_filter objects are organized in a tree linked via the @prev
- * pointer.  For any task, it appears to be a singly-linked list starting
- * with current->seccomp.filter, the most recently attached or inherited filter.
- * However, multiple filters may share a @prev node, by way of fork(), which
- * results in a unidirectional tree existing in memory.  This is similar to
- * how namespaces work.
- *
- * seccomp_filter objects should never be modified after being attached
- * to a task_struct (other than @usage).
- */
-struct seccomp_filter {
-	atomic_t usage;
-	struct seccomp_filter *prev;
-	struct bpf_prog *prog;
-};
-
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))

-- 
2.9.3
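The comment moved by this patch describes filters forming a tree through @prev, with nodes shared after fork() and kept alive by @usage. A small userspace model may help to picture it. This is only a sketch under stated assumptions: the toy_* names are hypothetical, a plain int replaces atomic_t, an integer id replaces the BPF program, and the release walk is a simplification written for this example rather than the kernel's code.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct seccomp_filter. */
struct toy_filter {
	int usage;			/* reference count */
	struct toy_filter *prev;	/* previously installed or inherited filter */
	int id;				/* placeholder for the BPF program */
};

/*
 * Attach a new filter on top of @head. The reference the task held on
 * @head is handed over to the new node's prev pointer, and the task now
 * holds one reference on the returned node.
 */
struct toy_filter *toy_filter_attach(struct toy_filter *head, int id)
{
	struct toy_filter *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->usage = 1;
	f->prev = head;
	f->id = id;
	return f;
}

/* Model of fork(): the child shares the parent's current head. */
struct toy_filter *toy_filter_fork(struct toy_filter *head)
{
	if (head)
		head->usage++;
	return head;
}

/* Length of the singly-linked list a task sees from its own head. */
int toy_filter_depth(const struct toy_filter *head)
{
	int depth = 0;

	for (; head; head = head->prev)
		depth++;
	return depth;
}

/*
 * Drop one task's reference: free nodes while their count reaches zero
 * and stop at the first node still shared with another branch.
 * Returns the number of nodes freed.
 */
int toy_filter_put(struct toy_filter *head)
{
	int freed = 0;

	while (head && --head->usage == 0) {
		struct toy_filter *prev = head->prev;

		free(head);
		head = prev;
		freed++;
	}
	return freed;
}
```

After a fork followed by one attach in each task, both heads point to the same prev node: each task still sees a simple list, while memory holds the tree described in the comment.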
[RFC v3 04/22] bpf: Set register type according to is_valid_access()
This fixes a pointer leak when an unprivileged eBPF program reads a
pointer value from the context. Even if is_valid_access() returns a
pointer type, the eBPF verifier replaces it with UNKNOWN_VALUE. The
register value containing an address is then allowed to leak. Moreover,
this prevented unprivileged eBPF programs from using functions with
(legitimate) pointer arguments.

This bug was not a problem until now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE.

Signed-off-by: Mickaël Salaün
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
---
 kernel/bpf/verifier.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c0c4a92dae8c..608cbffb0e86 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 		}
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value(state->regs, value_regno);
-			if (env->allow_ptr_leaks)
-				/* note that reg.[id|off|range] == 0 */
-				state->regs[value_regno].type = reg_type;
+			/* note that reg.[id|off|range] == 0 */
+			state->regs[value_regno].type = reg_type;
 		}

 	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3
[RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter
Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
---
 kernel/seccomp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a2afe2..dccfc05cb3ec 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,8 +41,7 @@
  *         outside of a lifetime-guarded section.  In general, this
  *         is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insnsi: the BPF program instructions to evaluate
+ * @prog: the BPF program to evaluate
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
-- 
2.9.3
[RFC v3 20/22] landlock: Add update and debug access flags
For now, the update and debug accesses are only available to a process
with CAP_SYS_ADMIN. This could change in the future.

The capability check is done statically when loading an eBPF program,
according to the current process. If the process has enough rights and
sets the appropriate access flags, then the dedicated functions or data
will be accessible.

With the update access, the following functions are available:
* bpf_map_lookup_elem
* bpf_map_update_elem
* bpf_map_delete_elem
* bpf_tail_call

With the debug access, the following functions are available:
* bpf_trace_printk
* bpf_get_prandom_u32
* bpf_get_current_pid_tgid
* bpf_get_current_uid_gid
* bpf_get_current_comm

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S. Miller
Cc: Kees Cook
Cc: Sargun Dhillon
---
 include/uapi/linux/bpf.h |  4 +++-
 security/landlock/lsm.c  | 54
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3cc52e51357f..8cfc2de2ab76 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -584,7 +584,9 @@ enum landlock_hook_id {
 #define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 3) - 1)

 /* context of function access flags */
-#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
+#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
+#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
+#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 2) - 1)

 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY	(1 << 0)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 2a15839a08c8..56c45abe979c 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -202,11 +202,57 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 static const struct bpf_func_proto *bpf_landlock_func_proto(
 		enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
+	bool access_update = !!(prog_subtype->landlock_hook.access &
+			LANDLOCK_FLAG_ACCESS_UPDATE);
+	bool access_debug = !!(prog_subtype->landlock_hook.access &
+			LANDLOCK_FLAG_ACCESS_DEBUG);
+
 	switch (func_id) {
 	case BPF_FUNC_landlock_cmp_fs_prop_with_struct_file:
 		return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
 	case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
 		return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+
+	/* access_update */
+	case BPF_FUNC_map_lookup_elem:
+		if (access_update)
+			return &bpf_map_lookup_elem_proto;
+		return NULL;
+	case BPF_FUNC_map_update_elem:
+		if (access_update)
+			return &bpf_map_update_elem_proto;
+		return NULL;
+	case BPF_FUNC_map_delete_elem:
+		if (access_update)
+			return &bpf_map_delete_elem_proto;
+		return NULL;
+	case BPF_FUNC_tail_call:
+		if (access_update)
+			return &bpf_tail_call_proto;
+		return NULL;
+
+	/* access_debug */
+	case BPF_FUNC_trace_printk:
+		if (access_debug)
+			return bpf_get_trace_printk_proto();
+		return NULL;
+	case BPF_FUNC_get_prandom_u32:
+		if (access_debug)
+			return &bpf_get_prandom_u32_proto;
+		return NULL;
+	case BPF_FUNC_get_current_pid_tgid:
+		if (access_debug)
+			return &bpf_get_current_pid_tgid_proto;
+		return NULL;
+	case BPF_FUNC_get_current_uid_gid:
+		if (access_debug)
+			return &bpf_get_current_uid_gid_proto;
+		return NULL;
+	case BPF_FUNC_get_current_comm:
+		if (access_debug)
+			return &bpf_get_current_comm_proto;
+		return NULL;
+
 	default:
 		return NULL;
 	}
@@ -348,6 +394,14 @@ static inline bool bpf_landlock_is_valid_subtype(
 	if (prog_subtype->landlock_hook.access & ~_LANDLOCK_FLAG_ACCESS_MASK)
 		return false;

+	/* check access flags */
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_UPDATE &&
+			!capable(CAP_SYS_ADMIN))
+		return false;
+	if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_DEBUG &&
+			!capable(CAP_SYS_ADMIN))
+		return false;
+
 	return true;
 }

--
2.9.3
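Note on the gating in the patch above: it can be modelled in plain userspace C, which may help when reasoning about which flag combinations are accepted. This is only a sketch, not kernel code. The flag values are copied from the patch; the `model_*` names and the `has_cap_sys_admin` parameter (standing in for `capable(CAP_SYS_ADMIN)`) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by the patch. */
#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 2) - 1)

/* Hypothetical stand-ins for a few BPF_FUNC_* identifiers. */
enum model_func {
	MODEL_FUNC_MAP_LOOKUP_ELEM,	/* guarded by the update access */
	MODEL_FUNC_TAIL_CALL,		/* guarded by the update access */
	MODEL_FUNC_TRACE_PRINTK,	/* guarded by the debug access */
	MODEL_FUNC_GET_CURRENT_COMM,	/* guarded by the debug access */
	MODEL_FUNC_UNKNOWN,
};

/*
 * Load-time check: unknown bits are rejected, and any access flag
 * requires the capability (modelled here as a boolean).
 */
static bool model_subtype_is_valid(uint64_t access, bool has_cap_sys_admin)
{
	if (access & ~_LANDLOCK_FLAG_ACCESS_MASK)
		return false;
	if ((access & LANDLOCK_FLAG_ACCESS_UPDATE) && !has_cap_sys_admin)
		return false;
	if ((access & LANDLOCK_FLAG_ACCESS_DEBUG) && !has_cap_sys_admin)
		return false;
	return true;
}

/* Verifier-time check: a helper is usable only with its guarding flag. */
static bool model_func_is_allowed(enum model_func func, uint64_t access)
{
	switch (func) {
	case MODEL_FUNC_MAP_LOOKUP_ELEM:
	case MODEL_FUNC_TAIL_CALL:
		return !!(access & LANDLOCK_FLAG_ACCESS_UPDATE);
	case MODEL_FUNC_TRACE_PRINTK:
	case MODEL_FUNC_GET_CURRENT_COMM:
		return !!(access & LANDLOCK_FLAG_ACCESS_DEBUG);
	default:
		return false;
	}
}
```

In this model a program loaded with no access flag is always valid but can call none of the guarded helpers, which matches the "return NULL" fallbacks in bpf_landlock_func_proto().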
[RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup
This makes it possible to add new eBPF programs to Landlock hooks
dedicated to a cgroup, thanks to the BPF_PROG_ATTACH command. As with
socket eBPF programs, the Landlock hooks attached to a cgroup are
propagated to the nested cgroups. However, when a new Landlock program
is attached to one of these nested cgroups, this cgroup hierarchy forks
the Landlock hooks. This design is simple and matches the current
CONFIG_BPF_CGROUP inheritance. The difference lies in the fact that
Landlock programs can only be stacked, not removed. This matches the
append-only seccomp behavior. Userland is free to handle Landlock hooks
attached to a cgroup in more complicated ways (e.g. continuous
inheritance), but care should be taken to properly handle error cases
(e.g. memory allocation errors).

Changes since v2:
* new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: Kees Cook
Cc: Tejun Heo
Link: https://lkml.kernel.org/r/20160826021432.ga8...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
---
 include/linux/bpf-cgroup.h  |  7 +++
 include/linux/cgroup-defs.h |  2 ++
 include/linux/landlock.h    |  9 +
 include/uapi/linux/bpf.h    |  1 +
 kernel/bpf/cgroup.c         | 33 ++---
 kernel/bpf/syscall.c        | 11 +++
 security/landlock/lsm.c     | 40 +++-
 security/landlock/manager.c | 32
 8 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 6cca7924ee17..439c681159e2 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,8 +14,15 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)

+#ifdef CONFIG_SECURITY_LANDLOCK
+struct landlock_hooks;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 union bpf_object {
 	struct bpf_prog *prog;
+#ifdef CONFIG_SECURITY_LANDLOCK
+ struct landlock_hooks *hooks; +#endif /* CONFIG_SECURITY_LANDLOCK */ }; struct cgroup_bpf { diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 861b4677fc5b..fe1023bf7b9d 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -301,8 +301,10 @@ struct cgroup { /* used to schedule release agent */ struct work_struct release_agent_work; +#ifdef CONFIG_CGROUP_BPF /* used to store eBPF programs */ struct cgroup_bpf bpf; +#endif /* CONFIG_CGROUP_BPF */ /* ids of the ancestors at each level including self */ int ancestor_ids[]; diff --git a/include/linux/landlock.h b/include/linux/landlock.h index 932ae57fa70e..179a848110f3 100644 --- a/include/linux/landlock.h +++ b/include/linux/landlock.h @@ -19,6 +19,9 @@ #include /* struct seccomp_filter */ #endif /* CONFIG_SECCOMP_FILTER */ +#ifdef CONFIG_CGROUP_BPF +#include /* struct cgroup */ +#endif /* CONFIG_CGROUP_BPF */ #ifdef CONFIG_SECCOMP_FILTER struct landlock_seccomp_ret { @@ -65,6 +68,7 @@ struct landlock_hooks { struct landlock_hooks *new_landlock_hooks(void); +void get_landlock_hooks(struct landlock_hooks *hooks); void put_landlock_hooks(struct landlock_hooks *hooks); #ifdef CONFIG_SECCOMP_FILTER @@ -73,5 +77,10 @@ int landlock_seccomp_set_hook(unsigned int flags, const char __user *user_bpf_fd); #endif /* CONFIG_SECCOMP_FILTER */ +#ifdef CONFIG_CGROUP_BPF +struct landlock_hooks *landlock_cgroup_set_hook(struct cgroup *cgrp, + struct bpf_prog *prog); +#endif /* CONFIG_CGROUP_BPF */ + #endif /* CONFIG_SECURITY_LANDLOCK */ #endif /* _LINUX_LANDLOCK_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 905dcace7255..12e61508f879 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -124,6 +124,7 @@ enum bpf_prog_type { enum bpf_attach_type { BPF_CGROUP_INET_INGRESS, BPF_CGROUP_INET_EGRESS, + BPF_CGROUP_LANDLOCK, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 7b75fa692617..1c18fe46958a 100644 
--- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -15,6 +15,7 @@ #include #include #include +#include DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key); EXPORT_SYMBOL(cgroup_bpf_enabled_key); @@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp) union bpf_object pinned = cgrp->bpf.pinned[type]; if (pinned.prog) { - bpf_prog_put(pinned.prog); + switch (type) { + case BPF_CGROUP_LANDLOCK: +#ifdef CONFIG_SECURITY_LANDLOCK + put_landlock_hooks(pinned.hooks); + break;
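Note on the inheritance described in the commit message above (hooks propagate to nested cgroups, attaching to a nested cgroup forks the hooks, programs can only be stacked): the following userspace sketch models that behavior. It is not the kernel implementation. All `model_*` names, the parent walk and the plain `int` reference count are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* One stacked program; "prev" points at the hooks already in place. */
struct model_hooks {
	int usage;			/* reference count (not atomic here) */
	int prog_id;			/* hypothetical program identifier */
	struct model_hooks *prev;
};

struct model_cgroup {
	struct model_cgroup *parent;
	struct model_hooks *pinned;	/* hooks attached directly to this cgroup */
};

/* Effective hooks: the cgroup's own, else those of the nearest ancestor. */
static struct model_hooks *model_effective(const struct model_cgroup *cgrp)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp->pinned)
			return cgrp->pinned;
	return NULL;
}

/* Append-only attach: stack a program on top of the effective hooks. */
static int model_attach(struct model_cgroup *cgrp, int prog_id)
{
	struct model_hooks *inherited = model_effective(cgrp);
	struct model_hooks *hooks = calloc(1, sizeof(*hooks));

	if (!hooks)
		return -1;
	hooks->usage = 1;
	hooks->prog_id = prog_id;
	hooks->prev = inherited;
	/* Hooks borrowed from an ancestor gain one more user: the fork. */
	if (inherited && inherited != cgrp->pinned)
		inherited->usage++;
	cgrp->pinned = hooks;
	return 0;
}

/* Number of programs that would run for a given hooks stack. */
static int model_count(const struct model_hooks *hooks)
{
	int n = 0;

	for (; hooks; hooks = hooks->prev)
		n++;
	return n;
}
```

With a parent and a child cgroup, attaching program 1 to the parent makes it effective for both. Attaching program 2 to the child then forks the stack, so a later program 3 on the parent is not seen by the child. There is deliberately no detach function in the model.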
[RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()
Add a security access check for a cgroup-backed FD. The "cgroup.procs"
file of the corresponding cgroup must be readable to identify the
cgroup, and writable to prove that the current process can manage this
cgroup (e.g. through delegation). This is similar to the check done by
cgroup_procs_write_permission().

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: Kees Cook
Cc: Tejun Heo
---
 include/linux/cgroup.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/syscall.c   |  6 +++---
 kernel/cgroup.c        | 16 +++-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c4688742ddc4..5767d471e292 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,7 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
 						       struct cgroup_subsys *ss);

 struct cgroup *cgroup_get_from_path(const char *path);
-struct cgroup *cgroup_get_from_fd(int fd);
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask);

 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index edaab4c87292..1d4de8e0ab13 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -552,7 +552,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 				     struct file *map_file /* not used */,
 				     int fd)
 {
-	return cgroup_get_from_fd(fd);
+	return cgroup_get_from_fd(fd, MAY_READ);
 }

 static void cgroup_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e9c5add327e6..f90225dbbb59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include

 DEFINE_PER_CPU(int, bpf_prog_active);

@@ -863,7 +864,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);

-		cgrp = cgroup_get_from_fd(attr->target_fd);
+		cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
 		if (IS_ERR(cgrp)) {
 			bpf_prog_put(prog);
 			return PTR_ERR(cgrp);
@@ -891,10 +892,9 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;

-		cgrp = cgroup_get_from_fd(attr->target_fd);
+		cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
-
 		result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
 		cgroup_put(cgrp);
 		break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 48b650a640a9..3bbaf3f02ed2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 /**
  * cgroup_get_from_fd - get a cgroup pointer from a fd
  * @fd: fd obtained by open(cgroup2_dir)
+ * @access_mask: contains the permission mask
  *
  * Find the cgroup from a fd which should be obtained
  * by opening a cgroup directory. Returns a pointer to the
  * cgroup on success. ERR_PTR is returned if the cgroup
  * cannot be found.
  */
-struct cgroup *cgroup_get_from_fd(int fd)
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
 {
 	struct cgroup_subsys_state *css;
 	struct cgroup *cgrp;
 	struct file *f;
+	struct inode *inode;
+	int ret;

 	f = fget_raw(fd);
 	if (!f)
@@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
 		return ERR_PTR(-EBADF);
 	}

+	ret = -ENOMEM;
+	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
+	if (inode) {
+		ret = inode_permission(inode, access_mask);
+		iput(inode);
+	}
+	if (ret) {
+		cgroup_put(cgrp);
+		return ERR_PTR(ret);
+	}
+
 	return cgrp;
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
--
2.9.3
[RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
set for all cgroups except the root. The flag is cleared when a new
process without the no_new_privs flag is attached to the cgroup.

If a cgroup is landlocked, then any new attempt, from an unprivileged
process, to attach a process without no_new_privs to this cgroup will
be denied.

This makes it possible to safely manage Landlock rules with cgroup
delegation, as with seccomp.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: Kees Cook
Cc: Tejun Heo
---
 include/linux/cgroup-defs.h |  7 +++
 kernel/bpf/syscall.c        |  7 ---
 kernel/cgroup.c             | 44 ++--
 security/landlock/manager.c |  7 +++
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index fe1023bf7b9d..ce0e4c90ae7d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -59,6 +59,13 @@ enum {
 	 * specified at mount time and thus is implemented here.
 	 */
 	CGRP_CPUSET_CLONE_CHILDREN,
+	/*
+	 * Keep track of the no_new_privs property of processes in the cgroup.
+	 * This is useful to quickly check if all processes in the cgroup have
+	 * their no_new_privs bit on. This flag is initially set to true but
+	 * ANDed with every processes coming in the cgroup.
+*/ + CGRP_NO_NEW_PRIVS, }; /* cgroup_root->flags */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f90225dbbb59..ff8b53a8a2a0 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -849,9 +849,10 @@ static int bpf_prog_attach(const union bpf_attr *attr) case BPF_CGROUP_LANDLOCK: #ifdef CONFIG_SECURITY_LANDLOCK - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; - + /* +* security/capability check done in landlock_cgroup_set_hook() +* called by cgroup_bpf_update() +*/ prog = bpf_prog_get_type(attr->attach_bpf_fd, BPF_PROG_TYPE_LANDLOCK); break; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 3bbaf3f02ed2..913e2d3b6d55 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -1985,6 +1986,7 @@ static void init_cgroup_root(struct cgroup_root *root, strcpy(root->name, opts->name); if (opts->cpuset_clone_children) set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags); + /* no CGRP_NO_NEW_PRIVS flag for the root */ } static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) @@ -2812,14 +2814,35 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp, LIST_HEAD(preloaded_csets); struct task_struct *task; int ret; +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK) + bool no_new_privs; +#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */ if (!cgroup_may_migrate_to(dst_cgrp)) return -EBUSY; + task = leader; +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK) + no_new_privs = !!(dst_cgrp->flags & BIT_ULL(CGRP_NO_NEW_PRIVS)); + do { + no_new_privs = no_new_privs && task_no_new_privs(task); + if (!no_new_privs) { + if (dst_cgrp->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks && + security_capable_noaudit(current_cred(), + current_user_ns(), + CAP_SYS_ADMIN) != 0) + return -EPERM; + clear_bit(CGRP_NO_NEW_PRIVS, &dst_cgrp->flags); + break; + } + if (!threadgroup) + break; + } while_each_thread(leader, task); +#endif /* 
CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */ + /* look up all src csets */ spin_lock_irq(&css_set_lock); rcu_read_lock(); - task = leader; do { cgroup_migrate_add_src(task_css_set(task), dst_cgrp, &preloaded_csets); @@ -4345,9 +4368,22 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from) return -EBUSY; mutex_lock(&cgroup_mutex); - percpu_down_write(&cgroup_threadgroup_rwsem); +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK) + if (!(from->flags & BIT_ULL(CGRP_NO_NEW_PRIVS))) { + if (to->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks && +
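Note on the attach-time logic in cgroup_attach_task() above: for a single task it reduces to a small decision function. The sketch below is a userspace model under stated assumptions. The struct, the function name and the boolean parameters are hypothetical: `attacher_is_admin` stands in for the CAP_SYS_ADMIN check on the current process, and `landlocked` for a non-NULL Landlock hooks pointer on the destination cgroup.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two cgroup properties involved. */
struct nnp_cgroup {
	bool no_new_privs;	/* models the CGRP_NO_NEW_PRIVS flag */
	bool landlocked;	/* at least one Landlock hook is attached */
};

/*
 * Models attaching one task to dst. Returns 0 on success or -EPERM when
 * an unprivileged attacher would break the no_new_privs property of a
 * landlocked cgroup.
 */
static int nnp_attach_task(struct nnp_cgroup *dst, bool task_no_new_privs,
			   bool attacher_is_admin)
{
	/* The flag is ANDed with the property of each incoming task. */
	if (dst->no_new_privs && task_no_new_privs)
		return 0;
	if (dst->landlocked && !attacher_is_admin)
		return -EPERM;
	dst->no_new_privs = false;
	return 0;
}
```

One consequence visible in the model: once the flag has been cleared on a landlocked cgroup, an unprivileged attacher is refused even for a task that has no_new_privs set, because the first test already fails on the cgroup flag. That is how the diff reads; whether it is intended is not stated in the message.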
[RFC v3 22/22] samples/landlock: Add sandbox example
Add a basic sandbox tool to create a process isolated from some part of
the system. This can depend on the current cgroup.

Example with the current process hierarchy (seccomp):

  $ ls /home
  user1
  $ LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox /bin/sh -i
  Launching a new sandboxed process.
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Example with a cgroup:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./samples/landlock/sandbox
  Ready to sandbox with cgroups.
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S. Miller
Cc: James Morris
Cc: Kees Cook
Cc: Serge E. Hallyn
---
 samples/Makefile            |   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 307
 4 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 1a20169d85ac..a2dcd57ca7ac 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@

 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-			   configfs/ connector/ v4l/ trace_printk/
+			   configfs/ connector/ v4l/ trace_printk/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index ..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index ..d1044b2afd27
--- /dev/null
+++
b/samples/landlock/Makefile @@ -0,0 +1,16 @@ +# kbuild trick to avoid linker error. Can be omitted if a module is built. +obj- := dummy.o + +hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox +sandbox-objs := sandbox.o + +always := $(hostprogs-y) + +HOSTCFLAGS += -I$(objtree)/usr/include + +# Trick to allow make to be run from this directory +all: + $(MAKE) -C ../../ $$PWD/ + +clean: + $(MAKE) -C ../../ M=$$PWD clean diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c new file mode 100644 index ..9d6ac00cdd23 --- /dev/null +++ b/samples/landlock/sandbox.c @@ -0,0 +1,307 @@ +/* + * Landlock LSM - Sandbox example + * + * Copyright (C) 2016 Mickaël Salaün + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 3, as + * published by the Free Software Foundation. + */ + +#define _GNU_SOURCE +#include +#include /* open() */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../../tools/include/linux/filter.h" + +#include "../bpf/libbpf.c" + +#ifndef seccomp +static int seccomp(unsigned int op, unsigned int flags, void *args) +{ + errno = 0; + return syscall(__NR_seccomp, op, flags, args); +} +#endif + +static int landlock_prog_load(const struct bpf_insn *insns, int prog_len, + enum landlock_hook_id hook_id, __u64 access) +{ + union bpf_attr attr = { + .prog_type = BPF_PROG_TYPE_LANDLOCK, + .insns = ptr_to_u64((void *) insns), + .insn_cnt = prog_len / sizeof(struct bpf_insn), + .license = ptr_to_u64((void *) "GPL"), + .log_buf = ptr_to_u64(bpf_log_buf), + .log_size = LOG_BUF_SIZE, + .log_level = 1, + .prog_subtype.landlock_hook = { + .id = hook_id, + .origin = LANDLOCK_FLAG_ORIGIN_SECCOMP | + LANDLOCK_FLAG_ORIGIN_SYSCALL | + LANDLOCK_FLAG_ORIGIN_INTERRUPT, + .access = access, + }, + }; + + /* assign one field outside of struct init to make sure any +* padding is zero initialized +*/ + 
attr.kern_version = 0; + + bpf_log_buf[0] = 0; + + return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)); +} + +#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0])) + +static int apply_sandbox(const char **allowed_paths, int path_nb, const char + **cgroup_paths, int cgroup_nb) +{ + __u32 key; + int i,
[RFC v3 19/22] landlock: Add interrupted origin
This third origin of hook call should cover all possible trigger paths
(e.g. page fault). Landlock eBPF programs can then make decisions
accordingly.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: Kees Cook
---
 include/uapi/linux/bpf.h |  3 ++-
 security/landlock/lsm.c  | 17 +++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 12e61508f879..3cc52e51357f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -580,7 +580,8 @@ enum landlock_hook_id {
 /* Trigger type */
 #define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
 #define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
-#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 2) - 1)
+#define LANDLOCK_FLAG_ORIGIN_INTERRUPT	(1 << 2)
+#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 3) - 1)

 /* context of function access flags */
 #define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 0) - 1)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 000dd0c7ec3d..2a15839a08c8 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -17,6 +17,7 @@
 #include /* FIELD_SIZEOF() */
 #include
 #include
+#include /* in_interrupt() */
 #include /* struct seccomp_* */
 #include /* uintptr_t */

@@ -109,6 +110,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #endif /* CONFIG_CGROUP_BPF */
 	struct landlock_rule *rule;
 	u32 hook_idx = get_index(hook_id);
+	u16 current_call;

 	struct landlock_data ctx = {
 		.hook = hook_id,
@@ -128,6 +130,16 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 	 * prioritize fine-grained policies (i.e. per thread), and return early.
 	 */

+	if (unlikely(in_interrupt())) {
+		current_call = LANDLOCK_FLAG_ORIGIN_INTERRUPT;
+#ifdef CONFIG_SECCOMP_FILTER
+		/* bypass landlock_ret evaluation */
+		goto seccomp_int;
+#endif /* CONFIG_SECCOMP_FILTER */
+	} else {
+		current_call = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+	}
+
 #ifdef CONFIG_SECCOMP_FILTER
 	/* seccomp triggers and landlock_ret cleanup */
 	ctx.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP;
@@ -164,8 +176,9 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 		return -ret;
 	ctx.cookie = 0;

+seccomp_int:
 	/* syscall trigger */
-	ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+	ctx.origin = current_call;
 	ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
 			current->seccomp.landlock_hooks);
 	if (ret)
@@ -175,7 +188,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #ifdef CONFIG_CGROUP_BPF
 	/* syscall trigger */
 	if (cgroup_bpf_enabled) {
-		ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+		ctx.origin = current_call;
 		/* get the default cgroup associated with the current thread */
 		cgrp = task_css_set(current)->dfl_cgrp;
 		ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
--
2.9.3
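Note on how the origin is chosen in the patch above: the selection is a two-way choice, and a program is expected to run only for the origins its subtype subscribed to (as the 06/22 message says, the "origin" field must list each trigger). The following userspace sketch models both steps. The flag values are copied from the patch; the `model_*` functions and the boolean `in_interrupt` parameter are assumptions standing in for the kernel's in_interrupt() and the per-program origin check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Origin flags as defined by the patch. */
#define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
#define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
#define LANDLOCK_FLAG_ORIGIN_INTERRUPT	(1 << 2)
#define _LANDLOCK_FLAG_ORIGIN_MASK	((1 << 3) - 1)

/* Origin reported to programs for a non-seccomp trigger. */
static uint16_t model_current_call(bool in_interrupt)
{
	return in_interrupt ? LANDLOCK_FLAG_ORIGIN_INTERRUPT :
			      LANDLOCK_FLAG_ORIGIN_SYSCALL;
}

/* A program runs only for the origins its subtype subscribed to. */
static bool model_should_run(uint16_t subscribed, uint16_t origin)
{
	if (subscribed & ~_LANDLOCK_FLAG_ORIGIN_MASK)
		return false;	/* unknown bits: invalid subtype */
	return (subscribed & origin) != 0;
}
```

A program that subscribes only to LANDLOCK_FLAG_ORIGIN_SYSCALL would therefore be skipped in this model when the hook fires from interrupt context.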
[RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it
This helper will be useful for arraymap (next commit).

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: David S. Miller
Cc: Daniel Borkmann
---
 include/linux/bpf.h  | 6 ++
 kernel/bpf/syscall.c | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a904f63f8c1..fa9a988400d9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -274,6 +274,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)

 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
+
+/* helper to convert user pointers passed inside __aligned_u64 fields */
+static inline void __user *u64_to_ptr(__u64 val)
+{
+	return (void __user *) (unsigned long) val;
+}
 #else
 static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1a8592a082ce..776c752604b0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -252,12 +252,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 	return map;
 }

-/* helper to convert user pointers passed inside __aligned_u64 fields */
-static void __user *u64_to_ptr(__u64 val)
-{
-	return (void __user *) (unsigned long) val;
-}
-
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
--
2.9.3
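Note on the helper moved by this patch: userland has the mirror-image problem, since pointers placed in the __aligned_u64 fields of union bpf_attr must be widened on the way in. A minimal userspace sketch of the round trip follows. It uses uintptr_t rather than the kernel's unsigned long cast and drops the __user annotation, both of which are adaptations for userspace and not what the patch contains.

```c
#include <stdint.h>

/* Widen a pointer so it can be stored in a 64-bit attribute field. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Userspace counterpart of the kernel helper: narrow the value back. */
static inline void *u64_to_ptr(uint64_t val)
{
	return (void *)(uintptr_t)val;
}
```

Going through an integer type of pointer width first avoids a size-mismatch cast warning on 32-bit builds, which is the same reason the kernel version casts through unsigned long.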
[RFC v3 06/22] landlock: Add LSM hooks
Add LSM hooks which can be used by userland through Landlock (eBPF)
programs. These programs are limited to a whitelist of functions (cf.
next commit). The eBPF program context is depicted by the struct
landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID
* origin: what triggered this Landlock program (syscall, dedicated
  seccomp return or interruption)
* cookie: the 16-bit value from the seccomp filter that triggered this
  Landlock program
* args[6]: array of some LSM hook arguments

The LSM hook arguments can contain raw values as integers or
(unleakable) pointers. The only way to use the pointers is to pass them
to an eBPF function according to their types (e.g. the
bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
file pointer).

For each Landlock program, the subtype specifies, through the "id"
field, which LSM hook the program is dedicated to. The "origin" field
must contain every trigger for which the Landlock program will be called
(e.g. every syscall and/or seccomp filters returning RET_LANDLOCK). The
"access" bitfield can be used to allow a program to access a specific
feature from a Landlock hook (i.e. context value or function). The flag
guarding this feature may only be enabled according to the capabilities
of the process loading the program.

For now, there are three hooks for file system access control:
* file_open
* file_permission
* mmap_file

Changes since v2:
* use subtypes instead of dedicated eBPF program types for each hook
  (suggested by Alexei Starovoitov)
* replace convert_ctx_access() with subtype check
* use an array of Landlock program list instead of a single list
* handle running Landlock programs without needing a seccomp filter
* use, check and expose "origin" to Landlock programs
* mask the unused struct cred * (suggested by Andy Lutomirski)

Changes since v1:
* revamp access control from a syscall-based to a LSM hooks-based
* do not use audit cache
* no race conditions by design
* architecture agnostic
* switch from cBPF to eBPF (suggested by Daniel Borkmann)
* new BPF context

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S. Miller
Cc: James Morris
Cc: Kees Cook
Cc: Serge E. Hallyn
Cc: Will Drewry
Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827180642.ga38...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/CALCETrUK1umtXMEXXKzMAccNQCVTPA8_XNDf01B5=gazujw...@mail.gmail.com
Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
---
 include/linux/bpf.h        |   5 +
 include/linux/lsm_hooks.h  |   5 +
 include/uapi/linux/bpf.h   |  37
 kernel/bpf/syscall.c       |  10 +-
 kernel/bpf/verifier.c      |   6 ++
 security/Makefile          |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c    | 222 +
 security/security.c        |   1 +
 9 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9aa01d9d3d80..36c3e482239c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -85,6 +85,8 @@ enum bpf_arg_type {

 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
+
+	ARG_PTR_TO_STRUCT_FILE,		/*
pointer to struct file */ }; /* type of values returned from helper functions */ @@ -143,6 +145,9 @@ enum bpf_reg_type { */ PTR_TO_PACKET, PTR_TO_PACKET_END, /* skb->data + headlen */ + + /* Landlock */ + PTR_TO_STRUCT_FILE, }; struct bpf_prog; diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 558adfa5c8a8..069af34301d4 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1933,5 +1933,10 @@ void __init loadpin_add_hooks(void); #else static inline void loadpin_add_hooks(void) { }; #endif +#ifdef CONFIG_SECURITY_LANDLOCK +extern void __init landlock_add_hooks(void); +#else +static inline void __init landlock_add_hooks(void) { } +#endif /* CONFIG_SECURITY_LANDLOCK */ #endif /* ! __LINUX_LSM_HOOKS_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 667b6ef3ff1e..ad87003fe892 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -108,6 +108,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SOCKET, + BPF_PROG_TYPE_LANDLOCK, }; enum bpf_attach_type { @@ -528,6 +529,23 @@ struct xdp_md { __u32 data_end; }; +/* LSM hooks */ +enum landlock_hook_id { + LANDLOCK_HOOK_UNSPEC, + LANDLOCK_HOOK_FILE_OPEN, + LANDLOCK_HOOK_FILE_PERMISSION, + LANDLOCK_HOOK_MMAP_FILE, +}; +#define _LANDLOCK_HOOK
[RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp()
The semantics are unchanged. This will be useful for the Landlock
integration with seccomp (next commit).

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c           |  2 +-
 kernel/seccomp.c        | 18 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a0459a7315ce..ffdab7cdd162 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -102,13 +102,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */

 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3584f521e3a6..99df46f157cf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -276,7 +276,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_stack(tsk);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
-	put_seccomp_filter(tsk);
+	put_seccomp(tsk);
 	arch_release_task_struct(tsk);
 	free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1867bbfa7c6c..92b15083b1b2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -36,6 +36,8 @@
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))

+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -286,7 +288,7 @@ static inline void seccomp_sync_threads(void)
 		 * current's path will hold a reference. (This also
 		 * allows a put before the assignment.)
 		 */
-		put_seccomp_filter(thread);
+		put_seccomp_filter(thread->seccomp.filter);
 		smp_store_release(&thread->seccomp.filter,
 				  caller->seccomp.filter);

@@ -448,10 +450,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 	}
 }

-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = filter;
+
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -460,6 +463,11 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }

+void put_seccomp(struct task_struct *tsk)
+{
+	put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -871,7 +879,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
 		ret = -EFAULT;

-	put_seccomp_filter(task);
+	put_seccomp_filter(task->seccomp.filter);
 	return ret;

 out:
--
2.9.3
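Note on the split above: the per-filter function walks the chain of filters and frees each node whose count drops to zero, while the per-task function only forwards the task's head pointer. A userspace model of that shape is sketched below. The `model_*` names, the plain `int` counter (the kernel uses an atomic) and the `model_freed` statistic are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

struct model_filter {
	int usage;			/* plain counter; atomic in the kernel */
	struct model_filter *prev;	/* older filter this one was stacked on */
};

struct model_task {
	struct model_filter *filter;	/* head of the task's filter chain */
};

static int model_freed;			/* number of nodes released so far */

/* Allocate a filter stacked on prev; it takes over the caller's reference. */
static struct model_filter *model_filter_new(struct model_filter *prev)
{
	struct model_filter *f = calloc(1, sizeof(*f));

	if (f) {
		f->usage = 1;
		f->prev = prev;
	}
	return f;
}

/* Drop one reference; free single-reference branches iteratively. */
static void model_put_filter(struct model_filter *filter)
{
	struct model_filter *orig = filter;

	while (orig && --orig->usage == 0) {
		struct model_filter *freeme = orig;

		orig = orig->prev;
		free(freeme);
		model_freed++;
	}
}

/* Task-level wrapper, mirroring the shape of put_seccomp(). */
static void model_put_seccomp(struct model_task *tsk)
{
	model_put_filter(tsk->filter);
}
```

Taking a filter pointer instead of a task is what lets other owners of a filter reference reuse the same release path, which is presumably why the next commit needs the split.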
[RFC v3 01/22] landlock: Add Kconfig
Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp
parts to ease the review.

Changes from v2:
* add seccomp filter or cgroups (with eBPF programs attached support)
  dependencies

Signed-off-by: Mickaël Salaün
Cc: James Morris
Cc: Kees Cook
Cc: Serge E. Hallyn
---
 security/Kconfig          |  1 +
 security/landlock/Kconfig | 23 +++
 2 files changed, 24 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..c63194c561c5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -164,6 +164,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig

 source security/integrity/Kconfig

diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index ..dec64270b06d
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,23 @@
+config SECURITY_LANDLOCK
+	bool "Landlock sandbox support"
+	depends on SECURITY
+	depends on BPF_SYSCALL
+	depends on SECCOMP_FILTER || CGROUP_BPF
+	default y
+	help
+	  Landlock is a stacked LSM which allows any user to load a security
+	  policy to restrict their processes (i.e. create a sandbox). The
+	  policy is a list of stacked eBPF programs for some LSM hooks. Each
+	  program can do some access comparison to check if an access request
+	  is legitimate.
+
+	  You need to enable seccomp filter and/or cgroups (with eBPF programs
+	  attached support) to apply a security policy to either a process
+	  hierarchy (e.g. application with built-in sandboxing) or a group of
+	  processes (e.g. container sandboxing). It is recommended to enable
+	  both seccomp filter and cgroups.
+
+	  Further information about eBPF can be found in
+	  Documentation/networking/filter.txt
+
+	  If you are unsure how to answer this question, answer Y.
--
2.9.3
[RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of
  elements instead of CONST_PTR_TO_MAP (e.g.
  CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* forced sequential filling (i.e. replace or append-only update), which
  allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map
can be passed to an eBPF function. For example, Landlock uses it to store
and manage kernel objects (e.g. struct file) instead of dealing with
userland raw data. This improves efficiency and ensures that an eBPF
program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when updating a
map entry (handle). These handle types are used to infer a high-level
arraymap type, which is one of those listed in enum bpf_map_array_type
(e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by the Landlock LSM (cf. next
commits) but it could be useful for other needs.

Changes since v2:
* add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
  handle entries (suggested by Andy Lutomirski)
* remove useless checks

Changes since v1:
* arraymap of handles replaces custom checker groups
* simpler userland API

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S.
Miller Cc: Kees Cook Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com --- include/linux/bpf.h | 14 include/uapi/linux/bpf.h | 18 + kernel/bpf/arraymap.c| 203 +++ kernel/bpf/verifier.c| 12 ++- 4 files changed, 246 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index fa9a988400d9..eae4ce4542c1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -13,6 +13,10 @@ #include #include +#ifdef CONFIG_SECURITY_LANDLOCK +#include /* struct file */ +#endif /* CONFIG_SECURITY_LANDLOCK */ + struct perf_event; struct bpf_map; @@ -38,6 +42,7 @@ struct bpf_map_ops { struct bpf_map { atomic_t refcnt; enum bpf_map_type map_type; + enum bpf_map_array_type map_array_type; u32 key_size; u32 value_size; u32 max_entries; @@ -187,6 +192,9 @@ struct bpf_array { */ enum bpf_prog_type owner_prog_type; bool owner_jited; +#ifdef CONFIG_SECURITY_LANDLOCK + u32 n_entries; /* number of entries in a handle array */ +#endif /* CONFIG_SECURITY_LANDLOCK */ union { char value[0] __aligned(8); void *ptrs[0] __aligned(8); @@ -194,6 +202,12 @@ struct bpf_array { }; }; +#ifdef CONFIG_SECURITY_LANDLOCK +struct map_landlock_handle { + u32 type; /* enum bpf_map_handle_type */ +}; +#endif /* CONFIG_SECURITY_LANDLOCK */ + #define MAX_TAIL_CALL_CNT 32 struct bpf_event_entry { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7cd36166f9b7..b68de57f7ab8 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -87,6 +87,15 @@ enum bpf_map_type { BPF_MAP_TYPE_PERCPU_ARRAY, BPF_MAP_TYPE_STACK_TRACE, BPF_MAP_TYPE_CGROUP_ARRAY, + BPF_MAP_TYPE_LANDLOCK_ARRAY, +}; + +enum bpf_map_array_type { + BPF_MAP_ARRAY_TYPE_UNSPEC, +}; + +enum bpf_map_handle_type { + BPF_MAP_HANDLE_TYPE_UNSPEC, }; enum bpf_prog_type { @@ -510,4 +519,13 @@ struct xdp_md { __u32 data_end; }; +/* Map handle entry */ +struct landlock_handle { + __u32 type; /* enum bpf_map_handle_type */ + union { + __u32 fd; + 
__aligned_u64 glob; + }; +} __attribute__((aligned(8))); + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index a2ac051c342f..94256597eacd 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -16,6 +16,13 @@ #include #include #include +#include /* fput() */ +#include /* struct file */ + +#ifdef CONFIG_SECURITY_LANDLOCK +#include /* RLIMIT_NOFILE */ +#include /* rlimit() */ +#endif /* CONFIG_SECURITY_LANDLOCK */ static void bpf_array_free_percpu(struct bpf_array *array) { @@ -580,3 +587,199 @@ static int __init register_cgroup_array_map(void) } late_initcall(register_cgroup_array_map); #endif + +#ifdef CONFIG_SECURITY_LANDLOCK +static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr) +{ + if (attr->value_size != sizeof(struct landlock_handle)) + return ERR_PTR(-EINVAL); + attr->value_size = sizeof(struct map_landlock_handle); + + return array_map_alloc(attr); +} + +static void landlock_put_handle(struct map_landlock_handle *handle) +{ + enum bpf_map_handle_type handle_type = handle->type; + + switch (handle_type) { + case BPF_MAP_HANDLE_TYPE_UNSPEC: +
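The "replace or append-only update" rule from the commit message can be illustrated with a small userspace model. The archived diff is cut off before the real update path, so everything below is an assumption made for illustration: struct handle_array, handle_array_update(), MAX_ENTRIES and the error codes are hypothetical, and only the n_entries counter mirrors a field that the patch actually adds.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ENTRIES 8

/* Hypothetical miniature of the handle array: n_entries tracks how
 * many slots are filled, like the n_entries field in the patch. */
struct handle_array {
	uint32_t n_entries;
	uint32_t type[MAX_ENTRIES];
};

/* Accept an update only if it replaces a filled slot or appends
 * right after the last one, so the array never has holes. */
static int handle_array_update(struct handle_array *arr, uint32_t index,
			       uint32_t type)
{
	if (index >= MAX_ENTRIES)
		return -E2BIG;
	if (index > arr->n_entries)
		return -EINVAL;
	arr->type[index] = type;
	if (index == arr->n_entries)
		arr->n_entries++;
	return 0;
}
```

Because the filled slots are always the first n_entries ones, a reader can stop iterating at n_entries instead of scanning up to max_entries, which is the "quick browsing" property the message mentions.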
[RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context
This is a proof of concept to expose optional values that could depend on
the process access rights.

There are two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
eBPF functions manipulating a skb in a read or write way.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S. Miller
Cc: Kees Cook
Cc: Sargun Dhillon
---
 include/linux/bpf.h      |  2 ++
 include/uapi/linux/bpf.h |  7 ++-
 kernel/bpf/verifier.c    |  6 ++
 security/landlock/lsm.c  | 26 ++
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7325c17f720..218973777612 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -88,6 +88,7 @@ enum bpf_arg_type {
 
 	ARG_PTR_TO_STRUCT_FILE,		/* pointer to struct file */
 	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,	/* pointer to Landlock FS handle */
+	ARG_PTR_TO_STRUCT_SKB,		/* pointer to struct skb */
 };
 
 /* type of values returned from helper functions */
@@ -150,6 +151,7 @@ enum bpf_reg_type {
 	/* Landlock */
 	PTR_TO_STRUCT_FILE,
 	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	PTR_TO_STRUCT_SKB,
 };
 
 struct bpf_prog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8cfc2de2ab76..7d9e56952ed9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -586,7 +586,9 @@ enum landlock_hook_id {
 /* context of function access flags */
 #define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
 #define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
-#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 2) - 1)
+#define LANDLOCK_FLAG_ACCESS_SKB_READ	(1 << 2)
+#define LANDLOCK_FLAG_ACCESS_SKB_WRITE	(1 << 3)
+#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 4) - 1)
 
 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY	(1 << 0)
@@ -619,12 +621,15 @@ struct landlock_handle {
  * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
  *        description and the LANDLOCK_HOOK* definitions from
  *        security/landlock/lsm.c for their types.
+ * @opt_skb: optional skb pointer, accessible with the
+ *           LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
  */
 struct landlock_data {
 	__u32 hook; /* enum landlock_hook_id */
 	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
 	__u16 cookie; /* seccomp RET_LANDLOCK */
 	__u64 args[6];
+	__u64 opt_skb;
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8d7b18574f5a..a95154c1a60f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -247,6 +247,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET_END]	= "pkt_end",
 	[PTR_TO_STRUCT_FILE]	= "struct_file",
 	[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
+	[PTR_TO_STRUCT_SKB]	= "struct_skb",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -559,6 +560,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_STRUCT_FILE:
 	case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
+	case PTR_TO_STRUCT_SKB:
 		return true;
 	default:
 		return false;
@@ -984,6 +986,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
 		expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_STRUCT_SKB) {
+		expected_type = PTR_TO_STRUCT_SKB;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_STACK ||
 		   arg_type == ARG_PTR_TO_RAW_STACK) {
 		expected_type = PTR_TO_STACK;
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 56c45abe979c..8b0e6f0eb6b7 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -281,6 +281,7 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
 		break;
 	case offsetof(struct landlock_data, args[0]) ...
 		offsetof(struct landlock_data, args[5]):
+	case offsetof(struct landlock_data, opt_skb):
 		expected_size = sizeof(__u64);
 		break;
 	default:
@@ -299,6 +300,13 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
 		if (*reg_type == NOT_INIT)
 			return false;
 		break;
+	case offsetof(struct landlock_data, opt_skb):
+		if (!(prog_subtype->landlock_hook.access &
+				(LANDLOCK_FLAG_ACCESS_SKB_READ |
+
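The effect of widening _LANDLOCK_FLAG_ACCESS_MASK from two to four bits can be checked with a small standalone sketch. The flag values are copied from the patch; the two helper functions are hypothetical and only approximate what the (truncated) __is_valid_access() hunk presumably tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch. */
#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
#define LANDLOCK_FLAG_ACCESS_SKB_READ	(1 << 2)
#define LANDLOCK_FLAG_ACCESS_SKB_WRITE	(1 << 3)
#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 4) - 1)

/* Hypothetical helper: reject any bit outside the known flags. */
static bool access_flags_are_known(uint64_t access)
{
	return (access & ~_LANDLOCK_FLAG_ACCESS_MASK) == 0;
}

/* Hypothetical helper: opt_skb is only reachable with an SKB flag. */
static bool may_use_opt_skb(uint64_t access)
{
	return (access & (LANDLOCK_FLAG_ACCESS_SKB_READ |
			  LANDLOCK_FLAG_ACCESS_SKB_WRITE)) != 0;
}
```

With the old mask, ((1ULL << 2) - 1), a subtype carrying either SKB flag would have been rejected as unknown, so the mask and the new flags have to change together.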
[RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object
This allows CONFIG_CGROUP_BPF to manage different types of pointers
instead of only eBPF programs. This will be useful for the next commits
to support Landlock with cgroups.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: Tejun Heo
---
 include/linux/bpf-cgroup.h |  8 ++--
 kernel/bpf/cgroup.c        | 44 +++-
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index fc076de74ab9..2234042d7f61 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,14 +14,18 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+union bpf_object {
+	struct bpf_prog *prog;
+};
+
 struct cgroup_bpf {
 	/*
 	 * Store two sets of bpf_prog pointers, one for programs that are
 	 * pinned directly to this cgroup, and one for those that are effective
 	 * when this cgroup is accessed.
 	 */
-	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
-	struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
+	union bpf_object pinned[MAX_BPF_ATTACH_TYPE];
+	union bpf_object effective[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 21d168c3ad35..782878ec4f2d 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -20,18 +20,18 @@ DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
 /**
- * cgroup_bpf_put() - put references of all bpf programs
+ * cgroup_bpf_put() - put references of all bpf objects
  * @cgrp: the cgroup to modify
  */
 void cgroup_bpf_put(struct cgroup *cgrp)
 {
 	unsigned int type;
 
-	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.prog); type++) {
-		struct bpf_prog *prog = cgrp->bpf.prog[type];
+	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.pinned); type++) {
+		union bpf_object pinned = cgrp->bpf.pinned[type];
 
-		if (prog) {
-			bpf_prog_put(prog);
+		if (pinned.prog) {
+			bpf_prog_put(pinned.prog);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
 	}
@@ -47,11 +47,12 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 	unsigned int type;
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
-		struct bpf_prog *e;
+		union bpf_object e;
 
-		e = rcu_dereference_protected(parent->bpf.effective[type],
-					      lockdep_is_held(&cgroup_mutex));
-		rcu_assign_pointer(cgrp->bpf.effective[type], e);
+		e.prog = rcu_dereference_protected(
+				parent->bpf.effective[type].prog,
+				lockdep_is_held(&cgroup_mutex));
+		rcu_assign_pointer(cgrp->bpf.effective[type].prog, e.prog);
 	}
 }
 
@@ -87,32 +88,33 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type)
 {
-	struct bpf_prog *old_prog, *effective;
+	union bpf_object obj, old_pinned, effective;
 	struct cgroup_subsys_state *pos;
 
-	old_prog = xchg(cgrp->bpf.prog + type, prog);
+	obj.prog = prog;
+	old_pinned = xchg(cgrp->bpf.pinned + type, obj);
 
-	effective = (!prog && parent) ?
-		rcu_dereference_protected(parent->bpf.effective[type],
+	effective.prog = (!obj.prog && parent) ?
+		rcu_dereference_protected(parent->bpf.effective[type].prog,
 					  lockdep_is_held(&cgroup_mutex)) :
-		prog;
+		obj.prog;
 
 	css_for_each_descendant_pre(pos, &cgrp->self) {
 		struct cgroup *desc = container_of(pos, struct cgroup, self);
 
 		/* skip the subtree if the descendant has its own program */
-		if (desc->bpf.prog[type] && desc != cgrp)
+		if (desc->bpf.pinned[type].prog && desc != cgrp)
 			pos = css_rightmost_descendant(pos);
 		else
-			rcu_assign_pointer(desc->bpf.effective[type],
-					   effective);
+			rcu_assign_pointer(desc->bpf.effective[type].prog,
+					   effective.prog);
 	}
 
-	if (prog)
+	if (obj.prog)
 		static_branch_inc(&cgroup_bpf_enabled_key);
 
-	if (old_prog) {
-		bpf_prog_put(old_prog);
+	if (old_pinned.prog) {
+
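The pinned/effective bookkeeping that __cgroup_bpf_update() performs can be followed more easily on a toy tree. The sketch below is a userspace approximation, not kernel code: struct node, node_update(), node_add_child() and propagate() are hypothetical names, a single attach type is modeled, and an int id (0 meaning "none") replaces the struct bpf_prog pointer. RCU, locking and reference counting are left out.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, much simplified cgroup with a single attach type. */
struct node {
	int pinned;	/* program attached directly to this node */
	int effective;	/* program that actually applies here */
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	size_t n_children;
};

/* Pre-order walk; a descendant with its own pinned program keeps its
 * effective value and shields its whole subtree. */
static void propagate(struct node *n, const struct node *root, int effective)
{
	size_t i;

	if (n != root && n->pinned)
		return;
	n->effective = effective;
	for (i = 0; i < n->n_children; i++)
		propagate(n->children[i], root, effective);
}

/* Pin prog on n (0 detaches) and refresh the effective programs. */
static void node_update(struct node *n, int prog)
{
	int effective = (!prog && n->parent) ? n->parent->effective : prog;

	n->pinned = prog;
	propagate(n, n, effective);
}

/* New children start with the parent's effective program, which is
 * what cgroup_bpf_inherit() does for a new cgroup. */
static int node_add_child(struct node *parent, struct node *child)
{
	if (parent->n_children >= MAX_CHILDREN)
		return -1;
	child->parent = parent;
	child->effective = parent->effective;
	parent->children[parent->n_children++] = child;
	return 0;
}
```

Detaching (prog == 0) makes the node fall back to its parent's effective program, and that fallback is pushed down to every descendant that has nothing pinned of its own.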
[RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
A Landlock program will be triggered according to its subtype/origin bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the Landlock program when a seccomp filter will return RET_LANDLOCK. Moreover, it is possible to return a 16-bit cookie which will be readable by the Landlock programs in its context. Only seccomp filters loaded from the same thread and before a Landlock program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple Landlock programs can be triggered by one or more seccomp filters. This way, each RET_LANDLOCK (with specific cookie) will trigger all the allowed Landlock programs once. Changes since v2: * Landlock programs can now be run without seccomp filter but for any syscall (from the process) or interruption * move Landlock related functions and structs into security/landlock/* (to manage cgroups as well) * fix seccomp filter handling: run Landlock programs for each of their legitimate seccomp filter * properly clean up all seccomp results * cosmetic changes to ease the understanding * fix some ifdef Signed-off-by: Mickaël Salaün Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Andrew Morton --- include/linux/landlock.h | 77 ++ include/linux/seccomp.h | 26 + include/uapi/linux/seccomp.h | 2 + kernel/fork.c| 23 +++- kernel/seccomp.c | 68 +++- security/landlock/Makefile | 2 +- security/landlock/common.h | 27 + security/landlock/lsm.c | 96 - security/landlock/manager.c | 242 +++ 9 files changed, 552 insertions(+), 11 deletions(-) create mode 100644 include/linux/landlock.h create mode 100644 security/landlock/common.h create mode 100644 security/landlock/manager.c diff --git a/include/linux/landlock.h b/include/linux/landlock.h new file mode 100644 index ..932ae57fa70e --- /dev/null +++ b/include/linux/landlock.h @@ -0,0 +1,77 @@ +/* + * Landlock LSM - Public headers + * + * Copyright (C) 2016 Mickaël Salaün + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU 
General Public License version 2, as + * published by the Free Software Foundation. + */ + +#ifndef _LINUX_LANDLOCK_H +#define _LINUX_LANDLOCK_H +#ifdef CONFIG_SECURITY_LANDLOCK + +#include /* _LANDLOCK_HOOK_LAST */ +#include /* atomic_t */ + +#ifdef CONFIG_SECCOMP_FILTER +#include /* struct seccomp_filter */ +#endif /* CONFIG_SECCOMP_FILTER */ + + +#ifdef CONFIG_SECCOMP_FILTER +struct landlock_seccomp_ret { + struct landlock_seccomp_ret *prev; + struct seccomp_filter *filter; + u16 cookie; + bool triggered; +}; +#endif /* CONFIG_SECCOMP_FILTER */ + +struct landlock_rule { + atomic_t usage; + struct landlock_rule *prev; + /* +* List of filters (through filter->thread_prev) allowed to trigger +* this Landlock program. +*/ + struct bpf_prog *prog; +#ifdef CONFIG_SECCOMP_FILTER + struct seccomp_filter *thread_filter; +#endif /* CONFIG_SECCOMP_FILTER */ +}; + +/** + * struct landlock_hooks - Landlock hook programs enforced on a thread + * + * This is used for low performance impact when forking a process. Instead of + * copying the full array and incrementing the usage field of each entries, + * only create a pointer to struct landlock_hooks and increment the usage + * field. + * + * A new struct landlock_hooks must be created thanks to a call to + * new_landlock_hooks(). + * + * @usage: reference count to manage the object lifetime. When a thread need to + * add Landlock programs and if @usage is greater than 1, then the + * thread must duplicate struct landlock_hooks to not change the + * children' rules as well. 
+ */ +struct landlock_hooks { + atomic_t usage; + struct landlock_rule *rules[_LANDLOCK_HOOK_LAST]; +}; + + +struct landlock_hooks *new_landlock_hooks(void); +void put_landlock_hooks(struct landlock_hooks *hooks); + +#ifdef CONFIG_SECCOMP_FILTER +void put_landlock_ret(struct landlock_seccomp_ret *landlock_ret); +int landlock_seccomp_set_hook(unsigned int flags, + const char __user *user_bpf_fd); +#endif /* CONFIG_SECCOMP_FILTER */ + +#endif /* CONFIG_SECURITY_LANDLOCK */ +#endif /* _LINUX_LANDLOCK_H */ diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index ffdab7cdd162..3cb90bf43a24 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -10,6 +10,10 @@ #include #include +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK) +#include +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */ + /** * struct seccomp_filter - container for seccomp BPF programs * @@ -19,6 +23,7 @@ * is only needed for handling filters shared across tasks. * @prev: points to a previously installed, or inherited, filter * @prog: the BPF program to evaluate
[RFC v3 00/22] Landlock LSM: Unprivileged sandboxing
ck LSM. [1] https://lkml.kernel.org/r/1472121165-29071-1-git-send-email-...@digikod.net [2] https://crypto.stanford.edu/cs155/papers/traps.pdf [3] https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-dan...@zonque.org Regards, Mickaël Salaün (22): landlock: Add Kconfig bpf: Move u64_to_ptr() to BPF headers and inline it bpf,landlock: Add a new arraymap type to deal with (Landlock) handles bpf: Set register type according to is_valid_access() bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier landlock: Add LSM hooks landlock: Handle file comparisons seccomp: Fix documentation for struct seccomp_filter seccomp: Move struct seccomp_filter in seccomp.h seccomp: Split put_seccomp_filter() with put_seccomp() seccomp,landlock: Handle Landlock hooks per process hierarchy bpf: Cosmetic change for bpf_prog_attach() bpf/cgroup: Replace struct bpf_prog with union bpf_object bpf/cgroup: Make cgroup_bpf_update() return an error code bpf/cgroup: Move capability check bpf/cgroup,landlock: Handle Landlock hooks per cgroup cgroup: Add access check for cgroup_get_from_fd() cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks landlock: Add interrupted origin landlock: Add update and debug access flags bpf,landlock: Add optional skb pointer in the Landlock context samples/landlock: Add sandbox example include/linux/bpf-cgroup.h | 19 +- include/linux/bpf.h| 45 +++- include/linux/cgroup-defs.h| 9 + include/linux/cgroup.h | 2 +- include/linux/filter.h | 1 + include/linux/landlock.h | 86 include/linux/lsm_hooks.h | 5 + include/linux/seccomp.h| 58 +- include/uapi/linux/bpf.h | 122 +++ include/uapi/linux/seccomp.h | 2 + kernel/bpf/arraymap.c | 226 - kernel/bpf/cgroup.c| 78 --- kernel/bpf/syscall.c | 78 --- kernel/bpf/verifier.c | 47 - kernel/cgroup.c| 66 +- kernel/fork.c | 25 ++- kernel/seccomp.c | 107 +++--- kernel/trace/bpf_trace.c | 12 +- net/core/filter.c | 21 +- samples/Makefile | 2 +- samples/landlock/.gitignore| 1 + samples/landlock/Makefile | 
16 ++ samples/landlock/sandbox.c | 307 security/Kconfig | 1 + security/Makefile | 2 + security/landlock/Kconfig | 23 +++ security/landlock/Makefile | 3 + security/landlock/checker_fs.c | 179 security/landlock/checker_fs.h | 20 ++ security/landlock/common.h | 27 +++ security/landlock/lsm.c| 451 + security/landlock/manager.c| 281 + security/security.c| 1 + 33 files changed, 2194 insertions(+), 129 deletions(-) create mode 100644 include/linux/landlock.h create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandbox.c create mode 100644 security/landlock/Kconfig create mode 100644 security/landlock/Makefile create mode 100644 security/landlock/checker_fs.c create mode 100644 security/landlock/checker_fs.h create mode 100644 security/landlock/common.h create mode 100644 security/landlock/lsm.c create mode 100644 security/landlock/manager.c -- 2.9.3
[RFC v3 07/22] landlock: Handle file comparisons
Add eBPF functions to compare file system access with a Landlock file
system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount point of the
  currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function allows an eBPF program to check if the currently accessed
  file is the same as, or in the hierarchy of, a reference handle.

The goal of a file system handle is to abstract kernel objects such as a
struct file or a struct inode. Userland can create this kind of handle
with the BPF_MAP_UPDATE_ELEM command. The element is a struct
landlock_handle containing the handle type (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
also be any description able to match a struct file or a struct inode
(e.g. path or glob string).

Changes since v2:
* add MNT_INTERNAL check to only add file handles from user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Andy Lutomirski
Cc: Daniel Borkmann
Cc: David S. Miller
Cc: James Morris
Cc: Kees Cook
Cc: Serge E.
Hallyn Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com --- include/linux/bpf.h| 10 +++ include/uapi/linux/bpf.h | 49 +++ kernel/bpf/arraymap.c | 21 + kernel/bpf/verifier.c | 8 ++ security/landlock/Makefile | 2 +- security/landlock/checker_fs.c | 179 + security/landlock/checker_fs.h | 20 + security/landlock/lsm.c| 6 ++ 8 files changed, 294 insertions(+), 1 deletion(-) create mode 100644 security/landlock/checker_fs.c create mode 100644 security/landlock/checker_fs.h diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 36c3e482239c..f7325c17f720 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -87,6 +87,7 @@ enum bpf_arg_type { ARG_ANYTHING, /* any (initialized) argument is ok */ ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */ + ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,/* pointer to Landlock FS handle */ }; /* type of values returned from helper functions */ @@ -148,6 +149,7 @@ enum bpf_reg_type { /* Landlock */ PTR_TO_STRUCT_FILE, + CONST_PTR_TO_LANDLOCK_HANDLE_FS, }; struct bpf_prog; @@ -214,6 +216,9 @@ struct bpf_array { #ifdef CONFIG_SECURITY_LANDLOCK struct map_landlock_handle { u32 type; /* enum bpf_map_handle_type */ + union { + struct path path; + }; }; #endif /* CONFIG_SECURITY_LANDLOCK */ @@ -348,6 +353,11 @@ extern const struct bpf_func_proto bpf_skb_vlan_push_proto; extern const struct bpf_func_proto bpf_skb_vlan_pop_proto; extern const struct bpf_func_proto bpf_get_stackid_proto; +#ifdef CONFIG_SECURITY_LANDLOCK +extern const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto; +extern const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto; +#endif /* CONFIG_SECURITY_LANDLOCK */ + /* Shared helpers among cBPF and eBPF. 
*/ void bpf_user_rnd_init_once(void); u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index ad87003fe892..905dcace7255 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -92,10 +92,20 @@ enum bpf_map_type { enum bpf_map_array_type { BPF_MAP_ARRAY_TYPE_UNSPEC, + BPF_MAP_ARRAY_TYPE_LANDLOCK_FS, }; enum bpf_map_handle_type { BPF_MAP_HANDLE_TYPE_UNSPEC, + BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD, + /* BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB, */ +}; + +enum bpf_map_array_op { + BPF_MAP_ARRAY_OP_UNSPEC, + BPF_MAP_ARRAY_OP_OR, + BPF_MAP_ARRAY_OP_AND, + BPF_MAP_ARRAY_OP_XOR, }; enum bpf_prog_type { @@ -434,6 +444,34 @@ enum bpf_func_id { */ BPF_FUNC_skb_change_tail, + /** +* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file) +* Compare file system handles with a struct file +* +* @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY) +* @map: handles to compare against +* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR) +* @file: struct file address to compare with (taken from the context) +* +* Return: 0 if the file match the handles, 1 otherwise, or a negative +* value if an error occurred. +*/ + BPF_FUNC_landlock_cmp_fs_prop_with_struct_file, + + /** +* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file) +* Check if a struct file is a leaf of file system
[RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach()
Move code outside a switch/case to ease code factoring (cf. next commit).

This applies on top of Daniel Mack's "Add eBPF hooks for cgroups":
https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-dan...@zonque.org

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Daniel Mack
---
 kernel/bpf/syscall.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f22e3b63d253..45a91d59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -843,23 +843,24 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_EGRESS:
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 					 BPF_PROG_TYPE_CGROUP_SOCKET);
-		if (IS_ERR(prog))
-			return PTR_ERR(prog);
-
-		cgrp = cgroup_get_from_fd(attr->target_fd);
-		if (IS_ERR(cgrp)) {
-			bpf_prog_put(prog);
-			return PTR_ERR(cgrp);
-		}
-
-		cgroup_bpf_update(cgrp, prog, attr->attach_type);
-		cgroup_put(cgrp);
 		break;
 
 	default:
 		return -EINVAL;
 	}
 
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(cgrp);
+	}
+
+	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	cgroup_put(cgrp);
+
 	return 0;
 }
 
-- 
2.9.3
[RFC v3 15/22] bpf/cgroup: Move capability check
This will be useful to be able to add more BPF attach types with
different capability checks.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Daniel Mack
---
 kernel/bpf/syscall.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c978f2d9a1b3..8599596fd6cf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -833,15 +833,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	struct cgroup *cgrp;
 	int result;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 					 BPF_PROG_TYPE_CGROUP_SOCKET);
 		break;
@@ -872,15 +872,15 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	struct cgroup *cgrp;
 	int result = 0;
 
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
-- 
2.9.3
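One visible consequence of moving the check into the switch is the error an unprivileged caller gets for an unknown attach type. A minimal userspace model, assuming the attribute check passes, could look like this; enum attach_type and prog_attach() are hypothetical stand-ins, and the boolean parameter replaces capable(CAP_NET_ADMIN).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BPF_CGROUP_INET_* attach types. */
enum attach_type {
	ATTACH_INET_INGRESS,
	ATTACH_INET_EGRESS,
	ATTACH_UNKNOWN,
};

/* Models only the control flow after this patch: the capability is
 * checked per attach type instead of once at function entry. */
static int prog_attach(enum attach_type type, bool has_net_admin)
{
	switch (type) {
	case ATTACH_INET_INGRESS:
	case ATTACH_INET_EGRESS:
		if (!has_net_admin)
			return -EPERM;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

Before the patch the same unprivileged call with an unknown type would have returned -EPERM, because the capability test ran first. A later attach type can now add its own case with a different (or no) capability requirement.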
[RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier
The goal of the program subtype is to allow different static fine-grained
verifications for a single program type.

The struct bpf_verifier_ops gets a new optional function:
is_valid_subtype(). This new verifier is called at the beginning of the
eBPF program verification to check if the (optional) program subtype is
valid.

For now, only Landlock eBPF programs are using a program subtype but this
could be used by other program types in the future.

Cf. the next commit to see how the subtype is used by Landlock LSM.

Signed-off-by: Mickaël Salaün
Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: David S. Miller
---
 include/linux/bpf.h      |  8 ++--
 include/linux/filter.h   |  1 +
 include/uapi/linux/bpf.h |  9 +
 kernel/bpf/syscall.c     |  5 +++--
 kernel/bpf/verifier.c    |  9 +++--
 kernel/trace/bpf_trace.c | 12
 net/core/filter.c        | 21 +
 7 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eae4ce4542c1..9aa01d9d3d80 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -149,17 +149,21 @@ struct bpf_prog;
 
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
-	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+			union bpf_prog_subtype *prog_subtype);
 
 	/* return true if 'size' wide access at offset 'off' within bpf_context
 	 * with 'type' (read or write) is allowed
 	 */
 	bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-				enum bpf_reg_type *reg_type);
+				enum bpf_reg_type *reg_type,
+				union bpf_prog_subtype *prog_subtype);
 
 	u32 (*convert_ctx_access)(enum bpf_access_type type, int dst_reg,
 				  int src_reg, int ctx_off,
 				  struct bpf_insn *insn, struct bpf_prog *prog);
+
+	bool (*is_valid_subtype)(union bpf_prog_subtype *prog_subtype);
 };
 
 struct bpf_prog_type_list {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c521adfe..88470cdd3ee1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -406,6 +406,7 @@ struct bpf_prog {
 	kmemcheck_bitfield_end(meta);
 	u32			len;		/* Number of filter blocks */
 	enum bpf_prog_type	type;		/* Type of BPF program */
+	union bpf_prog_subtype	subtype;	/* For fine-grained verifications */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b68de57f7ab8..667b6ef3ff1e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -127,6 +127,14 @@ enum bpf_attach_type {
 
 #define BPF_F_NO_PREALLOC	(1U << 0)
 
+union bpf_prog_subtype {
+	struct {
+		__u32		id; /* enum landlock_hook_id */
+		__u16		origin; /* LANDLOCK_FLAG_ORIGIN_* */
+		__aligned_u64	access; /* LANDLOCK_FLAG_ACCESS_* */
+	} landlock_hook;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -155,6 +163,7 @@ union bpf_attr {
 		__u32		log_size;	/* size of user buffer */
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
+		union bpf_prog_subtype prog_subtype;	/* checked when prog_type=landlock */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 776c752604b0..8b3f4d2b4802 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -572,7 +572,7 @@ static void fixup_bpf_calls(struct bpf_prog *prog)
 				continue;
 			}
 
-			fn = prog->aux->ops->get_func_proto(insn->imm);
+			fn = prog->aux->ops->get_func_proto(insn->imm, &prog->subtype);
 			/* all functions that have prototype and verifier allowed
 			 * programs to call them, must be real in-kernel functions
 			 */
@@ -710,7 +710,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
[RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code
This will be useful to support Landlock for the next commits.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Daniel Borkmann
Cc: Daniel Mack
Cc: David S. Miller
Cc: Tejun Heo
---
 include/linux/bpf-cgroup.h |  4 ++--
 kernel/bpf/cgroup.c        |  3 ++-
 kernel/bpf/syscall.c       | 10 ++
 kernel/cgroup.c            |  6 --
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2234042d7f61..6cca7924ee17 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -31,13 +31,13 @@ struct cgroup_bpf {
 void cgroup_bpf_put(struct cgroup *cgrp);
 void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct cgroup *parent,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type);
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 782878ec4f2d..7b75fa692617 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -83,7 +83,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  *
  * Must be called with cgroup_mutex held.
  */
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 			 struct cgroup *parent,
 			 struct bpf_prog *prog,
 			 enum bpf_attach_type type)
@@ -117,6 +117,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 		bpf_prog_put(old_pinned.prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
+	return 0;
 }
 
 /**
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 45a91d59..c978f2d9a1b3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -831,6 +831,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
+	int result;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -858,10 +859,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		return PTR_ERR(cgrp);
 	}
 
-	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	result = cgroup_bpf_update(cgrp, prog, attr->attach_type);
 	cgroup_put(cgrp);
 
-	return 0;
+	return result;
 }
 
 #define BPF_PROG_DETACH_LAST_FIELD attach_type
@@ -869,6 +870,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 static int bpf_prog_detach(const union bpf_attr *attr)
 {
 	struct cgroup *cgrp;
+	int result = 0;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -883,7 +885,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
 
-		cgroup_bpf_update(cgrp, NULL, attr->attach_type);
+		result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
 		cgroup_put(cgrp);
 		break;
 
@@ -891,7 +893,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return -EINVAL;
 	}
 
-	return 0;
+	return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 87324ce481b1..48b650a640a9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6450,15 +6450,17 @@ static __init int cgroup_namespaces_init(void)
 subsys_initcall(cgroup_namespaces_init);
 
 #ifdef CONFIG_CGROUP_BPF
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
+	int result;
 
 	mutex_lock(&cgroup_mutex);
-	__cgroup_bpf_update(cgrp, parent, prog, type);
+	result = __cgroup_bpf_update(cgrp, parent, prog, type);
 	mutex_unlock(&cgroup_mutex);
+	return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
-- 
2.9.3
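The shape of this change (an inner helper that reports a status, and a locked wrapper that forwards it after dropping the lock) can be sketched in plain userspace C. This is only an illustration: every demo_* name is hypothetical, the lock is modelled as a flag rather than a real mutex, and the -EINVAL failure is invented for the sketch since the patch itself always returns 0 for now.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel object; not the real struct cgroup. */
struct demo_cgroup {
	int attached_prog;	/* 0 means "no program attached" */
};

/* Models cgroup_mutex as a simple flag so the sketch stays single-threaded. */
static bool demo_lock_held;

/* Inner helper: must be called with the lock held. */
int demo_update_locked(struct demo_cgroup *cgrp, int prog)
{
	assert(demo_lock_held);
	if (cgrp == NULL)
		return -EINVAL;	/* invented failure mode, only for illustration */
	cgrp->attached_prog = prog;
	return 0;
}

/* Wrapper: lock, call the helper, unlock, then hand the result to the caller. */
int demo_update(struct demo_cgroup *cgrp, int prog)
{
	int result;

	demo_lock_held = true;
	result = demo_update_locked(cgrp, prog);
	demo_lock_held = false;
	return result;
}
```

The point of storing the result in a local variable is that the lock is always released before returning, whatever the helper reports.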
Re: [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()
On 14/09/2016 09:24, Mickaël Salaün wrote:
> Add security access check for cgroup backed FD. The "cgroup.procs" file
> of the corresponding cgroup must be readable to identify the cgroup, and
> writable to prove that the current process can manage this cgroup (e.g.
> through delegation). This is similar to the check done by
> cgroup_procs_write_permission().
>
> Signed-off-by: Mickaël Salaün
> Cc: Alexei Starovoitov
> Cc: Andy Lutomirski
> Cc: Daniel Borkmann
> Cc: Daniel Mack
> Cc: David S. Miller
> Cc: Kees Cook
> Cc: Tejun Heo
> ---
>  include/linux/cgroup.h |  2 +-
>  kernel/bpf/arraymap.c  |  2 +-
>  kernel/bpf/syscall.c   |  6 +++---
>  kernel/cgroup.c        | 16 +++-
>  4 files changed, 20 insertions(+), 6 deletions(-)

...

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 48b650a640a9..3bbaf3f02ed2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
>  /**
>   * cgroup_get_from_fd - get a cgroup pointer from a fd
>   * @fd: fd obtained by open(cgroup2_dir)
> + * @access_mask: contains the permission mask
>   *
>   * Find the cgroup from a fd which should be obtained
>   * by opening a cgroup directory. Returns a pointer to the
>   * cgroup on success. ERR_PTR is returned if the cgroup
>   * cannot be found.
>   */
> -struct cgroup *cgroup_get_from_fd(int fd)
> +struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
>  {
>  	struct cgroup_subsys_state *css;
>  	struct cgroup *cgrp;
>  	struct file *f;
> +	struct inode *inode;
> +	int ret;
>
>  	f = fget_raw(fd);
>  	if (!f)
> @@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
>  		return ERR_PTR(-EBADF);
>  	}
>
> +	ret = -ENOMEM;
> +	inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);

I forgot to properly move fput(f) after this line… This will be fixed.
Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
On 14/09/2016 20:27, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün wrote:
>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>> set for all cgroup except the root. The flag is clear when a new process
>> without the no_new_privs flags is attached to the cgroup.
>>
>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>> process, to attach a process without no_new_privs to this cgroup will
>> be denied.
>
> Until and unless everyone can agree on a way to properly namespace,
> delegate, etc cgroups, I think that trying to add unprivileged
> semantics to cgroups is nuts. Given the big thread about cgroup v2,
> no-internal-tasks, etc, I just don't see how this approach can be
> viable.

As far as I can tell, the no_new_privs flag of a task is not related to
namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
the no_new_privs property of *tasks* in a cgroup. The semantics are
unchanged.

Using cgroups is optional; any task could use the seccomp-based
landlocking instead. However, for those who want or need to manage a
security policy in a more dynamic way, using cgroups may make sense.

I thought cgroup delegation was OK in v2, isn't that the case? Do you
have some links?

>
> Can we try to make landlock work completely independently of cgroups
> so that it doesn't get stuck and so that programs can use it without
> worrying about cgroup v1 vs v2, interactions with cgroup managers,
> cgroup managers that (supposedly?) will start migrating processes
> around piecemeal and almost certainly blowing up landlock in the
> process, etc?

This RFC handles both cgroup and seccomp approaches in a similar way. I
don't see why building on top of cgroup v2 is a problem. Are there
security issues with delegation?

>
> I have no problem with looking at prototypes for how landlock +
> cgroups would work, but I can't imagine the result being mergeable.
>
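To make the "flag as a cache" description above concrete, here is a small userspace model of the rules quoted from the patch description. It is a sketch under assumptions, not the patch's code: all demo_* names are hypothetical, and returning -EPERM for the denied case is a guess, since the thread does not say which error is used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_task {
	bool no_new_privs;
};

struct demo_cgroup {
	bool nnp_cache;		/* models CGRP_NO_NEW_PRIVS */
	bool landlocked;	/* a Landlock program is attached to this cgroup */
};

/* The flag starts set for every cgroup except the root. */
void demo_cgroup_init(struct demo_cgroup *cg, bool is_root)
{
	assert(cg != NULL);
	cg->nnp_cache = !is_root;
	cg->landlocked = false;
}

/*
 * Attach a task. A task without no_new_privs clears the cached flag,
 * unless the cgroup is landlocked and the caller is unprivileged, in
 * which case the attach is refused (error code assumed).
 */
int demo_attach(struct demo_cgroup *cg, const struct demo_task *task,
		bool caller_privileged)
{
	assert(cg != NULL && task != NULL);
	if (!task->no_new_privs) {
		if (cg->landlocked && !caller_privileged)
			return -EPERM;
		cg->nnp_cache = false;
	}
	return 0;
}
```

In this model the flag never says anything a walk over the cgroup's tasks would not say; it only avoids that walk.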
Re: [RFC v3 19/22] landlock: Add interrupted origin
On 14/09/2016 20:29, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün wrote:
>> This third origin of hook call should cover all possible trigger paths
>> (e.g. page fault). Landlock eBPF programs can then take decisions
>> accordingly.
>>
>> Signed-off-by: Mickaël Salaün
>> Cc: Alexei Starovoitov
>> Cc: Andy Lutomirski
>> Cc: Daniel Borkmann
>> Cc: Kees Cook
>> ---
>
>
>>
>> +	if (unlikely(in_interrupt())) {
>
> IMO security hooks have no business being called from interrupts.
> Aren't they all synchronous things done by tasks? Interrupts are
> driver things.
>
> Are you trying to check for page faults and such?

Yes, that was the idea you put in my mind. Not sure how to deal with
this.
Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy
On 14/09/2016 20:43, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün wrote:
>> A Landlock program will be triggered according to its subtype/origin
>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>> Landlock program when a seccomp filter will return RET_LANDLOCK.
>> Moreover, it is possible to return a 16-bit cookie which will be
>> readable by the Landlock programs in its context.
>
> Are you envisioning that the filters will return RET_LANDLOCK most of
> the time or rarely? If it's most of the time, then maybe this could
> be simplified a bit by unconditionally calling the landlock filter and
> letting the landlock filter access a struct seccomp_data if needed.

Exposing seccomp_data in a Landlock context may be a good idea. The main
implication is that Landlock programs may then be architecture specific
(if dealing with data), as seccomp filters are. Another point is that it
removes any direct binding between seccomp filters and Landlock programs.
I will try this (simpler) approach.

>
>>
>> Only seccomp filters loaded from the same thread and before a Landlock
>> program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple
>> Landlock programs can be triggered by one or more seccomp filters. This
>> way, each RET_LANDLOCK (with specific cookie) will trigger all the
>> allowed Landlock programs once.
>
> This interface seems somewhat awkward. Should we not have a way to
> atomicaly install a whole pile of landlock filters and associated
> seccomp filter all at once?

I can change the seccomp(2) use in this way: instead of loading a
Landlock program, (atomically) load an array of Landlock programs.
However, exposing seccomp_data to Landlock programs looks like a better
way to deal with it. This avoids having to manage an array of Landlock
programs.

Mickaël
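For readers unfamiliar with how a seccomp return value can carry a 16-bit cookie: a seccomp filter returns a 32-bit word whose upper bits select the action and whose low 16 bits are free-form data. A minimal sketch of that packing follows. The two masks mirror the layout of SECCOMP_RET_ACTION and SECCOMP_RET_DATA from the seccomp UAPI header; DEMO_RET_LANDLOCK is a placeholder value chosen for this sketch, not taken from the patch, and the demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RET_ACTION   0x7fff0000U	/* same layout as SECCOMP_RET_ACTION */
#define DEMO_RET_DATA     0x0000ffffU	/* same layout as SECCOMP_RET_DATA */
#define DEMO_RET_LANDLOCK 0x00070000U	/* placeholder action value (assumption) */

/* Build a filter return value from an action and a 16-bit cookie. */
uint32_t demo_make_ret(uint32_t action, uint16_t cookie)
{
	assert((action & DEMO_RET_DATA) == 0);	/* action must not use data bits */
	return action | cookie;
}

/* Extract the action part of a filter return value. */
uint32_t demo_ret_action(uint32_t ret)
{
	return ret & DEMO_RET_ACTION;
}

/* Extract the cookie that a Landlock program would see in its context. */
uint16_t demo_ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & DEMO_RET_DATA);
}
```

Under the alternative discussed in the reply (exposing seccomp_data directly), this cookie channel would no longer be needed.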
Re: [RFC v3 07/22] landlock: Handle file comparisons
On 14/09/2016 21:07, Jann Horn wrote: > On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote: >> Add eBPF functions to compare file system access with a Landlock file >> system handle: >> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file) >> This function allows to compare the dentry, inode, device or mount >> point of the currently accessed file, with a reference handle. >> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file) >> This function allows an eBPF program to check if the current accessed >> file is the same or in the hierarchy of a reference handle. > [...] >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >> index 94256597eacd..edaab4c87292 100644 >> --- a/kernel/bpf/arraymap.c >> +++ b/kernel/bpf/arraymap.c >> @@ -603,6 +605,9 @@ static void landlock_put_handle(struct >> map_landlock_handle *handle) >> enum bpf_map_handle_type handle_type = handle->type; >> >> switch (handle_type) { >> +case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD: >> +path_put(&handle->path); >> +break; >> case BPF_MAP_HANDLE_TYPE_UNSPEC: >> default: >> WARN_ON(1); > [...] >> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c >> new file mode 100644 >> index ..39eb85dc7d18 >> --- /dev/null >> +++ b/security/landlock/checker_fs.c > [...] >> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property, >> +u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5) >> +{ >> +u8 property = (u8) r1_property; >> +struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; >> +enum bpf_map_array_op map_op = r3_map_op; >> +struct file *file = (struct file *) (unsigned long) r4_file; >> +struct bpf_array *array = container_of(map, struct bpf_array, map); >> +struct path *p1, *p2; >> +struct map_landlock_handle *handle; >> +int i; > > Please don't use int when iterating over an array, use size_t. OK, I will use size_t. 
>
>
>> +	/* for now, only handle OP_OR */
>
> Is "OP_OR" an appropriate name for something that ANDs the success of
> checks?
>
>
> [...]
>> +	synchronize_rcu();
>
> Can you put a comment here that explains what's going on?

Hum, this should not be here.

>
>
>> +	for (i = 0; i < array->n_entries; i++) {
>> +		bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
>> +		bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
>> +		bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
>> +		bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
>> +
>> +		handle = (struct map_landlock_handle *)
>> +			(array->value + array->elem_size * i);
>> +
>> +		if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +		p1 = &handle->path;
>> +
>> +		if (!result_dentry && p1->dentry == p2->dentry)
>> +			result_dentry = true;
>
> Why is this safe? As far as I can tell, this is not in an RCU read-side
> critical section (synchronize_rcu() was just called), and no lock has been
> taken. What prevents someone from removing the arraymap entry while we're
> looking at it? Am I missing something?

I will try to properly deal with RCU.
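The loop quoted above can be read as: for each handle in the array, every requested property must match, and the call succeeds as soon as one handle satisfies them all. Here is a userspace sketch of that reading, with a size_t index as requested in the review. It makes several assumptions: only the DENTRY bit value (1 << 0) appears in the thread, so the other flag values are guesses; kernel objects are replaced by plain integers; the return convention (1, 0, -1) is invented; and the RCU concern raised in the review is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FS_DENTRY (1u << 0)
#define DEMO_FS_INODE  (1u << 1)	/* assumed value */
#define DEMO_FS_DEVICE (1u << 2)	/* assumed value */
#define DEMO_FS_MOUNT  (1u << 3)	/* assumed value */
#define DEMO_FS_MASK   ((1u << 4) - 1)

/* Integers stand in for the dentry, inode, device and mount pointers. */
struct demo_path {
	int dentry, inode, device, mount;
};

/* Returns 1 if some handle matches, 0 if none does, -1 on unknown flags. */
int demo_cmp_fs_prop(uint8_t property, const struct demo_path *handles,
		     size_t n_entries, const struct demo_path *file)
{
	size_t i;

	assert(handles != NULL && file != NULL);
	if ((property | DEMO_FS_MASK) != DEMO_FS_MASK)
		return -1;

	for (i = 0; i < n_entries; i++) {
		const struct demo_path *h = &handles[i];
		bool ok = true;

		/* Each requested property must match for this handle. */
		if ((property & DEMO_FS_DENTRY) && h->dentry != file->dentry)
			ok = false;
		if ((property & DEMO_FS_INODE) && h->inode != file->inode)
			ok = false;
		if ((property & DEMO_FS_DEVICE) && h->device != file->device)
			ok = false;
		if ((property & DEMO_FS_MOUNT) && h->mount != file->mount)
			ok = false;
		if (ok)
			return 1;	/* one matching handle is enough */
	}
	return 0;
}
```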
Re: [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context
On 14/09/2016 23:20, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:24:14AM +0200, Mickaël Salaün wrote:
>> This is a proof of concept to expose optional values that could depend
>> of the process access rights.
>>
>> There is two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
>> LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
>> eBPF functions manipulating a skb in a read or write way.
>>
>> Signed-off-by: Mickaël Salaün
> ...
>>  /* Handle check flags */
>>  #define LANDLOCK_FLAG_FS_DENTRY (1 << 0)
>> @@ -619,12 +621,15 @@ struct landlock_handle {
>>   * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
>>   *        description and the LANDLOCK_HOOK* definitions from
>>   *        security/landlock/lsm.c for their types.
>> + * @opt_skb: optional skb pointer, accessible with the
>> + *           LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
>>   */
>>  struct landlock_data {
>>  	__u32 hook; /* enum landlock_hook_id */
>>  	__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
>>  	__u16 cookie; /* seccomp RET_LANDLOCK */
>>  	__u64 args[6];
>> +	__u64 opt_skb;
>>  };
>
> missing something here.
> This patch doesn't make use of it.
> That's something for the future?
> How that field will be populated?
> Why make it different vs the rest or args[6] ?
>
>

I don't use this code; its only purpose is to show how to deal with
fine-grained privileges of Landlock programs (to allow Sargun to add his
custom helpers from Checmate).

However, this optional field may be part of args[6].
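The "fine-grained privileges" idea can be illustrated with a toy version of a get_func_proto() lookup that consults the program subtype: a helper is only resolvable when the subtype's access bitfield carries the matching flag. This is a sketch, not the series' implementation. The demo_* names, the helper list and the flag bit positions are all hypothetical; only the existence of read and write SKB access flags and of a subtype argument to get_func_proto() comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ACCESS_SKB_READ  (1ULL << 0)	/* assumed bit position */
#define DEMO_ACCESS_SKB_WRITE (1ULL << 1)	/* assumed bit position */

enum demo_func_id {
	DEMO_FUNC_CMP_FS,	/* always available */
	DEMO_FUNC_SKB_LOAD,	/* needs read access */
	DEMO_FUNC_SKB_STORE,	/* needs write access */
};

/* Mirrors the fields of the landlock_hook subtype shown in the series. */
struct demo_subtype {
	uint32_t id;
	uint16_t origin;
	uint64_t access;
};

struct demo_func_proto {
	const char *name;
};

static const struct demo_func_proto demo_cmp_fs_proto = { "cmp_fs" };
static const struct demo_func_proto demo_skb_load_proto = { "skb_load" };
static const struct demo_func_proto demo_skb_store_proto = { "skb_store" };

/* Returns the helper prototype, or NULL if the subtype may not call it. */
const struct demo_func_proto *
demo_get_func_proto(enum demo_func_id func, const struct demo_subtype *subtype)
{
	assert(subtype != NULL);
	switch (func) {
	case DEMO_FUNC_CMP_FS:
		return &demo_cmp_fs_proto;
	case DEMO_FUNC_SKB_LOAD:
		if (subtype->access & DEMO_ACCESS_SKB_READ)
			return &demo_skb_load_proto;
		return NULL;
	case DEMO_FUNC_SKB_STORE:
		if (subtype->access & DEMO_ACCESS_SKB_WRITE)
			return &demo_skb_store_proto;
		return NULL;
	}
	return NULL;
}
```

Because the check happens when the program's calls are resolved, a program loaded without the flag cannot reference the gated helper at all.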
Re: [RFC v3 07/22] landlock: Handle file comparisons
On 14/09/2016 23:06, Alexei Starovoitov wrote: > On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote: >> Add eBPF functions to compare file system access with a Landlock file >> system handle: >> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file) >> This function allows to compare the dentry, inode, device or mount >> point of the currently accessed file, with a reference handle. >> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file) >> This function allows an eBPF program to check if the current accessed >> file is the same or in the hierarchy of a reference handle. >> >> The goal of file system handle is to abstract kernel objects such as a >> struct file or a struct inode. Userland can create this kind of handle >> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct >> landlock_handle containing the handle type (e.g. >> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could >> also be any descriptions able to match a struct file or a struct inode >> (e.g. path or glob string). >> >> Changes since v2: >> * add MNT_INTERNAL check to only add file handle from user-visible FS >> (e.g. no anonymous inode) >> * replace struct file* with struct path* in map_landlock_handle >> * add BPF protos >> * fix bpf_landlock_cmp_fs_prop_with_struct_file() >> >> Signed-off-by: Mickaël Salaün >> Cc: Alexei Starovoitov >> Cc: Andy Lutomirski >> Cc: Daniel Borkmann >> Cc: David S. Miller >> Cc: James Morris >> Cc: Kees Cook >> Cc: Serge E. Hallyn >> Link: >> https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com > > thanks for keeping the links to the previous discussion. > Long term it should help, though I worry we already at the point > where there are too many outstanding issues to resolve before we > can proceed with reasonable code review. > >> +/* >> + * bpf_landlock_cmp_fs_prop_with_struct_file >> + * >> + * Cf. 
include/uapi/linux/bpf.h >> + */ >> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property, >> +u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5) >> +{ >> +u8 property = (u8) r1_property; >> +struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map; >> +enum bpf_map_array_op map_op = r3_map_op; >> +struct file *file = (struct file *) (unsigned long) r4_file; > > please use just added BPF_CALL_ macros. They will help readability of the > above. > >> +struct bpf_array *array = container_of(map, struct bpf_array, map); >> +struct path *p1, *p2; >> +struct map_landlock_handle *handle; >> +int i; >> + >> +/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */ >> +if (unlikely(!map)) { >> +WARN_ON(1); >> +return -EFAULT; >> +} >> +if (unlikely(!file)) >> +return -ENOENT; >> +if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != >> _LANDLOCK_FLAG_FS_MASK)) >> +return -EINVAL; >> + >> +/* for now, only handle OP_OR */ >> +switch (map_op) { >> +case BPF_MAP_ARRAY_OP_OR: >> +break; >> +case BPF_MAP_ARRAY_OP_UNSPEC: >> +case BPF_MAP_ARRAY_OP_AND: >> +case BPF_MAP_ARRAY_OP_XOR: >> +default: >> +return -EINVAL; >> +} >> +p2 = &file->f_path; >> + >> +synchronize_rcu(); > > that is completely broken. > bpf programs are executing under rcu_lock. > please enable CONFIG_PROVE_RCU and retest everything. Thanks for the tip. I will fix this. > > I would suggest for the next RFC to do minimal 7 patches up to this point > with simple example that demonstrates the use case. > I would avoid all unpriv stuff and all of seccomp for the next RFC as well, > otherwise I don't think we can realistically make forward progress, since > there are too many issues raised in the subsequent patches. I hope we will find a common agreement about seccomp vs cgroup… I think both approaches have their advantages, can be complementary and nicely combined. Unprivileged sandboxing is the main goal of Landlock. 
This should not be a problem, even for privileged features, thanks to
the new subtype/access.

>
> The common part that is mergeable is prog's subtype extension to
> the verifier that can be used for better tracing and is the common
> piece of infra needed for both landlock and checmate LSMs
> (which must be one LSM anyway)

Agreed. With this RFC, the Checmate features (i.e. network helpers)
should be able to sit on top of Landlock.
Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
On 14/09/2016 20:51, Alexei Starovoitov wrote: > On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote: >> This new arraymap looks like a set and brings new properties: >> * strong typing of entries: the eBPF functions get the array type of >> elements instead of CONST_PTR_TO_MAP (e.g. >> CONST_PTR_TO_LANDLOCK_HANDLE_FS); >> * force sequential filling (i.e. replace or append-only update), which >> allow quick browsing of all entries. >> >> This strong typing is useful to statically check if the content of a map >> can be passed to an eBPF function. For example, Landlock use it to store >> and manage kernel objects (e.g. struct file) instead of dealing with >> userland raw data. This improve efficiency and ensure that an eBPF >> program can only call functions with the right high-level arguments. >> >> The enum bpf_map_handle_type list low-level types (e.g. >> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when >> updating a map entry (handle). This handle types are used to infer a >> high-level arraymap type which are listed in enum bpf_map_array_type >> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS). >> >> For now, this new arraymap is only used by Landlock LSM (cf. next >> commits) but it could be useful for other needs. >> >> Changes since v2: >> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap >> handle entries (suggested by Andy Lutomirski) >> * remove useless checks >> >> Changes since v1: >> * arraymap of handles replace custom checker groups >> * simpler userland API >> >> Signed-off-by: Mickaël Salaün >> Cc: Alexei Starovoitov >> Cc: Andy Lutomirski >> Cc: Daniel Borkmann >> Cc: David S. 
Miller >> Cc: Kees Cook >> Link: >> https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com >> --- >> include/linux/bpf.h | 14 >> include/uapi/linux/bpf.h | 18 + >> kernel/bpf/arraymap.c| 203 >> +++ >> kernel/bpf/verifier.c| 12 ++- >> 4 files changed, 246 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index fa9a988400d9..eae4ce4542c1 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -13,6 +13,10 @@ >> #include >> #include >> >> +#ifdef CONFIG_SECURITY_LANDLOCK >> +#include /* struct file */ >> +#endif /* CONFIG_SECURITY_LANDLOCK */ >> + >> struct perf_event; >> struct bpf_map; >> >> @@ -38,6 +42,7 @@ struct bpf_map_ops { >> struct bpf_map { >> atomic_t refcnt; >> enum bpf_map_type map_type; >> +enum bpf_map_array_type map_array_type; >> u32 key_size; >> u32 value_size; >> u32 max_entries; >> @@ -187,6 +192,9 @@ struct bpf_array { >> */ >> enum bpf_prog_type owner_prog_type; >> bool owner_jited; >> +#ifdef CONFIG_SECURITY_LANDLOCK >> +u32 n_entries; /* number of entries in a handle array */ >> +#endif /* CONFIG_SECURITY_LANDLOCK */ >> union { >> char value[0] __aligned(8); >> void *ptrs[0] __aligned(8); >> @@ -194,6 +202,12 @@ struct bpf_array { >> }; >> }; >> >> +#ifdef CONFIG_SECURITY_LANDLOCK >> +struct map_landlock_handle { >> +u32 type; /* enum bpf_map_handle_type */ >> +}; >> +#endif /* CONFIG_SECURITY_LANDLOCK */ >> + >> #define MAX_TAIL_CALL_CNT 32 >> >> struct bpf_event_entry { >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 7cd36166f9b7..b68de57f7ab8 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -87,6 +87,15 @@ enum bpf_map_type { >> BPF_MAP_TYPE_PERCPU_ARRAY, >> BPF_MAP_TYPE_STACK_TRACE,P_TYPE_CGROUP_ARRAY >> BPF_MAP_TYPE_CGROUP_ARRAY, >> +BPF_MAP_TYPE_LANDLOCK_ARRAY, >> +}; >> + >> +enum bpf_map_array_type { >> +BPF_MAP_ARRAY_TYPE_UNSPEC, >> +}; >> + >> +enum bpf_map_handle_type { >> 
+	BPF_MAP_HANDLE_TYPE_UNSPEC,
>>  };
>
> missing something. why it has to be special to have it's own
> fd array implementation?
> Please take a look how BPF_MAP_TYPE_PERF_EVENT_ARRAY,
> BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
> The all store objects into array map that user space passes via FD.
> I think the same model should apply here.

The idea is to have multiple ways for userland to describe a resource
(e.g. an open file descriptor, a path or a glob pattern). The kernel
representation could then be a "struct path *" or dedicated types (e.g.
custom glob).

Another interesting point (that could replace
check_map_func_compatibility()) is that BPF_MAP_TYPE_LANDLOCK_ARRAY
translates to dedicated (abstract) types (instead of CONST_PTR_TO_MAP)
thanks to bpf_reg_type_from_map(). This is useful to abstract the
userland (map) interface from the kernel object(s) dealing with that
type.

A third point is that BPF_MAP_TYPE_LANDLOCK_ARRAY is a kind of set. It
is optimized to quickly walk through all the elements in a sequential
way.
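The "kind of set" with forced sequential filling (replace or append-only update) can be sketched as a fixed-size array plus a fill counter: an update is accepted only at an existing index or at the first free slot, so a walk over [0, n_entries) never meets a hole. This is a userspace model under assumptions; the demo_* names, the tiny capacity and the -EINVAL error are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_ENTRIES 4

struct demo_array {
	size_t n_entries;		/* number of filled slots */
	int value[DEMO_MAX_ENTRIES];	/* stands in for the handles */
};

/* Replace an existing entry or append at the first free index. */
int demo_update_elem(struct demo_array *a, size_t index, int v)
{
	assert(a != NULL);
	if (index > a->n_entries || index >= DEMO_MAX_ENTRIES)
		return -EINVAL;	/* would leave a hole, or out of bounds */
	a->value[index] = v;
	if (index == a->n_entries)
		a->n_entries++;
	return 0;
}

/* Sequential walk over every filled entry; no validity check per slot. */
long demo_sum(const struct demo_array *a)
{
	long sum = 0;
	size_t i;

	assert(a != NULL);
	for (i = 0; i < a->n_entries; i++)
		sum += a->value[i];
	return sum;
}
```

The design choice is that the invariant is enforced at update time, which is rare, so that lookups from an eBPF helper, which are frequent, stay a simple bounded loop.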
Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
On 15/09/2016 03:25, Andy Lutomirski wrote: > On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün wrote: >> >> On 14/09/2016 20:27, Andy Lutomirski wrote: >>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün wrote: >>>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially >>>> set for all cgroup except the root. The flag is clear when a new process >>>> without the no_new_privs flags is attached to the cgroup. >>>> >>>> If a cgroup is landlocked, then any new attempt, from an unprivileged >>>> process, to attach a process without no_new_privs to this cgroup will >>>> be denied. >>> >>> Until and unless everyone can agree on a way to properly namespace, >>> delegate, etc cgroups, I think that trying to add unprivileged >>> semantics to cgroups is nuts. Given the big thread about cgroup v2, >>> no-internal-tasks, etc, I just don't see how this approach can be >>> viable. >> >> As far as I can tell, the no_new_privs flag of at task is not related to >> namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access >> the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged. >> >> Using cgroup is optional, any task could use the seccomp-based >> landlocking instead. However, for those that want/need to manage a >> security policy in a more dynamic way, using cgroups may make sense. >> >> I though cgroup delegation was OK in the v2, isn't it the case? Do you >> have some links? >> >>> >>> Can we try to make landlock work completely independently of cgroups >>> so that it doesn't get stuck and so that programs can use it without >>> worrying about cgroup v1 vs v2, interactions with cgroup managers, >>> cgroup managers that (supposedly?) will start migrating processes >>> around piecemeal and almost certainly blowing up landlock in the >>> process, etc? >> >> This RFC handle both cgroup and seccomp approaches in a similar way. I >> don't see why building on top of cgroup v2 is a problem. 
Is there
>> security issues with delegation?
>
> What I mean is: cgroup v2 delegation has a functionality problem.
> Tejun says [1]:
>
> We haven't had to face this decision because cgroup has never properly
> supported delegating to applications and the in-use setups where this
> happens are custom configurations where there is no boundary between
> system and applications and adhoc trial-and-error is good enough a way
> to find a working solution. That wiggle room goes away once we
> officially open this up to individual applications.
>
> Unless and until that changes, I think that landlock should stay away
> from cgroups. Others could reasonably disagree with me.
>
> [1] https://lkml.kernel.org/r/20160909225747.ga30...@mtj.duckdns.org
>

I don't get the same echo here:
https://lkml.kernel.org/r/20160826155026.gd16...@mtj.duckdns.org

On 26/08/2016 17:50, Tejun Heo wrote:
> Please refer to "2-5. Delegation" of Documentation/cgroup-v2.txt.
> Delegation on v1 is broken on both core and specific controller
> behaviors and thus discouraged. On v2, delegation should work just
> fine.

Tejun, could you please clarify if there is still a problem with cgroup
v2 delegation?

This patch only implements a cache mechanism with the CGRP_NO_NEW_PRIVS
flag. If cgroups can group processes correctly, I don't see any
(security) issue here. It's the administrator's choice to delegate a
part of the cgroup management. It's then the delegatee's responsibility
to correctly put processes in cgroups. This is comparable to a process
which is responsible for correctly calling seccomp(2).

Mickaël
Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
On 15/09/2016 06:48, Alexei Starovoitov wrote: > On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote: >> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov >> wrote: >>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote: On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov wrote: > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote: > > This RFC handle both cgroup and seccomp approaches in a similar way. I > don't see why building on top of cgroup v2 is a problem. Is there > security issues with delegation? What I mean is: cgroup v2 delegation has a functionality problem. Tejun says [1]: We haven't had to face this decision because cgroup has never properly supported delegating to applications and the in-use setups where this happens are custom configurations where there is no boundary between system and applications and adhoc trial-and-error is good enough a way to find a working solution. That wiggle room goes away once we officially open this up to individual applications. Unless and until that changes, I think that landlock should stay away from cgroups. Others could reasonably disagree with me. >>> >>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security >>> and not for sandboxing. So the above doesn't matter in such contexts. >>> lsm hooks + cgroups provide convenient scope and existing entry points. >>> Please see checmate examples how it's used. >>> >> >> To be clear: I'm not arguing at all that there shouldn't be >> bpf+lsm+cgroup integration. I'm arguing that the unprivileged >> landlock interface shouldn't expose any cgroup integration, at least >> until the cgroup situation settles down a lot. > > ahh. yes. we're perfectly in agreement here. > I'm suggesting that the next RFC shouldn't include unpriv > and seccomp at all. Once bpf+lsm+cgroup is merged, we can > argue about unpriv with cgroups and even unpriv as a whole, > since it's not a given. Seccomp integration is also questionable. 
> I'd rather not have seccomp as a gate keeper for this lsm. > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks > don't have one to one relationship, so mixing them up is only > asking for trouble further down the road. > If we really need to carry some information from seccomp to lsm+bpf, > it's easier to add eBPF support to seccomp and let bpf side deal > with passing whatever information. > As an argument for keeping seccomp (or an extended seccomp) as the interface for an unprivileged bpf+lsm: seccomp already checks off most of the boxes for safely letting unprivileged programs sandbox themselves. >>> >>> you mean the attach part of seccomp syscall that deals with no_new_priv? >>> sure, that's reusable. >>> Furthermore, to the extent that there are use cases for unprivileged bpf+lsm that *aren't* expressible within the seccomp hierarchy, I suspect that syscall filters have exactly the same problem and that we should fix seccomp to cover it. >>> >>> not sure what you mean by 'seccomp hierarchy'. The normal process >>> hierarchy ? >> >> Kind of. I mean the filter layers that are inherited across fork(), >> the TSYNC mechanism, etc. >> >>> imo the main deficiency of secccomp is inability to look into arguments. >>> One can argue that it's a blessing, since composite args >>> are not yet copied into the kernel memory. >>> But in a lot of cases the seccomp arguments are FDs pointing >>> to kernel objects and if programs could examine those objects >>> the sandboxing scope would be more precise. >>> lsm+bpf solves that part and I'd still argue that it's >>> orthogonal to seccomp's pass/reject flow. >>> I mean if seccomp says 'ok' the syscall should continue executing >>> as normal and whatever LSM hooks were triggered by it may have >>> their own lsm+bpf verdicts. >> >> I agree with all of this... 
>> >>> Furthermore in the process hierarchy different children >>> should be able to set their own lsm+bpf filters that are not >>> related to parallel seccomp+bpf hierarchy of programs. >>> seccomp syscall can be an interface to attach programs >>> to lsm hooks, but nothing more than that. >> >> I'm not sure what you mean. I mean that, logically, I think we should >> be able to do: >> >> seccomp(attach a syscall filter); >> fork(); >> child does seccomp(attach some lsm filters); >> >> I think that they *should* be related to the seccomp+bpf hierarchy of >> programs in that they are entries in the same logical list of filter >> layers installed. Some of those layers can be syscall filters and >> some of the layers can be lsm filters. If we subsequently add a way >> to attach a removable seccomp filter or a way to attach a seccomp >> filter
[RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
ser can already use seccomp-filter to whitelist a set of syscalls to reduce the kernel attack surface for a set of processes. However, an unprivileged user can't create a security policy the way the root user can with SELinux and other access control LSMs. Landlock allows any unprivileged user to protect their data from being accessed by any process they run, except for an identified subset. User tools can be created to help write such a high-level access control policy. This policy may not be powerful enough to express the same policies as the current access control LSMs, because of the threat an unprivileged user can be to the system, but it should be enough for most use cases (e.g. blacklist or whitelist a set of file hierarchies).

## Can Landlock limit network access or other resources?

Limiting network access is obviously in the scope of Landlock but it is not yet implemented. The main goal now is to get feedback about the whole concept, the API and the file access control part. More access control types could be implemented in the future.

## Why use the seccomp(2) syscall?

Landlock uses the same semantics as seccomp to apply access rule restrictions. It adds a new layer of security for the current process, which is inherited by its children. It makes sense to use a single access-restricting syscall (that should be allowed by seccomp-filter rules) which can only drop privileges. Moreover, a Landlock eBPF program could come from outside a process (e.g. passed through a UNIX socket). It is then useful to differentiate the creation/load of Landlock eBPF programs via bpf(2) from rule enforcement via seccomp(2).
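The seccomp-like semantics described above (each call adds a layer, layers are inherited by children, and a layer can only remove access) can be modelled in a few lines of userspace C. This is a conceptual sketch only: the demo_* names, the integer "resource" and the two example rules are hypothetical and have nothing to do with the kernel data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*demo_rule_fn)(int resource);

/* One restriction layer; layers form a singly linked list, newest first. */
struct demo_layer {
	demo_rule_fn allow;
	const struct demo_layer *prev;
};

struct demo_task {
	const struct demo_layer *layers;
};

/* Stack a new layer on a task. Existing layers are never removed. */
void demo_add_layer(struct demo_task *t, struct demo_layer *layer,
		    demo_rule_fn rule)
{
	assert(t != NULL && layer != NULL && rule != NULL);
	layer->allow = rule;
	layer->prev = t->layers;
	t->layers = layer;
}

/* A child starts with exactly the layers of its parent. */
struct demo_task demo_fork(const struct demo_task *parent)
{
	struct demo_task child = { parent->layers };

	return child;
}

/* Access is granted only if every layer allows it. */
bool demo_access_ok(const struct demo_task *t, int resource)
{
	const struct demo_layer *l;

	for (l = t->layers; l != NULL; l = l->prev)
		if (!l->allow(resource))
			return false;
	return true;
}

/* Two example rules used to exercise the model. */
bool demo_only_even(int resource)
{
	return resource % 2 == 0;
}

bool demo_only_small(int resource)
{
	return resource < 10;
}
```

Since a decision is the conjunction of all layers, adding a layer can only shrink what a task may access, which is the "can only drop privileges" property; a layer added by the child does not affect the parent.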
# Differences from the RFC v1

* focus on the LSM hooks, not the syscalls:
  * much simpler implementation
  * does not need audit cache tricks to avoid race conditions
  * simpler to use and more generic because using the LSM hook abstraction directly
  * more efficient because only checking in LSM hooks
  * architecture agnostic
* switch from cBPF to eBPF:
  * new eBPF program types dedicated to Landlock
  * custom functions used by the eBPF program
  * gain some new features (e.g. 10 registers, can load values of different size, LLVM translator) but only a few functions allowed and a dedicated map type
  * new context: LSM hook ID, cookie and LSM hook arguments
  * need to set the sysctl kernel.unprivileged_bpf_disable to 0 (default value) to be able to load hook filters as unprivileged users
* smaller and simpler:
  * no more checker groups but dedicated arraymap of handles
  * simpler userland structs thanks to eBPF functions
* distinctive name: Landlock

[1] https://lkml.kernel.org/r/1458784008-16277-1-git-send-email-...@digikod.net
[2] https://crypto.stanford.edu/cs155/papers/traps.pdf

This series can be applied on Linux 4.7 and be tested with CONFIG_SECURITY_LANDLOCK and CONFIG_CGROUPS. I would really appreciate constructive comments on the usability, architecture, code and userland API of Landlock LSM.
Regards,

Mickaël Salaün (10):
  landlock: Add Kconfig
  bpf: Move u64_to_ptr() to BPF headers and inline it
  bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  seccomp: Split put_seccomp_filter() with put_seccomp()
  seccomp: Handle Landlock
  landlock: Add LSM hooks
  landlock: Add errno check
  landlock: Handle file system comparisons
  landlock: Handle cgroups
  samples/landlock: Add sandbox example

 include/linux/bpf.h                   |  41 +
 include/linux/lsm_hooks.h             |   5 +
 include/linux/seccomp.h               |  54 ++-
 include/uapi/asm-generic/errno-base.h |   1 +
 include/uapi/linux/bpf.h              | 103
 include/uapi/linux/seccomp.h          |   2 +
 kernel/bpf/arraymap.c                 | 222 +
 kernel/bpf/syscall.c                  |  18 ++-
 kernel/bpf/verifier.c                 |  32 +++-
 kernel/fork.c                         |  41 -
 kernel/seccomp.c                      | 211 +++-
 samples/Makefile                      |   2 +-
 samples/landlock/.gitignore           |   1 +
 samples/landlock/Makefile             |  16 ++
 samples/landlock/sandbox.c            | 295 ++
 security/Kconfig                      |   1 +
 security/Makefile                     |   2 +
 security/landlock/Kconfig             |  19 +++
 security/landlock/Makefile            |   3 +
 security/landlock/checker_cgroup.c    |  96 +++
 security/landlock/checker_cgroup.h    |  18 +++
 security/landlock/checker_fs.c        | 183 +
 security/landlock/checker_fs.h        |  20 +++
 security/landlock/lsm.c               | 228 ++
 security/security.c                   |   1 +
 25 files changed, 1592 insertions(+), 23 deletions(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c
 create
[RFC v2 09/10] landlock: Handle cgroups
Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op) to compare the current process cgroup with a cgroup handle. The handle matches if the current cgroup is the same as the handle's cgroup or a child of it. This allows making conditional rules according to the current cgroup.

A cgroup handle is a map entry created from a file descriptor referring to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.

An unprivileged process can create and manipulate cgroups thanks to cgroup delegation.

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Alexei Starovoitov
Cc: James Morris
Cc: Serge E. Hallyn
Cc: David S. Miller
Cc: Daniel Borkmann
---
 include/linux/bpf.h                |  8
 include/uapi/linux/bpf.h           | 15 ++
 kernel/bpf/arraymap.c              | 30
 kernel/bpf/verifier.c              |  6 +++
 security/landlock/Kconfig          |  3 ++
 security/landlock/Makefile         |  2 +-
 security/landlock/checker_cgroup.c | 96 ++
 security/landlock/checker_cgroup.h | 18 +++
 security/landlock/lsm.c            |  8
 9 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/checker_cgroup.c
 create mode 100644 security/landlock/checker_cgroup.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 79014aedbea4..9e6786e7a40a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -14,6 +14,9 @@
 #ifdef CONFIG_SECURITY_LANDLOCK
 #include /* struct file */
+#ifdef CONFIG_CGROUPS
+#include /* struct cgroup_subsys_state */
+#endif /* CONFIG_CGROUPS */
 #endif /* CONFIG_SECURITY_LANDLOCK */

 struct bpf_map;

@@ -85,6 +88,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
 	ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */
 	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,/* pointer to Landlock FS handle */
+	ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,/* pointer to Landlock cgroup handle */
 };

 /* type of values returned from helper functions */
@@ -148,6 +152,7 @@ enum bpf_reg_type {
 	PTR_TO_STRUCT_FILE,
 	PTR_TO_STRUCT_CRED,
 	CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+	CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,
 };

 struct bpf_prog;
@@ -212,6 +217,9 @@ struct map_landlock_handle {
 	u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
 	union {
 		struct file *file;
+#ifdef CONFIG_CGROUPS
+		struct cgroup_subsys_state *css;
+#endif /* CONFIG_CGROUPS */
 	};
 };
 #endif /* CONFIG_SECURITY_LANDLOCK */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 88af79dd668c..7f60b9fdb35c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -90,12 +90,14 @@ enum bpf_map_type {
 enum bpf_map_array_type {
 	BPF_MAP_ARRAY_TYPE_UNSPEC,
 	BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
+	BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP,
 };

 enum bpf_map_handle_type {
 	BPF_MAP_HANDLE_TYPE_UNSPEC,
 	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
 	BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB,
+	BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD,
 };

 enum bpf_map_array_op {
@@ -364,6 +366,19 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,

+	/**
+	 * bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
+	 * Check if the current process is a leaf of cgroup handles
+	 *
+	 * @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+	 * @map: handles to compare against
+	 * @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+	 *
+	 * Return: 0 if the current cgroup is the same or beneath the handle,
+	 * 1 otherwise, or a negative value if an error occurred.
+	 */
+	BPF_FUNC_landlock_cmp_cgroup_beneath,
+
 	__BPF_FUNC_MAX_ID,
 };

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 6804dafd8355..050b3d8d88c8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,12 @@
 #include /* fput() */
 #include /* struct file */

+#ifdef CONFIG_SECURITY_LANDLOCK
+#ifdef CONFIG_CGROUPS
+#include /* struct cgroup_subsys_state */
+#endif /* CONFIG_CGROUPS */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
 	int i;
@@ -514,6 +520,12 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
 		else
 			WARN_ON(1);
 		break;
+	case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD:
+		if (likely(handle->css))
+			css_put(handle->css);
+		else
+			WARN_ON(1);
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -541,6 +553,10 @@ stati
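The "same or beneath" rule documented for bpf_landlock_cmp_cgroup_beneath() can be illustrated outside the kernel. The following is a minimal userland sketch, not the code from this patch: `struct node` and `cmp_beneath()` are hypothetical stand-ins for a cgroup hierarchy and for the helper. Only the return convention (0 on match, 1 otherwise) is taken from the documentation above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct node {
	const struct node *parent;
};

/*
 * Return 0 if @current is @handle itself or one of its descendants,
 * 1 otherwise (same convention as the documented helper).
 */
static int cmp_beneath(const struct node *current, const struct node *handle)
{
	const struct node *walker;

	assert(handle != NULL);
	/* Walk from the current node up to the root, looking for the handle. */
	for (walker = current; walker; walker = walker->parent) {
		if (walker == handle)
			return 0;
	}
	return 1;
}
```

With a hierarchy root -> sandboxed -> child, both `sandboxed` and `child` match a handle on `sandboxed`, while `root` and any sibling of `sandboxed` do not.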
[RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp()
The semantics are unchanged. This will be useful for the Landlock integration with seccomp (next commit).

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c           |  2 +-
 kernel/seccomp.c        | 18 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b2f690..29b20fe8fd4d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -83,13 +83,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */

 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
 	return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c6c88c..b23a71ec8003 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -235,7 +235,7 @@ void free_task(struct task_struct *tsk)
 	free_thread_stack(tsk->stack);
 	rt_mutex_debug_task_free(tsk);
 	ftrace_graph_exit_task(tsk);
-	put_seccomp_filter(tsk);
+	put_seccomp(tsk);
 	arch_release_task_struct(tsk);
 	free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7002796f14a4..f1f475691c27 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -60,6 +60,8 @@ struct seccomp_filter {
 	struct bpf_prog *prog;
 };

+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))

@@ -313,7 +315,7 @@ static inline void seccomp_sync_threads(void)
 		 * current's path will hold a reference. (This also
 		 * allows a put before the assignment.)
 		 */
-		put_seccomp_filter(thread);
+		put_seccomp_filter(thread->seccomp.filter);
 		smp_store_release(&thread->seccomp.filter,
 				  caller->seccomp.filter);

@@ -475,10 +477,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 	}
 }

-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = filter;
+
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -487,6 +490,11 @@ void put_seccomp_filter(struct task_struct *tsk)
 	}
 }

+void put_seccomp(struct task_struct *tsk)
+{
+	put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -926,7 +934,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 	if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
 		ret = -EFAULT;

-	put_seccomp_filter(task);
+	put_seccomp_filter(task->seccomp.filter);
 	return ret;

 out:
--
2.8.1
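The point of the split is that the release logic now takes a filter rather than a task, so any holder of a filter reference can drop it. A minimal userland sketch of that iterative release is given below. It is an illustration under stated assumptions, not the kernel code: `struct filter`, `filter_new()`, `put_filter()` and `freed_count` are hypothetical names, and a plain `int` replaces the kernel's `atomic_t`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified filter node; the kernel uses atomic_t for usage. */
struct filter {
	int usage;
	struct filter *prev;
};

/* Count of freed nodes, only here to make the behaviour observable. */
static int freed_count;

/* Allocate a filter stacked on @prev; it takes over the caller's reference to @prev. */
static struct filter *filter_new(struct filter *prev)
{
	struct filter *f = calloc(1, sizeof(*f));

	assert(f != NULL);
	f->usage = 1;
	f->prev = prev;
	return f;
}

/* Drop one reference and walk up the chain while nodes reach zero. */
static void put_filter(struct filter *filter)
{
	struct filter *orig = filter;

	while (orig && --orig->usage == 0) {
		struct filter *freeme = orig;

		orig = orig->prev;
		free(freeme);
		freed_count++;
	}
}
```

The loop stops at the first ancestor that is still referenced elsewhere, which is what lets several tasks share a common prefix of the filter chain.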
[RFC v2 01/10] landlock: Add Kconfig
Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp parts to ease the review.

Signed-off-by: Mickaël Salaün
Cc: James Morris
Cc: Kees Cook
Cc: Serge E. Hallyn
---
 security/Kconfig          |  1 +
 security/landlock/Kconfig | 16
 2 files changed, 17 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 176758cdfa57..be6c549dd0ca 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -124,6 +124,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig

 source security/integrity/Kconfig

diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index ..dc8328d216d7
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,16 @@
+config SECURITY_LANDLOCK
+	bool "Landlock sandbox support"
+	depends on SECURITY
+	select BPF_SYSCALL
+	select SECCOMP
+	default y
+	help
+	  Landlock is a stacked LSM which allows any user to load a security policy
+	  to restrict their processes (i.e. create a sandbox). The policy is a list
+	  of stacked eBPF programs for some LSM hooks. Each program can do some
+	  access comparison to check if an access request is legitimate.
+
+	  Further information about eBPF can be found in
+	  Documentation/networking/filter.txt
+
+	  If you are unsure how to answer this question, answer Y.
--
2.8.1
[RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of elements instead of CONST_PTR_TO_MAP (e.g. CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* forced sequential filling (i.e. replace or append-only update), which allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map can be passed to an eBPF function. For example, Landlock uses it to store and manage kernel objects (e.g. struct file) instead of dealing with userland raw data. This improves efficiency and ensures that an eBPF program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when updating a map entry (handle). These handle types are used to infer a high-level arraymap type; the high-level types are listed in enum bpf_map_array_type (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by Landlock LSM (cf. next commits) but it could be useful for other needs.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: David S.
Miller Cc: Daniel Borkmann Cc: James Morris Cc: Kees Cook --- include/linux/bpf.h | 18 + include/uapi/linux/bpf.h | 18 + kernel/bpf/arraymap.c| 181 +++ kernel/bpf/syscall.c | 9 ++- kernel/bpf/verifier.c| 12 +++- 5 files changed, 235 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ca3742729ae7..9a5b388be099 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -12,6 +12,10 @@ #include #include +#ifdef CONFIG_SECURITY_LANDLOCK +#include /* struct file */ +#endif /* CONFIG_SECURITY_LANDLOCK */ + struct bpf_map; /* map is generic key/value storage optionally accesible by eBPF programs */ @@ -34,6 +38,7 @@ struct bpf_map_ops { struct bpf_map { atomic_t refcnt; enum bpf_map_type map_type; + enum bpf_map_array_type map_array_type; u32 key_size; u32 value_size; u32 max_entries; @@ -183,12 +188,25 @@ struct bpf_array { */ enum bpf_prog_type owner_prog_type; bool owner_jited; +#ifdef CONFIG_SECURITY_LANDLOCK + u32 n_entries; /* number of entries in a handle array */ +#endif /* CONFIG_SECURITY_LANDLOCK */ union { char value[0] __aligned(8); void *ptrs[0] __aligned(8); void __percpu *pptrs[0] __aligned(8); }; }; + +#ifdef CONFIG_SECURITY_LANDLOCK +struct map_landlock_handle { + u32 type; + union { + struct file *file; + }; +}; +#endif /* CONFIG_SECURITY_LANDLOCK */ + #define MAX_TAIL_CALL_CNT 32 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 406459b935a2..a60eedc17d40 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -84,6 +84,15 @@ enum bpf_map_type { BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_PERCPU_ARRAY, BPF_MAP_TYPE_STACK_TRACE, + BPF_MAP_TYPE_LANDLOCK_ARRAY, +}; + +enum bpf_map_array_type { + BPF_MAP_ARRAY_TYPE_UNSPEC, +}; + +enum bpf_map_handle_type { + BPF_MAP_HANDLE_TYPE_UNSPEC, }; enum bpf_prog_type { @@ -386,4 +395,13 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +/* Map handle entry */ +struct 
landlock_handle { + __u32 type; /* enum bpf_map_handle_type */ + union { + __u32 fd; + __aligned_u64 glob; + }; +} __attribute__((aligned(8))); + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 76d5a794e426..5938b8ee475b 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -16,6 +16,8 @@ #include #include #include +#include /* fput() */ +#include /* struct file */ static void bpf_array_free_percpu(struct bpf_array *array) { @@ -491,3 +493,182 @@ static int __init register_perf_event_array_map(void) return 0; } late_initcall(register_perf_event_array_map); + + +#ifdef CONFIG_SECURITY_LANDLOCK +static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr) +{ + if (attr->value_size != sizeof(struct landlock_handle)) + return ERR_PTR(-EINVAL); + attr->value_size = sizeof(struct map_landlock_handle); + + return array_map_alloc(attr); +} + +static void landlock_put_handle(struct map_landlock_handle *handle) +{ + switch (handle->type) { + /* TODO: add handle types */ + default: + WARN_ON(1); + } + /* safeguard */ + handle->type = BPF_MAP_HANDLE_TYPE_UNSPEC; +} + +static void landlock_array_map_free(struct bpf_map *map) +{ + struct bpf_array *array = container_of(map, struct bpf_array, map); + int i; + + synchronize_rcu(); + + for (i =
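The "replace or append-only" update rule from the commit message can be sketched in a few lines of userland C. This is an illustration only, since the update function of the patch is not visible in this excerpt: `struct handle_array`, `handle_array_update()`, `MAX_ENTRIES` and the error codes chosen are assumptions. Only the `n_entries` counter mirrors a field added by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 4

/* Hypothetical handle array: entries [0, n_entries) are always filled. */
struct handle_array {
	unsigned int n_entries;
	int values[MAX_ENTRIES];
};

/* Replace an existing entry or append right after the last one. */
static int handle_array_update(struct handle_array *array,
			       unsigned int index, int value)
{
	assert(array != NULL);
	if (index >= MAX_ENTRIES)
		return -E2BIG;
	/* An index past the first free slot would leave a hole: refuse it. */
	if (index > array->n_entries)
		return -EINVAL;
	array->values[index] = value;
	if (index == array->n_entries)
		array->n_entries++;
	return 0;
}
```

Because no hole can exist, a reader only needs to scan the first `n_entries` slots, which is the "quick browsing" property the commit message refers to.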
[RFC v2 06/10] landlock: Add LSM hooks
Add LSM hooks which can be used by userland through Landlock (eBPF) programs. These programs are limited to a whitelist of functions (cf. next commit). The eBPF program context is depicted by the struct landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID (useful when using the same program for multiple LSM hooks);
* cookie: the 16-bit value from the seccomp filter that triggered this Landlock program;
* args[6]: array of LSM hook arguments.

The LSM hook arguments can contain raw values as integers or (unleakable) pointers. The only way to use the pointers is to pass them to an eBPF function according to their types (e.g. the bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct file pointer).

For now, there are three hooks for file system access control:
* file_open;
* file_permission;
* mmap_file.

Signed-off-by: Mickaël Salaün
Cc: Alexei Starovoitov
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
Cc: James Morris
Cc: Serge E. Hallyn
Cc: David S. Miller
Cc: Daniel Borkmann
---
 include/linux/bpf.h        |   7 ++
 include/linux/lsm_hooks.h  |   5 ++
 include/uapi/linux/bpf.h   |  20 +
 kernel/bpf/syscall.c       |   3 +
 kernel/bpf/verifier.c      |   8 ++
 kernel/seccomp.c           |   7 +-
 security/Makefile          |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c    | 211 +
 security/security.c        |   1 +
 10 files changed, 265 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a5b388be099..557e7efdf0cd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -81,6 +81,9 @@ enum bpf_arg_type {

 	ARG_PTR_TO_CTX, /* pointer to context */
 	ARG_ANYTHING, /* any (initialized) argument is ok */
+
+	ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
+	ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */
 };

 /* type of values returned from helper functions */
@@ -139,6 +142,10 @@ enum bpf_reg_type {
 	 */
 	PTR_TO_PACKET,
 	PTR_TO_PACKET_END, /* skb->data + headlen */
+
+	/* Landlock */
+	PTR_TO_STRUCT_FILE,
+	PTR_TO_STRUCT_CRED,
 };

 struct bpf_prog;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7ae397669d8b..6792ae8fb53d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1898,5 +1898,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */

 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a60eedc17d40..983d14e910ff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -102,6 +102,9 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SCHED_CLS,
 	BPF_PROG_TYPE_SCHED_ACT,
 	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+	BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+	BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
 };

 #define BPF_PSEUDO_MAP_FD 1
@@ -404,4 +407,21 @@ struct landlock_handle {
 	};
 } __attribute__((aligned(8)));

+/**
+ * struct landlock_data
+ *
+ * @hook: LSM hook ID
+ * @cookie: value set by a seccomp-filter return value RET_LANDLOCK. This comes
+ *          from a trusted seccomp-bpf program: the same process that loaded
+ *          this Landlock hook program.
+ * @args: LSM hook arguments, see include/linux/lsm_hooks.h for their
+ *        description and the LANDLOCK_HOOK* definitions from
+ *        security/landlock/lsm.c for their types.
+ */
+struct landlock_data {
+	__u32 hook;
+	__u16 cookie;
+	__u64 args[6];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 32a10ef4b878..6b8bfc34c751 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,6 +719,9 @@ static int bpf_prog_load(union bpf_attr *attr)

 	switch (type) {
 	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_LANDLOCK_FILE_OPEN:
+	case BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION:
+	case BPF_PROG_TYPE_LANDLOCK_MMAP_FILE:
 		break;
 	default:
 		if (!capable(CAP_SYS_ADMIN))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c15f6cc28e00..2931e2efcc10 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -244,6 +244,8 @@ static const char * const reg_type_str[] = {
 	[CONST_IMM] = "imm",
 	[PTR_TO_PACKET] = "pkt",
 	[PTR_TO_PACKET_END] = "pkt_end",
+	[PTR_TO_STRUCT_FILE]= "struct_file",
+	[PTR_TO_STRUCT_C
[RFC v2 08/10] landlock: Handle file system comparisons
Add eBPF functions to compare file system access with a Landlock file system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount point of the currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function allows an eBPF program to check if the currently accessed file is the same as, or in the hierarchy of, a reference handle.

The goal of a file system handle is to abstract kernel objects such as a struct file or a struct inode. Userland can create this kind of handle with the BPF_MAP_UPDATE_ELEM command. The element is a struct landlock_handle containing the handle type (e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could also be any description able to match a struct file or a struct inode (e.g. path or glob string).

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Alexei Starovoitov
Cc: James Morris
Cc: Serge E. Hallyn
Cc: David S.
Miller Cc: Daniel Borkmann --- include/linux/bpf.h| 4 +- include/uapi/linux/bpf.h | 52 +++- kernel/bpf/arraymap.c | 17 +++- kernel/bpf/verifier.c | 6 ++ security/landlock/Makefile | 2 +- security/landlock/checker_fs.c | 183 + security/landlock/checker_fs.h | 20 + security/landlock/lsm.c| 11 ++- 8 files changed, 288 insertions(+), 7 deletions(-) create mode 100644 security/landlock/checker_fs.c create mode 100644 security/landlock/checker_fs.h diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 557e7efdf0cd..79014aedbea4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -84,6 +84,7 @@ enum bpf_arg_type { ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */ ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */ + ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,/* pointer to Landlock FS handle */ }; /* type of values returned from helper functions */ @@ -146,6 +147,7 @@ enum bpf_reg_type { /* Landlock */ PTR_TO_STRUCT_FILE, PTR_TO_STRUCT_CRED, + CONST_PTR_TO_LANDLOCK_HANDLE_FS, }; struct bpf_prog; @@ -207,7 +209,7 @@ struct bpf_array { #ifdef CONFIG_SECURITY_LANDLOCK struct map_landlock_handle { - u32 type; + u32 type; /* e.g. 
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */ union { struct file *file; }; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 983d14e910ff..88af79dd668c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -89,10 +89,20 @@ enum bpf_map_type { enum bpf_map_array_type { BPF_MAP_ARRAY_TYPE_UNSPEC, + BPF_MAP_ARRAY_TYPE_LANDLOCK_FS, }; enum bpf_map_handle_type { BPF_MAP_HANDLE_TYPE_UNSPEC, + BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD, + BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB, +}; + +enum bpf_map_array_op { + BPF_MAP_ARRAY_OP_UNSPEC, + BPF_MAP_ARRAY_OP_OR, + BPF_MAP_ARRAY_OP_AND, + BPF_MAP_ARRAY_OP_XOR, }; enum bpf_prog_type { @@ -325,6 +335,35 @@ enum bpf_func_id { */ BPF_FUNC_skb_get_tunnel_opt, BPF_FUNC_skb_set_tunnel_opt, + + /** +* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file) +* Compare file system handles with a struct file +* +* @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY) +* @map: handles to compare against +* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR) +* @file: struct file address to compare with (taken from the context) +* +* Return: 0 if the file match the handles, 1 otherwise, or a negative +* value if an error occurred. +*/ + BPF_FUNC_landlock_cmp_fs_prop_with_struct_file, + + /** +* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file) +* Check if a struct file is a leaf of file system handles +* +* @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE) +* @map: handles to compare against +* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR) +* @file: struct file address to compare with (taken from the context) +* +* Return: 0 if the file is the same or beneath the handles, +* 1 otherwise, or a negative value if an error occurred. 
+*/ + BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file, + __BPF_FUNC_MAX_ID, }; @@ -398,6 +437,17 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +/* Handle check flags */ +#define LANDLOCK_FLAG_FS_DENTRY(1 << 0) +#define LANDLOCK_FLAG_FS_INODE (1 << 1) +#define LANDLOCK_FLAG_FS_DEVICE(1 << 2) +#define LANDLOCK_FLAG_FS_MOUNT (1 << 3) +#define _LANDLOCK_FLAG_FS_MASK ((1 << 4) - 1) + +/* H
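The property flags above form a small bitmask, with _LANDLOCK_FLAG_FS_MASK covering the four defined bits. As a rough illustration of how such a mask can be used to reject unknown bits, here is a userland sketch. The flag values are copied from the patch, while `fs_prop_is_valid()` and the choice to reject an empty set are assumptions made for this example, not behaviour confirmed by the series.

```c
#include <stdint.h>

/* Values copied from the patch above. */
#define LANDLOCK_FLAG_FS_DENTRY	(1 << 0)
#define LANDLOCK_FLAG_FS_INODE	(1 << 1)
#define LANDLOCK_FLAG_FS_DEVICE	(1 << 2)
#define LANDLOCK_FLAG_FS_MOUNT	(1 << 3)
#define _LANDLOCK_FLAG_FS_MASK	((1 << 4) - 1)

/*
 * Hypothetical validity check: return 1 if @prop selects at least one
 * known property and no unknown bit, 0 otherwise.
 */
static int fs_prop_is_valid(uint64_t prop)
{
	if (prop == 0)
		return 0;
	return (prop & ~(uint64_t)_LANDLOCK_FLAG_FS_MASK) == 0;
}
```

Checking against the mask up front keeps future flag bits free to be assigned without changing how older programs are interpreted.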
[RFC v2 10/10] samples/landlock: Add sandbox example
Add a basic sandbox tool to create a process isolated from some part of the system. This can depend on the current cgroup.

Example:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
      LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
      ./sandbox /bin/sh -i
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Alexei Starovoitov
Cc: James Morris
Cc: Serge E. Hallyn
Cc: David S. Miller
Cc: Daniel Borkmann
---
 samples/Makefile            |   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 295
 4 files changed, 313 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 2e3b523d7097..42e6a613f728 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@

 obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-			   configfs/ connector/ v4l/
+			   configfs/ connector/ v4l/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index ..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index ..d1044b2afd27
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,16 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o + +hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox +sandbox-objs := sandbox.o + +always := $(hostprogs-y) + +HOSTCFLAGS += -I$(objtree)/usr/include + +# Trick to allow make to be run from this directory +all: + $(MAKE) -C ../../ $$PWD/ + +clean: + $(MAKE) -C ../../ M=$$PWD clean diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c new file mode 100644 index ..86604963c30c --- /dev/null +++ b/samples/landlock/sandbox.c @@ -0,0 +1,295 @@ +/* + * Landlock LSM - Sandbox Example + * + * Copyright (C) 2016 Mickaël Salaün + * + * The code may be used by anyone for any purpose, and can serve as a starting + * point for developing a sandbox. + */ + +#define _GNU_SOURCE +#include +#include /* open() */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../../tools/include/linux/filter.h" + +#include "../bpf/libbpf.c" + +#ifndef seccomp +static int seccomp(unsigned int op, unsigned int flags, void *args) +{ + errno = 0; + return syscall(__NR_seccomp, op, flags, args); +} +#endif + +#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0])) + +static int apply_sandbox(const char **allowed_paths, int path_nb, const char **cgroup_paths, int cgroup_nb) +{ + __u32 key; + int i, ret = 0, map_fs = -1, map_cg = -1, offset; + + /* set up the test sandbox */ + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("prctl(no_new_priv)"); + return 1; + } + + /* register a new syscall filter */ + struct sock_filter filter0[] = { + /* pass a cookie containing 5 to the LSM hook filter */ + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LANDLOCK | 5), + }; + struct sock_fprog prog0 = { + .len = (unsigned short)ARRAY_SIZE(filter0), + .filter = filter0, + }; + if (seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog0)) { + perror("seccomp(set_filter)"); + return 1; + } + + if (path_nb) { + map_fs = bpf_create_map(BPF_MAP_TYPE_LANDLOCK_ARRAY, sizeof(key), sizeof(struct landlock_handle), 10, 0); + if (map_fs < 0) { + 
fprintf(stderr, "bpf_create_map(fs"); + perror(")"); + return 1; + } + for (key = 0; key < path_nb; key++) { + int fd = open(allowed_paths[key], O_RDONLY | O_CLOEXEC); + if (fd < 0) { + fprintf(stderr, "open(fs: \"%s\"", allowed_paths[key]); + perror(")"); + return 1; + } + struct landlock_handle handle = { + .type = BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD, + .fd = (__
[RFC v2 05/10] seccomp: Handle Landlock
A Landlock program can be triggered when a seccomp filter returns RET_LANDLOCK. Moreover, it is possible to return a 16-bit cookie which will be readable by the Landlock programs.

Only seccomp filters loaded from the same thread and before a Landlock program can trigger it. Multiple Landlock programs can be triggered by one or more seccomp filters. This way, each RET_LANDLOCK (with specific cookie) will trigger all the allowed Landlock programs once.

Signed-off-by: Mickaël Salaün
Cc: Kees Cook
Cc: Andy Lutomirski
Cc: Will Drewry
Cc: Andrew Morton
---
 include/linux/seccomp.h      |  49 +++
 include/uapi/linux/seccomp.h |   2 +
 kernel/fork.c                |  39 -
 kernel/seccomp.c             | 190 ++-
 4 files changed, 275 insertions(+), 5 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 29b20fe8fd4d..785ccbebf687 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,33 @@
 #include
 #include

+#ifdef CONFIG_SECURITY_LANDLOCK
+#include /* struct bpf_prog */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct seccomp_filter;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct seccomp_landlock_ret {
+	struct seccomp_landlock_ret *prev;
+	/* @filter points to a @landlock_filter list */
+	struct seccomp_filter *filter;
+	u16 cookie;
+	bool triggered;
+};
+
+struct seccomp_landlock_prog {
+	atomic_t usage;
+	struct seccomp_landlock_prog *prev;
+	/*
+	 * List of filters (through filter->landlock_prev) allowed to trigger
+	 * this Landlock program.
+	 */
+	struct seccomp_filter *filter;
+	struct bpf_prog *prog;
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
@@ -18,6 +44,10 @@ struct seccomp_filter;
  * system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  * accessed without locking during system call entry.
+ * @landlock_filter: list of filters allowed to trigger an associated
+ *                   Landlock hook via a RET_LANDLOCK.
+ * @landlock_ret: stored values from a RET_LANDLOCK.
+ * @landlock_prog: list of Landlock programs. * * @filter must only be accessed from the context of current as there * is no read locking. @@ -25,6 +55,12 @@ struct seccomp_filter; struct seccomp { int mode; struct seccomp_filter *filter; + +#ifdef CONFIG_SECURITY_LANDLOCK + struct seccomp_filter *landlock_filter; + struct seccomp_landlock_ret *landlock_ret; + struct seccomp_landlock_prog *landlock_prog; +#endif /* CONFIG_SECURITY_LANDLOCK */ }; #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER @@ -85,6 +121,12 @@ static inline int seccomp_mode(struct seccomp *s) #ifdef CONFIG_SECCOMP_FILTER extern void put_seccomp(struct task_struct *tsk); extern void get_seccomp_filter(struct task_struct *tsk); +#ifdef CONFIG_SECURITY_LANDLOCK +extern void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret); +extern struct seccomp_landlock_ret *dup_landlock_ret( + struct seccomp_landlock_ret *ret_orig); +#endif /* CONFIG_SECURITY_LANDLOCK */ + #else /* CONFIG_SECCOMP_FILTER */ static inline void put_seccomp(struct task_struct *tsk) { @@ -95,6 +137,13 @@ static inline void get_seccomp_filter(struct task_struct *tsk) { return; } + +#ifdef CONFIG_SECURITY_LANDLOCK +static inline void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret) {} +static inline struct seccomp_landlock_ret *dup_landlock_ret( + struct seccomp_landlock_ret *ret_orig) {} +#endif /* CONFIG_SECURITY_LANDLOCK */ + #endif /* CONFIG_SECCOMP_FILTER */ #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 0f238a43ff1e..b4aab1c19b8a 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -13,6 +13,7 @@ /* Valid operations for seccomp syscall. 
*/ #define SECCOMP_SET_MODE_STRICT 0 #define SECCOMP_SET_MODE_FILTER 1 +#define SECCOMP_SET_LANDLOCK_HOOK 2 /* Valid flags for SECCOMP_SET_MODE_FILTER */ #define SECCOMP_FILTER_FLAG_TSYNC 1 @@ -28,6 +29,7 @@ #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */ #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */ #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ +#define SECCOMP_RET_LANDLOCK 0x00070000U /* trigger LSM evaluation */ #define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */ #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ diff --git a/kernel/fork.c b/kernel/fork.c index b23a71ec8003..3658c1e95e03 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -369,7 +369,12 @@ static struct task_struct *dup_task_struct(struct task_struct
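As a rough illustration of the 16-bit cookie mentioned in the commit message, the sketch below packs and unpacks a filter return value in userspace. The two masks follow the action/data split of seccomp return values; the SKETCH_* names and the helper functions are hypothetical, and the RET_LANDLOCK value is assumed to be 0x00070000U, in line with the neighbouring SECCOMP_RET_* actions.

```c
#include <stdint.h>

/* Action/data split of a seccomp filter return value (names are local to this sketch). */
#define SKETCH_RET_ACTION   0x7fff0000U /* bits selecting the action */
#define SKETCH_RET_DATA     0x0000ffffU /* 16 bits of per-action data */

/* Assumption: value of the RET_LANDLOCK action proposed by this patch. */
#define SKETCH_RET_LANDLOCK 0x00070000U

/* Build a filter return value that asks for Landlock evaluation with a cookie. */
static inline uint32_t sketch_landlock_ret(uint16_t cookie)
{
	return SKETCH_RET_LANDLOCK | (uint32_t)cookie;
}

/* Extract the action part of a filter return value. */
static inline uint32_t sketch_ret_action(uint32_t ret)
{
	return ret & SKETCH_RET_ACTION;
}

/* Extract the 16-bit cookie that a Landlock program would read. */
static inline uint16_t sketch_ret_cookie(uint32_t ret)
{
	return (uint16_t)(ret & SKETCH_RET_DATA);
}
```

A filter would return something like sketch_landlock_ret(0x1234), and the receiving side would recover the cookie with sketch_ret_cookie().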
[RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it
This helper will be useful for arraymap (next commit). Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: David S. Miller Cc: Daniel Borkmann --- include/linux/bpf.h | 6 ++ kernel/bpf/syscall.c | 6 -- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0de4de6dd43e..ca3742729ae7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -251,6 +251,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) /* verify correctness of eBPF program */ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); + +/* helper to convert user pointers passed inside __aligned_u64 fields */ +static inline void __user *u64_to_ptr(__u64 val) +{ + return (void __user *) (unsigned long) val; +} #else static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl) { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 46ecce4b79ed..d305a3ce0fa7 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -247,12 +247,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd) return map; } -/* helper to convert user pointers passed inside __aligned_u64 fields */ -static void __user *u64_to_ptr(__u64 val) -{ - return (void __user *) (unsigned long) val; -} - int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) { return -ENOTSUPP; -- 2.8.1
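For readers unfamiliar with the pattern, here is a minimal userspace sketch of what the helper does: a pointer is carried in a 64-bit integer field and converted back on the other side. The struct and the sketch_* names are hypothetical; the kernel version additionally carries the __user annotation, which has no userspace equivalent.

```c
#include <stdint.h>

/* Hypothetical stand-in for a bpf_attr-like structure: pointers travel as u64. */
struct sketch_attr {
	uint64_t value;
};

/* Userspace side: store a pointer in a 64-bit field. */
static inline uint64_t sketch_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Receiving side: same double cast as u64_to_ptr(), minus the __user marker. */
static inline void *sketch_u64_to_ptr(uint64_t val)
{
	return (void *)(uintptr_t)val;
}
```

Going through uintptr_t (unsigned long in the kernel) keeps the conversion well defined on 32-bit targets, where a pointer is narrower than the 64-bit field.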
[RFC v2 07/10] landlock: Add errno check
Add a max errno value. This is not strictly needed but should improve reliability. Signed-off-by: Mickaël Salaün Cc: Arnd Bergmann Cc: Serge E. Hallyn Cc: James Morris Cc: Kees Cook --- include/uapi/asm-generic/errno-base.h | 1 + security/landlock/lsm.c | 6 +++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/include/uapi/asm-generic/errno-base.h b/include/uapi/asm-generic/errno-base.h index 65115978510f..43407a403e72 100644 --- a/include/uapi/asm-generic/errno-base.h +++ b/include/uapi/asm-generic/errno-base.h @@ -35,5 +35,6 @@ #define EPIPE 32 /* Broken pipe */ #define EDOM 33 /* Math argument out of domain of func */ #define ERANGE 34 /* Math result not representable */ +#define _ERRNO_LAST ERANGE #endif diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c index aa9d4a64826e..322309068066 100644 --- a/security/landlock/lsm.c +++ b/security/landlock/lsm.c @@ -11,7 +11,6 @@ #include #include /* enum bpf_reg_type, struct landlock_data */ #include -#include /* MAX_ERRNO */ #include /* struct bpf_prog, BPF_PROG_RUN() */ #include /* FIELD_SIZEOF() */ #include @@ -104,8 +103,9 @@ static int landlock_run_prog(__u64 args[6]) } } if (!ret) { - if (cur_ret > MAX_ERRNO) - ret = MAX_ERRNO; + /* check errno to not mess with kernel code */ + if (cur_ret > _ERRNO_LAST) + ret = EPERM; else ret = cur_ret; } -- 2.8.1
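The check in the hunk above can be read as a small filter on the value returned by a program. A standalone sketch, assuming the return value is unsigned and using hypothetical SKETCH_*/sketch_* names (the patch itself uses _ERRNO_LAST), might look like this:

```c
#include <errno.h>

/* Assumption: highest errno a program may return, as _ERRNO_LAST in the patch. */
#define SKETCH_ERRNO_LAST ((unsigned int)ERANGE)

/*
 * Keep values in the base errno range as they are and map anything larger
 * to EPERM, so that an unexpected value is still treated as a denial.
 */
static inline int sketch_filter_errno(unsigned int cur_ret)
{
	if (cur_ret > SKETCH_ERRNO_LAST)
		return EPERM;
	return (int)cur_ret;
}
```

Compared with clamping to MAX_ERRNO, this maps out-of-range values to an errno that callers already know how to handle.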
Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
On 25/08/2016 13:05, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox helps to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>
> Maybe I'm missing an obvious description, but: do you have a
> description of the eBPF API to landlock? What function do you
> provide, when is it called, what functions can it call, what does the
> fancy new arraymap do, etc?
>
> --Andy
>

The eBPF context is described in "[RFC v2 06/10] landlock: Add LSM hooks".

The provided eBPF functions are described in "[RFC v2 08/10] landlock: Handle file system comparisons" (bpf_landlock_cmp_fs_prop_with_struct_file and bpf_landlock_cmp_fs_beneath_with_struct_file) and "[RFC v2 09/10] landlock: Handle cgroups" (bpf_landlock_cmp_cgroup_beneath). The function descriptions are summarized in include/uapi/linux/bpf.h.

These functions can be called by an eBPF program of type BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION or BPF_PROG_TYPE_LANDLOCK_MMAP_FILE, as described in "[RFC v2 06/10] landlock: Add LSM hooks".

I tried to split the commits as much as possible to ease the review. The "[RFC v2 10/10] samples/landlock: Add sandbox example" may help to see the whole picture.

Hope this helps,
 Mickaël
Re: [RFC v2 08/10] landlock: Handle file system comparisons
On 25/08/2016 13:12, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
>>
>> The goal of file system handle is to abstract kernel objects such as a
>> struct file or a struct inode. Userland can create this kind of handle
>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>> landlock_handle containing the handle type (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>> also be any descriptions able to match a struct file or a struct inode
>> (e.g. path or glob string).
>
> This needs Eric's opinion.
>
> Also, where do all the struct file *'s get stashed? Are they
> preserved in the arraymap? What prevents reference cycles or absurdly
> large numbers of struct files getting pinned?

Yes, the struct files are kept in the arraymap and dropped when there are no more references to them. Currently, the limitations are the maximum number of open file descriptors referring to an arraymap and the maximum number of eBPF Landlock programs loaded in a process (LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).

What kind of reference cycles do you have in mind?

It probably needs another limit for kernel object references as well. What is the best option here? Add another static limitation or use an existing one?

 Mickaël
Re: [RFC v2 09/10] landlock: Handle cgroups
On 25/08/2016 13:09, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle. The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
>
> Can you elaborate on why this is useful? I.e. why not just supply
> different policies to different subtrees.

The main use case I see is to load the security policies at the start of a user session for all processes but not enforce them right away. The user can then keep a shell for Landlock administration tasks and lock the other processes with a dedicated cgroup on the fly. This allows the user to make unremovable Landlock security policies but only activate them when needed for specific processes.

>
> Also, how does this interact with the current cgroup v1 vs v2 mess?
> As far as I can tell, no one can even really agree on what "what
> cgroup am I in" means right now.

I tested with cgroup-v2 but indeed, it seems a bit different with cgroup-v1 :)
Does anyone know how to handle both cases?

>
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
>
> What is cgroup delegation?

This is simply the action of changing the owner of cgroup sysfs files to allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt).

 Mickaël
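To make the "same or a child" matching concrete, here is a minimal userspace sketch of an ancestry walk over a parent-linked tree. The sketch_cgroup type and both functions are hypothetical stand-ins, not kernel code; the 0-on-match, 1-otherwise convention is the one used by the proposed helper in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node: only the parent link matters here. */
struct sketch_cgroup {
	const struct sketch_cgroup *parent; /* NULL for the root */
};

/* True if cgrp is ancestor itself or sits anywhere beneath it. */
static inline bool sketch_is_descendant(const struct sketch_cgroup *cgrp,
					const struct sketch_cgroup *ancestor)
{
	while (cgrp) {
		if (cgrp == ancestor)
			return true;
		cgrp = cgrp->parent;
	}
	return false;
}

/* Convention of the proposed helper: 0 when the handle matches, 1 otherwise. */
static inline int sketch_cmp_cgroup_beneath(const struct sketch_cgroup *current_cgrp,
					    const struct sketch_cgroup *handle)
{
	return sketch_is_descendant(current_cgrp, handle) ? 0 : 1;
}
```

With this shape, a policy can be loaded everywhere and only take effect for tasks that are later moved into the cgroup referenced by the handle, or into one of its children.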
Re: [RFC v2 09/10] landlock: Handle cgroups
On 26/08/2016 04:14, Alexei Starovoitov wrote:
> On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle. The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
>>
>> Signed-off-by: Mickaël Salaün
> ...
>> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
>> +		u64 r3_map_op, u64 r4, u64 r5)
>> +{
>> +	u8 option = (u8) r1_option;
>> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +	enum bpf_map_array_op map_op = r3_map_op;
>> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +	struct cgroup *cg1, *cg2;
>> +	struct map_landlock_handle *handle;
>> +	int i;
>> +
>> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
>> +	if (unlikely(!map)) {
>> +		WARN_ON(1);
>> +		return -EFAULT;
>> +	}
>> +	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
>> +		return -EINVAL;
>> +
>> +	/* for now, only handle OP_OR */
>> +	switch (map_op) {
>> +	case BPF_MAP_ARRAY_OP_OR:
>> +		break;
>> +	case BPF_MAP_ARRAY_OP_UNSPEC:
>> +	case BPF_MAP_ARRAY_OP_AND:
>> +	case BPF_MAP_ARRAY_OP_XOR:
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	synchronize_rcu();
>> +
>> +	for (i = 0; i < array->n_entries; i++) {
>> +		handle = (struct map_landlock_handle *)
>> +			(array->value + array->elem_size * i);
>> +
>> +		/* protected by the proto types, should not happen */
>> +		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +		if (unlikely(!handle->css)) {
>> +			WARN_ON(1);
>> +			return -EFAULT;
>> +		}
>> +
>> +		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
>> +			cg1 = handle->css->cgroup;
>> +			cg2 = task_css_set(current)->dfl_cgrp;
>> +		} else {
>> +			cg1 = task_css_set(current)->dfl_cgrp;
>> +			cg2 = handle->css->cgroup;
>> +		}
>> +
>> +		if (cgroup_is_descendant(cg1, cg2))
>> +			return 0;
>> +	}
>> +	return 1;
>> +}
>
> - please take a look at existing bpf_current_task_under_cgroup and
> reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> is nothing but duplication of the code.

Oh, I didn't know about this patchset and the new helper. Indeed, it looks a lot like mine except there is no static verification of the map type as I did with the arraymap of handles, and no batch mode either. I think the return value of bpf_current_task_under_cgroup is error-prone if an eBPF program does an "if(ret)" test on the value (because of the negative ERRNO return value). Inverting the 0 and 1 return values should fix this (0 == succeed, 1 == failed, <0 == error).

To sum up, there are four related patchsets:
* "Landlock LSM: Unprivileged sandboxing" (this series)
* "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
* "Networking cgroup controller" (Anoop Naravaram)
* "Add eBPF hooks for cgroups" (Daniel Mack)

The three other series (Sargun's, Anoop's and Daniel's) are mainly focused on network access-control via cgroup for *containers*. As far as I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's goal is to empower all processes (privileged or not) to create their own sandbox. This also means, as explained in "[RFC v2 00/10] Landlock LSM: Unprivileged sandboxing", that there are more constraints. For example, it is not acceptable to let a process probe the kernel memory as it wishes. More details are in the
[PATCH v5 1/7] selftests: Make test_harness.h more generally available
The seccomp/test_harness.h file contains useful helpers to build tests. Moving it to the selftest directory should benefit other test components. Keep seccomp maintainers for this file. Changes since v1: * rename to kselftest_harness.h (suggested by Shuah Khan) * keep maintainers Signed-off-by: Mickaël Salaün Acked-by: Kees Cook Acked-by: Will Drewry Cc: Andy Lutomirski Cc: Shuah Khan Link: https://lkml.kernel.org/r/CAGXu5j+8CVz8vL51DRYXqOY=xc3zuKFf=ptene88xyhzfyi...@mail.gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} | 0 tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +- 3 files changed, 2 insertions(+), 1 deletion(-) rename tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index f7d568b8f133..ef292b8c771d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11492,6 +11492,7 @@ F: kernel/seccomp.c F: include/uapi/linux/seccomp.h F: include/linux/seccomp.h F: tools/testing/selftests/seccomp/* +F: tools/testing/selftests/kselftest_harness.h K: \bsecure_computing K: \bTIF_SECCOMP\b diff --git a/tools/testing/selftests/seccomp/test_harness.h b/tools/testing/selftests/kselftest_harness.h similarity index 100% rename from tools/testing/selftests/seccomp/test_harness.h rename to tools/testing/selftests/kselftest_harness.h diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 03f1fa495d74..7ba94efb24fd 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -37,7 +37,7 @@ #include #include -#include "test_harness.h" +#include "../kselftest_harness.h" #ifndef PR_SET_PTRACER # define PR_SET_PTRACER 0x59616d61 -- 2.11.0
[PATCH v5 3/7] selftests/seccomp: Force rebuild according to dependencies
Rebuild the seccomp tests when kselftest_harness.h is updated. Signed-off-by: Mickaël Salaün Acked-by: Kees Cook Cc: Andy Lutomirski Cc: Shuah Khan Cc: Will Drewry --- tools/testing/selftests/seccomp/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/seccomp/Makefile b/tools/testing/selftests/seccomp/Makefile index 5fa6fd2246b1..aeb0c805f3ca 100644 --- a/tools/testing/selftests/seccomp/Makefile +++ b/tools/testing/selftests/seccomp/Makefile @@ -4,3 +4,5 @@ LDFLAGS += -lpthread include ../lib.mk +$(TEST_GEN_PROGS): seccomp_bpf.c ../kselftest_harness.h + $(CC) $(CFLAGS) $(LDFLAGS) $< -o $@ -- 2.11.0
[PATCH v5 7/7] Documentation/dev-tools: Add kselftest_harness documentation
Add ReST metadata to kselftest_harness.h to be able to include the comments in the Sphinx documentation. Changes since v4: * exclude the TEST_API() changes (requested by Kees Cook) Changes since v3: * document macros as actual functions (suggested by Jonathan Corbet) * remove the TEST_API() wrapper to expose the underlying macro arguments to the documentation tools * move and cleanup comments Changes since v2: * add reference to the full documentation in the header file (suggested by Kees Cook) Signed-off-by: Mickaël Salaün Cc: Andy Lutomirski Cc: Jonathan Corbet Cc: Kees Cook Cc: Shuah Khan Cc: Will Drewry --- Documentation/dev-tools/kselftest.rst | 34 +++ tools/testing/selftests/kselftest_harness.h | 415 ++-- 2 files changed, 364 insertions(+), 85 deletions(-) diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 9232ce94612c..a92fa181b6cf 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -120,3 +120,37 @@ Contributing new tests (details) executable which is not tested by default. TEST_FILES, TEST_GEN_FILES mean it is the file which is used by test. + +Test Harness + + +The kselftest_harness.h file contains useful helpers to build tests. The tests +from tools/testing/selftests/seccomp/seccomp_bpf.c can be used as example. + +Example +--- + +.. kernel-doc:: tools/testing/selftests/kselftest_harness.h +:doc: example + + +Helpers +--- + +.. kernel-doc:: tools/testing/selftests/kselftest_harness.h +:functions: TH_LOG TEST TEST_SIGNAL FIXTURE FIXTURE_DATA FIXTURE_SETUP +FIXTURE_TEARDOWN TEST_F TEST_HARNESS_MAIN + +Operators +- + +.. kernel-doc:: tools/testing/selftests/kselftest_harness.h +:doc: operators + +.. 
kernel-doc:: tools/testing/selftests/kselftest_harness.h +:functions: ASSERT_EQ ASSERT_NE ASSERT_LT ASSERT_LE ASSERT_GT ASSERT_GE +ASSERT_NULL ASSERT_TRUE ASSERT_NULL ASSERT_TRUE ASSERT_FALSE +ASSERT_STREQ ASSERT_STRNE EXPECT_EQ EXPECT_NE EXPECT_LT +EXPECT_LE EXPECT_GT EXPECT_GE EXPECT_NULL EXPECT_TRUE +EXPECT_FALSE EXPECT_STREQ EXPECT_STRNE + diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 45f807ce37e1..c56f72e07cd7 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -4,41 +4,49 @@ * * kselftest_harness.h: simple C unit test helper. * - * Usage: - * #include "../kselftest_harness.h" - * TEST(standalone_test) { - * do_some_stuff; - * EXPECT_GT(10, stuff) { - *stuff_state_t state; - *enumerate_stuff_state(&state); - *TH_LOG("expectation failed with state: %s", state.msg); - * } - * more_stuff; - * ASSERT_NE(some_stuff, NULL) TH_LOG("how did it happen?!"); - * last_stuff; - * EXPECT_EQ(0, last_stuff); - * } - * - * FIXTURE(my_fixture) { - * mytype_t *data; - * int awesomeness_level; - * }; - * FIXTURE_SETUP(my_fixture) { - * self->data = mytype_new(); - * ASSERT_NE(NULL, self->data); - * } - * FIXTURE_TEARDOWN(my_fixture) { - * mytype_free(self->data); - * } - * TEST_F(my_fixture, data_is_good) { - * EXPECT_EQ(1, is_my_data_good(self->data)); - * } - * - * TEST_HARNESS_MAIN + * See documentation in Documentation/dev-tools/kselftest.rst * * API inspired by code.google.com/p/googletest */ +/** + * DOC: example + * + * .. 
code-block:: c + * + *#include "../kselftest_harness.h" + * + *TEST(standalone_test) { + * do_some_stuff; + * EXPECT_GT(10, stuff) { + * stuff_state_t state; + * enumerate_stuff_state(&state); + * TH_LOG("expectation failed with state: %s", state.msg); + * } + * more_stuff; + * ASSERT_NE(some_stuff, NULL) TH_LOG("how did it happen?!"); + * last_stuff; + * EXPECT_EQ(0, last_stuff); + *} + * + *FIXTURE(my_fixture) { + * mytype_t *data; + * int awesomeness_level; + *}; + *FIXTURE_SETUP(my_fixture) { + * self->data = mytype_new(); + * ASSERT_NE(NULL, self->data); + *} + *FIXTURE_TEARDOWN(my_fixture) { + * mytype_free(self->data); + *} + *TEST_F(my_fixture, data_is_good) { + * EXPECT_EQ(1, is_my_data_good(self->data)); + *} + * + *TEST_HARNESS_MAIN + */ + #ifndef __KSELFTEST_HARNESS_H #define __KSELFTEST_HARNESS_H @@ -61,10 +69,20 @@ # define TH_LOG_ENABLED 1 #endif -/* TH_LOG(format, ...) +/** + * TH_LOG(fmt, ...) + * + * @fmt: format string + * @...: optional arguments + * + * .. code-block:: c + * + * TH_LOG(format, ...) + * * Optional debug logging function available for use in tests. * Logging may be enabled or disabled by defining TH_LOG_
[PATCH v5 6/7] selftests: Remove the TEST_API() wrapper from kselftest_harness.h
Remove the TEST_API() wrapper to expose the underlying macro arguments to the documentation tools. Use "git diff --patience" to get a more readable patch. Changes since v4: * standalone patch to ease the review (requested by Kees Cook) Signed-off-by: Mickaël Salaün Cc: Andy Lutomirski Cc: Jonathan Corbet Cc: Kees Cook Cc: Shuah Khan Cc: Will Drewry --- tools/testing/selftests/kselftest_harness.h | 349 1 file changed, 147 insertions(+), 202 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 171e70aead9c..45f807ce37e1 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -51,147 +51,6 @@ #include #include -/* All exported functionality should be declared through this macro. */ -#define TEST_API(x) _##x - -/* - * Exported APIs - */ - -/* TEST(name) { implementation } - * Defines a test by name. - * Names must be unique and tests must not be run in parallel. The - * implementation containing block is a function and scoping should be treated - * as such. Returning early may be performed with a bare "return;" statement. - * - * EXPECT_* and ASSERT_* are valid in a TEST() { } context. - */ -#define TEST TEST_API(TEST) - -/* TEST_SIGNAL(name, signal) { implementation } - * Defines a test by name and the expected term signal. - * Names must be unique and tests must not be run in parallel. The - * implementation containing block is a function and scoping should be treated - * as such. Returning early may be performed with a bare "return;" statement. - * - * EXPECT_* and ASSERT_* are valid in a TEST() { } context. - */ -#define TEST_SIGNAL TEST_API(TEST_SIGNAL) - -/* FIXTURE(datatype name) { - * type property1; - * ... - * }; - * Defines the data provided to TEST_F()-defined tests as |self|. It should be - * populated and cleaned up using FIXTURE_SETUP and FIXTURE_TEARDOWN. 
- */ -#define FIXTURE TEST_API(FIXTURE) - -/* FIXTURE_DATA(datatype name) - * This call may be used when the type of the fixture data - * is needed. In general, this should not be needed unless - * the |self| is being passed to a helper directly. - */ -#define FIXTURE_DATA TEST_API(FIXTURE_DATA) - -/* FIXTURE_SETUP(fixture name) { implementation } - * Populates the required "setup" function for a fixture. An instance of the - * datatype defined with _FIXTURE_DATA will be exposed as |self| for the - * implementation. - * - * ASSERT_* are valid for use in this context and will prempt the execution - * of any dependent fixture tests. - * - * A bare "return;" statement may be used to return early. - */ -#define FIXTURE_SETUP TEST_API(FIXTURE_SETUP) - -/* FIXTURE_TEARDOWN(fixture name) { implementation } - * Populates the required "teardown" function for a fixture. An instance of the - * datatype defined with _FIXTURE_DATA will be exposed as |self| for the - * implementation to clean up. - * - * A bare "return;" statement may be used to return early. - */ -#define FIXTURE_TEARDOWN TEST_API(FIXTURE_TEARDOWN) - -/* TEST_F(fixture, name) { implementation } - * Defines a test that depends on a fixture (e.g., is part of a test case). - * Very similar to TEST() except that |self| is the setup instance of fixture's - * datatype exposed for use by the implementation. - */ -#define TEST_F TEST_API(TEST_F) - -#define TEST_F_SIGNAL TEST_API(TEST_F_SIGNAL) - -/* Use once to append a main() to the test file. E.g., - * TEST_HARNESS_MAIN - */ -#define TEST_HARNESS_MAIN TEST_API(TEST_HARNESS_MAIN) - -/* - * Operators for use in TEST and TEST_F. - * ASSERT_* calls will stop test execution immediately. - * EXPECT_* calls will emit a failure warning, note it, and continue. 
- */ - -/* ASSERT_EQ(expected, measured): expected == measured */ -#define ASSERT_EQ TEST_API(ASSERT_EQ) -/* ASSERT_NE(expected, measured): expected != measured */ -#define ASSERT_NE TEST_API(ASSERT_NE) -/* ASSERT_LT(expected, measured): expected < measured */ -#define ASSERT_LT TEST_API(ASSERT_LT) -/* ASSERT_LE(expected, measured): expected <= measured */ -#define ASSERT_LE TEST_API(ASSERT_LE) -/* ASSERT_GT(expected, measured): expected > measured */ -#define ASSERT_GT TEST_API(ASSERT_GT) -/* ASSERT_GE(expected, measured): expected >= measured */ -#define ASSERT_GE TEST_API(ASSERT_GE) -/* ASSERT_NULL(measured): NULL == measured */ -#define ASSERT_NULL TEST_API(ASSERT_NULL) -/* ASSERT_TRUE(measured): measured != 0 */ -#define ASSERT_TRUE TEST_API(ASSERT_TRUE) -/* ASSERT_FALSE(measured): measured == 0 */ -#define ASSERT_FALSE TEST_API(ASSERT_FALSE) -/* ASSERT_STREQ(expected, measured): !strcmp(expected, measured) */ -#define ASSERT_STREQ TEST_API(ASSERT_STREQ) -/* ASSERT_STRNE(expected, measured): strcmp(expected, measured) */ -#define ASSERT_STRNE TEST_API(ASSERT_STRNE) -/* EXPECT_EQ(expected, measured): expected == measure
[PATCH v5 2/7] selftests: Cosmetic renames in kselftest_harness.h
Keep the content consistent with the new name. Signed-off-by: Mickaël Salaün Acked-by: Kees Cook Cc: Andy Lutomirski Cc: Shuah Khan Cc: Will Drewry --- tools/testing/selftests/kselftest_harness.h | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index a786c69c7584..171e70aead9c 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -2,10 +2,10 @@ * Copyright (c) 2012 The Chromium OS Authors. All rights reserved. * Use of this source code is governed by the GPLv2 license. * - * test_harness.h: simple C unit test helper. + * kselftest_harness.h: simple C unit test helper. * * Usage: - * #include "test_harness.h" + * #include "../kselftest_harness.h" * TEST(standalone_test) { * do_some_stuff; * EXPECT_GT(10, stuff) { @@ -38,8 +38,9 @@ * * API inspired by code.google.com/p/googletest */ -#ifndef TEST_HARNESS_H_ -#define TEST_HARNESS_H_ + +#ifndef __KSELFTEST_HARNESS_H +#define __KSELFTEST_HARNESS_H #define _GNU_SOURCE #include @@ -532,4 +533,4 @@ static void __attribute__((constructor)) __constructor_order_first(void) __constructor_order = _CONSTRUCTOR_ORDER_FORWARD; } -#endif /* TEST_HARNESS_H_ */ +#endif /* __KSELFTEST_HARNESS_H */ -- 2.11.0
[PATCH v5 0/7] Add kselftest_harness.h
Hi,

This patch series makes the seccomp/test_harness.h more generally available [1] and updates the kselftest documentation to the Sphinx format. It also improves the Makefile of seccomp tests to take into account any kselftest_harness.h update.

[1] https://lkml.kernel.org/r/CAGXu5j+8CVz8vL51DRYXqOY=xc3zuKFf=ptene88xyhzfyi...@mail.gmail.com

Regards,

Mickaël Salaün (7):
  selftests: Make test_harness.h more generally available
  selftests: Cosmetic renames in kselftest_harness.h
  selftests/seccomp: Force rebuild according to dependencies
  Documentation/dev-tools: Add kselftest
  Documentation/dev-tools: Use reStructuredText markups for kselftest
  selftests: Remove the TEST_API() wrapper from kselftest_harness.h
  Documentation/dev-tools: Add kselftest_harness documentation

 Documentation/00-INDEX | 2 -
 Documentation/dev-tools/index.rst | 1 +
 .../{kselftest.txt => dev-tools/kselftest.rst} | 101 ++-
 MAINTAINERS | 1 +
 .../test_harness.h => kselftest_harness.h} | 691 +
 tools/testing/selftests/seccomp/Makefile | 2 +
 tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
 7 files changed, 520 insertions(+), 280 deletions(-)
 rename Documentation/{kselftest.txt => dev-tools/kselftest.rst} (52%)
 rename tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} (52%)

-- 
2.11.0
[PATCH v5 4/7] Documentation/dev-tools: Add kselftest
Move kselftest.txt to dev-tools/kselftest.rst. Signed-off-by: Mickaël Salaün Acked-by: Kees Cook Cc: Jonathan Corbet Cc: Shuah Khan --- Documentation/00-INDEX | 2 -- Documentation/{kselftest.txt => dev-tools/kselftest.rst} | 0 2 files changed, 2 deletions(-) rename Documentation/{kselftest.txt => dev-tools/kselftest.rst} (100%) diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index ed3e5e949fce..6daf51536153 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -246,8 +246,6 @@ kprobes.txt - documents the kernel probes debugging feature. kref.txt - docs on adding reference counters (krefs) to kernel objects. -kselftest.txt - - small unittests for (some) individual codepaths in the kernel. laptops/ - directory with laptop related info and laptop driver documentation. ldm.txt diff --git a/Documentation/kselftest.txt b/Documentation/dev-tools/kselftest.rst similarity index 100% rename from Documentation/kselftest.txt rename to Documentation/dev-tools/kselftest.rst -- 2.11.0
[PATCH v5 5/7] Documentation/dev-tools: Use reStructuredText markups for kselftest
Include and convert kselftest to the Sphinx format. Changes since v2: * lighten the modifications (suggested by Kees Cook) Signed-off-by: Mickaël Salaün Acked-by: Kees Cook Cc: Jonathan Corbet Cc: Shuah Khan --- Documentation/dev-tools/index.rst | 1 + Documentation/dev-tools/kselftest.rst | 67 +-- 2 files changed, 41 insertions(+), 27 deletions(-) diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst index 07d881147ef3..e50054c6aeaa 100644 --- a/Documentation/dev-tools/index.rst +++ b/Documentation/dev-tools/index.rst @@ -23,6 +23,7 @@ whole; patches welcome! kmemleak kmemcheck gdb-kernel-debugging + kselftest .. only:: subproject and html diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 5bd590335839..9232ce94612c 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -1,4 +1,6 @@ +== Linux Kernel Selftests +== The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small tests to exercise individual code @@ -15,29 +17,34 @@ hotplug test is run on 2% of hotplug capable memory instead of 10%. Running the selftests (hotplug tests are run in limited mode) = -To build the tests: - $ make -C tools/testing/selftests +To build the tests:: +make -C tools/testing/selftests -To run the tests: - $ make -C tools/testing/selftests run_tests +To run the tests:: -To build and run the tests with a single command, use: - $ make kselftest +make -C tools/testing/selftests run_tests -- note that some tests will require root privileges. +To build and run the tests with a single command, use:: + +make kselftest + +Note that some tests will require root privileges. Running a subset of selftests - += + You can use the "TARGETS" variable on the make command line to specify single test to run, or a list of tests to run. 
-To run only tests targeted for a single subsystem: - $ make -C tools/testing/selftests TARGETS=ptrace run_tests +To run only tests targeted for a single subsystem:: -You can specify multiple tests to build and run: - $ make TARGETS="size timers" kselftest +make -C tools/testing/selftests TARGETS=ptrace run_tests + +You can specify multiple tests to build and run:: + +make TARGETS="size timers" kselftest See the top-level tools/testing/selftests/Makefile for the list of all possible targets. @@ -46,13 +53,15 @@ possible targets. Running the full range hotplug selftests -To build the hotplug tests: - $ make -C tools/testing/selftests hotplug +To build the hotplug tests:: -To run the hotplug tests: - $ make -C tools/testing/selftests run_hotplug +make -C tools/testing/selftests hotplug -- note that some tests will require root privileges. +To run the hotplug tests:: + +make -C tools/testing/selftests run_hotplug + +Note that some tests will require root privileges. Install selftests @@ -62,13 +71,15 @@ You can use kselftest_install.sh tool installs selftests in default location which is tools/testing/selftests/kselftest or a user specified location. -To install selftests in default location: - $ cd tools/testing/selftests - $ ./kselftest_install.sh +To install selftests in default location:: -To install selftests in a user specified location: - $ cd tools/testing/selftests - $ ./kselftest_install.sh install_dir +cd tools/testing/selftests +./kselftest_install.sh + +To install selftests in a user specified location:: + +cd tools/testing/selftests +./kselftest_install.sh install_dir Running installed selftests === @@ -79,8 +90,10 @@ named "run_kselftest.sh" to run the tests. You can simply do the following to run the installed Kselftests. Please note some tests will require root privileges. 
-cd kselftest -./run_kselftest.sh +:: + +cd kselftest +./run_kselftest.sh Contributing new tests == @@ -96,8 +109,8 @@ In general, the rules for selftests are * Don't cause the top-level "make run_tests" to fail if your feature is unconfigured. -Contributing new tests(details) -=== +Contributing new tests (details) + * Use TEST_GEN_XXX if such binaries or files are generated during compiling. -- 2.11.0
[PATCH net-next v1] bpf: Use the IS_FD_ARRAY() macro in map_update_elem()
Make the code more readable. Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: Daniel Borkmann --- kernel/bpf/syscall.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5bdb0cc84ad2..e24aa3241387 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -709,10 +709,7 @@ static int map_update_elem(union bpf_attr *attr) err = bpf_percpu_hash_update(map, key, value, attr->flags); } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, attr->flags); - } else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || - map->map_type == BPF_MAP_TYPE_PROG_ARRAY || - map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || - map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { + } else if (IS_FD_ARRAY(map)) { rcu_read_lock(); err = bpf_fd_array_map_update_elem(map, f.file, key, value, attr->flags); -- 2.15.1
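For readers without the tree at hand, here is a minimal sketch of what the IS_FD_ARRAY() helper is assumed to test, namely the same four map types as the open-coded condition removed by the diff above. The enum and struct below are local stand-ins written for this illustration (their numeric values are arbitrary) and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's enum bpf_map_type; values are arbitrary. */
enum stub_map_type {
    STUB_MAP_TYPE_HASH,
    STUB_MAP_TYPE_PERCPU_ARRAY,
    STUB_MAP_TYPE_PERF_EVENT_ARRAY,
    STUB_MAP_TYPE_PROG_ARRAY,
    STUB_MAP_TYPE_CGROUP_ARRAY,
    STUB_MAP_TYPE_ARRAY_OF_MAPS,
};

/* Local stand-in for struct bpf_map: only the field the macro reads. */
struct stub_map {
    enum stub_map_type map_type;
};

/* Assumed shape of the helper: true for every map whose values are
 * file descriptors resolved by the kernel at update time. */
#define STUB_IS_FD_ARRAY(map) \
    ((map)->map_type == STUB_MAP_TYPE_PERF_EVENT_ARRAY || \
     (map)->map_type == STUB_MAP_TYPE_PROG_ARRAY || \
     (map)->map_type == STUB_MAP_TYPE_CGROUP_ARRAY || \
     (map)->map_type == STUB_MAP_TYPE_ARRAY_OF_MAPS)

/* Mirrors the branch selection in map_update_elem() after the patch. */
static bool uses_fd_array_update(const struct stub_map *map)
{
    return STUB_IS_FD_ARRAY(map);
}
```

Keeping the list of FD-backed map types in one macro means a future map type only has to be added in one place instead of in every branch that needs the same test.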
[PATCH net-next v1] samples/bpf: Partially fixes the bpf.o build
Do not build lib/bpf/bpf.o with this Makefile but use the one from the library directory. This avoids making a buggy bpf.o file (e.g. missing symbols). This patch is useful if some code (e.g. Landlock tests) needs both the bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf). Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: Daniel Borkmann --- This is not a complete fix because the call to multi_depend with $(host-cmulti) from scripts/Makefile.host forces the build of bpf.o anyway. I'm not sure how to completely avoid this automatic build though. --- samples/bpf/Makefile | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 7f61a3d57fa7..64335bb94f9f 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -201,13 +201,16 @@ CLANG_ARCH_ARGS = -target $(ARCH) endif # Trick to allow make to be run from this directory -all: +all: $(LIBBPF) $(MAKE) -C ../../ $(CURDIR)/ clean: $(MAKE) -C ../../ M=$(CURDIR) clean @rm -f *~ +$(LIBBPF): FORCE + $(MAKE) -C $(dir $@) $(notdir $@) + $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c $(call if_changed_dep,cc_s_c) -- 2.15.1
Re: [PATCH v1] samples/bpf: Add a .gitignore for binaries
On 13/02/2017 02:43, David Ahern wrote:
> On 2/12/17 2:23 PM, Mickaël Salaün wrote:
>> diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
>> new file mode 100644
>> index ..a7562a5ef4c2
>> --- /dev/null
>> +++ b/samples/bpf/.gitignore
>> @@ -0,0 +1,32 @@
>> +fds_example
>> +lathist
>
> ...
>
> Listing each target is going to be a PITA to maintain. It would be
> better to put targets into a build directory (bin?) and ignore the
> directory.
>

That would require a lot of modifications to the Makefile and add complexity. It seems much simpler for everyone to stick to a plain .gitignore file, which is easy to maintain:

$ awk '$1 == "hostprogs-y" { print $3 }' < Makefile > .gitignore

Alexei, Daniel, what do you think about this? Do you want me to send a v2 with the new tests?

Mickaël

signature.asc Description: OpenPGP digital signature
Re: [PATCH net-next v1] samples/bpf: Partially fixes the bpf.o build
On 26/01/2018 03:16, Alexei Starovoitov wrote:
> On Fri, Jan 26, 2018 at 01:39:30AM +0100, Mickaël Salaün wrote:
>> Do not build lib/bpf/bpf.o with this Makefile but use the one from the
>> library directory. This avoid making a buggy bpf.o file (e.g. missing
>> symbols).
>
> could you provide an example?
> What symbols will be missing?
> I don't think there is an issue with existing Makefile.

You can run these commands:

make -C samples/bpf; nm tools/lib/bpf/bpf.o > a; make -C tools/lib/bpf; nm tools/lib/bpf/bpf.o > b; diff -u a b

Symbols like bzero and sys_bpf are missing when bpf.o is built by the samples/bpf Makefile, which makes the bpf.o shrink from 25K to 7K.

>
>> This patch is useful if some code (e.g. Landlock tests) needs both the
>> bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
>
> is that some future patches?

Yes, I'll send them next week.

>
> we're trying to move everything form samples/bpf/ into selftests/bpf/
> and convert to use libbpf.a instead of obsolete bpf_load.c
> Please use this approach for landlock as well.

OK, it should be better with this lib.

>
>> Signed-off-by: Mickaël Salaün
>> Cc: Alexei Starovoitov
>> Cc: Daniel Borkmann
>> ---
>>
>> This is not a complet fix because the call to multi_depend with
>> $(host-cmulti) from scripts/Makefile.host force the build of bpf.o
>> anyway. I'm not sure how to completely avoid this automatic build
>> though.
>> ---
>> samples/bpf/Makefile | 5 -
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 7f61a3d57fa7..64335bb94f9f 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -201,13 +201,16 @@ CLANG_ARCH_ARGS = -target $(ARCH)
>> endif
>>
>> # Trick to allow make to be run from this directory
>> -all:
>> +all: $(LIBBPF)
>> $(MAKE) -C ../../ $(CURDIR)/
>>
>> clean:
>> $(MAKE) -C ../../ M=$(CURDIR) clean
>> @rm -f *~
>>
>> +$(LIBBPF): FORCE
>> +$(MAKE) -C $(dir $@) $(notdir $@)
>> +
>> $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
>> $(call if_changed_dep,cc_s_c)
>>
>> --
>> 2.15.1
>>
>
Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata
On 02/27/2018 02:23 AM, Al Viro wrote:
> On Tue, Feb 27, 2018 at 12:57:21AM +, Al Viro wrote:
>> On Tue, Feb 27, 2018 at 01:41:11AM +0100, Mickaël Salaün wrote:
>>> The function current_nameidata_security(struct inode *) can be used to
>>> retrieve a blob's pointer address tied to the inode being walk through.
>>> This enable to follow a path lookup and know where an inode access come
>>> from. This is needed for the Landlock LSM to be able to restrict access
>>> to file path.
>>>
>>> The LSM hook nameidata_free_security(struct inode *) is called before
>>> freeing the associated nameidata.
>>
>> NAK. Not without well-defined semantics and "some Linux S&M uses that for
>> something, don't ask what" does not count.
>
> Incidentally, pathwalk mechanics is subject to change at zero notice, so
> if you want something, you'd better
> * have explicitly defined semantics
> * explain what it is - on fsdevel
> * not have it hidden behind the layers of opaque LSM dreck, pardon
> the redundance.
>
> Again, pathwalk internals have changed in the past and may bloody well
> change again in the future. There's a damn good reason why struct nameidata
> is _not_ visible outside of fs/namei.c, and quietly relying upon any
> implementation details is no-go.
>

I thought this whole patch series would go to linux-fsdevel but only this patch did. I'll CC fsdevel for the next round. Meanwhile, the cover letter is here: https://lkml.org/lkml/2018/2/26/1214

The code using current_nameidata_lookup(inode) is in the patch 07/11: https://lkml.org/lkml/2018/2/26/1206

To sum up, I don't know of any way to tell whether a directory (execute) access was directly requested by a process or inferred by the kernel because of a path walk. This was not needed until now because the other access control systems (either the DAC or access controls enforced by inode-based LSMs, i.e. SELinux and Smack) do not care about the file hierarchy. Path-based access controls (i.e. AppArmor and Tomoyo) directly use the notion of path to define a security policy (in the kernel, not only in the user space configuration). Landlock can't rely on xattrs (because of composed and unprivileged access control). Because we can't know for sure which path an inode comes from (if any), path-based LSM hooks do not help for some file system checks (e.g. inode_permission). With Landlock, I try to find a way to identify a set of inodes, from the user space point of view, which is most of the time related to file hierarchies. I needed a way to "follow" a path walk, with the minimum amount of code, and if possible without touching fs/namei.c.

I saw that the pathwalk mechanism has evolved over time. With this patch, I tried to make a kernel object (nameidata) usable in some way by LSMs, but only through an inode (current_nameidata_lookup(inode)). The "only" guarantee of this function should be to identify if an inode is tied to a path walk. This makes it possible to follow a path walk and know why an inode access is requested.

I get your concern about the "instability" of the path walk mechanism. However, I thought that a path resolution should not change from the user space point of view, like other Linux ABI. Anyway, all the current inode-based access controls, including DAC, rely on this path walk mechanism. This patch does not expose anything to user space except through the Landlock API, which currently relies on path walk resolutions that are already visible to user space. Did I miss something? Do you have another suggestion to tie an inode to a path walk?

Thanks,
Mickaël
Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
On 28/02/2018 00:09, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 10:03 PM, Mickaël Salaün wrote:
>>
>> On 27/02/2018 05:36, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün wrote:
>>>> Hi,
>>>> >
>>>>
>>>> ## Why use the seccomp(2) syscall?
>>>>
>>>> Landlock use the same semantic as seccomp to apply access rule
>>>> restrictions. It add a new layer of security for the current process
>>>> which is inherited by its children. It makes sense to use an unique
>>>> access-restricting syscall (that should be allowed by seccomp filters)
>>>> which can only drop privileges. Moreover, a Landlock rule could come
>>>> from outside a process (e.g. passed through a UNIX socket). It is then
>>>> useful to differentiate the creation/load of Landlock eBPF programs via
>>>> bpf(2), from rule enforcement via seccomp(2).
>>>
>>> This seems like a weak argument to me. Sure, this is a bit different
>>> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
>>> awkward, but surely the bpf() multiplexer is even less applicable.
>>
>> I think using the seccomp syscall is fine, and everyone agreed on it.
>>
>
> Ah, sorry, I completely misread what you wrote. My apologies. You
> can disregard most of my email.
>
>>
>>>
>>> Also, looking forward, I think you're going to want a bunch of the
>>> stuff that's under consideration as new seccomp features. Tycho is
>>> working on a "user notifier" feature for seccomp where, in addition to
>>> accepting, rejecting, or kicking to ptrace, you can send a message to
>>> the creator of the filter and wait for a reply. I think that Landlock
>>> will want exactly the same feature.
>>
>> I don't think why this may be useful at all her. Landlock does not
>> filter at the syscall level but handles kernel object and actions as
>> does an LSM. That is the whole purpose of Landlock.
>
> Suppose I'm writing a container manager. I want to run "mount" in the
> container, but I don't want to allow moun() in general and I want to
> emulate certain mount() actions. I can write a filter that catches
> mount using seccomp and calls out to the container manager for help.
> This isn't theoretical -- Tycho wants *exactly* this use case to be
> supported.

Well, I think this use case should be handled with something like LD_PRELOAD and a helper library. FYI, I did something like this: https://github.com/stemjail/stemshim

Otherwise, we should think about enabling a process to (dynamically) extend/patch the vDSO (similar to LD_PRELOAD but at the syscall level, and working with static binaries) for a subset of processes (the same way seccomp filters are inherited). It may be more powerful and flexible than extending the kernel/seccomp to patch (buggy?) userland.

>
> But using seccomp for this is indeed annoying. It would be nice to
> use Landlock's ability to filter based on the filesystem type, for
> example. So Tycho could write a Landlock rule like:
>
> bool filter_mount(...)
> {
>     if (path needs emulation)
>         call_user_notifier();
> }
>
> And it should work.
>
> This means that, if both seccomp user notifiers and Landlock make it
> upstream, then there should probably be a way to have a user notifier
> bound to a seccomp filter and a set of landlock filters.
>

Using seccomp filters and Landlock programs may be powerful. However, for this use case, I think a *post-syscall* vDSO-like mechanism (which could get some data returned by a Landlock program) may be much more flexible (with less kernel code). What is needed here is a way to know the kernel semantics (Landlock) and a way to patch userland without patching its code (vDSO-like).
Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions
On 28/02/2018 01:09, Andy Lutomirski wrote:
> On Wed, Feb 28, 2018 at 12:00 AM, Mickaël Salaün wrote:
>>
>> On 28/02/2018 00:23, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün wrote:
>>>>>
>>>>
>>>> I think you're wrong here. Any sane container trying to use Landlock
>>>> like this would also create a PID namespace. Problem solved. I still
>>>> think you should drop this patch.
>>
>> Containers is one use case, another is build-in sandboxing (e.g. for web
>> browser…) and another one is for sandbox managers (e.g. Firejail,
>> Bubblewrap, Flatpack…). In some of these use cases, especially from a
>> developer point of view, you may want/need to debug your applications
>> (without requiring to be root). For nested Landlock access-controls
>> (e.g. container + user session + web browser), it may not be allowed to
>> create a PID namespace, but you still want to have a meaningful
>> access-control.
>>
>
> The consideration should be exactly the same as for normal seccomp.
> If I'm in a container (using PID namespaces + seccomp) and a run a web
> browser, I can debug the browser.
>
> If there's a real use case for adding this type of automatic ptrace
> protection, then by all means, let's add it as a general seccomp
> feature.
>

Right, it makes sense to add this feature to seccomp filters as well. What do you think, Kees?
Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
On 06/03/2018 23:46, Tycho Andersen wrote:
> On Tue, Mar 06, 2018 at 10:33:17PM +, Andy Lutomirski wrote:
>>>> Suppose I'm writing a container manager. I want to run "mount" in the
>>>> container, but I don't want to allow moun() in general and I want to
>>>> emulate certain mount() actions. I can write a filter that catches
>>>> mount using seccomp and calls out to the container manager for help.
>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>> supported.
>>>
>>> Well, I think this use case should be handled with something like
>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>> https://github.com/stemjail/stemshim
>>
>> I doubt that will work for containers. Containers that use user
>> namespaces and, for example, setuid programs aren't going to honor
>> LD_PRELOAD.
>
> Or anything that calls syscalls directly, like go programs.

That's the reason for the vDSO-like approach. Enforcing an access control is not the issue here; patching a buggy userland (without patching its code) is, isn't it?

As far as I remember, the main problem is to handle file descriptors while "emulating" the kernel behavior. This can be done with "shim" code mapped in every process. Chrome used something like this (in a previous sandbox mechanism) as a kind of emulation (with the current seccomp-bpf). I think it should be doable to replace the (userland) emulation code with an IPC wrapper receiving file descriptors through a UNIX socket.
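As an illustration of the last point, the following is a minimal sketch of passing an open file descriptor over a UNIX socket with SCM_RIGHTS, which is the standard mechanism such an IPC wrapper could build on. The helper names send_fd() and recv_fd() are made up for this example and do not come from stemshim or from any patch in this thread.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected UNIX socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F'; /* at least one byte of regular data must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align; /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the scenario discussed here, a manager process outside the sandbox would open the file on behalf of the sandboxed process and call send_fd(); the shim inside the sandbox would call recv_fd() and return the resulting descriptor as if its own open() had succeeded.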
Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
On 02/27/2018 10:48 PM, Mickaël Salaün wrote: > > On 27/02/2018 17:39, Andy Lutomirski wrote: >> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov >> wrote: >>> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote: >>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov >>>> wrote: >>>>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote: >>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov >>>>>> wrote: >>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote: >>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock >>>>>>>> program >>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for the >>>>>>>> current task and all its future children. A program is immutable and a >>>>>>>> task can only add new restricting programs to itself, forming a list of >>>>>>>> programss. >>>>>>>> >>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a >>>>>>>> kernel >>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC, >>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of >>>>>>>> object is triggered. The list of programs for this hook is then >>>>>>>> evaluated. Each program return a 32-bit value which can deny the action >>>>>>>> on a kernel object with a non-zero value. If every programs of the list >>>>>>>> return zero, then the action on the object is allowed. >>>>>>>> >>>>>>>> Multiple Landlock programs can be chained to share a 64-bits value for >>>>>>>> a >>>>>>>> call chain (e.g. evaluating multiple elements of a file path). This >>>>>>>> chaining is restricted when a process construct this chain by loading a >>>>>>>> program, but additional checks are performed when it requests to apply >>>>>>>> this chain of programs to itself. The restrictions ensure that it is >>>>>>>> not possible to call multiple programs in a way that would imply to >>>>>>>> handle multiple shared values (i.e. cookies) for one chain. 
For now, >>>>>>>> only a fs_pick program can be chained to the same type of program, >>>>>>>> because it may make sense if they have different triggers (cf. next >>>>>>>> commits). This restrictions still allows to reuse Landlock programs in >>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple >>>>>>>> chains of fs_pick programs). >>>>>>>> >>>>>>>> Signed-off-by: Mickaël Salaün >>>>>>> >>>>>>> ... >>>>>>> >>>>>>>> +struct landlock_prog_set *landlock_prepend_prog( >>>>>>>> + struct landlock_prog_set *current_prog_set, >>>>>>>> + struct bpf_prog *prog) >>>>>>>> +{ >>>>>>>> + struct landlock_prog_set *new_prog_set = current_prog_set; >>>>>>>> + unsigned long pages; >>>>>>>> + int err; >>>>>>>> + size_t i; >>>>>>>> + struct landlock_prog_set tmp_prog_set = {}; >>>>>>>> + >>>>>>>> + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK) >>>>>>>> + return ERR_PTR(-EINVAL); >>>>>>>> + >>>>>>>> + /* validate memory size allocation */ >>>>>>>> + pages = prog->pages; >>>>>>>> + if (current_prog_set) { >>>>>>>> + size_t i; >>>>>>>> + >>>>>>>> + for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); >>>>>>>> i++) { >>>>>>>> + struct landlock_prog_list *walker_p; >>>>>>>> + >>>>>>>> + for (walker_p = current_prog_set->programs[i]; >>>>>>>> + walker_p; walker_p = &g
Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
On 04/08/2018 11:06 PM, Andy Lutomirski wrote: > On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün wrote: >> >> On 02/27/2018 10:48 PM, Mickaël Salaün wrote: >>> >>> On 27/02/2018 17:39, Andy Lutomirski wrote: >>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov >>>> wrote: >>>>> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote: >>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov >>>>>> wrote: >>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote: >>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov >>>>>>>> wrote: >>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote: >>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock >>>>>>>>>> program >>>>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for >>>>>>>>>> the >>>>>>>>>> current task and all its future children. A program is immutable and >>>>>>>>>> a >>>>>>>>>> task can only add new restricting programs to itself, forming a list >>>>>>>>>> of >>>>>>>>>> programss. >>>>>>>>>> >>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a >>>>>>>>>> kernel >>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC, >>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind >>>>>>>>>> of >>>>>>>>>> object is triggered. The list of programs for this hook is then >>>>>>>>>> evaluated. Each program return a 32-bit value which can deny the >>>>>>>>>> action >>>>>>>>>> on a kernel object with a non-zero value. If every programs of the >>>>>>>>>> list >>>>>>>>>> return zero, then the action on the object is allowed. >>>>>>>>>> >>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bits value >>>>>>>>>> for a >>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path). 
This >>>>>>>>>> chaining is restricted when a process construct this chain by >>>>>>>>>> loading a >>>>>>>>>> program, but additional checks are performed when it requests to >>>>>>>>>> apply >>>>>>>>>> this chain of programs to itself. The restrictions ensure that it is >>>>>>>>>> not possible to call multiple programs in a way that would imply to >>>>>>>>>> handle multiple shared values (i.e. cookies) for one chain. For now, >>>>>>>>>> only a fs_pick program can be chained to the same type of program, >>>>>>>>>> because it may make sense if they have different triggers (cf. next >>>>>>>>>> commits). This restrictions still allows to reuse Landlock programs >>>>>>>>>> in >>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple >>>>>>>>>> chains of fs_pick programs). >>>>>>>>>> >>>>>>>>>> Signed-off-by: Mickaël Salaün >>>>>>>>> >>>>>>>>> ... >>>>>>>>> >>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog( >>>>>>>>>> + struct landlock_prog_set *current_prog_set, >>>>>>>>>> + struct bpf_prog *prog) >>>>>>>>>> +{ >>>>>>>>>> + struct landlock_prog_set *new_prog_set = current_prog_set; >>>>>>>>>> + unsigned long pages; >>>>>>>>>> + int err; >>>>>>>>>> + size_t i; >>>>>>>>>> + struct landlock_prog_set tmp_prog_set = {}; >>>>>>>>>> + >>>>>>>>&g
Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy
On 04/10/2018 06:48 AM, Alexei Starovoitov wrote: > On Mon, Apr 09, 2018 at 12:01:59AM +0200, Mickaël Salaün wrote: >> >> On 04/08/2018 11:06 PM, Andy Lutomirski wrote: >>> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün wrote: >>>> >>>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote: >>>>> >>>>> On 27/02/2018 17:39, Andy Lutomirski wrote: >>>>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov >>>>>> wrote: >>>>>>> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote: >>>>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov >>>>>>>> wrote: >>>>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote: >>>>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov >>>>>>>>>> wrote: >>>>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote: >>>>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock >>>>>>>>>>>> program >>>>>>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for >>>>>>>>>>>> the >>>>>>>>>>>> current task and all its future children. A program is immutable >>>>>>>>>>>> and a >>>>>>>>>>>> task can only add new restricting programs to itself, forming a >>>>>>>>>>>> list of >>>>>>>>>>>> programss. >>>>>>>>>>>> >>>>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a >>>>>>>>>>>> kernel >>>>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC, >>>>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this >>>>>>>>>>>> kind of >>>>>>>>>>>> object is triggered. The list of programs for this hook is then >>>>>>>>>>>> evaluated. Each program return a 32-bit value which can deny the >>>>>>>>>>>> action >>>>>>>>>>>> on a kernel object with a non-zero value. If every programs of the >>>>>>>>>>>> list >>>>>>>>>>>> return zero, then the action on the object is allowed. >>>>>>>>>>>> >>>>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bits value >>>>>>>>>>>> for a >>>>>>>>>>>> call chain (e.g. 
evaluating multiple elements of a file path). >>>>>>>>>>>> This >>>>>>>>>>>> chaining is restricted when a process construct this chain by >>>>>>>>>>>> loading a >>>>>>>>>>>> program, but additional checks are performed when it requests to >>>>>>>>>>>> apply >>>>>>>>>>>> this chain of programs to itself. The restrictions ensure that it >>>>>>>>>>>> is >>>>>>>>>>>> not possible to call multiple programs in a way that would imply to >>>>>>>>>>>> handle multiple shared values (i.e. cookies) for one chain. For >>>>>>>>>>>> now, >>>>>>>>>>>> only a fs_pick program can be chained to the same type of program, >>>>>>>>>>>> because it may make sense if they have different triggers (cf. next >>>>>>>>>>>> commits). This restrictions still allows to reuse Landlock >>>>>>>>>>>> programs in >>>>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple >>>>>>>>>>>> chains of fs_pick programs). >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Mickaël Salaün >>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>
Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing
On 07/03/2018 02:21, Andy Lutomirski wrote:
> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün wrote:
>>
>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>> On Tue, Mar 06, 2018 at 10:33:17PM +, Andy Lutomirski wrote:
>>>>>> Suppose I'm writing a container manager. I want to run "mount" in the
>>>>>> container, but I don't want to allow moun() in general and I want to
>>>>>> emulate certain mount() actions. I can write a filter that catches
>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>> supported.
>>>>>
>>>>> Well, I think this use case should be handled with something like
>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>> https://github.com/stemjail/stemshim
>>>>
>>>> I doubt that will work for containers. Containers that use user
>>>> namespaces and, for example, setuid programs aren't going to honor
>>>> LD_PRELOAD.
>>>
>>> Or anything that calls syscalls directly, like go programs.
>>
>> That's why the vDSO-like approach. Enforcing an access control is not
>> the issue here, patching a buggy userland (without patching its code) is
>> the issue isn't it?
>>
>> As far as I remember, the main problem is to handle file descriptors
>> while "emulating" the kernel behavior. This can be done with a "shim"
>> code mapped in every processes. Chrome used something like this (in a
>> previous sandbox mechanism) as a kind of emulation (with the current
>> seccomp-bpf ). I think it should be doable to replace the (userland)
>> emulation code with an IPC wrapper receiving file descriptors through
>> UNIX socket.
>>
>
> Can you explain exactly what you mean by "vDSO-like"?
>
> When a 64-bit program does a syscall, it just executes the SYSCALL
> instruction. The vDSO isn't involved at all. 32-bit programs usually
> go through the vDSO, but not always.
>
> It could be possible to force-load a DSO into an entire container and
> rig up seccomp to intercept all SYSCALLs not originating from the DSO
> such that they merely redirect control to the DSO, but that seems
> quite messy.

The vDSO is code mapped into all processes. As you said, these processes may use it or not. What I was thinking about is to use the same concept, i.e. map "shim" code into each process belonging to a particular hierarchy (the same way seccomp filters are inherited across processes). With a seccomp filter matching some syscalls (e.g. mount, open), it is possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This shim code should then be able to emulate/patch what is needed, even faking a file opening by receiving a file descriptor through a UNIX socket. As the Chrome sandbox did, the seccomp filter may look at the calling address to allow the shim code to call syscalls without being caught, if needed. However, relying on SIGSYS may not fit with arbitrary code. A new SECCOMP_RET_EMULATE (?) could be used to jump to a specific process address, to emulate the syscall in an easier way than relying only on a {c,e}BPF program.
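To make the SECCOMP_RET_TRAP part of this idea concrete, here is a minimal sketch of a classic BPF seccomp filter that traps mount() and allows every other syscall. The names mount_trap_filter and mount_trap_prog are invented for this example; it is not code from the thread, it does not inspect the calling address, and it deliberately omits the seccomp_data.arch check that a real filter should perform first.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Illustrative filter: deliver SIGSYS for mount(), allow the rest.
 * Assumption: single-architecture process; a production filter must
 * validate seccomp_data.arch before trusting the syscall number. */
static struct sock_filter mount_trap_filter[] = {
    /* A = seccomp_data.nr (the syscall number) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* if (A == __NR_mount) continue to the trap, else skip one insn */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mount, 0, 1),
    /* mount(): raise SIGSYS so a userland handler can emulate the call */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
    /* anything else: run the syscall normally */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Program descriptor in the form expected by SECCOMP_MODE_FILTER. */
static struct sock_fprog mount_trap_prog = {
    .len = sizeof(mount_trap_filter) / sizeof(mount_trap_filter[0]),
    .filter = mount_trap_filter,
};
```

An unprivileged process would typically install such a program with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) followed by prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &mount_trap_prog), after registering a SIGSYS handler with SA_SIGINFO. That handler plays the role of the shim: it can read the trapped syscall number from the siginfo and decide how to emulate it.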
[RFC PATCH v1 2/5] fs: Add a MAY_EXECMOUNT flag to infer the noexec mount property
An LSM doesn't get path information related to an access request to open an inode. This new (internal) MAY_EXECMOUNT flag enables an LSM to check if the underlying mount point of an inode is marked as executable. This is useful to implement a security policy taking advantage of the noexec mount option. This flag is set according to path_noexec(), which checks if a mount point is mounted with MNT_NOEXEC or if the underlying superblock is SB_I_NOEXEC. Signed-off-by: Mickaël Salaün Reviewed-by: Philippe Trébuchet Reviewed-by: Thibaut Sautereau Cc: Al Viro Cc: Kees Cook Cc: Mickaël Salaün --- fs/namei.c | 2 ++ include/linux/fs.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fs/namei.c b/fs/namei.c index 0cab6494978c..de4f33b3f464 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2970,6 +2970,8 @@ static int may_open(const struct path *path, int acc_mode, int flag) break; } + /* Pass the mount point executability. */ + acc_mode |= path_noexec(path) ? 0 : MAY_EXECMOUNT; error = inode_permission(inode, MAY_OPEN | acc_mode); if (error) return error; diff --git a/include/linux/fs.h b/include/linux/fs.h index 584c9329ad78..083a31b8068e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -96,6 +96,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, #define MAY_NOT_BLOCK 0x0080 /* the inode is opened with O_MAYEXEC */ #define MAY_OPENEXEC 0x0100 +/* the mount point is marked as executable */ +#define MAY_EXECMOUNT 0x0200 /* * flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond -- 2.20.0.rc2
[RFC PATCH v1 5/5] doc: Add documentation for Yama's open_mayexec_enforce
Signed-off-by: Mickaël Salaün Reviewed-by: Philippe Trébuchet Reviewed-by: Thibaut Sautereau Cc: Jonathan Corbet Cc: Kees Cook Cc: Mickaël Salaün --- Documentation/admin-guide/LSM/Yama.rst | 41 ++ 1 file changed, 41 insertions(+) diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst index d0a060de3973..a72c86a24b35 100644 --- a/Documentation/admin-guide/LSM/Yama.rst +++ b/Documentation/admin-guide/LSM/Yama.rst @@ -72,3 +72,44 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are: ``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed. The original children-only logic was based on the restrictions in grsecurity. + +open_mayexec_enforce + + +The ``O_MAYEXEC`` flag can be passed to :manpage:`open(2)` to only open files +(or directories) that are executable. If the file is not identified as +executable, then the syscall returns -EACCES. This may allow a script +interpreter to check executable permission before reading commands from a file. +One interesting use case is to enforce a "write xor execute" policy through +interpreters. + +Thanks to this flag, Yama enables to enforce the ``noexec`` mount option (i.e. +the underlying mount point of the file is mounted with MNT_NOEXEC or its +underlying superblock is SB_I_NOEXEC) not only on ELF binaries but also on +scripts. This may be possible thanks to script interpreters using the +``O_MAYEXEC`` flag. The executable permission is then checked before reading +commands from a file, and thus can enforce the ``noexec`` at the interpreter +level by propagating this security policy to the scripts. To be fully +effective, these interpreters also need to handle the other ways to execute +code (for which the kernel can't help): command line parameters (e.g., option +``-e`` for Perl), module loading (e.g., option ``-m`` for Python), stdin, file +sourcing, environment variables, configuration files... 
According to the +threat model, it may be acceptable to allow some script interpreters (e.g. +Bash) to interpret commands from stdin, may it be a TTY or a pipe, because it +may not be enough to (directly) perform syscalls. + +Yama implements two complementary security policies to propagate the ``noexec`` +mount option or the executable file permission. These policies are handled by +the ``kernel.yama.open_mayexec_enforce`` sysctl (writable only with +``CAP_MAC_ADMIN``) as a bitmask: + +1 - mount restriction: +check that the mount options for the underlying VFS mount do not prevent +execution. + +2 - file permission restriction: +check that the to-be-opened file is marked as executable for the current +process (e.g., POSIX permissions). + +Code samples can be found in tools/testing/selftests/yama/test_omayexec.c and +https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC . -- 2.20.0.rc2
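The two policy bits documented above combine as a plain bitmask. The following is a minimal userspace sketch of that decision, not code from this series: the function name, the ENFORCE_* macros and the two boolean inputs are made up for illustration, and the real check runs in the kernel on an inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values as listed in the documentation above (1 and 2). */
#define ENFORCE_MOUNT (1 << 0) /* "1 - mount restriction" */
#define ENFORCE_FILE  (1 << 1) /* "2 - file permission restriction" */
#define ENFORCE_MASK  (ENFORCE_MOUNT | ENFORCE_FILE)

/*
 * Hypothetical model, not kernel code. Returns 0 when an open with
 * O_MAYEXEC would be allowed and -EACCES when it would be denied.
 *   enforce:      value of the sysctl bitmask (0 to 3)
 *   mount_noexec: the file lives on a mount that prevents execution
 *   file_is_exec: the file is marked executable for the calling process
 */
int model_mayexec_decision(int enforce, bool mount_noexec, bool file_is_exec)
{
	/* Only bits 0 and 1 are defined by the documentation. */
	assert((enforce & ~ENFORCE_MASK) == 0);

	if ((enforce & ENFORCE_MOUNT) && mount_noexec)
		return -EACCES;
	if ((enforce & ENFORCE_FILE) && !file_is_exec)
		return -EACCES;
	return 0;
}
```

With the sysctl set to 3, both conditions must hold: the mount must allow execution and the file must be executable. With 0, the flag has no effect in this model.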
[RFC PATCH v1 0/5] Add support for O_MAYEXEC
Hi,

The goal of this patch series is to control script interpretation. A
new O_MAYEXEC flag for sys_open() is added so that userland script
interpreters can delegate to the kernel (and thus to the system security
policy) the permission to interpret scripts or other files containing
what can be seen as commands.

The security policy is the responsibility of an LSM. A basic
system-wide policy is implemented with Yama and is configurable through
a sysctl.

The initial idea comes from CLIP OS and the original implementation has
been used for more than 10 years:
https://github.com/clipos-archive/clipos4_doc

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s

This patch series can be applied on top of v4.20-rc6. It can be
tested with CONFIG_SECURITY_YAMA. I would really appreciate
constructive comments on this RFC.

Regards,

Mickaël Salaün (5):
  fs: Add support for an O_MAYEXEC flag on sys_open()
  fs: Add a MAY_EXECMOUNT flag to infer the noexec mount property
  Yama: Enforces noexec mounts or file executability through O_MAYEXEC
  selftest/yama: Add tests for O_MAYEXEC enforcing
  doc: Add documentation for Yama's open_mayexec_enforce

 Documentation/admin-guide/LSM/Yama.rst       |  41 +++
 MAINTAINERS                                  |   1 +
 fs/fcntl.c                                   |   2 +-
 fs/namei.c                                   |   2 +
 fs/open.c                                    |   4 +
 include/linux/fcntl.h                        |   2 +-
 include/linux/fs.h                           |   4 +
 include/uapi/asm-generic/fcntl.h             |   3 +
 security/yama/Kconfig                        |   3 +-
 security/yama/yama_lsm.c                     |  82 +-
 tools/testing/selftests/Makefile             |   1 +
 tools/testing/selftests/yama/.gitignore      |   1 +
 tools/testing/selftests/yama/Makefile        |  19 ++
 tools/testing/selftests/yama/config          |   2 +
 tools/testing/selftests/yama/test_omayexec.c | 276 +++
 15 files changed, 439 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/yama/.gitignore
 create mode 100644 tools/testing/selftests/yama/Makefile
 create mode 100644 tools/testing/selftests/yama/config
 create mode 100644 tools/testing/selftests/yama/test_omayexec.c

--
2.20.0.rc2
[RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
When the O_MAYEXEC flag is passed, sys_open() may be subject to
additional restrictions depending on a security policy implemented by an
LSM through the inode_permission hook.

The underlying idea is to be able to restrict script interpretation
according to a policy defined by the system administrator. For this to
be possible, script interpreters must use the O_MAYEXEC flag
appropriately. To be fully effective, these interpreters also need to
handle the other ways to execute code (for which the kernel can't help):
command line parameters (e.g., option -e for Perl), module loading
(e.g., option -m for Python), stdin, file sourcing, environment
variables, configuration files... Depending on the threat model, it may
be acceptable to allow some script interpreters (e.g. Bash) to interpret
commands from stdin, be it a TTY or a pipe, because it may not be
enough to (directly) perform syscalls.

A simple security policy implementation is available in a following
patch for Yama.

This is an updated subset of the patch initially written by Vincent
Strubel for CLIP OS:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than 10 years with customized script
interpreters.
Some examples can be found here: https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC Signed-off-by: Mickaël Salaün Signed-off-by: Thibaut Sautereau Signed-off-by: Vincent Strubel Reviewed-by: Philippe Trébuchet Cc: Al Viro Cc: Kees Cook Cc: Mickaël Salaün --- fs/fcntl.c | 2 +- fs/open.c| 4 include/linux/fcntl.h| 2 +- include/linux/fs.h | 2 ++ include/uapi/asm-generic/fcntl.h | 3 +++ 5 files changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index 083185174c6d..6c85c4d0c006 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -1031,7 +1031,7 @@ static int __init fcntl_init(void) * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY * is defined as O_NONBLOCK on some platforms and not on others. */ - BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ != + BUILD_BUG_ON(22 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32( (VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) | __FMODE_EXEC | __FMODE_NONOTIFY)); diff --git a/fs/open.c b/fs/open.c index 0285ce7dbd51..75479b79a58f 100644 --- a/fs/open.c +++ b/fs/open.c @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o if (flags & O_APPEND) acc_mode |= MAY_APPEND; + /* Check execution permissions on open. */ + if (flags & O_MAYEXEC) + acc_mode |= MAY_OPENEXEC; + op->acc_mode = acc_mode; op->intent = flags & O_PATH ? 
0 : LOOKUP_OPEN; diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index 27dc7a60693e..1fc00cabe9ab 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -9,7 +9,7 @@ (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \ O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ -O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_MAYEXEC) #ifndef force_o_largefile #define force_o_largefile() (BITS_PER_LONG != 32) diff --git a/include/linux/fs.h b/include/linux/fs.h index c95c0807471f..584c9329ad78 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -94,6 +94,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, #define MAY_CHDIR 0x0040 /* called from RCU mode, don't block */ #define MAY_NOT_BLOCK 0x0080 +/* the inode is opened with O_MAYEXEC */ +#define MAY_OPENEXEC 0x0100 /* * flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index 9dc0bf0c5a6e..cbb9425d6e7c 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -97,6 +97,9 @@ #define O_NDELAY O_NONBLOCK #endif +/* command execution from file is intended, check exec permissions */ +#define O_MAYEXEC 04000 + #define F_DUPFD0 /* dup */ #define F_GETFD1 /* get close_on_exec */ #define F_SETFD2 /* set/clear close_on_exec */ -- 2.20.0.rc2
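On the interpreter side, using the flag could look roughly like the sketch below. This is an illustration under assumptions, not code from the series: open_script() and run_script() are hypothetical names, and the flag is passed in as a parameter (mayexec_flag) rather than hardcoded, so the sketch also builds against headers that do not define O_MAYEXEC (pass 0 in that case).

```c
#define _GNU_SOURCE /* for O_CLOEXEC with strict compiler modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper for a script interpreter. mayexec_flag stands for
 * O_MAYEXEC from the patched <fcntl.h>; pass 0 when it is unavailable.
 * Returns a file descriptor, or -1 with errno set (EACCES when the
 * security policy refuses to let the file be interpreted).
 */
int open_script(const char *path, int mayexec_flag)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_CLOEXEC | mayexec_flag);
}

/* Returns 0 if the script could be opened for interpretation, -1 otherwise. */
int run_script(const char *path, int mayexec_flag)
{
	int fd = open_script(path, mayexec_flag);

	if (fd < 0) {
		fprintf(stderr, "refusing to interpret %s: %s\n",
			path, strerror(errno));
		return -1;
	}
	/* A real interpreter would read and execute commands from fd here. */
	close(fd);
	return 0;
}
```

The point of the design is that the permission check and the open happen in one call, so the interpreter only has to handle one more error case on a path it already has.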
[RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC
Enable to either propagate the mount options from the underlying VFS mount to prevent execution, or to propagate the file execute permission. This may allow a script interpreter to check execution permissions before reading commands from a file. The main goal is to be able to protect the kernel by restricting arbitrary syscalls that an attacker could perform with a crafted binary or certain script languages. It also improves multilevel isolation by reducing the ability of an attacker to use side channels with specific code. These restrictions can natively be enforced for ELF binaries (with the noexec mount option) but require this kernel extension to properly handle scripts (e.g., Python, Perl). Add a new sysctl kernel.yama.open_mayexec_enforce to control this behavior. A following patch adds documentation. Signed-off-by: Mickaël Salaün Reviewed-by: Philippe Trébuchet Reviewed-by: Thibaut Sautereau Cc: Kees Cook Cc: Mickaël Salaün --- security/yama/Kconfig| 3 +- security/yama/yama_lsm.c | 82 +++- 2 files changed, 83 insertions(+), 2 deletions(-) diff --git a/security/yama/Kconfig b/security/yama/Kconfig index 96b27405558a..9457619fabd5 100644 --- a/security/yama/Kconfig +++ b/security/yama/Kconfig @@ -5,7 +5,8 @@ config SECURITY_YAMA help This selects Yama, which extends DAC support with additional system-wide security settings beyond regular Linux discretionary - access controls. Currently available is ptrace scope restriction. + access controls. Currently available are ptrace scope restriction and + enforcement of the O_MAYEXEC open flag. Like capabilities, this security module stacks with other LSMs. Further information can be found in Documentation/admin-guide/LSM/Yama.rst. 
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c index ffda91a4a1aa..120664e94ee5 100644 --- a/security/yama/yama_lsm.c +++ b/security/yama/yama_lsm.c @@ -1,10 +1,12 @@ /* * Yama Linux Security Module * - * Author: Kees Cook + * Authors: Kees Cook + * Mickaël Salaün * * Copyright (C) 2010 Canonical, Ltd. * Copyright (C) 2011 The Chromium OS Authors. + * Copyright (C) 2018 ANSSI * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2, as @@ -28,7 +30,14 @@ #define YAMA_SCOPE_CAPABILITY 2 #define YAMA_SCOPE_NO_ATTACH 3 +#define YAMA_OMAYEXEC_ENFORCE_NONE 0 +#define YAMA_OMAYEXEC_ENFORCE_MOUNT(1 << 0) +#define YAMA_OMAYEXEC_ENFORCE_FILE (1 << 1) +#define _YAMA_OMAYEXEC_LASTYAMA_OMAYEXEC_ENFORCE_FILE +#define _YAMA_OMAYEXEC_MASK((_YAMA_OMAYEXEC_LAST << 1) - 1) + static int ptrace_scope = YAMA_SCOPE_RELATIONAL; +static int open_mayexec_enforce = YAMA_OMAYEXEC_ENFORCE_NONE; /* describe a ptrace relationship for potential exception */ struct ptrace_relation { @@ -423,7 +432,40 @@ int yama_ptrace_traceme(struct task_struct *parent) return rc; } +/** + * yama_inode_permission - check O_MAYEXEC permission before accessing an inode + * @inode: inode structure to check + * @mask: permission mask + * + * Return 0 if access is permitted, -EACCES otherwise. + */ +int yama_inode_permission(struct inode *inode, int mask) +{ + if (!(mask & MAY_OPENEXEC)) + return 0; + /* +* Match regular files and directories to make it easier to +* modify script interpreters. +*/ + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + return 0; + + if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) && + !(mask & MAY_EXECMOUNT)) + return -EACCES; + + /* +* May prefer acl_permission_check() instead of generic_permission(), +* to not be bypassable with CAP_DAC_READ_SEARCH. 
+*/ + if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE) + return generic_permission(inode, MAY_EXEC); + + return 0; +} + static struct security_hook_list yama_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(inode_permission, yama_inode_permission), LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check), LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme), LSM_HOOK_INIT(task_prctl, yama_task_prctl), @@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table *table, int write, return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos); } +static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + int error; + + if (write) { + struct ctl_table table_copy; + int tmp_m
[RFC PATCH v1 4/5] selftest/yama: Add tests for O_MAYEXEC enforcing
Test propagation of noexec mount points or file executability through files open with or without O_MAYEXEC. Signed-off-by: Mickaël Salaün Cc: Kees Cook Cc: Mickaël Salaün Cc: Shuah Khan --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/yama/.gitignore | 1 + tools/testing/selftests/yama/Makefile| 19 ++ tools/testing/selftests/yama/config | 2 + tools/testing/selftests/yama/test_omayexec.c | 276 +++ 6 files changed, 300 insertions(+) create mode 100644 tools/testing/selftests/yama/.gitignore create mode 100644 tools/testing/selftests/yama/Makefile create mode 100644 tools/testing/selftests/yama/config create mode 100644 tools/testing/selftests/yama/test_omayexec.c diff --git a/MAINTAINERS b/MAINTAINERS index 8119141a926f..a1d01a81b283 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16591,6 +16591,7 @@ M: Kees Cook T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip S: Supported F: security/yama/ +F: tools/testing/selftests/yama/ F: Documentation/admin-guide/LSM/Yama.rst YEALINK PHONE DRIVER diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index f0017c831e57..608f31167aa6 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -46,6 +46,7 @@ endif TARGETS += user TARGETS += vm TARGETS += x86 +TARGETS += yama TARGETS += zram #Please keep the TARGETS list alphabetically sorted # Run "make quicktest=1 run_tests" or diff --git a/tools/testing/selftests/yama/.gitignore b/tools/testing/selftests/yama/.gitignore new file mode 100644 index ..6e8d5cfb48d0 --- /dev/null +++ b/tools/testing/selftests/yama/.gitignore @@ -0,0 +1 @@ +/test_omayexec diff --git a/tools/testing/selftests/yama/Makefile b/tools/testing/selftests/yama/Makefile new file mode 100644 index ..d411f1615b60 --- /dev/null +++ b/tools/testing/selftests/yama/Makefile @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 + +all: + +include ../lib.mk + +.PHONY: all clean + +BINARIES := 
test_omayexec +CFLAGS += -Wl,-no-as-needed -Wall -Werror +LDFLAGS += -lcap + +test_omayexec: test_omayexec.c ../kselftest_harness.h + $(CC) $(CFLAGS) $(LDFLAGS) $< -o $@ + +TEST_PROGS += $(BINARIES) +EXTRA_CLEAN := $(BINARIES) + +all: $(BINARIES) diff --git a/tools/testing/selftests/yama/config b/tools/testing/selftests/yama/config new file mode 100644 index ..9d375bfc465b --- /dev/null +++ b/tools/testing/selftests/yama/config @@ -0,0 +1,2 @@ +CONFIG_SECURITY=y +CONFIG_SECURITY_YAMA=y diff --git a/tools/testing/selftests/yama/test_omayexec.c b/tools/testing/selftests/yama/test_omayexec.c new file mode 100644 index ..7d41097f0e89 --- /dev/null +++ b/tools/testing/selftests/yama/test_omayexec.c @@ -0,0 +1,276 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Yama tests - O_MAYEXEC + * + * Copyright © 2018 ANSSI + * + * Author: Mickaël Salaün + */ + +#include +#include /* O_CLOEXEC */ +#include +#include +#include /* strlen */ +#include +#include +#include /* mkdir */ +#include /* unlink, rmdir */ + +#include "../kselftest_harness.h" + +#ifndef O_MAYEXEC +#define O_MAYEXEC 04000 +#endif + +#define SYSCTL_MAYEXEC "/proc/sys/kernel/yama/open_mayexec_enforce" + +#define BIN_DIR"./test-mount" +#define BIN_PATH BIN_DIR "/file" +#define DIR_PATH BIN_DIR "/directory" + +#define ALLOWED1 +#define DENIED 0 + +static void test_omx(struct __test_metadata *_metadata, + const char *const path, const int exec_allowed) +{ + int fd; + + /* without O_MAYEXEC */ + fd = open(path, O_RDONLY | O_CLOEXEC); + ASSERT_NE(-1, fd); + EXPECT_FALSE(close(fd)); + + /* with O_MAYEXEC */ + fd = open(path, O_RDONLY | O_CLOEXEC | O_MAYEXEC); + if (exec_allowed) { + /* open should succeed */ + ASSERT_NE(-1, fd); + EXPECT_FALSE(close(fd)); + } else { + /* open should return EACCES */ + ASSERT_EQ(-1, fd); + ASSERT_EQ(EACCES, errno); + } +} + +static void ignore_dac(struct __test_metadata *_metadata, int override) +{ + cap_t caps; + const cap_value_t cap_val[2] = { + CAP_DAC_OVERRIDE, + 
CAP_DAC_READ_SEARCH, + }; + + caps = cap_get_proc(); + ASSERT_TRUE(!!caps); + ASSERT_FALSE(cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_val, + override ? CAP_SET : CAP_CLEAR)); + ASSERT_FALSE(cap_set_proc(caps)); + EXPECT_FALSE(cap_free(caps)); +} + +static void test_dir_file(struct __test_metadata *_metadata, + const char *const dir_path, const char *const file_path, + const int exec_a
Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC
On 12/12/2018 at 09:17, Mickaël Salaün wrote:
> Enable to either propagate the mount options from the underlying VFS
> mount to prevent execution, or to propagate the file execute permission.
> This may allow a script interpreter to check execution permissions
> before reading commands from a file.
>
> The main goal is to be able to protect the kernel by restricting
> arbitrary syscalls that an attacker could perform with a crafted binary
> or certain script languages. It also improves multilevel isolation
> by reducing the ability of an attacker to use side channels with
> specific code. These restrictions can natively be enforced for ELF
> binaries (with the noexec mount option) but require this kernel
> extension to properly handle scripts (e.g., Python, Perl).
>
> Add a new sysctl kernel.yama.open_mayexec_enforce to control this
> behavior. A following patch adds documentation.
>
> Signed-off-by: Mickaël Salaün
> Reviewed-by: Philippe Trébuchet
> Reviewed-by: Thibaut Sautereau
> Cc: Kees Cook
> Cc: Mickaël Salaün
> ---
>  security/yama/Kconfig    |  3 +-
>  security/yama/yama_lsm.c | 82 +++-
>  2 files changed, 83 insertions(+), 2 deletions(-)
>
> diff --git a/security/yama/Kconfig b/security/yama/Kconfig
> index 96b27405558a..9457619fabd5 100644
> --- a/security/yama/Kconfig
> +++ b/security/yama/Kconfig
> @@ -5,7 +5,8 @@ config SECURITY_YAMA
>  	help
>  	  This selects Yama, which extends DAC support with additional
>  	  system-wide security settings beyond regular Linux discretionary
> -	  access controls. Currently available is ptrace scope restriction.
> +	  access controls. Currently available are ptrace scope restriction and
> +	  enforcement of the O_MAYEXEC open flag.
>  	  Like capabilities, this security module stacks with other LSMs.
>  	  Further information can be found in
>  	  Documentation/admin-guide/LSM/Yama.rst.
> diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
> index ffda91a4a1aa..120664e94ee5 100644
> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -1,10 +1,12 @@
>  /*
>   * Yama Linux Security Module
>   *
> - * Author: Kees Cook
> + * Authors: Kees Cook
> + *          Mickaël Salaün
>   *
>   * Copyright (C) 2010 Canonical, Ltd.
>   * Copyright (C) 2011 The Chromium OS Authors.
> + * Copyright (C) 2018 ANSSI
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2, as
> @@ -28,7 +30,14 @@
>  #define YAMA_SCOPE_CAPABILITY	2
>  #define YAMA_SCOPE_NO_ATTACH	3
>
> +#define YAMA_OMAYEXEC_ENFORCE_NONE	0
> +#define YAMA_OMAYEXEC_ENFORCE_MOUNT	(1 << 0)
> +#define YAMA_OMAYEXEC_ENFORCE_FILE	(1 << 1)
> +#define _YAMA_OMAYEXEC_LAST		YAMA_OMAYEXEC_ENFORCE_FILE
> +#define _YAMA_OMAYEXEC_MASK		((_YAMA_OMAYEXEC_LAST << 1) - 1)
> +
>  static int ptrace_scope = YAMA_SCOPE_RELATIONAL;
> +static int open_mayexec_enforce = YAMA_OMAYEXEC_ENFORCE_NONE;
>
>  /* describe a ptrace relationship for potential exception */
>  struct ptrace_relation {
> @@ -423,7 +432,40 @@ int yama_ptrace_traceme(struct task_struct *parent)
>  	return rc;
>  }
>
> +/**
> + * yama_inode_permission - check O_MAYEXEC permission before accessing an inode
> + * @inode: inode structure to check
> + * @mask: permission mask
> + *
> + * Return 0 if access is permitted, -EACCES otherwise.
> + */
> +int yama_inode_permission(struct inode *inode, int mask)
> +{
> +	if (!(mask & MAY_OPENEXEC))
> +		return 0;
> +	/*
> +	 * Match regular files and directories to make it easier to
> +	 * modify script interpreters.
> +	 */
> +	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> +		return 0;

I forgot to mention that these checks do not handle FIFOs. This is
relevant in a threat model targeting persistent attacks (and with
additional protections/restrictions).

> + > + if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) && > + !(mask & MAY_EXECMOUNT)) > + return -EACCES; > + > + /* > + * May prefer acl_permission_check() instead of generic_permission(), > + * to not be bypassable with CAP_DAC_READ_SEARCH. > + */ > + if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE) > + return generic_permission(inode, MAY_EXEC); > + > + return 0; > +} > + > static struct security_hook_list yama_hooks[] __lsm_ro_after_init = { > + LSM_HOOK_INIT(inode_permission, yama_in
Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC
On 12/12/2018 at 17:29, Jordan Glover wrote:
> On Wednesday, December 12, 2018 9:17 AM, Mickaël Salaün wrote:
>
>> Hi,
>>
>> The goal of this patch series is to control script interpretation. A
>> new O_MAYEXEC flag used by sys_open() is added to enable userland script
>> interpreter to delegate to the kernel (and thus the system security
>> policy) the permission to interpret scripts or other files containing
>> what can be seen as commands.
>>
>> The security policy is the responsibility of an LSM. A basic
>> system-wide policy is implemented with Yama and configurable through a
>> sysctl.
>>
>> The initial idea come from CLIP OS and the original implementation has
>> been used for more than 10 years:
>> https://github.com/clipos-archive/clipos4_doc
>>
>> An introduction to O_MAYEXEC was given at the Linux Security Summit
>> Europe 2018 - Linux Kernel Security Contributions by ANSSI:
>> https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
>> The "write xor execute" principle was explained at Kernel Recipes 2018 -
>> CLIP OS: a defense-in-depth OS:
>> https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s
>>
>> This patch series can be applied on top of v4.20-rc6. This can be
>> tested with CONFIG_SECURITY_YAMA. I would really appreciate
>> constructive comments on this RFC.
>>
>> Regards,
>>
>
> Are various interpreters upstreams interested in adding support
> for O_MAYEXEC if it land in kernel? Did you contacted them about this?

I think the first step is to reach agreement on the kernel side. We will
then be able to help upstream interpreters implement this feature. It
should be fine because the behavior doesn't change by default, i.e.
nothing changes unless the sysadmin configures (and tests) the whole
system. Some examples of modified interpreters can be found at
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC .

 Mickaël
Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
On 12/12/2018 at 15:43, Jan Kara wrote:
> On Wed 12-12-18 09:17:08, Mickaël Salaün wrote:
>> When the O_MAYEXEC flag is passed, sys_open() may be subject to
>> additional restrictions depending on a security policy implemented by an
>> LSM through the inode_permission hook.
>>
>> The underlying idea is to be able to restrict scripts interpretation
>> according to a policy defined by the system administrator. For this to
>> be possible, script interpreters must use the O_MAYEXEC flag
>> appropriately. To be fully effective, these interpreters also need to
>> handle the other ways to execute code (for which the kernel can't help):
>> command line parameters (e.g., option -e for Perl), module loading
>> (e.g., option -m for Python), stdin, file sourcing, environment
>> variables, configuration files... According to the threat model, it may
>> be acceptable to allow some script interpreters (e.g. Bash) to interpret
>> commands from stdin, may it be a TTY or a pipe, because it may not be
>> enough to (directly) perform syscalls.
>>
>> A simple security policy implementation is available in a following
>> patch for Yama.
>>
>> This is an updated subset of the patch initially written by Vincent
>> Strubel for CLIP OS:
>> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
>> This patch has been used for more than 10 years with customized script
>> interpreters. Some examples can be found here:
>> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>>
>> Signed-off-by: Mickaël Salaün
>> Signed-off-by: Thibaut Sautereau
>> Signed-off-by: Vincent Strubel
>> Reviewed-by: Philippe Trébuchet
>> Cc: Al Viro
>> Cc: Kees Cook
>> Cc: Mickaël Salaün
>
> ...
>
>> diff --git a/fs/open.c b/fs/open.c
>> index 0285ce7dbd51..75479b79a58f 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
>>  	if (flags & O_APPEND)
>>  		acc_mode |= MAY_APPEND;
>>
>> +	/* Check execution permissions on open. */
>> +	if (flags & O_MAYEXEC)
>> +		acc_mode |= MAY_OPENEXEC;
>> +
>>  	op->acc_mode = acc_mode;
>>
>>  	op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
>
> I don't feel experienced enough in security to tell whether we want this
> functionality or not. But if we do this, shouldn't we also set FMODE_EXEC
> on the resulting struct file? That way also security_file_open() can be
> used to arbitrate such executable opens and in particular
> fanotify permission event FAN_OPEN_EXEC will get properly generated which I
> guess is desirable (support for it is sitting in my tree waiting for the
> merge window) - adding some audit people involved in FAN_OPEN_EXEC to
> CC. Just an idea...

Indeed, it may be useful for other LSMs.

 Mickaël
Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()
On 13/12/2018 10:47, Matthew Bobrowski wrote:
> On Wed, Dec 12, 2018 at 03:43:06PM +0100, Jan Kara wrote:
>>> When the O_MAYEXEC flag is passed, sys_open() may be subject to
>>> additional restrictions depending on a security policy implemented by an
>>> LSM through the inode_permission hook.
>>>
>>> The underlying idea is to be able to restrict scripts interpretation
>>> according to a policy defined by the system administrator. For this to
>>> be possible, script interpreters must use the O_MAYEXEC flag
>>> appropriately. To be fully effective, these interpreters also need to
>>> handle the other ways to execute code (for which the kernel can't help):
>>> command line parameters (e.g., option -e for Perl), module loading
>>> (e.g., option -m for Python), stdin, file sourcing, environment
>>> variables, configuration files... According to the threat model, it may
>>> be acceptable to allow some script interpreters (e.g. Bash) to interpret
>>> commands from stdin, may it be a TTY or a pipe, because it may not be
>>> enough to (directly) perform syscalls.
>>>
>>> A simple security policy implementation is available in a following
>>> patch for Yama.
>>>
>>> This is an updated subset of the patch initially written by Vincent
>>> Strubel for CLIP OS:
>>> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
>>> This patch has been used for more than 10 years with customized script
>>> interpreters. Some examples can be found here:
>>> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>>>
>>> Signed-off-by: Mickaël Salaün
>>> Signed-off-by: Thibaut Sautereau
>>> Signed-off-by: Vincent Strubel
>>> Reviewed-by: Philippe Trébuchet
>>> Cc: Al Viro
>>> Cc: Kees Cook
>>> Cc: Mickaël Salaün
>>
>> ...
>>
>>> diff --git a/fs/open.c b/fs/open.c
>>> index 0285ce7dbd51..75479b79a58f 100644
>>> --- a/fs/open.c
>>> +++ b/fs/open.c
>>> @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
>>>  	if (flags & O_APPEND)
>>>  		acc_mode |= MAY_APPEND;
>>>
>>> +	/* Check execution permissions on open. */
>>> +	if (flags & O_MAYEXEC)
>>> +		acc_mode |= MAY_OPENEXEC;
>>> +
>>>  	op->acc_mode = acc_mode;
>>>
>>>  	op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
>>
>> I don't feel experienced enough in security to tell whether we want this
>> functionality or not. But if we do this, shouldn't we also set FMODE_EXEC
>> on the resulting struct file? That way also security_file_open() can be
>> used to arbitrate such executable opens and in particular
>> fanotify permission event FAN_OPEN_EXEC will get properly generated which I
>> guess is desirable (support for it is sitting in my tree waiting for the
>> merge window) - adding some audit people involved in FAN_OPEN_EXEC to
>> CC. Just an idea...
>
> If I'm understanding this patch series correctly, without an enforced LSM
> policy there's realistically no added benefit from a security perspective,
> right?

That's correct. The kernel knows the semantics, but the enforcement is
delegated to an LSM and its policy.

> Also, I'm in agreement with what Jan has mentioned in regards to setting
> the __FMODE_EXEC flag when O_MAYEXEC has been specified. This is something
> that would work quite nicely in conjunction with some of the new file access
> notification events.

OK, I will add it in the next patch series (for the new FAN_OPEN_EXEC
support).
Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC
On 12/12/2018 18:09, Jann Horn wrote:
> On Wed, Dec 12, 2018 at 9:18 AM Mickaël Salaün wrote:
>> Enable to either propagate the mount options from the underlying VFS
>> mount to prevent execution, or to propagate the file execute permission.
>> This may allow a script interpreter to check execution permissions
>> before reading commands from a file.
>>
>> The main goal is to be able to protect the kernel by restricting
>> arbitrary syscalls that an attacker could perform with a crafted binary
>> or certain script languages. It also improves multilevel isolation
>> by reducing the ability of an attacker to use side channels with
>> specific code. These restrictions can natively be enforced for ELF
>> binaries (with the noexec mount option) but require this kernel
>> extension to properly handle scripts (e.g., Python, Perl).
>>
>> Add a new sysctl kernel.yama.open_mayexec_enforce to control this
>> behavior. A following patch adds documentation.
>>
>> Signed-off-by: Mickaël Salaün
>> Reviewed-by: Philippe Trébuchet
>> Reviewed-by: Thibaut Sautereau
>> Cc: Kees Cook
>> Cc: Mickaël Salaün
>> ---
> [...]
>> +/**
>> + * yama_inode_permission - check O_MAYEXEC permission before accessing an inode
>> + * @inode: inode structure to check
>> + * @mask: permission mask
>> + *
>> + * Return 0 if access is permitted, -EACCES otherwise.
>> + */
>> +int yama_inode_permission(struct inode *inode, int mask)
>
> This should be static, no?

Right, it will be in the next series. The previous function
(yama_ptrace_traceme) is not static though.

>
>> +{
>> +	if (!(mask & MAY_OPENEXEC))
>> +		return 0;
>> +	/*
>> +	 * Match regular files and directories to make it easier to
>> +	 * modify script interpreters.
>> +	 */
>> +	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>> +		return 0;
>
> So files are subject to checks, but loading code from things like
> sockets is always fine?

As I said in a previous email, these checks do not handle FIFOs either.
This is relevant in a threat model targeting persistent attacks (and
with additional protections/restrictions). We may want to only whitelist
FIFOs, but I don't see how a socket is relevant here. Can you please
clarify?

>
>> +	if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) &&
>> +			!(mask & MAY_EXECMOUNT))
>> +		return -EACCES;
>> +
>> +	/*
>> +	 * May prefer acl_permission_check() instead of generic_permission(),
>> +	 * to not be bypassable with CAP_DAC_READ_SEARCH.
>> +	 */
>> +	if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE)
>> +		return generic_permission(inode, MAY_EXEC);
>> +
>> +	return 0;
>> +}
>> +
>>  static struct security_hook_list yama_hooks[] __lsm_ro_after_init = {
>> +	LSM_HOOK_INIT(inode_permission, yama_inode_permission),
>>  	LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check),
>>  	LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme),
>>  	LSM_HOOK_INIT(task_prctl, yama_task_prctl),
>> @@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table *table, int write,
>>  	return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
>>  }
>>
>> +static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int write,
>> +					  void __user *buffer, size_t *lenp,
>> +					  loff_t *ppos)
>> +{
>> +	int error;
>> +
>> +	if (write) {
>> +		struct ctl_table table_copy;
>> +		int tmp_mayexec_enforce;
>> +
>> +		if (!capable(CAP_MAC_ADMIN))
>> +			return -EPERM;
>
> Don't put capable() checks in sysctls, it doesn't work.
>

I tested it and the root user can indeed open the file even if the
process doesn't have CAP_MAC_ADMIN; however, writing to the sysctl file
is denied. Btw there is a similar check in the previous function
(yama_dointvec_minmax).

Thanks
Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC
On 13/12/2018 06:13, Florian Weimer wrote: > * James Morris: > >> On Wed, 12 Dec 2018, Florian Weimer wrote: >> >>> * James Morris: >>> If you're depending on the script interpreter to flag that the user may execute code, this seems to be equivalent in security terms to depending on the user. e.g. what if the user uses ptrace and clears O_MAYEXEC? This security mechanism makes sense in a hardened system where the user is not allowed to import and execute new files (write xor execute policy). This can be enforced with appropriate mount points and a more advanced access control policy. >>> >>> The argument I've heard is this: Using ptrace (and adding the +x >>> attribute) are auditable events. >> >> I guess you could also preload a modified libc which strips the flag. > > My understanding is that this new libc would have to come somewhere, and > making it executable would be an auditable even as well. Auditing is a possible use case as well, but the W^X idea is to deny use of libraries which are not in an executable mount point, i.e. only execute trusted code.
Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC
On 13/12/2018 04:02, Matthew Wilcox wrote: > On Wed, Dec 12, 2018 at 09:17:07AM +0100, Mickaël Salaün wrote: >> The goal of this patch series is to control script interpretation. A >> new O_MAYEXEC flag used by sys_open() is added to enable userland script >> interpreter to delegate to the kernel (and thus the system security >> policy) the permission to interpret scripts or other files containing >> what can be seen as commands. > > I don't have a problem with the concept, but we're running low on O_ bits. > Does this have to be done before the process gets a file descriptor, > or could we have a new syscall? Since we're going to be changing the > interpreters anyway, it doesn't seem like too much of an imposition to > ask them to use: > > int verify_for_exec(int fd) > > instead of adding an O_MAYEXEC. > Adding a new syscall for this simple use case seems excessive. I think that the open/openat syscall family is the right place to do an atomic open and permission check, the same way the kernel does for other file access. Moreover, it will be easier to patch upstream interpreters without the burden of handling a (new) syscall that may not exist on the running system, whereas unknown open flags are ignored.
Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC
On 13/12/2018 18:13, Matthew Wilcox wrote: > On Thu, Dec 13, 2018 at 04:17:29PM +0100, Mickaël Salaün wrote: >> On 13/12/2018 04:02, Matthew Wilcox wrote: >>> On Wed, Dec 12, 2018 at 09:17:07AM +0100, Mickaël Salaün wrote: >>>> The goal of this patch series is to control script interpretation. A >>>> new O_MAYEXEC flag used by sys_open() is added to enable userland script >>>> interpreter to delegate to the kernel (and thus the system security >>>> policy) the permission to interpret scripts or other files containing >>>> what can be seen as commands. >>> >>> I don't have a problem with the concept, but we're running low on O_ bits. >>> Does this have to be done before the process gets a file descriptor, >>> or could we have a new syscall? Since we're going to be changing the >>> interpreters anyway, it doesn't seem like too much of an imposition to >>> ask them to use: >>> >>> int verify_for_exec(int fd) >>> >>> instead of adding an O_MAYEXEC. >> >> Adding a new syscall for this simple use case seems excessive. I think > > We have somewhat less than 400 syscalls today. We have 20 O_ bits defined. > Obviously there's a lower practical limit on syscalls, but in principle > we could have up to 2^32 syscalls, and there are only 12 O_ bits remaining. > >> that the open/openat syscall familly are the right place to do an atomic >> open and permission check, the same way the kernel does for other file >> access. Moreover, it will be easier to patch upstream interpreters >> without the burden of handling a (new) syscall that may not exist on the >> running system, whereas unknown open flags are ignored. > > Ah, but that's the problem. The interpreter can see an -ENOSYS response > and handle it appropriately. If the flag is silently ignored, the > interpreter has no idea whether it can do a racy check or whether to > skip even trying to do the check. 
Right, but the interpreter should interpret the script if the open with O_MAYEXEC succeeds (but not otherwise): it may succeed because the flag is known by the kernel and the system policy allows this call, or because the (old) kernel doesn't know about this flag (which is fine and needed for backward compatibility). The script interpretation must not fail if the kernel doesn't support O_MAYEXEC; it is then useless for the interpreter to do any additional check.
Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC
On 03/01/2019 12:17, Jann Horn wrote: > On Thu, Dec 13, 2018 at 3:49 PM Mickaël Salaün > wrote: >> On 12/12/2018 18:09, Jann Horn wrote: >>> On Wed, Dec 12, 2018 at 9:18 AM Mickaël Salaün wrote: >>>> Enable to either propagate the mount options from the underlying VFS >>>> mount to prevent execution, or to propagate the file execute permission. >>>> This may allow a script interpreter to check execution permissions >>>> before reading commands from a file. >>>> >>>> The main goal is to be able to protect the kernel by restricting >>>> arbitrary syscalls that an attacker could perform with a crafted binary >>>> or certain script languages. It also improves multilevel isolation >>>> by reducing the ability of an attacker to use side channels with >>>> specific code. These restrictions can natively be enforced for ELF >>>> binaries (with the noexec mount option) but require this kernel >>>> extension to properly handle scripts (e.g., Python, Perl). >>>> >>>> Add a new sysctl kernel.yama.open_mayexec_enforce to control this >>>> behavior. A following patch adds documentation. > [...] >>>> +{ >>>> + if (!(mask & MAY_OPENEXEC)) >>>> + return 0; >>>> + /* >>>> +* Match regular files and directories to make it easier to >>>> +* modify script interpreters. >>>> +*/ >>>> + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) >>>> + return 0; >>> >>> So files are subject to checks, but loading code from things like >>> sockets is always fine? >> >> As I said in a previous email, these checks do not handle fifo either. >> This is relevant in a threat model targeting persistent attacks (and >> with additional protections/restrictions). We may want to only whitelist >> fifo, but I don't get how a socket is relevant here. Can you please clarify? > > I don't think that there's a security problem here. I just think it's > weird to have the extra check when it seems to me like it isn't really > necessary - nobody is going to want to execute a socket or fifo > anyway, right? 
Right, the fifo whitelisting should answer your concern then. > >>> >>>> + if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) && >>>> + !(mask & MAY_EXECMOUNT)) >>>> + return -EACCES; >>>> + >>>> + /* >>>> +* May prefer acl_permission_check() instead of >>>> generic_permission(), >>>> +* to not be bypassable with CAP_DAC_READ_SEARCH. >>>> +*/ >>>> + if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE) >>>> + return generic_permission(inode, MAY_EXEC); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static struct security_hook_list yama_hooks[] __lsm_ro_after_init = { >>>> + LSM_HOOK_INIT(inode_permission, yama_inode_permission), >>>> LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check), >>>> LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme), >>>> LSM_HOOK_INIT(task_prctl, yama_task_prctl), >>>> @@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table >>>> *table, int write, >>>> return proc_dointvec_minmax(&table_copy, write, buffer, lenp, >>>> ppos); >>>> } >>>> >>>> +static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int >>>> write, >>>> + void __user *buffer, size_t >>>> *lenp, >>>> + loff_t *ppos) >>>> +{ >>>> + int error; >>>> + >>>> + if (write) { >>>> + struct ctl_table table_copy; >>>> + int tmp_mayexec_enforce; >>>> + >>>> + if (!capable(CAP_MAC_ADMIN)) >>>> + return -EPERM; >>> >>> Don't put capable() checks in sysctls, it doesn't work. >>> >> >> I tested it and the root user can indeed open the file even if the >> process doesn't have CAP_MAC_ADMIN, however writing in the sysctl file >> is denied. Btw there is a similar check in the previous function >> (yama_dointvec_minmax). > > It's still wrong. If an attacker without CAP_MAC_ADMIN opens the > sysctl file, then passes the file descriptor to a setcap binary that > has CAP_MAC_ADMIN as stdout/stderr, and the setcap binary writes to > it, then the capable() check is bypassed. 
(But of course, to open the > sysctl file in the first place, you'd need to be root (uid 0), so the > check doesn't really matter.) I agree with you that a confused deputy attack may use file descriptors, but I don't see how the current sysctl API may be used to check the process capability at open time. Anyway, on a properly configured system, especially one leveraging Linux capabilities (e.g. CLIP OS), root processes may not have CAP_SYS_ADMIN. Moreover, SUID or fcap binaries may not be available to an attacker (e.g. in a container).
Re: [PATCH 16/18] LSM: Allow arbitrary LSM ordering
On 9/18/18 00:36, John Johansen wrote: > On 09/17/2018 02:57 PM, Casey Schaufler wrote: >> On 9/17/2018 12:55 PM, John Johansen wrote: >>> On 09/17/2018 12:23 PM, Casey Schaufler wrote: On 9/17/2018 11:14 AM, Kees Cook wrote: >> Keep security=$lsm with the existing exclusive behavior. >> Add lsm=$lsm1,...,$lsmN which requires a full list of modules >> >> If you want to be fancy (I don't!) you could add >> >> lsm.add=$lsm1,...,$lsmN which adds the modules to the stack >> lsm.delete=$lsm1,...,$lsmN which deletes modules from the stack > We've got two issues: ordering and enablement. It's been strongly > suggested that we should move away from per-LSM enable/disable flags > (to which I agree). I also agree. There are way too many ways to turn off some LSMs. >>> I wont disagree, but its largely because we didn't have this discussion >>> when we should have. >> >> True that. >> >> > If ordering should be separate from enablement (to > avoid the "booted kernel with new LSM built in, but my lsm="..." line > didn't include it so it's disabled case), then I think we need to > split the logic (otherwise we just reinvented "security=" with similar > problems). We could reduce the problem by declaring that LSM ordering is not something you can specify on the boot line. I can see value in specifying it when you build the kernel, but your circumstances would have to be pretty strange to change it at boot time. >>> if there is LSM ordering the getting >>> >>> lsm=B,A,C >>> >>> is not the behavior I would expect from specifying >>> >>> lsm=A,B,C >> >> Right. You'd expect that they'd be used in the order specified. >> > > and yet you argue for something different ;) > > Should "lsm=" allow arbitrary ordering? (I think yes.) I say no. Assume you can specify it at build time. When would you want to change the order? Why would you? >>> because maybe you care about the denial message from one LSM more than >>> you do from another. 
Since stacking is bail on first fail the order >>> could be important from an auditing POV >> >> I understand that a distribution would want to specify the order >> for support purposes and that a developer would want to specify >> the order to ensure reproducible behavior. But they are going to >> be controlling their kernel builds. I'm not suggesting that the >> order shouldn't be capable of build time specification. What I >> don't see is a reason to rearrange it at boot time. >> > > Because not all users have the same priority as the distro. It can > also aid in debugging and testing of LSMs in a stacked situation. > >>> Auditing is why apparmor's internal stacking is not bail on first >>> fail. >> >> Within a security module I get that. But we've already got the >> priority wrong for audit in general, because you only get to the >> LSM if the traditional code approves. Every guidance I ever got > > true > >> said you should do the MAC checks first, because you're much more >> concerned about getting audit records about MAC failures than DAC. >> > > yep, wouldn't that be nice to have > > Should "lsm=" imply implicit enable/disable? (I think no: unlisted > LSMs are implicitly auto-appended to the explicit list) If you want to add something that isn't there instead of making it explicit you want "lsm.enable=" not "lsm=". > So then we could have "lsm.enable=..." and "lsm.disable=...". > > If builtin list was: > capability,yama,loadpin,integrity,{selinux,smack,tomoyo,apparmor} > then: > > lsm.disable=loadpin lsm=smack Methinks this should be lsm.disable=loadpin lsm.enable=smack >>> that would only work if order is not important >> >> It works unless you want to change the order at boot, and >> I still don't see a use case for that. 
> > see above > >> > becomes > > capability,smack,yama,integrity > > and > > CONFIG_SECURITY_LOADPIN_DEFAULT_ENABLED=n > selinux.enable=0 lsm.add=loadpin lsm.disable=smack,tomoyo > lsm=integrity Do you mean selinux.enable=0 lsm.enable=loadpin lsm.disable=smack,tomoyo lsm.enable=integrity selinux.enable=0 lsm.enable=loadpin,integrity lsm.disable=smack,tomoyo selinux.enable=0 lsm.enable=loadpin lsm.enable=integrity lsm.disable=smack lsm.disable=tomoyo > becomes > > capability,integrity,yama,loadpin,apparmor > > > If "lsm=" _does_ imply enablement, then how does it interact with > per-LSM disabling? i.e. what does "apparmor.enabled=0 > lsm=yama,apparmor" mean? If it means "turn on apparmor" how do I turn > on a CONFIG-default-off LSM without specifying all the other LSMs too? There should either be one option "lsm=", which is an explicit list or two, "lsm.enable=" and "lsm.disable", which modify the built in default. >>> maybe but th
Re: [PATCH 16/18] LSM: Allow arbitrary LSM ordering
On 9/18/18 01:30, Casey Schaufler wrote: > On 9/17/2018 4:20 PM, Kees Cook wrote: >> On Mon, Sep 17, 2018 at 4:10 PM, Mickaël Salaün wrote: >>> Landlock, because it target unprivileged users, should only be called >>> after all other major (access-control) LSMs. The admin or distro must >>> not be able to change that order in any way. This constraint doesn't >>> apply to current LSMs, though. > > What harm would it cause for Landlock to get called before SELinux? > I certainly see why it seems like it ought to get called after, but > would it really make a difference? If an unprivileged process is able to infer some properties of a file being requested (thanks to one of its eBPF programs doing checks on this process's accesses), whereas this file access would be denied by a privileged LSM, then there is a side channel attack allowing this process to indirectly get information that is otherwise inaccessible. In other words, an unprivileged process should not be allowed to sneak itself (via an eBPF program) before SELinux for instance. SELinux should be able to block such information gathering the same way it can block a fstat(2) requested by a process.
Re: WARNING in current_check_refer_path
Hello, Thanks for the report. Could you please provide a reproducer? Regards, Mickaël On Sun, Apr 28, 2024 at 10:47:02AM +0800, Ubisectech Sirius wrote: > Hello. > We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. > Recently, our team has discovered a issue in Linux kernel 6.7. Attached to > the email were a PoC file of the issue. > > Stack dump: > > loop3: detected capacity change from 0 to 1024 > [ cut here ] > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access > security/landlock/fs.c:598 [inline] > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access > security/landlock/fs.c:578 [inline] > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 > current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758 > Modules linked in: > CPU: 0 PID: 30368 Comm: syz-executor.3 Not tainted 6.7.0 #2 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 > 04/01/2014 > RIP: 0010:get_mode_access security/landlock/fs.c:598 [inline] > RIP: 0010:get_mode_access security/landlock/fs.c:578 [inline] > RIP: 0010:current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758 > Code: e9 76 fb ff ff 41 bc fe ff ff ff e9 6b fb ff ff e8 00 99 77 fd 90 0f 0b > 90 41 bc f3 ff ff ff e9 57 fb ff ff e8 ec 98 77 fd 90 <0f> 0b 90 31 db e9 86 > f9 ff ff bb 00 08 00 00 e9 7c f9 ff ff 41 ba > RSP: 0018:c90001fb7ba0 EFLAGS: 00010212 > RAX: 0bc5 RBX: 88805feeb7b0 RCX: c90006e15000 > RDX: 0004 RSI: 84125d64 RDI: 0003 > RBP: 8880123c5608 R08: 0003 R09: c000 > R10: f000 R11: R12: 88805d32fc00 > R13: 8880123c5608 R14: R15: 0001 > FS: 7fd70c4d8640() GS:88802c60() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 001b2c136000 CR3: 5b2a CR4: 00750ef0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > PKRU: 5554 > Call Trace: > > security_path_rename+0x124/0x230 security/security.c:1828 > do_renameat2+0x9f6/0xd30 fs/namei.c:4983 > __do_sys_rename fs/namei.c:5042 [inline] > __se_sys_rename fs/namei.c:5040 [inline] > 
__x64_sys_rename+0x81/0xa0 fs/namei.c:5040 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x6f/0x77 > RIP: 0033:0x7fd70b6900ed > Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7fd70c4d8028 EFLAGS: 0246 ORIG_RAX: 0052 > RAX: ffda RBX: 7fd70b7cbf80 RCX: 7fd70b6900ed > RDX: RSI: 2140 RDI: 2100 > RBP: 7fd70b6f14a6 R08: R09: > R10: R11: 0246 R12: > R13: 000b R14: 7fd70b7cbf80 R15: 7fd70c4b8000 > > > Thank you for taking the time to read this email and we look forward to > working with you further. > > > > > > > > >
Re: Re: WARNING in current_check_refer_path
On Mon, Apr 29, 2024 at 05:16:57PM +0800, Ubisectech Sirius wrote: > > Hello, > > > Thanks for the report. Could you please provide a reproducer? > > > Regards, > > Mickaël > > Hi. > The Poc file has seed to you as attachment. Indeed, but could you please trim down the file. There are 650 lines, most of them are irrelevant. > > > On Sun, Apr 28, 2024 at 10:47:02AM +0800, Ubisectech Sirius wrote: > >> Hello. > >> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. > >> Recently, our team has discovered a issue in Linux kernel 6.7. Attached to > >> the email were a PoC file of the issue. > >> > >> Stack dump: > >> > > > loop3: detected capacity change from 0 to 1024 > > > [ cut here ] > > > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access > > > security/landlock/fs.c:598 [inline] > > > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access > > > security/landlock/fs.c:578 [inline] > > > WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 > > > current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758 > > > Modules linked in: > > > CPU: 0 PID: 30368 Comm: syz-executor.3 Not tainted 6.7.0 #2 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 > > > 04/01/2014 > > > RIP: 0010:get_mode_access security/landlock/fs.c:598 [inline] > > > RIP: 0010:get_mode_access security/landlock/fs.c:578 [inline] > > > RIP: 0010:current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758 > > > Code: e9 76 fb ff ff 41 bc fe ff ff ff e9 6b fb ff ff e8 00 99 77 fd 90 > > > 0f 0b 90 41 bc f3 ff ff ff e9 57 fb ff ff e8 ec 98 77 fd 90 <0f> 0b 90 31 > > > db e9 86 f9 ff ff bb 00 08 00 00 e9 7c f9 ff ff 41 ba > > > RSP: 0018:c90001fb7ba0 EFLAGS: 00010212 > > > RAX: 0bc5 RBX: 88805feeb7b0 RCX: c90006e15000 > > > RDX: 0004 RSI: 84125d64 RDI: 0003 > > > RBP: 8880123c5608 R08: 0003 R09: c000 > > > R10: f000 R11: R12: 88805d32fc00 > > > R13: 8880123c5608 R14: R15: 0001 > > > FS: 7fd70c4d8640() 
GS:88802c60() > > > knlGS: > > > CS: 0010 DS: ES: CR0: 80050033 > > > CR2: 001b2c136000 CR3: 5b2a CR4: 00750ef0 > > > DR0: DR1: DR2: > > > DR3: DR6: fffe0ff0 DR7: 0400 > > > PKRU: 5554 > > > Call Trace: > > > > > > security_path_rename+0x124/0x230 security/security.c:1828 > > > do_renameat2+0x9f6/0xd30 fs/namei.c:4983 > > > __do_sys_rename fs/namei.c:5042 [inline] > > > __se_sys_rename fs/namei.c:5040 [inline] > > > __x64_sys_rename+0x81/0xa0 fs/namei.c:5040 > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > > do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83 > > > entry_SYSCALL_64_after_hwframe+0x6f/0x77 > > > RIP: 0033:0x7fd70b6900ed > > > Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 > > > f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 > > > ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 > > > RSP: 002b:7fd70c4d8028 EFLAGS: 0246 ORIG_RAX: 0052 > > > RAX: ffda RBX: 7fd70b7cbf80 RCX: 7fd70b6900ed > >> RDX: RSI: 2140 RDI: 2100 > > > RBP: 7fd70b6f14a6 R08: R09: > > > R10: R11: 0246 R12: > > > R13: 000b R14: 7fd70b7cbf80 R15: 7fd70c4b8000 > > > > > > > > > Thank you for taking the time to read this email and we look forward to > > > working with you further. > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [PATCH] certs: Restrict blacklist updates to the secondary trusted keyring
On Mon, Sep 11, 2023 at 09:29:07AM -0400, Mimi Zohar wrote: > Hi Eric, > > On Fri, 2023-09-08 at 17:34 -0400, Eric Snowberg wrote: > > Currently root can dynamically update the blacklist keyring if the hash > > being added is signed and vouched for by the builtin trusted keyring. > > Currently keys in the secondary trusted keyring can not be used. > > > > Keys within the secondary trusted keyring carry the same capabilities as > > the builtin trusted keyring. Relax the current restriction for updating > > the .blacklist keyring and allow the secondary to also be referenced as > > a trust source. Since the machine keyring is linked to the secondary > > trusted keyring, any key within it may also be used. > > > > An example use case for this is IMA appraisal. Now that IMA both > > references the blacklist keyring and allows the machine owner to add > > custom IMA CA certs via the machine keyring, this adds the additional > > capability for the machine owner to also do revocations on a running > > system. > > > > IMA appraisal usage example to add a revocation for /usr/foo: > > > > sha256sum /bin/foo | awk '{printf "bin:" $1}' > hash.txt > > > > openssl smime -sign -in hash.txt -inkey machine-private-key.pem \ > >-signer machine-certificate.pem -noattr -binary -outform DER \ > >-out hash.p7s > > > > keyctl padd blacklist "$(< hash.txt)" %:.blacklist < hash.p7s > > > > Signed-off-by: Eric Snowberg > > The secondary keyring may include both CA and code signing keys. With > this change any key loaded onto the secondary keyring may blacklist a > hash. Wouldn't it make more sense to limit blacklisting > certificates/hashes to at least CA keys? Some operational constraints may limit what a CA can sign. This change is critical and should be tied to a dedicated kernel config (disabled by default), otherwise existing systems using this feature will have their threat model automatically changed without notice. 
> > > --- > > certs/Kconfig | 2 +- > > certs/blacklist.c | 4 ++-- > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/certs/Kconfig b/certs/Kconfig > > index 1f109b070877..23dc87c52aff 100644 > > --- a/certs/Kconfig > > +++ b/certs/Kconfig > > @@ -134,7 +134,7 @@ config SYSTEM_BLACKLIST_AUTH_UPDATE > > depends on SYSTEM_DATA_VERIFICATION > > help > > If set, provide the ability to load new blacklist keys at run time if > > - they are signed and vouched by a certificate from the builtin trusted > > + they are signed and vouched by a certificate from the secondary > > trusted > > If CONFIG_SECONDARY_TRUSTED_KEYRING is not enabled, it falls back to > the builtin keyring. Please update the comment accordingly. > > > keyring. The PKCS#7 signature of the description is set in the key > > payload. Blacklist keys cannot be removed. > > > > diff --git a/certs/blacklist.c b/certs/blacklist.c > > index 675dd7a8f07a..0b346048ae2d 100644 > > --- a/certs/blacklist.c > > +++ b/certs/blacklist.c > > @@ -102,12 +102,12 @@ static int blacklist_key_instantiate(struct key *key, > > > > #ifdef CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE > > /* > > -* Verifies the description's PKCS#7 signature against the builtin > > +* Verifies the description's PKCS#7 signature against the secondary > > * trusted keyring. > > */ > > And similarly here ... > > > err = verify_pkcs7_signature(key->description, > > strlen(key->description), prep->data, prep->datalen, > > - NULL, VERIFYING_UNSPECIFIED_SIGNATURE, NULL, NULL); > > + VERIFY_USE_SECONDARY_KEYRING, > > VERIFYING_UNSPECIFIED_SIGNATURE, NULL, NULL); > > if (err) > > return err; > > #else > > -- > thanks, > > Mimi >
[PATCH v1 1/3] kconfig: Remove duplicate call to sym_get_string_value()
From: Mickaël Salaün Use the saved return value of sym_get_string_value() instead of calling it twice. Cc: Masahiro Yamada Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20210215122513.1773897-2-...@digikod.net --- scripts/kconfig/conf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c index db03e2f45de4..18a233d27a8d 100644 --- a/scripts/kconfig/conf.c +++ b/scripts/kconfig/conf.c @@ -137,7 +137,7 @@ static int conf_string(struct menu *menu) printf("%*s%s ", indent - 1, "", menu->prompt->text); printf("(%s) ", sym->name); def = sym_get_string_value(sym); - if (sym_get_string_value(sym)) + if (def) printf("[%s] ", def); if (!conf_askvalue(sym, def)) return 0; -- 2.30.0