from:"Mickaël Salaün"

[PATCH v1] cgroup,bpf: Add access check for cgroup_get_from_fd()

2016-09-19 Thread Mickaël Salaün

Add security access check for cgroup backed FD. The "cgroup.procs" file
of the corresponding cgroup should be readable to identify the cgroup,
and writable to prove that the current process can manage this cgroup
(e.g. through delegation). This is similar to the check done by
cgroup_procs_write_permission().

Fixes: 4ed8ec521ed5 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Martin KaFai Lau 
Cc: Tejun Heo 
---
 include/linux/cgroup.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/syscall.c   |  1 +
 kernel/cgroup.c        | 34 +++---
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c4688742ddc4..5767d471e292 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,7 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
   struct cgroup_subsys *ss);
 
 struct cgroup *cgroup_get_from_path(const char *path);
-struct cgroup *cgroup_get_from_fd(int fd);
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a2ac051c342f..3d97c70134a0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -543,7 +543,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 struct file *map_file /* not used */,
 int fd)
 {
-   return cgroup_get_from_fd(fd);
+   return cgroup_get_from_fd(fd, MAY_READ);
 }
 
 static void cgroup_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962447a5..cc7270eadcf7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(int, bpf_prog_active);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b0d727d26fc7..e02e0a531be9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6236,34 +6236,54 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 /**
  * cgroup_get_from_fd - get a cgroup pointer from a fd
  * @fd: fd obtained by open(cgroup2_dir)
+ * @access_mask: contains the permission mask
  *
  * Find the cgroup from a fd which should be obtained
  * by opening a cgroup directory.  Returns a pointer to the
  * cgroup on success. ERR_PTR is returned if the cgroup
- * cannot be found.
+ * cannot be found or its access is denied.
  */
-struct cgroup *cgroup_get_from_fd(int fd)
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
 {
struct cgroup_subsys_state *css;
struct cgroup *cgrp;
struct file *f;
+   struct inode *inode;
+   int ret;
 
f = fget_raw(fd);
if (!f)
return ERR_PTR(-EBADF);
 
css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
-   fput(f);
-   if (IS_ERR(css))
-   return ERR_CAST(css);
+   if (IS_ERR(css)) {
+   ret = PTR_ERR(css);
+   goto put_f;
+   }
 
cgrp = css->cgroup;
if (!cgroup_on_dfl(cgrp)) {
-   cgroup_put(cgrp);
-   return ERR_PTR(-EBADF);
+   ret = -EBADF;
+   goto put_cgrp;
+   }
+
+   ret = -ENOMEM;
+   inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
+   if (inode) {
+   ret = inode_permission(inode, access_mask);
+   iput(inode);
}
+   if (ret)
+   goto put_cgrp;
 
+   fput(f);
return cgrp;
+
+put_cgrp:
+   cgroup_put(cgrp);
+put_f:
+   fput(f);
+   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
 
-- 
2.9.3

Re: [PATCH v1] cgroup,bpf: Add access check for cgroup_get_from_fd()

2016-09-20 Thread Mickaël Salaün

On 20/09/2016 02:30, Alexei Starovoitov wrote:
> On Tue, Sep 20, 2016 at 12:49:13AM +0200, Mickaël Salaün wrote:
>> Add security access check for cgroup backed FD. The "cgroup.procs" file
>> of the corresponding cgroup should be readable to identify the cgroup,
>> and writable to prove that the current process can manage this cgroup
>> (e.g. through delegation). This is similar to the check done by
>> cgroup_procs_write_permission().
>>
>> Fixes: 4ed8ec521ed5 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
> 
> I don't understand what 'fixes' is about.
> Looks like new feature or tightening?
> Since cgroup was opened by the process and it got an fd,
> it had an access, so extra check here looks unnecessary.

It may not be a "fix", but this patch tightens the access control. The
current cgroup_get_from_fd() only relies on the access check done on the
passed FD. However, this FD comes from a cgroup directory, not from the
"cgroup.procs" file in this directory. The "cgroup.procs" file is used for
cgroup delegation by cgroup_procs_write_permission(). Checking
"cgroup.procs" is therefore more consistent with the access checks done by
other parts of the cgroup code. Being able to open a cgroup directory only
means that the current process is able to list the cgroup hierarchy, not
necessarily to list the tasks in this cgroup.

A BPF_MAP_TYPE_CGROUP_ARRAY should therefore only contain cgroups readable
by the process that filled the map. It is currently possible to call
bpf_skb_in_cgroup() and learn whether a packet comes from a task in a cgroup,
even though the loading process may not be able to list these tasks.

Write access to a cgroup directory means being able to create
sub-cgroups, not to add tasks to or remove tasks from that cgroup. This will
be important for future uses such as Daniel Mack's patch (attaching an eBPF
program to a cgroup). Indeed, with the current code, a process with
CAP_NET_ADMIN (but without the right to manage a cgroup) would be able
to attach programs to a cgroup. The same goes for Landlock.
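
The difference between access to a cgroup directory and access to its
"cgroup.procs" file can also be observed from user space. The following is a
minimal sketch, not part of the patch: can_access_procs() is a hypothetical
helper that uses faccessat(2) relative to an already opened directory FD,
whereas the patch itself performs the check in the kernel with
inode_permission() on the kernfs inode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical user-space helper (not from the patch): given an FD for an
 * open directory, report whether "cgroup.procs" inside it is accessible
 * with the requested mode (R_OK, W_OK, or both).
 * Returns 0 if access is granted, -1 with errno set otherwise, exactly
 * like faccessat(2).
 */
static int can_access_procs(int dirfd, int mode)
{
	/* The caller must pass a valid, open directory FD. */
	assert(dirfd >= 0);

	return faccessat(dirfd, "cgroup.procs", mode, 0);
}
```

A caller would open the cgroup directory with O_RDONLY | O_DIRECTORY and then
compare can_access_procs(fd, R_OK) with can_access_procs(fd, R_OK | W_OK):
opening the directory can succeed while one or both of these checks fail.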

> 
>> -struct cgroup *cgroup_get_from_fd(int fd)
>> +struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
>>  {
>>  struct cgroup_subsys_state *css;
>>  struct cgroup *cgrp;
>>  struct file *f;
>> +struct inode *inode;
>> +int ret;
>>  
>>  f = fget_raw(fd);
>>  if (!f)
>>  return ERR_PTR(-EBADF);
>>  
>>  css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
>> -fput(f);
> 
> why move it down?

Because it is used by kernfs_get_inode().
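
To illustrate the resulting control flow, here is a user-space sketch of the
same unwinding order. It is only a model under stated assumptions: struct
toy_ref, toy_get(), toy_put() and toy_get_from_fd() are made-up names, and the
three integer parameters stand in for the results of
css_tryget_online_from_dir(), cgroup_on_dfl() and inode_permission().

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for struct file and struct cgroup: only a reference count. */
struct toy_ref {
	int count;
};

static void toy_get(struct toy_ref *r)
{
	r->count++;
}

static void toy_put(struct toy_ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/*
 * Models the control flow of the patched cgroup_get_from_fd():
 *  - the file reference is held across the permission check,
 *  - on success only the cgroup reference is handed to the caller,
 *  - on failure references are dropped in reverse order of acquisition.
 * Returns 0 on success or a negative errno value.
 */
static int toy_get_from_fd(struct toy_ref *file, struct toy_ref *cgrp,
			   int css_err, int on_dfl, int perm_err)
{
	int ret;

	toy_get(file);			/* models fget_raw() */

	if (css_err) {			/* models a failed css lookup */
		ret = css_err;
		goto put_f;
	}
	toy_get(cgrp);			/* the lookup took a cgroup reference */

	if (!on_dfl) {
		ret = -EBADF;
		goto put_cgrp;
	}

	ret = perm_err;			/* the permission check still needs 'file' */
	if (ret)
		goto put_cgrp;

	toy_put(file);			/* success: keep only the cgroup reference */
	return 0;

put_cgrp:
	toy_put(cgrp);
put_f:
	toy_put(file);
	return ret;
}
```

Whatever path is taken, the file count returns to its initial value, and the
cgroup count is only raised when the function returns 0.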

> 
>> -if (IS_ERR(css))
>> -return ERR_CAST(css);
>> +if (IS_ERR(css)) {
>> +ret = PTR_ERR(css);
>> +goto put_f;
>> +}
>>  
>>  cgrp = css->cgroup;
>>  if (!cgroup_on_dfl(cgrp)) {
>> -cgroup_put(cgrp);
>> -return ERR_PTR(-EBADF);
>> +ret = -EBADF;
>> +goto put_cgrp;
>> +}
>> +
>> +ret = -ENOMEM;
>> +inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
>> +if (inode) {
>> +ret = inode_permission(inode, access_mask);
>> +iput(inode);
>>  }
>> +if (ret)
>> +goto put_cgrp;
>>  
>> +fput(f);
>>  return cgrp;
>> +
>> +put_cgrp:
>> +cgroup_put(cgrp);
>> +put_f:
>> +fput(f);
>> +return ERR_PTR(ret);
>>  }
>>  EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
>>  
>> -- 
>> 2.9.3
>>
> 


Re: lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons

2016-09-20 Thread Mickaël Salaün


On 20/09/2016 03:10, Sargun Dhillon wrote:
> I'm fine giving up the Checmate name. Landlock seems easy enough to
> Google. I haven't gotten a chance to look through the entire patchset
> yet, but it does seem like they are somewhat similar.

Excellent! I'm looking forward to your review.


> 
> On Mon, Sep 19, 2016 at 5:12 PM, Alexei Starovoitov
>  wrote:
>> On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
>>>>> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>>>>> should be able to sit on top of Landlock.
>>>>
>>>> I think neither of them should be called fancy names for no technical reason.
>>>> We will have only one bpf based lsm. That's it and it doesn't
>>>> need an obscure name. Directory name can be security/bpf/..stuff.c
>>>
>>> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
>>> name (first RFC) but I later realized that it is confusing because
>>> seccomp is associated to its syscall and the underlying features. Same
>>> thing goes for BPF. It is also artificially hard to grep on a name too
>>> used in the kernel source tree.
>>> Making an association between the generic eBPF mechanism and a security
>>> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
>>> the seccomp interface [1] can still be used.
>>
>> agree with above.
>>
>>> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
>>> landlocked country/state). I want to keep this name, which is simple,
>>> express the goal of Landlock nicely and is comparable to other sandbox
>>> mechanisms as Seatbelt or Pledge.
>>> Landlock should not be confused with the underlying eBPF implementation.
>>> Landlock could use more than only eBPF in the future and eBPF could be
>>> used in other LSM as well.
>>
>> there will not be two bpf based LSMs.
>> Therefore unless you can convince Sargun to give up his 'checmate' name,
>> nothing goes in.
>> The features you both need are 90% the same, so they must be done
>> as part of single LSM whatever you both agree to call it.
>>
> 




Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks

2016-09-20 Thread Mickaël Salaün


On 20/09/2016 06:37, Sargun Dhillon wrote:
> On Thu, Sep 15, 2016 at 09:41:33PM +0200, Mickaël Salaün wrote:
>>
>> On 15/09/2016 06:48, Alexei Starovoitov wrote:
>>> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
>>>>  wrote:
>>>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>>>>>>  wrote:
>>>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>>>>>>>
>>>>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar 
>>>>>>>>>>> way. I
>>>>>>>>>>> don't see why building on top of cgroup v2 is a problem. Is there
>>>>>>>>>>> security issues with delegation?
>>>>>>>>>>
>>>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>>>> Tejun says [1]:
>>>>>>>>>>
>>>>>>>>>> We haven't had to face this decision because cgroup has never 
>>>>>>>>>> properly
>>>>>>>>>> supported delegating to applications and the in-use setups where this
>>>>>>>>>> happens are custom configurations where there is no boundary between
>>>>>>>>>> system and applications and adhoc trial-and-error is good enough a 
>>>>>>>>>> way
>>>>>>>>>> to find a working solution.  That wiggle room goes away once we
>>>>>>>>>> officially open this up to individual applications.
>>>>>>>>>>
>>>>>>>>>> Unless and until that changes, I think that landlock should stay away
>>>>>>>>>> from cgroups.  Others could reasonably disagree with me.
>>>>>>>>>
>>>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>>>>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
>>>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry 
>>>>>>>>> points.
>>>>>>>>> Please see checmate examples how it's used.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>>>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>>>> until the cgroup situation settles down a lot.
>>>>>>>
>>>>>>> ahh. yes. we're perfectly in agreement here.
>>>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>>>> don't have one to one relationship, so mixing them up is only
>>>>>>> asking for trouble further down the road.
>>>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>>>> with passing whatever information.
>>>>>>>
>>>>>>
>>>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>>>> of the boxes for safely letting unprivileged programs sandbox
>>>>>> themselves.
>>>>>
>>>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>>>> sure, that's reusable.
>>>>>
>>>>>> Furthermore, to the extent that there are use cases for
>>>>>> unprivileged bpf+ls

Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-09-20 Thread Mickaël Salaün


On 15/09/2016 11:19, Pavel Machek wrote:
> Hi!
> 
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>> level. This brought a working PoC but with some (mitigated) ToCToU race
>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
> 
> Long and nice description follows. Should it go to Documentation/
> somewhere?
> 
> Because some documentation would be useful...
>   Pavel

Right, but I was looking for feedback before investing in documentation. :)


> 
>>  include/linux/bpf.h   |  41 +
>>  include/linux/lsm_hooks.h |   5 +
>>  include/linux/seccomp.h   |  54 ++-
>>  include/uapi/asm-generic/errno-base.h |   1 +
>>  include/uapi/linux/bpf.h  | 103 
>>  include/uapi/linux/seccomp.h  |   2 +
>>  kernel/bpf/arraymap.c | 222 +
>>  kernel/bpf/syscall.c  |  18 ++-
>>  kernel/bpf/verifier.c |  32 +++-
>>  kernel/fork.c |  41 -
>>  kernel/seccomp.c  | 211 +++-
>>  samples/Makefile  |   2 +-
>>  samples/landlock/.gitignore   |   1 +
>>  samples/landlock/Makefile |  16 ++
>>  samples/landlock/sandbox.c| 295 
>> ++
>>  security/Kconfig  |   1 +
>>  security/Makefile |   2 +
>>  security/landlock/Kconfig |  19 +++
>>  security/landlock/Makefile|   3 +
>>  security/landlock/checker_cgroup.c|  96 +++
>>  security/landlock/checker_cgroup.h|  18 +++
>>  security/landlock/checker_fs.c| 183 +
>>  security/landlock/checker_fs.h|  20 +++
>>  security/landlock/lsm.c   | 228 ++
>>  security/security.c   |   1 +
>>  25 files changed, 1592 insertions(+), 23 deletions(-)
>>  create mode 100644 samples/landlock/.gitignore
>>  create mode 100644 samples/landlock/Makefile
>>  create mode 100644 samples/landlock/sandbox.c
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/checker_cgroup.c
>>  create mode 100644 security/landlock/checker_cgroup.h
>>  create mode 100644 security/landlock/checker_fs.c
>>  create mode 100644 security/landlock/checker_fs.h
>>  create mode 100644 security/landlock/lsm.c
>>
> 




[PATCH v1] seccomp: Fix documentation

2016-09-20 Thread Mickaël Salaün

Fix struct seccomp_filter and seccomp_run_filters() signatures.

Signed-off-by: Mickaël Salaün 
Cc: Andy Lutomirski 
Cc: James Morris 
Cc: Kees Cook 
Cc: Will Drewry 
---
 kernel/seccomp.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a2afe2..494cba230ca0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,8 +41,7 @@
  * outside of a lifetime-guarded section.  In general, this
  * is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insnsi: the BPF program instructions to evaluate
+ * @prog: the BPF program to evaluate
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
@@ -168,8 +167,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 }
 
 /**
- * seccomp_run_filters - evaluates all seccomp filters against @syscall
- * @syscall: number of the current system call
+ * seccomp_run_filters - evaluates all seccomp filters against @sd
+ * @sd: optional seccomp data to be passed to filters
  *
  * Returns valid seccomp BPF response codes.
  */
-- 
2.9.3

[PATCH v1] bpf: Set register type according to is_valid_access()

2016-09-22 Thread Mickaël Salaün

This fixes a pointer leak when an unprivileged eBPF program reads a pointer
value from the context. Even if is_valid_access() returns a pointer
type, the eBPF verifier replaces it with UNKNOWN_VALUE. The register
value containing an address is then allowed to leak. Moreover, this
prevented unprivileged eBPF programs from using functions with (legitimate)
pointer arguments.

This bug is not an issue for now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE. However, this fix is important for
future unprivileged eBPF program types which could use pointers in their
context.

Signed-off-by: Mickaël Salaün 
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Kees Cook 
Acked-by: Sargun Dhillon 
---
 kernel/bpf/verifier.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..0698ccd67715 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
}
err = check_ctx_access(env, off, size, t, &reg_type);
if (!err && t == BPF_READ && value_regno >= 0) {
-   mark_reg_unknown_value(state->regs, value_regno);
-   if (env->allow_ptr_leaks)
-   /* note that reg.[id|off|range] == 0 */
-   state->regs[value_regno].type = reg_type;
+   /* note that reg.[id|off|range] == 0 */
+   state->regs[value_regno].type = reg_type;
}
 
} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3

Re: [PATCH v1] bpf: Set register type according to is_valid_access()

2016-09-22 Thread Mickaël Salaün


On 22/09/2016 21:41, Daniel Borkmann wrote:
> On 09/22/2016 08:35 PM, Mickaël Salaün wrote:
>> This fix a pointer leak when an unprivileged eBPF program read a pointer
>> value from the context. Even if is_valid_access() returns a pointer
>> type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
>> value containing an address is then allowed to leak. Moreover, this
>> prevented unprivileged eBPF programs to use functions with (legitimate)
>> pointer arguments.
>>
>> This bug is not an issue for now because the only unprivileged eBPF
>> program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
>> from its context are UNKNOWN_VALUE. However, this fix is important for
>> future unprivileged eBPF program types which could use pointers in their
>> context.
>>
>> Signed-off-by: Mickaël Salaün 
>> Fixes: 969bf05eb3ce ("bpf: direct packet access")
>> Cc: Alexei Starovoitov 
>> Cc: Andy Lutomirski 
>> Cc: Daniel Borkmann 
>> Cc: Kees Cook 
>> Acked-by: Sargun Dhillon 
>> ---
>>   kernel/bpf/verifier.c | 6 ++
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index daea765d72e6..0698ccd67715 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env
>> *env, u32 regno, int off,
>>   }
>>   err = check_ctx_access(env, off, size, t, &reg_type);
>>   if (!err && t == BPF_READ && value_regno >= 0) {
>> -mark_reg_unknown_value(state->regs, value_regno);
>> -if (env->allow_ptr_leaks)
>> -/* note that reg.[id|off|range] == 0 */
>> -state->regs[value_regno].type = reg_type;
>> +/* note that reg.[id|off|range] == 0 */
>> +state->regs[value_regno].type = reg_type;
> 
> True that it's not an issue currently, since reg_type is only set for
> PTR_TO_PACKET/PTR_TO_PACKET_END in xdp and tc programs that can only be
> loaded as privileged. So not an issue for BPF_PROG_TYPE_SOCKET_FILTER.
> 
> One thing I don't quite follow is why you remove the
> mark_reg_unknown_value()
> as this also clears imm? I think this could result in an actual verifier
> bug when it would reuse previous tracked imm value of that dst register?

Good catch, I missed the imm initialization. I'm going to send a new patch.
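
The problem Daniel points out can be shown with a toy model of the register
state. This is only a sketch: struct toy_reg_state and the toy_* functions are
hypothetical and reduce the verifier's register state to a type and an imm
field, with toy_mark_unknown() standing in for mark_reg_unknown_value().

```c
#include <assert.h>

/* Toy register types; the names echo the verifier's, but this is not kernel code. */
enum toy_reg_type {
	TOY_UNKNOWN_VALUE,
	TOY_CONST_IMM,
	TOY_PTR_TO_PACKET,
};

struct toy_reg_state {
	enum toy_reg_type type;
	long imm;	/* tracked constant, only meaningful for TOY_CONST_IMM */
};

/* Stand-in for mark_reg_unknown_value(): resets both the type and imm. */
static void toy_mark_unknown(struct toy_reg_state *reg)
{
	assert(reg);
	reg->type = TOY_UNKNOWN_VALUE;
	reg->imm = 0;
}

/* v1 behaviour: only the type is overwritten, so a stale imm survives. */
static void toy_ctx_load_v1(struct toy_reg_state *dst, enum toy_reg_type type)
{
	assert(dst);
	dst->type = type;
}

/* Behaviour with the reset kept: clear first, then apply the reported type. */
static void toy_ctx_load_v2(struct toy_reg_state *dst, enum toy_reg_type type)
{
	toy_mark_unknown(dst);
	dst->type = type;
}
```

If the destination register previously tracked a constant, toy_ctx_load_v1()
leaves that constant in imm after the load, while toy_ctx_load_v2() starts
from a clean state before setting the type.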

> 
>>   }
>>
>>   } else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
>>
> 




[PATCH v2] bpf: Set register type according to is_valid_access()

2016-09-22 Thread Mickaël Salaün

This fixes a pointer leak when an unprivileged eBPF program reads a pointer
value from the context. Even if is_valid_access() returns a pointer
type, the eBPF verifier replaces it with UNKNOWN_VALUE. The register
value containing an address is then allowed to leak. Moreover, this
prevented unprivileged eBPF programs from using functions with (legitimate)
pointer arguments.

This bug is not an issue for now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE. However, this fix is important for
future unprivileged eBPF program types which could use pointers in their
context.

Signed-off-by: Mickaël Salaün 
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Kees Cook 
Acked-by: Sargun Dhillon 
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..adbc7c161ba5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -795,9 +795,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
err = check_ctx_access(env, off, size, t, &reg_type);
if (!err && t == BPF_READ && value_regno >= 0) {
mark_reg_unknown_value(state->regs, value_regno);
-   if (env->allow_ptr_leaks)
-   /* note that reg.[id|off|range] == 0 */
-   state->regs[value_regno].type = reg_type;
+   /* note that reg.[id|off|range] == 0 */
+   state->regs[value_regno].type = reg_type;
}
 
} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3

[PATCH v3] bpf: Set register type according to is_valid_access()

2016-09-24 Thread Mickaël Salaün

This prevents potential future pointer leaks when an unprivileged eBPF
program reads a pointer value from its context. Even if
is_valid_access() returns a pointer type, the eBPF verifier replaces it
with UNKNOWN_VALUE. The register value that contains a kernel address is
then allowed to leak. Moreover, this fix allows unprivileged eBPF
programs to use functions with (legitimate) pointer arguments.

Not an issue currently since reg_type is only set for PTR_TO_PACKET or
PTR_TO_PACKET_END in XDP and TC programs that can only be loaded as
privileged. For now, the only unprivileged eBPF program allowed is for
socket filtering and all the types from its context are UNKNOWN_VALUE.
However, this fix is important for future unprivileged eBPF programs
which could use pointers in their context.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..adbc7c161ba5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -795,9 +795,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
err = check_ctx_access(env, off, size, t, &reg_type);
if (!err && t == BPF_READ && value_regno >= 0) {
mark_reg_unknown_value(state->regs, value_regno);
-   if (env->allow_ptr_leaks)
-   /* note that reg.[id|off|range] == 0 */
-   state->regs[value_regno].type = reg_type;
+   /* note that reg.[id|off|range] == 0 */
+   state->regs[value_regno].type = reg_type;
}
 
} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3

Re: [PATCH v2 0/3] Fix seccomp for UM

2016-09-06 Thread Mickaël Salaün

Hi,

It seems that some of the fixes from linux-security have landed in
Linus' tree, but some seccomp fixes are still missing. They fix bugs
introduced in Linux v4.8 that are still present in v4.8-rc5. Could you
please push this series before the final 4.8 release?

Regards,
 Mickaël

On 09/08/2016 02:35, James Morris wrote:
> On Mon, 1 Aug 2016, Mickaël Salaün wrote:
> 
>> Hi,
>>
>> This series fix the recent seccomp update for the User-mode Linux 
>> architecture
>> (32-bit and 64-bit) since commit 26703c636c1f ("um/ptrace: run seccomp after
>> ptrace") which close the hole where ptrace can change a syscall out from 
>> under
>> seccomp.
>>
>> Changes since v1:
>> * fix commit message typo [2/3]
>> * add Kees Cook's Acked-by
>> * rebased on commit 7616ac70d1bb ("apparmor: fix 
>> SECURITY_APPARMOR_HASH_DEFAULT
>>   parameter handling")
> 
> All applied to
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next
> 
> 
> 




[RFC v3 09/22] seccomp: Move struct seccomp_filter in seccomp.h

2016-09-14 Thread Mickaël Salaün

Make struct seccomp_filter public: the following patches need it for
the new thread_prev field added for the Landlock LSM.

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
---
 include/linux/seccomp.h | 27 ++-
 kernel/seccomp.c        | 26 --
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ecc296c137cd..a0459a7315ce 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,32 @@
 #include 
 #include 
 
-struct seccomp_filter;
+/**
+ * struct seccomp_filter - container for seccomp BPF programs
+ *
+ * @usage: reference count to manage the object lifetime.
+ * get/put helpers should be used when accessing an instance
+ * outside of a lifetime-guarded section.  In general, this
+ * is only needed for handling filters shared across tasks.
+ * @prev: points to a previously installed, or inherited, filter
+ * @prog: the BPF program to evaluate
+ *
+ * seccomp_filter objects are organized in a tree linked via the @prev
+ * pointer.  For any task, it appears to be a singly-linked list starting
+ * with current->seccomp.filter, the most recently attached or inherited filter.
+ * However, multiple filters may share a @prev node, by way of fork(), which
+ * results in a unidirectional tree existing in memory.  This is similar to
+ * how namespaces work.
+ *
+ * seccomp_filter objects should never be modified after being attached
+ * to a task_struct (other than @usage).
+ */
+struct seccomp_filter {
+   atomic_t usage;
+   struct seccomp_filter *prev;
+   struct bpf_prog *prog;
+};
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index dccfc05cb3ec..1867bbfa7c6c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -33,32 +33,6 @@
 #include 
 #include 
 
-/**
- * struct seccomp_filter - container for seccomp BPF programs
- *
- * @usage: reference count to manage the object lifetime.
- * get/put helpers should be used when accessing an instance
- * outside of a lifetime-guarded section.  In general, this
- * is only needed for handling filters shared across tasks.
- * @prev: points to a previously installed, or inherited, filter
- * @prog: the BPF program to evaluate
- *
- * seccomp_filter objects are organized in a tree linked via the @prev
- * pointer.  For any task, it appears to be a singly-linked list starting
- * with current->seccomp.filter, the most recently attached or inherited filter.
- * However, multiple filters may share a @prev node, by way of fork(), which
- * results in a unidirectional tree existing in memory.  This is similar to
- * how namespaces work.
- *
- * seccomp_filter objects should never be modified after being attached
- * to a task_struct (other than @usage).
- */
-struct seccomp_filter {
-   atomic_t usage;
-   struct seccomp_filter *prev;
-   struct bpf_prog *prog;
-};
-
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
-- 
2.9.3
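
The layout described in the kernel-doc comment moved by this patch (filters
linked via @prev, shared after fork(), reference-counted through @usage) can
be modelled in user space. The sketch below is an illustration only: struct
toy_filter and the toy_* functions are hypothetical, a plain int replaces
atomic_t, and the BPF program is reduced to an integer id.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy counterpart of struct seccomp_filter. */
struct toy_filter {
	int usage;			/* reference count */
	struct toy_filter *prev;	/* older filter, possibly shared */
	int prog_id;			/* stands in for struct bpf_prog * */
};

/*
 * Attach a new filter on top of 'prev'. The caller's reference to 'prev'
 * moves into the new node, which becomes the caller's new head.
 */
static struct toy_filter *toy_attach(struct toy_filter *prev, int prog_id)
{
	struct toy_filter *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->usage = 1;
	f->prev = prev;
	f->prog_id = prog_id;
	return f;
}

/* fork(): the child shares the parent's head and takes its own reference. */
static struct toy_filter *toy_fork(struct toy_filter *head)
{
	if (head)
		head->usage++;
	return head;
}

/* Drop one reference; free nodes towards the root while counts reach zero. */
static void toy_put(struct toy_filter *f)
{
	while (f) {
		struct toy_filter *prev;

		assert(f->usage > 0);
		if (--f->usage > 0)
			break;
		prev = f->prev;
		free(f);
		f = prev;
	}
}

/* Number of filters a task sees when walking from its head to the root. */
static int toy_depth(const struct toy_filter *head)
{
	int n = 0;

	for (; head; head = head->prev)
		n++;
	return n;
}
```

After a fork, a parent and a child that each attach one more filter end up
with two distinct heads whose @prev pointers refer to the same node: each task
sees a singly-linked list, while memory holds a tree.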

[RFC v3 04/22] bpf: Set register type according to is_valid_access()

2016-09-14 Thread Mickaël Salaün

This fixes a pointer leak when an unprivileged eBPF program reads a pointer
value from the context. Even if is_valid_access() returns a pointer
type, the eBPF verifier replaces it with UNKNOWN_VALUE. The register
value containing an address is then allowed to leak. Moreover, this
prevented unprivileged eBPF programs from using functions with (legitimate)
pointer arguments.

This bug was not a problem until now because the only unprivileged eBPF
program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
from its context are UNKNOWN_VALUE.

Signed-off-by: Mickaël Salaün 
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c0c4a92dae8c..608cbffb0e86 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -794,10 +794,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
}
err = check_ctx_access(env, off, size, t, &reg_type);
if (!err && t == BPF_READ && value_regno >= 0) {
-   mark_reg_unknown_value(state->regs, value_regno);
-   if (env->allow_ptr_leaks)
-   /* note that reg.[id|off|range] == 0 */
-   state->regs[value_regno].type = reg_type;
+   /* note that reg.[id|off|range] == 0 */
+   state->regs[value_regno].type = reg_type;
}
 
} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3

[RFC v3 08/22] seccomp: Fix documentation for struct seccomp_filter

2016-09-14 Thread Mickaël Salaün

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
---
 kernel/seccomp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0db7c8a2afe2..dccfc05cb3ec 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -41,8 +41,7 @@
  * outside of a lifetime-guarded section.  In general, this
  * is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
- * @len: the number of instructions in the program
- * @insnsi: the BPF program instructions to evaluate
+ * @prog: the BPF program to evaluate
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
-- 
2.9.3

[RFC v3 20/22] landlock: Add update and debug access flags

2016-09-14 Thread Mickaël Salaün

For now, the update and debug accesses are only available to a process
with CAP_SYS_ADMIN. This could change in the future.

The capability check is done statically when loading an eBPF program,
according to the current process. If the process has enough rights and
sets the appropriate access flags, then the dedicated functions or data
will be accessible.

With the update access, the following functions are available:
* bpf_map_lookup_elem
* bpf_map_update_elem
* bpf_map_delete_elem
* bpf_tail_call

With the debug access, the following functions are available:
* bpf_trace_printk
* bpf_get_prandom_u32
* bpf_get_current_pid_tgid
* bpf_get_current_uid_gid
* bpf_get_current_comm
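
The gating described above can be sketched in plain userspace C. This is
only an illustration under assumptions: the two flag values are taken
from the patch, while enum demo_func, DEMO_ACCESS_MASK and the demo_*
functions are hypothetical names that stand in for enum bpf_func_id,
_LANDLOCK_FLAG_ACCESS_MASK and the kernel code. The capability check is
modeled by a boolean instead of capable(CAP_SYS_ADMIN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by the patch in include/uapi/linux/bpf.h. */
#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
/* Mirrors _LANDLOCK_FLAG_ACCESS_MASK; renamed for this sketch. */
#define DEMO_ACCESS_MASK		((1ULL << 2) - 1)

/* Hypothetical helper IDs; not the kernel's enum bpf_func_id. */
enum demo_func {
	DEMO_FUNC_CMP_FS_PROP,		/* Landlock helper, always available */
	DEMO_FUNC_MAP_UPDATE_ELEM,	/* needs the update access */
	DEMO_FUNC_TRACE_PRINTK,		/* needs the debug access */
};

/* Load-time check: unknown bits are rejected, known bits need privilege. */
static bool demo_subtype_is_valid(uint64_t access, bool has_cap_sys_admin)
{
	if (access & ~DEMO_ACCESS_MASK)
		return false;
	if ((access & (LANDLOCK_FLAG_ACCESS_UPDATE |
		       LANDLOCK_FLAG_ACCESS_DEBUG)) && !has_cap_sys_admin)
		return false;
	return true;
}

/* Per-helper check: a helper is exposed only if its access flag is set. */
static bool demo_func_allowed(enum demo_func func, uint64_t access)
{
	switch (func) {
	case DEMO_FUNC_CMP_FS_PROP:
		return true;
	case DEMO_FUNC_MAP_UPDATE_ELEM:
		return (access & LANDLOCK_FLAG_ACCESS_UPDATE) != 0;
	case DEMO_FUNC_TRACE_PRINTK:
		return (access & LANDLOCK_FLAG_ACCESS_DEBUG) != 0;
	}
	return false;
}

/* Usage: an unprivileged loader cannot request the debug access. */
static int demo_usage(void)
{
	assert(!demo_subtype_is_valid(LANDLOCK_FLAG_ACCESS_DEBUG, false));
	assert(demo_func_allowed(DEMO_FUNC_TRACE_PRINTK,
				 LANDLOCK_FLAG_ACCESS_DEBUG));
	return 0;
}
```

The split into a load-time validity check and a per-helper lookup follows
the structure of the diff below, where the capability is checked once in
bpf_landlock_is_valid_subtype() and the flags are consulted again in
bpf_landlock_func_proto().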

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Sargun Dhillon 
---
 include/uapi/linux/bpf.h |  4 +++-
 security/landlock/lsm.c  | 54 
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3cc52e51357f..8cfc2de2ab76 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -584,7 +584,9 @@ enum landlock_hook_id {
 #define _LANDLOCK_FLAG_ORIGIN_MASK ((1 << 3) - 1)
 
 /* context of function access flags */
-#define _LANDLOCK_FLAG_ACCESS_MASK ((1ULL << 0) - 1)
+#define LANDLOCK_FLAG_ACCESS_UPDATE(1 << 0)
+#define LANDLOCK_FLAG_ACCESS_DEBUG (1 << 1)
+#define _LANDLOCK_FLAG_ACCESS_MASK ((1ULL << 2) - 1)
 
 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY(1 << 0)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 2a15839a08c8..56c45abe979c 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -202,11 +202,57 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 static const struct bpf_func_proto *bpf_landlock_func_proto(
enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
+   bool access_update = !!(prog_subtype->landlock_hook.access &
+   LANDLOCK_FLAG_ACCESS_UPDATE);
+   bool access_debug = !!(prog_subtype->landlock_hook.access &
+   LANDLOCK_FLAG_ACCESS_DEBUG);
+
switch (func_id) {
case BPF_FUNC_landlock_cmp_fs_prop_with_struct_file:
return &bpf_landlock_cmp_fs_prop_with_struct_file_proto;
case BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file:
return &bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+
+   /* access_update */
+   case BPF_FUNC_map_lookup_elem:
+   if (access_update)
+   return &bpf_map_lookup_elem_proto;
+   return NULL;
+   case BPF_FUNC_map_update_elem:
+   if (access_update)
+   return &bpf_map_update_elem_proto;
+   return NULL;
+   case BPF_FUNC_map_delete_elem:
+   if (access_update)
+   return &bpf_map_delete_elem_proto;
+   return NULL;
+   case BPF_FUNC_tail_call:
+   if (access_update)
+   return &bpf_tail_call_proto;
+   return NULL;
+
+   /* access_debug */
+   case BPF_FUNC_trace_printk:
+   if (access_debug)
+   return bpf_get_trace_printk_proto();
+   return NULL;
+   case BPF_FUNC_get_prandom_u32:
+   if (access_debug)
+   return &bpf_get_prandom_u32_proto;
+   return NULL;
+   case BPF_FUNC_get_current_pid_tgid:
+   if (access_debug)
+   return &bpf_get_current_pid_tgid_proto;
+   return NULL;
+   case BPF_FUNC_get_current_uid_gid:
+   if (access_debug)
+   return &bpf_get_current_uid_gid_proto;
+   return NULL;
+   case BPF_FUNC_get_current_comm:
+   if (access_debug)
+   return &bpf_get_current_comm_proto;
+   return NULL;
+
default:
return NULL;
}
@@ -348,6 +394,14 @@ static inline bool bpf_landlock_is_valid_subtype(
if (prog_subtype->landlock_hook.access & ~_LANDLOCK_FLAG_ACCESS_MASK)
return false;
 
+   /* check access flags */
+   if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_UPDATE &&
+   !capable(CAP_SYS_ADMIN))
+   return false;
+   if (prog_subtype->landlock_hook.access & LANDLOCK_FLAG_ACCESS_DEBUG &&
+   !capable(CAP_SYS_ADMIN))
+   return false;
+
return true;
 }
 
-- 
2.9.3

[RFC v3 16/22] bpf/cgroup,landlock: Handle Landlock hooks per cgroup

2016-09-14 Thread Mickaël Salaün

This allows adding new eBPF programs to Landlock hooks dedicated to a
cgroup, thanks to the BPF_PROG_ATTACH command. As with socket eBPF
programs, the Landlock hooks attached to a cgroup are propagated to the
nested cgroups. However, when a new Landlock program is attached to one
of these nested cgroups, that cgroup hierarchy forks the Landlock hooks.
This design is simple and matches the current CONFIG_CGROUP_BPF
inheritance. The difference lies in the fact that Landlock programs can
only be stacked, not removed. This matches the append-only seccomp
behavior. Userland is free to handle Landlock hooks attached to a cgroup
in more complicated ways (e.g. continuous inheritance), but care should
be taken to properly handle error cases (e.g. memory allocation errors).

Changes since v2:
* new design based on BPF_PROG_ATTACH (suggested by Alexei Starovoitov)

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Tejun Heo 
Link: https://lkml.kernel.org/r/20160826021432.ga8...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
---
 include/linux/bpf-cgroup.h  |  7 +++
 include/linux/cgroup-defs.h |  2 ++
 include/linux/landlock.h|  9 +
 include/uapi/linux/bpf.h|  1 +
 kernel/bpf/cgroup.c | 33 ++---
 kernel/bpf/syscall.c| 11 +++
 security/landlock/lsm.c | 40 +++-
 security/landlock/manager.c | 32 
 8 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 6cca7924ee17..439c681159e2 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,8 +14,15 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct landlock_hooks;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 union bpf_object {
struct bpf_prog *prog;
+#ifdef CONFIG_SECURITY_LANDLOCK
+   struct landlock_hooks *hooks;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 };
 
 struct cgroup_bpf {
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 861b4677fc5b..fe1023bf7b9d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -301,8 +301,10 @@ struct cgroup {
/* used to schedule release agent */
struct work_struct release_agent_work;
 
+#ifdef CONFIG_CGROUP_BPF
/* used to store eBPF programs */
struct cgroup_bpf bpf;
+#endif /* CONFIG_CGROUP_BPF */
 
/* ids of the ancestors at each level including self */
int ancestor_ids[];
diff --git a/include/linux/landlock.h b/include/linux/landlock.h
index 932ae57fa70e..179a848110f3 100644
--- a/include/linux/landlock.h
+++ b/include/linux/landlock.h
@@ -19,6 +19,9 @@
 #include  /* struct seccomp_filter */
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+#include  /* struct cgroup */
+#endif /* CONFIG_CGROUP_BPF */
 
 #ifdef CONFIG_SECCOMP_FILTER
 struct landlock_seccomp_ret {
@@ -65,6 +68,7 @@ struct landlock_hooks {
 
 
 struct landlock_hooks *new_landlock_hooks(void);
+void get_landlock_hooks(struct landlock_hooks *hooks);
 void put_landlock_hooks(struct landlock_hooks *hooks);
 
 #ifdef CONFIG_SECCOMP_FILTER
@@ -73,5 +77,10 @@ int landlock_seccomp_set_hook(unsigned int flags,
const char __user *user_bpf_fd);
 #endif /* CONFIG_SECCOMP_FILTER */
 
+#ifdef CONFIG_CGROUP_BPF
+struct landlock_hooks *landlock_cgroup_set_hook(struct cgroup *cgrp,
+   struct bpf_prog *prog);
+#endif /* CONFIG_CGROUP_BPF */
+
 #endif /* CONFIG_SECURITY_LANDLOCK */
 #endif /* _LINUX_LANDLOCK_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 905dcace7255..12e61508f879 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -124,6 +124,7 @@ enum bpf_prog_type {
 enum bpf_attach_type {
BPF_CGROUP_INET_INGRESS,
BPF_CGROUP_INET_EGRESS,
+   BPF_CGROUP_LANDLOCK,
__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 7b75fa692617..1c18fe46958a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
@@ -31,7 +32,15 @@ void cgroup_bpf_put(struct cgroup *cgrp)
union bpf_object pinned = cgrp->bpf.pinned[type];
 
if (pinned.prog) {
-   bpf_prog_put(pinned.prog);
+   switch (type) {
+   case BPF_CGROUP_LANDLOCK:
+#ifdef CONFIG_SECURITY_LANDLOCK
+   put_landlock_hooks(pinned.hooks);
+   break;

[RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()

2016-09-14 Thread Mickaël Salaün

Add security access check for cgroup backed FD. The "cgroup.procs" file
of the corresponding cgroup must be readable to identify the cgroup, and
writable to prove that the current process can manage this cgroup (e.g.
through delegation). This is similar to the check done by
cgroup_procs_write_permission().
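
A rough userland analogue of this check can be written with faccessat(2)
on the cgroup directory FD. This is a sketch under assumptions, not what
the kernel does: the patch calls inode_permission() on the kernfs inode
of "cgroup.procs", whereas faccessat() checks against the real UID/GID
by default. The demo_* names are hypothetical, and a scratch directory
stands in for a real cgroup2 directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return 0 if "cgroup.procs" under the directory FD is accessible with
 * the given mode (R_OK and/or W_OK), or a negative errno otherwise.
 */
static int demo_cgroup_access(int cgroup_dirfd, int mode)
{
	if (faccessat(cgroup_dirfd, "cgroup.procs", mode, 0) == 0)
		return 0;
	return -errno;
}

/* Usage on a scratch directory standing in for a cgroup directory. */
static int demo_usage(void)
{
	char tmpl[] = "/tmp/demo-cgroup-XXXXXX";
	int dirfd, fd, ret_missing, ret_present;

	if (!mkdtemp(tmpl))
		return -errno;
	dirfd = open(tmpl, O_RDONLY | O_DIRECTORY);
	if (dirfd < 0)
		return -errno;

	/* No "cgroup.procs" yet: the directory is not usable as a cgroup. */
	ret_missing = demo_cgroup_access(dirfd, R_OK | W_OK);

	fd = openat(dirfd, "cgroup.procs", O_CREAT | O_WRONLY, 0600);
	if (fd >= 0)
		close(fd);
	/* Owner-readable and writable file: both accesses are granted. */
	ret_present = demo_cgroup_access(dirfd, R_OK | W_OK);

	unlinkat(dirfd, "cgroup.procs", 0);
	close(dirfd);
	rmdir(tmpl);

	assert(ret_missing == -ENOENT);
	assert(ret_present == 0);
	return 0;
}
```

In the diff below, the map lookup path asks for MAY_READ only, while
bpf_prog_attach() and bpf_prog_detach() ask for MAY_WRITE, which
corresponds to passing R_OK or W_OK in this sketch.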

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Tejun Heo 
---
 include/linux/cgroup.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/syscall.c   |  6 +++---
 kernel/cgroup.c| 16 +++-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c4688742ddc4..5767d471e292 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,7 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
                                                       struct cgroup_subsys *ss);
 
 struct cgroup *cgroup_get_from_path(const char *path);
-struct cgroup *cgroup_get_from_fd(int fd);
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index edaab4c87292..1d4de8e0ab13 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -552,7 +552,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 struct file *map_file /* not used */,
 int fd)
 {
-   return cgroup_get_from_fd(fd);
+   return cgroup_get_from_fd(fd, MAY_READ);
 }
 
 static void cgroup_fd_array_put_ptr(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e9c5add327e6..f90225dbbb59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(int, bpf_prog_active);
 
@@ -863,7 +864,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
if (IS_ERR(prog))
return PTR_ERR(prog);
 
-   cgrp = cgroup_get_from_fd(attr->target_fd);
+   cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
if (IS_ERR(cgrp)) {
bpf_prog_put(prog);
return PTR_ERR(cgrp);
@@ -891,10 +892,9 @@ static int bpf_prog_detach(const union bpf_attr *attr)
if (!capable(CAP_NET_ADMIN))
return -EPERM;
 
-   cgrp = cgroup_get_from_fd(attr->target_fd);
+   cgrp = cgroup_get_from_fd(attr->target_fd, MAY_WRITE);
if (IS_ERR(cgrp))
return PTR_ERR(cgrp);
-
result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
cgroup_put(cgrp);
break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 48b650a640a9..3bbaf3f02ed2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 /**
  * cgroup_get_from_fd - get a cgroup pointer from a fd
  * @fd: fd obtained by open(cgroup2_dir)
+ * @access_mask: contains the permission mask
  *
  * Find the cgroup from a fd which should be obtained
  * by opening a cgroup directory.  Returns a pointer to the
  * cgroup on success. ERR_PTR is returned if the cgroup
  * cannot be found.
  */
-struct cgroup *cgroup_get_from_fd(int fd)
+struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
 {
struct cgroup_subsys_state *css;
struct cgroup *cgrp;
struct file *f;
+   struct inode *inode;
+   int ret;
 
f = fget_raw(fd);
if (!f)
@@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
return ERR_PTR(-EBADF);
}
 
+   ret = -ENOMEM;
+   inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);
+   if (inode) {
+   ret = inode_permission(inode, access_mask);
+   iput(inode);
+   }
+   if (ret) {
+   cgroup_put(cgrp);
+   return ERR_PTR(ret);
+   }
+
return cgrp;
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
-- 
2.9.3

[RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks

2016-09-14 Thread Mickaël Salaün

Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
set for all cgroups except the root. The flag is cleared when a new
process without the no_new_privs flag is attached to the cgroup.

If a cgroup is landlocked, then any new attempt, from an unprivileged
process, to attach a process without no_new_privs to this cgroup will
be denied.

This allows Landlock rules to be managed safely with cgroup delegation,
as with seccomp.
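
The attach rule can be modeled with a small userspace sketch. It assumes
simplified, hypothetical stand-ins (struct demo_cgroup, struct demo_task,
demo_attach()) for struct cgroup, task_struct and cgroup_attach_task(),
handles a single task rather than a thread group, and replaces the
CAP_SYS_ADMIN check with a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct cgroup. */
struct demo_cgroup {
	bool no_new_privs;	/* models the CGRP_NO_NEW_PRIVS bit */
	bool landlocked;	/* models non-NULL Landlock hooks on the cgroup */
};

/* Hypothetical, simplified stand-in for task_struct. */
struct demo_task {
	bool no_new_privs;
};

/*
 * The cgroup flag is ANDed with the no_new_privs bit of the incoming
 * task. Once the result is false, an unprivileged attacher is refused
 * if the cgroup is landlocked; otherwise the flag is cleared.
 */
static int demo_attach(struct demo_cgroup *dst, const struct demo_task *task,
		       bool attacher_is_admin)
{
	if (dst->no_new_privs && task->no_new_privs)
		return 0;
	if (dst->landlocked && !attacher_is_admin)
		return -EPERM;
	dst->no_new_privs = false;
	return 0;
}

/* Usage: a task without no_new_privs cannot enter a landlocked cgroup. */
static int demo_usage(void)
{
	struct demo_cgroup cg = { .no_new_privs = true, .landlocked = true };
	struct demo_task plain = { .no_new_privs = false };

	assert(demo_attach(&cg, &plain, false) == -EPERM);
	assert(cg.no_new_privs);
	return 0;
}
```

Note that, as in the diff below, the flag is never set again once it has
been cleared, so later unprivileged attaches to a landlocked cgroup are
refused even for tasks that do have no_new_privs.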

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Tejun Heo 
---
 include/linux/cgroup-defs.h |  7 +++
 kernel/bpf/syscall.c|  7 ---
 kernel/cgroup.c | 44 ++--
 security/landlock/manager.c |  7 +++
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index fe1023bf7b9d..ce0e4c90ae7d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -59,6 +59,13 @@ enum {
 * specified at mount time and thus is implemented here.
 */
CGRP_CPUSET_CLONE_CHILDREN,
+   /*
+* Keep track of the no_new_privs property of processes in the cgroup.
+* This is useful to quickly check if all processes in the cgroup have
+* their no_new_privs bit on. This flag is initially set to true but
+* ANDed with every processes coming in the cgroup.
+*/
+   CGRP_NO_NEW_PRIVS,
 };
 
 /* cgroup_root->flags */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f90225dbbb59..ff8b53a8a2a0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -849,9 +849,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 
case BPF_CGROUP_LANDLOCK:
 #ifdef CONFIG_SECURITY_LANDLOCK
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
+   /*
+* security/capability check done in landlock_cgroup_set_hook()
+* called by cgroup_bpf_update()
+*/
prog = bpf_prog_get_type(attr->attach_bpf_fd,
BPF_PROG_TYPE_LANDLOCK);
break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3bbaf3f02ed2..913e2d3b6d55 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define CREATE_TRACE_POINTS
@@ -1985,6 +1986,7 @@ static void init_cgroup_root(struct cgroup_root *root,
strcpy(root->name, opts->name);
if (opts->cpuset_clone_children)
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
+   /* no CGRP_NO_NEW_PRIVS flag for the root */
 }
 
 static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
@@ -2812,14 +2814,35 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
LIST_HEAD(preloaded_csets);
struct task_struct *task;
int ret;
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+   bool no_new_privs;
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
 
if (!cgroup_may_migrate_to(dst_cgrp))
return -EBUSY;
 
+   task = leader;
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+   no_new_privs = !!(dst_cgrp->flags & BIT_ULL(CGRP_NO_NEW_PRIVS));
+   do {
+   no_new_privs = no_new_privs && task_no_new_privs(task);
+   if (!no_new_privs) {
+   if (dst_cgrp->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks &&
+   security_capable_noaudit(current_cred(),
+   current_user_ns(),
+   CAP_SYS_ADMIN) != 0)
+   return -EPERM;
+   clear_bit(CGRP_NO_NEW_PRIVS, &dst_cgrp->flags);
+   break;
+   }
+   if (!threadgroup)
+   break;
+   } while_each_thread(leader, task);
+#endif /* CONFIG_CGROUP_BPF && CONFIG_SECURITY_LANDLOCK */
+
/* look up all src csets */
spin_lock_irq(&css_set_lock);
rcu_read_lock();
-   task = leader;
do {
cgroup_migrate_add_src(task_css_set(task), dst_cgrp,
   &preloaded_csets);
@@ -4345,9 +4368,22 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
return -EBUSY;
 
mutex_lock(&cgroup_mutex);
-
percpu_down_write(&cgroup_threadgroup_rwsem);
 
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_SECURITY_LANDLOCK)
+   if (!(from->flags & BIT_ULL(CGRP_NO_NEW_PRIVS))) {
+   if (to->bpf.pinned[BPF_CGROUP_LANDLOCK].hooks &&
+

[RFC v3 22/22] samples/landlock: Add sandbox example

2016-09-14 Thread Mickaël Salaün

Add a basic sandbox tool to create a process isolated from some part of
the system. This can depend on the current cgroup.

Example with the current process hierarchy (seccomp):

  $ ls /home
  user1
  $ LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
  ./samples/landlock/sandbox /bin/sh -i
  Launching a new sandboxed process.
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Example with a cgroup:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
  LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
  ./samples/landlock/sandbox
  Ready to sandbox with cgroups.
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---
 samples/Makefile|   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 307 
 4 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 1a20169d85ac..a2dcd57ca7ac 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_SAMPLES)  += kobject/ kprobes/ trace_events/ livepatch/ \
   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-  configfs/ connector/ v4l/ trace_printk/
+  configfs/ connector/ v4l/ trace_printk/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index ..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index ..d1044b2afd27
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,16 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox
+sandbox-objs := sandbox.o
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS += -I$(objtree)/usr/include
+
+# Trick to allow make to be run from this directory
+all:
+   $(MAKE) -C ../../ $$PWD/
+
+clean:
+   $(MAKE) -C ../../ M=$$PWD clean
diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c
new file mode 100644
index ..9d6ac00cdd23
--- /dev/null
+++ b/samples/landlock/sandbox.c
@@ -0,0 +1,307 @@
+/*
+ * Landlock LSM - Sandbox example
+ *
+ * Copyright (C) 2016  Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 3, as
+ * published by the Free Software Foundation.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include  /* open() */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../tools/include/linux/filter.h"
+
+#include "../bpf/libbpf.c"
+
+#ifndef seccomp
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+   errno = 0;
+   return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
+static int landlock_prog_load(const struct bpf_insn *insns, int prog_len,
+   enum landlock_hook_id hook_id, __u64 access)
+{
+   union bpf_attr attr = {
+   .prog_type = BPF_PROG_TYPE_LANDLOCK,
+   .insns = ptr_to_u64((void *) insns),
+   .insn_cnt = prog_len / sizeof(struct bpf_insn),
+   .license = ptr_to_u64((void *) "GPL"),
+   .log_buf = ptr_to_u64(bpf_log_buf),
+   .log_size = LOG_BUF_SIZE,
+   .log_level = 1,
+   .prog_subtype.landlock_hook = {
+   .id = hook_id,
+   .origin = LANDLOCK_FLAG_ORIGIN_SECCOMP |
+   LANDLOCK_FLAG_ORIGIN_SYSCALL |
+   LANDLOCK_FLAG_ORIGIN_INTERRUPT,
+   .access = access,
+   },
+   };
+
+   /* assign one field outside of struct init to make sure any
+* padding is zero initialized
+*/
+   attr.kern_version = 0;
+
+   bpf_log_buf[0] = 0;
+
+   return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+}
+
+#define ARRAY_SIZE(a)  (sizeof(a) / sizeof(a[0]))
+
+static int apply_sandbox(const char **allowed_paths, int path_nb, const char
+   **cgroup_paths, int cgroup_nb)
+{
+   __u32 key;
+   int i,

[RFC v3 19/22] landlock: Add interrupted origin

2016-09-14 Thread Mickaël Salaün

This third hook call origin should cover all possible trigger paths
(e.g. page fault). Landlock eBPF programs can then make decisions
accordingly.
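
How the three origins fit together can be sketched as follows. The flag
values are the ones from this patch; the demo_* functions are
hypothetical. The selection rule in demo_should_run() is an assumption
based on the description in patch 06/22 (the "origin" subtype field
lists the triggers for which a program is called), not code taken from
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Origin flags as defined by the patch in include/uapi/linux/bpf.h. */
#define LANDLOCK_FLAG_ORIGIN_SYSCALL	(1 << 0)
#define LANDLOCK_FLAG_ORIGIN_SECCOMP	(1 << 1)
#define LANDLOCK_FLAG_ORIGIN_INTERRUPT	(1 << 2)

/*
 * Origin reported for a non-seccomp trigger. The boolean models the
 * result of in_interrupt() in the kernel.
 */
static uint16_t demo_current_call(bool in_interrupt)
{
	return in_interrupt ? LANDLOCK_FLAG_ORIGIN_INTERRUPT :
			      LANDLOCK_FLAG_ORIGIN_SYSCALL;
}

/* Assumption: a program runs only if its origin mask includes the trigger. */
static bool demo_should_run(uint16_t prog_origin, uint16_t trigger)
{
	return (prog_origin & trigger) != 0;
}

/* Usage: a program subscribed to syscalls and interrupts, not seccomp. */
static int demo_usage(void)
{
	uint16_t prog_origin = LANDLOCK_FLAG_ORIGIN_SYSCALL |
			       LANDLOCK_FLAG_ORIGIN_INTERRUPT;

	assert(demo_should_run(prog_origin, demo_current_call(true)));
	assert(!demo_should_run(prog_origin, LANDLOCK_FLAG_ORIGIN_SECCOMP));
	return 0;
}
```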

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: Kees Cook 
---
 include/uapi/linux/bpf.h |  3 ++-
 security/landlock/lsm.c  | 17 +++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 12e61508f879..3cc52e51357f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -580,7 +580,8 @@ enum landlock_hook_id {
 /* Trigger type */
 #define LANDLOCK_FLAG_ORIGIN_SYSCALL   (1 << 0)
 #define LANDLOCK_FLAG_ORIGIN_SECCOMP   (1 << 1)
-#define _LANDLOCK_FLAG_ORIGIN_MASK ((1 << 2) - 1)
+#define LANDLOCK_FLAG_ORIGIN_INTERRUPT (1 << 2)
+#define _LANDLOCK_FLAG_ORIGIN_MASK ((1 << 3) - 1)
 
 /* context of function access flags */
 #define _LANDLOCK_FLAG_ACCESS_MASK ((1ULL << 0) - 1)
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 000dd0c7ec3d..2a15839a08c8 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -17,6 +17,7 @@
 #include  /* FIELD_SIZEOF() */
 #include 
 #include 
+#include  /* in_interrupt() */
 #include  /* struct seccomp_* */
 #include  /* uintptr_t */
 
@@ -109,6 +110,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #endif /* CONFIG_CGROUP_BPF */
struct landlock_rule *rule;
u32 hook_idx = get_index(hook_id);
+   u16 current_call;
 
struct landlock_data ctx = {
.hook = hook_id,
@@ -128,6 +130,16 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 * prioritize fine-grained policies (i.e. per thread), and return early.
 */
 
+   if (unlikely(in_interrupt())) {
+   current_call = LANDLOCK_FLAG_ORIGIN_INTERRUPT;
+#ifdef CONFIG_SECCOMP_FILTER
+   /* bypass landlock_ret evaluation */
+   goto seccomp_int;
+#endif /* CONFIG_SECCOMP_FILTER */
+   } else {
+   current_call = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+   }
+
 #ifdef CONFIG_SECCOMP_FILTER
/* seccomp triggers and landlock_ret cleanup */
ctx.origin = LANDLOCK_FLAG_ORIGIN_SECCOMP;
@@ -164,8 +176,9 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
return -ret;
ctx.cookie = 0;
 
+seccomp_int:
/* syscall trigger */
-   ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+   ctx.origin = current_call;
ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
current->seccomp.landlock_hooks);
if (ret)
@@ -175,7 +188,7 @@ static int landlock_run_prog(enum landlock_hook_id hook_id, __u64 args[6])
 #ifdef CONFIG_CGROUP_BPF
/* syscall trigger */
if (cgroup_bpf_enabled) {
-   ctx.origin = LANDLOCK_FLAG_ORIGIN_SYSCALL;
+   ctx.origin = current_call;
/* get the default cgroup associated with the current thread */
cgrp = task_css_set(current)->dfl_cgrp;
ret = landlock_run_prog_for_syscall(hook_idx, &ctx,
-- 
2.9.3

[RFC v3 02/22] bpf: Move u64_to_ptr() to BPF headers and inline it

2016-09-14 Thread Mickaël Salaün

This helper will be useful for arraymap (next commit).
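
For context, the conversion pairs with the userspace side, which stores
pointers in 64-bit fields of union bpf_attr (the sample in this series
uses ptr_to_u64() for that). A minimal userspace sketch of the round
trip, assuming a hypothetical struct demo_attr and using uintptr_t where
the kernel helper casts through unsigned long:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace direction: widen a pointer into a 64-bit attribute field. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Reverse direction, analogous to the kernel's u64_to_ptr(). */
static inline void *u64_to_ptr(uint64_t val)
{
	return (void *)(uintptr_t)val;
}

/* Hypothetical attribute layout; not the real union bpf_attr. */
struct demo_attr {
	uint64_t log_buf;
	uint32_t log_size;
};

/* Usage: the pointer survives the trip through the 64-bit field. */
static int demo_usage(void)
{
	char buf[16];
	struct demo_attr attr = {
		.log_buf = ptr_to_u64(buf),
		.log_size = sizeof(buf),
	};

	assert(u64_to_ptr(attr.log_buf) == (void *)buf);
	return 0;
}
```

Passing pointers as fixed 64-bit values keeps the attribute layout
identical for 32-bit and 64-bit userspace.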

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h  | 6 ++
 kernel/bpf/syscall.c | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a904f63f8c1..fa9a988400d9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -274,6 +274,12 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
+
+/* helper to convert user pointers passed inside __aligned_u64 fields */
+static inline void __user *u64_to_ptr(__u64 val)
+{
+   return (void __user *) (unsigned long) val;
+}
 #else
 static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1a8592a082ce..776c752604b0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -252,12 +252,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
return map;
 }
 
-/* helper to convert user pointers passed inside __aligned_u64 fields */
-static void __user *u64_to_ptr(__u64 val)
-{
-   return (void __user *) (unsigned long) val;
-}
-
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
return -ENOTSUPP;
-- 
2.9.3

[RFC v3 06/22] landlock: Add LSM hooks

2016-09-14 Thread Mickaël Salaün

Add LSM hooks which can be used by userland through Landlock (eBPF)
programs. These programs are limited to a whitelist of functions (cf.
next commit). The eBPF program context is described by the struct
landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID
* origin: what triggered this Landlock program (syscall, dedicated
  seccomp return or interruption)
* cookie: the 16-bit value from the seccomp filter that triggered this
  Landlock program
* args[6]: array of some LSM hook arguments

The LSM hook arguments can contain raw values as integers or
(unleakable) pointers. The only way to use the pointers is to pass them
to an eBPF function according to their types (e.g. the
bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
file pointer).

For each Landlock program, the subtype specifies the LSM hook to which
the program is dedicated, through the "id" field. The "origin" field
must contain every trigger for which the Landlock program will be
called (e.g. every syscall and/or seccomp filters returning
RET_LANDLOCK). The "access" bitfield can be used to allow a program to
access a specific feature from a Landlock hook (i.e. context value or
function). The flag guarding this feature may only be enabled according
to the capabilities of the process loading the program.

For now, there are three hooks for file system access control:
* file_open
* file_permission
* mmap_file

Changes since v2:
* use subtypes instead of dedicated eBPF program types for each hook
  (suggested by Alexei Starovoitov)
* replace convert_ctx_access() with subtype check
* use an array of Landlock program list instead of a single list
* handle running Landlock programs without needing a seccomp filter
* use, check and expose "origin" to Landlock programs
* mask the unused struct cred * (suggested by Andy Lutomirski)

Changes since v1:
* revamp access control from syscall-based to LSM hook-based
* do not use audit cache
* no race conditions by design
* architecture agnostic
* switch from cBPF to eBPF (suggested by Daniel Borkmann)
* new BPF context

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Will Drewry 
Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/20160827180642.ga38...@ast-mbp.thefacebook.com
Link: https://lkml.kernel.org/r/CALCETrUK1umtXMEXXKzMAccNQCVTPA8_XNDf01B5=gazujw...@mail.gmail.com
Link: https://lkml.kernel.org/r/20160827204307.ga43...@ast-mbp.thefacebook.com
---
 include/linux/bpf.h|   5 +
 include/linux/lsm_hooks.h  |   5 +
 include/uapi/linux/bpf.h   |  37 
 kernel/bpf/syscall.c   |  10 +-
 kernel/bpf/verifier.c  |   6 ++
 security/Makefile  |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c| 222 +
 security/security.c|   1 +
 9 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9aa01d9d3d80..36c3e482239c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -85,6 +85,8 @@ enum bpf_arg_type {
 
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING,   /* any (initialized) argument is ok */
+
+   ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
 };
 
 /* type of values returned from helper functions */
@@ -143,6 +145,9 @@ enum bpf_reg_type {
 */
PTR_TO_PACKET,
PTR_TO_PACKET_END,   /* skb->data + headlen */
+
+   /* Landlock */
+   PTR_TO_STRUCT_FILE,
 };
 
 struct bpf_prog;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 558adfa5c8a8..069af34301d4 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1933,5 +1933,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 667b6ef3ff1e..ad87003fe892 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -108,6 +108,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_XDP,
BPF_PROG_TYPE_PERF_EVENT,
BPF_PROG_TYPE_CGROUP_SOCKET,
+   BPF_PROG_TYPE_LANDLOCK,
 };
 
 enum bpf_attach_type {
@@ -528,6 +529,23 @@ struct xdp_md {
__u32 data_end;
 };
 
+/* LSM hooks */
+enum landlock_hook_id {
+   LANDLOCK_HOOK_UNSPEC,
+   LANDLOCK_HOOK_FILE_OPEN,
+   LANDLOCK_HOOK_FILE_PERMISSION,
+   LANDLOCK_HOOK_MMAP_FILE,
+};
+#define _LANDLOCK_HOOK

[RFC v3 10/22] seccomp: Split put_seccomp_filter() with put_seccomp()

2016-09-14 Thread Mickaël Salaün

The semantics are unchanged. This will be useful for the Landlock
integration with seccomp (next commit).
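
The shape of the split can be sketched in userspace: one function drops
a reference on a filter and walks up the @prev chain, and a thin wrapper
applies it to a task. This is a simplified model with hypothetical
demo_* names; it uses a plain int where the kernel uses an atomic
counter, and free() where the kernel calls seccomp_filter_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct seccomp_filter. */
struct demo_filter {
	int usage;			/* plain int; the kernel uses atomic_t */
	struct demo_filter *prev;
};

/* Hypothetical, simplified stand-in for task_struct. */
struct demo_task {
	struct demo_filter *filter;
};

/* Allocate a filter on top of prev; the new node holds a ref on prev. */
static struct demo_filter *demo_filter_new(struct demo_filter *prev)
{
	struct demo_filter *f = malloc(sizeof(*f));

	if (!f)
		abort();
	f->usage = 1;
	f->prev = prev;
	if (prev)
		prev->usage++;
	return f;
}

/*
 * Drop one reference and free single-reference ancestors iteratively.
 * Returns the number of filters freed (for illustration only).
 */
static int demo_put_filter(struct demo_filter *filter)
{
	int freed = 0;

	while (filter && --filter->usage == 0) {
		struct demo_filter *freeme = filter;

		filter = filter->prev;
		free(freeme);
		freed++;
	}
	return freed;
}

/* Task-level wrapper, analogous to put_seccomp() in the patch. */
static int demo_put_seccomp(struct demo_task *tsk)
{
	return demo_put_filter(tsk->filter);
}

/* Usage: two tasks share a root filter through their own filters. */
static int demo_usage(void)
{
	struct demo_filter *root = demo_filter_new(NULL);
	struct demo_task t1 = { demo_filter_new(root) };
	struct demo_task t2 = { demo_filter_new(root) };

	assert(demo_put_filter(root) == 0);	/* creator's ref dropped */
	assert(demo_put_seccomp(&t1) == 1);	/* frees t1's filter only */
	assert(demo_put_seccomp(&t2) == 2);	/* frees t2's filter and root */
	return 0;
}
```

Taking a filter pointer rather than a task lets callers release a filter
that is not, or not only, reachable through tsk->seccomp.filter.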

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c   |  2 +-
 kernel/seccomp.c| 18 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index a0459a7315ce..ffdab7cdd162 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -102,13 +102,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */
 
 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else  /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3584f521e3a6..99df46f157cf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -276,7 +276,7 @@ void free_task(struct task_struct *tsk)
free_thread_stack(tsk);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
-   put_seccomp_filter(tsk);
+   put_seccomp(tsk);
arch_release_task_struct(tsk);
free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1867bbfa7c6c..92b15083b1b2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -36,6 +36,8 @@
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -286,7 +288,7 @@ static inline void seccomp_sync_threads(void)
 * current's path will hold a reference.  (This also
 * allows a put before the assignment.)
 */
-   put_seccomp_filter(thread);
+   put_seccomp_filter(thread->seccomp.filter);
smp_store_release(&thread->seccomp.filter,
  caller->seccomp.filter);
 
@@ -448,10 +450,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
}
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-   struct seccomp_filter *orig = tsk->seccomp.filter;
+   struct seccomp_filter *orig = filter;
+
/* Clean up single-reference branches iteratively. */
while (orig && atomic_dec_and_test(&orig->usage)) {
struct seccomp_filter *freeme = orig;
@@ -460,6 +463,11 @@ void put_seccomp_filter(struct task_struct *tsk)
}
 }
 
+void put_seccomp(struct task_struct *tsk)
+{
+   put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -871,7 +879,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
ret = -EFAULT;
 
-   put_seccomp_filter(task);
+   put_seccomp_filter(task->seccomp.filter);
return ret;
 
 out:
-- 
2.9.3
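
The split made by the patch above can be modelled in user space. The following is only a sketch under stated assumptions: struct demo_filter, struct demo_task and the demo_* helpers are hypothetical stand-ins, a plain int replaces the kernel's atomic_t, and free() stands in for seccomp_filter_free(). It shows a filter-level put that walks the chain iteratively, plus a thin task-level wrapper in the spirit of put_seccomp().

```c
#include <stdlib.h>

/* User-space stand-ins; these are NOT the kernel's definitions. */
struct demo_filter {
	int usage;			/* plain int; the kernel uses atomic_t */
	struct demo_filter *prev;	/* older filter in the chain */
};

struct demo_task {
	struct demo_filter *filter;
};

static int demo_freed_count;	/* bookkeeping for the demonstration only */

/* Allocate a filter holding one reference, chained on top of prev. */
static struct demo_filter *demo_new_filter(struct demo_filter *prev)
{
	struct demo_filter *f = calloc(1, sizeof(*f));

	if (!f)
		abort();
	f->usage = 1;
	f->prev = prev;
	return f;
}

/*
 * Filter-level put: drop one reference, then free every ancestor whose
 * count reaches zero, walking the chain iteratively.
 */
static void demo_put_filter(struct demo_filter *filter)
{
	struct demo_filter *orig = filter;

	while (orig && --orig->usage == 0) {
		struct demo_filter *freeme = orig;

		orig = orig->prev;
		free(freeme);
		demo_freed_count++;
	}
}

/* Task-level wrapper: only forwards the task's current filter. */
static void demo_put_task(struct demo_task *tsk)
{
	demo_put_filter(tsk->filter);
}
```

With this shape, callers that already hold a filter pointer can drop a reference without going through a task, which is what the later Landlock patches in this series appear to need.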

[RFC v3 01/22] landlock: Add Kconfig

2016-09-14 Thread Mickaël Salaün

Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp
parts to ease the review.

Changes from v2:
* add a dependency on either seccomp filter or cgroups (with support for
  attached eBPF programs)

Signed-off-by: Mickaël Salaün 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---
 security/Kconfig  |  1 +
 security/landlock/Kconfig | 23 +++
 2 files changed, 24 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..c63194c561c5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -164,6 +164,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index ..dec64270b06d
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,23 @@
+config SECURITY_LANDLOCK
+   bool "Landlock sandbox support"
+   depends on SECURITY
+   depends on BPF_SYSCALL
+   depends on SECCOMP_FILTER || CGROUP_BPF
+   default y
+   help
+ Landlock is a stacked LSM which allows any user to load a security
+ policy to restrict their processes (i.e. create a sandbox). The
+ policy is a list of stacked eBPF programs for some LSM hooks. Each
+ program can do some access comparison to check if an access request
+ is legitimate.
+
+ You need to enable seccomp filter and/or cgroups (with eBPF programs
+ attached support) to apply a security policy to either a process
+ hierarchy (e.g. application with built-in sandboxing) or a group of
+ processes (e.g. container sandboxing). It is recommended to enable
+ both seccomp filter and cgroups.
+
+ Further information about eBPF can be found in
+ Documentation/networking/filter.txt
+
+ If you are unsure how to answer this question, answer Y.
-- 
2.9.3

[RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles

2016-09-14 Thread Mickaël Salaün

This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of
  elements instead of CONST_PTR_TO_MAP (e.g.
  CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* forced sequential filling (i.e. replace or append-only update), which
  allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map
can be passed to an eBPF function. For example, Landlock uses it to store
and manage kernel objects (e.g. struct file) instead of dealing with
userland raw data. This improves efficiency and ensures that an eBPF
program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
updating a map entry (handle). These handle types are used to infer a
high-level arraymap type, which is listed in enum bpf_map_array_type
(e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by Landlock LSM (cf. next
commits) but it could be useful for other needs.
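
The two properties described above (a map-wide type inferred from the handle type, and replace-or-append updates) can be sketched in user space as follows. This is an illustration under assumptions, not the code of this patch: every demo_* name, the fixed DEMO_MAX_ENTRIES bound and the chosen error codes are hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-ins for the handle and array type enums. */
enum demo_handle_type { DEMO_HANDLE_UNSPEC, DEMO_HANDLE_FS_FD };
enum demo_array_type { DEMO_ARRAY_UNSPEC, DEMO_ARRAY_FS };

#define DEMO_MAX_ENTRIES 4

struct demo_handle {
	enum demo_handle_type type;
	int fd;
};

struct demo_handle_array {
	enum demo_array_type type;	/* inferred from the stored handles */
	unsigned int n_entries;		/* entries [0, n_entries) are filled */
	struct demo_handle entries[DEMO_MAX_ENTRIES];
};

/* Map a low-level handle type to the high-level array type. */
static enum demo_array_type demo_infer_array_type(enum demo_handle_type t)
{
	switch (t) {
	case DEMO_HANDLE_FS_FD:
		return DEMO_ARRAY_FS;
	default:
		return DEMO_ARRAY_UNSPEC;
	}
}

/* Replace an existing entry or append right after the last one. */
static int demo_handle_array_update(struct demo_handle_array *arr,
				    unsigned int index, struct demo_handle h)
{
	enum demo_array_type t = demo_infer_array_type(h.type);

	if (t == DEMO_ARRAY_UNSPEC)
		return -EINVAL;		/* unknown handle type */
	if (index >= DEMO_MAX_ENTRIES)
		return -E2BIG;
	if (index > arr->n_entries)
		return -EINVAL;		/* would leave a hole */
	if (arr->type != DEMO_ARRAY_UNSPEC && arr->type != t)
		return -EINVAL;		/* would mix array types */

	arr->type = t;
	arr->entries[index] = h;
	if (index == arr->n_entries)
		arr->n_entries++;
	return 0;
}
```

Because entries are always contiguous, a consumer can iterate from 0 to n_entries without testing for empty slots, which is the "quick browsing" benefit mentioned above.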

Changes since v2:
* add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
  handle entries (suggested by Andy Lutomirski)
* remove useless checks

Changes since v1:
* arraymap of handles replaces custom checker groups
* simpler userland API

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: Kees Cook 
Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
---
 include/linux/bpf.h  |  14 
 include/uapi/linux/bpf.h |  18 +
 kernel/bpf/arraymap.c| 203 +++
 kernel/bpf/verifier.c|  12 ++-
 4 files changed, 246 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fa9a988400d9..eae4ce4542c1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -13,6 +13,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include  /* struct file */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct perf_event;
 struct bpf_map;
 
@@ -38,6 +42,7 @@ struct bpf_map_ops {
 struct bpf_map {
atomic_t refcnt;
enum bpf_map_type map_type;
+   enum bpf_map_array_type map_array_type;
u32 key_size;
u32 value_size;
u32 max_entries;
@@ -187,6 +192,9 @@ struct bpf_array {
 */
enum bpf_prog_type owner_prog_type;
bool owner_jited;
+#ifdef CONFIG_SECURITY_LANDLOCK
+   u32 n_entries;  /* number of entries in a handle array */
+#endif /* CONFIG_SECURITY_LANDLOCK */
union {
char value[0] __aligned(8);
void *ptrs[0] __aligned(8);
@@ -194,6 +202,12 @@ struct bpf_array {
};
 };
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct map_landlock_handle {
+   u32 type; /* enum bpf_map_handle_type */
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #define MAX_TAIL_CALL_CNT 32
 
 struct bpf_event_entry {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7cd36166f9b7..b68de57f7ab8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -87,6 +87,15 @@ enum bpf_map_type {
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
BPF_MAP_TYPE_CGROUP_ARRAY,
+   BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+enum bpf_map_array_type {
+   BPF_MAP_ARRAY_TYPE_UNSPEC,
+};
+
+enum bpf_map_handle_type {
+   BPF_MAP_HANDLE_TYPE_UNSPEC,
 };
 
 enum bpf_prog_type {
@@ -510,4 +519,13 @@ struct xdp_md {
__u32 data_end;
 };
 
+/* Map handle entry */
+struct landlock_handle {
+   __u32 type; /* enum bpf_map_handle_type */
+   union {
+   __u32 fd;
+   __aligned_u64 glob;
+   };
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index a2ac051c342f..94256597eacd 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -16,6 +16,13 @@
 #include 
 #include 
 #include 
+#include  /* fput() */
+#include  /* struct file */
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include  /* RLIMIT_NOFILE */
+#include  /* rlimit() */
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
@@ -580,3 +587,199 @@ static int __init register_cgroup_array_map(void)
 }
 late_initcall(register_cgroup_array_map);
 #endif
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr)
+{
+   if (attr->value_size != sizeof(struct landlock_handle))
+   return ERR_PTR(-EINVAL);
+   attr->value_size = sizeof(struct map_landlock_handle);
+
+   return array_map_alloc(attr);
+}
+
+static void landlock_put_handle(struct map_landlock_handle *handle)
+{
+   enum bpf_map_handle_type handle_type = handle->type;
+
+   switch (handle_type) {
+   case BPF_MAP_HANDLE_TYPE_UNSPEC:
+

[RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context

2016-09-14 Thread Mickaël Salaün

This is a proof of concept to expose optional values that could depend
on the process access rights.

There are two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be set to allow access to
eBPF functions that read or write an skb, respectively.
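
As a rough illustration of how such access flags can be validated, here is a minimal user-space sketch. The flag values and the mask are copied from the include/uapi/linux/bpf.h hunk of this patch; the two demo_* helpers are hypothetical and only assume that the opt_skb field is gated on either skb flag being set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the include/uapi/linux/bpf.h hunk of this patch. */
#define LANDLOCK_FLAG_ACCESS_UPDATE	(1 << 0)
#define LANDLOCK_FLAG_ACCESS_DEBUG	(1 << 1)
#define LANDLOCK_FLAG_ACCESS_SKB_READ	(1 << 2)
#define LANDLOCK_FLAG_ACCESS_SKB_WRITE	(1 << 3)
#define _LANDLOCK_FLAG_ACCESS_MASK	((1ULL << 4) - 1)

/* Hypothetical helper: reject any bit outside the known mask. */
static bool demo_access_flags_valid(uint64_t access)
{
	return (access & ~_LANDLOCK_FLAG_ACCESS_MASK) == 0;
}

/* Hypothetical helper: is the optional skb field reachable at all? */
static bool demo_may_access_skb(uint64_t access)
{
	return (access & (LANDLOCK_FLAG_ACCESS_SKB_READ |
			  LANDLOCK_FLAG_ACCESS_SKB_WRITE)) != 0;
}
```

Growing the mask from 2 to 4 bits in the same patch that adds the two flags keeps the "unknown bit" check in step with the set of defined flags.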

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Sargun Dhillon 
---
 include/linux/bpf.h  |  2 ++
 include/uapi/linux/bpf.h |  7 ++-
 kernel/bpf/verifier.c|  6 ++
 security/landlock/lsm.c  | 26 ++
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7325c17f720..218973777612 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -88,6 +88,7 @@ enum bpf_arg_type {
 
ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,/* pointer to Landlock FS handle */
+   ARG_PTR_TO_STRUCT_SKB,  /* pointer to struct skb */
 };
 
 /* type of values returned from helper functions */
@@ -150,6 +151,7 @@ enum bpf_reg_type {
/* Landlock */
PTR_TO_STRUCT_FILE,
CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+   PTR_TO_STRUCT_SKB,
 };
 
 struct bpf_prog;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8cfc2de2ab76..7d9e56952ed9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -586,7 +586,9 @@ enum landlock_hook_id {
 /* context of function access flags */
 #define LANDLOCK_FLAG_ACCESS_UPDATE(1 << 0)
 #define LANDLOCK_FLAG_ACCESS_DEBUG (1 << 1)
-#define _LANDLOCK_FLAG_ACCESS_MASK ((1ULL << 2) - 1)
+#define LANDLOCK_FLAG_ACCESS_SKB_READ  (1 << 2)
+#define LANDLOCK_FLAG_ACCESS_SKB_WRITE (1 << 3)
+#define _LANDLOCK_FLAG_ACCESS_MASK ((1ULL << 4) - 1)
 
 /* Handle check flags */
 #define LANDLOCK_FLAG_FS_DENTRY(1 << 0)
@@ -619,12 +621,15 @@ struct landlock_handle {
  * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
  *description and the LANDLOCK_HOOK* definitions from
  *security/landlock/lsm.c for their types.
+ * @opt_skb: optional skb pointer, accessible with the
+ *   LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
  */
 struct landlock_data {
__u32 hook; /* enum landlock_hook_id */
__u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
__u16 cookie; /* seccomp RET_LANDLOCK */
__u64 args[6];
+   __u64 opt_skb;
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8d7b18574f5a..a95154c1a60f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -247,6 +247,7 @@ static const char * const reg_type_str[] = {
[PTR_TO_PACKET_END] = "pkt_end",
[PTR_TO_STRUCT_FILE]= "struct_file",
[CONST_PTR_TO_LANDLOCK_HANDLE_FS] = "landlock_handle_fs",
+   [PTR_TO_STRUCT_SKB] = "struct_skb",
 };
 
 static void print_verifier_state(struct verifier_state *state)
@@ -559,6 +560,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
case CONST_PTR_TO_MAP:
case PTR_TO_STRUCT_FILE:
case CONST_PTR_TO_LANDLOCK_HANDLE_FS:
+   case PTR_TO_STRUCT_SKB:
return true;
default:
return false;
@@ -984,6 +986,10 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
expected_type = CONST_PTR_TO_LANDLOCK_HANDLE_FS;
if (type != expected_type)
goto err_type;
+   } else if (arg_type == ARG_PTR_TO_STRUCT_SKB) {
+   expected_type = PTR_TO_STRUCT_SKB;
+   if (type != expected_type)
+   goto err_type;
} else if (arg_type == ARG_PTR_TO_STACK ||
   arg_type == ARG_PTR_TO_RAW_STACK) {
expected_type = PTR_TO_STACK;
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index 56c45abe979c..8b0e6f0eb6b7 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -281,6 +281,7 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
break;
case offsetof(struct landlock_data, args[0]) ...
offsetof(struct landlock_data, args[5]):
+   case offsetof(struct landlock_data, opt_skb):
expected_size = sizeof(__u64);
break;
default:
@@ -299,6 +300,13 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type,
if (*reg_type == NOT_INIT)
return false;
break;
+   case offsetof(struct landlock_data, opt_skb):
+   if (!(prog_subtype->landlock_hook.access &
+   (LANDLOCK_FLAG_ACCESS_SKB_READ |
+

[RFC v3 13/22] bpf/cgroup: Replace struct bpf_prog with union bpf_object

2016-09-14 Thread Mickaël Salaün

This allows CONFIG_CGROUP_BPF to manage different types of pointers
instead of only eBPF programs. This will be useful for the next commits
to support Landlock with cgroups.
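
The idea can be sketched in user space as below. This is a simplified model under assumptions, not the kernel code: the demo_* names are hypothetical, there is no RCU or locking, and the walk over descendant cgroups is left out. It only shows slots typed as a union with a single member today, and the rule that a cgroup without its own pinned object falls back to its parent's effective one.

```c
#include <stddef.h>

#define DEMO_MAX_ATTACH_TYPE 2

struct demo_prog {
	int id;
};

/* One member for now; other pointer kinds could be added later. */
union demo_object {
	struct demo_prog *prog;
};

struct demo_cgroup_bpf {
	union demo_object pinned[DEMO_MAX_ATTACH_TYPE];
	union demo_object effective[DEMO_MAX_ATTACH_TYPE];
};

/* Clear every slot. */
static void demo_cgroup_bpf_init(struct demo_cgroup_bpf *cg)
{
	unsigned int type;

	for (type = 0; type < DEMO_MAX_ATTACH_TYPE; type++) {
		cg->pinned[type].prog = NULL;
		cg->effective[type].prog = NULL;
	}
}

/*
 * Pin prog (possibly NULL) on cg. Without a pinned program, the
 * effective one is inherited from the parent when there is one.
 */
static void demo_cgroup_bpf_update(struct demo_cgroup_bpf *cg,
				   const struct demo_cgroup_bpf *parent,
				   struct demo_prog *prog, unsigned int type)
{
	union demo_object obj;

	obj.prog = prog;
	cg->pinned[type] = obj;
	cg->effective[type].prog = (!obj.prog && parent) ?
		parent->effective[type].prog : obj.prog;
}
```

Accessing the slot through a named member (.prog) makes every user state which kind of object it expects, so adding a second member later does not silently change existing code paths.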

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Tejun Heo 
---
 include/linux/bpf-cgroup.h |  8 ++--
 kernel/bpf/cgroup.c| 44 +++-
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index fc076de74ab9..2234042d7f61 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -14,14 +14,18 @@ struct sk_buff;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+union bpf_object {
+   struct bpf_prog *prog;
+};
+
 struct cgroup_bpf {
/*
 * Store two sets of bpf_prog pointers, one for programs that are
 * pinned directly to this cgroup, and one for those that are effective
 * when this cgroup is accessed.
 */
-   struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
-   struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
+   union bpf_object pinned[MAX_BPF_ATTACH_TYPE];
+   union bpf_object effective[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 21d168c3ad35..782878ec4f2d 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -20,18 +20,18 @@ DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
 /**
- * cgroup_bpf_put() - put references of all bpf programs
+ * cgroup_bpf_put() - put references of all bpf objects
  * @cgrp: the cgroup to modify
  */
 void cgroup_bpf_put(struct cgroup *cgrp)
 {
unsigned int type;
 
-   for (type = 0; type < ARRAY_SIZE(cgrp->bpf.prog); type++) {
-   struct bpf_prog *prog = cgrp->bpf.prog[type];
+   for (type = 0; type < ARRAY_SIZE(cgrp->bpf.pinned); type++) {
+   union bpf_object pinned = cgrp->bpf.pinned[type];
 
-   if (prog) {
-   bpf_prog_put(prog);
+   if (pinned.prog) {
+   bpf_prog_put(pinned.prog);
static_branch_dec(&cgroup_bpf_enabled_key);
}
}
@@ -47,11 +47,12 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
unsigned int type;
 
for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
-   struct bpf_prog *e;
+   union bpf_object e;
 
-   e = rcu_dereference_protected(parent->bpf.effective[type],
- lockdep_is_held(&cgroup_mutex));
-   rcu_assign_pointer(cgrp->bpf.effective[type], e);
+   e.prog = rcu_dereference_protected(
+   parent->bpf.effective[type].prog,
+   lockdep_is_held(&cgroup_mutex));
+   rcu_assign_pointer(cgrp->bpf.effective[type].prog, e.prog);
}
 }
 
@@ -87,32 +88,33 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 struct bpf_prog *prog,
 enum bpf_attach_type type)
 {
-   struct bpf_prog *old_prog, *effective;
+   union bpf_object obj, old_pinned, effective;
struct cgroup_subsys_state *pos;
 
-   old_prog = xchg(cgrp->bpf.prog + type, prog);
+   obj.prog = prog;
+   old_pinned = xchg(cgrp->bpf.pinned + type, obj);
 
-   effective = (!prog && parent) ?
-   rcu_dereference_protected(parent->bpf.effective[type],
+   effective.prog = (!obj.prog && parent) ?
+   rcu_dereference_protected(parent->bpf.effective[type].prog,
  lockdep_is_held(&cgroup_mutex)) :
-   prog;
+   obj.prog;
 
css_for_each_descendant_pre(pos, &cgrp->self) {
struct cgroup *desc = container_of(pos, struct cgroup, self);
 
/* skip the subtree if the descendant has its own program */
-   if (desc->bpf.prog[type] && desc != cgrp)
+   if (desc->bpf.pinned[type].prog && desc != cgrp)
pos = css_rightmost_descendant(pos);
else
-   rcu_assign_pointer(desc->bpf.effective[type],
-  effective);
+   rcu_assign_pointer(desc->bpf.effective[type].prog,
+  effective.prog);
}
 
-   if (prog)
+   if (obj.prog)
static_branch_inc(&cgroup_bpf_enabled_key);
 
-   if (old_prog) {
-   bpf_prog_put(old_prog);
+   if (old_pinned.prog) {
+

[RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy

2016-09-14 Thread Mickaël Salaün

A Landlock program will be triggered according to its subtype/origin
bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
Landlock program when a seccomp filter returns RET_LANDLOCK.
Moreover, it is possible to return a 16-bit cookie which will be
readable by the Landlock programs in their context.

Only seccomp filters loaded from the same thread and before a Landlock
program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple
Landlock programs can be triggered by one or more seccomp filters. This
way, each RET_LANDLOCK (with specific cookie) will trigger all the
allowed Landlock programs once.
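
To make the cookie mechanism more concrete, here is a small user-space sketch of how a 16-bit value can travel in a seccomp return value. The two masks mirror SECCOMP_RET_ACTION and SECCOMP_RET_DATA from include/uapi/linux/seccomp.h; DEMO_RET_LANDLOCK is a placeholder action value chosen for this illustration only, not the value defined by this patch, and the demo_* helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror SECCOMP_RET_ACTION and SECCOMP_RET_DATA from the uapi header. */
#define DEMO_RET_ACTION_MASK	0x7fff0000U
#define DEMO_RET_DATA_MASK	0x0000ffffU

/* Placeholder action value: an assumption, not taken from the patch. */
#define DEMO_RET_LANDLOCK	0x00020000U

/* Build the value a filter would return to trigger Landlock programs. */
static inline uint32_t demo_ret_landlock(uint16_t cookie)
{
	return DEMO_RET_LANDLOCK | cookie;
}

/* Does this filter result ask for Landlock programs to run? */
static inline bool demo_is_landlock(uint32_t ret)
{
	return (ret & DEMO_RET_ACTION_MASK) == DEMO_RET_LANDLOCK;
}

/* Extract the 16-bit cookie carried in the data part. */
static inline uint16_t demo_cookie(uint32_t ret)
{
	return (uint16_t)(ret & DEMO_RET_DATA_MASK);
}
```

The low 16 bits are the same data field other seccomp actions already use (for example the errno value of SECCOMP_RET_ERRNO), so a cookie fits without changing the return value layout.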

Changes since v2:
* Landlock programs can now be run without seccomp filter but for any
  syscall (from the process) or interruption
* move Landlock related functions and structs into security/landlock/*
  (to manage cgroups as well)
* fix seccomp filter handling: run Landlock programs for each of their
  legitimate seccomp filters
* properly clean up all seccomp results
* cosmetic changes to ease the understanding
* fix some ifdef

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
---
 include/linux/landlock.h |  77 ++
 include/linux/seccomp.h  |  26 +
 include/uapi/linux/seccomp.h |   2 +
 kernel/fork.c|  23 +++-
 kernel/seccomp.c |  68 +++-
 security/landlock/Makefile   |   2 +-
 security/landlock/common.h   |  27 +
 security/landlock/lsm.c  |  96 -
 security/landlock/manager.c  | 242 +++
 9 files changed, 552 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/manager.c

diff --git a/include/linux/landlock.h b/include/linux/landlock.h
new file mode 100644
index ..932ae57fa70e
--- /dev/null
+++ b/include/linux/landlock.h
@@ -0,0 +1,77 @@
+/*
+ * Landlock LSM - Public headers
+ *
+ * Copyright (C) 2016  Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_LANDLOCK_H
+#define _LINUX_LANDLOCK_H
+#ifdef CONFIG_SECURITY_LANDLOCK
+
+#include  /* _LANDLOCK_HOOK_LAST */
+#include  /* atomic_t */
+
+#ifdef CONFIG_SECCOMP_FILTER
+#include  /* struct seccomp_filter */
+#endif /* CONFIG_SECCOMP_FILTER */
+
+
+#ifdef CONFIG_SECCOMP_FILTER
+struct landlock_seccomp_ret {
+   struct landlock_seccomp_ret *prev;
+   struct seccomp_filter *filter;
+   u16 cookie;
+   bool triggered;
+};
+#endif /* CONFIG_SECCOMP_FILTER */
+
+struct landlock_rule {
+   atomic_t usage;
+   struct landlock_rule *prev;
+   /*
+* List of filters (through filter->thread_prev) allowed to trigger
+* this Landlock program.
+*/
+   struct bpf_prog *prog;
+#ifdef CONFIG_SECCOMP_FILTER
+   struct seccomp_filter *thread_filter;
+#endif /* CONFIG_SECCOMP_FILTER */
+};
+
+/**
+ * struct landlock_hooks - Landlock hook programs enforced on a thread
+ *
+ * This is used for low performance impact when forking a process. Instead of
+ * copying the full array and incrementing the usage field of each entries,
+ * only create a pointer to struct landlock_hooks and increment the usage
+ * field.
+ *
+ * A new struct landlock_hooks must be created thanks to a call to
+ * new_landlock_hooks().
+ *
+ * @usage: reference count to manage the object lifetime. When a thread need to
+ * add Landlock programs and if @usage is greater than 1, then the
+ * thread must duplicate struct landlock_hooks to not change the
+ * children' rules as well.
+ */
+struct landlock_hooks {
+   atomic_t usage;
+   struct landlock_rule *rules[_LANDLOCK_HOOK_LAST];
+};
+
+
+struct landlock_hooks *new_landlock_hooks(void);
+void put_landlock_hooks(struct landlock_hooks *hooks);
+
+#ifdef CONFIG_SECCOMP_FILTER
+void put_landlock_ret(struct landlock_seccomp_ret *landlock_ret);
+int landlock_seccomp_set_hook(unsigned int flags,
+   const char __user *user_bpf_fd);
+#endif /* CONFIG_SECCOMP_FILTER */
+
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* _LINUX_LANDLOCK_H */
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ffdab7cdd162..3cb90bf43a24 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,6 +10,10 @@
 #include 
 #include 
 
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+#include 
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp_filter - container for seccomp BPF programs
  *
@@ -19,6 +23,7 @@
  * is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate

[RFC v3 00/22] Landlock LSM: Unprivileged sandboxing

2016-09-14 Thread Mickaël Salaün

ck LSM.

[1] https://lkml.kernel.org/r/1472121165-29071-1-git-send-email-...@digikod.net
[2] https://crypto.stanford.edu/cs155/papers/traps.pdf
[3] https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-dan...@zonque.org

Regards,

Mickaël Salaün (22):
  landlock: Add Kconfig
  bpf: Move u64_to_ptr() to BPF headers and inline it
  bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  bpf: Set register type according to is_valid_access()
  bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier
  landlock: Add LSM hooks
  landlock: Handle file comparisons
  seccomp: Fix documentation for struct seccomp_filter
  seccomp: Move struct seccomp_filter in seccomp.h
  seccomp: Split put_seccomp_filter() with put_seccomp()
  seccomp,landlock: Handle Landlock hooks per process hierarchy
  bpf: Cosmetic change for bpf_prog_attach()
  bpf/cgroup: Replace struct bpf_prog with union bpf_object
  bpf/cgroup: Make cgroup_bpf_update() return an error code
  bpf/cgroup: Move capability check
  bpf/cgroup,landlock: Handle Landlock hooks per cgroup
  cgroup: Add access check for cgroup_get_from_fd()
  cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
  landlock: Add interrupted origin
  landlock: Add update and debug access flags
  bpf,landlock: Add optional skb pointer in the Landlock context
  samples/landlock: Add sandbox example

 include/linux/bpf-cgroup.h |  19 +-
 include/linux/bpf.h|  45 +++-
 include/linux/cgroup-defs.h|   9 +
 include/linux/cgroup.h |   2 +-
 include/linux/filter.h |   1 +
 include/linux/landlock.h   |  86 
 include/linux/lsm_hooks.h  |   5 +
 include/linux/seccomp.h|  58 +-
 include/uapi/linux/bpf.h   | 122 +++
 include/uapi/linux/seccomp.h   |   2 +
 kernel/bpf/arraymap.c  | 226 -
 kernel/bpf/cgroup.c|  78 ---
 kernel/bpf/syscall.c   |  78 ---
 kernel/bpf/verifier.c  |  47 -
 kernel/cgroup.c|  66 +-
 kernel/fork.c  |  25 ++-
 kernel/seccomp.c   | 107 +++---
 kernel/trace/bpf_trace.c   |  12 +-
 net/core/filter.c  |  21 +-
 samples/Makefile   |   2 +-
 samples/landlock/.gitignore|   1 +
 samples/landlock/Makefile  |  16 ++
 samples/landlock/sandbox.c | 307 
 security/Kconfig   |   1 +
 security/Makefile  |   2 +
 security/landlock/Kconfig  |  23 +++
 security/landlock/Makefile |   3 +
 security/landlock/checker_fs.c | 179 
 security/landlock/checker_fs.h |  20 ++
 security/landlock/common.h |  27 +++
 security/landlock/lsm.c| 451 +
 security/landlock/manager.c| 281 +
 security/security.c|   1 +
 33 files changed, 2194 insertions(+), 129 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/lsm.c
 create mode 100644 security/landlock/manager.c

-- 
2.9.3

[RFC v3 07/22] landlock: Handle file comparisons

2016-09-14 Thread Mickaël Salaün

Add eBPF functions to compare file system access with a Landlock file
system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount
  point of the currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function allows an eBPF program to check if the currently accessed
  file is the same as, or in the hierarchy of, a reference handle.

The goal of a file system handle is to abstract kernel objects such as a
struct file or a struct inode. Userland can create this kind of handle
thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
landlock_handle containing the handle type (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
also be any description able to match a struct file or a struct inode
(e.g. path or glob string).
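
The "beneath" check and the map_op argument can be pictured with a small user-space model. This is a sketch under assumptions rather than the checker from this patch: struct demo_node stands in for a dentry with a parent pointer, the demo_* names are hypothetical, only OR and AND are modelled, and the result follows the documented convention of 0 for a match and 1 otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a dentry: each node only knows its parent. */
struct demo_node {
	const struct demo_node *parent;
};

enum demo_array_op { DEMO_OP_OR, DEMO_OP_AND };

/* True if file is ref itself or lies somewhere below it. */
static bool demo_is_beneath(const struct demo_node *file,
			    const struct demo_node *ref)
{
	for (; file; file = file->parent)
		if (file == ref)
			return true;
	return false;
}

/*
 * Compare file against n reference handles.
 * Returns 0 if the file matches according to op, 1 otherwise.
 */
static int demo_cmp_beneath(const struct demo_node **refs, size_t n,
			    enum demo_array_op op,
			    const struct demo_node *file)
{
	size_t i;

	if (n == 0)
		return 1;
	for (i = 0; i < n; i++) {
		bool hit = demo_is_beneath(file, refs[i]);

		if (op == DEMO_OP_OR && hit)
			return 0;	/* one match is enough */
		if (op == DEMO_OP_AND && !hit)
			return 1;	/* one miss is enough */
	}
	return op == DEMO_OP_AND ? 0 : 1;
}
```

An eBPF program using the real helper would pass the map of handles and the struct file taken from its context; here plain pointers play both roles.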

Changes since v2:
* add MNT_INTERNAL check to only add file handles from user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Link: https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
---
 include/linux/bpf.h|  10 +++
 include/uapi/linux/bpf.h   |  49 +++
 kernel/bpf/arraymap.c  |  21 +
 kernel/bpf/verifier.c  |   8 ++
 security/landlock/Makefile |   2 +-
 security/landlock/checker_fs.c | 179 +
 security/landlock/checker_fs.h |  20 +
 security/landlock/lsm.c|   6 ++
 8 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 36c3e482239c..f7325c17f720 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -87,6 +87,7 @@ enum bpf_arg_type {
ARG_ANYTHING,   /* any (initialized) argument is ok */
 
ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
+   ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS,/* pointer to Landlock FS handle */
 };
 
 /* type of values returned from helper functions */
@@ -148,6 +149,7 @@ enum bpf_reg_type {
 
/* Landlock */
PTR_TO_STRUCT_FILE,
+   CONST_PTR_TO_LANDLOCK_HANDLE_FS,
 };
 
 struct bpf_prog;
@@ -214,6 +216,9 @@ struct bpf_array {
 #ifdef CONFIG_SECURITY_LANDLOCK
 struct map_landlock_handle {
u32 type; /* enum bpf_map_handle_type */
+   union {
+   struct path path;
+   };
 };
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
@@ -348,6 +353,11 @@ extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_prop_with_struct_file_proto;
+extern const struct bpf_func_proto bpf_landlock_cmp_fs_beneath_with_struct_file_proto;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ad87003fe892..905dcace7255 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -92,10 +92,20 @@ enum bpf_map_type {
 
 enum bpf_map_array_type {
BPF_MAP_ARRAY_TYPE_UNSPEC,
+   BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
 };
 
 enum bpf_map_handle_type {
BPF_MAP_HANDLE_TYPE_UNSPEC,
+   BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+   /* BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB, */
+};
+
+enum bpf_map_array_op {
+   BPF_MAP_ARRAY_OP_UNSPEC,
+   BPF_MAP_ARRAY_OP_OR,
+   BPF_MAP_ARRAY_OP_AND,
+   BPF_MAP_ARRAY_OP_XOR,
 };
 
 enum bpf_prog_type {
@@ -434,6 +444,34 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_change_tail,
 
+   /**
+* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
+* Compare file system handles with a struct file
+*
+* @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY)
+* @map: handles to compare against
+* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+* @file: struct file address to compare with (taken from the context)
+*
+* Return: 0 if the file match the handles, 1 otherwise, or a negative
+* value if an error occurred.
+*/
+   BPF_FUNC_landlock_cmp_fs_prop_with_struct_file,
+
+   /**
+* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
+* Check if a struct file is a leaf of file system

[RFC v3 12/22] bpf: Cosmetic change for bpf_prog_attach()

2016-09-14 Thread Mickaël Salaün

Move code outside a switch/case to ease code factoring (cf. next
commit).

This applies on top of Daniel Mack's "Add eBPF hooks for cgroups":
https://lkml.kernel.org/r/1473696735-11269-1-git-send-email-dan...@zonque.org

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
---
 kernel/bpf/syscall.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f22e3b63d253..45a91d59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -843,23 +843,24 @@ static int bpf_prog_attach(const union bpf_attr *attr)
case BPF_CGROUP_INET_EGRESS:
prog = bpf_prog_get_type(attr->attach_bpf_fd,
 BPF_PROG_TYPE_CGROUP_SOCKET);
-   if (IS_ERR(prog))
-   return PTR_ERR(prog);
-
-   cgrp = cgroup_get_from_fd(attr->target_fd);
-   if (IS_ERR(cgrp)) {
-   bpf_prog_put(prog);
-   return PTR_ERR(cgrp);
-   }
-
-   cgroup_bpf_update(cgrp, prog, attr->attach_type);
-   cgroup_put(cgrp);
break;
 
default:
return -EINVAL;
}
 
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+
+   cgrp = cgroup_get_from_fd(attr->target_fd);
+   if (IS_ERR(cgrp)) {
+   bpf_prog_put(prog);
+   return PTR_ERR(cgrp);
+   }
+
+   cgroup_bpf_update(cgrp, prog, attr->attach_type);
+   cgroup_put(cgrp);
+
return 0;
 }
 
-- 
2.9.3

[RFC v3 15/22] bpf/cgroup: Move capability check

2016-09-14 Thread Mickaël Salaün

This will be useful to be able to add more BPF attach types with
different capability checks.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
---
 kernel/bpf/syscall.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c978f2d9a1b3..8599596fd6cf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -833,15 +833,15 @@ static int bpf_prog_attach(const union bpf_attr *attr)
struct cgroup *cgrp;
int result;
 
-   if (!capable(CAP_NET_ADMIN))
-   return -EPERM;
-
if (CHECK_ATTR(BPF_PROG_ATTACH))
return -EINVAL;
 
switch (attr->attach_type) {
case BPF_CGROUP_INET_INGRESS:
case BPF_CGROUP_INET_EGRESS:
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
prog = bpf_prog_get_type(attr->attach_bpf_fd,
 BPF_PROG_TYPE_CGROUP_SOCKET);
break;
@@ -872,15 +872,15 @@ static int bpf_prog_detach(const union bpf_attr *attr)
struct cgroup *cgrp;
int result = 0;
 
-   if (!capable(CAP_NET_ADMIN))
-   return -EPERM;
-
if (CHECK_ATTR(BPF_PROG_DETACH))
return -EINVAL;
 
switch (attr->attach_type) {
case BPF_CGROUP_INET_INGRESS:
case BPF_CGROUP_INET_EGRESS:
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
cgrp = cgroup_get_from_fd(attr->target_fd);
if (IS_ERR(cgrp))
return PTR_ERR(cgrp);
-- 
2.9.3
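
The effect of moving the check into the switch can be modelled in userspace. The sketch below is only an illustration, not kernel code: the capability bitmask, the EXAMPLE_UNPRIVILEGED attach type and check_attach() are hypothetical names invented here. It shows that each case can now apply its own policy, and that an unknown attach type yields -EINVAL regardless of the caller's capabilities.

```c
#include <errno.h>

/* Hypothetical attach types; only the first two mirror names from the patch. */
enum attach_type {
	CGROUP_INET_INGRESS,
	CGROUP_INET_EGRESS,
	EXAMPLE_UNPRIVILEGED,	/* invented for illustration */
};

/* Hypothetical capability bit standing in for capable(CAP_NET_ADMIN). */
#define CAP_BIT_NET_ADMIN (1u << 0)

/*
 * Returns 0 if the caller may attach, -EPERM if a required capability is
 * missing, -EINVAL for an unknown attach type.
 */
static int check_attach(enum attach_type type, unsigned int caller_caps)
{
	switch (type) {
	case CGROUP_INET_INGRESS:
	case CGROUP_INET_EGRESS:
		/* The capability check now lives inside the case. */
		if (!(caller_caps & CAP_BIT_NET_ADMIN))
			return -EPERM;
		return 0;
	case EXAMPLE_UNPRIVILEGED:
		/* A future attach type could apply a different policy here. */
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this layout, adding an attach type with a different capability requirement only touches its own case.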

[RFC v3 05/22] bpf,landlock: Add eBPF program subtype and is_valid_subtype() verifier

2016-09-14 Thread Mickaël Salaün

The goal of the program subtype is to allow different static,
fine-grained verifications for a single program type.

The struct bpf_verifier_ops gets a new optional function:
is_valid_subtype(). This new verifier is called at the beginning of the
eBPF program verification to check if the (optional) program subtype is
valid.

For now, only Landlock eBPF programs are using a program subtype but
this could be used by other program types in the future.

Cf. the next commit to see how the subtype is used by Landlock LSM.

Signed-off-by: Mickaël Salaün 
Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: David S. Miller 
---
 include/linux/bpf.h  |  8 ++--
 include/linux/filter.h   |  1 +
 include/uapi/linux/bpf.h |  9 +
 kernel/bpf/syscall.c |  5 +++--
 kernel/bpf/verifier.c|  9 +++--
 kernel/trace/bpf_trace.c | 12 
 net/core/filter.c| 21 +
 7 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eae4ce4542c1..9aa01d9d3d80 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -149,17 +149,21 @@ struct bpf_prog;
 
 struct bpf_verifier_ops {
/* return eBPF function prototype for verification */
-   const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+   const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+   union bpf_prog_subtype *prog_subtype);
 
/* return true if 'size' wide access at offset 'off' within bpf_context
 * with 'type' (read or write) is allowed
 */
bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-   enum bpf_reg_type *reg_type);
+   enum bpf_reg_type *reg_type,
+   union bpf_prog_subtype *prog_subtype);
 
u32 (*convert_ctx_access)(enum bpf_access_type type, int dst_reg,
  int src_reg, int ctx_off,
  struct bpf_insn *insn, struct bpf_prog *prog);
+
+   bool (*is_valid_subtype)(union bpf_prog_subtype *prog_subtype);
 };
 
 struct bpf_prog_type_list {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c521adfe..88470cdd3ee1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -406,6 +406,7 @@ struct bpf_prog {
kmemcheck_bitfield_end(meta);
u32 len;/* Number of filter blocks */
enum bpf_prog_type  type;   /* Type of BPF program */
+   union bpf_prog_subtype  subtype;/* For fine-grained verifications */
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
unsigned int(*bpf_func)(const struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b68de57f7ab8..667b6ef3ff1e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -127,6 +127,14 @@ enum bpf_attach_type {
 
 #define BPF_F_NO_PREALLOC  (1U << 0)
 
+union bpf_prog_subtype {
+   struct {
+   __u32   id; /* enum landlock_hook_id */
+   __u16   origin; /* LANDLOCK_FLAG_ORIGIN_* */
+   __aligned_u64   access; /* LANDLOCK_FLAG_ACCESS_* */
+   } landlock_hook;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32   map_type;   /* one of enum bpf_map_type */
@@ -155,6 +163,7 @@ union bpf_attr {
__u32   log_size;   /* size of user buffer */
__aligned_u64   log_buf;/* user supplied buffer */
__u32   kern_version;   /* checked when prog_type=kprobe */
+   union bpf_prog_subtype prog_subtype;/* checked when prog_type=landlock */
};
 
struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 776c752604b0..8b3f4d2b4802 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -572,7 +572,7 @@ static void fixup_bpf_calls(struct bpf_prog *prog)
continue;
}
 
-   fn = prog->aux->ops->get_func_proto(insn->imm);
+   fn = prog->aux->ops->get_func_proto(insn->imm, &prog->subtype);
/* all functions that have prototype and verifier allowed
 * programs to call them, must be real in-kernel functions
 */
@@ -710,7 +710,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
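
The optional is_valid_subtype() hook described in the commit message can be sketched in plain C. This is a simplified userspace model, not the verifier itself: the union below drops the alignment attributes of the real union bpf_prog_subtype, and check_prog(), example_is_valid_subtype(), the accepted id range and the -EINVAL return value are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for union bpf_prog_subtype from the patch. */
union prog_subtype {
	struct {
		uint32_t id;
		uint16_t origin;
		uint64_t access;
	} landlock_hook;
};

struct verifier_ops {
	/* Optional: NULL means the program type has no subtype to validate. */
	bool (*is_valid_subtype)(const union prog_subtype *subtype);
};

/* Hypothetical validator: accepts hook ids 1..3 only. */
static bool example_is_valid_subtype(const union prog_subtype *subtype)
{
	return subtype->landlock_hook.id >= 1 && subtype->landlock_hook.id <= 3;
}

/* Runs the subtype check first, before any other verification step. */
static int check_prog(const struct verifier_ops *ops,
		      const union prog_subtype *subtype)
{
	if (ops->is_valid_subtype && !ops->is_valid_subtype(subtype))
		return -EINVAL;
	/* ... the rest of the verification would follow here ... */
	return 0;
}
```

Program types that leave the callback NULL are unaffected, which matches the "optional" wording above.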

[RFC v3 14/22] bpf/cgroup: Make cgroup_bpf_update() return an error code

2016-09-14 Thread Mickaël Salaün

This will be useful to support Landlock for the next commits.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Daniel Mack 
Cc: David S. Miller 
Cc: Tejun Heo 
---
 include/linux/bpf-cgroup.h |  4 ++--
 kernel/bpf/cgroup.c|  3 ++-
 kernel/bpf/syscall.c   | 10 ++
 kernel/cgroup.c|  6 --
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2234042d7f61..6cca7924ee17 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -31,13 +31,13 @@ struct cgroup_bpf {
 void cgroup_bpf_put(struct cgroup *cgrp);
 void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 struct cgroup *parent,
 struct bpf_prog *prog,
 enum bpf_attach_type type);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
   struct bpf_prog *prog,
   enum bpf_attach_type type);
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 782878ec4f2d..7b75fa692617 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -83,7 +83,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  *
  * Must be called with cgroup_mutex held.
  */
-void __cgroup_bpf_update(struct cgroup *cgrp,
+int __cgroup_bpf_update(struct cgroup *cgrp,
 struct cgroup *parent,
 struct bpf_prog *prog,
 enum bpf_attach_type type)
@@ -117,6 +117,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
bpf_prog_put(old_pinned.prog);
static_branch_dec(&cgroup_bpf_enabled_key);
}
+   return 0;
 }
 
 /**
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 45a91d59..c978f2d9a1b3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -831,6 +831,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
struct bpf_prog *prog;
struct cgroup *cgrp;
+   int result;
 
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -858,10 +859,10 @@ static int bpf_prog_attach(const union bpf_attr *attr)
return PTR_ERR(cgrp);
}
 
-   cgroup_bpf_update(cgrp, prog, attr->attach_type);
+   result = cgroup_bpf_update(cgrp, prog, attr->attach_type);
cgroup_put(cgrp);
 
-   return 0;
+   return result;
 }
 
 #define BPF_PROG_DETACH_LAST_FIELD attach_type
@@ -869,6 +870,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 static int bpf_prog_detach(const union bpf_attr *attr)
 {
struct cgroup *cgrp;
+   int result = 0;
 
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -883,7 +885,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
if (IS_ERR(cgrp))
return PTR_ERR(cgrp);
 
-   cgroup_bpf_update(cgrp, NULL, attr->attach_type);
+   result = cgroup_bpf_update(cgrp, NULL, attr->attach_type);
cgroup_put(cgrp);
break;
 
@@ -891,7 +893,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
return -EINVAL;
}
 
-   return 0;
+   return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 87324ce481b1..48b650a640a9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6450,15 +6450,17 @@ static __init int cgroup_namespaces_init(void)
 subsys_initcall(cgroup_namespaces_init);
 
 #ifdef CONFIG_CGROUP_BPF
-void cgroup_bpf_update(struct cgroup *cgrp,
+int cgroup_bpf_update(struct cgroup *cgrp,
   struct bpf_prog *prog,
   enum bpf_attach_type type)
 {
struct cgroup *parent = cgroup_parent(cgrp);
+   int result;
 
mutex_lock(&cgroup_mutex);
-   __cgroup_bpf_update(cgrp, parent, prog, type);
+   result = __cgroup_bpf_update(cgrp, parent, prog, type);
mutex_unlock(&cgroup_mutex);
+   return result;
 }
 #endif /* CONFIG_CGROUP_BPF */
 
-- 
2.9.3

Re: [RFC v3 17/22] cgroup: Add access check for cgroup_get_from_fd()

2016-09-14 Thread Mickaël Salaün


On 14/09/2016 09:24, Mickaël Salaün wrote:
> Add security access check for cgroup backed FD. The "cgroup.procs" file
> of the corresponding cgroup must be readable to identify the cgroup, and
> writable to prove that the current process can manage this cgroup (e.g.
> through delegation). This is similar to the check done by
> cgroup_procs_write_permission().
> 
> Signed-off-by: Mickaël Salaün 
> Cc: Alexei Starovoitov 
> Cc: Andy Lutomirski 
> Cc: Daniel Borkmann 
> Cc: Daniel Mack 
> Cc: David S. Miller 
> Cc: Kees Cook 
> Cc: Tejun Heo 
> ---
>  include/linux/cgroup.h |  2 +-
>  kernel/bpf/arraymap.c  |  2 +-
>  kernel/bpf/syscall.c   |  6 +++---
>  kernel/cgroup.c| 16 +++-
>  4 files changed, 20 insertions(+), 6 deletions(-)
...
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 48b650a640a9..3bbaf3f02ed2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6241,17 +6241,20 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
>  /**
>   * cgroup_get_from_fd - get a cgroup pointer from a fd
>   * @fd: fd obtained by open(cgroup2_dir)
> + * @access_mask: contains the permission mask
>   *
>   * Find the cgroup from a fd which should be obtained
>   * by opening a cgroup directory.  Returns a pointer to the
>   * cgroup on success. ERR_PTR is returned if the cgroup
>   * cannot be found.
>   */
> -struct cgroup *cgroup_get_from_fd(int fd)
> +struct cgroup *cgroup_get_from_fd(int fd, int access_mask)
>  {
>   struct cgroup_subsys_state *css;
>   struct cgroup *cgrp;
>   struct file *f;
> + struct inode *inode;
> + int ret;
>  
>   f = fget_raw(fd);
>   if (!f)
> @@ -6268,6 +6271,17 @@ struct cgroup *cgroup_get_from_fd(int fd)
>   return ERR_PTR(-EBADF);
>   }
>  
> + ret = -ENOMEM;
> + inode = kernfs_get_inode(f->f_path.dentry->d_sb, cgrp->procs_file.kn);

I forgot to properly move fput(f) after this line… This will be fixed.




Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks

2016-09-14 Thread Mickaël Salaün

On 14/09/2016 20:27, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün  wrote:
>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>> set for all cgroup except the root. The flag is clear when a new process
>> without the no_new_privs flags is attached to the cgroup.
>>
>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>> process, to attach a process without no_new_privs to this cgroup will
>> be denied.
> 
> Until and unless everyone can agree on a way to properly namespace,
> delegate, etc cgroups, I think that trying to add unprivileged
> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
> no-internal-tasks, etc, I just don't see how this approach can be
> viable.

As far as I can tell, the no_new_privs flag of a task is not related to
namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
the no_new_privs property of *tasks* in a cgroup. The semantics are unchanged.
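
The caching behaviour can be modelled as follows. This is a userspace sketch built only from the description quoted above; the structures, the function names, the "privileged caller" parameter and the -EPERM return value are assumptions, not the patch's actual code.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins; none of these are the kernel's real structures. */
struct task {
	bool no_new_privs;
};

struct cgroup_model {
	bool is_root;
	bool landlocked;		/* a Landlock program is attached */
	bool all_tasks_no_new_privs;	/* models CGRP_NO_NEW_PRIVS */
};

static void cgroup_model_init(struct cgroup_model *cg, bool is_root)
{
	cg->is_root = is_root;
	cg->landlocked = false;
	/* Initially set for every cgroup except the root. */
	cg->all_tasks_no_new_privs = !is_root;
}

/*
 * Attach a task. Returns 0 on success, -EPERM when an unprivileged caller
 * tries to add a task without no_new_privs to a landlocked cgroup.
 */
static int cgroup_model_attach(struct cgroup_model *cg, const struct task *t,
			       bool caller_privileged)
{
	if (!t->no_new_privs) {
		if (cg->landlocked && !caller_privileged)
			return -EPERM;
		/* The cached property no longer holds for this cgroup. */
		cg->all_tasks_no_new_privs = false;
	}
	return 0;
}
```

The flag never changes what no_new_privs means for a task; it only records whether every task attached so far had it set.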

Using cgroup is optional, any task could use the seccomp-based
landlocking instead. However, for those that want/need to manage a
security policy in a more dynamic way, using cgroups may make sense.

I thought cgroup delegation was OK in v2; isn't that the case? Do you
have some links?

> 
> Can we try to make landlock work completely independently of cgroups
> so that it doesn't get stuck and so that programs can use it without
> worrying about cgroup v1 vs v2, interactions with cgroup managers,
> cgroup managers that (supposedly?) will start migrating processes
> around piecemeal and almost certainly blowing up landlock in the
> process, etc?

This RFC handles both the cgroup and seccomp approaches in a similar way. I
don't see why building on top of cgroup v2 is a problem. Are there
security issues with delegation?

> 
> I have no problem with looking at prototypes for how landlock +
> cgroups would work, but I can't imagine the result being mergeable.
> 


Re: [RFC v3 19/22] landlock: Add interrupted origin

2016-09-14 Thread Mickaël Salaün


On 14/09/2016 20:29, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün  wrote:
>> This third origin of hook call should cover all possible trigger paths
>> (e.g. page fault). Landlock eBPF programs can then take decisions
>> accordingly.
>>
>> Signed-off-by: Mickaël Salaün 
>> Cc: Alexei Starovoitov 
>> Cc: Andy Lutomirski 
>> Cc: Daniel Borkmann 
>> Cc: Kees Cook 
>> ---
> 
> 
>>
>> +   if (unlikely(in_interrupt())) {
> 
> IMO security hooks have no business being called from interrupts.
> Aren't they all synchronous things done by tasks?  Interrupts are
> driver things.
> 
> Are you trying to check for page faults and such?

Yes, that was the idea you put in my mind. I am not sure how to deal with
this.




Re: [RFC v3 11/22] seccomp,landlock: Handle Landlock hooks per process hierarchy

2016-09-14 Thread Mickaël Salaün

On 14/09/2016 20:43, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün  wrote:
>> A Landlock program will be triggered according to its subtype/origin
>> bitfield. The LANDLOCK_FLAG_ORIGIN_SECCOMP value will trigger the
>> Landlock program when a seccomp filter will return RET_LANDLOCK.
>> Moreover, it is possible to return a 16-bit cookie which will be
>> readable by the Landlock programs in its context.
> 
> Are you envisioning that the filters will return RET_LANDLOCK most of
> the time or rarely?  If it's most of the time, then maybe this could
> be simplified a bit by unconditionally calling the landlock filter and
> letting the landlock filter access a struct seccomp_data if needed.

Exposing seccomp_data in a Landlock context may be a good idea. The main
implication is that Landlock programs may then be architecture-specific
(if dealing with data), as seccomp filters are. Another point is that it
removes any direct binding between seccomp filters and Landlock programs.
I will try this (simpler) approach.
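
To illustrate why programs that read seccomp_data become architecture-specific, consider the sketch below. It mirrors the field layout of struct seccomp_data from <linux/seccomp.h> in a local struct so that it stays self-contained; the struct name, the macro names and is_openat() are hypothetical. The numeric values are the AUDIT_ARCH_* constants and openat syscall numbers for x86_64 and arm64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the layout of struct seccomp_data from <linux/seccomp.h>. */
struct seccomp_data_model {
	int32_t nr;			/* system call number */
	uint32_t arch;			/* AUDIT_ARCH_* value */
	uint64_t instruction_pointer;
	uint64_t args[6];
};

/* Values taken from <linux/audit.h> and the per-arch syscall tables. */
#define ARCH_X86_64		0xC000003Eu	/* AUDIT_ARCH_X86_64 */
#define ARCH_AARCH64		0xC00000B7u	/* AUDIT_ARCH_AARCH64 */
#define NR_OPENAT_X86_64	257
#define NR_OPENAT_AARCH64	56

/* The same syscall has a different number on each architecture. */
static bool is_openat(const struct seccomp_data_model *sd)
{
	switch (sd->arch) {
	case ARCH_X86_64:
		return sd->nr == NR_OPENAT_X86_64;
	case ARCH_AARCH64:
		return sd->nr == NR_OPENAT_AARCH64;
	default:
		return false;	/* unknown architecture: do not guess */
	}
}
```

Any program inspecting nr or args has to branch on arch first, which is the same constraint seccomp filters already live with.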

> 
>>
>> Only seccomp filters loaded from the same thread and before a Landlock
>> program can trigger it through LANDLOCK_FLAG_ORIGIN_SECCOMP. Multiple
>> Landlock programs can be triggered by one or more seccomp filters. This
>> way, each RET_LANDLOCK (with specific cookie) will trigger all the
>> allowed Landlock programs once.
> 
> This interface seems somewhat awkward.  Should we not have a way to
> atomicaly install a whole pile of landlock filters and associated
> seccomp filter all at once?

I can change the seccomp(2) use in this way: instead of loading a
Landlock program, (atomically) load an array of Landlock programs.

However, exposing seccomp_data to Landlock programs looks like a better
way to deal with it. This does not need to manage an array of Landlock
programs.

 Mickaël


Re: [RFC v3 07/22] landlock: Handle file comparisons

2016-09-14 Thread Mickaël Salaün


On 14/09/2016 21:07, Jann Horn wrote:
> On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
> [...]
>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>> index 94256597eacd..edaab4c87292 100644
>> --- a/kernel/bpf/arraymap.c
>> +++ b/kernel/bpf/arraymap.c
>> @@ -603,6 +605,9 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
>>  enum bpf_map_handle_type handle_type = handle->type;
>>  
>>  switch (handle_type) {
>> +case BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD:
>> +path_put(&handle->path);
>> +break;
>>  case BPF_MAP_HANDLE_TYPE_UNSPEC:
>>  default:
>>  WARN_ON(1);
> [...]
>> diff --git a/security/landlock/checker_fs.c b/security/landlock/checker_fs.c
>> new file mode 100644
>> index ..39eb85dc7d18
>> --- /dev/null
>> +++ b/security/landlock/checker_fs.c
> [...]
>> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
>> +u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
>> +{
>> +u8 property = (u8) r1_property;
>> +struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +enum bpf_map_array_op map_op = r3_map_op;
>> +struct file *file = (struct file *) (unsigned long) r4_file;
>> +struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +struct path *p1, *p2;
>> +struct map_landlock_handle *handle;
>> +int i;
> 
> Please don't use int when iterating over an array, use size_t.

OK, I will use size_t.

> 
> 
>> +/* for now, only handle OP_OR */
> 
> Is "OP_OR" an appropriate name for something that ANDs the success of
> checks?
> 
> 
> [...]
>> +synchronize_rcu();
> 
> Can you put a comment here that explains what's going on?

Hmm, this should not be here.

> 
> 
>> +for (i = 0; i < array->n_entries; i++) {
>> +bool result_dentry = !(property & LANDLOCK_FLAG_FS_DENTRY);
>> +bool result_inode = !(property & LANDLOCK_FLAG_FS_INODE);
>> +bool result_device = !(property & LANDLOCK_FLAG_FS_DEVICE);
>> +bool result_mount = !(property & LANDLOCK_FLAG_FS_MOUNT);
>> +
>> +handle = (struct map_landlock_handle *)
>> +(array->value + array->elem_size * i);
>> +
>> +if (handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) {
>> +WARN_ON(1);
>> +return -EFAULT;
>> +}
>> +p1 = &handle->path;
>> +
>> +if (!result_dentry && p1->dentry == p2->dentry)
>> +result_dentry = true;
> 
> Why is this safe? As far as I can tell, this is not in an RCU read-side
> critical section (synchronize_rcu() was just called), and no lock has been
> taken. What prevents someone from removing the arraymap entry while we're
> looking at it? Am I missing something?

I will try to properly deal with RCU.




Re: [RFC v3 21/22] bpf,landlock: Add optional skb pointer in the Landlock context

2016-09-14 Thread Mickaël Salaün


On 14/09/2016 23:20, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:24:14AM +0200, Mickaël Salaün wrote:
>> This is a proof of concept to expose optional values that could depend
>> of the process access rights.
>>
>> There is two dedicated flags: LANDLOCK_FLAG_ACCESS_SKB_READ and
>> LANDLOCK_FLAG_ACCESS_SKB_WRITE. Each of them can be activated to access
>> eBPF functions manipulating a skb in a read or write way.
>>
>> Signed-off-by: Mickaël Salaün 
> ...
>>  /* Handle check flags */
>>  #define LANDLOCK_FLAG_FS_DENTRY (1 << 0)
>> @@ -619,12 +621,15 @@ struct landlock_handle {
>>   * @args: LSM hook arguments, see include/linux/lsm_hooks.h for there
>>   *description and the LANDLOCK_HOOK* definitions from
>>   *security/landlock/lsm.c for their types.
>> + * @opt_skb: optional skb pointer, accessible with the
>> + *   LANDLOCK_FLAG_ACCESS_SKB_* flags for network-related hooks.
>>   */
>>  struct landlock_data {
>>  __u32 hook; /* enum landlock_hook_id */
>>  __u16 origin; /* LANDLOCK_FLAG_ORIGIN_* */
>>  __u16 cookie; /* seccomp RET_LANDLOCK */
>>  __u64 args[6];
>> +__u64 opt_skb;
>>  };
> 
> missing something here.
> This patch doesn't make use of it.
> That's something for the future?
> How that field will be populated?
> Why make it different vs the rest or args[6] ?
> 
> 

I don't use this code; its only purpose is to show how to deal with
fine-grained privileges of Landlock programs (to allow Sargun to add his
custom helpers from Checmate). However, this optional field may be part
of args[6].




Re: [RFC v3 07/22] landlock: Handle file comparisons

2016-09-14 Thread Mickaël Salaün



On 14/09/2016 23:06, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:24:00AM +0200, Mickaël Salaün wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
>>
>> The goal of file system handle is to abstract kernel objects such as a
>> struct file or a struct inode. Userland can create this kind of handle
>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>> landlock_handle containing the handle type (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>> also be any descriptions able to match a struct file or a struct inode
>> (e.g. path or glob string).
>>
>> Changes since v2:
>> * add MNT_INTERNAL check to only add file handle from user-visible FS
>>   (e.g. no anonymous inode)
>> * replace struct file* with struct path* in map_landlock_handle
>> * add BPF protos
>> * fix bpf_landlock_cmp_fs_prop_with_struct_file()
>>
>> Signed-off-by: Mickaël Salaün 
>> Cc: Alexei Starovoitov 
>> Cc: Andy Lutomirski 
>> Cc: Daniel Borkmann 
>> Cc: David S. Miller 
>> Cc: James Morris 
>> Cc: Kees Cook 
>> Cc: Serge E. Hallyn 
>> Link: 
>> https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
> 
> thanks for keeping the links to the previous discussion.
> Long term it should help, though I worry we already at the point
> where there are too many outstanding issues to resolve before we
> can proceed with reasonable code review.
> 
>> +/*
>> + * bpf_landlock_cmp_fs_prop_with_struct_file
>> + *
>> + * Cf. include/uapi/linux/bpf.h
>> + */
>> +static inline u64 bpf_landlock_cmp_fs_prop_with_struct_file(u64 r1_property,
>> +u64 r2_map, u64 r3_map_op, u64 r4_file, u64 r5)
>> +{
>> +u8 property = (u8) r1_property;
>> +struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +enum bpf_map_array_op map_op = r3_map_op;
>> +struct file *file = (struct file *) (unsigned long) r4_file;
> 
> please use just added BPF_CALL_ macros. They will help readability of the 
> above.
> 
>> +struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +struct path *p1, *p2;
>> +struct map_landlock_handle *handle;
>> +int i;
>> +
>> +/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS is an arraymap */
>> +if (unlikely(!map)) {
>> +WARN_ON(1);
>> +return -EFAULT;
>> +}
>> +if (unlikely(!file))
>> +return -ENOENT;
>> +if (unlikely((property | _LANDLOCK_FLAG_FS_MASK) != _LANDLOCK_FLAG_FS_MASK))
>> +return -EINVAL;
>> +
>> +/* for now, only handle OP_OR */
>> +switch (map_op) {
>> +case BPF_MAP_ARRAY_OP_OR:
>> +break;
>> +case BPF_MAP_ARRAY_OP_UNSPEC:
>> +case BPF_MAP_ARRAY_OP_AND:
>> +case BPF_MAP_ARRAY_OP_XOR:
>> +default:
>> +return -EINVAL;
>> +}
>> +p2 = &file->f_path;
>> +
>> +synchronize_rcu();
> 
> that is completely broken.
> bpf programs are executing under rcu_lock.
> please enable CONFIG_PROVE_RCU and retest everything.

Thanks for the tip. I will fix this.

> 
> I would suggest for the next RFC to do minimal 7 patches up to this point
> with simple example that demonstrates the use case.
> I would avoid all unpriv stuff and all of seccomp for the next RFC as well,
> otherwise I don't think we can realistically make forward progress, since
> there are too many issues raised in the subsequent patches.

I hope we will find a common agreement about seccomp vs cgroup… I think
both approaches have their advantages, can be complementary and nicely
combined.

Unprivileged sandboxing is the main goal of Landlock. This should not be
a problem, even for privileged features, thanks to the new subtype/access.

> 
> The common part that is mergeable is prog's subtype extension to
> the verifier that can be used for better tracing and is the common
> piece of infra needed for both landlock and checmate LSMs
> (which must be one LSM anyway)

Agreed. With this RFC, the Checmate features (i.e. network helpers)
should be able to sit on top of Landlock.




Re: [RFC v3 03/22] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles

2016-09-14 Thread Mickaël Salaün


On 14/09/2016 20:51, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:23:56AM +0200, Mickaël Salaün wrote:
>> This new arraymap looks like a set and brings new properties:
>> * strong typing of entries: the eBPF functions get the array type of
>>   elements instead of CONST_PTR_TO_MAP (e.g.
>>   CONST_PTR_TO_LANDLOCK_HANDLE_FS);
>> * force sequential filling (i.e. replace or append-only update), which
>>   allow quick browsing of all entries.
>>
>> This strong typing is useful to statically check if the content of a map
>> can be passed to an eBPF function. For example, Landlock use it to store
>> and manage kernel objects (e.g. struct file) instead of dealing with
>> userland raw data. This improve efficiency and ensure that an eBPF
>> program can only call functions with the right high-level arguments.
>>
>> The enum bpf_map_handle_type list low-level types (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
>> updating a map entry (handle). This handle types are used to infer a
>> high-level arraymap type which are listed in enum bpf_map_array_type
>> (e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).
>>
>> For now, this new arraymap is only used by Landlock LSM (cf. next
>> commits) but it could be useful for other needs.
>>
>> Changes since v2:
>> * add a RLIMIT_NOFILE-based limit to the maximum number of arraymap
>>   handle entries (suggested by Andy Lutomirski)
>> * remove useless checks
>>
>> Changes since v1:
>> * arraymap of handles replace custom checker groups
>> * simpler userland API
>>
>> Signed-off-by: Mickaël Salaün 
>> Cc: Alexei Starovoitov 
>> Cc: Andy Lutomirski 
>> Cc: Daniel Borkmann 
>> Cc: David S. Miller 
>> Cc: Kees Cook 
>> Link: 
>> https://lkml.kernel.org/r/calcetrwwtiz3kztkegow24-dvhqq6lftwexh77fd2g5o71y...@mail.gmail.com
>> ---
>>  include/linux/bpf.h  |  14 
>>  include/uapi/linux/bpf.h |  18 +
>>  kernel/bpf/arraymap.c| 203 +++
>>  kernel/bpf/verifier.c|  12 ++-
>>  4 files changed, 246 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index fa9a988400d9..eae4ce4542c1 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -13,6 +13,10 @@
>>  #include 
>>  #include 
>>  
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +#include  /* struct file */
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>> +
>>  struct perf_event;
>>  struct bpf_map;
>>  
>> @@ -38,6 +42,7 @@ struct bpf_map_ops {
>>  struct bpf_map {
>>  atomic_t refcnt;
>>  enum bpf_map_type map_type;
>> +enum bpf_map_array_type map_array_type;
>>  u32 key_size;
>>  u32 value_size;
>>  u32 max_entries;
>> @@ -187,6 +192,9 @@ struct bpf_array {
>>   */
>>  enum bpf_prog_type owner_prog_type;
>>  bool owner_jited;
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +u32 n_entries;  /* number of entries in a handle array */
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>>  union {
>>  char value[0] __aligned(8);
>>  void *ptrs[0] __aligned(8);
>> @@ -194,6 +202,12 @@ struct bpf_array {
>>  };
>>  };
>>  
>> +#ifdef CONFIG_SECURITY_LANDLOCK
>> +struct map_landlock_handle {
>> +u32 type; /* enum bpf_map_handle_type */
>> +};
>> +#endif /* CONFIG_SECURITY_LANDLOCK */
>> +
>>  #define MAX_TAIL_CALL_CNT 32
>>  
>>  struct bpf_event_entry {
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 7cd36166f9b7..b68de57f7ab8 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -87,6 +87,15 @@ enum bpf_map_type {
>>  BPF_MAP_TYPE_PERCPU_ARRAY,
>>  BPF_MAP_TYPE_STACK_TRACE,
>>  BPF_MAP_TYPE_CGROUP_ARRAY,
>> +BPF_MAP_TYPE_LANDLOCK_ARRAY,
>> +};
>> +
>> +enum bpf_map_array_type {
>> +BPF_MAP_ARRAY_TYPE_UNSPEC,
>> +};
>> +
>> +enum bpf_map_handle_type {
>> +BPF_MAP_HANDLE_TYPE_UNSPEC,
>>  };
> 
> missing something. why it has to be special to have it's own
> fd array implementation?
> Please take a look how BPF_MAP_TYPE_PERF_EVENT_ARRAY, 
> BPF_MAP_TYPE_CGROUP_ARRAY and BPF_MAP_TYPE_PROG_ARRAY are done.
> The all store objects into array map that user space passes via FD.
> I think the same model should apply here.

The idea is to have multiple ways for userland to describe a resource
(e.g. an open file descriptor, a path or a glob pattern). The kernel
representation could then be a "struct path *" or dedicated types (e.g.
custom glob).

Another interesting point (that could replace
check_map_func_compatibility()) is that BPF_MAP_TYPE_LANDLOCK_ARRAY
translates to dedicated (abstract) types (instead of CONST_PTR_TO_MAP)
thanks to bpf_reg_type_from_map(). This is useful to abstract the userland
(map) interface from the kernel object(s) dealing with that type.

A third point is that BPF_MAP_TYPE_LANDLOCK_ARRAY is a kind of set. It
is optimized to quickly walk through all the elements in a sequential way.
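
The "replace or append-only" update rule and the sequential walk can be sketched as a small userspace model. It is not the arraymap code from the patch: the capacity, the struct layout, the function names and the error codes are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_ARRAY_MAX 8	/* hypothetical capacity */

struct handle_array {
	size_t n_entries;		/* number of filled slots */
	uint32_t handles[HANDLE_ARRAY_MAX];
};

/*
 * Replace an existing entry or append directly after the last one.
 * Any other index would leave a hole, so it is rejected.
 */
static int handle_array_update(struct handle_array *arr, size_t index,
			       uint32_t handle)
{
	if (index >= HANDLE_ARRAY_MAX)
		return -E2BIG;
	if (index > arr->n_entries)
		return -EINVAL;
	arr->handles[index] = handle;
	if (index == arr->n_entries)
		arr->n_entries++;
	return 0;
}

/* Walk only the filled prefix; returns 1 if handle is present, else 0. */
static int handle_array_contains(const struct handle_array *arr,
				 uint32_t handle)
{
	for (size_t i = 0; i < arr->n_entries; i++)
		if (arr->handles[i] == handle)
			return 1;
	return 0;
}
```

Because there are never holes, a lookup only visits the first n_entries slots instead of the whole allocation.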




Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks

2016-09-15 Thread Mickaël Salaün


On 15/09/2016 03:25, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün  wrote:
>>
>> On 14/09/2016 20:27, Andy Lutomirski wrote:
>>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün  wrote:
>>>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>>>> set for all cgroup except the root. The flag is clear when a new process
>>>> without the no_new_privs flags is attached to the cgroup.
>>>>
>>>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>>>> process, to attach a process without no_new_privs to this cgroup will
>>>> be denied.
>>>
>>> Until and unless everyone can agree on a way to properly namespace,
>>> delegate, etc cgroups, I think that trying to add unprivileged
>>> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>>> no-internal-tasks, etc, I just don't see how this approach can be
>>> viable.
>>
>> As far as I can tell, the no_new_privs flag of at task is not related to
>> namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
>> the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
>>
>> Using cgroup is optional, any task could use the seccomp-based
>> landlocking instead. However, for those that want/need to manage a
>> security policy in a more dynamic way, using cgroups may make sense.
>>
>> I though cgroup delegation was OK in the v2, isn't it the case? Do you
>> have some links?
>>
>>>
>>> Can we try to make landlock work completely independently of cgroups
>>> so that it doesn't get stuck and so that programs can use it without
>>> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>>> cgroup managers that (supposedly?) will start migrating processes
>>> around piecemeal and almost certainly blowing up landlock in the
>>> process, etc?
>>
>> This RFC handle both cgroup and seccomp approaches in a similar way. I
>> don't see why building on top of cgroup v2 is a problem. Is there
>> security issues with delegation?
> 
> What I mean is: cgroup v2 delegation has a functionality problem.
> Tejun says [1]:
> 
> We haven't had to face this decision because cgroup has never properly
> supported delegating to applications and the in-use setups where this
> happens are custom configurations where there is no boundary between
> system and applications and adhoc trial-and-error is good enough a way
> to find a working solution.  That wiggle room goes away once we
> officially open this up to individual applications.
> 
> Unless and until that changes, I think that landlock should stay away
> from cgroups.  Others could reasonably disagree with me.
> 
> [1] https://lkml.kernel.org/r/20160909225747.ga30...@mtj.duckdns.org
> 

I get a different impression from this message:
https://lkml.kernel.org/r/20160826155026.gd16...@mtj.duckdns.org

On 26/08/2016 17:50, Tejun Heo wrote:
> Please refer to "2-5. Delegation" of Documentation/cgroup-v2.txt.
> Delegation on v1 is broken on both core and specific controller
> behaviors and thus discouraged.  On v2, delegation should work just
> fine.

Tejun, could you please clarify if there is still a problem with cgroup
v2 delegation?

This patch only implements a cache mechanism with the CGRP_NO_NEW_PRIVS
flag. If cgroups can group processes correctly, I don't see any
(security) issue here. It's the administrator's choice to delegate a part
of the cgroup management. It's then the delegatee's responsibility to
correctly put processes in cgroups. This is comparable to a process
which is responsible for correctly calling seccomp(2).
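
To make the cache semantics discussed in this thread concrete, here is a
rough userspace model of the rule from the patch description. It is only a
sketch: the model_* names and the "privileged" boolean are hypothetical
stand-ins and none of this is the kernel code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */
struct model_cgroup {
	bool no_new_privs;	/* models the CGRP_NO_NEW_PRIVS cache bit */
	bool landlocked;	/* a Landlock rule is tied to this cgroup */
};

struct model_task {
	bool no_new_privs;	/* the task's own no_new_privs flag */
	bool privileged;	/* assumption: stands in for a capability check */
};

/* The flag is initially set for every cgroup except the root. */
static void model_cgroup_init(struct model_cgroup *cg, bool is_root)
{
	assert(cg != NULL);
	cg->no_new_privs = !is_root;
	cg->landlocked = false;
}

/*
 * Attach @task to @cg on behalf of @attacher.
 * Returns 0 on success, -1 if the attach is denied.
 */
static int model_attach(struct model_cgroup *cg,
			const struct model_task *attacher,
			const struct model_task *task)
{
	assert(cg != NULL && attacher != NULL && task != NULL);
	if (!task->no_new_privs) {
		/* Unprivileged attach of a non-NNP task to a landlocked cgroup. */
		if (cg->landlocked && !attacher->privileged)
			return -1;
		/* The cache no longer holds: clear the flag. */
		cg->no_new_privs = false;
	}
	return 0;
}
```

In this model the cgroup flag never grants anything by itself; it only
records whether every task attached so far had no_new_privs set.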

 Mickaël




Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks

2016-09-15 Thread Mickaël Salaün


On 15/09/2016 06:48, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
>>  wrote:
>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>>>>  wrote:
>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>
>>>>>>>>> This RFC handles both cgroup and seccomp approaches in a similar way. I
>>>>>>>>> don't see why building on top of cgroup v2 is a problem. Are there
>>>>>>>>> security issues with delegation?
>>>>>>>>
>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>> Tejun says [1]:
>>>>>>>>
>>>>>>>> We haven't had to face this decision because cgroup has never properly
>>>>>>>> supported delegating to applications and the in-use setups where this
>>>>>>>> happens are custom configurations where there is no boundary between
>>>>>>>> system and applications and adhoc trial-and-error is good enough a way
>>>>>>>> to find a working solution.  That wiggle room goes away once we
>>>>>>>> officially open this up to individual applications.
>>>>>>>>
>>>>>>>> Unless and until that changes, I think that landlock should stay away
>>>>>>>> from cgroups.  Others could reasonably disagree with me.
>>>>>>>
>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf are not for security
>>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry points.
>>>>>>> Please see the checmate examples for how it's used.
>>>>>>>
>>>>>>
>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>> until the cgroup situation settles down a lot.
>>>>>
>>>>> ahh. yes. we're perfectly in agreement here.
>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>> don't have one to one relationship, so mixing them up is only
>>>>> asking for trouble further down the road.
>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>> with passing whatever information.
>>>>>
>>>>
>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>> of the boxes for safely letting unprivileged programs sandbox
>>>> themselves.
>>>
>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>> sure, that's reusable.
>>>
>>>> Furthermore, to the extent that there are use cases for
>>>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>>>> hierarchy, I suspect that syscall filters have exactly the same
>>>> problem and that we should fix seccomp to cover it.
>>>
>>> not sure what you mean by 'seccomp hierarchy'. The normal process
>>> hierarchy ?
>>
>> Kind of.  I mean the filter layers that are inherited across fork(),
>> the TSYNC mechanism, etc.
>>
>>> imo the main deficiency of seccomp is inability to look into arguments.
>>> One can argue that it's a blessing, since composite args
>>> are not yet copied into the kernel memory.
>>> But in a lot of cases the seccomp arguments are FDs pointing
>>> to kernel objects and if programs could examine those objects
>>> the sandboxing scope would be more precise.
>>> lsm+bpf solves that part and I'd still argue that it's
>>> orthogonal to seccomp's pass/reject flow.
>>> I mean if seccomp says 'ok' the syscall should continue executing
>>> as normal and whatever LSM hooks were triggered by it may have
>>> their own lsm+bpf verdicts.
>>
>> I agree with all of this...
>>
>>> Furthermore in the process hierarchy different children
>>> should be able to set their own lsm+bpf filters that are not
>>> related to parallel seccomp+bpf hierarchy of programs.
>>> seccomp syscall can be an interface to attach programs
>>> to lsm hooks, but nothing more than that.
>>
>> I'm not sure what you mean.  I mean that, logically, I think we should
>> be able to do:
>>
>> seccomp(attach a syscall filter);
>> fork();
>> child does seccomp(attach some lsm filters);
>>
>> I think that they *should* be related to the seccomp+bpf hierarchy of
>> programs in that they are entries in the same logical list of filter
>> layers installed.  Some of those layers can be syscall filters and
>> some of the layers can be lsm filters.  If we subsequently add a way
>> to attach a removable seccomp filter or a way to attach a seccomp
>> filter

[RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-08-25 Thread Mickaël Salaün

ser can already use seccomp-filter to whitelist a set of syscalls to
reduce the kernel attack surface for a set of processes. However an
unprivileged user can't create a security policy as the root user can thanks to
SELinux and other access control LSMs. Landlock allows any unprivileged user to
protect their data from being accessed by any process they run, except for an
identified subset. User tools can be created to help create such a high-level
access control policy. This policy may not be powerful enough to express the
same policies as the current access control LSMs, because of the threat an
unprivileged user can be to the system, but it should be enough for most
use-cases (e.g. blacklist or whitelist a set of file hierarchies).


## Can Landlock limit network access or other resources?

Limiting network access is obviously in the scope of Landlock but it is not yet
implemented. The main goal now is to get feedback about the whole concept, the
API and the file access control part. More access control types could be
implemented in the future.


## Why use the seccomp(2) syscall?

Landlock uses the same semantics as seccomp to apply access rule restrictions.
It adds a new layer of security for the current process which is inherited by
its children. It makes sense to use a unique access-restricting syscall (that
should be allowed by seccomp-filter rules) which can only drop privileges.
Moreover, a Landlock eBPF program could come from outside a process (e.g.
passed through a UNIX socket). It is then useful to differentiate the
creation/load of Landlock eBPF programs via bpf(2), from rule enforcing via
seccomp(2).
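
The layering semantics described above (rules stack, children inherit them,
and a new layer can only remove access) can be sketched with a small
userspace model. This is an illustration under stated assumptions, not the
seccomp or Landlock implementation: the model_* names, the request bits and
the two sample rules are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LAYERS 8

/* Hypothetical request bits, for illustration only. */
#define MODEL_REQ_READ  (1 << 0)
#define MODEL_REQ_WRITE (1 << 1)
#define MODEL_REQ_EXEC  (1 << 2)

/* A rule returns 0 to allow the request and nonzero to deny it. */
typedef int (*model_rule_fn)(int request);

struct model_task {
	model_rule_fn layers[MODEL_MAX_LAYERS];
	size_t n_layers;
};

static int model_rule_no_write(int request)
{
	return (request & MODEL_REQ_WRITE) ? 1 : 0;
}

static int model_rule_no_exec(int request)
{
	return (request & MODEL_REQ_EXEC) ? 1 : 0;
}

/* Append a layer; there is deliberately no way to remove one. */
static int model_add_layer(struct model_task *t, model_rule_fn rule)
{
	assert(t != NULL && rule != NULL);
	if (t->n_layers >= MODEL_MAX_LAYERS)
		return -1;
	t->layers[t->n_layers++] = rule;
	return 0;
}

/* A child starts with a copy of all of its parent's layers. */
static struct model_task model_fork(const struct model_task *parent)
{
	assert(parent != NULL);
	return *parent;
}

/* Every layer must allow the request: 0 if allowed, -1 if denied. */
static int model_check(const struct model_task *t, int request)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->n_layers; i++)
		if (t->layers[i](request) != 0)
			return -1;
	return 0;
}
```

Because the verdict is the conjunction of all layers, adding a layer in a
child never widens what the child may do, and never affects the parent.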


# Differences from the RFC v1

* focus on the LSM hooks, not the syscalls:
  * much simpler implementation
  * does not need audit cache tricks to avoid race conditions
  * simpler to use and more generic because it uses the LSM hook abstraction
    directly
  * more efficient because checks only happen in LSM hooks
  * architecture agnostic
* switch from cBPF to eBPF:
  * new eBPF program types dedicated to Landlock
  * custom functions used by the eBPF program
  * gain some new features (e.g. 10 registers, can load values of different
    size, LLVM translator) but only a few functions allowed and a dedicated
    map type
  * new context: LSM hook ID, cookie and LSM hook arguments
  * need to set the sysctl kernel.unprivileged_bpf_disabled to 0 (default
    value) to be able to load hook filters as unprivileged users
* smaller and simpler:
  * no more checker groups but dedicated arraymap of handles
  * simpler userland structs thanks to eBPF functions
* distinctive name: Landlock


[1] https://lkml.kernel.org/r/1458784008-16277-1-git-send-email-...@digikod.net
[2] https://crypto.stanford.edu/cs155/papers/traps.pdf


This series can be applied on Linux 4.7 and be tested with
CONFIG_SECURITY_LANDLOCK and CONFIG_CGROUPS. I would really appreciate
constructive comments on the usability, architecture, code and userland API of
Landlock LSM.

Regards,

Mickaël Salaün (10):
  landlock: Add Kconfig
  bpf: Move u64_to_ptr() to BPF headers and inline it
  bpf,landlock: Add a new arraymap type to deal with (Landlock) handles
  seccomp: Split put_seccomp_filter() with put_seccomp()
  seccomp: Handle Landlock
  landlock: Add LSM hooks
  landlock: Add errno check
  landlock: Handle file system comparisons
  landlock: Handle cgroups
  samples/landlock: Add sandbox example

 include/linux/bpf.h   |  41 +
 include/linux/lsm_hooks.h |   5 +
 include/linux/seccomp.h   |  54 ++-
 include/uapi/asm-generic/errno-base.h |   1 +
 include/uapi/linux/bpf.h  | 103 
 include/uapi/linux/seccomp.h  |   2 +
 kernel/bpf/arraymap.c | 222 +
 kernel/bpf/syscall.c  |  18 ++-
 kernel/bpf/verifier.c |  32 +++-
 kernel/fork.c |  41 -
 kernel/seccomp.c  | 211 +++-
 samples/Makefile  |   2 +-
 samples/landlock/.gitignore   |   1 +
 samples/landlock/Makefile |  16 ++
 samples/landlock/sandbox.c| 295 ++
 security/Kconfig  |   1 +
 security/Makefile |   2 +
 security/landlock/Kconfig |  19 +++
 security/landlock/Makefile|   3 +
 security/landlock/checker_cgroup.c|  96 +++
 security/landlock/checker_cgroup.h|  18 +++
 security/landlock/checker_fs.c| 183 +
 security/landlock/checker_fs.h|  20 +++
 security/landlock/lsm.c   | 228 ++
 security/security.c   |   1 +
 25 files changed, 1592 insertions(+), 23 deletions(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c
 create

[RFC v2 09/10] landlock: Handle cgroups

2016-08-25 Thread Mickaël Salaün

Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
to compare the current process cgroup with a cgroup handle. The handle
matches if the current cgroup is the same as, or beneath, the cgroup the
handle refers to. This allows making rules conditional on the current
cgroup.

A cgroup handle is a map entry created from a file descriptor referring
to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.

An unprivileged process can create and manipulate cgroups thanks to
cgroup delegation.
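
As an illustration of the "same or beneath" comparison, here is a minimal
userspace sketch. It assumes a cgroup can be modelled as a node with a parent
pointer; struct model_cgroup and the model_* functions are hypothetical and
are not the code in security/landlock/checker_cgroup.c. The return
convention follows the helper documentation in this patch (0 on match, 1
otherwise, negative on error).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: a cgroup is a node that knows its parent. */
struct model_cgroup {
	const struct model_cgroup *parent;	/* NULL for the root */
};

static void model_cgroup_init(struct model_cgroup *cg,
			      const struct model_cgroup *parent)
{
	assert(cg != NULL);
	cg->parent = parent;
}

/*
 * Returns 0 if @cur is @handle itself or one of its descendants,
 * 1 otherwise, and a negative value on invalid arguments.
 */
static int model_cmp_cgroup_beneath(const struct model_cgroup *cur,
				    const struct model_cgroup *handle)
{
	if (cur == NULL || handle == NULL)
		return -1;
	/* Walk from the current cgroup up to the root. */
	for (; cur != NULL; cur = cur->parent)
		if (cur == handle)
			return 0;
	return 1;
}
```

With a hierarchy root -> a -> a1, a task in a1 matches a handle on a, while a
task in a does not match a handle on a1.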

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Alexei Starovoitov 
Cc: James Morris 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h|  8 
 include/uapi/linux/bpf.h   | 15 ++
 kernel/bpf/arraymap.c  | 30 
 kernel/bpf/verifier.c  |  6 +++
 security/landlock/Kconfig  |  3 ++
 security/landlock/Makefile |  2 +-
 security/landlock/checker_cgroup.c | 96 ++
 security/landlock/checker_cgroup.h | 18 +++
 security/landlock/lsm.c|  8 
 9 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/checker_cgroup.c
 create mode 100644 security/landlock/checker_cgroup.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 79014aedbea4..9e6786e7a40a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -14,6 +14,9 @@
 
 #ifdef CONFIG_SECURITY_LANDLOCK
 #include  /* struct file */
+#ifdef CONFIG_CGROUPS
+#include  /* struct cgroup_subsys_state */
+#endif /* CONFIG_CGROUPS */
 #endif /* CONFIG_SECURITY_LANDLOCK */
 
 struct bpf_map;
@@ -85,6 +88,7 @@ enum bpf_arg_type {
ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */
ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS, /* pointer to Landlock FS handle */
+   ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP, /* pointer to Landlock cgroup handle */
 };
 
 /* type of values returned from helper functions */
@@ -148,6 +152,7 @@ enum bpf_reg_type {
PTR_TO_STRUCT_FILE,
PTR_TO_STRUCT_CRED,
CONST_PTR_TO_LANDLOCK_HANDLE_FS,
+   CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP,
 };
 
 struct bpf_prog;
@@ -212,6 +217,9 @@ struct map_landlock_handle {
u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
union {
struct file *file;
+#ifdef CONFIG_CGROUPS
+   struct cgroup_subsys_state *css;
+#endif /* CONFIG_CGROUPS */
};
 };
 #endif /* CONFIG_SECURITY_LANDLOCK */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 88af79dd668c..7f60b9fdb35c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -90,12 +90,14 @@ enum bpf_map_type {
 enum bpf_map_array_type {
BPF_MAP_ARRAY_TYPE_UNSPEC,
BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
+   BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP,
 };
 
 enum bpf_map_handle_type {
BPF_MAP_HANDLE_TYPE_UNSPEC,
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB,
+   BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD,
 };
 
 enum bpf_map_array_op {
@@ -364,6 +366,19 @@ enum bpf_func_id {
 */
BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,
 
+   /**
+* bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
+* Check if the current process is a leaf of cgroup handles
+*
+* @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+* @map: handles to compare against
+* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+*
+* Return: 0 if the current cgroup is the same as or beneath the handle,
+* 1 otherwise, or a negative value if an error occurred.
+*/
+   BPF_FUNC_landlock_cmp_cgroup_beneath,
+
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 6804dafd8355..050b3d8d88c8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,12 @@
 #include  /* fput() */
 #include  /* struct file */
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#ifdef CONFIG_CGROUPS
+#include  /* struct cgroup_subsys_state */
+#endif /* CONFIG_CGROUPS */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
int i;
@@ -514,6 +520,12 @@ static void landlock_put_handle(struct map_landlock_handle *handle)
else
WARN_ON(1);
break;
+   case BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD:
+   if (likely(handle->css))
+   css_put(handle->css);
+   else
+   WARN_ON(1);
+   break;
default:
WARN_ON(1);
}
@@ -541,6 +553,10 @@ stati

[RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp()

2016-08-25 Thread Mickaël Salaün

put_seccomp() now takes the task and put_seccomp_filter() takes the filter
itself. The semantics are unchanged. This will be useful for the Landlock
integration with seccomp (next commit).
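
The shape of the split can be sketched in userspace with C11 atomics. This
is a model under assumptions, not the kernel code: struct model_filter and
the model_* functions are hypothetical, malloc()/free() replace the kernel
allocators, and the put functions return the number of freed filters only so
that the behaviour is observable (the kernel functions return void).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace model of a refcounted filter chain. */
struct model_filter {
	atomic_int usage;
	struct model_filter *prev;	/* older filter, may be shared */
};

struct model_task {
	struct model_filter *filter;
};

/* New filter with one reference; it takes over the caller's ref on @prev. */
static struct model_filter *model_filter_new(struct model_filter *prev)
{
	struct model_filter *f = malloc(sizeof(*f));

	if (f == NULL)
		return NULL;
	atomic_init(&f->usage, 1);
	f->prev = prev;
	return f;
}

static void model_get_filter(struct model_filter *f)
{
	if (f != NULL)
		atomic_fetch_add(&f->usage, 1);
}

/* Filter-based put: drop one reference and free single-reference branches. */
static int model_put_filter(struct model_filter *filter)
{
	int freed = 0;

	while (filter != NULL && atomic_fetch_sub(&filter->usage, 1) == 1) {
		struct model_filter *freeme = filter;

		filter = filter->prev;
		free(freeme);
		freed++;
	}
	return freed;
}

/* Task-based wrapper, mirroring the new put_seccomp(). */
static int model_put_task(struct model_task *tsk)
{
	assert(tsk != NULL);
	return model_put_filter(tsk->filter);
}
```

Taking the filter as the argument lets callers that only hold a filter
pointer drop a reference without going through a task.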

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
---
 include/linux/seccomp.h |  5 +++--
 kernel/fork.c   |  2 +-
 kernel/seccomp.c| 18 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b2f690..29b20fe8fd4d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -83,13 +83,14 @@ static inline int seccomp_mode(struct seccomp *s)
 #endif /* CONFIG_SECCOMP */
 
 #ifdef CONFIG_SECCOMP_FILTER
-extern void put_seccomp_filter(struct task_struct *tsk);
+extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
 #else  /* CONFIG_SECCOMP_FILTER */
-static inline void put_seccomp_filter(struct task_struct *tsk)
+static inline void put_seccomp(struct task_struct *tsk)
 {
return;
 }
+
 static inline void get_seccomp_filter(struct task_struct *tsk)
 {
return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c6c88c..b23a71ec8003 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -235,7 +235,7 @@ void free_task(struct task_struct *tsk)
free_thread_stack(tsk->stack);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
-   put_seccomp_filter(tsk);
+   put_seccomp(tsk);
arch_release_task_struct(tsk);
free_task_struct(tsk);
 }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7002796f14a4..f1f475691c27 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -60,6 +60,8 @@ struct seccomp_filter {
struct bpf_prog *prog;
 };
 
+static void put_seccomp_filter(struct seccomp_filter *filter);
+
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
@@ -313,7 +315,7 @@ static inline void seccomp_sync_threads(void)
 * current's path will hold a reference.  (This also
 * allows a put before the assignment.)
 */
-   put_seccomp_filter(thread);
+   put_seccomp_filter(thread->seccomp.filter);
smp_store_release(&thread->seccomp.filter,
  caller->seccomp.filter);
 
@@ -475,10 +477,11 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
}
 }
 
-/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
-void put_seccomp_filter(struct task_struct *tsk)
+/* put_seccomp_filter - decrements the ref count of a filter */
+static void put_seccomp_filter(struct seccomp_filter *filter)
 {
-   struct seccomp_filter *orig = tsk->seccomp.filter;
+   struct seccomp_filter *orig = filter;
+
/* Clean up single-reference branches iteratively. */
while (orig && atomic_dec_and_test(&orig->usage)) {
struct seccomp_filter *freeme = orig;
@@ -487,6 +490,11 @@ void put_seccomp_filter(struct task_struct *tsk)
}
 }
 
+void put_seccomp(struct task_struct *tsk)
+{
+   put_seccomp_filter(tsk->seccomp.filter);
+}
+
 /**
  * seccomp_send_sigsys - signals the task to allow in-process syscall emulation
  * @syscall: syscall number to send to userland
@@ -926,7 +934,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
if (copy_to_user(data, fprog->filter, bpf_classic_proglen(fprog)))
ret = -EFAULT;
 
-   put_seccomp_filter(task);
+   put_seccomp_filter(task->seccomp.filter);
return ret;
 
 out:
-- 
2.8.1

[RFC v2 01/10] landlock: Add Kconfig

2016-08-25 Thread Mickaël Salaün

Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp
parts to ease the review.

Signed-off-by: Mickaël Salaün 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---
 security/Kconfig  |  1 +
 security/landlock/Kconfig | 16 
 2 files changed, 17 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 176758cdfa57..be6c549dd0ca 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -124,6 +124,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..dc8328d216d7
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,16 @@
+config SECURITY_LANDLOCK
+   bool "Landlock sandbox support"
+   depends on SECURITY
+   select BPF_SYSCALL
+   select SECCOMP
+   default y
+   help
+ Landlock is a stacked LSM which allows any user to load a security policy
+ to restrict their processes (i.e. create a sandbox). The policy is a list
+ of stacked eBPF programs for some LSM hooks. Each program can do some
+ access comparison to check if an access request is legitimate.
+
+ Further information about eBPF can be found in
+ Documentation/networking/filter.txt
+
+ If you are unsure how to answer this question, answer Y.
-- 
2.8.1

[RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles

2016-08-25 Thread Mickaël Salaün

This new arraymap looks like a set and brings new properties:
* strong typing of entries: the eBPF functions get the array type of
  elements instead of CONST_PTR_TO_MAP (e.g.
  CONST_PTR_TO_LANDLOCK_HANDLE_FS);
* forced sequential filling (i.e. replace or append-only update), which
  allows quick browsing of all entries.

This strong typing is useful to statically check if the content of a map
can be passed to an eBPF function. For example, Landlock uses it to store
and manage kernel objects (e.g. struct file) instead of dealing with
userland raw data. This improves efficiency and ensures that an eBPF
program can only call functions with the right high-level arguments.

The enum bpf_map_handle_type lists low-level types (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) which are identified when
updating a map entry (handle). These handle types are used to infer a
high-level arraymap type, listed in enum bpf_map_array_type
(e.g. BPF_MAP_ARRAY_TYPE_LANDLOCK_FS).

For now, this new arraymap is only used by Landlock LSM (cf. next
commits) but it could be useful for other needs.
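
The two properties can be illustrated with a small userspace model. It is a
sketch only: the model_* names are hypothetical, and it assumes that the
array type is inferred from the first accepted handle and that a later
handle of another family is rejected, which may differ from what the patch
does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the handle and array type enums. */
enum model_handle_type {
	MODEL_HANDLE_UNSPEC,
	MODEL_HANDLE_FS_FD,
	MODEL_HANDLE_FS_GLOB,
	MODEL_HANDLE_CGROUP_FD,
};

enum model_array_type {
	MODEL_ARRAY_UNSPEC,
	MODEL_ARRAY_FS,
	MODEL_ARRAY_CGROUP,
};

#define MODEL_MAX_ENTRIES 4

struct model_array {
	enum model_array_type type;	/* inferred high-level type */
	enum model_handle_type entries[MODEL_MAX_ENTRIES];
	size_t n_entries;		/* entries are always [0, n_entries) */
};

/* Map a low-level handle type to its high-level array type. */
static enum model_array_type model_infer(enum model_handle_type h)
{
	switch (h) {
	case MODEL_HANDLE_FS_FD:
	case MODEL_HANDLE_FS_GLOB:
		return MODEL_ARRAY_FS;
	case MODEL_HANDLE_CGROUP_FD:
		return MODEL_ARRAY_CGROUP;
	default:
		return MODEL_ARRAY_UNSPEC;
	}
}

/* Replace an existing entry or append at the end; 0 on success, -1 if rejected. */
static int model_update(struct model_array *arr, size_t index,
			enum model_handle_type h)
{
	enum model_array_type t = model_infer(h);

	assert(arr != NULL);
	if (t == MODEL_ARRAY_UNSPEC)
		return -1;
	/* Sequential filling: no holes, no out-of-bounds index. */
	if (index > arr->n_entries || index >= MODEL_MAX_ENTRIES)
		return -1;
	/* Strong typing: all entries share one high-level type. */
	if (arr->type != MODEL_ARRAY_UNSPEC && arr->type != t)
		return -1;
	arr->type = t;
	arr->entries[index] = h;
	if (index == arr->n_entries)
		arr->n_entries++;
	return 0;
}
```

Since entries always occupy the range [0, n_entries), a checker can walk the
array without probing every slot up to max_entries.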

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Cc: Daniel Borkmann 
Cc: James Morris 
Cc: Kees Cook 
---
 include/linux/bpf.h  |  18 +
 include/uapi/linux/bpf.h |  18 +
 kernel/bpf/arraymap.c| 181 +++
 kernel/bpf/syscall.c |   9 ++-
 kernel/bpf/verifier.c|  12 +++-
 5 files changed, 235 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ca3742729ae7..9a5b388be099 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -12,6 +12,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include  /* struct file */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct bpf_map;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
@@ -34,6 +38,7 @@ struct bpf_map_ops {
 struct bpf_map {
atomic_t refcnt;
enum bpf_map_type map_type;
+   enum bpf_map_array_type map_array_type;
u32 key_size;
u32 value_size;
u32 max_entries;
@@ -183,12 +188,25 @@ struct bpf_array {
 */
enum bpf_prog_type owner_prog_type;
bool owner_jited;
+#ifdef CONFIG_SECURITY_LANDLOCK
+   u32 n_entries;  /* number of entries in a handle array */
+#endif /* CONFIG_SECURITY_LANDLOCK */
union {
char value[0] __aligned(8);
void *ptrs[0] __aligned(8);
void __percpu *pptrs[0] __aligned(8);
};
 };
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct map_landlock_handle {
+   u32 type;
+   union {
+   struct file *file;
+   };
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #define MAX_TAIL_CALL_CNT 32
 
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b935a2..a60eedc17d40 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,15 @@ enum bpf_map_type {
BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
+   BPF_MAP_TYPE_LANDLOCK_ARRAY,
+};
+
+enum bpf_map_array_type {
+   BPF_MAP_ARRAY_TYPE_UNSPEC,
+};
+
+enum bpf_map_handle_type {
+   BPF_MAP_HANDLE_TYPE_UNSPEC,
 };
 
 enum bpf_prog_type {
@@ -386,4 +395,13 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
 };
 
+/* Map handle entry */
+struct landlock_handle {
+   __u32 type; /* enum bpf_map_handle_type */
+   union {
+   __u32 fd;
+   __aligned_u64 glob;
+   };
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 76d5a794e426..5938b8ee475b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include  /* fput() */
+#include  /* struct file */
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
@@ -491,3 +493,182 @@ static int __init register_perf_event_array_map(void)
return 0;
 }
 late_initcall(register_perf_event_array_map);
+
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static struct bpf_map *landlock_array_map_alloc(union bpf_attr *attr)
+{
+   if (attr->value_size != sizeof(struct landlock_handle))
+   return ERR_PTR(-EINVAL);
+   attr->value_size = sizeof(struct map_landlock_handle);
+
+   return array_map_alloc(attr);
+}
+
+static void landlock_put_handle(struct map_landlock_handle *handle)
+{
+   switch (handle->type) {
+   /* TODO: add handle types */
+   default:
+   WARN_ON(1);
+   }
+   /* safeguard */
+   handle->type = BPF_MAP_HANDLE_TYPE_UNSPEC;
+}
+
+static void landlock_array_map_free(struct bpf_map *map)
+{
+   struct bpf_array *array = container_of(map, struct bpf_array, map);
+   int i;
+
+   synchronize_rcu();
+
+   for (i =

[RFC v2 06/10] landlock: Add LSM hooks

2016-08-25 Thread Mickaël Salaün

Add LSM hooks which can be used by userland through Landlock (eBPF)
programs. These programs are limited to a whitelist of functions (cf.
next commit). The eBPF program context is described by the struct
landlock_data (cf. include/uapi/linux/bpf.h):
* hook: LSM hook ID (useful when using the same program for multiple LSM
  hooks);
* cookie: the 16-bit value from the seccomp filter that triggered this
  Landlock program;
* args[6]: array of LSM hook arguments.

The LSM hook arguments can contain raw values as integers or
(unleakable) pointers. The only way to use the pointers is to pass them
to an eBPF function according to their types (e.g. the
bpf_landlock_cmp_fs_beneath_with_struct_file function can use a struct
file pointer).

For now, there are three hooks for file system access control:
* file_open;
* file_permission;
* mmap_file.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: James Morris 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h|   7 ++
 include/linux/lsm_hooks.h  |   5 ++
 include/uapi/linux/bpf.h   |  20 +
 kernel/bpf/syscall.c   |   3 +
 kernel/bpf/verifier.c  |   8 ++
 kernel/seccomp.c   |   7 +-
 security/Makefile  |   2 +
 security/landlock/Makefile |   3 +
 security/landlock/lsm.c| 211 +
 security/security.c|   1 +
 10 files changed, 265 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/lsm.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9a5b388be099..557e7efdf0cd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -81,6 +81,9 @@ enum bpf_arg_type {
 
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING,   /* any (initialized) argument is ok */
+
+   ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
+   ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */
 };
 
 /* type of values returned from helper functions */
@@ -139,6 +142,10 @@ enum bpf_reg_type {
 */
PTR_TO_PACKET,
PTR_TO_PACKET_END,   /* skb->data + headlen */
+
+   /* Landlock */
+   PTR_TO_STRUCT_FILE,
+   PTR_TO_STRUCT_CRED,
 };
 
 struct bpf_prog;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7ae397669d8b..6792ae8fb53d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1898,5 +1898,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a60eedc17d40..983d14e910ff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -102,6 +102,9 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_CLS,
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
+   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
 };
 
 #define BPF_PSEUDO_MAP_FD  1
@@ -404,4 +407,21 @@ struct landlock_handle {
};
 } __attribute__((aligned(8)));
 
+/**
+ * struct landlock_data
+ *
+ * @hook: LSM hook ID
+ * @cookie: value set by a seccomp-filter return value RET_LANDLOCK. This comes
+ *          from a trusted seccomp-bpf program: the same process that loaded
+ *          this Landlock hook program.
+ * @args: LSM hook arguments, see include/linux/lsm_hooks.h for their
+ *        description and the LANDLOCK_HOOK* definitions from
+ *        security/landlock/lsm.c for their types.
+ */
+struct landlock_data {
+   __u32 hook;
+   __u16 cookie;
+   __u64 args[6];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 32a10ef4b878..6b8bfc34c751 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,6 +719,9 @@ static int bpf_prog_load(union bpf_attr *attr)
 
switch (type) {
case BPF_PROG_TYPE_SOCKET_FILTER:
+   case BPF_PROG_TYPE_LANDLOCK_FILE_OPEN:
+   case BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION:
+   case BPF_PROG_TYPE_LANDLOCK_MMAP_FILE:
break;
default:
if (!capable(CAP_SYS_ADMIN))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c15f6cc28e00..2931e2efcc10 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -244,6 +244,8 @@ static const char * const reg_type_str[] = {
[CONST_IMM] = "imm",
[PTR_TO_PACKET] = "pkt",
[PTR_TO_PACKET_END] = "pkt_end",
+   [PTR_TO_STRUCT_FILE]= "struct_file",
+   [PTR_TO_STRUCT_C

[RFC v2 08/10] landlock: Handle file system comparisons

2016-08-25 Thread Mickaël Salaün

Add eBPF functions to compare file system access with a Landlock file
system handle:
* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
  This function compares the dentry, inode, device or mount point of the
  currently accessed file with a reference handle.
* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
  This function allows an eBPF program to check if the currently accessed
  file is the same as, or in the hierarchy of, a reference handle.

The goal of a file system handle is to abstract kernel objects such as a
struct file or a struct inode. Userland can create this kind of handle
thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
landlock_handle containing the handle type (e.g.
BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
also be any description able to match a struct file or a struct inode
(e.g. path or glob string).
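
How the map_op argument could combine per-handle results can be sketched in
userspace as follows. This is only an illustration: it compares path strings
instead of the kernel objects the patch works on, the model_* names are
hypothetical, and treating XOR as "an odd number of handles match" is an
assumption rather than something stated in this series. The return
convention follows the helper documentation (0 on match, 1 otherwise,
negative on error).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of enum bpf_map_array_op. */
enum model_map_op {
	MODEL_OP_UNSPEC,
	MODEL_OP_OR,
	MODEL_OP_AND,
	MODEL_OP_XOR,
};

/* 1 if @path is @handle itself or lies beneath it, 0 otherwise. */
static int model_is_beneath(const char *path, const char *handle)
{
	size_t n = strlen(handle);

	assert(n > 0);
	if (strncmp(path, handle, n) != 0)
		return 0;
	/* A handle ending in '/' (e.g. the root) covers every prefixed path. */
	if (handle[n - 1] == '/')
		return 1;
	/* Otherwise the match must end on a path component boundary. */
	return path[n] == '\0' || path[n] == '/';
}

/* Combine the per-handle results according to @op. */
static int model_cmp_fs_beneath(const char *const *handles, size_t n_handles,
				enum model_map_op op, const char *path)
{
	size_t i, hits = 0;

	if (handles == NULL || path == NULL || n_handles == 0)
		return -1;
	for (i = 0; i < n_handles; i++)
		hits += (size_t)model_is_beneath(path, handles[i]);

	switch (op) {
	case MODEL_OP_OR:
		return hits > 0 ? 0 : 1;
	case MODEL_OP_AND:
		return hits == n_handles ? 0 : 1;
	case MODEL_OP_XOR:
		/* Assumption: XOR means an odd number of matching handles. */
		return (hits % 2) == 1 ? 0 : 1;
	default:
		return -1;
	}
}
```

With handles on /home/user and /home/user/docs, a file at
/home/user/notes.txt matches under OR but not under AND.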

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Alexei Starovoitov 
Cc: James Morris 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h|   4 +-
 include/uapi/linux/bpf.h   |  52 +++-
 kernel/bpf/arraymap.c  |  17 +++-
 kernel/bpf/verifier.c  |   6 ++
 security/landlock/Makefile |   2 +-
 security/landlock/checker_fs.c | 183 +
 security/landlock/checker_fs.h |  20 +
 security/landlock/lsm.c|  11 ++-
 8 files changed, 288 insertions(+), 7 deletions(-)
 create mode 100644 security/landlock/checker_fs.c
 create mode 100644 security/landlock/checker_fs.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 557e7efdf0cd..79014aedbea4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_arg_type {
 
ARG_PTR_TO_STRUCT_FILE, /* pointer to struct file */
ARG_PTR_TO_STRUCT_CRED, /* pointer to struct cred */
+   ARG_CONST_PTR_TO_LANDLOCK_HANDLE_FS, /* pointer to Landlock FS handle */
 };
 
 /* type of values returned from helper functions */
@@ -146,6 +147,7 @@ enum bpf_reg_type {
/* Landlock */
PTR_TO_STRUCT_FILE,
PTR_TO_STRUCT_CRED,
+   CONST_PTR_TO_LANDLOCK_HANDLE_FS,
 };
 
 struct bpf_prog;
@@ -207,7 +209,7 @@ struct bpf_array {
 
 #ifdef CONFIG_SECURITY_LANDLOCK
 struct map_landlock_handle {
-   u32 type;
+   u32 type; /* e.g. BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD */
union {
struct file *file;
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 983d14e910ff..88af79dd668c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -89,10 +89,20 @@ enum bpf_map_type {
 
 enum bpf_map_array_type {
BPF_MAP_ARRAY_TYPE_UNSPEC,
+   BPF_MAP_ARRAY_TYPE_LANDLOCK_FS,
 };
 
 enum bpf_map_handle_type {
BPF_MAP_HANDLE_TYPE_UNSPEC,
+   BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+   BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_GLOB,
+};
+
+enum bpf_map_array_op {
+   BPF_MAP_ARRAY_OP_UNSPEC,
+   BPF_MAP_ARRAY_OP_OR,
+   BPF_MAP_ARRAY_OP_AND,
+   BPF_MAP_ARRAY_OP_XOR,
 };
 
 enum bpf_prog_type {
@@ -325,6 +335,35 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_get_tunnel_opt,
BPF_FUNC_skb_set_tunnel_opt,
+
+   /**
+* bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
+* Compare file system handles with a struct file
+*
+* @prop: properties to check against (e.g. LANDLOCK_FLAG_FS_DENTRY)
+* @map: handles to compare against
+* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+* @file: struct file address to compare with (taken from the context)
+*
+* Return: 0 if the file match the handles, 1 otherwise, or a negative
+* value if an error occurred.
+*/
+   BPF_FUNC_landlock_cmp_fs_prop_with_struct_file,
+
+   /**
+* bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
+* Check if a struct file is a leaf of file system handles
+*
+* @opt: check options (e.g. LANDLOCK_FLAG_OPT_REVERSE)
+* @map: handles to compare against
+* @map_op: which elements of the map to use (e.g. BPF_MAP_ARRAY_OP_OR)
+* @file: struct file address to compare with (taken from the context)
+*
+* Return: 0 if the file is the same or beneath the handles,
+* 1 otherwise, or a negative value if an error occurred.
+*/
+   BPF_FUNC_landlock_cmp_fs_beneath_with_struct_file,
+
__BPF_FUNC_MAX_ID,
 };
 
@@ -398,6 +437,17 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
 };
 
+/* Handle check flags */
+#define LANDLOCK_FLAG_FS_DENTRY(1 << 0)
+#define LANDLOCK_FLAG_FS_INODE (1 << 1)
+#define LANDLOCK_FLAG_FS_DEVICE(1 << 2)
+#define LANDLOCK_FLAG_FS_MOUNT (1 << 3)
+#define _LANDLOCK_FLAG_FS_MASK ((1 << 4) - 1)
+
+/* H

[RFC v2 10/10] samples/landlock: Add sandbox example

2016-08-25 Thread Mickaël Salaün

Add a basic sandbox tool to create a process isolated from some parts of
the system. This can depend on the current cgroup.

Example:

  $ mkdir /sys/fs/cgroup/sandboxed
  $ ls /home
  user1
  $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
  LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
  ./sandbox /bin/sh -i
  $ ls /home
  user1
  $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
  $ ls /home
  ls: cannot open directory '/home': Permission denied

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Alexei Starovoitov 
Cc: James Morris 
Cc: Serge E. Hallyn 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 samples/Makefile|   2 +-
 samples/landlock/.gitignore |   1 +
 samples/landlock/Makefile   |  16 +++
 samples/landlock/sandbox.c  | 295 
 4 files changed, 313 insertions(+), 1 deletion(-)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandbox.c

diff --git a/samples/Makefile b/samples/Makefile
index 2e3b523d7097..42e6a613f728 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@
 
 obj-$(CONFIG_SAMPLES)  += kobject/ kprobes/ trace_events/ livepatch/ \
   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-  configfs/ connector/ v4l/
+  configfs/ connector/ v4l/ landlock/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index ..f6c6da930a30
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandbox
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index ..d1044b2afd27
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,16 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+hostprogs-$(CONFIG_SECURITY_LANDLOCK) := sandbox
+sandbox-objs := sandbox.o
+
+always := $(hostprogs-y)
+
+HOSTCFLAGS += -I$(objtree)/usr/include
+
+# Trick to allow make to be run from this directory
+all:
+   $(MAKE) -C ../../ $$PWD/
+
+clean:
+   $(MAKE) -C ../../ M=$$PWD clean
diff --git a/samples/landlock/sandbox.c b/samples/landlock/sandbox.c
new file mode 100644
index ..86604963c30c
--- /dev/null
+++ b/samples/landlock/sandbox.c
@@ -0,0 +1,295 @@
+/*
+ * Landlock LSM - Sandbox Example
+ *
+ * Copyright (C) 2016  Mickaël Salaün 
+ *
+ * The code may be used by anyone for any purpose, and can serve as a starting
+ * point for developing a sandbox.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include  /* open() */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../tools/include/linux/filter.h"
+
+#include "../bpf/libbpf.c"
+
+#ifndef seccomp
+static int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+   errno = 0;
+   return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
+#define ARRAY_SIZE(a)  (sizeof(a) / sizeof(a[0]))
+
+static int apply_sandbox(const char **allowed_paths, int path_nb, const char 
**cgroup_paths, int cgroup_nb)
+{
+   __u32 key;
+   int i, ret = 0, map_fs = -1, map_cg = -1, offset;
+
+   /* set up the test sandbox */
+   if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+   perror("prctl(no_new_priv)");
+   return 1;
+   }
+
+   /* register a new syscall filter */
+   struct sock_filter filter0[] = {
+   /* pass a cookie containing 5 to the LSM hook filter */
+   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LANDLOCK | 5),
+   };
+   struct sock_fprog prog0 = {
+   .len = (unsigned short)ARRAY_SIZE(filter0),
+   .filter = filter0,
+   };
+   if (seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog0)) {
+   perror("seccomp(set_filter)");
+   return 1;
+   }
+
+   if (path_nb) {
+   map_fs = bpf_create_map(BPF_MAP_TYPE_LANDLOCK_ARRAY, 
sizeof(key), sizeof(struct landlock_handle), 10, 0);
+   if (map_fs < 0) {
+   fprintf(stderr, "bpf_create_map(fs");
+   perror(")");
+   return 1;
+   }
+   for (key = 0; key < path_nb; key++) {
+   int fd = open(allowed_paths[key], O_RDONLY | O_CLOEXEC);
+   if (fd < 0) {
+   fprintf(stderr, "open(fs: \"%s\"", 
allowed_paths[key]);
+   perror(")");
+   return 1;
+   }
+   struct landlock_handle handle = {
+   .type = BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD,
+   .fd = (__

[RFC v2 05/10] seccomp: Handle Landlock

2016-08-25 Thread Mickaël Salaün

A Landlock program can be triggered when a seccomp filter returns
RET_LANDLOCK. Moreover, it is possible to return a 16-bit cookie which
will be readable by the Landlock programs.

Only seccomp filters loaded from the same thread and before a Landlock
program can trigger it. Multiple Landlock programs can be triggered by
one or more seccomp filters. This way, each RET_LANDLOCK (with specific
cookie) will trigger all the allowed Landlock programs once.
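
As an illustration only (not part of this patch), the cookie encoding
described above can be sketched in user space. The two masks mirror the
existing SECCOMP_RET_ACTION and SECCOMP_RET_DATA definitions; the
RET_LANDLOCK value and the helper names are assumptions made for this
sketch.

```c
#include <stdint.h>

/* Same values as SECCOMP_RET_ACTION and SECCOMP_RET_DATA */
#define RET_ACTION_MASK	0x7fff0000U
#define RET_DATA_MASK	0x0000ffffU
/* Assumed value of the RET_LANDLOCK action proposed by this series */
#define RET_LANDLOCK	0x00070000U

/* Hypothetical helper: build a filter return value carrying a cookie */
static uint32_t landlock_ret(uint16_t cookie)
{
	return RET_LANDLOCK | cookie;
}

/* Hypothetical helper: does this return value request Landlock? */
static int is_landlock_ret(uint32_t ret)
{
	return (ret & RET_ACTION_MASK) == RET_LANDLOCK;
}

/* Hypothetical helper: extract the 16-bit cookie from the data bits */
static uint16_t landlock_cookie(uint32_t ret)
{
	return (uint16_t)(ret & RET_DATA_MASK);
}
```

With these helpers, landlock_ret(5) corresponds to the
"SECCOMP_RET_LANDLOCK | 5" return used by the sandbox sample, and
landlock_cookie() recovers the value 5 from it.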

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
---
 include/linux/seccomp.h  |  49 +++
 include/uapi/linux/seccomp.h |   2 +
 kernel/fork.c|  39 -
 kernel/seccomp.c | 190 ++-
 4 files changed, 275 insertions(+), 5 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 29b20fe8fd4d..785ccbebf687 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,33 @@
 #include 
 #include 
 
+#ifdef CONFIG_SECURITY_LANDLOCK
+#include  /* struct bpf_prog */
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 struct seccomp_filter;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct seccomp_landlock_ret {
+   struct seccomp_landlock_ret *prev;
+   /* @filter points to a @landlock_filter list */
+   struct seccomp_filter *filter;
+   u16 cookie;
+   bool triggered;
+};
+
+struct seccomp_landlock_prog {
+   atomic_t usage;
+   struct seccomp_landlock_prog *prev;
+   /*
+* List of filters (through filter->landlock_prev) allowed to trigger
+* this Landlock program.
+*/
+   struct seccomp_filter *filter;
+   struct bpf_prog *prog;
+};
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 /**
  * struct seccomp - the state of a seccomp'ed process
  *
@@ -18,6 +44,10 @@ struct seccomp_filter;
  * system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *  accessed without locking during system call entry.
+ * @landlock_filter: list of filters allowed to trigger an associated
+ *Landlock hook via a RET_LANDLOCK.
+ * @landlock_ret: stored values from a RET_LANDLOCK.
+ * @landlock_prog: list of Landlock programs.
  *
  *  @filter must only be accessed from the context of current as there
  *  is no read locking.
@@ -25,6 +55,12 @@ struct seccomp_filter;
 struct seccomp {
int mode;
struct seccomp_filter *filter;
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+   struct seccomp_filter *landlock_filter;
+   struct seccomp_landlock_ret *landlock_ret;
+   struct seccomp_landlock_prog *landlock_prog;
+#endif /* CONFIG_SECURITY_LANDLOCK */
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
@@ -85,6 +121,12 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret);
+extern struct seccomp_landlock_ret *dup_landlock_ret(
+   struct seccomp_landlock_ret *ret_orig);
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp(struct task_struct *tsk)
 {
@@ -95,6 +137,13 @@ static inline void get_seccomp_filter(struct task_struct 
*tsk)
 {
return;
 }
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+static inline void put_landlock_ret(struct seccomp_landlock_ret *landlock_ret) 
{}
+static inline struct seccomp_landlock_ret *dup_landlock_ret(
+   struct seccomp_landlock_ret *ret_orig) {}
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
 #endif /* CONFIG_SECCOMP_FILTER */
 
 #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..b4aab1c19b8a 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,6 +13,7 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT0
 #define SECCOMP_SET_MODE_FILTER1
+#define SECCOMP_SET_LANDLOCK_HOOK  2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC  1
@@ -28,6 +29,7 @@
 #define SECCOMP_RET_KILL   0x00000000U /* kill the task immediately */
 #define SECCOMP_RET_TRAP   0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO  0x00050000U /* returns an errno */
+#define SECCOMP_RET_LANDLOCK   0x00070000U /* trigger LSM evaluation */
 #define SECCOMP_RET_TRACE  0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_ALLOW  0x7fff0000U /* allow */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index b23a71ec8003..3658c1e95e03 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -369,7 +369,12 @@ static struct task_struct *dup_task_struct(struct 
task_struct

[RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it

2016-08-25 Thread Mickaël Salaün

This helper will be useful for arraymap (next commit).

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: David S. Miller 
Cc: Daniel Borkmann 
---
 include/linux/bpf.h  | 6 ++
 kernel/bpf/syscall.c | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0de4de6dd43e..ca3742729ae7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -251,6 +251,12 @@ static inline void bpf_long_memcpy(void *dst, const void 
*src, u32 size)
 
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
+
+/* helper to convert user pointers passed inside __aligned_u64 fields */
+static inline void __user *u64_to_ptr(__u64 val)
+{
+   return (void __user *) (unsigned long) val;
+}
 #else
 static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 46ecce4b79ed..d305a3ce0fa7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -247,12 +247,6 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
return map;
 }
 
-/* helper to convert user pointers passed inside __aligned_u64 fields */
-static void __user *u64_to_ptr(__u64 val)
-{
-   return (void __user *) (unsigned long) val;
-}
-
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
return -ENOTSUPP;
-- 
2.8.1

[RFC v2 07/10] landlock: Add errno check

2016-08-25 Thread Mickaël Salaün

Add a max errno value.

This is not strictly needed but should improve reliability.

Signed-off-by: Mickaël Salaün 
Cc: Arnd Bergmann 
Cc: Serge E. Hallyn 
Cc: James Morris 
Cc: Kees Cook 
---
 include/uapi/asm-generic/errno-base.h | 1 +
 security/landlock/lsm.c   | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/uapi/asm-generic/errno-base.h 
b/include/uapi/asm-generic/errno-base.h
index 65115978510f..43407a403e72 100644
--- a/include/uapi/asm-generic/errno-base.h
+++ b/include/uapi/asm-generic/errno-base.h
@@ -35,5 +35,6 @@
 #defineEPIPE   32  /* Broken pipe */
 #defineEDOM33  /* Math argument out of domain of func 
*/
 #defineERANGE  34  /* Math result not representable */
+#define_ERRNO_LAST ERANGE
 
 #endif
diff --git a/security/landlock/lsm.c b/security/landlock/lsm.c
index aa9d4a64826e..322309068066 100644
--- a/security/landlock/lsm.c
+++ b/security/landlock/lsm.c
@@ -11,7 +11,6 @@
 #include 
 #include  /* enum bpf_reg_type, struct landlock_data */
 #include 
-#include  /* MAX_ERRNO */
 #include  /* struct bpf_prog, BPF_PROG_RUN() */
 #include  /* FIELD_SIZEOF() */
 #include 
@@ -104,8 +103,9 @@ static int landlock_run_prog(__u64 args[6])
}
}
if (!ret) {
-   if (cur_ret > MAX_ERRNO)
-   ret = MAX_ERRNO;
+   /* check errno to not mess with kernel code */
+   if (cur_ret > _ERRNO_LAST)
+   ret = EPERM;
else
ret = cur_ret;
}
-- 
2.8.1

Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-08-25 Thread Mickaël Salaün

On 25/08/2016 13:05, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
> 
> Maybe I'm missing an obvious description, but: do you have a
> description of the eBPF API to landlock?  What function do you
> provide, when is it called, what functions can it call, what does the
> fancy new arraymap do, etc?
> 
> --Andy
> 

The eBPF context is described in "[RFC v2 06/10] landlock: Add LSM hooks".

The provided eBPF functions are described in "[RFC v2 08/10] landlock:
Handle file system comparisons"
(bpf_landlock_cmp_fs_prop_with_struct_file and
bpf_landlock_cmp_fs_beneath_with_struct_file) and "[RFC v2 09/10]
landlock: Handle cgroups" (bpf_landlock_cmp_cgroup_beneath). The
function descriptions are summarized in include/uapi/linux/bpf.h .

These functions can be called by an eBPF program of type
BPF_PROG_TYPE_LANDLOCK_FILE_OPEN, BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION
and BPF_PROG_TYPE_LANDLOCK_MMAP_FILE as described in "[RFC v2 06/10]
landlock: Add LSM hooks".

I tried to split the commits as much as possible to ease the review. The
"[RFC v2 10/10] samples/landlock: Add sandbox example" may help to see
the whole picture.

Hope this helps,
 Mickaël


Re: [RFC v2 08/10] landlock: Handle file system comparisons

2016-08-25 Thread Mickaël Salaün


On 25/08/2016 13:12, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
>> Add eBPF functions to compare file system access with a Landlock file
>> system handle:
>> * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
>>   This function allows to compare the dentry, inode, device or mount
>>   point of the currently accessed file, with a reference handle.
>> * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
>>   This function allows an eBPF program to check if the current accessed
>>   file is the same or in the hierarchy of a reference handle.
>>
>> The goal of file system handle is to abstract kernel objects such as a
>> struct file or a struct inode. Userland can create this kind of handle
>> thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
>> landlock_handle containing the handle type (e.g.
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
>> also be any descriptions able to match a struct file or a struct inode
>> (e.g. path or glob string).
> 
> This needs Eric's opinion.
> 
> Also, where do all the struct file *'s get stashed?  Are they
> preserved in the arraymap?  What prevents reference cycles or absurdly
> large numbers of struct files getting pinned?

Yes, the struct files are kept in the arraymap and dropped when there are
no more references to them. Currently, the limitations are the maximum
number of open file descriptors referring to an arraymap and the maximum
number of eBPF Landlock programs loaded in a process
(LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).

What kind of reference cycles do you have in mind?

It probably needs another limit for kernel object references as well.
What is the best option here? Add another static limitation or use an
existing one?

 Mickaël




Re: [RFC v2 09/10] landlock: Handle cgroups

2016-08-25 Thread Mickaël Salaün

On 25/08/2016 13:09, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle, The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> 
> Can you elaborate on why this is useful?  I.e. why not just supply
> different policies to different subtrees.

The main use case I see is to load the security policies at the start of
a user session for all processes but not enforce them right away. The
user can then keep a shell for Landlock administration tasks and lock
the other processes with a dedicated cgroup on the fly. This allows the
user to make unremovable Landlock security policies but only activate
them when needed for specific processes.

> 
> Also, how does this interact with the current cgroup v1 vs v2 mess?
> As far as I can tell, no one can even really agree on what "what
> cgroup am I in" means right now.

I tested with cgroup-v2 but indeed, it seems a bit different with
cgroup-v1 :)
Does anyone know how to handle both cases?

> 
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
> 
> What is cgroup delegation?

This is simply the action of changing the owner of cgroup sysfs files to
allow an unprivileged user to handle them (cf. Documentation/cgroup-v2.txt)

 Mickaël


Re: [RFC v2 09/10] landlock: Handle cgroups

2016-08-26 Thread Mickaël Salaün


On 26/08/2016 04:14, Alexei Starovoitov wrote:
> On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
>> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
>> to compare the current process cgroup with a cgroup handle, The handle
>> can match the current cgroup if it is the same or a child. This allows
>> to make conditional rules according to the current cgroup.
>>
>> A cgroup handle is a map entry created from a file descriptor referring
>> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
>> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
>> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
>>
>> An unprivileged process can create and manipulate cgroups thanks to
>> cgroup delegation.
>>
>> Signed-off-by: Mickaël Salaün 
> ...
>> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
>> +u64 r3_map_op, u64 r4, u64 r5)
>> +{
>> +u8 option = (u8) r1_option;
>> +struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
>> +enum bpf_map_array_op map_op = r3_map_op;
>> +struct bpf_array *array = container_of(map, struct bpf_array, map);
>> +struct cgroup *cg1, *cg2;
>> +struct map_landlock_handle *handle;
>> +int i;
>> +
>> +/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
>> +if (unlikely(!map)) {
>> +WARN_ON(1);
>> +return -EFAULT;
>> +}
>> +if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != 
>> _LANDLOCK_FLAG_OPT_MASK))
>> +return -EINVAL;
>> +
>> +/* for now, only handle OP_OR */
>> +switch (map_op) {
>> +case BPF_MAP_ARRAY_OP_OR:
>> +break;
>> +case BPF_MAP_ARRAY_OP_UNSPEC:
>> +case BPF_MAP_ARRAY_OP_AND:
>> +case BPF_MAP_ARRAY_OP_XOR:
>> +default:
>> +return -EINVAL;
>> +}
>> +
>> +synchronize_rcu();
>> +
>> +for (i = 0; i < array->n_entries; i++) {
>> +handle = (struct map_landlock_handle *)
>> +(array->value + array->elem_size * i);
>> +
>> +/* protected by the proto types, should not happen */
>> +if (unlikely(handle->type != 
>> BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
>> +WARN_ON(1);
>> +return -EFAULT;
>> +}
>> +if (unlikely(!handle->css)) {
>> +WARN_ON(1);
>> +return -EFAULT;
>> +}
>> +
>> +if (option & LANDLOCK_FLAG_OPT_REVERSE) {
>> +cg1 = handle->css->cgroup;
>> +cg2 = task_css_set(current)->dfl_cgrp;
>> +} else {
>> +cg1 = task_css_set(current)->dfl_cgrp;
>> +cg2 = handle->css->cgroup;
>> +}
>> +
>> +if (cgroup_is_descendant(cg1, cg2))
>> +return 0;
>> +}
>> +return 1;
>> +}
> 
> - please take a loook at exisiting bpf_current_task_under_cgroup and
> reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> is nothing but duplication of the code.

Oh, I didn't know about this patchset and the new helper. Indeed, it
looks a lot like mine except there is no static verification of the map
type as I did with the arraymap of handles, and no batch mode either. I
think the return value of bpf_current_task_under_cgroup is error-prone
if an eBPF program does an "if(ret)" test on the value (because of the
negative ERRNO return value). Inverting the 0 and 1 return values should
fix this (0 == success, 1 == failure, <0 == error).
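
To make the concern concrete, here is a minimal user-space sketch. The
function names are hypothetical stand-ins, not the real helpers; they
only model the two return conventions discussed above.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a helper returning 1 on match, 0 on
 * mismatch and a negative errno on error.
 */
static int check_bool_style(int match, int err)
{
	if (err)
		return -err;
	return match ? 1 : 0;
}

/*
 * Hypothetical stand-in for the inverted convention: 0 on match,
 * 1 on mismatch and a negative errno on error.
 */
static int check_cmp_style(int match, int err)
{
	if (err)
		return -err;
	return match ? 0 : 1;
}

/* Naive caller: "if (ret)" treats any non-zero value as a match */
static int naive_allow(int ret)
{
	return ret ? 1 : 0;
}

/* With the inverted convention, only an exact 0 grants access */
static int strict_allow(int ret)
{
	return ret == 0;
}
```

With the first convention, naive_allow(check_bool_style(0, EINVAL)) grants
access although the check failed with an error. With the inverted
convention, strict_allow(check_cmp_style(0, EINVAL)) denies it, so a
careless test fails closed.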


To sum up, there are four related patchsets:
* "Landlock LSM: Unprivileged sandboxing" (this series)
* "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
* "Networking cgroup controller" (Anoop Naravaram)
* "Add eBPF hooks for cgroups" (Daniel Mack)

The three other series (Sargun's, Anoop's and Daniel's) are mainly
focused on network access-control via cgroup for *containers*. As far as
I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's
goal is to empower all processes (privileged or not) to create their own
sandbox. This also means, as explained in "[RFC v2 00/10] Landlock
LSM: Unprivileged sandboxing", that there are more constraints. For example,
it is not acceptable to let a process probe the kernel memory as it
wishes. More details are in the

[PATCH v5 1/7] selftests: Make test_harness.h more generally available

2017-05-26 Thread Mickaël Salaün

The seccomp/test_harness.h file contains useful helpers to build tests.
Moving it to the selftest directory should benefit other test
components.

Keep seccomp maintainers for this file.

Changes since v1:
* rename to kselftest_harness.h (suggested by Shuah Khan)
* keep maintainers

Signed-off-by: Mickaël Salaün 
Acked-by: Kees Cook 
Acked-by: Will Drewry 
Cc: Andy Lutomirski 
Cc: Shuah Khan 
Link: 
https://lkml.kernel.org/r/CAGXu5j+8CVz8vL51DRYXqOY=xc3zuKFf=ptene88xyhzfyi...@mail.gmail.com
---
 MAINTAINERS | 1 +
 tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} | 0
 tools/testing/selftests/seccomp/seccomp_bpf.c   | 2 +-
 3 files changed, 2 insertions(+), 1 deletion(-)
 rename tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} 
(100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index f7d568b8f133..ef292b8c771d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11492,6 +11492,7 @@ F:  kernel/seccomp.c
 F: include/uapi/linux/seccomp.h
 F: include/linux/seccomp.h
 F: tools/testing/selftests/seccomp/*
+F: tools/testing/selftests/kselftest_harness.h
 K: \bsecure_computing
 K: \bTIF_SECCOMP\b
 
diff --git a/tools/testing/selftests/seccomp/test_harness.h 
b/tools/testing/selftests/kselftest_harness.h
similarity index 100%
rename from tools/testing/selftests/seccomp/test_harness.h
rename to tools/testing/selftests/kselftest_harness.h
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 03f1fa495d74..7ba94efb24fd 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -37,7 +37,7 @@
 #include 
 #include 
 
-#include "test_harness.h"
+#include "../kselftest_harness.h"
 
 #ifndef PR_SET_PTRACER
 # define PR_SET_PTRACER 0x59616d61
-- 
2.11.0

[PATCH v5 3/7] selftests/seccomp: Force rebuild according to dependencies

2017-05-26 Thread Mickaël Salaün

Rebuild the seccomp tests when kselftest_harness.h is updated.

Signed-off-by: Mickaël Salaün 
Acked-by: Kees Cook 
Cc: Andy Lutomirski 
Cc: Shuah Khan 
Cc: Will Drewry 
---
 tools/testing/selftests/seccomp/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/seccomp/Makefile 
b/tools/testing/selftests/seccomp/Makefile
index 5fa6fd2246b1..aeb0c805f3ca 100644
--- a/tools/testing/selftests/seccomp/Makefile
+++ b/tools/testing/selftests/seccomp/Makefile
@@ -4,3 +4,5 @@ LDFLAGS += -lpthread
 
 include ../lib.mk
 
+$(TEST_GEN_PROGS): seccomp_bpf.c ../kselftest_harness.h
+   $(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
-- 
2.11.0

[PATCH v5 7/7] Documentation/dev-tools: Add kselftest_harness documentation

2017-05-26 Thread Mickaël Salaün

Add ReST metadata to kselftest_harness.h to be able to include the
comments in the Sphinx documentation.

Changes since v4:
* exclude the TEST_API() changes (requested by Kees Cook)

Changes since v3:
* document macros as actual functions (suggested by Jonathan Corbet)
* remove the TEST_API() wrapper to expose the underlying macro arguments
  to the documentation tools
* move and cleanup comments

Changes since v2:
* add reference to the full documentation in the header file (suggested
  by Kees Cook)

Signed-off-by: Mickaël Salaün 
Cc: Andy Lutomirski 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Shuah Khan 
Cc: Will Drewry 
---
 Documentation/dev-tools/kselftest.rst   |  34 +++
 tools/testing/selftests/kselftest_harness.h | 415 ++--
 2 files changed, 364 insertions(+), 85 deletions(-)

diff --git a/Documentation/dev-tools/kselftest.rst 
b/Documentation/dev-tools/kselftest.rst
index 9232ce94612c..a92fa181b6cf 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -120,3 +120,37 @@ Contributing new tests (details)
executable which is not tested by default.
TEST_FILES, TEST_GEN_FILES mean it is the file which is used by
test.
+
+Test Harness
+
+
+The kselftest_harness.h file contains useful helpers to build tests.  The tests
+from tools/testing/selftests/seccomp/seccomp_bpf.c can be used as example.
+
+Example
+---
+
+.. kernel-doc:: tools/testing/selftests/kselftest_harness.h
+:doc: example
+
+
+Helpers
+---
+
+.. kernel-doc:: tools/testing/selftests/kselftest_harness.h
+:functions: TH_LOG TEST TEST_SIGNAL FIXTURE FIXTURE_DATA FIXTURE_SETUP
+FIXTURE_TEARDOWN TEST_F TEST_HARNESS_MAIN
+
+Operators
+-
+
+.. kernel-doc:: tools/testing/selftests/kselftest_harness.h
+:doc: operators
+
+.. kernel-doc:: tools/testing/selftests/kselftest_harness.h
+:functions: ASSERT_EQ ASSERT_NE ASSERT_LT ASSERT_LE ASSERT_GT ASSERT_GE
+ASSERT_NULL ASSERT_TRUE ASSERT_NULL ASSERT_TRUE ASSERT_FALSE
+ASSERT_STREQ ASSERT_STRNE EXPECT_EQ EXPECT_NE EXPECT_LT
+EXPECT_LE EXPECT_GT EXPECT_GE EXPECT_NULL EXPECT_TRUE
+EXPECT_FALSE EXPECT_STREQ EXPECT_STRNE
+
diff --git a/tools/testing/selftests/kselftest_harness.h 
b/tools/testing/selftests/kselftest_harness.h
index 45f807ce37e1..c56f72e07cd7 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -4,41 +4,49 @@
  *
  * kselftest_harness.h: simple C unit test helper.
  *
- * Usage:
- *   #include "../kselftest_harness.h"
- *   TEST(standalone_test) {
- * do_some_stuff;
- * EXPECT_GT(10, stuff) {
- *stuff_state_t state;
- *enumerate_stuff_state(&state);
- *TH_LOG("expectation failed with state: %s", state.msg);
- * }
- * more_stuff;
- * ASSERT_NE(some_stuff, NULL) TH_LOG("how did it happen?!");
- * last_stuff;
- * EXPECT_EQ(0, last_stuff);
- *   }
- *
- *   FIXTURE(my_fixture) {
- * mytype_t *data;
- * int awesomeness_level;
- *   };
- *   FIXTURE_SETUP(my_fixture) {
- * self->data = mytype_new();
- * ASSERT_NE(NULL, self->data);
- *   }
- *   FIXTURE_TEARDOWN(my_fixture) {
- * mytype_free(self->data);
- *   }
- *   TEST_F(my_fixture, data_is_good) {
- * EXPECT_EQ(1, is_my_data_good(self->data));
- *   }
- *
- *   TEST_HARNESS_MAIN
+ * See documentation in Documentation/dev-tools/kselftest.rst
  *
  * API inspired by code.google.com/p/googletest
  */
 
+/**
+ * DOC: example
+ *
+ * .. code-block:: c
+ *
+ *#include "../kselftest_harness.h"
+ *
+ *TEST(standalone_test) {
+ *  do_some_stuff;
+ *  EXPECT_GT(10, stuff) {
+ * stuff_state_t state;
+ * enumerate_stuff_state(&state);
+ * TH_LOG("expectation failed with state: %s", state.msg);
+ *  }
+ *  more_stuff;
+ *  ASSERT_NE(some_stuff, NULL) TH_LOG("how did it happen?!");
+ *  last_stuff;
+ *  EXPECT_EQ(0, last_stuff);
+ *}
+ *
+ *FIXTURE(my_fixture) {
+ *  mytype_t *data;
+ *  int awesomeness_level;
+ *};
+ *FIXTURE_SETUP(my_fixture) {
+ *  self->data = mytype_new();
+ *  ASSERT_NE(NULL, self->data);
+ *}
+ *FIXTURE_TEARDOWN(my_fixture) {
+ *  mytype_free(self->data);
+ *}
+ *TEST_F(my_fixture, data_is_good) {
+ *  EXPECT_EQ(1, is_my_data_good(self->data));
+ *}
+ *
+ *TEST_HARNESS_MAIN
+ */
+
 #ifndef __KSELFTEST_HARNESS_H
 #define __KSELFTEST_HARNESS_H
 
@@ -61,10 +69,20 @@
 #  define TH_LOG_ENABLED 1
 #endif
 
-/* TH_LOG(format, ...)
+/**
+ * TH_LOG(fmt, ...)
+ *
+ * @fmt: format string
+ * @...: optional arguments
+ *
+ * .. code-block:: c
+ *
+ * TH_LOG(format, ...)
+ *
  * Optional debug logging function available for use in tests.
  * Logging may be enabled or disabled by defining TH_LOG_

[PATCH v5 6/7] selftests: Remove the TEST_API() wrapper from kselftest_harness.h

2017-05-26 Thread Mickaël Salaün

Remove the TEST_API() wrapper to expose the underlying macro arguments
to the documentation tools.

Use "git diff --patience" to get a more readable patch.

Changes since v4:
* standalone patch to ease the review (requested by Kees Cook)

Signed-off-by: Mickaël Salaün 
Cc: Andy Lutomirski 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Shuah Khan 
Cc: Will Drewry 
---
 tools/testing/selftests/kselftest_harness.h | 349 
 1 file changed, 147 insertions(+), 202 deletions(-)

diff --git a/tools/testing/selftests/kselftest_harness.h 
b/tools/testing/selftests/kselftest_harness.h
index 171e70aead9c..45f807ce37e1 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -51,147 +51,6 @@
 #include 
 #include 
 
-/* All exported functionality should be declared through this macro. */
-#define TEST_API(x) _##x
-
-/*
- * Exported APIs
- */
-
-/* TEST(name) { implementation }
- * Defines a test by name.
- * Names must be unique and tests must not be run in parallel.  The
- * implementation containing block is a function and scoping should be treated
- * as such.  Returning early may be performed with a bare "return;" statement.
- *
- * EXPECT_* and ASSERT_* are valid in a TEST() { } context.
- */
-#define TEST TEST_API(TEST)
-
-/* TEST_SIGNAL(name, signal) { implementation }
- * Defines a test by name and the expected term signal.
- * Names must be unique and tests must not be run in parallel.  The
- * implementation containing block is a function and scoping should be treated
- * as such.  Returning early may be performed with a bare "return;" statement.
- *
- * EXPECT_* and ASSERT_* are valid in a TEST() { } context.
- */
-#define TEST_SIGNAL TEST_API(TEST_SIGNAL)
-
-/* FIXTURE(datatype name) {
- *   type property1;
- *   ...
- * };
- * Defines the data provided to TEST_F()-defined tests as |self|.  It should be
- * populated and cleaned up using FIXTURE_SETUP and FIXTURE_TEARDOWN.
- */
-#define FIXTURE TEST_API(FIXTURE)
-
-/* FIXTURE_DATA(datatype name)
- * This call may be used when the type of the fixture data
- * is needed.  In general, this should not be needed unless
- * the |self| is being passed to a helper directly.
- */
-#define FIXTURE_DATA TEST_API(FIXTURE_DATA)
-
-/* FIXTURE_SETUP(fixture name) { implementation }
- * Populates the required "setup" function for a fixture.  An instance of the
- * datatype defined with _FIXTURE_DATA will be exposed as |self| for the
- * implementation.
- *
- * ASSERT_* are valid for use in this context and will prempt the execution
- * of any dependent fixture tests.
- *
- * A bare "return;" statement may be used to return early.
- */
-#define FIXTURE_SETUP TEST_API(FIXTURE_SETUP)
-
-/* FIXTURE_TEARDOWN(fixture name) { implementation }
- * Populates the required "teardown" function for a fixture.  An instance of the
- * datatype defined with _FIXTURE_DATA will be exposed as |self| for the
- * implementation to clean up.
- *
- * A bare "return;" statement may be used to return early.
- */
-#define FIXTURE_TEARDOWN TEST_API(FIXTURE_TEARDOWN)
-
-/* TEST_F(fixture, name) { implementation }
- * Defines a test that depends on a fixture (e.g., is part of a test case).
- * Very similar to TEST() except that |self| is the setup instance of fixture's
- * datatype exposed for use by the implementation.
- */
-#define TEST_F TEST_API(TEST_F)
-
-#define TEST_F_SIGNAL TEST_API(TEST_F_SIGNAL)
-
-/* Use once to append a main() to the test file. E.g.,
- *   TEST_HARNESS_MAIN
- */
-#define TEST_HARNESS_MAIN TEST_API(TEST_HARNESS_MAIN)
-
-/*
- * Operators for use in TEST and TEST_F.
- * ASSERT_* calls will stop test execution immediately.
- * EXPECT_* calls will emit a failure warning, note it, and continue.
- */
-
-/* ASSERT_EQ(expected, measured): expected == measured */
-#define ASSERT_EQ TEST_API(ASSERT_EQ)
-/* ASSERT_NE(expected, measured): expected != measured */
-#define ASSERT_NE TEST_API(ASSERT_NE)
-/* ASSERT_LT(expected, measured): expected < measured */
-#define ASSERT_LT TEST_API(ASSERT_LT)
-/* ASSERT_LE(expected, measured): expected <= measured */
-#define ASSERT_LE TEST_API(ASSERT_LE)
-/* ASSERT_GT(expected, measured): expected > measured */
-#define ASSERT_GT TEST_API(ASSERT_GT)
-/* ASSERT_GE(expected, measured): expected >= measured */
-#define ASSERT_GE TEST_API(ASSERT_GE)
-/* ASSERT_NULL(measured): NULL == measured */
-#define ASSERT_NULL TEST_API(ASSERT_NULL)
-/* ASSERT_TRUE(measured): measured != 0 */
-#define ASSERT_TRUE TEST_API(ASSERT_TRUE)
-/* ASSERT_FALSE(measured): measured == 0 */
-#define ASSERT_FALSE TEST_API(ASSERT_FALSE)
-/* ASSERT_STREQ(expected, measured): !strcmp(expected, measured) */
-#define ASSERT_STREQ TEST_API(ASSERT_STREQ)
-/* ASSERT_STRNE(expected, measured): strcmp(expected, measured) */
-#define ASSERT_STRNE TEST_API(ASSERT_STRNE)
-/* EXPECT_EQ(expected, measured): expected == measure

[PATCH v5 2/7] selftests: Cosmetic renames in kselftest_harness.h

2017-05-26 Thread Mickaël Salaün

Keep the content consistent with the new name.

Signed-off-by: Mickaël Salaün 
Acked-by: Kees Cook 
Cc: Andy Lutomirski 
Cc: Shuah Khan 
Cc: Will Drewry 
---
 tools/testing/selftests/kselftest_harness.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index a786c69c7584..171e70aead9c 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -2,10 +2,10 @@
  * Copyright (c) 2012 The Chromium OS Authors. All rights reserved.
  * Use of this source code is governed by the GPLv2 license.
  *
- * test_harness.h: simple C unit test helper.
+ * kselftest_harness.h: simple C unit test helper.
  *
  * Usage:
- *   #include "test_harness.h"
+ *   #include "../kselftest_harness.h"
  *   TEST(standalone_test) {
  * do_some_stuff;
  * EXPECT_GT(10, stuff) {
@@ -38,8 +38,9 @@
  *
  * API inspired by code.google.com/p/googletest
  */
-#ifndef TEST_HARNESS_H_
-#define TEST_HARNESS_H_
+
+#ifndef __KSELFTEST_HARNESS_H
+#define __KSELFTEST_HARNESS_H
 
 #define _GNU_SOURCE
 #include 
@@ -532,4 +533,4 @@ static void __attribute__((constructor)) __constructor_order_first(void)
__constructor_order = _CONSTRUCTOR_ORDER_FORWARD;
 }
 
-#endif  /* TEST_HARNESS_H_ */
+#endif  /* __KSELFTEST_HARNESS_H */
-- 
2.11.0

[PATCH v5 0/7] Add kselftest_harness.h

2017-05-26 Thread Mickaël Salaün

Hi,

This patch series makes seccomp/test_harness.h more generally available [1]
and converts the kselftest documentation to the Sphinx format. It also improves
the Makefile of the seccomp tests so that any kselftest_harness.h update
triggers a rebuild.

[1] https://lkml.kernel.org/r/CAGXu5j+8CVz8vL51DRYXqOY=xc3zuKFf=ptene88xyhzfyi...@mail.gmail.com

Regards,

Mickaël Salaün (7):
  selftests: Make test_harness.h more generally available
  selftests: Cosmetic renames in kselftest_harness.h
  selftests/seccomp: Force rebuild according to dependencies
  Documentation/dev-tools: Add kselftest
  Documentation/dev-tools: Use reStructuredText markups for kselftest
  selftests: Remove the TEST_API() wrapper from kselftest_harness.h
  Documentation/dev-tools: Add kselftest_harness documentation

 Documentation/00-INDEX |   2 -
 Documentation/dev-tools/index.rst  |   1 +
 .../{kselftest.txt => dev-tools/kselftest.rst} | 101 ++-
 MAINTAINERS|   1 +
 .../test_harness.h => kselftest_harness.h} | 691 +
 tools/testing/selftests/seccomp/Makefile   |   2 +
 tools/testing/selftests/seccomp/seccomp_bpf.c  |   2 +-
 7 files changed, 520 insertions(+), 280 deletions(-)
 rename Documentation/{kselftest.txt => dev-tools/kselftest.rst} (52%)
 rename tools/testing/selftests/{seccomp/test_harness.h => kselftest_harness.h} (52%)

-- 
2.11.0

[PATCH v5 4/7] Documentation/dev-tools: Add kselftest

2017-05-26 Thread Mickaël Salaün

Move kselftest.txt to dev-tools/kselftest.rst.

Signed-off-by: Mickaël Salaün 
Acked-by: Kees Cook 
Cc: Jonathan Corbet 
Cc: Shuah Khan 
---
 Documentation/00-INDEX   | 2 --
 Documentation/{kselftest.txt => dev-tools/kselftest.rst} | 0
 2 files changed, 2 deletions(-)
 rename Documentation/{kselftest.txt => dev-tools/kselftest.rst} (100%)

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index ed3e5e949fce..6daf51536153 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -246,8 +246,6 @@ kprobes.txt
- documents the kernel probes debugging feature.
 kref.txt
- docs on adding reference counters (krefs) to kernel objects.
-kselftest.txt
-   - small unittests for (some) individual codepaths in the kernel.
 laptops/
- directory with laptop related info and laptop driver documentation.
 ldm.txt
diff --git a/Documentation/kselftest.txt b/Documentation/dev-tools/kselftest.rst
similarity index 100%
rename from Documentation/kselftest.txt
rename to Documentation/dev-tools/kselftest.rst
-- 
2.11.0

[PATCH v5 5/7] Documentation/dev-tools: Use reStructuredText markups for kselftest

2017-05-26 Thread Mickaël Salaün

Include and convert kselftest to the Sphinx format.

Changes since v2:
* lighten the modifications (suggested by Kees Cook)

Signed-off-by: Mickaël Salaün 
Acked-by: Kees Cook 
Cc: Jonathan Corbet 
Cc: Shuah Khan 
---
 Documentation/dev-tools/index.rst |  1 +
 Documentation/dev-tools/kselftest.rst | 67 +--
 2 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 07d881147ef3..e50054c6aeaa 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -23,6 +23,7 @@ whole; patches welcome!
kmemleak
kmemcheck
gdb-kernel-debugging
+   kselftest
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index 5bd590335839..9232ce94612c 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -1,4 +1,6 @@
+==
 Linux Kernel Selftests
+==
 
 The kernel contains a set of "self tests" under the tools/testing/selftests/
 directory. These are intended to be small tests to exercise individual code
@@ -15,29 +17,34 @@ hotplug test is run on 2% of hotplug capable memory instead of 10%.
 Running the selftests (hotplug tests are run in limited mode)
 =
 
-To build the tests:
-  $ make -C tools/testing/selftests
+To build the tests::
 
+make -C tools/testing/selftests
 
-To run the tests:
-  $ make -C tools/testing/selftests run_tests
+To run the tests::
 
-To build and run the tests with a single command, use:
-  $ make kselftest
+make -C tools/testing/selftests run_tests
 
-- note that some tests will require root privileges.
+To build and run the tests with a single command, use::
+
+make kselftest
+
+Note that some tests will require root privileges.
 
 
 Running a subset of selftests
-
+=
+
 You can use the "TARGETS" variable on the make command line to specify
 single test to run, or a list of tests to run.
 
-To run only tests targeted for a single subsystem:
-  $  make -C tools/testing/selftests TARGETS=ptrace run_tests
+To run only tests targeted for a single subsystem::
 
-You can specify multiple tests to build and run:
-  $  make TARGETS="size timers" kselftest
+make -C tools/testing/selftests TARGETS=ptrace run_tests
+
+You can specify multiple tests to build and run::
+
+make TARGETS="size timers" kselftest
 
 See the top-level tools/testing/selftests/Makefile for the list of all
 possible targets.
@@ -46,13 +53,15 @@ possible targets.
 Running the full range hotplug selftests
 
 
-To build the hotplug tests:
-  $ make -C tools/testing/selftests hotplug
+To build the hotplug tests::
 
-To run the hotplug tests:
-  $ make -C tools/testing/selftests run_hotplug
+make -C tools/testing/selftests hotplug
 
-- note that some tests will require root privileges.
+To run the hotplug tests::
+
+make -C tools/testing/selftests run_hotplug
+
+Note that some tests will require root privileges.
 
 
 Install selftests
@@ -62,13 +71,15 @@ You can use kselftest_install.sh tool installs selftests in default
 location which is tools/testing/selftests/kselftest or a user specified
 location.
 
-To install selftests in default location:
-   $ cd tools/testing/selftests
-   $ ./kselftest_install.sh
+To install selftests in default location::
 
-To install selftests in a user specified location:
-   $ cd tools/testing/selftests
-   $ ./kselftest_install.sh install_dir
+cd tools/testing/selftests
+./kselftest_install.sh
+
+To install selftests in a user specified location::
+
+cd tools/testing/selftests
+./kselftest_install.sh install_dir
 
 Running installed selftests
 ===
@@ -79,8 +90,10 @@ named "run_kselftest.sh" to run the tests.
 You can simply do the following to run the installed Kselftests. Please
 note some tests will require root privileges.
 
-cd kselftest
-./run_kselftest.sh
+::
+
+cd kselftest
+./run_kselftest.sh
 
 Contributing new tests
 ==
@@ -96,8 +109,8 @@ In general, the rules for selftests are
  * Don't cause the top-level "make run_tests" to fail if your feature is
unconfigured.
 
-Contributing new tests(details)
-===
+Contributing new tests (details)
+
 
  * Use TEST_GEN_XXX if such binaries or files are generated during
compiling.
-- 
2.11.0

[PATCH net-next v1] bpf: Use the IS_FD_ARRAY() macro in map_update_elem()

2018-01-25 Thread Mickaël Salaün

Make the code more readable.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 kernel/bpf/syscall.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5bdb0cc84ad2..e24aa3241387 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -709,10 +709,7 @@ static int map_update_elem(union bpf_attr *attr)
err = bpf_percpu_hash_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
-   } else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
+   } else if (IS_FD_ARRAY(map)) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);
-- 
2.15.1
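
As a rough illustration of the refactoring, the sketch below shows a macro of
the kind this patch relies on. It is a userspace approximation: every demo_*
and DEMO_* name is a stand-in invented here, and the list of map types is
copied from the condition removed by the diff above, not from the kernel's
actual definition of IS_FD_ARRAY().

```c
/* Stand-in map types; the four FD-array types mirror the removed condition. */
enum demo_map_type {
	DEMO_MAP_TYPE_HASH,
	DEMO_MAP_TYPE_PERCPU_ARRAY,
	DEMO_MAP_TYPE_PERF_EVENT_ARRAY,
	DEMO_MAP_TYPE_PROG_ARRAY,
	DEMO_MAP_TYPE_CGROUP_ARRAY,
	DEMO_MAP_TYPE_ARRAY_OF_MAPS,
};

struct demo_map {
	enum demo_map_type map_type;
};

/* True when the map is one of the four FD-backed array types. */
#define DEMO_IS_FD_ARRAY(map) \
	((map)->map_type == DEMO_MAP_TYPE_PERF_EVENT_ARRAY || \
	 (map)->map_type == DEMO_MAP_TYPE_PROG_ARRAY || \
	 (map)->map_type == DEMO_MAP_TYPE_CGROUP_ARRAY || \
	 (map)->map_type == DEMO_MAP_TYPE_ARRAY_OF_MAPS)

/* A single call site now replaces the four-way comparison. */
static int demo_is_fd_array(const struct demo_map *map)
{
	return DEMO_IS_FD_ARRAY(map);
}
```

Keeping the list of types in one macro means a new FD-backed map type only has
to be added in one place.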

[PATCH net-next v1] samples/bpf: Partially fixes the bpf.o build

2018-01-25 Thread Mickaël Salaün

Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory.  This avoids making a buggy bpf.o file (e.g. missing
symbols).

This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---

This is not a complete fix because the call to multi_depend with
$(host-cmulti) from scripts/Makefile.host forces the build of bpf.o
anyway. I'm not sure how to completely avoid this automatic build
though.
---
 samples/bpf/Makefile | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 7f61a3d57fa7..64335bb94f9f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -201,13 +201,16 @@ CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
 # Trick to allow make to be run from this directory
-all:
+all: $(LIBBPF)
$(MAKE) -C ../../ $(CURDIR)/
 
 clean:
$(MAKE) -C ../../ M=$(CURDIR) clean
@rm -f *~
 
+$(LIBBPF): FORCE
+   $(MAKE) -C $(dir $@) $(notdir $@)
+
 $(obj)/syscall_nrs.s:  $(src)/syscall_nrs.c
$(call if_changed_dep,cc_s_c)
 
-- 
2.15.1

Re: [PATCH v1] samples/bpf: Add a .gitignore for binaries

2017-05-13 Thread Mickaël Salaün

On 13/02/2017 02:43, David Ahern wrote:
> On 2/12/17 2:23 PM, Mickaël Salaün wrote:
>> diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
>> new file mode 100644
>> index ..a7562a5ef4c2
>> --- /dev/null
>> +++ b/samples/bpf/.gitignore
>> @@ -0,0 +1,32 @@
>> +fds_example
>> +lathist
> 
> ...
> 
> Listing each target is going to be a PITA to maintain. It would be
> better to put targets into a build directory (bin?) and ignore the
> directory.
> 

It would require a lot of modifications to the Makefile and add
complexity. It seems much simpler for everyone to stick to a plain
gitignore file, which is easy to maintain:
$ awk '$1 == "hostprogs-y" { print $3 }' < Makefile > .gitignore

Alexei, Daniel, what do you think about this? Do you want me to send a
v2 with the new tests?

 Mickaël

signature.asc
Description: OpenPGP digital signature

Re: [PATCH net-next v1] samples/bpf: Partially fixes the bpf.o build

2018-01-26 Thread Mickaël Salaün


On 26/01/2018 03:16, Alexei Starovoitov wrote:
> On Fri, Jan 26, 2018 at 01:39:30AM +0100, Mickaël Salaün wrote:
>> Do not build lib/bpf/bpf.o with this Makefile but use the one from the
>> library directory.  This avoids making a buggy bpf.o file (e.g. missing
>> symbols).
> 
> could you provide an example?
> What symbols will be missing?
> I don't think there is an issue with existing Makefile.

You can run these commands:
make -C samples/bpf; nm tools/lib/bpf/bpf.o > a; make -C tools/lib/bpf;
nm tools/lib/bpf/bpf.o > b; diff -u a b

Symbols like bzero and sys_bpf are missing with the samples/bpf
Makefile, which makes the bpf.o shrink from 25K to 7K.

> 
>> This patch is useful if some code (e.g. Landlock tests) needs both the
>> bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
> 
> is that some future patches?

Yes, I'll send them next week.

> 
> we're trying to move everything from samples/bpf/ into selftests/bpf/
> and convert to use libbpf.a instead of obsolete bpf_load.c
> Please use this approach for landlock as well.

Ok, it should be better with this lib.

> 
>> Signed-off-by: Mickaël Salaün 
>> Cc: Alexei Starovoitov 
>> Cc: Daniel Borkmann 
>> ---
>>
>> This is not a complete fix because the call to multi_depend with
>> $(host-cmulti) from scripts/Makefile.host forces the build of bpf.o
>> anyway. I'm not sure how to completely avoid this automatic build
>> though.
>> ---
>>  samples/bpf/Makefile | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 7f61a3d57fa7..64335bb94f9f 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -201,13 +201,16 @@ CLANG_ARCH_ARGS = -target $(ARCH)
>>  endif
>>  
>>  # Trick to allow make to be run from this directory
>> -all:
>> +all: $(LIBBPF)
>>  $(MAKE) -C ../../ $(CURDIR)/
>>  
>>  clean:
>>  $(MAKE) -C ../../ M=$(CURDIR) clean
>>  @rm -f *~
>>  
>> +$(LIBBPF): FORCE
>> +$(MAKE) -C $(dir $@) $(notdir $@)
>> +
>>  $(obj)/syscall_nrs.s:   $(src)/syscall_nrs.c
>>  $(call if_changed_dep,cc_s_c)
>>  
>> -- 
>> 2.15.1
>>
> 




Re: [PATCH bpf-next v8 01/11] fs,security: Add a security blob to nameidata

2018-03-11 Thread Mickaël Salaün

On 02/27/2018 02:23 AM, Al Viro wrote:
> On Tue, Feb 27, 2018 at 12:57:21AM +, Al Viro wrote:
>> On Tue, Feb 27, 2018 at 01:41:11AM +0100, Mickaël Salaün wrote:
>>> The function current_nameidata_security(struct inode *) can be used to
>>> retrieve a blob's pointer address tied to the inode being walked through.
>>> This makes it possible to follow a path lookup and know where an inode
>>> access comes from. This is needed for the Landlock LSM to be able to
>>> restrict access to file paths.
>>>
>>> The LSM hook nameidata_free_security(struct inode *) is called before
>>> freeing the associated nameidata.
>>
>> NAK.  Not without well-defined semantics and "some Linux S&M uses that for
>> something, don't ask what" does not count.
> 
> Incidentally, pathwalk mechanics is subject to change at zero notice, so
> if you want something, you'd better
>   * have explicitly defined semantics
>   * explain what it is - on fsdevel
>   * not have it hidden behind the layers of opaque LSM dreck, pardon
> the redundance.
> 
> Again, pathwalk internals have changed in the past and may bloody well
> change again in the future.  There's a damn good reason why struct nameidata
> is _not_ visible outside of fs/namei.c, and quietly relying upon any
> implementation details is no-go.
> 

I thought this whole patch series would go to linux-fsdevel but only
this patch did. I'll CC fsdevel for the next round. Meanwhile, the
cover letter is here: https://lkml.org/lkml/2018/2/26/1214
The code using current_nameidata_lookup(inode) is in patch 07/11:
https://lkml.org/lkml/2018/2/26/1206

To sum up, I don't know of any way to identify whether a directory (execute)
access was directly requested by a process or inferred by the kernel
because of a path walk. This was not needed until now because the other
access control systems (either the DAC or access controls enforced by
inode-based LSMs, i.e. SELinux and Smack) do not care about the file
hierarchy. Path-based access controls (i.e. AppArmor and Tomoyo)
directly use the notion of path to define a security policy (in the
kernel, not only in the user space configuration). Landlock can't rely
on xattrs (because of composed and unprivileged access control). Because
we can't know for sure which path an inode comes from (if any),
path-based LSM hooks do not help for some file system checks (e.g.
inode_permission). With Landlock, I am trying to find a way to identify a
set of inodes, from the user space point of view, which is most of the time
related to file hierarchies.

I needed a way to "follow" a path walk, with the minimum amount of code,
and if possible without touching fs/namei.c. I saw that the
pathwalk mechanism has evolved over time. With this patch, I tried to
make a kernel object (nameidata) usable in some way by LSMs, but only
through an inode (current_nameidata_lookup(inode)). The "only" guarantee
of this function should be to identify whether an inode is tied to a path
walk. This makes it possible to follow a path walk and know why an inode
access is requested.

I get your concern about the "instability" of the path walk mechanism.
However, I thought that a path resolution should not change from the user
space point of view, like other Linux ABI. Anyway, all the current
inode-based access controls, including DAC, rely on this path walk
mechanism. This patch does not expose anything to user space, except
through the API of Landlock, which is currently relying on path walk
resolutions, already visible to user space. Did I miss something? Do you
have another suggestion to tie an inode to a path walk?

Thanks,
 Mickaël


Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing

2018-03-06 Thread Mickaël Salaün



On 28/02/2018 00:09, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 10:03 PM, Mickaël Salaün  wrote:
>>
>> On 27/02/2018 05:36, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 12:41 AM, Mickaël Salaün  wrote:
>>>> Hi,
>>>>
> 
>>>>
>>>> ## Why use the seccomp(2) syscall?
>>>>
>>>> Landlock uses the same semantics as seccomp to apply access rule
>>>> restrictions. It adds a new layer of security for the current process
>>>> which is inherited by its children. It makes sense to use a unique
>>>> access-restricting syscall (that should be allowed by seccomp filters)
>>>> which can only drop privileges. Moreover, a Landlock rule could come
>>>> from outside a process (e.g. passed through a UNIX socket). It is then
>>>> useful to differentiate the creation/load of Landlock eBPF programs via
>>>> bpf(2), from rule enforcement via seccomp(2).
>>>
>>> This seems like a weak argument to me.  Sure, this is a bit different
>>> from seccomp(), and maybe shoving it into the seccomp() multiplexer is
>>> awkward, but surely the bpf() multiplexer is even less applicable.
>>
>> I think using the seccomp syscall is fine, and everyone agreed on it.
>>
> 
> Ah, sorry, I completely misread what you wrote.  My apologies.  You
> can disregard most of my email.
> 
>>
>>>
>>> Also, looking forward, I think you're going to want a bunch of the
>>> stuff that's under consideration as new seccomp features.  Tycho is
>>> working on a "user notifier" feature for seccomp where, in addition to
>>> accepting, rejecting, or kicking to ptrace, you can send a message to
>>> the creator of the filter and wait for a reply.  I think that Landlock
>>> will want exactly the same feature.
>>
>> I don't see why this would be useful at all here. Landlock does not
>> filter at the syscall level but handles kernel objects and actions as
>> an LSM does. That is the whole purpose of Landlock.
> 
> Suppose I'm writing a container manager.  I want to run "mount" in the
> container, but I don't want to allow mount() in general and I want to
> emulate certain mount() actions.  I can write a filter that catches
> mount using seccomp and calls out to the container manager for help.
> This isn't theoretical -- Tycho wants *exactly* this use case to be
> supported.

Well, I think this use case should be handled with something like
LD_PRELOAD and a helper library. FYI, I did something like this:
https://github.com/stemjail/stemshim

Otherwise, we should think about enabling a process to (dynamically)
extend/patch the vDSO (similar to LD_PRELOAD but at the syscall level
and works with static binaries) for a subset of processes (the same way
seccomp filters are inherited). It may be more powerful and flexible
than extending the kernel/seccomp to patch (buggy?) userland.

> 
> But using seccomp for this is indeed annoying.  It would be nice to
> use Landlock's ability to filter based on the filesystem type, for
> example.  So Tycho could write a Landlock rule like:
> 
> bool filter_mount(...)
> {
>   if (path needs emulation)
> call_user_notifier();
> }
> 
> And it should work.
> 
> This means that, if both seccomp user notifiers and Landlock make it
> upstream, then there should probably be a way to have a user notifier
> bound to a seccomp filter and a set of landlock filters.
> 

Using seccomp filters and Landlock programs together may be powerful. However,
for this use case, I think a *post-syscall* vDSO-like mechanism (which could
get some data returned by a Landlock program) may be much more flexible
(with less kernel code). What is needed here is a way to know the kernel
semantics (Landlock) and a way to patch userland without patching its
code (vDSO-like).




Re: [PATCH bpf-next v8 08/11] landlock: Add ptrace restrictions

2018-03-06 Thread Mickaël Salaün


On 28/02/2018 01:09, Andy Lutomirski wrote:
> On Wed, Feb 28, 2018 at 12:00 AM, Mickaël Salaün  wrote:
>>
>> On 28/02/2018 00:23, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 11:02 PM, Andy Lutomirski  wrote:
>>>> On Tue, Feb 27, 2018 at 10:14 PM, Mickaël Salaün  wrote:
>>>>>
>>>>
>>>> I think you're wrong here.  Any sane container trying to use Landlock
>>>> like this would also create a PID namespace.  Problem solved.  I still
>>>> think you should drop this patch.
>>
>> Containers are one use case, another is built-in sandboxing (e.g. for web
>> browsers…) and another one is for sandbox managers (e.g. Firejail,
>> Bubblewrap, Flatpak…). In some of these use cases, especially from a
>> developer point of view, you may want/need to debug your applications
>> (without needing to be root). For nested Landlock access-controls
>> (e.g. container + user session + web browser), it may not be allowed to
>> create a PID namespace, but you still want to have a meaningful
>> access-control.
>>
> 
> The consideration should be exactly the same as for normal seccomp.
> If I'm in a container (using PID namespaces + seccomp) and I run a web
> browser, I can debug the browser.
> 
> If there's a real use case for adding this type of automatic ptrace
> protection, then by all means, let's add it as a general seccomp
> feature.
> 

Right, it makes sense to add this feature to seccomp filters as well.
What do you think, Kees?




Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing

2018-03-06 Thread Mickaël Salaün

On 06/03/2018 23:46, Tycho Andersen wrote:
> On Tue, Mar 06, 2018 at 10:33:17PM +, Andy Lutomirski wrote:
>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>> container, but I don't want to allow mount() in general and I want to
>>>> emulate certain mount() actions.  I can write a filter that catches
>>>> mount using seccomp and calls out to the container manager for help.
>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>> supported.
>>>
>>> Well, I think this use case should be handled with something like
>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>> https://github.com/stemjail/stemshim
>>
>> I doubt that will work for containers.  Containers that use user
>> namespaces and, for example, setuid programs aren't going to honor
>> LD_PRELOAD.
> 
> Or anything that calls syscalls directly, like go programs.

That's the reason for the vDSO-like approach. Enforcing an access control is
not the issue here; patching a buggy userland (without patching its code) is
the issue, isn't it?

As far as I remember, the main problem is to handle file descriptors
while "emulating" the kernel behavior. This can be done with a "shim"
code mapped in every process. Chrome used something like this (in a
previous sandbox mechanism) as a kind of emulation (with the current
seccomp-bpf). I think it should be doable to replace the (userland)
emulation code with an IPC wrapper receiving file descriptors through
a UNIX socket.


Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-08 Thread Mickaël Salaün


On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
> 
> On 27/02/2018 17:39, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>  wrote:
>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>  wrote:
>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>> to itself. As a seccomp filter, a Landlock program is enforced for the
>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>> programs.
>>>>>>>>
>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>
>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>> not possible to call multiple programs in a way that would imply
>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>> chains of fs_pick programs).
>>>>>>>> Signed-off-by: Mickaël Salaün 
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>> + struct landlock_prog_set *current_prog_set,
>>>>>>>> + struct bpf_prog *prog)
>>>>>>>> +{
>>>>>>>> + struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>> + unsigned long pages;
>>>>>>>> + int err;
>>>>>>>> + size_t i;
>>>>>>>> + struct landlock_prog_set tmp_prog_set = {};
>>>>>>>> +
>>>>>>>> + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>>>>>>>> + return ERR_PTR(-EINVAL);
>>>>>>>> +
>>>>>>>> + /* validate memory size allocation */
>>>>>>>> + pages = prog->pages;
>>>>>>>> + if (current_prog_set) {
>>>>>>>> + size_t i;
>>>>>>>> +
>>>>>>>> + for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
>>>>>>>> + struct landlock_prog_list *walker_p;
>>>>>>>> +
>>>>>>>> + for (walker_p = current_prog_set->programs[i];
>>>>>>>> + walker_p; walker_p = 

Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-08 Thread Mickaël Salaün


On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün  wrote:
>>
>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>>
>>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>>>  wrote:
>>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>>>  wrote:
>>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>>>  wrote:
>>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>>>> to itself. Like a seccomp filter, a Landlock program is enforced for the
>>>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>>>> programs.
>>>>>>>>>>
>>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>>>
>>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>>>> not possible to call multiple programs in a way that would require
>>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>>>> chains of fs_pick programs).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Mickaël Salaün 
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>> +struct landlock_prog_set *landlock_prepend_prog(
>>>>>>>>>> + struct landlock_prog_set *current_prog_set,
>>>>>>>>>> + struct bpf_prog *prog)
>>>>>>>>>> +{
>>>>>>>>>> + struct landlock_prog_set *new_prog_set = current_prog_set;
>>>>>>>>>> + unsigned long pages;
>>>>>>>>>> + int err;
>>>>>>>>>> + size_t i;
>>>>>>>>>> + struct landlock_prog_set tmp_prog_set = {};
>>>>>>>>>> +

Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-11 Thread Mickaël Salaün


On 04/10/2018 06:48 AM, Alexei Starovoitov wrote:
> On Mon, Apr 09, 2018 at 12:01:59AM +0200, Mickaël Salaün wrote:
>>
>> On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
>>> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün  wrote:
>>>>
>>>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>>>>
>>>>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>>>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>>>>>  wrote:
>>>>>>> On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote:
>>>>>>>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>>>>>>>  wrote:
>>>>>>>>> On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote:
>>>>>>>>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>>>>>>>>  wrote:
>>>>>>>>>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>>>>>>>>>>>> The seccomp(2) syscall can be used by a task to apply a Landlock program
>>>>>>>>>>>> to itself. Like a seccomp filter, a Landlock program is enforced for the
>>>>>>>>>>>> current task and all its future children. A program is immutable and a
>>>>>>>>>>>> task can only add new restricting programs to itself, forming a list of
>>>>>>>>>>>> programs.
>>>>>>>>>>>>
>>>>>>>>>>>> A Landlock program is tied to a Landlock hook. If the action on a kernel
>>>>>>>>>>>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>>>>>>>>>>>> capabilities, other LSM), then a Landlock hook related to this kind of
>>>>>>>>>>>> object is triggered. The list of programs for this hook is then
>>>>>>>>>>>> evaluated. Each program returns a 32-bit value which can deny the action
>>>>>>>>>>>> on a kernel object with a non-zero value. If every program of the list
>>>>>>>>>>>> returns zero, then the action on the object is allowed.
>>>>>>>>>>>>
>>>>>>>>>>>> Multiple Landlock programs can be chained to share a 64-bit value for a
>>>>>>>>>>>> call chain (e.g. evaluating multiple elements of a file path).  This
>>>>>>>>>>>> chaining is restricted when a process constructs this chain by loading a
>>>>>>>>>>>> program, but additional checks are performed when it requests to apply
>>>>>>>>>>>> this chain of programs to itself.  The restrictions ensure that it is
>>>>>>>>>>>> not possible to call multiple programs in a way that would require
>>>>>>>>>>>> handling multiple shared values (i.e. cookies) for one chain.  For now,
>>>>>>>>>>>> only a fs_pick program can be chained to the same type of program,
>>>>>>>>>>>> because it may make sense if they have different triggers (cf. next
>>>>>>>>>>>> commits).  These restrictions still allow reusing Landlock programs in
>>>>>>>>>>>> a safe way (e.g. use the same loaded fs_walk program with multiple
>>>>>>>>>>>> chains of fs_pick programs).
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Mickaël Salaün 
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>

Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing

2018-03-08 Thread Mickaël Salaün

On 07/03/2018 02:21, Andy Lutomirski wrote:
> On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün  wrote:
>>
>> On 06/03/2018 23:46, Tycho Andersen wrote:
>>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>>> Suppose I'm writing a container manager.  I want to run "mount" in the
>>>>>> container, but I don't want to allow mount() in general and I want to
>>>>>> emulate certain mount() actions.  I can write a filter that catches
>>>>>> mount using seccomp and calls out to the container manager for help.
>>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>>>> supported.
>>>>>
>>>>> Well, I think this use case should be handled with something like
>>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>>>> https://github.com/stemjail/stemshim
>>>>
>>>> I doubt that will work for containers.  Containers that use user
>>>> namespaces and, for example, setuid programs aren't going to honor
>>>> LD_PRELOAD.
>>>
>>> Or anything that calls syscalls directly, like go programs.
>>
>> That's why I suggested the vDSO-like approach. Enforcing access control
>> is not the issue here; patching a buggy userland (without patching its
>> code) is the issue, isn't it?
>>
>> As far as I remember, the main problem is handling file descriptors
>> while "emulating" the kernel behavior. This can be done with "shim"
>> code mapped into every process. Chrome used something like this (in a
>> previous sandbox mechanism) as a kind of emulation (with the current
>> seccomp-bpf). I think it should be doable to replace the (userland)
>> emulation code with an IPC wrapper receiving file descriptors through
>> a UNIX socket.
>>
> 
> Can you explain exactly what you mean by "vDSO-like"?
> 
> When a 64-bit program does a syscall, it just executes the SYSCALL
> instruction.  The vDSO isn't involved at all.  32-bit programs usually
> go through the vDSO, but not always.
> 
> It could be possible to force-load a DSO into an entire container and
> rig up seccomp to intercept all SYSCALLs not originating from the DSO
> such that they merely redirect control to the DSO, but that seems
> quite messy.

The vDSO is code mapped into all processes. As you said, these processes
may use it or not. What I was thinking about is to use the same concept,
i.e. map "shim" code into each process belonging to a particular
hierarchy (the same way seccomp filters are inherited across processes).
With a seccomp filter matching some syscalls (e.g. mount, open), it is
possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
shim code should then be able to emulate/patch what is needed, even
faking a file opening by receiving a file descriptor through a UNIX
socket. As the Chrome sandbox did, the seccomp filter may look at the
calling address to allow the shim code to call syscalls without being
caught, if needed. However, relying on SIGSYS may not fit with
arbitrary code. A new SECCOMP_RET_EMULATE (?) could be used to jump
to a specific process address, to emulate the syscall in an easier way
than only relying on a {c,e}BPF program.
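
As a rough illustration of the filter side of this idea, the sketch
below builds a classic-BPF seccomp program that returns SECCOMP_RET_TRAP
for one syscall number and allows everything else.  The names
build_trap_filter and TRAP_FILTER_LEN are invented for this example, and
installing the filter as well as the SIGSYS handler that would play the
role of the shim are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define TRAP_FILTER_LEN 7 /* hypothetical name: number of BPF instructions */

/*
 * Hypothetical helper: fill `out` with a filter that kills the task on an
 * unexpected architecture, raises SIGSYS (SECCOMP_RET_TRAP) for syscall
 * `nr`, and allows every other syscall.
 */
size_t build_trap_filter(uint32_t arch, uint32_t nr,
			 struct sock_filter out[TRAP_FILTER_LEN])
{
	const struct sock_filter prog[TRAP_FILTER_LEN] = {
		/* [0] load seccomp_data.arch */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		/* [1] expected arch: skip the kill, otherwise fall through */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, arch, 1, 0),
		/* [2] unexpected arch */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		/* [3] load seccomp_data.nr */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* [4] matching syscall: fall through to the trap, else skip it */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
		/* [5] deliver SIGSYS to the task (the shim would handle it) */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		/* [6] everything else is allowed */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};

	assert(out != NULL);
	memcpy(out, prog, sizeof(prog));
	return TRAP_FILTER_LEN;
}
```

A caller would wrap the array in a struct sock_fprog and install it with
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) after setting
PR_SET_NO_NEW_PRIVS; on x86_64, arch would be AUDIT_ARCH_X86_64
(0xc000003e) and mount is syscall 165.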


[RFC PATCH v1 2/5] fs: Add a MAY_EXECMOUNT flag to infer the noexec mount propertie

2018-12-12 Thread Mickaël Salaün

An LSM doesn't get path information related to an access request to open
an inode.  This new (internal) MAY_EXECMOUNT flag enables an LSM to
check if the underlying mount point of an inode is marked as executable.
This is useful to implement a security policy taking advantage of the
noexec mount option.

This flag is set according to path_noexec(), which checks if a mount
point is mounted with MNT_NOEXEC or if the underlying superblock is
SB_I_NOEXEC.

Signed-off-by: Mickaël Salaün 
Reviewed-by: Philippe Trébuchet 
Reviewed-by: Thibaut Sautereau 
Cc: Al Viro 
Cc: Kees Cook 
Cc: Mickaël Salaün 
---
 fs/namei.c | 2 ++
 include/linux/fs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 0cab6494978c..de4f33b3f464 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2970,6 +2970,8 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 		break;
 	}
 
+	/* Pass the mount point executability. */
+	acc_mode |= path_noexec(path) ? 0 : MAY_EXECMOUNT;
 	error = inode_permission(inode, MAY_OPEN | acc_mode);
 	if (error)
 		return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 584c9329ad78..083a31b8068e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -96,6 +96,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define MAY_NOT_BLOCK  0x0080
 /* the inode is opened with O_MAYEXEC */
 #define MAY_OPENEXEC   0x0100
+/* the mount point is marked as executable */
+#define MAY_EXECMOUNT  0x0200
 
 /*
  * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
-- 
2.20.0.rc2
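
The flag added by this patch is internal to the kernel.  To observe a
similar property from user space, one possible sketch is to query the
mount flags with statvfs(3).  This only approximates path_noexec() (it
may not reflect SB_I_NOEXEC), and the helper name mount_allows_exec is
an assumption of this example, not part of the series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes ST_NOEXEC only with GNU extensions */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

#ifndef ST_NOEXEC
#define ST_NOEXEC 8 /* Linux value, used if the libc hides the constant */
#endif

/*
 * Hypothetical helper: return 1 if the mount backing `path` permits
 * execution, 0 if it is mounted noexec, and -1 if statvfs() fails.
 */
int mount_allows_exec(const char *path)
{
	struct statvfs st;

	assert(path != NULL);
	if (statvfs(path, &st) != 0)
		return -1;
	return (st.f_flag & ST_NOEXEC) ? 0 : 1;
}
```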

[RFC PATCH v1 5/5] doc: Add documentation for Yama's open_mayexec_enforce

2018-12-12 Thread Mickaël Salaün

Signed-off-by: Mickaël Salaün 
Reviewed-by: Philippe Trébuchet 
Reviewed-by: Thibaut Sautereau 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Mickaël Salaün 
---
 Documentation/admin-guide/LSM/Yama.rst | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst
index d0a060de3973..a72c86a24b35 100644
--- a/Documentation/admin-guide/LSM/Yama.rst
+++ b/Documentation/admin-guide/LSM/Yama.rst
@@ -72,3 +72,44 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
 ``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed.
 
 The original children-only logic was based on the restrictions in grsecurity.
+
+open_mayexec_enforce
+====================
+
+The ``O_MAYEXEC`` flag can be passed to :manpage:`open(2)` to only open files
+(or directories) that are executable.  If the file is not identified as
+executable, then the syscall returns -EACCES.  This may allow a script
+interpreter to check executable permission before reading commands from a file.
+One interesting use case is to enforce a "write xor execute" policy through
+interpreters.
+
+Thanks to this flag, Yama can enforce the ``noexec`` mount option (i.e.
+the underlying mount point of the file is mounted with MNT_NOEXEC or its
+underlying superblock is SB_I_NOEXEC) not only on ELF binaries but also on
+scripts.  This may be possible thanks to script interpreters using the
+``O_MAYEXEC`` flag.  The executable permission is then checked before reading
+commands from a file, and thus can enforce the ``noexec`` at the interpreter
+level by propagating this security policy to the scripts.  To be fully
+effective, these interpreters also need to handle the other ways to execute
+code (for which the kernel can't help): command line parameters (e.g., option
+``-e`` for Perl), module loading (e.g., option ``-m`` for Python), stdin, file
+sourcing, environment variables, configuration files...  Depending on the
+threat model, it may be acceptable to allow some script interpreters (e.g.
+Bash) to interpret commands from stdin, be it a TTY or a pipe, because it
+may not be enough to (directly) perform syscalls.
+
+Yama implements two complementary security policies to propagate the ``noexec``
+mount option or the executable file permission.  These policies are handled by
+the ``kernel.yama.open_mayexec_enforce`` sysctl (writable only with
+``CAP_MAC_ADMIN``) as a bitmask:
+
+1 - mount restriction:
+    check that the mount options for the underlying VFS mount do not prevent
+    execution.
+
+2 - file permission restriction:
+    check that the to-be-opened file is marked as executable for the current
+    process (e.g., POSIX permissions).
+
+Code samples can be found in tools/testing/selftests/yama/test_omayexec.c and
+https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC .
-- 
2.20.0.rc2
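
The two policy bits documented above can be summarized with a small
user-space model.  This is only a sketch of the decision as described in
the documentation, not the kernel code; the names mayexec_decision and
ENFORCE_* are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values taken from the sysctl description in the documentation. */
#define ENFORCE_MOUNT (1 << 0) /* 1 - mount restriction */
#define ENFORCE_FILE  (1 << 1) /* 2 - file permission restriction */
#define ENFORCE_MASK  (ENFORCE_MOUNT | ENFORCE_FILE)

/*
 * Hypothetical model of the decision for an open() with O_MAYEXEC:
 * returns 0 when the open would be allowed, -EACCES otherwise.
 */
int mayexec_decision(int enforce, bool mount_is_exec, bool file_is_exec)
{
	/* Only the two documented bits are meaningful. */
	assert((enforce & ~ENFORCE_MASK) == 0);

	if ((enforce & ENFORCE_MOUNT) && !mount_is_exec)
		return -EACCES;
	if ((enforce & ENFORCE_FILE) && !file_is_exec)
		return -EACCES;
	return 0;
}
```

With the sysctl set to 3, both conditions must hold; with 0, O_MAYEXEC
has no effect on the outcome of the open.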

[RFC PATCH v1 0/5] Add support for O_MAYEXEC

2018-12-12 Thread Mickaël Salaün

Hi,

The goal of this patch series is to control script interpretation.  A
new O_MAYEXEC flag for sys_open() is added to enable userland script
interpreters to delegate to the kernel (and thus the system security
policy) the permission to interpret scripts or other files containing
what can be seen as commands.

The security policy is the responsibility of an LSM.  A basic
system-wide policy is implemented with Yama and configurable through a
sysctl.

The initial idea comes from CLIP OS and the original implementation has
been used for more than 10 years:
https://github.com/clipos-archive/clipos4_doc

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s

This patch series can be applied on top of v4.20-rc6.  This can be
tested with CONFIG_SECURITY_YAMA.  I would really appreciate
constructive comments on this RFC.

Regards,

Mickaël Salaün (5):
  fs: Add support for an O_MAYEXEC flag on sys_open()
  fs: Add a MAY_EXECMOUNT flag to infer the noexec mount propertie
  Yama: Enforces noexec mounts or file executability through O_MAYEXEC
  selftest/yama: Add tests for O_MAYEXEC enforcing
  doc: Add documentation for Yama's open_mayexec_enforce

 Documentation/admin-guide/LSM/Yama.rst       |  41 +++
 MAINTAINERS                                  |   1 +
 fs/fcntl.c                                   |   2 +-
 fs/namei.c                                   |   2 +
 fs/open.c                                    |   4 +
 include/linux/fcntl.h                        |   2 +-
 include/linux/fs.h                           |   4 +
 include/uapi/asm-generic/fcntl.h             |   3 +
 security/yama/Kconfig                        |   3 +-
 security/yama/yama_lsm.c                     |  82 +-
 tools/testing/selftests/Makefile             |   1 +
 tools/testing/selftests/yama/.gitignore      |   1 +
 tools/testing/selftests/yama/Makefile        |  19 ++
 tools/testing/selftests/yama/config          |   2 +
 tools/testing/selftests/yama/test_omayexec.c | 276 +++
 create mode 100644 tools/testing/selftests/yama/.gitignore
 create mode 100644 tools/testing/selftests/yama/Makefile
 create mode 100644 tools/testing/selftests/yama/config
 create mode 100644 tools/testing/selftests/yama/test_omayexec.c

-- 
2.20.0.rc2
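
For illustration, the interpreter side of this scheme could look like
the sketch below.  The helper name open_script is hypothetical, and the
fallback value of O_MAYEXEC is an assumption taken from this RFC; it is
not defined by any released uapi header.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_CLOEXEC under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

#ifndef O_MAYEXEC
#define O_MAYEXEC 040000000 /* assumed value from this RFC */
#endif

/*
 * Hypothetical interpreter helper: open a script for reading and ask the
 * kernel to apply its "may execute" policy.  Returns a file descriptor,
 * or -1 with errno set (EACCES when the policy denies the file).
 */
int open_script(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_CLOEXEC | O_MAYEXEC);
}
```

An interpreter would call this instead of a plain open() for every file
it is about to interpret, and refuse to run the file when the call
fails.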

[RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()

2018-12-12 Thread Mickaël Salaün

When the O_MAYEXEC flag is passed, sys_open() may be subject to
additional restrictions depending on a security policy implemented by an
LSM through the inode_permission hook.

The underlying idea is to be able to restrict script interpretation
according to a policy defined by the system administrator.  For this to
be possible, script interpreters must use the O_MAYEXEC flag
appropriately.  To be fully effective, these interpreters also need to
handle the other ways to execute code (for which the kernel can't help):
command line parameters (e.g., option -e for Perl), module loading
(e.g., option -m for Python), stdin, file sourcing, environment
variables, configuration files...  Depending on the threat model, it may
be acceptable to allow some script interpreters (e.g. Bash) to interpret
commands from stdin, be it a TTY or a pipe, because it may not be
enough to (directly) perform syscalls.

A simple security policy implementation is available in a following
patch for Yama.

This is an updated subset of the patch initially written by Vincent
Strubel for CLIP OS:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than 10 years with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Signed-off-by: Mickaël Salaün 
Signed-off-by: Thibaut Sautereau 
Signed-off-by: Vincent Strubel 
Reviewed-by: Philippe Trébuchet 
Cc: Al Viro 
Cc: Kees Cook 
Cc: Mickaël Salaün 
---
 fs/fcntl.c                       | 2 +-
 fs/open.c                        | 4 ++++
 include/linux/fcntl.h            | 2 +-
 include/linux/fs.h               | 2 ++
 include/uapi/asm-generic/fcntl.h | 3 +++
 5 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 083185174c6d..6c85c4d0c006 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1031,7 +1031,7 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ !=
+	BUILD_BUG_ON(22 - 1 /* for O_RDONLY being 0 */ !=
 		HWEIGHT32(
 			(VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) |
 			__FMODE_EXEC | __FMODE_NONOTIFY));
diff --git a/fs/open.c b/fs/open.c
index 0285ce7dbd51..75479b79a58f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 	if (flags & O_APPEND)
 		acc_mode |= MAY_APPEND;
 
+	/* Check execution permissions on open. */
+	if (flags & O_MAYEXEC)
+		acc_mode |= MAY_OPENEXEC;
+
 	op->acc_mode = acc_mode;
 
 	op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index 27dc7a60693e..1fc00cabe9ab 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -9,7 +9,7 @@
 	(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
 	 O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
 	 FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
-	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
+	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_MAYEXEC)
 
 #ifndef force_o_largefile
 #define force_o_largefile() (BITS_PER_LONG != 32)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c0807471f..584c9329ad78 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -94,6 +94,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define MAY_CHDIR  0x0040
 /* called from RCU mode, don't block */
 #define MAY_NOT_BLOCK  0x0080
+/* the inode is opened with O_MAYEXEC */
+#define MAY_OPENEXEC   0x0100
 
 /*
  * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 9dc0bf0c5a6e..cbb9425d6e7c 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -97,6 +97,9 @@
 #define O_NDELAY   O_NONBLOCK
 #endif
 
+/* command execution from file is intended, check exec permissions */
+#define O_MAYEXEC	040000000
+
 #define F_DUPFD		0	/* dup */
 #define F_GETFD		1	/* get close_on_exec */
 #define F_SETFD		2	/* set/clear close_on_exec */
-- 
2.20.0.rc2

[RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC

2018-12-12 Thread Mickaël Salaün

Make it possible to either propagate the noexec mount option from the
underlying VFS mount, or to propagate the file execute permission, to
opens performed with O_MAYEXEC.  This may allow a script interpreter to
check execution permissions before reading commands from a file.

The main goal is to be able to protect the kernel by restricting
arbitrary syscalls that an attacker could perform with a crafted binary
or certain script languages.  It also improves multilevel isolation
by reducing the ability of an attacker to use side channels with
specific code.  These restrictions can natively be enforced for ELF
binaries (with the noexec mount option) but require this kernel
extension to properly handle scripts (e.g., Python, Perl).

Add a new sysctl kernel.yama.open_mayexec_enforce to control this
behavior.  A following patch adds documentation.

Signed-off-by: Mickaël Salaün 
Reviewed-by: Philippe Trébuchet 
Reviewed-by: Thibaut Sautereau 
Cc: Kees Cook 
Cc: Mickaël Salaün 
---
 security/yama/Kconfig|  3 +-
 security/yama/yama_lsm.c | 82 +++-
 2 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 96b27405558a..9457619fabd5 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -5,7 +5,8 @@ config SECURITY_YAMA
help
  This selects Yama, which extends DAC support with additional
  system-wide security settings beyond regular Linux discretionary
- access controls. Currently available is ptrace scope restriction.
+ access controls. Currently available are ptrace scope restriction and
+ enforcement of the O_MAYEXEC open flag.
  Like capabilities, this security module stacks with other LSMs.
  Further information can be found in
  Documentation/admin-guide/LSM/Yama.rst.
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index ffda91a4a1aa..120664e94ee5 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -1,10 +1,12 @@
 /*
  * Yama Linux Security Module
  *
- * Author: Kees Cook 
+ * Authors: Kees Cook 
+ *  Mickaël Salaün 
  *
  * Copyright (C) 2010 Canonical, Ltd.
  * Copyright (C) 2011 The Chromium OS Authors.
+ * Copyright (C) 2018 ANSSI
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2, as
@@ -28,7 +30,14 @@
 #define YAMA_SCOPE_CAPABILITY  2
 #define YAMA_SCOPE_NO_ATTACH   3
 
+#define YAMA_OMAYEXEC_ENFORCE_NONE	0
+#define YAMA_OMAYEXEC_ENFORCE_MOUNT	(1 << 0)
+#define YAMA_OMAYEXEC_ENFORCE_FILE	(1 << 1)
+#define _YAMA_OMAYEXEC_LAST		YAMA_OMAYEXEC_ENFORCE_FILE
+#define _YAMA_OMAYEXEC_MASK		((_YAMA_OMAYEXEC_LAST << 1) - 1)
+
 static int ptrace_scope = YAMA_SCOPE_RELATIONAL;
+static int open_mayexec_enforce = YAMA_OMAYEXEC_ENFORCE_NONE;
 
 /* describe a ptrace relationship for potential exception */
 struct ptrace_relation {
@@ -423,7 +432,40 @@ int yama_ptrace_traceme(struct task_struct *parent)
return rc;
 }
 
+/**
+ * yama_inode_permission - check O_MAYEXEC permission before accessing an inode
+ * @inode: inode structure to check
+ * @mask: permission mask
+ *
+ * Return 0 if access is permitted, -EACCES otherwise.
+ */
+int yama_inode_permission(struct inode *inode, int mask)
+{
+	if (!(mask & MAY_OPENEXEC))
+		return 0;
+	/*
+	 * Match regular files and directories to make it easier to
+	 * modify script interpreters.
+	 */
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+		return 0;
+
+	if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) &&
+			!(mask & MAY_EXECMOUNT))
+		return -EACCES;
+
+	/*
+	 * May prefer acl_permission_check() instead of generic_permission(),
+	 * to not be bypassable with CAP_DAC_READ_SEARCH.
+	 */
+	if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE)
+		return generic_permission(inode, MAY_EXEC);
+
+	return 0;
+}
+
 static struct security_hook_list yama_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(inode_permission, yama_inode_permission),
 	LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check),
 	LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme),
 	LSM_HOOK_INIT(task_prctl, yama_task_prctl),
@@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table *table, int write,
return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
 }
 
+static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+   int error;
+
+   if (write) {
+   struct ctl_table table_copy;
+   int tmp_m

[RFC PATCH v1 4/5] selftest/yama: Add tests for O_MAYEXEC enforcing

2018-12-12 Thread Mickaël Salaün

Test propagation of noexec mount points or file executability through
files opened with or without O_MAYEXEC.

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Mickaël Salaün 
Cc: Shuah Khan 
---
 MAINTAINERS                                  |   1 +
 tools/testing/selftests/Makefile             |   1 +
 tools/testing/selftests/yama/.gitignore      |   1 +
 tools/testing/selftests/yama/Makefile        |  19 ++
 tools/testing/selftests/yama/config          |   2 +
 tools/testing/selftests/yama/test_omayexec.c | 276 +++
 6 files changed, 300 insertions(+)
 create mode 100644 tools/testing/selftests/yama/.gitignore
 create mode 100644 tools/testing/selftests/yama/Makefile
 create mode 100644 tools/testing/selftests/yama/config
 create mode 100644 tools/testing/selftests/yama/test_omayexec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8119141a926f..a1d01a81b283 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16591,6 +16591,7 @@ M:  Kees Cook 
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip
 S: Supported
 F: security/yama/
+F: tools/testing/selftests/yama/
 F: Documentation/admin-guide/LSM/Yama.rst
 
 YEALINK PHONE DRIVER
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index f0017c831e57..608f31167aa6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -46,6 +46,7 @@ endif
 TARGETS += user
 TARGETS += vm
 TARGETS += x86
+TARGETS += yama
 TARGETS += zram
 #Please keep the TARGETS list alphabetically sorted
 # Run "make quicktest=1 run_tests" or
diff --git a/tools/testing/selftests/yama/.gitignore b/tools/testing/selftests/yama/.gitignore
new file mode 100644
index 000000000000..6e8d5cfb48d0
--- /dev/null
+++ b/tools/testing/selftests/yama/.gitignore
@@ -0,0 +1 @@
+/test_omayexec
diff --git a/tools/testing/selftests/yama/Makefile b/tools/testing/selftests/yama/Makefile
new file mode 100644
index 000000000000..d411f1615b60
--- /dev/null
+++ b/tools/testing/selftests/yama/Makefile
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0
+
+all:
+
+include ../lib.mk
+
+.PHONY: all clean
+
+BINARIES := test_omayexec
+CFLAGS += -Wl,-no-as-needed -Wall -Werror
+LDFLAGS += -lcap
+
+test_omayexec: test_omayexec.c ../kselftest_harness.h
+	$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+
+TEST_PROGS += $(BINARIES)
+EXTRA_CLEAN := $(BINARIES)
+
+all: $(BINARIES)
diff --git a/tools/testing/selftests/yama/config b/tools/testing/selftests/yama/config
new file mode 100644
index 000000000000..9d375bfc465b
--- /dev/null
+++ b/tools/testing/selftests/yama/config
@@ -0,0 +1,2 @@
+CONFIG_SECURITY=y
+CONFIG_SECURITY_YAMA=y
diff --git a/tools/testing/selftests/yama/test_omayexec.c b/tools/testing/selftests/yama/test_omayexec.c
new file mode 100644
index 000000000000..7d41097f0e89
--- /dev/null
+++ b/tools/testing/selftests/yama/test_omayexec.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Yama tests - O_MAYEXEC
+ *
+ * Copyright © 2018 ANSSI
+ *
+ * Author: Mickaël Salaün 
+ */
+
+#include 
+#include <fcntl.h> /* O_CLOEXEC */
+#include 
+#include 
+#include <string.h> /* strlen */
+#include 
+#include 
+#include <sys/stat.h> /* mkdir */
+#include <unistd.h> /* unlink, rmdir */
+
+#include "../kselftest_harness.h"
+
+#ifndef O_MAYEXEC
+#define O_MAYEXEC	040000000
+#endif
+
+#define SYSCTL_MAYEXEC "/proc/sys/kernel/yama/open_mayexec_enforce"
+
+#define BIN_DIR		"./test-mount"
+#define BIN_PATH	BIN_DIR "/file"
+#define DIR_PATH	BIN_DIR "/directory"
+
+#define ALLOWED		1
+#define DENIED		0
+
+static void test_omx(struct __test_metadata *_metadata,
+		const char *const path, const int exec_allowed)
+{
+	int fd;
+
+	/* without O_MAYEXEC */
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	ASSERT_NE(-1, fd);
+	EXPECT_FALSE(close(fd));
+
+	/* with O_MAYEXEC */
+	fd = open(path, O_RDONLY | O_CLOEXEC | O_MAYEXEC);
+	if (exec_allowed) {
+		/* open should succeed */
+		ASSERT_NE(-1, fd);
+		EXPECT_FALSE(close(fd));
+	} else {
+		/* open should return EACCES */
+		ASSERT_EQ(-1, fd);
+		ASSERT_EQ(EACCES, errno);
+	}
+}
+
+static void ignore_dac(struct __test_metadata *_metadata, int override)
+{
+	cap_t caps;
+	const cap_value_t cap_val[2] = {
+		CAP_DAC_OVERRIDE,
+		CAP_DAC_READ_SEARCH,
+	};
+
+	caps = cap_get_proc();
+	ASSERT_TRUE(!!caps);
+	ASSERT_FALSE(cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_val,
+				override ? CAP_SET : CAP_CLEAR));
+	ASSERT_FALSE(cap_set_proc(caps));
+	EXPECT_FALSE(cap_free(caps));
+}
+
+static void test_dir_file(struct __test_metadata *_metadata,
+   const char *const dir_path, const char *const file_path,
+   const int exec_a

Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC

2018-12-12 Thread Mickaël Salaün




On 12/12/2018 at 09:17, Mickaël Salaün wrote:
> Enable to either propagate the mount options from the underlying VFS
> mount to prevent execution, or to propagate the file execute permission.
> This may allow a script interpreter to check execution permissions
> before reading commands from a file.
> 
> The main goal is to be able to protect the kernel by restricting
> arbitrary syscalls that an attacker could perform with a crafted binary
> or certain script languages.  It also improves multilevel isolation
> by reducing the ability of an attacker to use side channels with
> specific code.  These restrictions can natively be enforced for ELF
> binaries (with the noexec mount option) but require this kernel
> extension to properly handle scripts (e.g., Python, Perl).
> 
> Add a new sysctl kernel.yama.open_mayexec_enforce to control this
> behavior.  A following patch adds documentation.
> 
> Signed-off-by: Mickaël Salaün 
> Reviewed-by: Philippe Trébuchet 
> Reviewed-by: Thibaut Sautereau 
> Cc: Kees Cook 
> Cc: Mickaël Salaün 
> ---
>  security/yama/Kconfig|  3 +-
>  security/yama/yama_lsm.c | 82 +++-
>  2 files changed, 83 insertions(+), 2 deletions(-)
> 
> diff --git a/security/yama/Kconfig b/security/yama/Kconfig
> index 96b27405558a..9457619fabd5 100644
> --- a/security/yama/Kconfig
> +++ b/security/yama/Kconfig
> @@ -5,7 +5,8 @@ config SECURITY_YAMA
>   help
> This selects Yama, which extends DAC support with additional
> system-wide security settings beyond regular Linux discretionary
> -   access controls. Currently available is ptrace scope restriction.
> +   access controls. Currently available are ptrace scope restriction and
> +   enforcement of the O_MAYEXEC open flag.
> Like capabilities, this security module stacks with other LSMs.
> Further information can be found in
> Documentation/admin-guide/LSM/Yama.rst.
> diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
> index ffda91a4a1aa..120664e94ee5 100644
> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -1,10 +1,12 @@
>  /*
>   * Yama Linux Security Module
>   *
> - * Author: Kees Cook 
> + * Authors: Kees Cook 
> + *  Mickaël Salaün 
>   *
>   * Copyright (C) 2010 Canonical, Ltd.
>   * Copyright (C) 2011 The Chromium OS Authors.
> + * Copyright (C) 2018 ANSSI
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2, as
> @@ -28,7 +30,14 @@
>  #define YAMA_SCOPE_CAPABILITY2
>  #define YAMA_SCOPE_NO_ATTACH 3
>  
> +#define YAMA_OMAYEXEC_ENFORCE_NONE   0
> +#define YAMA_OMAYEXEC_ENFORCE_MOUNT  (1 << 0)
> +#define YAMA_OMAYEXEC_ENFORCE_FILE   (1 << 1)
> +#define _YAMA_OMAYEXEC_LAST  YAMA_OMAYEXEC_ENFORCE_FILE
> +#define _YAMA_OMAYEXEC_MASK  ((_YAMA_OMAYEXEC_LAST << 1) - 1)
> +
>  static int ptrace_scope = YAMA_SCOPE_RELATIONAL;
> +static int open_mayexec_enforce = YAMA_OMAYEXEC_ENFORCE_NONE;
>  
>  /* describe a ptrace relationship for potential exception */
>  struct ptrace_relation {
> @@ -423,7 +432,40 @@ int yama_ptrace_traceme(struct task_struct *parent)
>   return rc;
>  }
>  
> +/**
> + * yama_inode_permission - check O_MAYEXEC permission before accessing an 
> inode
> + * @inode: inode structure to check
> + * @mask: permission mask
> + *
> + * Return 0 if access is permitted, -EACCES otherwise.
> + */
> +int yama_inode_permission(struct inode *inode, int mask)
> +{
> + if (!(mask & MAY_OPENEXEC))
> + return 0;
> + /*
> +  * Match regular files and directories to make it easier to
> +  * modify script interpreters.
> +  */
> + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> + return 0;

I forgot to mention that these checks do not handle fifos. This is
relevant in a threat model targeting persistent attacks (and with
additional protections/restrictions).

> +
> + if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) &&
> + !(mask & MAY_EXECMOUNT))
> + return -EACCES;
> +
> + /*
> +  * May prefer acl_permission_check() instead of generic_permission(),
> +  * to not be bypassable with CAP_DAC_READ_SEARCH.
> +  */
> + if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE)
> + return generic_permission(inode, MAY_EXEC);
> +
> + return 0;
> +}
> +
>  static struct security_hook_list yama_hooks[] __lsm_ro_after_init = {
> + LSM_HOOK_INIT(inode_permission, yama_in

Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC

2018-12-12 Thread Mickaël Salaün



On 12/12/2018 at 17:29, Jordan Glover wrote:
> On Wednesday, December 12, 2018 9:17 AM, Mickaël Salaün  
> wrote:
> 
>> Hi,
>>
>> The goal of this patch series is to control script interpretation. A
>> new O_MAYEXEC flag used by sys_open() is added to enable userland script
>> interpreter to delegate to the kernel (and thus the system security
>> policy) the permission to interpret scripts or other files containing
>> what can be seen as commands.
>>
>> The security policy is the responsibility of an LSM. A basic
>> system-wide policy is implemented with Yama and configurable through a
>> sysctl.
>>
>> The initial idea come from CLIP OS and the original implementation has
>> been used for more than 10 years:
>> https://github.com/clipos-archive/clipos4_doc
>>
>> An introduction to O_MAYEXEC was given at the Linux Security Summit
>> Europe 2018 - Linux Kernel Security Contributions by ANSSI:
>> https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
>> The "write xor execute" principle was explained at Kernel Recipes 2018 -
>> CLIP OS: a defense-in-depth OS:
>> https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s
>>
>> This patch series can be applied on top of v4.20-rc6. This can be
>> tested with CONFIG_SECURITY_YAMA. I would really appreciate
>> constructive comments on this RFC.
>>
>> Regards,
>>
> 
> Are various interpreters upstreams interested in adding support
> for O_MAYEXEC if it land in kernel? Did you contacted them about this?

I think the first step is to reach agreement on the kernel side. We will
then be able to help upstream interpreters implement this feature. This
should be fine because the default behavior doesn't change, i.e. nothing
is enforced unless the sysadmin configures (and tests) the whole system.
Some examples of modified interpreters can be found at
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

 Mickaël

Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()

2018-12-12 Thread Mickaël Salaün



On 12/12/2018 at 15:43, Jan Kara wrote:
> On Wed 12-12-18 09:17:08, Mickaël Salaün wrote:
>> When the O_MAYEXEC flag is passed, sys_open() may be subject to
>> additional restrictions depending on a security policy implemented by an
>> LSM through the inode_permission hook.
>>
>> The underlying idea is to be able to restrict scripts interpretation
>> according to a policy defined by the system administrator.  For this to
>> be possible, script interpreters must use the O_MAYEXEC flag
>> appropriately.  To be fully effective, these interpreters also need to
>> handle the other ways to execute code (for which the kernel can't help):
>> command line parameters (e.g., option -e for Perl), module loading
>> (e.g., option -m for Python), stdin, file sourcing, environment
>> variables, configuration files...  According to the threat model, it may
>> be acceptable to allow some script interpreters (e.g. Bash) to interpret
>> commands from stdin, may it be a TTY or a pipe, because it may not be
>> enough to (directly) perform syscalls.
>>
>> A simple security policy implementation is available in a following
>> patch for Yama.
>>
>> This is an updated subset of the patch initially written by Vincent
>> Strubel for CLIP OS:
>> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
>> This patch has been used for more than 10 years with customized script
>> interpreters.  Some examples can be found here:
>> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>>
>> Signed-off-by: Mickaël Salaün 
>> Signed-off-by: Thibaut Sautereau 
>> Signed-off-by: Vincent Strubel 
>> Reviewed-by: Philippe Trébuchet 
>> Cc: Al Viro 
>> Cc: Kees Cook 
>> Cc: Mickaël Salaün 
> 
> ...
> 
>> diff --git a/fs/open.c b/fs/open.c
>> index 0285ce7dbd51..75479b79a58f 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t 
>> mode, struct open_flags *o
>>  if (flags & O_APPEND)
>>  acc_mode |= MAY_APPEND;
>>  
>> +/* Check execution permissions on open. */
>> +if (flags & O_MAYEXEC)
>> +acc_mode |= MAY_OPENEXEC;
>> +
>>  op->acc_mode = acc_mode;
>>  
>>  op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
> 
> I don't feel experienced enough in security to tell whether we want this
> functionality or not. But if we do this, shouldn't we also set FMODE_EXEC
> on the resulting struct file? That way also security_file_open() can be
> used to arbitrate such executable opens and in particular
> fanotify permission event FAN_OPEN_EXEC will get properly generated which I
> guess is desirable (support for it is sitting in my tree waiting for the
> merge window) - adding some audit people involved in FAN_OPEN_EXEC to
> CC. Just an idea...

Indeed, it may be useful for other LSMs.

 Mickaël

Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()

2018-12-13 Thread Mickaël Salaün



On 13/12/2018 10:47, Matthew Bobrowski wrote:
> On Wed, Dec 12, 2018 at 03:43:06PM +0100, Jan Kara wrote:
>>> When the O_MAYEXEC flag is passed, sys_open() may be subject to
>>> additional restrictions depending on a security policy implemented by an
>>> LSM through the inode_permission hook.
>>>
>>> The underlying idea is to be able to restrict scripts interpretation
>>> according to a policy defined by the system administrator.  For this to
>>> be possible, script interpreters must use the O_MAYEXEC flag
>>> appropriately.  To be fully effective, these interpreters also need to
>>> handle the other ways to execute code (for which the kernel can't help):
>>> command line parameters (e.g., option -e for Perl), module loading
>>> (e.g., option -m for Python), stdin, file sourcing, environment
>>> variables, configuration files...  According to the threat model, it may
>>> be acceptable to allow some script interpreters (e.g. Bash) to interpret
>>> commands from stdin, may it be a TTY or a pipe, because it may not be
>>> enough to (directly) perform syscalls.
>>>
>>> A simple security policy implementation is available in a following
>>> patch for Yama.
>>>
>>> This is an updated subset of the patch initially written by Vincent
>>> Strubel for CLIP OS:
>>> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
>>> This patch has been used for more than 10 years with customized script
>>> interpreters.  Some examples can be found here:
>>> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>>>
>>> Signed-off-by: Mickaël Salaün 
>>> Signed-off-by: Thibaut Sautereau 
>>> Signed-off-by: Vincent Strubel 
>>> Reviewed-by: Philippe Trébuchet 
>>> Cc: Al Viro 
>>> Cc: Kees Cook 
>>> Cc: Mickaël Salaün 
>>
>> ...
>>
>>> diff --git a/fs/open.c b/fs/open.c
>>> index 0285ce7dbd51..75479b79a58f 100644
>>> --- a/fs/open.c
>>> +++ b/fs/open.c
>>> @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags, umode_t 
>>> mode, struct open_flags *o
>>> if (flags & O_APPEND)
>>> acc_mode |= MAY_APPEND;
>>>  
>>> +   /* Check execution permissions on open. */
>>> +   if (flags & O_MAYEXEC)
>>> +   acc_mode |= MAY_OPENEXEC;
>>> +
>>> op->acc_mode = acc_mode;
>>>  
>>> op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
>>
>> I don't feel experienced enough in security to tell whether we want this
>> functionality or not. But if we do this, shouldn't we also set FMODE_EXEC
>> on the resulting struct file? That way also security_file_open() can be
>> used to arbitrate such executable opens and in particular
>> fanotify permission event FAN_OPEN_EXEC will get properly generated which I
>> guess is desirable (support for it is sitting in my tree waiting for the
>> merge window) - adding some audit people involved in FAN_OPEN_EXEC to
>> CC. Just an idea...
> 
> If I'm understanding this patch series correctly, without an enforced LSM
> policy there's realistically no added benefit from a security perspective,
> right?

That's correct. The kernel knows the semantics, but the enforcement is
delegated to an LSM and its policy.

> Also, I'm in agreement with what Jan has mentioned in regards to setting
> the __FMODE_EXEC flag when O_MAYEXEC has been specified. This is something 
> that
> would work quite nicely in conjunction with some of the new file access
> notification events.

OK, I will add it in the next patch series (for the new FAN_OPEN_EXEC
support).

Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC

2018-12-13 Thread Mickaël Salaün



On 12/12/2018 18:09, Jann Horn wrote:
> On Wed, Dec 12, 2018 at 9:18 AM Mickaël Salaün  wrote:
>> Enable to either propagate the mount options from the underlying VFS
>> mount to prevent execution, or to propagate the file execute permission.
>> This may allow a script interpreter to check execution permissions
>> before reading commands from a file.
>>
>> The main goal is to be able to protect the kernel by restricting
>> arbitrary syscalls that an attacker could perform with a crafted binary
>> or certain script languages.  It also improves multilevel isolation
>> by reducing the ability of an attacker to use side channels with
>> specific code.  These restrictions can natively be enforced for ELF
>> binaries (with the noexec mount option) but require this kernel
>> extension to properly handle scripts (e.g., Python, Perl).
>>
>> Add a new sysctl kernel.yama.open_mayexec_enforce to control this
>> behavior.  A following patch adds documentation.
>>
>> Signed-off-by: Mickaël Salaün 
>> Reviewed-by: Philippe Trébuchet 
>> Reviewed-by: Thibaut Sautereau 
>> Cc: Kees Cook 
>> Cc: Mickaël Salaün 
>> ---
> [...]
>> +/**
>> + * yama_inode_permission - check O_MAYEXEC permission before accessing an 
>> inode
>> + * @inode: inode structure to check
>> + * @mask: permission mask
>> + *
>> + * Return 0 if access is permitted, -EACCES otherwise.
>> + */
>> +int yama_inode_permission(struct inode *inode, int mask)
> 
> This should be static, no?

Right, it will be in the next series. The previous function
(yama_ptrace_traceme) is not static though.

> 
>> +{
>> +   if (!(mask & MAY_OPENEXEC))
>> +   return 0;
>> +   /*
>> +* Match regular files and directories to make it easier to
>> +* modify script interpreters.
>> +*/
>> +   if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>> +   return 0;
> 
> So files are subject to checks, but loading code from things like
> sockets is always fine?

As I said in a previous email, these checks do not handle FIFOs either.
This is relevant in a threat model targeting persistent attacks (and
with additional protections/restrictions). We may want to whitelist only
FIFOs, but I don't see how a socket is relevant here. Could you please
clarify?

> 
>> +   if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) &&
>> +   !(mask & MAY_EXECMOUNT))
>> +   return -EACCES;
>> +
>> +   /*
>> +* May prefer acl_permission_check() instead of generic_permission(),
>> +* to not be bypassable with CAP_DAC_READ_SEARCH.
>> +*/
>> +   if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE)
>> +   return generic_permission(inode, MAY_EXEC);
>> +
>> +   return 0;
>> +}
>> +
>>  static struct security_hook_list yama_hooks[] __lsm_ro_after_init = {
>> +   LSM_HOOK_INIT(inode_permission, yama_inode_permission),
>> LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check),
>> LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme),
>> LSM_HOOK_INIT(task_prctl, yama_task_prctl),
>> @@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table 
>> *table, int write,
>> return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
>>  }
>>
>> +static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int 
>> write,
>> + void __user *buffer, size_t *lenp,
>> + loff_t *ppos)
>> +{
>> +   int error;
>> +
>> +   if (write) {
>> +   struct ctl_table table_copy;
>> +   int tmp_mayexec_enforce;
>> +
>> +   if (!capable(CAP_MAC_ADMIN))
>> +   return -EPERM;
> 
> Don't put capable() checks in sysctls, it doesn't work.
> 

I tested it and the root user can indeed open the file even if the
process doesn't have CAP_MAC_ADMIN; however, writing to the sysctl file
is denied. Btw, there is a similar check in the previous function
(yama_dointvec_minmax).

Thanks

Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC

2018-12-13 Thread Mickaël Salaün

On 13/12/2018 06:13, Florian Weimer wrote:
> * James Morris:
> 
>> On Wed, 12 Dec 2018, Florian Weimer wrote:
>>
>>> * James Morris:
>>>
>>>> If you're depending on the script interpreter to flag that the user may 
>>>> execute code, this seems to be equivalent in security terms to depending 
>>>> on the user.  e.g. what if the user uses ptrace and clears O_MAYEXEC?

This security mechanism makes sense in a hardened system where the user
is not allowed to import and execute new files (write xor execute
policy). This can be enforced with appropriate mount points or a more
advanced access control policy.

>>>
>>> The argument I've heard is this: Using ptrace (and adding the +x
>>> attribute) are auditable events.
>>
>> I guess you could also preload a modified libc which strips the flag.
> 
> My understanding is that this new libc would have to come somewhere, and
> making it executable would be an auditable even as well.

Auditing is a possible use case as well, but the W^X idea is to deny use
of libraries which are not in an executable mount point, i.e. only
execute trusted code.

Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC

2018-12-13 Thread Mickaël Salaün

On 13/12/2018 04:02, Matthew Wilcox wrote:
> On Wed, Dec 12, 2018 at 09:17:07AM +0100, Mickaël Salaün wrote:
>> The goal of this patch series is to control script interpretation.  A
>> new O_MAYEXEC flag used by sys_open() is added to enable userland script
>> interpreter to delegate to the kernel (and thus the system security
>> policy) the permission to interpret scripts or other files containing
>> what can be seen as commands.
> 
> I don't have a problem with the concept, but we're running low on O_ bits.
> Does this have to be done before the process gets a file descriptor,
> or could we have a new syscall?  Since we're going to be changing the
> interpreters anyway, it doesn't seem like too much of an imposition to
> ask them to use:
> 
>   int verify_for_exec(int fd)
> 
> instead of adding an O_MAYEXEC.
> 

Adding a new syscall for this simple use case seems excessive. I think
that the open/openat syscall family is the right place to do an atomic
open and permission check, the same way the kernel does for other file
accesses. Moreover, it will be easier to patch upstream interpreters
without the burden of handling a (new) syscall that may not exist on the
running system, whereas unknown open flags are ignored.

Re: [RFC PATCH v1 0/5] Add support for O_MAYEXEC

2018-12-13 Thread Mickaël Salaün



On 13/12/2018 18:13, Matthew Wilcox wrote:
> On Thu, Dec 13, 2018 at 04:17:29PM +0100, Mickaël Salaün wrote:
>> On 13/12/2018 04:02, Matthew Wilcox wrote:
>>> On Wed, Dec 12, 2018 at 09:17:07AM +0100, Mickaël Salaün wrote:
>>>> The goal of this patch series is to control script interpretation.  A
>>>> new O_MAYEXEC flag used by sys_open() is added to enable userland script
>>>> interpreter to delegate to the kernel (and thus the system security
>>>> policy) the permission to interpret scripts or other files containing
>>>> what can be seen as commands.
>>>
>>> I don't have a problem with the concept, but we're running low on O_ bits.
>>> Does this have to be done before the process gets a file descriptor,
>>> or could we have a new syscall?  Since we're going to be changing the
>>> interpreters anyway, it doesn't seem like too much of an imposition to
>>> ask them to use:
>>>
>>> int verify_for_exec(int fd)
>>>
>>> instead of adding an O_MAYEXEC.
>>
>> Adding a new syscall for this simple use case seems excessive. I think
> 
> We have somewhat less than 400 syscalls today.  We have 20 O_ bits defined.
> Obviously there's a lower practical limit on syscalls, but in principle
> we could have up to 2^32 syscalls, and there are only 12 O_ bits remaining.
> 
>> that the open/openat syscall familly are the right place to do an atomic
>> open and permission check, the same way the kernel does for other file
>> access. Moreover, it will be easier to patch upstream interpreters
>> without the burden of handling a (new) syscall that may not exist on the
>> running system, whereas unknown open flags are ignored.
> 
> Ah, but that's the problem.  The interpreter can see an -ENOSYS response
> and handle it appropriately.  If the flag is silently ignored, the
> interpreter has no idea whether it can do a racy check or whether to
> skip even trying to do the check.

Right, but the interpreter should interpret the script if the open with
O_MAYEXEC succeeds (and not otherwise): success means either that the
kernel knows the flag and the system policy allows this call, or that
the (old) kernel doesn't know about this flag (which is fine and needed
for backward compatibility). Script interpretation must not fail if the
kernel doesn't support O_MAYEXEC, so it is useless for the interpreter
to do any additional check.

Re: [RFC PATCH v1 3/5] Yama: Enforces noexec mounts or file executability through O_MAYEXEC

2019-01-08 Thread Mickaël Salaün



On 03/01/2019 12:17, Jann Horn wrote:
> On Thu, Dec 13, 2018 at 3:49 PM Mickaël Salaün
>  wrote:
>> On 12/12/2018 18:09, Jann Horn wrote:
>>> On Wed, Dec 12, 2018 at 9:18 AM Mickaël Salaün  wrote:
>>>> Enable to either propagate the mount options from the underlying VFS
>>>> mount to prevent execution, or to propagate the file execute permission.
>>>> This may allow a script interpreter to check execution permissions
>>>> before reading commands from a file.
>>>>
>>>> The main goal is to be able to protect the kernel by restricting
>>>> arbitrary syscalls that an attacker could perform with a crafted binary
>>>> or certain script languages.  It also improves multilevel isolation
>>>> by reducing the ability of an attacker to use side channels with
>>>> specific code.  These restrictions can natively be enforced for ELF
>>>> binaries (with the noexec mount option) but require this kernel
>>>> extension to properly handle scripts (e.g., Python, Perl).
>>>>
>>>> Add a new sysctl kernel.yama.open_mayexec_enforce to control this
>>>> behavior.  A following patch adds documentation.
> [...]
>>>> +{
>>>> +   if (!(mask & MAY_OPENEXEC))
>>>> +   return 0;
>>>> +   /*
>>>> +* Match regular files and directories to make it easier to
>>>> +* modify script interpreters.
>>>> +*/
>>>> +   if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>>>> +   return 0;
>>>
>>> So files are subject to checks, but loading code from things like
>>> sockets is always fine?
>>
>> As I said in a previous email, these checks do not handle fifo either.
>> This is relevant in a threat model targeting persistent attacks (and
>> with additional protections/restrictions). We may want to only whitelist
>> fifo, but I don't get how a socket is relevant here. Can you please clarify?
> 
> I don't think that there's a security problem here. I just think it's
> weird to have the extra check when it seems to me like it isn't really
> necessary - nobody is going to want to execute a socket or fifo
> anyway, right?

Right, the fifo whitelisting should answer your concern then.

> 
>>>
>>>> +   if ((open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_MOUNT) &&
>>>> +   !(mask & MAY_EXECMOUNT))
>>>> +   return -EACCES;
>>>> +
>>>> +   /*
>>>> +* May prefer acl_permission_check() instead of 
>>>> generic_permission(),
>>>> +* to not be bypassable with CAP_DAC_READ_SEARCH.
>>>> +*/
>>>> +   if (open_mayexec_enforce & YAMA_OMAYEXEC_ENFORCE_FILE)
>>>> +   return generic_permission(inode, MAY_EXEC);
>>>> +
>>>> +   return 0;
>>>> +}
>>>> +
>>>>  static struct security_hook_list yama_hooks[] __lsm_ro_after_init = {
>>>> +   LSM_HOOK_INIT(inode_permission, yama_inode_permission),
>>>> LSM_HOOK_INIT(ptrace_access_check, yama_ptrace_access_check),
>>>> LSM_HOOK_INIT(ptrace_traceme, yama_ptrace_traceme),
>>>> LSM_HOOK_INIT(task_prctl, yama_task_prctl),
>>>> @@ -447,6 +489,37 @@ static int yama_dointvec_minmax(struct ctl_table 
>>>> *table, int write,
>>>> return proc_dointvec_minmax(&table_copy, write, buffer, lenp, 
>>>> ppos);
>>>>  }
>>>>
>>>> +static int yama_dointvec_bitmask_macadmin(struct ctl_table *table, int 
>>>> write,
>>>> + void __user *buffer, size_t 
>>>> *lenp,
>>>> + loff_t *ppos)
>>>> +{
>>>> +   int error;
>>>> +
>>>> +   if (write) {
>>>> +   struct ctl_table table_copy;
>>>> +   int tmp_mayexec_enforce;
>>>> +
>>>> +   if (!capable(CAP_MAC_ADMIN))
>>>> +   return -EPERM;
>>>
>>> Don't put capable() checks in sysctls, it doesn't work.
>>>
>>
>> I tested it and the root user can indeed open the file even if the
>> process doesn't have CAP_MAC_ADMIN, however writing in the sysctl file
>> is denied. Btw there is a similar check in the previous function
>> (yama_dointvec_minmax).
> 
> It's still wrong. If an attacker without CAP_MAC_ADMIN opens the
> sysctl file, then passes the file descriptor to a setcap binary that
> has CAP_MAC_ADMIN as stdout/stderr, and the setcap binary writes to
> it, then the capable() check is bypassed. (But of course, to open the
> sysctl file in the first place, you'd need to be root (uid 0), so the
> check doesn't really matter.)

I agree with you that a confused deputy attack may use file
descriptors, but I don't see how the current sysctl API could be used to
check the process capability at open time. Anyway, on a properly
configured system, especially one leveraging Linux capabilities (e.g.
CLIP OS), root processes may not have CAP_SYS_ADMIN. Moreover, SUID or
fcap binaries may not be available to an attacker (e.g. in a container).
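
As a side note on the sysctl value itself: the body of
yama_dointvec_bitmask_macadmin() is truncated in the quotes above, so
the following is only a sketch of how the mask macros from the patch
could bound an accepted value. check_mayexec_enforce() is a hypothetical
helper, not a function from the series, and the -EINVAL return is an
assumption.

```c
#include <errno.h>

/* Macros copied from the quoted patch. */
#define YAMA_OMAYEXEC_ENFORCE_NONE  0
#define YAMA_OMAYEXEC_ENFORCE_MOUNT (1 << 0)
#define YAMA_OMAYEXEC_ENFORCE_FILE  (1 << 1)
#define _YAMA_OMAYEXEC_LAST         YAMA_OMAYEXEC_ENFORCE_FILE
#define _YAMA_OMAYEXEC_MASK         ((_YAMA_OMAYEXEC_LAST << 1) - 1)

/*
 * Hypothetical helper: accept only values built from known enforcement
 * bits. With the two bits above, the mask is ((2 << 1) - 1) = 3.
 */
static int check_mayexec_enforce(int value)
{
    if (value < 0 || (value & ~_YAMA_OMAYEXEC_MASK))
        return -EINVAL;
    return 0;
}
```

Deriving the mask from the last defined bit means a new enforcement bit
only requires updating _YAMA_OMAYEXEC_LAST.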

Re: [PATCH 16/18] LSM: Allow arbitrary LSM ordering

2018-09-17 Thread Mickaël Salaün


On 9/18/18 00:36, John Johansen wrote:
> On 09/17/2018 02:57 PM, Casey Schaufler wrote:
>> On 9/17/2018 12:55 PM, John Johansen wrote:
>>> On 09/17/2018 12:23 PM, Casey Schaufler wrote:
>>>> On 9/17/2018 11:14 AM, Kees Cook wrote:
>>>>>> Keep security=$lsm with the existing exclusive behavior.
>>>>>> Add lsm=$lsm1,...,$lsmN which requires a full list of modules
>>>>>>
>>>>>> If you want to be fancy (I don't!) you could add
>>>>>>
>>>>>> lsm.add=$lsm1,...,$lsmN which adds the modules to the stack
>>>>>> lsm.delete=$lsm1,...,$lsmN which deletes modules from the stack
>>>>> We've got two issues: ordering and enablement. It's been strongly
>>>>> suggested that we should move away from per-LSM enable/disable flags
>>>>> (to which I agree).
>>>> I also agree. There are way too many ways to turn off some LSMs.
>>>>
>>> I wont disagree, but its largely because we didn't have this discussion
>>> when we should have.
>>
>> True that.
>>
>>
>>>>> If ordering should be separate from enablement (to
>>>>> avoid the "booted kernel with new LSM built in, but my lsm="..." line
>>>>> didn't include it so it's disabled case), then I think we need to
>>>>> split the logic (otherwise we just reinvented "security=" with similar
>>>>> problems).
>>>> We could reduce the problem by declaring that LSM ordering is
>>>> not something you can specify on the boot line. I can see value
>>>> in specifying it when you build the kernel, but your circumstances
>>>> would have to be pretty strange to change it at boot time.
>>>>
>>> if there is LSM ordering the getting
>>>
>>>   lsm=B,A,C
>>>
>>> is not the behavior I would expect from specifying
>>>
>>>   lsm=A,B,C
>>
>> Right. You'd expect that they'd be used in the order specified.
>>
> 
> and yet you argue for something different ;)
> 
>>>>> Should "lsm=" allow arbitrary ordering? (I think yes.)
>>>> I say no. Assume you can specify it at build time. When would
>>>> you want to change the order? Why would you?
>>>>
>>> because maybe you care about the denial message from one LSM more than
>>> you do from another. Since stacking is bail on first fail the order
>>> could be important from an auditing POV
>>
>> I understand that a distribution would want to specify the order
>> for support purposes and that a developer would want to specify
>> the order to ensure reproducible behavior. But they are going to
>> be controlling their kernel builds. I'm not suggesting that the
>> order shouldn't be capable of build time specification. What I
>> don't see is a reason to rearrange it at boot time.
>>
> 
> Because not all users have the same priority as the distro. It can
> also aid in debugging and testing of LSMs in a stacked situation.
> 
>>> Auditing is why apparmor's internal stacking is not bail on first
>>> fail.
>>
>> Within a security module I get that. But we've already got the
>> priority wrong for audit in general, because you only get to the
>> LSM if the traditional code approves. Every guidance I ever got
> 
> true
> 
>> said you should do the MAC checks first, because you're much more
>> concerned about getting audit records about MAC failures than DAC.
>>
> 
> yep, wouldn't that be nice to have
> 
>>>>> Should "lsm=" imply implicit enable/disable? (I think no: unlisted
>>>>> LSMs are implicitly auto-appended to the explicit list)
>>>> If you want to add something that isn't there instead of making
>>>> it explicit you want "lsm.enable=" not "lsm=".
>>>>
>>>>> So then we could have "lsm.enable=..." and "lsm.disable=...".
>>>>>
>>>>> If builtin list was:
>>>>> capability,yama,loadpin,integrity,{selinux,smack,tomoyo,apparmor}
>>>>> then:
>>>>>
>>>>> lsm.disable=loadpin lsm=smack
>>>> Methinks this should be lsm.disable=loadpin lsm.enable=smack
>>>>
>>> that would only work if order is not important
>>
>> It works unless you want to change the order at boot, and
>> I still don't see a use case for that.
> 
> see above
> 
>>
>>>>> becomes
>>>>>
>>>>> capability,smack,yama,integrity
>>>>>
>>>>> and
>>>>>
>>>>> CONFIG_SECURITY_LOADPIN_DEFAULT_ENABLED=n
>>>>> selinux.enable=0 lsm.add=loadpin lsm.disable=smack,tomoyo lsm=integrity
>>>> Do you mean
>>>>     selinux.enable=0 lsm.enable=loadpin lsm.disable=smack,tomoyo lsm.enable=integrity
>>>>     selinux.enable=0 lsm.enable=loadpin,integrity lsm.disable=smack,tomoyo
>>>>     selinux.enable=0 lsm.enable=loadpin lsm.enable=integrity lsm.disable=smack lsm.disable=tomoyo
>>>>
>>>>> becomes
>>>>>
>>>>> capability,integrity,yama,loadpin,apparmor
>>>>>
>>>>>
>>>>> If "lsm=" _does_ imply enablement, then how does it interact with
>>>>> per-LSM disabling? i.e. what does "apparmor.enabled=0
>>>>> lsm=yama,apparmor" mean? If it means "turn on apparmor" how do I turn
>>>>> on a CONFIG-default-off LSM without specifying all the other LSMs too?
>>>> There should either be one option "lsm=", which is an explicit list or
>>>> two, "lsm.enable=" and "lsm.disable", which modify the built in default.
>>>>
>>> maybe but th

Re: [PATCH 16/18] LSM: Allow arbitrary LSM ordering

2018-09-17 Thread Mickaël Salaün

On 9/18/18 01:30, Casey Schaufler wrote:
> On 9/17/2018 4:20 PM, Kees Cook wrote:
>> On Mon, Sep 17, 2018 at 4:10 PM, Mickaël Salaün  wrote:
>>> Landlock, because it target unprivileged users, should only be called
>>> after all other major (access-control) LSMs. The admin or distro must
>>> not be able to change that order in any way. This constraint doesn't
>>> apply to current LSMs, though.
> 
> What harm would it cause for Landlock to get called before SELinux?
> I certainly see why it seems like it ought to get called after, but
> would it really make a difference?

If an unprivileged process is able to infer some properties of a file
being requested (thanks to one of its eBPF programs doing checks on this
process's accesses), whereas this file access would be denied by a
privileged LSM, then there is a side channel allowing this process to
indirectly get information that is otherwise inaccessible.

In other words, an unprivileged process should not be allowed to sneak
itself (via an eBPF program) before SELinux for instance. SELinux should
be able to block such information gathering the same way it can block a
fstat(2) requested by a process.
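
The ordering concern can be illustrated with a small userspace model of
"bail on first fail" hook stacking. Everything here is a hypothetical
stand-in: privileged_hook() plays the role of a MAC policy such as
SELinux, unprivileged_hook() plays the role of a sandbox-provided
program, and the counter records how many accesses that program got to
observe.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*access_hook)(const char *path);

/* How many accesses the unprivileged hook was able to observe. */
static int unpriv_observations;

/* Stand-in for a privileged policy: deny everything under /secret. */
static int privileged_hook(const char *path)
{
    return strncmp(path, "/secret", 7) == 0 ? -EACCES : 0;
}

/* Stand-in for an unprivileged hook: allows, but sees the request. */
static int unprivileged_hook(const char *path)
{
    (void)path;
    unpriv_observations++;
    return 0;
}

/* Call hooks in order and stop at the first denial. */
static int call_hooks(const access_hook *hooks, size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i](path);

        if (rc)
            return rc;
    }
    return 0;
}
```

Both orders return the same verdict for a denied path, but only when the
unprivileged hook runs last is it never invoked for an access that the
privileged policy denies.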


Re: WARNING in current_check_refer_path

2024-04-29 Thread Mickaël Salaün

Hello,

Thanks for the report.  Could you please provide a reproducer?

Regards,
 Mickaël


On Sun, Apr 28, 2024 at 10:47:02AM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered a issue in Linux kernel 6.7. Attached to 
> the email were a PoC file of the issue.
> 
> Stack dump:
> 
> loop3: detected capacity change from 0 to 1024
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access 
> security/landlock/fs.c:598 [inline]
> WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 get_mode_access 
> security/landlock/fs.c:578 [inline]
> WARNING: CPU: 0 PID: 30368 at security/landlock/fs.c:598 
> current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758
> Modules linked in:
> CPU: 0 PID: 30368 Comm: syz-executor.3 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:get_mode_access security/landlock/fs.c:598 [inline]
> RIP: 0010:get_mode_access security/landlock/fs.c:578 [inline]
> RIP: 0010:current_check_refer_path+0x955/0xa60 security/landlock/fs.c:758
> Code: e9 76 fb ff ff 41 bc fe ff ff ff e9 6b fb ff ff e8 00 99 77 fd 90 0f 0b 90 41 bc f3 ff ff ff e9 57 fb ff ff e8 ec 98 77 fd 90 <0f> 0b 90 31 db e9 86 f9 ff ff bb 00 08 00 00 e9 7c f9 ff ff 41 ba
> RSP: 0018:c90001fb7ba0 EFLAGS: 00010212
> RAX: 0bc5 RBX: 88805feeb7b0 RCX: c90006e15000
> RDX: 0004 RSI: 84125d64 RDI: 0003
> RBP: 8880123c5608 R08: 0003 R09: c000
> R10: f000 R11:  R12: 88805d32fc00
> R13: 8880123c5608 R14:  R15: 0001
> FS:  7fd70c4d8640() GS:88802c60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2c136000 CR3: 5b2a CR4: 00750ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  <TASK>
>  security_path_rename+0x124/0x230 security/security.c:1828
>  do_renameat2+0x9f6/0xd30 fs/namei.c:4983
>  __do_sys_rename fs/namei.c:5042 [inline]
>  __se_sys_rename fs/namei.c:5040 [inline]
>  __x64_sys_rename+0x81/0xa0 fs/namei.c:5040
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7fd70b6900ed
> Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7fd70c4d8028 EFLAGS: 0246 ORIG_RAX: 0052
> RAX: ffda RBX: 7fd70b7cbf80 RCX: 7fd70b6900ed
> RDX:  RSI: 2140 RDI: 2100
> RBP: 7fd70b6f14a6 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 000b R14: 7fd70b7cbf80 R15: 7fd70c4b8000
>  </TASK>
> 
> Thank you for taking the time to read this email and we look forward to 
> working with you further.

Re: Reply: WARNING in current_check_refer_path

2024-04-29 Thread Mickaël Salaün

On Mon, Apr 29, 2024 at 05:16:57PM +0800, Ubisectech Sirius wrote:
> > Hello,
> 
> > Thanks for the report.  Could you please provide a reproducer?
> 
> > Regards,
> > Mickaël
> 
> Hi.
>   The PoC file has been sent to you as an attachment.

Indeed, but could you please trim down the file? It has 650 lines, most
of which are irrelevant.


Re: [PATCH] certs: Restrict blacklist updates to the secondary trusted keyring

2023-09-11 Thread Mickaël Salaün

On Mon, Sep 11, 2023 at 09:29:07AM -0400, Mimi Zohar wrote:
> Hi Eric,
> 
> On Fri, 2023-09-08 at 17:34 -0400, Eric Snowberg wrote:
> > Currently root can dynamically update the blacklist keyring if the hash
> > being added is signed and vouched for by the builtin trusted keyring.
> > Keys in the secondary trusted keyring cannot currently be used.
> > 
> > Keys within the secondary trusted keyring carry the same capabilities as
> > the builtin trusted keyring.  Relax the current restriction for updating
> > the .blacklist keyring and allow the secondary to also be referenced as
> > a trust source.  Since the machine keyring is linked to the secondary
> > trusted keyring, any key within it may also be used.
> > 
> > An example use case for this is IMA appraisal.  Now that IMA both
> > references the blacklist keyring and allows the machine owner to add
> > custom IMA CA certs via the machine keyring, this adds the additional
> > capability for the machine owner to also do revocations on a running
> > system.
> > 
> > IMA appraisal usage example to add a revocation for /bin/foo:
> > 
> > sha256sum /bin/foo | awk '{printf "bin:" $1}' > hash.txt
> > 
> > openssl smime -sign -in hash.txt -inkey machine-private-key.pem \
> >-signer machine-certificate.pem -noattr -binary -outform DER \
> >-out hash.p7s
> > 
> > keyctl padd blacklist "$(< hash.txt)" %:.blacklist < hash.p7s
> > 
> > Signed-off-by: Eric Snowberg 
> 
> The secondary keyring may include both CA and code signing keys.  With
> this change any key loaded onto the secondary keyring may blacklist a
> hash.  Wouldn't it make more sense to limit blacklisting
> certificates/hashes to at least CA keys? 

Some operational constraints may limit what a CA can sign.

This change is critical and should be tied to a dedicated kernel config
option (disabled by default). Otherwise, existing systems using this
feature will have their threat model changed automatically and without
notice.
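
As a rough illustration of what such gating could look like, here is a
minimal userspace sketch. The option name
CONFIG_BLACKLIST_AUTH_UPDATE_SECONDARY and the helper
blacklist_trust_source() are made up for this example, and the
VERIFY_USE_SECONDARY_KEYRING definition is a local stand-in for the kernel
constant used in the patch. It relies on the fact, visible in the diff
below, that passing NULL selects the current behavior.

```c
#include <stddef.h>

/* Userspace stand-ins; in the kernel these come from its own headers. */
struct key;
#define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL)

/*
 * Pick the trust source used to verify a blacklist update.
 *
 * CONFIG_BLACKLIST_AUTH_UPDATE_SECONDARY is a hypothetical option name.
 * When it is not defined (the default), the function returns NULL, which
 * keeps the pre-patch behavior, so the threat model of existing systems
 * is unchanged unless the option is explicitly enabled.
 */
static struct key *blacklist_trust_source(void)
{
#ifdef CONFIG_BLACKLIST_AUTH_UPDATE_SECONDARY
	return VERIFY_USE_SECONDARY_KEYRING;
#else
	return NULL;
#endif
}
```

The returned value would then be passed as the trusted-keys argument of
verify_pkcs7_signature() in place of the hard-coded constant.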

> 
> > ---
> >  certs/Kconfig | 2 +-
> >  certs/blacklist.c | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/certs/Kconfig b/certs/Kconfig
> > index 1f109b070877..23dc87c52aff 100644
> > --- a/certs/Kconfig
> > +++ b/certs/Kconfig
> > @@ -134,7 +134,7 @@ config SYSTEM_BLACKLIST_AUTH_UPDATE
> > depends on SYSTEM_DATA_VERIFICATION
> > help
> >   If set, provide the ability to load new blacklist keys at run time if
> > - they are signed and vouched by a certificate from the builtin trusted
> > + they are signed and vouched by a certificate from the secondary trusted
> 
> If CONFIG_SECONDARY_TRUSTED_KEYRING is not enabled, it falls back to
> the builtin keyring.  Please update the comment accordingly.
> 
> >   keyring.  The PKCS#7 signature of the description is set in the key
> >   payload.  Blacklist keys cannot be removed.
> >  
> > diff --git a/certs/blacklist.c b/certs/blacklist.c
> > index 675dd7a8f07a..0b346048ae2d 100644
> > --- a/certs/blacklist.c
> > +++ b/certs/blacklist.c
> > @@ -102,12 +102,12 @@ static int blacklist_key_instantiate(struct key *key,
> >  
> >  #ifdef CONFIG_SYSTEM_BLACKLIST_AUTH_UPDATE
> > /*
> > -* Verifies the description's PKCS#7 signature against the builtin
> > +* Verifies the description's PKCS#7 signature against the secondary
> >  * trusted keyring.
> >  */
> 
> And similarly here ...
> 
> > err = verify_pkcs7_signature(key->description,
> > strlen(key->description), prep->data, prep->datalen,
> > -   NULL, VERIFYING_UNSPECIFIED_SIGNATURE, NULL, NULL);
> > +   VERIFY_USE_SECONDARY_KEYRING, VERIFYING_UNSPECIFIED_SIGNATURE, NULL, NULL);
> > if (err)
> > return err;
> >  #else
> 
> -- 
> thanks,
> 
> Mimi
>

[PATCH v1 1/3] kconfig: Remove duplicate call to sym_get_string_value()

2021-02-15 Thread Mickaël Salaün

From: Mickaël Salaün 

Use the saved returned value of sym_get_string_value() instead of
calling it twice.

Cc: Masahiro Yamada 
Signed-off-by: Mickaël Salaün 
Link: https://lore.kernel.org/r/20210215122513.1773897-2-...@digikod.net
---
 scripts/kconfig/conf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c
index db03e2f45de4..18a233d27a8d 100644
--- a/scripts/kconfig/conf.c
+++ b/scripts/kconfig/conf.c
@@ -137,7 +137,7 @@ static int conf_string(struct menu *menu)
printf("%*s%s ", indent - 1, "", menu->prompt->text);
printf("(%s) ", sym->name);
def = sym_get_string_value(sym);
-   if (sym_get_string_value(sym))
+   if (def)
printf("[%s] ", def);
if (!conf_askvalue(sym, def))
return 0;
-- 
2.30.0
