Re: [PATCH] mm: Allow shmem mappings with MREMAP_DONTUNMAP

2021-03-03 Thread Lokesh Gidra
On Wed, Mar 3, 2021 at 10:13 AM Brian Geffon  wrote:
>
> I apologize, this patch didn't include my signed off by, here it is:
>
> Signed-off-by: Brian Geffon 
>
>
> On Wed, Mar 3, 2021 at 9:53 AM Brian Geffon  wrote:
> >
> > Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This
> > change will widen the support to include shmem mappings. The primary use
> > case is to support MREMAP_DONTUNMAP on mappings which may have been
> > created from a memfd.
> >
> > Lokesh Gidra, who works on the Android JVM, provided an explanation of how
> > such a feature will improve Android JVM garbage collection:
> > "Android is developing a new garbage collector (GC), based on userfaultfd.
> > The garbage collector will use userfaultfd (uffd) on the java heap during
> > compaction. On accessing any uncompacted page, the application threads will
> > find it missing, at which point the thread will create the compacted page
> > and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
> > Before starting this compaction, in a stop-the-world pause the heap will be
> > mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
> > UFFD_EVENT_PAGEFAULT events after resuming execution.
> >
> > To speed up mremap operations, pagetable movement was optimized by moving
> > PUD entries instead of PTE entries [1]. It was necessary as mremap of even
> > modest sized memory ranges also took several milliseconds, and stopping the
> > application for that long isn't acceptable in response-time sensitive
> > cases. With UFFDIO_CONTINUE feature [2], it will be even more efficient to
> > implement this GC, particularly the 'non-moveable' portions of the heap. It
> > will also help in reducing the need to copy (UFFDIO_COPY) the pages.
> > However, for this to work, the java heap has to be on a 'shared' vma.
> > Currently MREMAP_DONTUNMAP only supports private anonymous mappings; this
> > patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
> > compaction."
> >
> > [1] https://lore.kernel.org/linux-mm/20201215030730.nc3cu98e4%25a...@linux-foundation.org/
> > [2] https://lore.kernel.org/linux-mm/20210302000133.272579-1-axelrasmus...@google.com/

Thanks for the patch, Brian. I've tested mremap(MREMAP_DONTUNMAP) on a
memfd memory range.

Tested-by: Lokesh Gidra 
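
For anyone who wants to try the same thing, here is a minimal userspace sketch of
such a check. It is not the test that was actually run: the helper name
remap_memfd_dontunmap() and the 0x5a fill pattern are made up for illustration,
and the fallback #define assumes the uapi value of MREMAP_DONTUNMAP. On a kernel
without this patch the mremap() call is expected to fail with EINVAL, which the
helper reports as a negative errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4 /* assumed fallback; value from the Linux uapi headers */
#endif

/*
 * Map a memfd MAP_SHARED, fill it, then move the mapping with
 * MREMAP_DONTUNMAP. Returns 0 if the data is visible at the new address,
 * 1 on a data mismatch, or a negative errno if any step fails.
 */
static int remap_memfd_dontunmap(size_t npages)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *old_addr, *new_addr;
	int ret = 0;
	int fd;

	fd = memfd_create("dontunmap-sketch", MFD_CLOEXEC);
	if (fd < 0)
		return -errno;
	if (ftruncate(fd, (off_t)len) < 0) {
		ret = -errno;
		goto out_close;
	}

	old_addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (old_addr == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}
	memset(old_addr, 0x5a, len);

	/* old_size must equal new_size when MREMAP_DONTUNMAP is used. */
	new_addr = mremap(old_addr, len, len,
			  MREMAP_MAYMOVE | MREMAP_DONTUNMAP, NULL);
	if (new_addr == MAP_FAILED) {
		ret = -errno;
		munmap(old_addr, len);
		goto out_close;
	}

	/* The new range maps the same shmem pages, so the pattern must be there. */
	for (size_t i = 0; i < len; i++) {
		if (new_addr[i] != 0x5a) {
			ret = 1;
			break;
		}
	}

	munmap(new_addr, len);
	munmap(old_addr, len); /* the old range stays mapped until unmapped here */
out_close:
	close(fd);
	return ret;
}
```

Calling remap_memfd_dontunmap(4) from a small main() and printing the result is
enough to tell a supporting kernel (0) from one that rejects the flags (-EINVAL).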

> > ---
> >  mm/mremap.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index ec8f840399ed..6934d199da54 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -653,8 +653,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> > return ERR_PTR(-EINVAL);
> > }
> >
> > -   if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> > -   vma->vm_flags & VM_SHARED))
> > +   if (flags & MREMAP_DONTUNMAP && !(vma_is_anonymous(vma) || vma_is_shmem(vma)))
> > return ERR_PTR(-EINVAL);
> >
> > if (is_vm_hugetlb_page(vma))
> > --
> > 2.31.0.rc0.254.gbdcc3b1a9d-goog
> >


Re: [PATCH v3 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl

2021-02-01 Thread Lokesh Gidra
On Thu, Jan 28, 2021 at 2:48 PM Axel Rasmussen  wrote:
>
> This ioctl is how userspace ought to resolve "minor" userfaults. The
> idea is, userspace is notified that a minor fault has occurred. It might
> change the contents of the page using its second non-UFFD mapping, or
> not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
> the page contents are correct, carry on setting up the mapping".
>
> Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
> MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
> the minor fault case, we already have some pre-existing underlying page.
> Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
> We'd just use memcpy() or similar instead.
>
> It turns out hugetlb_mcopy_atomic_pte() already does very close to what
> we want, if an existing page is provided via `struct page **pagep`. We
> already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
> just extend that design: add an enum for the three modes of operation,
> and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
> case. (Basically, look up the existing page, and avoid adding the
> existing page to the page cache or calling set_page_huge_active() on
> it.)
>
> Signed-off-by: Axel Rasmussen 
> ---
>  fs/userfaultfd.c                 | 67 +++
>  include/linux/hugetlb.h          |  3 ++
>  include/linux/userfaultfd_k.h    | 18 +
>  include/uapi/linux/userfaultfd.h | 21 +-
>  mm/hugetlb.c                     | 26 +++-
>  mm/userfaultfd.c                 | 69 +---
>  6 files changed, 170 insertions(+), 34 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 968aca3e3ee9..80a3fca389b8 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1530,6 +1530,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
> ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
>
> +   /* CONTINUE ioctl is only supported for MINOR ranges. */
> +   if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR))
> +   ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
> +
> /*
>  * Now that we scanned all vmas we can already tell
>  * userland which ioctls methods are guaranteed to
> @@ -1883,6 +1887,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
> return ret;
>  }
>
> +static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
> +{
> +   __s64 ret;
> +   struct uffdio_continue uffdio_continue;
> +   struct uffdio_continue __user *user_uffdio_continue;
> +   struct userfaultfd_wake_range range;
> +
> +   user_uffdio_continue = (struct uffdio_continue __user *)arg;
> +
> +   ret = -EAGAIN;
> +   if (READ_ONCE(ctx->mmap_changing))
> +   goto out;
> +
> +   ret = -EFAULT;
> +   if (copy_from_user(&uffdio_continue, user_uffdio_continue,
> +  /* don't copy the output fields */
> +  sizeof(uffdio_continue) - (sizeof(__s64))))
> +   goto out;
> +
> +   ret = validate_range(ctx->mm, &uffdio_continue.range.start,
> +uffdio_continue.range.len);
> +   if (ret)
> +   goto out;
> +
> +   ret = -EINVAL;
> +   /* double check for wraparound just in case. */
> +   if (uffdio_continue.range.start + uffdio_continue.range.len <=
> +   uffdio_continue.range.start) {
> +   goto out;
> +   }
> +   if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
> +   goto out;
> +
> +   if (mmget_not_zero(ctx->mm)) {
> +   ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
> +uffdio_continue.range.len,
> +&ctx->mmap_changing);
> +   mmput(ctx->mm);
> +   } else {
> +   return -ESRCH;
> +   }
> +
> +   if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
> +   return -EFAULT;
> +   if (ret < 0)
> +   goto out;
> +
> +   /* len == 0 would wake all */
> +   BUG_ON(!ret);
> +   range.len = ret;
> +   if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
> +   range.start = uffdio_continue.range.start;
> +   wake_userfault(ctx, &range);
> +   }
> +   ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
> +
> +out:
> +   return ret;
> +}
> +
>  static inline unsigned int uffd_ctx_features(__u64 user_features)
>  {
> /*
> @@ -1967,6 +2031,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
> case UFFDIO_WRITEPROTECT:
> ret = userfaultfd_writeprotect(ctx, arg);
>
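
The two pieces of arithmetic in userfaultfd_continue() above, the wraparound
guard and the mapping of a partial result to -EAGAIN, can be tried out in
userspace. The sketch below is only a model of that logic; the helper names
range_is_sane() and continue_result() are made up for illustration and do not
exist in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the "double check for wraparound" test: the range
 * [start, start + len) is accepted only if it is non-empty and its end
 * does not wrap past 2^64. Unsigned overflow is well defined in C, so on
 * wraparound the sum compares less than or equal to start.
 */
static bool range_is_sane(uint64_t start, uint64_t len)
{
	return start + len > start;
}

/*
 * Model of the return value computed at the end of the ioctl handler:
 * errors are passed through, a fully mapped range yields 0, and a
 * partially mapped range asks the caller to retry with -EAGAIN.
 */
static int64_t continue_result(int64_t mapped, uint64_t len)
{
	if (mapped < 0)
		return mapped;
	return (uint64_t)mapped == len ? 0 : -EAGAIN;
}
```

For example, range_is_sane(UINT64_MAX - 0xfff, 0x1000) is false because the end
of the range wraps to 0, and continue_result(4096, 8192) is -EAGAIN because
only half of the requested range was mapped.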

Re: [PATCH v1] userfaultfd.2: Add UFFD_USER_MODE_ONLY flag

2021-01-26 Thread Lokesh Gidra
On Mon, Jan 25, 2021 at 5:44 PM Lokesh Gidra  wrote:
>
> Add description of UFFD_USER_MODE_ONLY flag to userfaultfd(2) manual
> page, which is required after [1]. Also updated the description of
> unprivileged_userfaultfd file in proc(5) as per [2].
>
> [1] https://lore.kernel.org/linux-mm/20201215031349.nximl388w%25a...@linux-foundation.org/
> [2] https://lore.kernel.org/linux-mm/20201215031354.gushjupko%25a...@linux-foundation.org/
>
> Signed-off-by: Lokesh Gidra 
> ---
>  man2/userfaultfd.2 |  5 +++++
>  man5/proc.5        | 12 ++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
> index e7dc9f813..792a49d52 100644
> --- a/man2/userfaultfd.2
> +++ b/man2/userfaultfd.2
> @@ -72,6 +72,11 @@ See the description of the
>  .BR O_NONBLOCK
>  flag in
>  .BR open (2).
> +.TP
> +.BR UFFD_USER_MODE_ONLY " (Since Linux 5.11)"
> +Allow handling of user-mode page-faults only. See the description of the
> +unprivileged_userfaultfd file in
> +.BR proc (5).
>  .PP
>  When the last file descriptor referring to a userfaultfd object is closed,
>  all memory ranges that were registered with the object are unregistered
> diff --git a/man5/proc.5 b/man5/proc.5
> index f16a29d6e..cb2350c0c 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -5905,6 +5905,18 @@ If this file has the value 0, then only processes that have the
>  capability may employ
>  .BR userfaultfd (2).
>  The default value in this file is 1.
> +.IP
> +Starting with Linux 5.11,
> +.BR userfaultfd (2)
> +can be used by all processes, however, if this file has the value 0, then
> +.BR UFFD_USER_MODE_ONLY
> +flag must be passed to it, which restricts page-fault handling to only
> +user-mode faults. This restriction is not applicable for processes with
> +.B CAP_SYS_PTRACE
> +capability, or if this file has the value 1. Furthermore, the default
> +value in this file is changed to 0. For further details see the
> +Linux kernel source file
> +.I Documentation/admin\-guide/sysctl/vm.rst.
>  .TP
>  .IR /proc/sysrq\-trigger " (since Linux 2.4.21)"
>  Writing a character to this file triggers the same SysRq function as
> --

Adding the right linux-mm mailing list. Mistakenly used
linux...@kvack.kernel.org earlier.

> 2.30.0.280.ga3ce27912f-goog
>


[PATCH v1] userfaultfd.2: Add UFFD_USER_MODE_ONLY flag

2021-01-26 Thread Lokesh Gidra
Add description of UFFD_USER_MODE_ONLY flag to userfaultfd(2) manual
page, which is required after [1]. Also updated the description of
unprivileged_userfaultfd file in proc(5) as per [2].

[1] https://lore.kernel.org/linux-mm/20201215031349.nximl388w%25a...@linux-foundation.org/
[2] https://lore.kernel.org/linux-mm/20201215031354.gushjupko%25a...@linux-foundation.org/

Signed-off-by: Lokesh Gidra 
---
 man2/userfaultfd.2 |  5 +++++
 man5/proc.5        | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index e7dc9f813..792a49d52 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -72,6 +72,11 @@ See the description of the
 .BR O_NONBLOCK
 flag in
 .BR open (2).
+.TP
+.BR UFFD_USER_MODE_ONLY " (Since Linux 5.11)"
+Allow handling of user-mode page-faults only. See the description of the
+unprivileged_userfaultfd file in
+.BR proc (5).
 .PP
 When the last file descriptor referring to a userfaultfd object is closed,
 all memory ranges that were registered with the object are unregistered
diff --git a/man5/proc.5 b/man5/proc.5
index f16a29d6e..cb2350c0c 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -5905,6 +5905,18 @@ If this file has the value 0, then only processes that have the
 capability may employ
 .BR userfaultfd (2).
 The default value in this file is 1.
+.IP
+Starting with Linux 5.11,
+.BR userfaultfd (2)
+can be used by all processes, however, if this file has the value 0, then
+.BR UFFD_USER_MODE_ONLY
+flag must be passed to it, which restricts page-fault handling to only
+user-mode faults. This restriction is not applicable for processes with
+.B CAP_SYS_PTRACE
+capability, or if this file has the value 1. Furthermore, the default
+value in this file is changed to 0. For further details see the
+Linux kernel source file
+.I Documentation/admin\-guide/sysctl/vm.rst.
 .TP
 .IR /proc/sysrq\-trigger " (since Linux 2.4.21)"
 Writing a character to this file triggers the same SysRq function as
-- 
2.30.0.280.ga3ce27912f-goog
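
To make the documented rule concrete, here is a small userspace sketch. It
assumes the uapi value 1 for UFFD_USER_MODE_ONLY as a fallback definition; the
helper names must_pass_user_mode_only() and open_user_mode_uffd() are made up
for illustration. The first helper only restates the proc(5) text above as a
predicate, and the second shows how the flag would be passed to the system
call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1 /* assumed fallback; value from the Linux uapi headers */
#endif

/*
 * Restates the proc(5) wording: with unprivileged_userfaultfd set to 0, a
 * process without CAP_SYS_PTRACE has to pass UFFD_USER_MODE_ONLY. With the
 * file set to 1, or with the capability, the flag is optional.
 */
static bool must_pass_user_mode_only(int sysctl_value, bool has_cap_sys_ptrace)
{
	return sysctl_value == 0 && !has_cap_sys_ptrace;
}

/*
 * Create a userfaultfd restricted to user-mode faults.
 * Returns the new file descriptor, or a negative errno (for example
 * -EINVAL on kernels that predate the flag).
 */
static int open_user_mode_uffd(void)
{
	long fd = syscall(SYS_userfaultfd,
			  O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

	return fd < 0 ? -errno : (int)fd;
}
```

A caller would normally follow open_user_mode_uffd() with the UFFDIO_API
handshake before registering any range; that part is left out here.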



Re: [PATCH v15 0/4] SELinux support for anonymous inodes and UFFD

2021-01-14 Thread Lokesh Gidra
On Thu, Jan 14, 2021 at 2:47 PM Paul Moore  wrote:
>
> On Tue, Jan 12, 2021 at 12:15 PM Paul Moore  wrote:
> >
> > On Fri, Jan 8, 2021 at 5:22 PM Lokesh Gidra  wrote:
> > >
> > > Userfaultfd in unprivileged contexts could be potentially very
> > > useful. We'd like to harden userfaultfd to make such unprivileged use
> > > less risky. This patch series allows SELinux to manage userfaultfd
> > > file descriptors and in the future, other kinds of
> > > anonymous-inode-based file descriptor.
> >
> > ...
> >
> > > Daniel Colascione (3):
> > >   fs: add LSM-supporting anon-inode interface
> > >   selinux: teach SELinux about anonymous inodes
> > >   userfaultfd: use secure anon inodes for userfaultfd
> > >
> > > Lokesh Gidra (1):
> > >   security: add inode_init_security_anon() LSM hook
> > >
> > >  fs/anon_inodes.c                    | 150
> > >  fs/libfs.c                          |   5 -
> > >  fs/userfaultfd.c                    |  19 ++--
> > >  include/linux/anon_inodes.h         |   5 +
> > >  include/linux/lsm_hook_defs.h       |   2 +
> > >  include/linux/lsm_hooks.h           |   9 ++
> > >  include/linux/security.h            |  10 ++
> > >  security/security.c                 |   8 ++
> > >  security/selinux/hooks.c            |  57 +++
> > >  security/selinux/include/classmap.h |   2 +
> > >  10 files changed, 213 insertions(+), 54 deletions(-)
> >
> > With several rounds of reviews done and the corresponding SELinux test
> > suite looking close to being ready I think it makes sense to merge
> > this via the SELinux tree.  VFS folks, if you have any comments or
> > objections please let me know soon.  If I don't hear anything within
> > the next day or two I'll go ahead and merge this for linux-next.
>
> With no comments over the last two days I merged the patchset into
> selinux/next.  Thanks for all your work and patience on this Lokesh.
>
Thanks so much.

> Also, it looks like you are very close to getting the associated
> SELinux test suite additions merged, please continue to work with
> Ondrej to get those merged soon.
>
Certainly! I'm waiting for his reviews for the latest patch.

> --
> paul moore
> www.paul-moore.com


[PATCH v15 2/4] fs: add LSM-supporting anon-inode interface

2021-01-08 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure(), that creates an
anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[LG: Delete obsolete comments to alloc_anon_inode()]
[LG: Add context_inode description in comments to anon_inode_getfd_secure()]
[LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
[LG: Make __anon_inode_getfile() static]
[LG: Use correct error cast in __anon_inode_getfile()]
[LG: Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/anon_inodes.c            | 150 ++--
 fs/libfs.c                  |   5 --
 include/linux/anon_inodes.h |   5 ++
 3 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..023337d65a03 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
+
+   file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
  

[PATCH v15 4/4] userfaultfd: use secure anon inodes for userfaultfd

2021-01-08 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..0be8cdd4425a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -979,14 +979,14 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -996,7 +996,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1107,7 +1107,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1167,6 +1167,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1174,7 +1175,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1999,8 +2000,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.30.0.284.gd98b1dd5eaa7-goog



[PATCH v15 0/4] SELinux support for anonymous inodes and UFFD

2021-01-08 Thread Lokesh Gidra
Linux label and return -EACCES if it's
invalid.

Changes from the thirteenth version of the patch:
  - Initialize anon-inode's sclass with SECCLASS_ANON_INODE.
  - Check if context_inode has sclass set to SECCLASS_ANON_INODE.

Changes from the fourteenth version of the patch:
  - Revert changes of v14.
  - Use FILE__CREATE (instead of ANON_INODE__CREATE) while initializing
anon-inode's SELinux security struct.
  - Added a pr_err() message if context_inode is not initialized.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  fs: add LSM-supporting anon-inode interface
  selinux: teach SELinux about anonymous inodes
  userfaultfd: use secure anon inodes for userfaultfd

Lokesh Gidra (1):
  security: add inode_init_security_anon() LSM hook

 fs/anon_inodes.c                    | 150
 fs/libfs.c                          |   5 -
 fs/userfaultfd.c                    |  19 ++--
 include/linux/anon_inodes.h         |   5 +
 include/linux/lsm_hook_defs.h       |   2 +
 include/linux/lsm_hooks.h           |   9 ++
 include/linux/security.h            |  10 ++
 security/security.c                 |   8 ++
 security/selinux/hooks.c            |  57 +++
 security/selinux/include/classmap.h |   2 +
 10 files changed, 213 insertions(+), 54 deletions(-)

-- 
2.30.0.284.gd98b1dd5eaa7-goog



[PATCH v15 3/4] selinux: teach SELinux about anonymous inodes

2021-01-08 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 security/selinux/hooks.c            | 57 +
 security/selinux/include/classmap.h |  2 +
 2 files changed, 59 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 644b17ec9e63..a5e12b2fabde 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2934,6 +2934,62 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   if (context_isec->initialized != LABEL_INITIALIZED) {
+   pr_err("SELinux:  context_inode is not initialized");
+   return -EACCES;
+   }
+
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -7000,6 +7056,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog
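
The labeling decision made by selinux_inode_init_security_anon() above can be
followed more easily in a stripped-down userspace model. Everything in the
sketch below is hypothetical: struct model_label stands in for
inode_security_struct, model_transition_sid() for security_transition_sid(),
and model_may_create() for avc_has_perm(); the SID value 42 and the toy policy
are invented. Only the control flow mirrors the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MODEL_CLASS_ANON_INODE = 1 }; /* stand-in for SECCLASS_ANON_INODE */

struct model_label { /* stand-in for inode_security_struct */
	int sclass;
	int sid;
	bool initialized;
};

/* Toy name-based transition: "[userfaultfd]" gets SID 42, others keep the task SID. */
static int model_transition_sid(int task_sid, const char *name, int *out_sid)
{
	*out_sid = strcmp(name, "[userfaultfd]") == 0 ? 42 : task_sid;
	return 0;
}

/* Toy permission check: only targets labeled with SID 42 may be created. */
static int model_may_create(int task_sid, int target_sid)
{
	(void)task_sid;
	return target_sid == 42 ? 0 : -EACCES;
}

static int model_init_security_anon(struct model_label *isec, const char *name,
				    const struct model_label *context,
				    int task_sid)
{
	if (context) {
		/* Inherit class and SID from an already labeled context inode. */
		if (!context->initialized)
			return -EACCES;
		isec->sclass = context->sclass;
		isec->sid = context->sid;
	} else {
		int rc;

		/* No context inode: compute the label by name-based transition. */
		isec->sclass = MODEL_CLASS_ANON_INODE;
		rc = model_transition_sid(task_sid, name, &isec->sid);
		if (rc)
			return rc;
	}

	isec->initialized = true;
	/* With the label in place, check whether creation is allowed. */
	return model_may_create(task_sid, isec->sid);
}
```

In this model the fork case corresponds to passing the parent's label as
context, so the child's inode gets the same class and SID without consulting
the transition rule again.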



[PATCH v15 1/4] security: add inode_init_security_anon() LSM hook

2021-01-08 Thread Lokesh Gidra
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows/denies
its creation and assigns a security context to the inode.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/lsm_hooks.h     |  9 +++++++++
 include/linux/security.h      | 10 ++++++++++
 security/security.c           |  8 ++++++++
 4 files changed, 29 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 7aaa753b8608..dfd261dcbcb0 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
 LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
 struct inode *dir, const struct qstr *qstr, const char **name,
 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
 umode_t mode)
 LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a19adef1f088..bdfc8a76a4f7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,15 @@
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *  Set up the incore security field for the new anonymous inode
+ *  and return whether the inode creation is permitted by the security
+ *  module or not.
+ *  @inode contains the inode structure
+ *  @name name of the anonymous inode class
+ *  @context_inode optional related inode
+ * Returns 0 on success, -EACCES if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
  * @inode_create:
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index c35ea0ffccd9..b0d14f04b16d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -324,6 +324,9 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr,
 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len);
@@ -738,6 +741,13 @@ static inline int security_inode_init_security(struct inode *inode,
return 0;
 }
 
+static inline int security_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   return 0;
+}
+
 static inline int security_old_inode_init_security(struct inode *inode,
   struct inode *dir,
   const struct qstr *qstr,
diff --git a/security/security.c b/security/security.c
index 7b09cfbae94f..401663b5b70e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1059,6 +1059,14 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+   return call_int_hook(inode_init_security_anon, 0, inode, name,
+context_inode);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len)
-- 
2.30.0.284.gd98b1dd5eaa7-goog
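
As a rough illustration of what the call_int_hook() dispatch with a default of
0 means for this hook, here is a userspace model. The types and names
(struct anon_inode_model, allow_hook(), deny_hook(),
model_init_security_anon()) are invented for the sketch; the real kernel walks
a list of registered LSM callbacks rather than an array.

```c
#include <errno.h>
#include <stddef.h>

struct anon_inode_model { int sid; }; /* hypothetical stand-in for struct inode */

typedef int (*init_anon_hook)(struct anon_inode_model *inode, const char *name,
			      const struct anon_inode_model *context_inode);

static int hooks_called; /* counts invocations so early exit is observable */

static int allow_hook(struct anon_inode_model *inode, const char *name,
		      const struct anon_inode_model *context_inode)
{
	(void)inode; (void)name; (void)context_inode;
	hooks_called++;
	return 0;
}

static int deny_hook(struct anon_inode_model *inode, const char *name,
		     const struct anon_inode_model *context_inode)
{
	(void)inode; (void)name; (void)context_inode;
	hooks_called++;
	return -EACCES;
}

/*
 * Call each registered hook in order and stop at the first non-zero
 * result. With no hooks registered the default of 0 is returned, so
 * creation of the anonymous inode is permitted.
 */
static int model_init_security_anon(const init_anon_hook *hooks, size_t nhooks,
				    struct anon_inode_model *inode,
				    const char *name,
				    const struct anon_inode_model *context_inode)
{
	int rc = 0;

	for (size_t i = 0; i < nhooks; i++) {
		rc = hooks[i](inode, name, context_inode);
		if (rc != 0)
			break;
	}
	return rc;
}
```

With a chain of { allow_hook, deny_hook, allow_hook } the model returns -EACCES
after two calls, which is the "first denial wins" behaviour callers of
security_inode_init_security_anon() rely on.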



Re: [PATCH v13 3/4] selinux: teach SELinux about anonymous inodes

2021-01-08 Thread Lokesh Gidra
On Fri, Jan 8, 2021 at 1:24 PM Stephen Smalley
 wrote:
>
> On Fri, Jan 8, 2021 at 3:17 PM Lokesh Gidra  wrote:
> >
> > On Fri, Jan 8, 2021 at 11:35 AM Stephen Smalley
> >  wrote:
> > >
> > > On Wed, Jan 6, 2021 at 10:03 PM Paul Moore  wrote:
> > > >
> > > > On Wed, Nov 11, 2020 at 8:54 PM Lokesh Gidra  wrote:
> > > > > From: Daniel Colascione 
> > > > >
> > > > > This change uses the anon_inodes and LSM infrastructure introduced in
> > > > > the previous patches to give SELinux the ability to control
> > > > > anonymous-inode files that are created using the new
> > > > > anon_inode_getfd_secure() function.
> > > > >
> > > > > A SELinux policy author detects and controls these anonymous inodes by
> > > > > adding a name-based type_transition rule that assigns a new security
> > > > > type to anonymous-inode files created in some domain. The name used
> > > > > for the name-based transition is the name associated with the
> > > > > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > > > > "[perf_event]".
> > > > >
> > > > > Example:
> > > > >
> > > > > type uffd_t;
> > > > > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > > > > allow sysadm_t uffd_t:anon_inode { create };
> > > > >
> > > > > (The next patch in this series is necessary for making userfaultfd
> > > > > support this new interface.  The example above is just
> > > > > for exposition.)
> > > > >
> > > > > Signed-off-by: Daniel Colascione 
> > > > > Signed-off-by: Lokesh Gidra 
> > > > > ---
> > > > > >  security/selinux/hooks.c| 56 +
> > > > >  security/selinux/include/classmap.h |  2 ++
> > > > >  2 files changed, 58 insertions(+)
> > > > >
> > > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > > index 6b1826fc3658..d092aa512868 100644
> > > > > --- a/security/selinux/hooks.c
> > > > > +++ b/security/selinux/hooks.c
> > > > > @@ -2927,6 +2927,61 @@ static int selinux_inode_init_security(struct 
> > > > > inode *inode, struct inode *dir,
> > > > > return 0;
> > > > >  }
> > > > >
> > > > > +static int selinux_inode_init_security_anon(struct inode *inode,
> > > > > +   const struct qstr *name,
> > > > > > +   const struct inode *context_inode)
> > > > > > +{
> > > > > > +   const struct task_security_struct *tsec = selinux_cred(current_cred());
> > > > > > +   struct common_audit_data ad;
> > > > > > +   struct inode_security_struct *isec;
> > > > > > +   int rc;
> > > > > > +
> > > > > > +   if (unlikely(!selinux_initialized(&selinux_state)))
> > > > > +   return 0;
> > > > > +
> > > > > +   isec = selinux_inode(inode);
> > > > > +
> > > > > +   /*
> > > > > +* We only get here once per ephemeral inode.  The inode has
> > > > > +* been initialized via inode_alloc_security but is otherwise
> > > > > +* untouched.
> > > > > +*/
> > > > > +
> > > > > +   if (context_inode) {
> > > > > +   struct inode_security_struct *context_isec =
> > > > > +   selinux_inode(context_inode);
> > > > > +   if (context_isec->initialized != LABEL_INITIALIZED)
> > > > > +   return -EACCES;
> > Stephen, as per your explanation below, is this check also
> > problematic? I mean is it possible that /dev/kvm context_inode may not
> > have its label initialized? If so, then v12 of the patch series can be
> > used as is. Otherwise, I will send the next version, which rolls back
> > v14 and v13, except for this check. Kindly confirm.
>
> The context_inode should always be initialized already.  I'm not fond
> though of silently returning -EACCES here.  At the least we should
> have a pr_err() or pr_warn() here.  In reality, this could only occur
> in the case of a kernel bug or memory corruption so it used to be a
> candidate for WARN_ON() or BUG_ON() or similar but I know that
> BUG_ON() at least is frowned upon these days.

Got it. I'll add a pr_err(). Thanks a lot.


Re: [PATCH v13 3/4] selinux: teach SELinux about anonymous inodes

2021-01-08 Thread Lokesh Gidra
On Fri, Jan 8, 2021 at 11:35 AM Stephen Smalley
 wrote:
>
> On Wed, Jan 6, 2021 at 10:03 PM Paul Moore  wrote:
> >
> > On Wed, Nov 11, 2020 at 8:54 PM Lokesh Gidra  wrote:
> > > From: Daniel Colascione 
> > >
> > > This change uses the anon_inodes and LSM infrastructure introduced in
> > > the previous patches to give SELinux the ability to control
> > > anonymous-inode files that are created using the new
> > > anon_inode_getfd_secure() function.
> > >
> > > A SELinux policy author detects and controls these anonymous inodes by
> > > adding a name-based type_transition rule that assigns a new security
> > > type to anonymous-inode files created in some domain. The name used
> > > for the name-based transition is the name associated with the
> > > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > > "[perf_event]".
> > >
> > > Example:
> > >
> > > type uffd_t;
> > > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > > allow sysadm_t uffd_t:anon_inode { create };
> > >
> > > (The next patch in this series is necessary for making userfaultfd
> > > support this new interface.  The example above is just
> > > for exposition.)
> > >
> > > Signed-off-by: Daniel Colascione 
> > > Signed-off-by: Lokesh Gidra 
> > > ---
> > >  security/selinux/hooks.c| 56 +
> > >  security/selinux/include/classmap.h |  2 ++
> > >  2 files changed, 58 insertions(+)
> > >
> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > index 6b1826fc3658..d092aa512868 100644
> > > --- a/security/selinux/hooks.c
> > > +++ b/security/selinux/hooks.c
> > > @@ -2927,6 +2927,61 @@ static int selinux_inode_init_security(struct 
> > > inode *inode, struct inode *dir,
> > > return 0;
> > >  }
> > >
> > > +static int selinux_inode_init_security_anon(struct inode *inode,
> > > +   const struct qstr *name,
> > > > +   const struct inode *context_inode)
> > > > +{
> > > > +   const struct task_security_struct *tsec = selinux_cred(current_cred());
> > > > +   struct common_audit_data ad;
> > > > +   struct inode_security_struct *isec;
> > > > +   int rc;
> > > > +
> > > > +   if (unlikely(!selinux_initialized(&selinux_state)))
> > > +   return 0;
> > > +
> > > +   isec = selinux_inode(inode);
> > > +
> > > +   /*
> > > +* We only get here once per ephemeral inode.  The inode has
> > > +* been initialized via inode_alloc_security but is otherwise
> > > +* untouched.
> > > +*/
> > > +
> > > +   if (context_inode) {
> > > +   struct inode_security_struct *context_isec =
> > > +   selinux_inode(context_inode);
> > > +   if (context_isec->initialized != LABEL_INITIALIZED)
> > > +   return -EACCES;
Stephen, as per your explanation below, is this check also
problematic? I mean is it possible that /dev/kvm context_inode may not
have its label initialized? If so, then v12 of the patch series can be
used as is. Otherwise, I will send the next version, which rolls back
v14 and v13, except for this check. Kindly confirm.

> > > +
> > > +   isec->sclass = context_isec->sclass;
> >
> > Taking the object class directly from the context_inode is
> > interesting, and I suspect problematic.  In the case below where no
> > context_inode is supplied the object class is set to
> > SECCLASS_ANON_INODE, which is correct, but when a context_inode is
> > supplied there is no guarantee that the object class will be set to
> > SECCLASS_ANON_INODE.  This could both pose a problem for policy
> > writers (how do you distinguish the anon inode from other normal file
> > inodes in this case?) as well as an outright fault later in this
> > function when we try to check the ANON_INODE__CREATE on an object
> > other than a SECCLASS_ANON_INODE object.
> >
> > It works in the userfaultfd case because the context_inode is
> > originally created with this function so the object class is correctly
> > set to SECCLASS_ANON_INODE, but can we always guarantee that to be the
> > case?  Do we ever need or want to

[PATCH v14 2/4] fs: add LSM-supporting anon-inode interface

2021-01-07 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates an
anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[LG: Delete obsolete comments to alloc_anon_inode()]
[LG: Add context_inode description in comments to anon_inode_getfd_secure()]
[LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
[LG: Make __anon_inode_getfile() static]
[LG: Use correct error cast in __anon_inode_getfile()]
[LG: Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/anon_inodes.c| 150 ++--
 fs/libfs.c  |   5 --
 include/linux/anon_inodes.h |   5 ++
 3 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..023337d65a03 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:    [in]    name of the "class" of the new file
- * @fops:    [in]    file operations for the new file
- * @priv:    [in]    private data for the new file (will be file's private_data)
- * @flags:   [in]    flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
+
+   file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
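
For readers following the error handling in __anon_inode_getfile(), here is a
minimal userspace sketch of the two inode paths shown above. It is not kernel
code: err_ptr(), is_err(), ptr_err(), struct model_inode, shared_inode and
pick_inode() are hypothetical stand-ins for ERR_PTR(), IS_ERR(), PTR_ERR(),
struct inode, anon_inode_inode and the inode selection logic, and lsm_rc stands
for whatever security_inode_init_security_anon() would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool is_err(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical model of an inode: a refcount and the S_PRIVATE bit. */
struct model_inode {
	int refcount;
	bool private_flag;
};

/* Models the singleton inode shared by legacy anon_inode_getfile() users. */
static struct model_inode shared_inode = { .refcount = 1, .private_flag = true };

/*
 * secure == false: take another reference on the shared inode.
 * secure == true:  set up a fresh inode, clear the private flag, and
 *                  let the (modelled) LSM hook veto it via lsm_rc.
 * fresh == NULL simulates an allocation failure.
 */
static inline struct model_inode *pick_inode(bool secure,
					     struct model_inode *fresh,
					     int lsm_rc)
{
	assert(lsm_rc <= 0);	/* hooks return 0 or a negative errno */

	if (!secure) {
		shared_inode.refcount++;	/* models ihold() */
		return &shared_inode;
	}
	if (!fresh)
		return err_ptr(-ENOMEM);

	fresh->refcount = 1;
	fresh->private_flag = false;		/* models clearing S_PRIVATE */
	if (lsm_rc) {
		fresh->refcount = 0;		/* models iput() */
		return err_ptr(lsm_rc);
	}
	return fresh;
}
```

The point of the sketch is only that both branches hand back either a usable
inode or an encoded errno, so the caller needs a single error check.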
  

[PATCH v14 4/4] userfaultfd: use secure anon inodes for userfaultfd

2021-01-07 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[LG: Remove owner inode from userfaultfd_ctx]
[LG: Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..0be8cdd4425a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -979,14 +979,14 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -996,7 +996,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1107,7 +1107,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1167,6 +1167,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1174,7 +1175,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1999,8 +2000,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.30.0.284.gd98b1dd5eaa7-goog



[PATCH v14 3/4] selinux: teach SELinux about anonymous inodes

2021-01-07 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 security/selinux/hooks.c| 59 +
 security/selinux/include/classmap.h |  2 +
 2 files changed, 61 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 644b17ec9e63..8b4e155b2930 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2934,6 +2933,63 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+   isec->initialized = LABEL_INITIALIZED;
+   isec->sclass = SECCLASS_ANON_INODE;
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   if (context_isec->initialized != LABEL_INITIALIZED)
+   return -EACCES;
+   if (context_isec->sclass != SECCLASS_ANON_INODE) {
+   pr_err("SELinux:  initializing anonymous inode with non-anonymous inode");
+   return -EACCES;
+   }
+
+   isec->sid = context_isec->sid;
+   } else {
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   ANON_INODE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -7000,6 +7057,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog
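
To make the labeling rules in the v14 hook above easier to follow, here is a
minimal userspace sketch of the decision it makes. Everything in it is a
hypothetical stand-in rather than SELinux code: struct model_isec mirrors only
the three inode_security_struct fields the hook touches, model_transition_sid()
is a placeholder for security_transition_sid(), and the final avc_has_perm()
permission check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object classes; the values are arbitrary. */
enum model_class {
	MODEL_CLASS_FILE = 1,
	MODEL_CLASS_ANON_INODE = 2,
};

/* Only the fields the hook reads or writes. */
struct model_isec {
	bool initialized;
	enum model_class sclass;
	uint32_t sid;
};

/* Placeholder for a policy lookup: derive a new SID from the task SID. */
static inline uint32_t model_transition_sid(uint32_t task_sid)
{
	return task_sid + 1000;
}

/*
 * Models the v14 behaviour: the new inode is always classed as an
 * anonymous inode; a context inode, if given, must be initialized and
 * must itself be an anonymous inode before its SID is inherited.
 */
static inline int model_init_security_anon(struct model_isec *isec,
					   uint32_t task_sid,
					   const struct model_isec *context)
{
	assert(isec != NULL);

	isec->initialized = true;
	isec->sclass = MODEL_CLASS_ANON_INODE;

	if (context) {
		if (!context->initialized)
			return -EACCES;
		if (context->sclass != MODEL_CLASS_ANON_INODE)
			return -EACCES;
		isec->sid = context->sid;	/* inherit from the parent */
	} else {
		isec->sid = model_transition_sid(task_sid);
	}
	return 0;	/* the real hook would now check the create permission */
}
```

In this model a regular-file context inode is rejected outright instead of
lending its object class to the new inode, which is the behaviour change listed
in the v14 changelog.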



[PATCH v14 0/4] SELinux support for anonymous inodes and UFFD

2021-01-07 Thread Lokesh Gidra
Linux label and return -EACCES if it's
invalid.

Changes from the thirteenth version of the patch:
  - Initialize anon-inode's sclass with SECCLASS_ANON_INODE.
  - Check if context_inode has sclass set to SECCLASS_ANON_INODE.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  fs: add LSM-supporting anon-inode interface
  selinux: teach SELinux about anonymous inodes
  userfaultfd: use secure anon inodes for userfaultfd

Lokesh Gidra (1):
  security: add inode_init_security_anon() LSM hook

 fs/anon_inodes.c| 150 
 fs/libfs.c  |   5 -
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   5 +
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   9 ++
 include/linux/security.h|  10 ++
 security/security.c |   8 ++
 security/selinux/hooks.c|  59 +++
 security/selinux/include/classmap.h |   2 +
 10 files changed, 215 insertions(+), 54 deletions(-)

-- 
2.30.0.284.gd98b1dd5eaa7-goog



[PATCH v14 1/4] security: add inode_init_security_anon() LSM hook

2021-01-07 Thread Lokesh Gidra
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows or denies
creation of the inode and assigns a security context to it.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/lsm_hooks.h |  9 +
 include/linux/security.h  | 10 ++
 security/security.c   |  8 
 4 files changed, 29 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 7aaa753b8608..dfd261dcbcb0 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
 LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
 struct inode *dir, const struct qstr *qstr, const char **name,
 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
 umode_t mode)
 LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a19adef1f088..bdfc8a76a4f7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,15 @@
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *  Set up the incore security field for the new anonymous inode
+ *  and return whether the inode creation is permitted by the security
+ *  module or not.
+ *  @inode contains the inode structure
+ *  @name name of the anonymous inode class
+ *  @context_inode optional related inode
+ * Returns 0 on success, -EACCES if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
  * @inode_create:
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index c35ea0ffccd9..b0d14f04b16d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -324,6 +324,9 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr,
 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len);
@@ -738,6 +741,13 @@ static inline int security_inode_init_security(struct inode *inode,
return 0;
 }
 
+static inline int security_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   return 0;
+}
+
 static inline int security_old_inode_init_security(struct inode *inode,
   struct inode *dir,
   const struct qstr *qstr,
diff --git a/security/security.c b/security/security.c
index 7b09cfbae94f..401663b5b70e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1059,6 +1059,14 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+   return call_int_hook(inode_init_security_anon, 0, inode, name,
+context_inode);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len)
-- 
2.30.0.284.gd98b1dd5eaa7-goog
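
As a rough illustration of what the security_inode_init_security_anon() wrapper
above does, here is a minimal userspace sketch of integer hook dispatch with a
default of 0. It only models the idea behind call_int_hook(): the names
anon_hook_fn, call_anon_hooks(), allow_all() and deny_userfaultfd() are
hypothetical, and a plain array stands in for the kernel's list of registered
security modules.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature: decide based on the anon inode class name. */
typedef int (*anon_hook_fn)(const char *name);

/*
 * Call each registered hook in order. With no hooks registered the
 * default result is 0 (allow); the first non-zero result stops the walk
 * and is returned to the caller.
 */
static inline int call_anon_hooks(const anon_hook_fn *hooks, size_t n,
				  const char *name)
{
	int rc = 0;

	assert(name != NULL);
	for (size_t i = 0; i < n; i++) {
		rc = hooks[i](name);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Example module that permits every anonymous inode. */
static inline int allow_all(const char *name)
{
	(void)name;
	return 0;
}

/* Example module that refuses only "[userfaultfd]" inodes. */
static inline int deny_userfaultfd(const char *name)
{
	return strcmp(name, "[userfaultfd]") == 0 ? -EACCES : 0;
}
```

With no modules registered the sketch returns 0, matching the static inline
stub in include/linux/security.h that is used when the security framework is
compiled out.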



Re: [PATCH v13 3/4] selinux: teach SELinux about anonymous inodes

2021-01-07 Thread Lokesh Gidra
On Thu, Jan 7, 2021 at 2:30 PM Paul Moore  wrote:
>
> On Wed, Jan 6, 2021 at 10:55 PM Lokesh Gidra  wrote:
> > On Wed, Jan 6, 2021 at 7:03 PM Paul Moore  wrote:
> > > On Wed, Nov 11, 2020 at 8:54 PM Lokesh Gidra  wrote:
> > > > From: Daniel Colascione 
> > > >
> > > > This change uses the anon_inodes and LSM infrastructure introduced in
> > > > the previous patches to give SELinux the ability to control
> > > > anonymous-inode files that are created using the new
> > > > anon_inode_getfd_secure() function.
> > > >
> > > > A SELinux policy author detects and controls these anonymous inodes by
> > > > adding a name-based type_transition rule that assigns a new security
> > > > type to anonymous-inode files created in some domain. The name used
> > > > for the name-based transition is the name associated with the
> > > > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > > > "[perf_event]".
> > > >
> > > > Example:
> > > >
> > > > type uffd_t;
> > > > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > > > allow sysadm_t uffd_t:anon_inode { create };
> > > >
> > > > (The next patch in this series is necessary for making userfaultfd
> > > > support this new interface.  The example above is just
> > > > for exposition.)
> > > >
> > > > Signed-off-by: Daniel Colascione 
> > > > Signed-off-by: Lokesh Gidra 
> > > > ---
> > > >  security/selinux/hooks.c| 56 +
> > > >  security/selinux/include/classmap.h |  2 ++
> > > >  2 files changed, 58 insertions(+)
> > > >
> > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > index 6b1826fc3658..d092aa512868 100644
> > > > --- a/security/selinux/hooks.c
> > > > +++ b/security/selinux/hooks.c
> > > > @@ -2927,6 +2927,61 @@ static int selinux_inode_init_security(struct 
> > > > inode *inode, struct inode *dir,
> > > > return 0;
> > > >  }
> > > >
> > > > +static int selinux_inode_init_security_anon(struct inode *inode,
> > > > +   const struct qstr *name,
> > > > > +   const struct inode *context_inode)
> > > > > +{
> > > > > +   const struct task_security_struct *tsec = selinux_cred(current_cred());
> > > > > +   struct common_audit_data ad;
> > > > > +   struct inode_security_struct *isec;
> > > > > +   int rc;
> > > > > +
> > > > > +   if (unlikely(!selinux_initialized(&selinux_state)))
> > > > +   return 0;
> > > > +
> > > > +   isec = selinux_inode(inode);
> > > > +
> > > > +   /*
> > > > +* We only get here once per ephemeral inode.  The inode has
> > > > +* been initialized via inode_alloc_security but is otherwise
> > > > +* untouched.
> > > > +*/
> > > > +
> > > > +   if (context_inode) {
> > > > +   struct inode_security_struct *context_isec =
> > > > +   selinux_inode(context_inode);
> > > > +   if (context_isec->initialized != LABEL_INITIALIZED)
> > > > +   return -EACCES;
> > > > +
> > > > +   isec->sclass = context_isec->sclass;
> > >
> > > Taking the object class directly from the context_inode is
> > > interesting, and I suspect problematic.  In the case below where no
> > > context_inode is supplied the object class is set to
> > > SECCLASS_ANON_INODE, which is correct, but when a context_inode is
> > > supplied there is no guarantee that the object class will be set to
> > > SECCLASS_ANON_INODE.  This could both pose a problem for policy
> > > writers (how do you distinguish the anon inode from other normal file
> > > inodes in this case?) as well as an outright fault later in this
> > > function when we try to check the ANON_INODE__CREATE on an object
> > > other than a SECCLASS_ANON_INODE object.
> > >
> > Thanks for catching this. I'll initialize 'sclass' unconditionally to
> > SECCLASS

Re: [PATCH v13 3/4] selinux: teach SELinux about anonymous inodes

2021-01-06 Thread Lokesh Gidra
On Wed, Jan 6, 2021 at 7:03 PM Paul Moore  wrote:
>
> On Wed, Nov 11, 2020 at 8:54 PM Lokesh Gidra  wrote:
> > From: Daniel Colascione 
> >
> > This change uses the anon_inodes and LSM infrastructure introduced in
> > the previous patches to give SELinux the ability to control
> > anonymous-inode files that are created using the new
> > anon_inode_getfd_secure() function.
> >
> > A SELinux policy author detects and controls these anonymous inodes by
> > adding a name-based type_transition rule that assigns a new security
> > type to anonymous-inode files created in some domain. The name used
> > for the name-based transition is the name associated with the
> > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > "[perf_event]".
> >
> > Example:
> >
> > type uffd_t;
> > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > allow sysadm_t uffd_t:anon_inode { create };
> >
> > (The next patch in this series is necessary for making userfaultfd
> > support this new interface.  The example above is just
> > for exposition.)
> >
> > Signed-off-by: Daniel Colascione 
> > Signed-off-by: Lokesh Gidra 
> > ---
> >  security/selinux/hooks.c| 56 +
> >  security/selinux/include/classmap.h |  2 ++
> >  2 files changed, 58 insertions(+)
> >
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 6b1826fc3658..d092aa512868 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2927,6 +2927,61 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
> > return 0;
> >  }
> >
> > +static int selinux_inode_init_security_anon(struct inode *inode,
> > +   const struct qstr *name,
> > +   const struct inode *context_inode)
> > +{
> > +   const struct task_security_struct *tsec = selinux_cred(current_cred());
> > +   struct common_audit_data ad;
> > +   struct inode_security_struct *isec;
> > +   int rc;
> > +
> > +   if (unlikely(!selinux_initialized(&selinux_state)))
> > +   return 0;
> > +
> > +   isec = selinux_inode(inode);
> > +
> > +   /*
> > +* We only get here once per ephemeral inode.  The inode has
> > +* been initialized via inode_alloc_security but is otherwise
> > +* untouched.
> > +*/
> > +
> > +   if (context_inode) {
> > +   struct inode_security_struct *context_isec =
> > +   selinux_inode(context_inode);
> > +   if (context_isec->initialized != LABEL_INITIALIZED)
> > +   return -EACCES;
> > +
> > +   isec->sclass = context_isec->sclass;
>
> Taking the object class directly from the context_inode is
> interesting, and I suspect problematic.  In the case below where no
> context_inode is supplied the object class is set to
> SECCLASS_ANON_INODE, which is correct, but when a context_inode is
> supplied there is no guarantee that the object class will be set to
> SECCLASS_ANON_INODE.  This could both pose a problem for policy
> writers (how do you distinguish the anon inode from other normal file
> inodes in this case?) as well as an outright fault later in this
> function when we try to check the ANON_INODE__CREATE on an object
> other than a SECCLASS_ANON_INODE object.
>
Thanks for catching this. I'll initialize 'sclass' unconditionally to
SECCLASS_ANON_INODE in the next version. Also, do you think I should
add a check that context_inode's sclass must be SECCLASS_ANON_INODE to
confirm that we never receive a regular inode as context_inode?

> It works in the userfaultfd case because the context_inode is
> originally created with this function so the object class is correctly
> set to SECCLASS_ANON_INODE, but can we always guarantee that to be the
> case?  Do we ever need or want to support using a context_inode that
> is not SECCLASS_ANON_INODE?
>

I don't think there is any requirement to support a context_inode
that isn't an anon-inode. And even if there is, as you described
earlier, for ANON_INODE__CREATE to work the sclass has to be
SECCLASS_ANON_INODE. I'd appreciate comments on this from others,
particularly Daniel and Stephen, who originally discussed and
implemented this patch.


> > +   isec->sid = context_isec->sid;
> > +   } else {
> > +   isec-&

Re: [PATCH v13 2/4] fs: add LSM-supporting anon-inode interface

2021-01-06 Thread Lokesh Gidra
On Wed, Jan 6, 2021 at 6:10 PM Paul Moore  wrote:
>
> On Wed, Nov 11, 2020 at 8:54 PM Lokesh Gidra  wrote:
> > From: Daniel Colascione 
> >
> > > This change adds a new function, anon_inode_getfd_secure, that creates
> > > an anonymous-inode file with an individual non-S_PRIVATE inode to which
> > > security modules can apply policy. Existing callers continue using the
> > > original singleton-inode kind of anonymous-inode file. We can transition
> > > anonymous inode users to the new kind of anonymous inode in individual
> > > patches for the sake of bisection and review.
> >
> > The new function accepts an optional context_inode parameter that callers
> > can use to provide additional contextual information to security modules.
> > For example, in case of userfaultfd, the created inode is a 'logical child'
> > of the context_inode (userfaultfd inode of the parent process) in the sense
> > that it provides the security context required during creation of the child
> > process' userfaultfd inode.
> >
> > Signed-off-by: Daniel Colascione 
> >
> > [Delete obsolete comments to alloc_anon_inode()]
> > [Add context_inode description in comments to anon_inode_getfd_secure()]
> > [Remove definition of anon_inode_getfile_secure() as there are no callers]
> > [Make __anon_inode_getfile() static]
> > [Use correct error cast in __anon_inode_getfile()]
> > [Fix error handling in __anon_inode_getfile()]
>
> Lokesh, I'm assuming you made the changes in the brackets above?  If
> so they should include your initials or some other means of
> attributing them to you, e.g. "[LG: Fix error ...]".

Thanks for reviewing the patch. Sorry for missing this. If it's
critical then I can upload another version of the patches to fix this.
Kindly let me know.
>
> > Signed-off-by: Lokesh Gidra 
> > Reviewed-by: Eric Biggers 
> > ---
> >  fs/anon_inodes.c| 150 ++--
> >  fs/libfs.c  |   5 --
> >  include/linux/anon_inodes.h |   5 ++
> >  3 files changed, 115 insertions(+), 45 deletions(-)
> >
> > diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> > index 89714308c25b..023337d65a03 100644
> > --- a/fs/anon_inodes.c
> > +++ b/fs/anon_inodes.c
> > @@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
> > .kill_sb= kill_anon_super,
> >  };
> >
> > -/**
> > - * anon_inode_getfile - creates a new file instance by hooking it up to an
> > - *  anonymous inode, and a dentry that describe the 
> > "class"
> > - *  of the file
> > - *
> > - * @name:[in]name of the "class" of the new file
> > - * @fops:[in]file operations for the new file
> > - * @priv:[in]private data for the new file (will be file's 
> > private_data)
> > - * @flags:   [in]flags
> > - *
> > - * Creates a new file by hooking it on a single inode. This is useful for 
> > files
> > - * that do not need to have a full-fledged inode in order to operate 
> > correctly.
> > - * All the files created with anon_inode_getfile() will share a single 
> > inode,
> > - * hence saving memory and avoiding code duplication for the 
> > file/inode/dentry
> > - * setup.  Returns the newly created file* or an error pointer.
> > - */
> > -struct file *anon_inode_getfile(const char *name,
> > -   const struct file_operations *fops,
> > -   void *priv, int flags)
> > +static struct inode *anon_inode_make_secure_inode(
> > +   const char *name,
> > +   const struct inode *context_inode)
> >  {
> > -   struct file *file;
> > +   struct inode *inode;
> > +   const struct qstr qname = QSTR_INIT(name, strlen(name));
> > +   int error;
> > +
> > +   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
> > +   if (IS_ERR(inode))
> > +   return inode;
> > +   inode->i_flags &= ~S_PRIVATE;
> > > +   error = security_inode_init_security_anon(inode, &qname, context_inode);
> > +   if (error) {
> > +   iput(inode);
> > +   return ERR_PTR(error);
> > +   }
> > +   return inode;
> > +}
> >
> > -   if (IS_ERR(anon_inode_inode))
> > -   return ERR_PTR(-ENODEV);
> > +static struct file *__anon_inode_getfile(const char *name,
> > +const struct file_operations *fops,
> > +   

Re: [PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-24 Thread Lokesh Gidra
On Mon, Nov 23, 2020 at 2:43 PM Paul Moore  wrote:
>
> On Mon, Nov 23, 2020 at 2:21 PM Lokesh Gidra  wrote:
> > On Sun, Nov 22, 2020 at 3:14 PM Paul Moore  wrote:
> > > > On Wed, Nov 18, 2020 at 5:39 PM Lokesh Gidra  wrote:
> > > > I have created a cuttlefish build and have tested with the attached
> > > > userfaultfd program:
> > >
> > > Thanks, that's a good place to start, a few comments:
> > >
> > > > - While we support Android as a distribution, it isn't a platform that
> > > > we commonly use for development and testing.  At the moment, Fedora is
> > > > probably your best choice for that.
> > >
> > I tried setting up a debian/ubuntu system for testing using the
> > instructions on the selinux-testsuite page, but the system kept
> > freezing after 'setenforce 1'. I'll try with fedora now.
>
> I would expect you to have much better luck with Fedora.

Yes. It worked!
>
> > > - Your test program should be written in vanilla C for the
> > > selinux-testsuite.  Looking at the userfaultfdSimple.cc code that
> > > should be a trivial conversion.
> > >
> > > - I think you have a good start on a test for the selinux-testsuite,
> > > please take a look at the test suite and submit a patch against that
> > > repo.  Ondrej (CC'd) currently maintains the test suite and he may
> > > have some additional thoughts.
> > >
> > > * https://github.com/SELinuxProject/selinux-testsuite
> >
> > Thanks a lot for the inputs. I'll start working on this.
>
> Great, let us know if you hit any problems.  I think we would all like
> to see this upstream :)
>
I have the patch ready. I couldn't find any instructions on the
testsuite site about patch submission. Can you please tell me how to
proceed?

> --
> paul moore
> www.paul-moore.com


Re: [PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-23 Thread Lokesh Gidra
On Sun, Nov 22, 2020 at 3:14 PM Paul Moore  wrote:
>
> On Wed, Nov 18, 2020 at 5:39 PM Lokesh Gidra  wrote:
> > I have created a cuttlefish build and have tested with the attached
> > userfaultfd program:
>
> Thanks, that's a good place to start, a few comments:
>
> - While we support Android as a distribution, it isn't a platform that
> we commonly use for development and testing.  At the moment, Fedora is
> probably your best choice for that.
>
I tried setting up a debian/ubuntu system for testing using the
instructions on the selinux-testsuite page, but the system kept
freezing after 'setenforce 1'. I'll try with fedora now.

> - Your test program should be written in vanilla C for the
> selinux-testsuite.  Looking at the userfaultfdSimple.cc code that
> should be a trivial conversion.
>
> - I think you have a good start on a test for the selinux-testsuite,
> please take a look at the test suite and submit a patch against that
> repo.  Ondrej (CC'd) currently maintains the test suite and he may
> have some additional thoughts.
>
> * https://github.com/SELinuxProject/selinux-testsuite

Thanks a lot for the inputs. I'll start working on this.
>
> > 1) Without these kernel patches the program executes without any 
> > restrictions
> >
> > vsoc_x86_64:/ $ ./system/bin/userfaultfdSimple
> > api: 170
> > features: 511
> > ioctls: 9223372036854775811
> >
> > read: Try again
> >
> >
> > 2) With these patches applied but without any policy, a 'permission
> > denied' error is returned
> >
> > vsoc_x86_64:/ $ ./system/bin/userfaultfdSimple
> > syscall(userfaultfd): Permission denied
> >
> > with the following logcat message:
> > 11-18 14:21:44.041  3130  3130 W userfaultfdSimp: type=1400
> > audit(0.0:107): avc: denied { create } for dev="anon_inodefs"
> > ino=45031 scontext=u:r:shell:s0 tcontext=u:object_r:shell:s0
> > tclass=anon_inode permissive=0
> >
> >
> > 3) With the attached .te policy file in place the following output is
> > observed, confirming that the patch is working as intended.
> > vsoc_x86_64:/ $ ./vendor/bin/userfaultfdSimple
> > UFFDIO_API: Permission denied
> >
> > with the following logcat message:
> > 11-18 14:33:29.142  2028  2028 W userfaultfdSimp: type=1400
> > audit(0.0:104): avc: denied { ioctl } for
> > path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=41169
> > ioctlcmd=0xaa3f scontext=u:r:userfaultfdSimple:s0
> > tcontext=u:object_r:uffd_t:s0 tclass=anon_inode permissive=0
>
> --
> paul moore
> www.paul-moore.com


Re: [PATCH v6 1/2] Add UFFD_USER_MODE_ONLY

2020-11-23 Thread Lokesh Gidra
On Fri, Nov 20, 2020 at 3:33 PM Andrew Morton  wrote:
>
> On Thu, 19 Nov 2020 19:04:10 -0800 Lokesh Gidra  wrote:
>
> > userfaultfd handles page faults from both user and kernel code.
> > Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
> > the resulting userfaultfd object refuse to handle faults from kernel
> > mode, treating these faults as if SIGBUS were always raised, causing
> > the kernel code to fail with EFAULT.
> >
> > A future patch adds a knob allowing administrators to give some
> > processes the ability to create userfaultfd file objects only if they
> > pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> > will exploit userfaultfd's ability to delay kernel page faults to open
> > timing windows for future exploits.
>
> Can we assume that an update to the userfaultfd(2) manpage is in the
> works?
>
Yes, I'm working on it. Is it known yet which kernel version will
include these patches, so that I can mention it in the manpage?

> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
> >
> >   if (ctx->features & UFFD_FEATURE_SIGBUS)
> >   goto out;
> > + if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
> > + ctx->flags & UFFD_USER_MODE_ONLY) {
> > + printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
> > + "sysctl knob to 1 if kernel faults must be handled "
> > + "without obtaining CAP_SYS_PTRACE capability\n");
> > + goto out;
> > + }
> >
> >   /*
> >* If it's already released don't get it. This avoids to loop
> > @@ -1965,10 +1972,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
> >   BUG_ON(!current->mm);
> >
> >   /* Check the UFFD_* constants for consistency.  */
> > + BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
>
> Are we sure this is true for all architectures?

Yes, none of the architectures are using the least-significant bit for
O_CLOEXEC or O_NONBLOCK.
>
> >   BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
> >   BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
> >
> > - if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
> > + if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
> >   return -EINVAL;
> >
> >   ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
> > diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> > index e7e98bde221f..5f2d88212f7c 100644
> > --- a/include/uapi/linux/userfaultfd.h
> > +++ b/include/uapi/linux/userfaultfd.h
> > @@ -257,4 +257,13 @@ struct uffdio_writeprotect {
> >   __u64 mode;
> >  };
> >
> > +/*
> > + * Flags for the userfaultfd(2) system call itself.
> > + */
> > +
> > +/*
> > + * Create a userfaultfd that can handle page faults only in user mode.
> > + */
> > +#define UFFD_USER_MODE_ONLY 1
> > +
> >  #endif /* _LINUX_USERFAULTFD_H */
>
> It would be nice to define this in include/linux/userfaultfd_k.h,
> alongside the other flags.  But I guess it has to be here because it's
> part of the userspace API.


Re: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-11-19 Thread Lokesh Gidra
On Thu, Nov 19, 2020 at 7:04 PM Lokesh Gidra  wrote:
>
> With this change, when the knob is set to 0, it allows unprivileged
> users to call userfaultfd, like when it is set to 1, but with the
> restriction that only page faults from user mode can be handled.
> In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
> must pass UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with
> EPERM.
>
> This enables administrators to reduce the likelihood that an attacker
> with access to userfaultfd can delay faulting kernel code to widen
> timing windows for other exploits.
>
> The default value of this knob is changed to 0. This is required for
> correct functioning of pipe mutex. However, this will make postcopy
> live migration fail, which will be unnoticeable to the VM guests. To
> avoid this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.
>
> The main reason this change is desirable in the short term is that
> the Android userland will behave as if the sysctl were set to zero. So
> without this commit, any Linux binary using userfaultfd to manage its
> memory would behave differently if run within the Android userland.
> For more details, refer to Andrea's reply [1].
>
> [1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/
>
> Signed-off-by: Lokesh Gidra 
> Reviewed-by: Andrea Arcangeli 
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 15 ++-
>  fs/userfaultfd.c| 10 --
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index f455fa00c00f..d06a98b2a4e7 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
>  unprivileged_userfaultfd
>  ========================
>
> -This flag controls whether unprivileged users can use the userfaultfd
> -system calls.  Set this to 1 to allow unprivileged users to use the
> -userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
> -privileged users (with SYS_CAP_PTRACE capability).
> +This flag controls the mode in which unprivileged users can use the
> +userfaultfd system calls. Set this to 0 to restrict unprivileged users
> +to handle page faults in user mode only. In this case, users without
> +SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
> +succeed. Prohibiting use of userfaultfd for handling faults from kernel
> +mode may make certain vulnerabilities more difficult to exploit.
>
> -The default value is 1.
> +Set this to 1 to allow unprivileged users to use the userfaultfd system
> +calls without any restrictions.
> +
> +The default value is 0.
>
>
>  user_reserve_kbytes
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 605599fde015..894cc28142e7 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -28,7 +28,7 @@
>  #include 
>  #include 
>
> -int sysctl_unprivileged_userfaultfd __read_mostly = 1;
> +int sysctl_unprivileged_userfaultfd __read_mostly;
>
>  static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
>
> @@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
> struct userfaultfd_ctx *ctx;
> int fd;
>
> -   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
> +   if (!sysctl_unprivileged_userfaultfd &&
> +   (flags & UFFD_USER_MODE_ONLY) == 0 &&
> +   !capable(CAP_SYS_PTRACE)) {
> +   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
> +   "sysctl knob to 1 if kernel faults must be handled "
> +   "without obtaining CAP_SYS_PTRACE capability\n");
> return -EPERM;
> +   }
>
> BUG_ON(!current->mm);
>
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>
Adding linux...@kvack.org list


Re: [PATCH v6 1/2] Add UFFD_USER_MODE_ONLY

2020-11-19 Thread Lokesh Gidra
On Thu, Nov 19, 2020 at 7:04 PM Lokesh Gidra  wrote:
>
> userfaultfd handles page faults from both user and kernel code.
> Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
> the resulting userfaultfd object refuse to handle faults from kernel
> mode, treating these faults as if SIGBUS were always raised, causing
> the kernel code to fail with EFAULT.
>
> A future patch adds a knob allowing administrators to give some
> processes the ability to create userfaultfd file objects only if they
> pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> will exploit userfaultfd's ability to delay kernel page faults to open
> timing windows for future exploits.
>
> Signed-off-by: Daniel Colascione 
> Signed-off-by: Lokesh Gidra 
> Reviewed-by: Andrea Arcangeli 
> ---
>  fs/userfaultfd.c | 10 +-
>  include/uapi/linux/userfaultfd.h |  9 +
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 000b457ad087..605599fde015 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>
> if (ctx->features & UFFD_FEATURE_SIGBUS)
> goto out;
> +   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
> +   ctx->flags & UFFD_USER_MODE_ONLY) {
> +   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
> +   "sysctl knob to 1 if kernel faults must be handled "
> +   "without obtaining CAP_SYS_PTRACE capability\n");
> +   goto out;
> +   }
>
> /*
>  * If it's already released don't get it. This avoids to loop
> @@ -1965,10 +1972,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
> BUG_ON(!current->mm);
>
> /* Check the UFFD_* constants for consistency.  */
> +   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
> BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
> BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
>
> -   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
> +   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
> return -EINVAL;
>
> ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
> diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
> index e7e98bde221f..5f2d88212f7c 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -257,4 +257,13 @@ struct uffdio_writeprotect {
> __u64 mode;
>  };
>
> +/*
> + * Flags for the userfaultfd(2) system call itself.
> + */
> +
> +/*
> + * Create a userfaultfd that can handle page faults only in user mode.
> + */
> +#define UFFD_USER_MODE_ONLY 1
> +
>  #endif /* _LINUX_USERFAULTFD_H */
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>
Adding linux...@kvack.org mailing list


Re: [PATCH v6 0/2] Control over userfaultfd kernel-fault handling

2020-11-19 Thread Lokesh Gidra
On Thu, Nov 19, 2020 at 7:04 PM Lokesh Gidra  wrote:
>
> This patch series is split from [1]. The other series enables SELinux
> support for userfaultfd file descriptors so that their creation and
> movement can be controlled.
>
> It has been demonstrated on various occasions that suspending kernel
> code execution for an arbitrary amount of time at any access to
> userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
> to change the intended behavior of the kernel. For instance, handling
> page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
> Likewise, FUSE, which is similar to userfaultfd in this respect, has been
> exploited in [4, 5] with a similar outcome.
>
> This small patch series adds a new flag to userfaultfd(2) that allows
> callers to give up the ability to handle kernel-mode faults with the
> resulting UFFD file object. It then adds a 'user-mode only' option to
> the unprivileged_userfaultfd sysctl knob to require unprivileged
> callers to use this new flag.
>
> The purpose of this new interface is to decrease the chance of an
> unprivileged userfaultfd user taking advantage of userfaultfd to
> enhance security vulnerabilities by lengthening the race window in
> kernel code.
>
> [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
> [2] https://duasynt.com/blog/linux-kernel-heap-spray
> [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
> [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
> [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
>
> Changes since v5:
>
>   - Added printk_once when unprivileged_userfaultfd is set to 0 and
>     userfaultfd syscall is called without UFFD_USER_MODE_ONLY in the
>     absence of CAP_SYS_PTRACE capability.
>
> Changes since v4:
>
>   - Added warning when bailing out from handling kernel fault.
>
> Changes since v3:
>
>   - Modified the meaning of value '0' of unprivileged_userfaultfd
>     sysctl knob. Setting this knob to '0' now allows unprivileged users
>     to use userfaultfd, but they can handle page faults in user-mode only.
>   - The default value of unprivileged_userfaultfd sysctl knob is changed
>     to '0'.
>
> Changes since v2:
>
>   - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
>     userfaultfd().
>
> Changes since v1:
>
>   - Added external references to the threats from allowing unprivileged
>     users to handle page faults from kernel-mode.
>   - Removed the new sysctl knob restricting handling of page
>     faults from kernel-mode, and added an option for the same
>     in the existing 'unprivileged_userfaultfd' knob.
>
> Lokesh Gidra (2):
>   Add UFFD_USER_MODE_ONLY
>   Add user-mode only option to unprivileged_userfaultfd sysctl knob
>
>  Documentation/admin-guide/sysctl/vm.rst | 15 ++-
>  fs/userfaultfd.c| 20 +---
>  include/uapi/linux/userfaultfd.h|  9 +
>  3 files changed, 36 insertions(+), 8 deletions(-)
>
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>
Adding linux...@kvack.org mailing list.


[PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-11-19 Thread Lokesh Gidra
With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with
EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will make postcopy
live migration fail, which will be unnoticeable to the VM guests. To
avoid this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.

The main reason this change is desirable in the short term is that
the Android userland will behave as if the sysctl were set to zero. So
without this commit, any Linux binary using userfaultfd to manage its
memory would behave differently if run within the Android userland.
For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
Reviewed-by: Andrea Arcangeli 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 10 --
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE)) {
+   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+   "sysctl knob to 1 if kernel faults must be handled "
+   "without obtaining CAP_SYS_PTRACE capability\n");
return -EPERM;
+   }
 
BUG_ON(!current->mm);
 
-- 
2.29.0.rc1.297.gfa9743e501-goog
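
The permission rule that this patch adds to the syscall entry can be summarised as a small truth table. The userspace model below is a sketch for illustration only: model_uffd_may_create and the MODEL_ constant are hypothetical names, and the has_cap_sys_ptrace argument stands in for the kernel's capable(CAP_SYS_PTRACE) call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_UFFD_USER_MODE_ONLY 1

/*
 * Mirrors the condition added to SYSCALL_DEFINE1(userfaultfd): creation
 * is refused only when all three hold: the knob is 0, the caller did
 * not ask for user-mode-only handling, and the caller is unprivileged.
 */
int model_uffd_may_create(int sysctl_unprivileged_userfaultfd, int flags,
			  bool has_cap_sys_ptrace)
{
	if (!sysctl_unprivileged_userfaultfd &&
	    (flags & MODEL_UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}
```

In other words, with the knob at its new default of 0 an unprivileged caller still gets a userfaultfd, provided it passes UFFD_USER_MODE_ONLY; setting the knob to 1 or holding CAP_SYS_PTRACE lifts the restriction.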



[PATCH v6 1/2] Add UFFD_USER_MODE_ONLY

2020-11-19 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
Reviewed-by: Andrea Arcangeli 
---
 fs/userfaultfd.c | 10 +-
 include/uapi/linux/userfaultfd.h |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..605599fde015 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY) {
+   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+   "sysctl knob to 1 if kernel faults must be handled "
+   "without obtaining CAP_SYS_PTRACE capability\n");
+   goto out;
+   }
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1965,10 +1972,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.29.0.rc1.297.gfa9743e501-goog
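
From userspace, using the new flag could look roughly like the sketch below. The helper name open_user_mode_uffd is made up for illustration, and the fallback #define assumes the value 1 from this patch in case the installed uapi headers predate it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed value from this patch, for headers that do not define it yet. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Request a userfaultfd that only handles faults taken in user mode.
 * Returns the new file descriptor, or a negative errno on failure.
 * A kernel without this patch is expected to reject the unknown flag
 * with EINVAL.
 */
int open_user_mode_uffd(void)
{
	long fd = syscall(SYS_userfaultfd,
			  O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

	return fd < 0 ? -errno : (int)fd;
}
```

A caller that also needs kernel-mode faults handled would have to retry without the flag, which then succeeds only for privileged callers or when the sysctl knob allows it.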



[PATCH v6 0/2] Control over userfaultfd kernel-fault handling

2020-11-19 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that their creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] with a similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v5:

  - Added printk_once when unprivileged_userfaultfd is set to 0 and
    userfaultfd syscall is called without UFFD_USER_MODE_ONLY in the
    absence of CAP_SYS_PTRACE capability.

Changes since v4:

  - Added warning when bailing out from handling kernel fault.

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
    sysctl knob. Setting this knob to '0' now allows unprivileged users
    to use userfaultfd, but they can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
    to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
    userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
    users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
    faults from kernel-mode, and added an option for the same
    in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 20 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 36 insertions(+), 8 deletions(-)

-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH v6 0/2] Control over userfaultfd kernel-fault handling

2020-11-19 Thread Lokesh Gidra
On Mon, Oct 26, 2020 at 2:00 PM Lokesh Gidra  wrote:
>
> This patch series is split from [1]. The other series enables SELinux
> support for userfaultfd file descriptors so that their creation and
> movement can be controlled.
>
> It has been demonstrated on various occasions that suspending kernel
> code execution for an arbitrary amount of time at any access to
> userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
> to change the intended behavior of the kernel. For instance, handling
> page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
> Likewise, FUSE, which is similar to userfaultfd in this respect, has been
> exploited in [4, 5] with a similar outcome.
>
> This small patch series adds a new flag to userfaultfd(2) that allows
> callers to give up the ability to handle kernel-mode faults with the
> resulting UFFD file object. It then adds a 'user-mode only' option to
> the unprivileged_userfaultfd sysctl knob to require unprivileged
> callers to use this new flag.
>
> The purpose of this new interface is to decrease the chance of an
> unprivileged userfaultfd user taking advantage of userfaultfd to
> enhance security vulnerabilities by lengthening the race window in
> kernel code.
>
> [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
> [2] https://duasynt.com/blog/linux-kernel-heap-spray
> [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
> [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
> [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
>
> Changes since v5:
>
>   - Added printk_once when unprivileged_userfaultfd is set to 0 and
> userfaultfd syscall is called without UFFD_USER_MODE_ONLY in the
> absence of CAP_SYS_PTRACE capability.
>
> Changes since v4:
>
>   - Added warning when bailing out from handling kernel fault.
>
> Changes since v3:
>
>   - Modified the meaning of value '0' of unprivileged_userfaultfd
> sysctl knob. Setting this knob to '0' now allows unprivileged users
> to use userfaultfd, but can handle page faults in user-mode only.
>   - The default value of unprivileged_userfaultfd sysctl knob is changed
> to '0'.
>
> Changes since v2:
>
>   - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
> userfaultfd().
>
> Changes since v1:
>
>   - Added external references to the threats from allowing unprivileged
> users to handle page faults from kernel-mode.
>   - Removed the new sysctl knob restricting handling of page
> faults from kernel-mode, and added an option for the same
> in the existing 'unprivileged_userfaultfd' knob.
>
> Lokesh Gidra (2):
>   Add UFFD_USER_MODE_ONLY
>   Add user-mode only option to unprivileged_userfaultfd sysctl knob
>
>  Documentation/admin-guide/sysctl/vm.rst | 15 ++-
>  fs/userfaultfd.c| 20 +---
>  include/uapi/linux/userfaultfd.h|  9 +
>  3 files changed, 36 insertions(+), 8 deletions(-)
>
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>
It's been quite some time since this patch series received a
'Reviewed-by' from Andrea. Please let me know if anything is blocking it
from moving forward.
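
For readers skimming the thread, the creation policy described in the quoted
cover letter can be modelled in a few lines of userspace C. This is only a
sketch of the rule as the changelog states it, not the kernel code:
sketch_uffd_may_create() and SKETCH_UFFD_USER_MODE_ONLY are hypothetical names
introduced here, and both the flag value and the -EPERM return are assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the new userfaultfd(2) flag; the value is an assumption. */
#define SKETCH_UFFD_USER_MODE_ONLY 1

/*
 * Model of the check described in the cover letter: with the
 * unprivileged_userfaultfd knob at 0, a caller without CAP_SYS_PTRACE
 * must pass the user-mode-only flag, otherwise creation is refused.
 * Returns 0 if creation would be allowed, -EPERM (assumed) otherwise.
 */
static int sketch_uffd_may_create(int unprivileged_userfaultfd,
                                  bool has_cap_sys_ptrace,
                                  int flags)
{
    if (unprivileged_userfaultfd != 0)
        return 0;   /* knob set to 1: no restriction is modelled */
    if (has_cap_sys_ptrace)
        return 0;   /* privileged callers may handle kernel-mode faults */
    if (flags & SKETCH_UFFD_USER_MODE_ONLY)
        return 0;   /* caller gave up kernel-mode fault handling */
    return -EPERM;
}
```

With the knob at 0, sketch_uffd_may_create(0, false, 0) is refused, while the
same call with SKETCH_UFFD_USER_MODE_ONLY set is allowed.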


Re: [PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-18 Thread Lokesh Gidra
On Thu, Nov 12, 2020 at 4:13 PM Paul Moore  wrote:
>
> On Tue, Nov 10, 2020 at 10:30 PM Lokesh Gidra  wrote:
> > On Tue, Nov 10, 2020 at 6:13 PM Paul Moore  wrote:
> > > On Tue, Nov 10, 2020 at 1:24 PM Lokesh Gidra  
> > > wrote:
> > > > On Mon, Nov 9, 2020 at 7:12 PM Paul Moore  wrote:
> > > > > On Fri, Nov 6, 2020 at 10:56 AM Lokesh Gidra  
> > > > > wrote:
> > > > > >
> > > > > > From: Daniel Colascione 
> > > > > >
> > > > > > This change uses the anon_inodes and LSM infrastructure introduced 
> > > > > > in
> > > > > > the previous patches to give SELinux the ability to control
> > > > > > anonymous-inode files that are created using the new
> > > > > > anon_inode_getfd_secure() function.
> > > > > >
> > > > > > A SELinux policy author detects and controls these anonymous inodes 
> > > > > > by
> > > > > > adding a name-based type_transition rule that assigns a new security
> > > > > > type to anonymous-inode files created in some domain. The name used
> > > > > > for the name-based transition is the name associated with the
> > > > > > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > > > > > "[perf_event]".
> > > > > >
> > > > > > Example:
> > > > > >
> > > > > > type uffd_t;
> > > > > > type_transition sysadm_t sysadm_t : anon_inode uffd_t 
> > > > > > "[userfaultfd]";
> > > > > > allow sysadm_t uffd_t:anon_inode { create };
> > > > > >
> > > > > > (The next patch in this series is necessary for making userfaultfd
> > > > > > support this new interface.  The example above is just
> > > > > > for exposition.)
> > > > > >
> > > > > > Signed-off-by: Daniel Colascione 
> > > > > > Signed-off-by: Lokesh Gidra 
> > > > > > ---
> > > > > >  security/selinux/hooks.c| 53 
> > > > > > +
> > > > > >  security/selinux/include/classmap.h |  2 ++
> > > > > >  2 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > > > index 6b1826fc3658..1c0adcdce7a8 100644
> > > > > > --- a/security/selinux/hooks.c
> > > > > > +++ b/security/selinux/hooks.c
> > > > > > @@ -2927,6 +2927,58 @@ static int 
> > > > > > selinux_inode_init_security(struct inode *inode, struct inode *dir,
> > > > > > return 0;
> > > > > >  }
> > > > > >
> > > > > > +static int selinux_inode_init_security_anon(struct inode *inode,
> > > > > > +   const struct qstr *name,
> > > > > > +   const struct inode 
> > > > > > *context_inode)
> > > > > > +{
> > > > > > +   const struct task_security_struct *tsec = 
> > > > > > selinux_cred(current_cred());
> > > > > > +   struct common_audit_data ad;
> > > > > > +   struct inode_security_struct *isec;
> > > > > > +   int rc;
> > > > > > +
> > > > > > +   if (unlikely(!selinux_initialized(_state)))
> > > > > > +   return 0;
> > > > > > +
> > > > > > +   isec = selinux_inode(inode);
> > > > > > +
> > > > > > +   /*
> > > > > > +* We only get here once per ephemeral inode.  The inode has
> > > > > > +* been initialized via inode_alloc_security but is 
> > > > > > otherwise
> > > > > > +* untouched.
> > > > > > +*/
> > > > > > +
> > > > > > +   if (context_inode) {
> > > > > > +   struct inode_security_struct *context_isec =
> > > > > > +   selinux_inode(context_inode);
> > > > > > +   isec->sclass = context_isec->sclass;
> > > > > > +   isec->sid = context_isec->s

[PATCH v13 4/4] userfaultfd: use secure anon inodes for userfaultfd

2020-11-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..dd78daf06de6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -972,14 +972,14 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -989,7 +989,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1100,7 +1100,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1160,6 +1160,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1167,7 +1168,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1985,8 +1986,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.29.2.299.gdc1121823c-goog



[PATCH v13 1/4] security: add inode_init_security_anon() LSM hook

2020-11-11 Thread Lokesh Gidra
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows or denies
creation of the inode and assigns a security context to it.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/lsm_hooks.h     |  9 +
 include/linux/security.h      | 10 ++
 security/security.c           |  8
 4 files changed, 29 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..435a2e22ff95 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct 
inode *inode)
 LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
 struct inode *dir, const struct qstr *qstr, const char **name,
 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
 umode_t mode)
 LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..3af055b7ee1f 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,15 @@
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *  Set up the incore security field for the new anonymous inode
+ *  and return whether the inode creation is permitted by the security
+ *  module or not.
+ *  @inode contains the inode structure
+ *  @name name of the anonymous inode class
+ *  @context_inode optional related inode
+ * Returns 0 on success, -EACCES if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
  * @inode_create:
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..7494a93b9ed9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -323,6 +323,9 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr,
 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len);
@@ -737,6 +740,13 @@ static inline int security_inode_init_security(struct 
inode *inode,
return 0;
 }
 
+static inline int security_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode 
*context_inode)
+{
+   return 0;
+}
+
 static inline int security_old_inode_init_security(struct inode *inode,
   struct inode *dir,
   const struct qstr *qstr,
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..8989ba6af4f6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1058,6 +1058,14 @@ int security_inode_init_security(struct inode *inode, 
struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+   return call_int_hook(inode_init_security_anon, 0, inode, name,
+context_inode);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len)
-- 
2.29.2.299.gdc1121823c-goog
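
As a reading aid, the contract of the new hook (0 on success, -EACCES when a
security module denies creation, 0 when no module is registered) can be
modelled in plain userspace C. This is a sketch under stated assumptions, not
the kernel's call_int_hook() machinery: every sketch_* name below is
hypothetical, and the "first non-zero result wins" dispatch is a simplification
made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, empty stand-in for struct inode. */
struct sketch_inode { int unused; };

typedef int (*sketch_anon_hook_t)(struct sketch_inode *inode,
                                  const char *name,
                                  const struct sketch_inode *context_inode);

/* A module that has no opinion: always permits creation. */
static int sketch_allow_all(struct sketch_inode *inode, const char *name,
                            const struct sketch_inode *context_inode)
{
    (void)inode; (void)name; (void)context_inode;
    return 0;
}

/* A module that denies one anonymous-inode class by name. */
static int sketch_deny_userfaultfd(struct sketch_inode *inode,
                                   const char *name,
                                   const struct sketch_inode *context_inode)
{
    (void)inode; (void)context_inode;
    return strcmp(name, "[userfaultfd]") == 0 ? -EACCES : 0;
}

/* Call each hook in order; the first non-zero result is returned. */
static int sketch_call_anon_hooks(const sketch_anon_hook_t *hooks,
                                  size_t nr_hooks,
                                  struct sketch_inode *inode,
                                  const char *name,
                                  const struct sketch_inode *context_inode)
{
    size_t i;

    for (i = 0; i < nr_hooks; i++) {
        int rc = hooks[i](inode, name, context_inode);

        if (rc != 0)
            return rc;
    }
    return 0;   /* no hooks, or nobody objected */
}

/* Convenience wrapper: run both example modules for a given class name. */
static int sketch_init_security_anon(const char *name)
{
    static const sketch_anon_hook_t hooks[] = {
        sketch_allow_all,
        sketch_deny_userfaultfd,
    };
    struct sketch_inode inode = { 0 };

    return sketch_call_anon_hooks(hooks, sizeof(hooks) / sizeof(hooks[0]),
                                  &inode, name, NULL);
}
```

Passing an empty hook list mirrors the static inline stub in
include/linux/security.h, which simply returns 0.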



[PATCH v13 2/4] fs: add LSM-supporting anon-inode interface

2020-11-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure(), that creates
an anonymous-inode file with an individual non-S_PRIVATE inode to which
security modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[Delete obsolete comments to alloc_anon_inode()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make __anon_inode_getfile() static]
[Use correct error cast in __anon_inode_getfile()]
[Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/anon_inodes.c            | 150 ++--
 fs/libfs.c                  |   5 --
 include/linux/anon_inodes.h |   5 ++
 3 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..023337d65a03 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's 
private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
+
+   file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
if (IS_ERR(file))
-  
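
The difference between the two kinds of anonymous inode described in this
patch can be illustrated with a small userspace model. It is a sketch only:
struct sketch_inode, sketch_anon_inode_get() and sketch_anon_inode_put() are
hypothetical names, and the is_private field merely stands in for S_PRIVATE.
Legacy callers share one singleton inode, while "secure" callers each get a
fresh, non-private inode that a security module could label.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-simplified stand-in for struct inode. */
struct sketch_inode {
    int  refcount;
    bool is_private;   /* models S_PRIVATE: skipped by security modules */
};

/* Models the singleton inode shared by all legacy callers. */
static struct sketch_inode sketch_shared_inode = { 1, true };

static struct sketch_inode *sketch_anon_inode_get(bool secure)
{
    struct sketch_inode *inode;

    if (!secure) {
        /* Legacy path: take another reference on the shared inode. */
        sketch_shared_inode.refcount++;
        return &sketch_shared_inode;
    }

    /* Secure path: a new inode per file, visible to security modules. */
    inode = calloc(1, sizeof(*inode));
    if (!inode)
        return NULL;
    inode->refcount = 1;
    inode->is_private = false;
    return inode;
}

static void sketch_anon_inode_put(struct sketch_inode *inode)
{
    /* The shared inode keeps its base reference and is never freed. */
    if (--inode->refcount == 0 && inode != &sketch_shared_inode)
        free(inode);
}
```

Two legacy requests return the same pointer, whereas two secure requests return
distinct inodes, which is what lets policy distinguish them.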

[PATCH v13 0/4] SELinux support for anonymous inodes and UFFD

2020-11-11 Thread Lokesh Gidra
Linux label and return -EACCES if it's
invalid.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] 
https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] 
https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  fs: add LSM-supporting anon-inode interface
  selinux: teach SELinux about anonymous inodes
  userfaultfd: use secure anon inodes for userfaultfd

Lokesh Gidra (1):
  security: add inode_init_security_anon() LSM hook

 fs/anon_inodes.c                    | 150
 fs/libfs.c                          |   5 -
 fs/userfaultfd.c                    |  19 ++--
 include/linux/anon_inodes.h         |   5 +
 include/linux/lsm_hook_defs.h       |   2 +
 include/linux/lsm_hooks.h           |   9 ++
 include/linux/security.h            |  10 ++
 security/security.c                 |   8 ++
 security/selinux/hooks.c            |  56 +++
 security/selinux/include/classmap.h |   2 +
 10 files changed, 212 insertions(+), 54 deletions(-)

-- 
2.29.2.299.gdc1121823c-goog



[PATCH v13 3/4] selinux: teach SELinux about anonymous inodes

2020-11-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 security/selinux/hooks.c            | 56 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 58 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6b1826fc3658..d092aa512868 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2927,6 +2927,61 @@ static int selinux_inode_init_security(struct inode 
*inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   if (context_isec->initialized != LABEL_INITIALIZED)
+   return -EACCES;
+
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   ANON_INODE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, 
umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6992,6 +7047,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, 
selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.29.2.299.gdc1121823c-goog
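
The labelling decision made by selinux_inode_init_security_anon() in this
version can be summarised with a userspace model. It is a sketch under stated
assumptions, not the SELinux implementation: all sketch_* names and the SID and
class numbers are hypothetical, the name-based type_transition lookup is
reduced to a single hard-coded rule, and the final avc_has_perm() permission
check is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class and SID numbers, chosen only for this sketch. */
#define SKETCH_CLASS_ANON_INODE 2
#define SKETCH_SID_UFFD         100u

/* Simplified stand-in for struct inode_security_struct. */
struct sketch_isec {
    int      sclass;
    unsigned sid;
    bool     initialized;   /* models LABEL_INITIALIZED */
};

/*
 * Stand-in for the name-based type_transition lookup: one rule maps
 * "[userfaultfd]" to a dedicated SID; anything else keeps the task's
 * SID (a simplification of the policy default).
 */
static unsigned sketch_transition_sid(unsigned task_sid, const char *name)
{
    if (strcmp(name, "[userfaultfd]") == 0)
        return SKETCH_SID_UFFD;
    return task_sid;
}

/*
 * With a context inode: inherit its class and SID, but only if its
 * label is initialized; otherwise refuse with -EACCES.
 * Without one: use the anon_inode class and the transition result.
 */
static int sketch_label_anon_inode(struct sketch_isec *isec,
                                   unsigned task_sid,
                                   const char *name,
                                   const struct sketch_isec *context)
{
    if (context) {
        if (!context->initialized)
            return -EACCES;
        isec->sclass = context->sclass;
        isec->sid = context->sid;
    } else {
        isec->sclass = SKETCH_CLASS_ANON_INODE;
        isec->sid = sketch_transition_sid(task_sid, name);
    }
    isec->initialized = true;
    return 0;
}
```

In this model a userfaultfd created by the syscall (no context inode) is
labelled through the transition rule, and one created at fork time inherits
the label of the parent's userfaultfd inode.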



Re: [PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-10 Thread Lokesh Gidra
On Tue, Nov 10, 2020 at 6:13 PM Paul Moore  wrote:
>
> On Tue, Nov 10, 2020 at 1:24 PM Lokesh Gidra  wrote:
> > On Mon, Nov 9, 2020 at 7:12 PM Paul Moore  wrote:
> > > On Fri, Nov 6, 2020 at 10:56 AM Lokesh Gidra  
> > > wrote:
> > > >
> > > > From: Daniel Colascione 
> > > >
> > > > This change uses the anon_inodes and LSM infrastructure introduced in
> > > > the previous patches to give SELinux the ability to control
> > > > anonymous-inode files that are created using the new
> > > > anon_inode_getfd_secure() function.
> > > >
> > > > A SELinux policy author detects and controls these anonymous inodes by
> > > > adding a name-based type_transition rule that assigns a new security
> > > > type to anonymous-inode files created in some domain. The name used
> > > > for the name-based transition is the name associated with the
> > > > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > > > "[perf_event]".
> > > >
> > > > Example:
> > > >
> > > > type uffd_t;
> > > > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > > > allow sysadm_t uffd_t:anon_inode { create };
> > > >
> > > > (The next patch in this series is necessary for making userfaultfd
> > > > support this new interface.  The example above is just
> > > > for exposition.)
> > > >
> > > > Signed-off-by: Daniel Colascione 
> > > > Signed-off-by: Lokesh Gidra 
> > > > ---
> > > >  security/selinux/hooks.c| 53 +
> > > >  security/selinux/include/classmap.h |  2 ++
> > > >  2 files changed, 55 insertions(+)
> > > >
> > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > > index 6b1826fc3658..1c0adcdce7a8 100644
> > > > --- a/security/selinux/hooks.c
> > > > +++ b/security/selinux/hooks.c
> > > > @@ -2927,6 +2927,58 @@ static int selinux_inode_init_security(struct 
> > > > inode *inode, struct inode *dir,
> > > > return 0;
> > > >  }
> > > >
> > > > +static int selinux_inode_init_security_anon(struct inode *inode,
> > > > +   const struct qstr *name,
> > > > +   const struct inode 
> > > > *context_inode)
> > > > +{
> > > > +   const struct task_security_struct *tsec = 
> > > > selinux_cred(current_cred());
> > > > +   struct common_audit_data ad;
> > > > +   struct inode_security_struct *isec;
> > > > +   int rc;
> > > > +
> > > > +   if (unlikely(!selinux_initialized(_state)))
> > > > +   return 0;
> > > > +
> > > > +   isec = selinux_inode(inode);
> > > > +
> > > > +   /*
> > > > +* We only get here once per ephemeral inode.  The inode has
> > > > +* been initialized via inode_alloc_security but is otherwise
> > > > +* untouched.
> > > > +*/
> > > > +
> > > > +   if (context_inode) {
> > > > +   struct inode_security_struct *context_isec =
> > > > +   selinux_inode(context_inode);
> > > > +   isec->sclass = context_isec->sclass;
> > > > +   isec->sid = context_isec->sid;
> > >
> > > I suppose this isn't a major concern given the limited usage at the
> > > moment, but I wonder if it would be a good idea to make sure the
> > > context_inode's SELinux label is valid before we assign it to the
> > > anonymous inode?  If it is invalid, what should we do?  Do we attempt
> > > to (re)validate it?  Do we simply fallback to the transition approach?
> >
> > Frankly, I'm not too familiar with SELinux. Originally this patch
> > series was developed by Daniel, in consultation with Stephen Smalley.
> > In my (probably naive) opinion we should fallback to transition
> > approach. But I'd request you to tell me if this needs to be addressed
> > now, and if so then what's the right approach.
> >
> > If the decision is to address this now, then what's the best way to
> > check the SELinux label validity?
>
> You can check to see if an inode's label is valid by look

Re: [PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-10 Thread Lokesh Gidra
Thanks a lot, Paul, for reviewing this patch.

On Mon, Nov 9, 2020 at 7:12 PM Paul Moore  wrote:
>
> On Fri, Nov 6, 2020 at 10:56 AM Lokesh Gidra  wrote:
> >
> > From: Daniel Colascione 
> >
> > This change uses the anon_inodes and LSM infrastructure introduced in
> > the previous patches to give SELinux the ability to control
> > anonymous-inode files that are created using the new
> > anon_inode_getfd_secure() function.
> >
> > A SELinux policy author detects and controls these anonymous inodes by
> > adding a name-based type_transition rule that assigns a new security
> > type to anonymous-inode files created in some domain. The name used
> > for the name-based transition is the name associated with the
> > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > "[perf_event]".
> >
> > Example:
> >
> > type uffd_t;
> > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > allow sysadm_t uffd_t:anon_inode { create };
> >
> > (The next patch in this series is necessary for making userfaultfd
> > support this new interface.  The example above is just
> > for exposition.)
> >
> > Signed-off-by: Daniel Colascione 
> > Signed-off-by: Lokesh Gidra 
> > ---
> >  security/selinux/hooks.c| 53 +
> >  security/selinux/include/classmap.h |  2 ++
> >  2 files changed, 55 insertions(+)
> >
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 6b1826fc3658..1c0adcdce7a8 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2927,6 +2927,58 @@ static int selinux_inode_init_security(struct inode 
> > *inode, struct inode *dir,
> > return 0;
> >  }
> >
> > +static int selinux_inode_init_security_anon(struct inode *inode,
> > +   const struct qstr *name,
> > +   const struct inode 
> > *context_inode)
> > +{
> > +   const struct task_security_struct *tsec = 
> > selinux_cred(current_cred());
> > +   struct common_audit_data ad;
> > +   struct inode_security_struct *isec;
> > +   int rc;
> > +
> > +   if (unlikely(!selinux_initialized(_state)))
> > +   return 0;
> > +
> > +   isec = selinux_inode(inode);
> > +
> > +   /*
> > +* We only get here once per ephemeral inode.  The inode has
> > +* been initialized via inode_alloc_security but is otherwise
> > +* untouched.
> > +*/
> > +
> > +   if (context_inode) {
> > +   struct inode_security_struct *context_isec =
> > +   selinux_inode(context_inode);
> > +   isec->sclass = context_isec->sclass;
> > +   isec->sid = context_isec->sid;
>
> I suppose this isn't a major concern given the limited usage at the
> moment, but I wonder if it would be a good idea to make sure the
> context_inode's SELinux label is valid before we assign it to the
> anonymous inode?  If it is invalid, what should we do?  Do we attempt
> to (re)validate it?  Do we simply fallback to the transition approach?
>
Frankly, I'm not too familiar with SELinux. This patch series was
originally developed by Daniel, in consultation with Stephen Smalley.
In my (probably naive) opinion we should fall back to the transition
approach. But please tell me whether this needs to be addressed
now, and if so, what the right approach is.

If the decision is to address this now, what's the best way to
check the validity of the SELinux label?

> > +   } else {
> > +   isec->sclass = SECCLASS_ANON_INODE;
> > +   rc = security_transition_sid(
> > +   _state, tsec->sid, tsec->sid,
> > +   isec->sclass, name, >sid);
> > +   if (rc)
> > +   return rc;
> > +   }
> > +
> > +   isec->initialized = LABEL_INITIALIZED;
> > +
> > +   /*
> > +* Now that we've initialized security, check whether we're
> > +* allowed to actually create this type of anonymous inode.
> > +*/
> > +
> > +   ad.type = LSM_AUDIT_DATA_INODE;
> > +   ad.u.inode = inode;
> > +
> > +   return avc_has_perm(_state,
> > +   tsec->sid,
> > +   isec->sid,
> > +   isec->sclass,
&

[PATCH v12 4/4] userfaultfd: use secure anon inodes for userfaultfd

2020-11-06 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..dd78daf06de6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -972,14 +972,14 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -989,7 +989,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1100,7 +1100,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1160,6 +1160,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1167,7 +1168,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1985,8 +1986,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v12 0/4] SELinux support for anonymous inodes and UFFD

2020-11-06 Thread Lokesh Gidra
g/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  fs: add LSM-supporting anon-inode interface
  selinux: teach SELinux about anonymous inodes
  userfaultfd: use secure anon inodes for userfaultfd

Lokesh Gidra (1):
  security: add inode_init_security_anon() LSM hook

 fs/anon_inodes.c| 150 
 fs/libfs.c  |   5 -
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   5 +
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   9 ++
 include/linux/security.h|  10 ++
 security/security.c |   8 ++
 security/selinux/hooks.c|  53 ++
 security/selinux/include/classmap.h |   2 +
 10 files changed, 209 insertions(+), 54 deletions(-)

-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v12 3/4] selinux: teach SELinux about anonymous inodes

2020-11-06 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)
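The labeling decision described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel implementation: the structs, the rule tables, and the helpers label_anon_inode() and may_create_anon_inode() are hypothetical names, and the fallback to the creating domain's type when no transition rule matches is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for SELinux labels; every name here is illustrative. */
struct label {
	const char *type;   /* e.g. "uffd_t" */
	const char *sclass; /* e.g. "anon_inode" */
};

struct type_transition {
	const char *source, *target, *sclass, *new_type, *name;
};

struct allow_rule {
	const char *source, *target, *sclass, *perm;
};

/* Mirrors the example policy from the commit message. */
static const struct type_transition transitions[] = {
	{ "sysadm_t", "sysadm_t", "anon_inode", "uffd_t", "[userfaultfd]" },
};

static const struct allow_rule allows[] = {
	{ "sysadm_t", "uffd_t", "anon_inode", "create" },
};

/*
 * Pick a label for a new anonymous inode: inherit it from the context
 * label when one is given, otherwise apply a name-based transition.
 * Assumption: with no matching rule the creator's own type is kept.
 */
static struct label label_anon_inode(const char *domain, const char *name,
				     const struct label *context)
{
	struct label l = { domain, "anon_inode" };
	size_t i;

	assert(domain != NULL && name != NULL);
	if (context)
		return *context;
	for (i = 0; i < sizeof(transitions) / sizeof(transitions[0]); i++) {
		const struct type_transition *t = &transitions[i];

		if (!strcmp(t->source, domain) && !strcmp(t->target, domain) &&
		    !strcmp(t->sclass, l.sclass) && !strcmp(t->name, name))
			l.type = t->new_type;
	}
	return l;
}

/* Returns 1 if the toy policy lets domain create an inode with label l. */
static int may_create_anon_inode(const char *domain, const struct label *l)
{
	size_t i;

	for (i = 0; i < sizeof(allows) / sizeof(allows[0]); i++) {
		const struct allow_rule *a = &allows[i];

		if (!strcmp(a->source, domain) && !strcmp(a->target, l->type) &&
		    !strcmp(a->sclass, l->sclass) && !strcmp(a->perm, "create"))
			return 1;
	}
	return 0;
}
```

In this model, label_anon_inode("sysadm_t", "[userfaultfd]", NULL) yields type uffd_t and the create check passes, while "[perf_event]" keeps the domain's own type and is refused because no allow rule covers it.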

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6b1826fc3658..1c0adcdce7a8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2927,6 +2927,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6992,6 +7044,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v12 2/4] fs: add LSM-supporting anon-inode interface

2020-11-06 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure(), that creates an
anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules.
For example, in case of userfaultfd, the created inode is a 'logical child'
of the context_inode (userfaultfd inode of the parent process) in the sense
that it provides the security context required during creation of the child
process' userfaultfd inode.
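The difference between the two kinds of anonymous-inode file can be sketched in user space as follows. This is a minimal model under stated assumptions, not the kernel code: struct toy_inode, S_PRIVATE_FLAG, toy_get_inode() and the two example hooks are hypothetical names, and plain calloc()/free() stand in for inode allocation and iput().

```c
#include <assert.h>
#include <stdlib.h>

#define S_PRIVATE_FLAG 0x1u /* stand-in for the kernel's S_PRIVATE */

struct toy_inode {
	unsigned int flags;
	int refcount;
	const struct toy_inode *context; /* optional related inode */
};

/* One shared, private inode backs every legacy anonymous file. */
static struct toy_inode shared_inode = { S_PRIVATE_FLAG, 1, NULL };

/* Hypothetical hook type: returns 0 to allow, non-zero to deny. */
typedef int (*init_security_fn)(struct toy_inode *inode, const char *name,
				const struct toy_inode *context);

static int toy_allow_all(struct toy_inode *inode, const char *name,
			 const struct toy_inode *context)
{
	(void)inode; (void)name; (void)context;
	return 0;
}

static int toy_deny_all(struct toy_inode *inode, const char *name,
			const struct toy_inode *context)
{
	(void)inode; (void)name; (void)context;
	return -1;
}

/*
 * Non-secure callers share the singleton; secure callers get a fresh
 * inode with the private flag cleared, which the hook may then reject.
 */
static struct toy_inode *toy_get_inode(const char *name, int secure,
				       const struct toy_inode *context,
				       init_security_fn hook)
{
	struct toy_inode *inode;

	assert(name != NULL);
	if (!secure) {
		shared_inode.refcount++; /* plays the role of ihold() */
		return &shared_inode;
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->refcount = 1;
	inode->flags = S_PRIVATE_FLAG;   /* as handed out by the allocator */
	inode->flags &= ~S_PRIVATE_FLAG; /* make it visible to security modules */
	inode->context = context;
	if (hook && hook(inode, name, context) != 0) {
		free(inode); /* plays the role of iput() on failure */
		return NULL;
	}
	return inode;
}
```

Passing a previously created secure inode as the context argument models the userfaultfd fork case, where the parent's inode supplies the security context for the child's.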

Signed-off-by: Daniel Colascione 

[Delete obsolete comments to alloc_anon_inode()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make __anon_inode_getfile() static]
[Use correct error cast in __anon_inode_getfile()]
[Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c| 150 ++--
 fs/libfs.c  |   5 --
 include/linux/anon_inodes.h |   5 ++
 3 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..f4b35aaed7b2 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:    [in]    name of the "class" of the new file
- * @fops:    [in]    file operations for the new file
- * @priv:    [in]    private data for the new file (will be file's private_data)
- * @flags:   [in]    flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
+
+   file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
if (IS_ERR(file))
-   goto err;
+   

[PATCH v12 1/4] security: add inode_init_security_anon() LSM hook

2020-11-06 Thread Lokesh Gidra
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows or denies
creation of the inode and assigns a security context to it.

The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.
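How such a hook is dispatched can be sketched with a small user-space model. It is an illustration under assumptions, not the LSM framework itself: struct lsm_toy_inode, the fixed-size hook table, and all toy_* functions are hypothetical names. The behaviour modeled is that the result defaults to 0 when no module registers the hook and that the first non-zero return stops the walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct lsm_toy_inode {
	int sid; /* stand-in for a security context */
};

typedef int (*anon_hook_fn)(struct lsm_toy_inode *inode, const char *name,
			    const struct lsm_toy_inode *context_inode);

#define MAX_HOOKS 4
static anon_hook_fn hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Register one module's implementation of the hook. */
static int toy_register_hook(anon_hook_fn fn)
{
	if (nr_hooks == MAX_HOOKS)
		return -ENOMEM;
	hooks[nr_hooks++] = fn;
	return 0;
}

/* Call each registered hook in order; stop at the first non-zero result. */
static int toy_inode_init_security_anon(struct lsm_toy_inode *inode,
					const char *name,
					const struct lsm_toy_inode *context_inode)
{
	int rc = 0; /* default when no module is registered */
	size_t i;

	assert(inode != NULL);
	for (i = 0; i < nr_hooks; i++) {
		rc = hooks[i](inode, name, context_inode);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Example module: copy the security context from the related inode. */
static int toy_inherit_hook(struct lsm_toy_inode *inode, const char *name,
			    const struct lsm_toy_inode *context_inode)
{
	(void)name;
	if (context_inode)
		inode->sid = context_inode->sid;
	return 0;
}

/* Example module: refuse every anonymous inode. */
static int toy_deny_hook(struct lsm_toy_inode *inode, const char *name,
			 const struct lsm_toy_inode *context_inode)
{
	(void)inode; (void)name; (void)context_inode;
	return -EACCES;
}
```

With no hook registered the call succeeds and leaves the inode untouched, which corresponds to the inline stub that returns 0 when security support is compiled out.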

Signed-off-by: Lokesh Gidra 
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/lsm_hooks.h |  9 +
 include/linux/security.h  | 10 ++
 security/security.c   |  8 
 4 files changed, 29 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..435a2e22ff95 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
 LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
 struct inode *dir, const struct qstr *qstr, const char **name,
 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
 umode_t mode)
 LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..3af055b7ee1f 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,15 @@
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *  Set up the incore security field for the new anonymous inode
+ *  and return whether the inode creation is permitted by the security
+ *  module or not.
+ *  @inode contains the inode structure
+ *  @name name of the anonymous inode class
+ *  @context_inode optional related inode
+ * Returns 0 on success, -EACCES if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
  * @inode_create:
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..7494a93b9ed9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -323,6 +323,9 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr,
 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len);
@@ -737,6 +740,13 @@ static inline int security_inode_init_security(struct inode *inode,
return 0;
 }
 
+static inline int security_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   return 0;
+}
+
 static inline int security_old_inode_init_security(struct inode *inode,
   struct inode *dir,
   const struct qstr *qstr,
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..8989ba6af4f6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1058,6 +1058,14 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+   return call_int_hook(inode_init_security_anon, 0, inode, name,
+context_inode);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len)
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v11 4/4] userfaultfd: use secure anon inodes for userfaultfd

2020-11-05 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
Reviewed-by: Eric Biggers 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..dd78daf06de6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -972,14 +972,14 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -989,7 +989,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1100,7 +1100,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1160,6 +1160,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1167,7 +1168,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1985,8 +1986,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v11 2/4] fs: add LSM-supporting anon-inode interface

2020-11-05 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure(), that creates an
anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[Update comments to alloc_anon_inode()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make __anon_inode_getfile() static]
[Use correct error cast in __anon_inode_getfile()]
[Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c| 149 ++--
 fs/libfs.c  |   6 +-
 include/linux/anon_inodes.h |   5 ++
 3 files changed, 117 insertions(+), 43 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..fc935acb90d6 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:    [in]    name of the "class" of the new file
- * @fops:    [in]    file operations for the new file
- * @priv:    [in]    private data for the new file (will be file's private_data)
- * @flags:   [in]    flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
+
+   file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
  

[PATCH v11 3/4] selinux: teach SELinux about anonymous inodes

2020-11-05 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6b1826fc3658..1c0adcdce7a8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2927,6 +2927,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6992,6 +7044,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v11 1/4] security: add inode_init_security_anon() LSM hook

2020-11-05 Thread Lokesh Gidra
This change adds a new LSM hook, inode_init_security_anon(), that
will be used while creating secure anonymous inodes.

The new hook accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon-
inode of the same type.

Signed-off-by: Lokesh Gidra 
---
 include/linux/lsm_hook_defs.h |  2 ++
 include/linux/lsm_hooks.h |  9 +
 include/linux/security.h  | 10 ++
 security/security.c   |  8 
 4 files changed, 29 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..435a2e22ff95 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
 LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
 struct inode *dir, const struct qstr *qstr, const char **name,
 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
 umode_t mode)
 LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..3af055b7ee1f 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,15 @@
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *  Set up the incore security field for the new anonymous inode
+ *  and return whether the inode creation is permitted by the security
+ *  module or not.
+ *  @inode contains the inode structure
+ *  @name name of the anonymous inode class
+ *  @context_inode optional related inode
+ * Returns 0 on success, -EACCES if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
  * @inode_create:
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..7494a93b9ed9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -323,6 +323,9 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr,
 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len);
@@ -737,6 +740,13 @@ static inline int security_inode_init_security(struct inode *inode,
return 0;
 }
 
+static inline int security_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   return 0;
+}
+
 static inline int security_old_inode_init_security(struct inode *inode,
   struct inode *dir,
   const struct qstr *qstr,
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..8989ba6af4f6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1058,6 +1058,14 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+   return call_int_hook(inode_init_security_anon, 0, inode, name,
+context_inode);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 const struct qstr *qstr, const char **name,
 void **value, size_t *len)
-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v11 0/4] SELinux support for anonymous inodes and UFFD

2020-11-05 Thread Lokesh Gidra
  userfaultfd: use secure anon inodes for userfaultfd

Lokesh Gidra (1):
  security: add inode_init_security_anon() LSM hook

 fs/anon_inodes.c| 149 
 fs/libfs.c  |   6 +-
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   5 +
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   9 ++
 include/linux/security.h|  10 ++
 security/security.c |   8 ++
 security/selinux/hooks.c|  53 ++
 security/selinux/include/classmap.h |   2 +
 10 files changed, 211 insertions(+), 52 deletions(-)

-- 
2.29.1.341.ge80a0c044ae-goog



[PATCH v10 3/3] Use secure anon inodes for userfaultfd

2020-11-03 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..918535b49475 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,14 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -995,7 +995,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1106,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1166,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1174,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1996,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v10 1/3] Add a new LSM-supporting anonymous inode interface

2020-11-03 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates
an anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make __anon_inode_getfile() static]
[Use correct error cast in __anon_inode_getfile()]
[Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c  | 148 +-
 include/linux/anon_inodes.h   |   8 ++
 include/linux/lsm_hook_defs.h |   2 +
 include/linux/lsm_hooks.h |   9 +++
 include/linux/security.h  |  10 +++
 security/security.c   |   8 ++
 6 files changed, 145 insertions(+), 40 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..a3fe08fcaa52 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);

[PATCH v10 2/3] Teach SELinux about anonymous inodes

2020-11-03 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)
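
The labeling rule described above can be sketched as a small userspace
model. This is only an illustration, not the SELinux implementation: the
model_* names, the numeric SIDs, and the one-entry rule table are all
hypothetical, and falling back to the creator's SID when no rule matches
is a simplification made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for SELinux security classes. */
enum model_class { MODEL_CLASS_FILE = 1, MODEL_CLASS_ANON_INODE = 2 };

struct model_label {
	unsigned int sid;
	enum model_class sclass;
};

/* One name-based type_transition rule: (creator sid, name) -> new sid. */
struct model_transition {
	unsigned int creator_sid;
	const char *name;
	unsigned int new_sid;
};

/* Hypothetical policy: SID 100 plays sysadm_t, SID 200 plays uffd_t. */
static const struct model_transition model_rules[] = {
	{ 100, "[userfaultfd]", 200 },
};

/*
 * Model of the labeling decision: with a context inode the new inode
 * inherits its class and SID; without one, the class is anon_inode and
 * the SID comes from a name-based transition keyed on the inode name.
 */
static struct model_label model_label_anon_inode(unsigned int creator_sid,
						 const char *name,
						 const struct model_label *context)
{
	struct model_label label;
	size_t i;

	if (context)
		return *context;

	label.sclass = MODEL_CLASS_ANON_INODE;
	label.sid = creator_sid;	/* simplification: no rule matched */
	for (i = 0; i < sizeof(model_rules) / sizeof(model_rules[0]); i++) {
		if (model_rules[i].creator_sid == creator_sid &&
		    strcmp(model_rules[i].name, name) == 0) {
			label.sid = model_rules[i].new_sid;
			break;
		}
	}
	return label;
}
```

In the real hook the resulting label is then checked against the
creating task with a "create" permission check; the model stops at
computing the label.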

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
Cc: Al Viro 
Cc: Andrew Morton 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a340986aa92e..7b22c3112583 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6987,6 +7039,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v10 0/3] SELinux support for anonymous inodes and UFFD

2020-11-03 Thread Lokesh Gidra
Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and, in the future, other kinds of
anonymous-inode-based file descriptors.  SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.

With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.
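
The difference between the two modes can be sketched in userspace C.
This is a toy model, not the fs/anon_inodes.c code: struct model_inode
and the model_* helpers are hypothetical names, the refcount stands in
for ihold()/iput(), and is_private stands in for the S_PRIVATE flag.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_inode {
	int refcount;
	bool is_private;	/* models S_PRIVATE */
};

/* The shared dummy inode used by all non-secure anonymous files. */
static struct model_inode model_singleton = {
	.refcount = 1,
	.is_private = true,
};

/*
 * Non-secure mode takes another reference on the singleton; secure mode
 * allocates a fresh, non-private inode that a security module could label.
 */
static struct model_inode *model_get_inode(bool secure)
{
	struct model_inode *inode;

	if (!secure) {
		model_singleton.refcount++;	/* models ihold() */
		return &model_singleton;
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->refcount = 1;
	inode->is_private = false;
	return inode;
}

/* Drop one reference; only ephemeral inodes are ever freed. */
static void model_put_inode(struct model_inode *inode)
{
	if (--inode->refcount == 0 && inode != &model_singleton)
		free(inode);
}
```

Two non-secure calls return the same pointer, while every secure call
returns a distinct object, which is what lets a per-file label live in
the inode.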

This patch series is one of two forks of [1] and is an
alternative to [2].

The primary difference between the two patch series is that this
patch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.

I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.

The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].

This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.

Changes from the first version of the patch:

  - Removed some error checks
  - Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
  - Inherit sclass as well as descriptor from context inode

Changes from the second version of the patch:

  - Fixed example policy in the commit message to reflect the use of
the new anon_inode class.

Changes from the third version of the patch:

  - Dropped the fops parameter to the LSM hook
  - Documented hook parameters
  - Fixed incorrect class used for SELinux transition
  - Removed stray UFFD changes early in the series
  - Removed a redundant ERR_PTR(PTR_ERR())

Changes from the fourth version of the patch:

  - Removed an unused parameter from an internal function
  - Fixed function documentation

Changes from the fifth version of the patch:

  - Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
  - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.

Changes from the sixth version of the patch:

  - Removed definition of anon_inode_getfile_secure() as there are no
callers.
  - Simplified function description of anon_inode_getfd_secure().
  - Elaborated more on the purpose of 'context_inode' in commit message.

Changes from the seventh version of the patch:

  - Fixed error handling in __anon_inode_getfile().
  - Fixed minor comment and indentation related issues.

Changes from the eighth version of the patch:

  - Replaced selinux_state.initialized with selinux_initialized()

Changes from the ninth version of the patch:

  - Fixed function names in fs/anon_inodes.c
  - Fixed comment of anon_inode_getfd_secure()
  - Fixed name of the patch wherein userfaultfd code uses
anon_inode_getfd_secure()

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  Add a new LSM-supporting anonymous inode interface
  Teach SELinux about anonymous inodes
  Use secure anon inodes for userfaultfd

 fs/anon_inodes.c| 148 
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   8 ++
 

[PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-10-26 Thread Lokesh Gidra
With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that page faults from only user-mode can be handled.
In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.
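
The resulting decision can be sketched as a small userspace model of the
check at the top of the userfaultfd() syscall. The function name and the
has_cap_sys_ptrace parameter are hypothetical; the latter stands in for
capable(CAP_SYS_PTRACE). The flag value is the one defined in patch 1/2.

```c
#include <errno.h>
#include <stdbool.h>

/* Value from include/uapi/linux/userfaultfd.h as added in patch 1/2. */
#define UFFD_USER_MODE_ONLY 1

/*
 * Model of the permission check: returns 0 if creation may proceed and
 * -EPERM otherwise. With the knob at 0, an unprivileged caller must
 * either pass UFFD_USER_MODE_ONLY or hold CAP_SYS_PTRACE.
 */
static int uffd_creation_check(int sysctl_unprivileged_userfaultfd,
			       int flags, bool has_cap_sys_ptrace)
{
	if (!sysctl_unprivileged_userfaultfd &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}
```

With the knob set to 1 the function never returns -EPERM, which matches
the "without any restrictions" wording in the documentation hunk below.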

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy
live migration, which will be unnoticeable to the VM guests. To avoid
this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.

The main reason this change is desirable in the short term is that
the Android userland will behave as with the sysctl set to zero. So
without this commit, any Linux binary using userfaultfd to manage its
memory would behave differently if run within the Android userland.
For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
Reviewed-by: Andrea Arcangeli 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 10 --
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE)) {
+   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+   "sysctl knob to 1 if kernel faults must be handled "
+   "without obtaining CAP_SYS_PTRACE capability\n");
return -EPERM;
+   }
 
BUG_ON(!current->mm);
 
-- 
2.29.0.rc1.297.gfa9743e501-goog



[PATCH v6 1/2] Add UFFD_USER_MODE_ONLY

2020-10-26 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.
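
A minimal userspace sketch of that decision follows. It only models the
branch added to handle_userfault(): the function name and
MODEL_FAULT_FLAG_USER are hypothetical stand-ins (the latter for the
kernel's FAULT_FLAG_USER), while UFFD_USER_MODE_ONLY uses the value from
the uapi hunk in this patch.

```c
#include <stdbool.h>

#define UFFD_USER_MODE_ONLY 1
/* Hypothetical stand-in for the kernel's FAULT_FLAG_USER bit. */
#define MODEL_FAULT_FLAG_USER 0x1

/*
 * Returns true if the fault may be passed to the userfaultfd monitor,
 * false if it must be refused (the kernel-mode access then fails as if
 * SIGBUS handling had been requested).
 */
static bool uffd_may_handle_fault(unsigned int fault_flags,
				  unsigned int ctx_flags)
{
	bool from_kernel = (fault_flags & MODEL_FAULT_FLAG_USER) == 0;

	if (from_kernel && (ctx_flags & UFFD_USER_MODE_ONLY))
		return false;
	return true;
}
```

Only the combination of a kernel-mode fault and a user-mode-only
context is refused; user-mode faults are unaffected by the flag.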

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
Reviewed-by: Andrea Arcangeli 
---
 fs/userfaultfd.c | 10 +-
 include/uapi/linux/userfaultfd.h |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..605599fde015 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY) {
+   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+   "sysctl knob to 1 if kernel faults must be handled "
+   "without obtaining CAP_SYS_PTRACE capability\n");
+   goto out;
+   }
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1965,10 +1972,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.29.0.rc1.297.gfa9743e501-goog



[PATCH v6 0/2] Control over userfaultfd kernel-fault handling

2020-10-26 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for a similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v5:

  - Added printk_once when unprivileged_userfaultfd is set to 0 and
userfaultfd syscall is called without UFFD_USER_MODE_ONLY in the
absence of CAP_SYS_PTRACE capability.

Changes since v4:

  - Added warning when bailing out from handling kernel fault.

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 20 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 36 insertions(+), 8 deletions(-)

-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH v10 0/3] SELinux support for anonymous inodes and UFFD

2020-10-26 Thread Lokesh Gidra
On Sun, Oct 11, 2020 at 1:29 AM Lokesh Gidra  wrote:
>
> Userfaultfd in unprivileged contexts could be potentially very
> useful. We'd like to harden userfaultfd to make such unprivileged use
> less risky. This patch series allows SELinux to manage userfaultfd
> file descriptors and in the future, other kinds of
> anonymous-inode-based file descriptor.  SELinux policy authors can
> apply policy types to anonymous inodes by providing name-based
> transition rules keyed off the anonymous inode internal name (
> "[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
> applying policy to the new SIDs thus produced.
>
> With SELinux managed userfaultfd, an admin can control creation and
> movement of the file descriptors. In particular, handling of
> a userfaultfd descriptor by a different process is essentially a
> ptrace access into the process, without any of the corresponding
> security_ptrace_access_check() checks. For privacy, the admin may
> want to deny such accesses, which is possible with SELinux support.
>
> Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
> allows callers to opt into this SELinux management. In this new "secure"
> mode, anon_inodes create new ephemeral inodes for anonymous file objects
> instead of reusing the normal anon_inodes singleton dummy inode. A new
> LSM hook gives security modules an opportunity to configure and veto
> these ephemeral inodes.
>
> This patch series is one of two forks of [1] and is an
> alternative to [2].
>
> The primary difference between the two patch series is that this
> patch series creates a unique inode for each "secure" anonymous
> inode, while the other patch series ([2]) continues using the
> singleton dummy anonymous inode and adds a way to attach SELinux
> security information directly to file objects.
>
> I prefer the approach in this patch series because 1) it's a smaller
> patch than [2], and 2) it produces a more regular security
> architecture: in this patch series, secure anonymous inodes aren't
> S_PRIVATE and they maintain the SELinux property that the label for a
> file is in its inode. We do need an additional inode per anonymous
> file, but per-struct-file inode creation doesn't seem to be a problem
> for pipes and sockets.
>
> The previous version of this feature ([1]) created a new SELinux
> security class for userfaultfd file descriptors. This version adopts
> the generic transition-based approach of [2].
>
> This patch series also differs from [2] in that it doesn't affect all
> anonymous inodes right away --- instead requiring anon_inodes callers
> to opt in --- but this difference isn't one of basic approach. The
> important question to resolve is whether we should be creating new
> inodes or enhancing per-file data.
>
> Changes from the first version of the patch:
>
>   - Removed some error checks
>   - Defined a new anon_inode SELinux class to resolve the
> ambiguity in [3]
>   - Inherit sclass as well as descriptor from context inode
>
> Changes from the second version of the patch:
>
>   - Fixed example policy in the commit message to reflect the use of
> the new anon_inode class.
>
> Changes from the third version of the patch:
>
>   - Dropped the fops parameter to the LSM hook
>   - Documented hook parameters
>   - Fixed incorrect class used for SELinux transition
>   - Removed stray UFFD changes early in the series
>   - Removed a redundant ERR_PTR(PTR_ERR())
>
> Changes from the fourth version of the patch:
>
>   - Removed an unused parameter from an internal function
>   - Fixed function documentation
>
> Changes from the fifth version of the patch:
>
>   - Fixed function documentation in fs/anon_inodes.c and
> include/linux/lsm_hooks.h
>   - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
> owner from userfaultfd_ctx.
>
> Changes from the sixth version of the patch:
>
>   - Removed definition of anon_inode_getfile_secure() as there are no
> callers.
>   - Simplified function description of anon_inode_getfd_secure().
>   - Elaborated more on the purpose of 'context_inode' in commit message.
>
> Changes from the seventh version of the patch:
>
>   - Fixed error handling in __anon_inode_getfile().
>   - Fixed minor comment and indentation related issues.
>
> Changes from the eighth version of the patch:
>
>   - Replaced selinux_state.initialized with selinux_initialized()
>
> Changes from the ninth version of the patch:
>
>   - Fixed function names in fs/anon_inodes.c
>   - Fixed comment of anon_inode_getfd_secure()
>   - Fixed name of the patch wherein userfaultfd code uses
> anon_inode_getfd_

Re: [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-10-23 Thread Lokesh Gidra
On Fri, Oct 23, 2020 at 7:48 PM Andrea Arcangeli  wrote:
>
> Hello everyone,
>
> On Sat, Oct 10, 2020 at 11:24:56PM -0700, Lokesh Gidra wrote:
> > With this change, when the knob is set to 0, it allows unprivileged
> > users to call userfaultfd, like when it is set to 1, but with the
> > restriction that page faults from only user-mode can be handled.
> > In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
> > must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
> > EPERM.
> >
> > This enables administrators to reduce the likelihood that
> > an attacker with access to userfaultfd can delay faulting kernel
> > code to widen timing windows for other exploits.
> >
> > The default value of this knob is changed to 0. This is required for
> > correct functioning of pipe mutex. However, this will fail postcopy
> > live migration, which will be unnoticeable to the VM guests. To avoid
> > this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf. For more details,
> > refer to Andrea's reply [1].
> >
> > [1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/
> >
> > Signed-off-by: Lokesh Gidra 
>
> Nobody commented so it seems everyone is on board with this change to
> synchronize the kernel default with the post-boot Android default.
>
> The email in the link above was pretty long, so the below would be a
> summary that could be added to the commit header:
>
> ==
>
> The main reason this change is desirable in the short term is that
> the Android userland will behave as with the sysctl set to zero. So
> without this commit, any Linux binary using userfaultfd to manage its
> memory would behave differently if run within the Android userland.
>
> ==

Sure. I'll add it in the next revision.
>
> Reviewed-by: Andrea Arcangeli 
>
Thanks so much for the review. I hope it's ok to add your
'reviewed-by' in the next revision?
>
> BTW, this is still a minor nitpick, but a printk_once of the 1/2 could
> be added before the return -EPERM too, that's actually what I meant
> when I suggested to add a printk_once :), however the printk_once you
> added can turn out to be useful too for devs converting code to use
> bounce buffers, so it's fine too, just it could go under DEBUG_VM and
> to be ratelimited (similarly to the "FAULT_FLAG_ALLOW_RETRY missing
> %x\n" printk).

I'll move the printk_once from 1/2 to this patch, as you suggested.
>
> Thanks,
> Andrea
>


Re: [PATCH v4 0/2] Control over userfaultfd kernel-fault handling

2020-10-22 Thread Lokesh Gidra
On Thu, Oct 8, 2020 at 4:22 PM Nick Kralevich  wrote:
>
> On Wed, Oct 7, 2020 at 9:01 PM Andrea Arcangeli  wrote:
> >
> > Hello Lokesh,
> >
> > On Wed, Oct 07, 2020 at 01:26:55PM -0700, Lokesh Gidra wrote:
> > > On Wed, Sep 23, 2020 at 11:56 PM Lokesh Gidra  
> > > wrote:
> > > >
> > > > This patch series is split from [1]. The other series enables SELinux
> > > > support for userfaultfd file descriptors so that its creation and
> > > > movement can be controlled.
> > > >
> > > > It has been demonstrated on various occasions that suspending kernel
> > > > code execution for an arbitrary amount of time at any access to
> > > > userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
> > > > to change the intended behavior of the kernel. For instance, handling
> > > > page faults in kernel-mode using userfaultfd has been exploited in [2, 
> > > > 3].
> > > > Likewise, FUSE, which is similar to userfaultfd in this respect, has 
> > > > been
> > > > exploited in [4, 5] for similar outcome.
> > > >
> > > > This small patch series adds a new flag to userfaultfd(2) that allows
> > > > callers to give up the ability to handle kernel-mode faults with the
> > > > resulting UFFD file object. It then adds a 'user-mode only' option to
> > > > the unprivileged_userfaultfd sysctl knob to require unprivileged
> > > > callers to use this new flag.
> > > >
> > > > The purpose of this new interface is to decrease the chance of an
> > > > unprivileged userfaultfd user taking advantage of userfaultfd to
> > > > enhance security vulnerabilities by lengthening the race window in
> > > > kernel code.
> > > >
> > > > [1] 
> > > > https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
> > > > [2] https://duasynt.com/blog/linux-kernel-heap-spray
> > > > [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
> >
> > I've been looking at those links and I've been trying to verify the link
> > [3] is relevant.
> >
> > Specifically I've been trying to verify if 1) current state of the art
> > modern SLUB randomization techniques already enabled in production and
> > rightfully wasting some CPU in all enterprise kernels to prevent
> > things like above to become an issue in practice 2) combined with the
> > fact different memcg need to share the same kmemcaches (which was
> > incidentally fixed a few months ago upstream) and 3) further
> > robustness enhancements against exploits in the slub metadata, may
> > already render the exploit [3] from 2016 irrelevant in practice.
>
> It's quite possible that some other mitigation was helpful against the
> technique used by this particular exploit. It's the nature of exploits
> that they are fragile and will change as new soft mitigations are
> introduced. The effectiveness of a particular exploit mitigation
> change is orthogonal to the change presented here.
>
> The purpose of this change is to prevent an attacker from suspending
> kernel code execution and having kernel data structures in a
> predictable state. This makes it harder for an attacker to "win" race
> conditions against various kernel data structures. This change
> complements other kernel hardening changes such as the changes you've
> referenced above. Focusing on one particular exploit somewhat misses
> the point of this change.
>
> >
> > So I started by trying to reproduce [3] by building 4.5.1 with a
> > .config with no robustness features and I booted it on fedora-32 or
> > gentoo userland and I cannot even invoke call_usermodehelper. Calling
> > socket(22, AF_INET, 0) won't invoke such function. Can you reproduce
> > on 4.5.1? Which kernel .config should I use to build 4.5.1 in order
> > for call_usermodehelper to be invoked by the exploit? Could you help
> > to verify it?
>
> I haven't tried to verify this myself. I wonder if the usermode
> hardening changes also impacted this exploit? See
> https://lkml.org/lkml/2017/1/16/468
>
> But again, focusing on an exploit, which is inherently fragile in
> nature and dependent on the state of the kernel tree at a particular
> time, is unlikely to be useful to analyze this patch.
>
> >
> > It even has an uninitialized variable spawning random perrors so it
> > doesn't give a warm fuzzy feeling:
> >
> > 
> > int main(int argc, char **argv) {
> > void *region, *map;
> > 

[PATCH v10 0/3] SELinux support for anonymous inodes and UFFD

2020-10-11 Thread Lokesh Gidra
Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and in the future, other kinds of
anonymous-inode-based file descriptors.  SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.

With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.

This patch series is one of two forks of [1] and is an
alternative to [2].

The primary difference between the two patch series is that this
patch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.

I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.

The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].

This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.

Changes from the first version of the patch:

  - Removed some error checks
  - Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
  - Inherit sclass as well as descriptor from context inode

Changes from the second version of the patch:

  - Fixed example policy in the commit message to reflect the use of
the new anon_inode class.

Changes from the third version of the patch:

  - Dropped the fops parameter to the LSM hook
  - Documented hook parameters
  - Fixed incorrect class used for SELinux transition
  - Removed stray UFFD changes early in the series
  - Removed a redundant ERR_PTR(PTR_ERR())

Changes from the fourth version of the patch:

  - Removed an unused parameter from an internal function
  - Fixed function documentation

Changes from the fifth version of the patch:

  - Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
  - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.

Changes from the sixth version of the patch:

  - Removed definition of anon_inode_getfile_secure() as there are no
callers.
  - Simplified function description of anon_inode_getfd_secure().
  - Elaborated more on the purpose of 'context_inode' in commit message.

Changes from the seventh version of the patch:

  - Fixed error handling in __anon_inode_getfile().
  - Fixed minor comment and indentation related issues.

Changes from the eighth version of the patch:

  - Replaced selinux_state.initialized with selinux_initialized()

Changes from the ninth version of the patch:

  - Fixed function names in fs/anon_inodes.c
  - Fixed comment of anon_inode_getfd_secure()
  - Fixed name of the patch wherein userfaultfd code uses
anon_inode_getfd_secure()

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] 
https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] 
https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  Add a new LSM-supporting anonymous inode interface
  Teach SELinux about anonymous inodes
  Use secure anon inodes for userfaultfd

 fs/anon_inodes.c| 148 
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   8 ++
 

[PATCH v10 2/3] Teach SELinux about anonymous inodes

2020-10-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)
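
To illustrate how the labeling described above picks a type, here is a
small userspace model in C. It is only a sketch: toy_rules,
toy_label_anon_inode() and the string-typed labels are hypothetical
stand-ins, not SELinux interfaces, and falling back to the creator's
type when no rule matches is an assumption about default object
labeling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one name-based type_transition rule. */
struct toy_transition {
	const char *source_type;	/* domain creating the inode */
	const char *object_name;	/* anon inode name, e.g. "[userfaultfd]" */
	const char *new_type;		/* type assigned to the new inode */
};

/* Mirrors the example policy: sysadm_t creating "[userfaultfd]" -> uffd_t. */
static const struct toy_transition toy_rules[] = {
	{ "sysadm_t", "[userfaultfd]", "uffd_t" },
};

/*
 * Pick the type for a new "secure" anonymous inode.
 * context_type models the label of an optional context inode: when it is
 * non-NULL the new inode inherits it, otherwise the name-based rules are
 * consulted. With no matching rule the creator's own type is returned
 * (assumed default).
 */
static const char *toy_label_anon_inode(const char *creator_type,
					const char *name,
					const char *context_type)
{
	size_t i;

	if (context_type)
		return context_type;

	for (i = 0; i < sizeof(toy_rules) / sizeof(toy_rules[0]); i++) {
		if (strcmp(toy_rules[i].source_type, creator_type) == 0 &&
		    strcmp(toy_rules[i].object_name, name) == 0)
			return toy_rules[i].new_type;
	}
	return creator_type;
}
```

In this model a sysadm_t caller creating "[userfaultfd]" gets uffd_t,
while an inode created with a context inode simply copies that inode's
label.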

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
Cc: Al Viro 
Cc: Andrew Morton 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a340986aa92e..7b22c3112583 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode 
*inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, 
umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6987,6 +7039,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, 
selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v10 1/3] Add a new LSM-supporting anonymous inode interface

2020-10-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates
an anonymous-inode file with its own non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.
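
The difference between the two kinds of anonymous-inode file can be
sketched with a small userspace model in C. Everything here is
hypothetical (toy_inode, toy_get_anon_inode(), toy_put_anon_inode());
it only mirrors the idea that the original path shares one S_PRIVATE
singleton and takes a reference on it, while the "secure" path
allocates a fresh non-S_PRIVATE inode per file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
	int refcount;
	bool is_private;	/* models the S_PRIVATE flag */
};

/* Models the shared anon_inode_inode singleton. */
static struct toy_inode toy_singleton = { .refcount = 1, .is_private = true };

/* Return the inode backing a new anonymous file, or NULL on allocation failure. */
static struct toy_inode *toy_get_anon_inode(bool secure)
{
	struct toy_inode *inode;

	if (!secure) {
		/* Original behaviour: every file shares one inode (models ihold()). */
		toy_singleton.refcount++;
		return &toy_singleton;
	}

	/* "Secure" behaviour: a fresh inode that security modules can label. */
	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->refcount = 1;
	inode->is_private = false;
	return inode;
}

/* Drop one reference; only per-file inodes are ever freed. */
static void toy_put_anon_inode(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount == 0 && inode != &toy_singleton)
		free(inode);
}
```

Two non-secure calls return the same object, whereas two secure calls
return distinct objects, which is what gives a security module
something per-file to attach a label to.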

Signed-off-by: Daniel Colascione 

[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make __anon_inode_getfile() static]
[Use correct error cast in __anon_inode_getfile()]
[Fix error handling in __anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c  | 148 +-
 include/linux/anon_inodes.h   |   8 ++
 include/linux/lsm_hook_defs.h |   2 +
 include/linux/lsm_hooks.h |   9 +++
 include/linux/security.h  |  10 +++
 security/security.c   |   8 ++
 6 files changed, 145 insertions(+), 40 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..a3fe08fcaa52 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's 
private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *__anon_inode_getfile(const char *name,
+const struct file_operations *fops,
+void *priv, int flags,
+const struct inode *context_inode,
+bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);

[PATCH v10 3/3] Use secure anon inodes for userfaultfd

2020-10-11 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..918535b49475 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,14 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -995,7 +995,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1106,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1166,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1174,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1996,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-10-11 Thread Lokesh Gidra
With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that page faults from only user-mode can be handled.
In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.
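
The resulting permission rule can be modelled in plain userspace C as
below. This is only a sketch of the condition added to the
userfaultfd() syscall in this patch: uffd_create_check() is a
hypothetical helper, its knob argument stands in for
sysctl_unprivileged_userfaultfd, and has_cap_sys_ptrace stands in for
capable(CAP_SYS_PTRACE).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define UFFD_USER_MODE_ONLY 1	/* value from patch 1/2 of this series */

/*
 * Model of the check at the top of userfaultfd().
 * Returns 0 if creation may proceed, -EPERM otherwise.
 */
static int uffd_create_check(int knob, int flags, bool has_cap_sys_ptrace)
{
	if (!knob &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}
```

With the knob at 0, only the combination "no UFFD_USER_MODE_ONLY and no
CAP_SYS_PTRACE" is refused; with the knob at 1 every caller is allowed,
as before.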

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy
live migration, which will be unnoticeable to the VM guests. To avoid
this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf. For more
details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c|  6 --
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a 
zone.
 unprivileged_userfaultfd
 
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bd229f06d4e9..0f8a975db3be 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1976,7 +1976,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE))
return -EPERM;
 
BUG_ON(!current->mm);
-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v5 0/2] Control over userfaultfd kernel-fault handling

2020-10-11 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for a similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v4:

  - Added warning when bailing out from handling kernel fault.

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but they can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 16 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 32 insertions(+), 8 deletions(-)

-- 
2.28.0.1011.ga647a8990f-goog



[PATCH v5 1/2] Add UFFD_USER_MODE_ONLY

2020-10-11 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.
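
The decision this adds to the fault path can be sketched as a small
userspace model in C. The names toy_handle_userfault(),
TOY_FAULT_FLAG_USER and the result enum are hypothetical stand-ins (the
real FAULT_FLAG_USER value is not reproduced here); only the shape of
the condition follows the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define UFFD_USER_MODE_ONLY 1		/* flag introduced by this patch */
#define TOY_FAULT_FLAG_USER 0x40	/* hypothetical stand-in for FAULT_FLAG_USER */

enum toy_fault_result {
	TOY_DELIVER_TO_UFFD,	/* fault is queued for the userfaultfd reader */
	TOY_SIGBUS,		/* fault is refused; kernel code sees EFAULT */
};

/*
 * Model of the new check in handle_userfault(): a fault that did not
 * originate in user mode is refused when the context was created with
 * UFFD_USER_MODE_ONLY.
 */
static enum toy_fault_result toy_handle_userfault(unsigned int fault_flags,
						  unsigned int ctx_flags)
{
	bool from_user = (fault_flags & TOY_FAULT_FLAG_USER) != 0;

	if (!from_user && (ctx_flags & UFFD_USER_MODE_ONLY))
		return TOY_SIGBUS;
	return TOY_DELIVER_TO_UFFD;
}
```

User-mode faults are delivered regardless of the flag; only the
kernel-mode case changes behaviour.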

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 10 +-
 include/uapi/linux/userfaultfd.h |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..bd229f06d4e9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY) {
+   printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+   "sysctl knob to 1 if kernel faults must be handled "
+   "without obtaining CAP_SYS_PTRACE capability\n");
+   goto out;
+   }
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1982,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.1011.ga647a8990f-goog



Re: [PATCH v9 0/3] SELinux support for anonymous inodes and UFFD

2020-10-07 Thread Lokesh Gidra
On Wed, Sep 23, 2020 at 12:33 PM Lokesh Gidra  wrote:
>
> Userfaultfd in unprivileged contexts could be potentially very
> useful. We'd like to harden userfaultfd to make such unprivileged use
> less risky. This patch series allows SELinux to manage userfaultfd
> file descriptors and in the future, other kinds of
> anonymous-inode-based file descriptors.  SELinux policy authors can
> apply policy types to anonymous inodes by providing name-based
> transition rules keyed off the anonymous inode internal name (
> "[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
> applying policy to the new SIDs thus produced.
>
> With SELinux managed userfaultfd, an admin can control creation and
> movement of the file descriptors. In particular, handling of
> a userfaultfd descriptor by a different process is essentially a
> ptrace access into the process, without any of the corresponding
> security_ptrace_access_check() checks. For privacy, the admin may
> want to deny such accesses, which is possible with SELinux support.
>
> Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
> allows callers to opt into this SELinux management. In this new "secure"
> mode, anon_inodes create new ephemeral inodes for anonymous file objects
> instead of reusing the normal anon_inodes singleton dummy inode. A new
> LSM hook gives security modules an opportunity to configure and veto
> these ephemeral inodes.
>
> This patch series is one of two forks of [1] and is an
> alternative to [2].
>
> The primary difference between the two patch series is that this
> patch series creates a unique inode for each "secure" anonymous
> inode, while the other patch series ([2]) continues using the
> singleton dummy anonymous inode and adds a way to attach SELinux
> security information directly to file objects.
>
> I prefer the approach in this patch series because 1) it's a smaller
> patch than [2], and 2) it produces a more regular security
> architecture: in this patch series, secure anonymous inodes aren't
> S_PRIVATE and they maintain the SELinux property that the label for a
> file is in its inode. We do need an additional inode per anonymous
> file, but per-struct-file inode creation doesn't seem to be a problem
> for pipes and sockets.
>
> The previous version of this feature ([1]) created a new SELinux
> security class for userfaultfd file descriptors. This version adopts
> the generic transition-based approach of [2].
>
> This patch series also differs from [2] in that it doesn't affect all
> anonymous inodes right away --- instead requiring anon_inodes callers
> to opt in --- but this difference isn't one of basic approach. The
> important question to resolve is whether we should be creating new
> inodes or enhancing per-file data.
>
> Changes from the first version of the patch:
>
>   - Removed some error checks
>   - Defined a new anon_inode SELinux class to resolve the
> ambiguity in [3]
>   - Inherit sclass as well as descriptor from context inode
>
> Changes from the second version of the patch:
>
>   - Fixed example policy in the commit message to reflect the use of
> the new anon_inode class.
>
> Changes from the third version of the patch:
>
>   - Dropped the fops parameter to the LSM hook
>   - Documented hook parameters
>   - Fixed incorrect class used for SELinux transition
>   - Removed stray UFFD changes early in the series
>   - Removed a redundant ERR_PTR(PTR_ERR())
>
> Changes from the fourth version of the patch:
>
>   - Removed an unused parameter from an internal function
>   - Fixed function documentation
>
> Changes from the fifth version of the patch:
>
>   - Fixed function documentation in fs/anon_inodes.c and
> include/linux/lsm_hooks.h
>   - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
> owner from userfaultfd_ctx.
>
> Changes from the sixth version of the patch:
>
>   - Removed definition of anon_inode_getfile_secure() as there are no
> callers.
>   - Simplified function description of anon_inode_getfd_secure().
>   - Elaborated more on the purpose of 'context_inode' in commit message.
>
> Changes from the seventh version of the patch:
>
>   - Fixed error handling in __anon_inode_getfile().
>   - Fixed minor comment and indentation related issues.
>
> Changes from the eighth version of the patch:
>
>   - Replaced selinux_state.initialized with selinux_initialized()
>

 Is there anything else that needs to be done before merging this
patch series? I urge the reviewers to please take a look.

>
> [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
> [2] 
>

Re: [PATCH v4 0/2] Control over userfaultfd kernel-fault handling

2020-10-07 Thread Lokesh Gidra
On Wed, Sep 23, 2020 at 11:56 PM Lokesh Gidra  wrote:
>
> This patch series is split from [1]. The other series enables SELinux
> support for userfaultfd file descriptors so that its creation and
> movement can be controlled.
>
> It has been demonstrated on various occasions that suspending kernel
> code execution for an arbitrary amount of time at any access to
> userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
> to change the intended behavior of the kernel. For instance, handling
> page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
> Likewise, FUSE, which is similar to userfaultfd in this respect, has been
> exploited in [4, 5] for similar outcome.
>
> This small patch series adds a new flag to userfaultfd(2) that allows
> callers to give up the ability to handle kernel-mode faults with the
> resulting UFFD file object. It then adds a 'user-mode only' option to
> the unprivileged_userfaultfd sysctl knob to require unprivileged
> callers to use this new flag.
>
> The purpose of this new interface is to decrease the chance of an
> unprivileged userfaultfd user taking advantage of userfaultfd to
> enhance security vulnerabilities by lengthening the race window in
> kernel code.
>
> [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
> [2] https://duasynt.com/blog/linux-kernel-heap-spray
> [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
> [4] 
> https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
> [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
>
> Changes since v3:
>
>   - Modified the meaning of value '0' of unprivileged_userfaultfd
> sysctl knob. Setting this knob to '0' now allows unprivileged users
> to use userfaultfd, but can handle page faults in user-mode only.
>   - The default value of unprivileged_userfaultfd sysctl knob is changed
> to '0'.
>
I request reviewers and maintainers to please take a look.

> Changes since v2:
>
>   - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
> userfaultfd().
>
> Changes since v1:
>
>   - Added external references to the threats from allowing unprivileged
> users to handle page faults from kernel-mode.
>   - Removed the new sysctl knob restricting handling of page
> faults from kernel-mode, and added an option for the same
> in the existing 'unprivileged_userfaultfd' knob.
>
> Lokesh Gidra (2):
>   Add UFFD_USER_MODE_ONLY
>   Add user-mode only option to unprivileged_userfaultfd sysctl knob
>
>  Documentation/admin-guide/sysctl/vm.rst | 15 ++-
>  fs/userfaultfd.c| 12 +---
>  include/uapi/linux/userfaultfd.h|  9 +
>  3 files changed, 28 insertions(+), 8 deletions(-)
>
> --
> 2.28.0.681.g6f77f65b4e-goog
>


Re: [PATCH 0/5] Speed up mremap on large regions

2020-10-02 Thread Lokesh Gidra
On Thu, Oct 1, 2020 at 10:36 PM Kirill A. Shutemov
 wrote:
>
> On Thu, Oct 01, 2020 at 05:09:02PM -0700, Lokesh Gidra wrote:
> > On Thu, Oct 1, 2020 at 9:00 AM Kalesh Singh  wrote:
> > >
> > > On Thu, Oct 1, 2020 at 8:27 AM Kirill A. Shutemov
> > >  wrote:
> > > >
> > > > On Wed, Sep 30, 2020 at 03:42:17PM -0700, Lokesh Gidra wrote:
> > > > > On Wed, Sep 30, 2020 at 3:32 PM Kirill A. Shutemov
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Sep 30, 2020 at 10:21:17PM +, Kalesh Singh wrote:
> > > > > > > mremap time can be optimized by moving entries at the PMD/PUD 
> > > > > > > level if
> > > > > > > the source and destination addresses are PMD/PUD-aligned and
> > > > > > > PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 
> > > > > > > and
> > > > > > > x86. Other architectures where this type of move is supported and 
> > > > > > > known to
> > > > > > > be safe can also opt-in to these optimizations by enabling 
> > > > > > > HAVE_MOVE_PMD
> > > > > > > and HAVE_MOVE_PUD.
> > > > > > >
> > > > > > > Observed Performance Improvements for remapping a PUD-aligned 
> > > > > > > 1GB-sized
> > > > > > > region on x86 and arm64:
> > > > > > >
> > > > > > > - HAVE_MOVE_PMD is already enabled on x86 : N/A
> > > > > > > - Enabling HAVE_MOVE_PUD on x86   : ~13x speed up
> > > > > > >
> > > > > > > - Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
> > > > > > > - Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
> > > > > > >
> > > > > > >   Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
> > > > > > >   give a total of ~150x speed up on arm64.
> > > > > >
> > > > > > Is there a *real* workload that benefits from HAVE_MOVE_PUD?
> > > > > >
> > > > > We have a Java garbage collector under development which requires
> > > > > moving physical pages of multi-gigabyte heap using mremap. During this
> > > > > move, the application threads have to be paused for correctness. It is
> > > > > critical to keep this pause as short as possible to avoid jitters
> > > > > during user interaction. This is where HAVE_MOVE_PUD will greatly
> > > > > help.
> > > >
> > > > Any chance to quantify the effect of mremap() with and without
> > > > HAVE_MOVE_PUD?
> > > >
> > > > I doubt it's a major contributor to the GC pause. I expect you need to
> > > > move tens of gigs to get sizable effect. And if your GC routinely moves
> > > > tens of gigs, maybe problem somewhere else?
> > > >
> > > > I'm asking for numbers, because increase in complexity comes with cost.
> > > > If it doesn't provide a substantial benefit to a real workload
> > > > maintaining the code forever doesn't make sense.
> > >
> > mremap is indeed the biggest contributor to the GC pause. It has to
> > take place in what is typically known as a 'stop-the-world' pause,
> > wherein all application threads are paused. During this pause the GC
> > thread flips the GC roots (threads' stacks, globals etc.), and then
> > resumes threads along with concurrent compaction of the heap. This
> > GC-root flip differs depending on which compaction algorithm is being
> > used.
> >
> > In our case it involves updating object references in threads' stacks
> > and remapping java heap to a different location. The threads' stacks
> > can be handled in parallel with the mremap. Therefore, the dominant
> > factor is indeed the cost of mremap. From patches 2 and 4, it is clear
> > that remapping 1GB without this optimization will take ~9ms on arm64.
> >
> > Although this mremap has to happen only once every GC cycle, and the
> > typical size is also not going to be more than a GB or 2, pausing
> > application threads for ~9ms is guaranteed to cause jitters. OTOH,
> > with this optimization, mremap is reduced to ~60us, which is a totally
> > acceptable pause time.
> >
> > Unfortunately, implementation of the new GC algorithm hasn't yet
> > reached the point where I can quantify the effect of this
> > optimization. But I can confirm that without this optimization the new
> > GC will not be approved.
>
> IIUC, the 9ms -> 90us improvement attributed to combination HAVE_MOVE_PMD
> and HAVE_MOVE_PUD, right? I expect HAVE_MOVE_PMD to be reasonable for some
> workloads, but marginal benefit of HAVE_MOVE_PUD is in doubt. Do you see
> it's useful for your workload?
>
Yes, 9ms -> 90us is when both are combined. Past experience has been
that even a ~1ms stop-the-world pause is prone to cause jitters, and
HAVE_MOVE_PMD alone takes us only that far. So HAVE_MOVE_PUD is
required to bring the mremap cost down to an acceptable level.

Ideally, I was hoping that the functionality of HAVE_MOVE_PMD could be
extended to all levels of the hierarchical page table, simplifying the
implementation in the process. But unfortunately, judging from patch 3,
that doesn't seem to be possible.
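
For readers following the thread, the eligibility condition from the
quoted cover letter (source and destination PMD/PUD-aligned, and the
range PMD/PUD-sized) can be sketched in userspace C roughly as below.
This is only an illustration and not the kernel's code: the helper name
can_move_at_level() and the SKETCH_* sizes (2 MiB PMD and 1 GiB PUD, as
on x86-64 with 4 KiB pages) are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for x86-64 with 4 KiB pages; other configurations differ. */
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21) /* 2 MiB */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30) /* 1 GiB */

/*
 * Hypothetical helper: returns true if a move of `len` bytes from
 * `old_addr` to `new_addr` could start by moving one whole entry of
 * `level_size` bytes, i.e. both addresses are aligned to that size and
 * at least one full entry's worth of range remains.
 */
static bool can_move_at_level(uint64_t old_addr, uint64_t new_addr,
                              uint64_t len, uint64_t level_size)
{
    uint64_t mask = level_size - 1;

    if ((old_addr & mask) || (new_addr & mask))
        return false;
    return len >= level_size;
}
```

Under these assumptions, a 1GB range whose source is only 2 MiB-aligned
qualifies for the PMD-level move but not for the PUD-level one.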

> --
>  Kirill A. Shutemov


Re: [PATCH 0/5] Speed up mremap on large regions

2020-10-01 Thread Lokesh Gidra
On Thu, Oct 1, 2020 at 9:00 AM Kalesh Singh  wrote:
>
> On Thu, Oct 1, 2020 at 8:27 AM Kirill A. Shutemov
>  wrote:
> >
> > On Wed, Sep 30, 2020 at 03:42:17PM -0700, Lokesh Gidra wrote:
> > > On Wed, Sep 30, 2020 at 3:32 PM Kirill A. Shutemov
> > >  wrote:
> > > >
> > > > On Wed, Sep 30, 2020 at 10:21:17PM +, Kalesh Singh wrote:
> > > > > mremap time can be optimized by moving entries at the PMD/PUD level if
> > > > > the source and destination addresses are PMD/PUD-aligned and
> > > > > PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
> > > > > x86. Other architectures where this type of move is supported and 
> > > > > known to
> > > > > be safe can also opt-in to these optimizations by enabling 
> > > > > HAVE_MOVE_PMD
> > > > > and HAVE_MOVE_PUD.
> > > > >
> > > > > Observed Performance Improvements for remapping a PUD-aligned 
> > > > > 1GB-sized
> > > > > region on x86 and arm64:
> > > > >
> > > > > - HAVE_MOVE_PMD is already enabled on x86 : N/A
> > > > > - Enabling HAVE_MOVE_PUD on x86   : ~13x speed up
> > > > >
> > > > > - Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
> > > > > - Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
> > > > >
> > > > >   Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
> > > > >   give a total of ~150x speed up on arm64.
> > > >
> > > > Is there a *real* workload that benefits from HAVE_MOVE_PUD?
> > > >
> > > We have a Java garbage collector under development which requires
> > > moving physical pages of multi-gigabyte heap using mremap. During this
> > > move, the application threads have to be paused for correctness. It is
> > > critical to keep this pause as short as possible to avoid jitters
> > > during user interaction. This is where HAVE_MOVE_PUD will greatly
> > > help.
> >
> > Any chance to quantify the effect of mremap() with and without
> > HAVE_MOVE_PUD?
> >
> > I doubt it's a major contributor to the GC pause. I expect you need to
> > move tens of gigs to get sizable effect. And if your GC routinely moves
> > tens of gigs, maybe problem somewhere else?
> >
> > I'm asking for numbers, because increase in complexity comes with cost.
> > If it doesn't provide a substantial benefit to a real workload
> > maintaining the code forever doesn't make sense.
>
mremap is indeed the biggest contributor to the GC pause. It has to
take place in what is typically known as a 'stop-the-world' pause,
wherein all application threads are paused. During this pause the GC
thread flips the GC roots (threads' stacks, globals etc.), and then
resumes threads along with concurrent compaction of the heap. This
GC-root flip differs depending on which compaction algorithm is being
used.

In our case it involves updating object references in threads' stacks
and remapping java heap to a different location. The threads' stacks
can be handled in parallel with the mremap. Therefore, the dominant
factor is indeed the cost of mremap. From patches 2 and 4, it is clear
that remapping 1GB without this optimization will take ~9ms on arm64.

Although this mremap has to happen only once every GC cycle, and the
typical size is also not going to be more than a GB or 2, pausing
application threads for ~9ms is guaranteed to cause jitters. OTOH,
with this optimization, mremap is reduced to ~60us, which is a totally
acceptable pause time.
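
As a rough cross-check of these figures, using only numbers quoted in
this thread (~9ms baseline, ~8x from HAVE_MOVE_PMD and ~19x from
HAVE_MOVE_PUD on arm64): if the two speed-ups are assumed to compose
multiplicatively, which is what the ~150x total in the cover letter
suggests, the arithmetic can be sketched as below. The function name is
hypothetical and the result is an estimate, not a measurement.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimated mremap time in microseconds after
 * applying two speed-up factors that are assumed to multiply.
 */
static uint64_t estimated_pause_us(uint64_t baseline_us,
                                   uint64_t pmd_speedup,
                                   uint64_t pud_speedup)
{
    /* Integer division: the result is rounded down. */
    return baseline_us / (pmd_speedup * pud_speedup);
}
```

With the thread's arm64 numbers, estimated_pause_us(9000, 8, 19)
evaluates to 59, in line with the ~60us above, while
estimated_pause_us(9000, 8, 1) evaluates to 1125, i.e. a pause of
roughly 1ms with HAVE_MOVE_PMD alone.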

Unfortunately, implementation of the new GC algorithm hasn't yet
reached the point where I can quantify the effect of this
optimization. But I can confirm that without this optimization the new
GC will not be approved.


> Lokesh on this thread would be better able to answer this. I'll let
> him weigh in here.
> Thanks, Kalesh
> >
> > --
> >  Kirill A. Shutemov
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to kernel-team+unsubscr...@android.com.
> >


Re: [PATCH 0/5] Speed up mremap on large regions

2020-09-30 Thread Lokesh Gidra
On Wed, Sep 30, 2020 at 3:32 PM Kirill A. Shutemov
 wrote:
>
> On Wed, Sep 30, 2020 at 10:21:17PM +, Kalesh Singh wrote:
> > mremap time can be optimized by moving entries at the PMD/PUD level if
> > the source and destination addresses are PMD/PUD-aligned and
> > PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
> > x86. Other architectures where this type of move is supported and known to
> > be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
> > and HAVE_MOVE_PUD.
> >
> > Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
> > region on x86 and arm64:
> >
> > - HAVE_MOVE_PMD is already enabled on x86 : N/A
> > - Enabling HAVE_MOVE_PUD on x86   : ~13x speed up
> >
> > - Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
> > - Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
> >
> >   Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
> >   give a total of ~150x speed up on arm64.
>
> Is there a *real* workload that benefits from HAVE_MOVE_PUD?
>
We have a Java garbage collector under development which requires
moving physical pages of multi-gigabyte heap using mremap. During this
move, the application threads have to be paused for correctness. It is
critical to keep this pause as short as possible to avoid jitters
during user interaction. This is where HAVE_MOVE_PUD will greatly
help.
> --
>  Kirill A. Shutemov


[PATCH v4 1/2] Add UFFD_USER_MODE_ONLY

2020-09-24 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.
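
The decision this patch adds to handle_userfault() can be modelled in
plain userspace C roughly as follows. This is a sketch for illustration
only: the function name and the MODEL_* constants are hypothetical
stand-ins rather than kernel definitions (only the value 1 for
UFFD_USER_MODE_ONLY is taken from this patch).

```c
#include <stdbool.h>

#define MODEL_FAULT_FLAG_USER     0x40u /* stand-in bit, not the kernel's definition */
#define MODEL_UFFD_USER_MODE_ONLY 1     /* value of UFFD_USER_MODE_ONLY in this patch */

/*
 * Returns true if the fault would be passed on to the userfaultfd
 * monitor, false if it would be refused (the kernel-mode access then
 * fails with EFAULT, as described above).
 */
static bool model_uffd_delivers_fault(unsigned int fault_flags,
                                      unsigned int ctx_flags)
{
    bool from_user = (fault_flags & MODEL_FAULT_FLAG_USER) != 0;

    /* Kernel-mode fault on a user-mode-only userfaultfd: refuse. */
    if (!from_user && (ctx_flags & MODEL_UFFD_USER_MODE_ONLY))
        return false;
    return true;
}
```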

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 6 +-
 include/uapi/linux/userfaultfd.h | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3191434057f3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY)
+   goto out;
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1978,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 0/2] Control over userfaultfd kernel-fault handling

2020-09-24 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.
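
From userspace, a caller opting into the new flag might look roughly
like the sketch below. The two helper names are hypothetical;
UFFD_USER_MODE_ONLY is defined locally with the value from patch 1/2 in
case the installed headers do not provide it, and the call simply
returns an error on kernels without this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from patch 1/2; defined here in case the headers lack it. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/* Hypothetical helper: flags for a user-mode-only, close-on-exec uffd. */
static int uffd_user_mode_only_flags(void)
{
    return O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY;
}

/*
 * Hypothetical helper: returns a new userfaultfd descriptor, or a
 * negative errno value on failure (for example -EINVAL on a kernel
 * that does not know the flag).
 */
static int open_user_mode_only_uffd(void)
{
#ifdef SYS_userfaultfd
    long fd = syscall(SYS_userfaultfd, uffd_user_mode_only_flags());

    return fd < 0 ? -errno : (int)fd;
#else
    return -ENOSYS;
#endif
}
```

A caller would check the return value, proceed with the usual UFFDIO_API
handshake on success, and close() the descriptor when done.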

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
sysctl knob. Setting this knob to '0' now allows unprivileged users
to use userfaultfd, but can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c| 12 +---
 include/uapi/linux/userfaultfd.h|  9 +
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v4 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-09-24 Thread Lokesh Gidra
With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with
EPERM.

This enables administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.
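
The resulting permission check can be modelled in userspace C roughly as
below. This is an illustrative sketch, not the kernel code: the function
name and its parameters are hypothetical, and the capability test is
reduced to a boolean argument.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_UFFD_USER_MODE_ONLY 1 /* value of UFFD_USER_MODE_ONLY in patch 1/2 */

/*
 * Hypothetical model of the check at the top of userfaultfd(2) after
 * this patch. Returns 0 if creation may proceed, -EPERM otherwise.
 */
static int model_uffd_permission(int sysctl_unprivileged_userfaultfd,
                                 int flags, bool has_cap_sys_ptrace)
{
    if (!sysctl_unprivileged_userfaultfd &&
        (flags & MODEL_UFFD_USER_MODE_ONLY) == 0 &&
        !has_cap_sys_ptrace)
        return -EPERM;
    return 0;
}
```

In this model only one combination is rejected: knob set to 0, flag not
passed, and no CAP_SYS_PTRACE.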

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will make postcopy
live migration fail, in a way that is unnoticeable to the VM guests. To
avoid this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.
For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.gi9...@redhat.com/

Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++-
 fs/userfaultfd.c|  6 --
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a 
zone.
 unprivileged_userfaultfd
 
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..3816c11a986a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1972,7 +1972,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+   if (!sysctl_unprivileged_userfaultfd &&
+   (flags & UFFD_USER_MODE_ONLY) == 0 &&
+   !capable(CAP_SYS_PTRACE))
return -EPERM;
 
BUG_ON(!current->mm);
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v9 3/3] Wire UFFD up to SELinux

2020-09-23 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..918535b49475 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,14 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -995,7 +995,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1106,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1166,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1174,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1996,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v9 2/3] Teach SELinux about anonymous inodes

2020-09-23 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new anon_inode_getfd_secure()
function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
Cc: Al Viro 
Cc: Andrew Morton 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a340986aa92e..7b22c3112583 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode 
*inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_initialized(&selinux_state)))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, 
umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6987,6 +7039,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, 
selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v9 1/3] Add a new LSM-supporting anonymous inode interface

2020-09-23 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates
an anonymous-inode file with an individual non-S_PRIVATE inode to which
security modules can apply policy. Existing callers continue using the
original singleton-inode kind of anonymous-inode file. We can transition
anonymous inode users to the new kind of anonymous inode in individual
patches for the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.

Signed-off-by: Daniel Colascione 

[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make _anon_inode_getfile() static]
[Use correct error cast in _anon_inode_getfile()]
[Fix error handling in _anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c  | 147 +-
 include/linux/anon_inodes.h   |   8 ++
 include/linux/lsm_hook_defs.h |   2 +
 include/linux/lsm_hooks.h |   9 +++
 include/linux/security.h  |  10 +++
 security/security.c   |   8 ++
 6 files changed, 144 insertions(+), 40 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..c3f16deda211 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's 
private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *_anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags,
+   const struct inode *context_inode,
+   bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+ 

[PATCH v9 0/3] SELinux support for anonymous inodes and UFFD

2020-09-23 Thread Lokesh Gidra
Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and, in the future, other kinds of
anonymous-inode-based file descriptors.  SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.

With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.
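
To make the singleton-versus-ephemeral distinction concrete, here is a
minimal userspace sketch in C. It is a toy model only, not kernel code:
the toy_* types, toy_getfile(), toy_fput() and the hook signature are
hypothetical stand-ins for struct inode, struct file and the new LSM
hook, and are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for struct inode and struct file; not kernel types. */
struct toy_inode {
	int refcount;
	bool s_private;		/* models the S_PRIVATE flag */
};

struct toy_file {
	struct toy_inode *inode;
};

/* The one dummy inode shared by every non-secure anonymous file. */
static struct toy_inode toy_singleton = { .refcount = 1, .s_private = true };

/* Hypothetical LSM-style hook: return 0 to allow, negative to veto. */
typedef int (*toy_hook_fn)(struct toy_inode *inode, const char *name);

/* Example hook that vetoes every secure anonymous inode. */
int toy_deny_all(struct toy_inode *inode, const char *name)
{
	(void)inode;
	(void)name;
	return -1;
}

static void toy_iput(struct toy_inode *inode)
{
	/* The singleton keeps its initial reference, so it is never freed. */
	if (--inode->refcount == 0)
		free(inode);
}

struct toy_file *toy_getfile(const char *name, bool secure, toy_hook_fn hook)
{
	struct toy_inode *inode;
	struct toy_file *file;

	if (secure) {
		/* Secure mode: a fresh, non-private inode per file. */
		inode = calloc(1, sizeof(*inode));
		if (!inode)
			return NULL;
		inode->refcount = 1;
		inode->s_private = false;
		if (hook && hook(inode, name) != 0) {
			toy_iput(inode);
			return NULL;
		}
	} else {
		/* Legacy mode: take another reference on the shared inode. */
		inode = &toy_singleton;
		assert(inode->refcount > 0);
		inode->refcount++;
	}

	file = malloc(sizeof(*file));
	if (!file) {
		toy_iput(inode);
		return NULL;
	}
	file->inode = inode;
	return file;
}

void toy_fput(struct toy_file *file)
{
	toy_iput(file->inode);
	free(file);
}
```

In this model, two legacy files compare equal on file->inode, two
secure files do not, and a vetoing hook makes creation fail before any
file object exists.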

This patch series is one of two forks of [1] and is an
alternative to [2].

The primary difference between the two patch series is that this
patch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.

I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.

The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].

This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.

Changes from the first version of the patch:

  - Removed some error checks
  - Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
  - Inherit sclass as well as descriptor from context inode

Changes from the second version of the patch:

  - Fixed example policy in the commit message to reflect the use of
the new anon_inode class.

Changes from the third version of the patch:

  - Dropped the fops parameter to the LSM hook
  - Documented hook parameters
  - Fixed incorrect class used for SELinux transition
  - Removed stray UFFD changes early in the series
  - Removed a redundant ERR_PTR(PTR_ERR())

Changes from the fourth version of the patch:

  - Removed an unused parameter from an internal function
  - Fixed function documentation

Changes from the fifth version of the patch:

  - Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
  - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.

Changes from the sixth version of the patch:

  - Removed definition of anon_inode_getfile_secure() as there are no
callers.
  - Simplified function description of anon_inode_getfd_secure().
  - Elaborated more on the purpose of 'context_inode' in commit message.

Changes from the seventh version of the patch:

  - Fixed error handling in _anon_inode_getfile().
  - Fixed minor comment and indentation related issues.

Changes from the eighth version of the patch:

  - Replaced selinux_state.initialized with selinux_initialized(&selinux_state)


[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] 
https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] 
https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  Add a new LSM-supporting anonymous inode interface
  Teach SELinux about anonymous inodes
  Wire UFFD up to SELinux

 fs/anon_inodes.c| 147 
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   8 ++
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   9 ++
 include/linux/security.h|  10 ++
 security/security.c |   8 ++
 security/selinux/hooks.c|  53 ++
 

Re: [PATCH v8 1/3] Add a new LSM-supporting anonymous inode interface

2020-09-07 Thread Lokesh Gidra
On Tue, Sep 1, 2020 at 5:41 AM Christian Brauner
 wrote:
>
> On Wed, Aug 26, 2020 at 11:35:20PM -0700, Lokesh Gidra wrote:
> > From: Daniel Colascione 
> >
> > This change adds a new function, anon_inode_getfd_secure, that creates an
> > anonymous-inode file with an individual non-S_PRIVATE inode to which security
> > modules can apply policy. Existing callers continue using the original
> > singleton-inode kind of anonymous-inode file. We can transition anonymous
> > inode users to the new kind of anonymous inode in individual patches for
> > the sake of bisection and review.
> >
> > The new function accepts an optional context_inode parameter that
> > callers can use to provide additional contextual information to
> > security modules for granting/denying permission to create an anon inode
> > of the same type.
> >
> > For example, in case of userfaultfd, the created inode is a
> > 'logical child' of the context_inode (userfaultfd inode of the
> > parent process) in the sense that it provides the security context
> > required during creation of the child process' userfaultfd inode.
> >
> > Signed-off-by: Daniel Colascione 
> >
> > [Fix comment documenting return values of inode_init_security_anon()]
> > [Add context_inode description in comments to anon_inode_getfd_secure()]
> > [Remove definition of anon_inode_getfile_secure() as there are no callers]
> > [Make _anon_inode_getfile() static]
> > [Use correct error cast in _anon_inode_getfile()]
> > [Fix error handling in _anon_inode_getfile()]
> >
> > Signed-off-by: Lokesh Gidra 
> > ---
> >  fs/anon_inodes.c  | 147 +-
> >  include/linux/anon_inodes.h   |   8 ++
> >  include/linux/lsm_hook_defs.h |   2 +
> >  include/linux/lsm_hooks.h |   9 +++
> >  include/linux/security.h  |  10 +++
> >  security/security.c   |   8 ++
> >  6 files changed, 144 insertions(+), 40 deletions(-)
> >
> > diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> > index 89714308c25b..c3f16deda211 100644
> > --- a/fs/anon_inodes.c
> > +++ b/fs/anon_inodes.c
> > @@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
> >   .kill_sb= kill_anon_super,
> >  };
> >
> > -/**
> > - * anon_inode_getfile - creates a new file instance by hooking it up to an
> > - *  anonymous inode, and a dentry that describe the "class"
> > - *  of the file
> > - *
> > - * @name:[in]name of the "class" of the new file
> > - * @fops:[in]file operations for the new file
> > - * @priv:[in]private data for the new file (will be file's private_data)
> > - * @flags:   [in]flags
> > - *
> > - * Creates a new file by hooking it on a single inode. This is useful for files
> > - * that do not need to have a full-fledged inode in order to operate correctly.
> > - * All the files created with anon_inode_getfile() will share a single inode,
> > - * hence saving memory and avoiding code duplication for the file/inode/dentry
> > - * setup.  Returns the newly created file* or an error pointer.
> > - */
> > -struct file *anon_inode_getfile(const char *name,
> > - const struct file_operations *fops,
> > - void *priv, int flags)
> > +static struct inode *anon_inode_make_secure_inode(
> > + const char *name,
> > + const struct inode *context_inode)
> >  {
> > - struct file *file;
> > + struct inode *inode;
> > + const struct qstr qname = QSTR_INIT(name, strlen(name));
> > + int error;
> > +
> > + inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
> > + if (IS_ERR(inode))
> > + return inode;
> > + inode->i_flags &= ~S_PRIVATE;
> > + error = security_inode_init_security_anon(inode, &qname, context_inode);
> > + if (error) {
> > + iput(inode);
> > + return ERR_PTR(error);
> > + }
> > + return inode;
> > +}
>
> Hey,
>
> Iiuc, this makes each newly created anon inode fd correspond to a unique
> file and to a unique inode:
>
> fd1 -> file1 -> inode1
> fd2 -> file2 -> inode2
>
Not every anon inode. Just the ones created through
anon_inode_getfd_secure() API.

> Whereas before we had every anon inode fd correspond to a unique file
> but all files map to the _same_ inode:
>
> fd1 -> fi

Re: [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only

2020-09-04 Thread Lokesh Gidra
On Thu, Sep 3, 2020 at 8:34 PM Andrea Arcangeli  wrote:
>
> Hello,
>
> On Mon, Aug 17, 2020 at 03:11:16PM -0700, Lokesh Gidra wrote:
> > There has been an emphasis that Android is probably the only user for
> > the restriction of userfaults from kernel-space and that it wouldn’t
> > be useful anywhere else. I humbly disagree! There are various areas
> > where the PROT_NONE+SIGSEGV trick is (and can be) used in a purely
> > user-space setting. Basically, any lazy, on-demand,
>
> For the record what I said is quoted below
> https://lkml.kernel.org/r/20200520194804.gj26...@redhat.com :
>
> """It all boils down of how peculiar it is to be able to leverage only
> the acceleration [..] Right now there's a single user that can cope
> with that limitation [..] If there will be more users [..]  it'd be
> fine to add a value "2" later."""
>
> Specifically I never said "that it wouldn’t be useful anywhere else.".
>
Thanks a lot for clarifying.

> Also I'm only arguing about the sysctl visible kABI change in patch
> 2/2: the flag passed as parameter to the syscall in patch 1/2 is all
> great, because seccomp needs it in the scalar parameter of the syscall
> to implement a filter equivalent to your sysctl "2" policy with only
> patch 1/2 applied.
>
> I've two more questions now:
>
> 1) why don't you enforce the block of kernel initiated faults with
>seccomp-bpf instead of adding a sysctl value 2? Is the sysctl just
>an optimization to remove a few instructions per syscall in the bpf
>execution of Android unprivileged apps? You should block a lot of
>other syscalls by default to all unprivileged processes, including
>vmsplice.
>
>In other words if it's just for Android, why can't Android solve it
>with only patch 1/2 by tweaking the seccomp filter?

I would let Nick (nnk@) and Jeff (jeffv@) respond to this.

The previous responses from both of them on this email thread
(https://lore.kernel.org/lkml/CABXk95A-E4NYqA5qVrPgDF18YW-z4_udzLwa0cdo2OfqVsy=s...@mail.gmail.com/
and 
https://lore.kernel.org/lkml/CAFJ0LnGfrzvVgtyZQ+UqRM6F3M7iXOhTkUBTc+9sV+=rrfn...@mail.gmail.com/)
suggest that the performance overhead of seccomp-bpf is too much. Kees
also objected to it
(https://lore.kernel.org/lkml/202005200921.2BD5A0ADD@keescook/)

I'm not familiar with how seccomp-bpf works. All that I can add here
is that userfaultfd syscall is usually not invoked in a performance
critical code path. So, if the performance overhead of seccomp-bpf (if
enabled) is observed on all syscalls originating from a process, then
I'd say patch 2/2 is essential. Otherwise, it should be ok to let
seccomp perform the same functionality instead.
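
For readers unfamiliar with seccomp-bpf, the following is a minimal
sketch of the kind of filter being discussed: it lets userfaultfd(2)
through only when the caller passes UFFD_USER_MODE_ONLY, which is
possible precisely because patch 1/2 puts the flag in a scalar syscall
argument. Several things here are assumptions: the flag value 1, the
names uffd_user_mode_filter and uffd_filter_prog(), a little-endian
target (so the low 32 bits of args[0] sit at its base offset), and the
omission of the architecture check that a production filter would need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Assumption: the flag value proposed for UFFD_USER_MODE_ONLY. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

static const struct sock_filter uffd_user_mode_filter[] = {
	/* [0] A = syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] not userfaultfd? skip 3 instructions to the ALLOW at [5] */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_userfaultfd, 0, 3),
	/* [2] A = low 32 bits of the flags argument (little-endian) */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[0])),
	/* [3] flag set? skip 1 instruction to the ALLOW at [5] */
	BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, UFFD_USER_MODE_ONLY, 1, 0),
	/* [4] userfaultfd without the flag: fail with EPERM */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
	/* [5] everything else is allowed */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

size_t uffd_filter_len(void)
{
	return sizeof(uffd_user_mode_filter) /
	       sizeof(uffd_user_mode_filter[0]);
}

/* Wrap the program for prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog). */
struct sock_fprog uffd_filter_prog(void)
{
	struct sock_fprog prog;

	assert(uffd_filter_len() <= BPF_MAXINSNS);
	prog.len = (unsigned short)uffd_filter_len();
	prog.filter = (struct sock_filter *)uffd_user_mode_filter;
	return prog;
}
```

Once installed, a filter like this is evaluated on every syscall the
process makes, not only on userfaultfd(2), which is where the
per-syscall overhead concern comes from.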

>
> 2) given that Android is secure enough with the sysctl at value 2, why
>should we even retain the current sysctl 0 semantics? Why can't
>more secure systems just use seccomp and block userfaultfd, as it
>already happens by default in the podman default seccomp
>whitelist (for those containers that don't define a new json
>whitelist in the OCI schema)? Shouldn't we focus our energy in
>making containers more secure by preventing the OCI schema of a
>random container to re-enable userfaultfd in the container seccomp
>filter instead of trying to solve this with a global sysctl?
>
>What's missing in my view is a kubernetes hard allowlist/denylist
>that cannot be overridden with the OCI schema in case people have
>the bad idea of running containers downloaded from a not fully
>trusted source, without adding virt isolation and that's a
>userland problem to be solved in the container runtime, not a
>kernel issue. Then you'd just add userfaultfd to the json of the
>k8s hard seccomp denylist instead of going around tweaking sysctl.
>
> What's your take on changing your 2/2 patch to just replace value "0"
> and avoid introducing a new value "2"?

SGTM. Disabling uffd completely for unprivileged processes can be
achieved either using seccomp-bpf, or via SELinux, once the following
patch series is upstreamed
https://lore.kernel.org/lkml/20200827063522.2563293-1-lokeshgi...@google.com/

>
> The value "0" was motivated by the concern that uffd can enlarge the
> race window for use after free by providing one more additional way to
> block kernel faults, but value "2" is already enough to solve that
> concern completely and it'll be the default on all Android.
>
> In other words by adding "2" you're effectively doing a more
> fine-grained and more optimal implementation of "0" that remains useful
> and available to unprivileged apps and it already resolves all
> "robustness against side effects other k

Re: [PATCH v8 2/3] Teach SELinux about anonymous inodes

2020-08-31 Thread Lokesh Gidra
On Mon, Aug 31, 2020 at 11:05 AM Stephen Smalley
 wrote:
>
> On Thu, Aug 27, 2020 at 2:35 AM Lokesh Gidra  wrote:
> >
> > From: Daniel Colascione 
> >
> > This change uses the anon_inodes and LSM infrastructure introduced in
> > the previous patch to give SELinux the ability to control
> > anonymous-inode files that are created using the new
> > anon_inode_getfd_secure() function.
> >
> > A SELinux policy author detects and controls these anonymous inodes by
> > adding a name-based type_transition rule that assigns a new security
> > type to anonymous-inode files created in some domain. The name used
> > for the name-based transition is the name associated with the
> > anonymous inode for file listings --- e.g., "[userfaultfd]" or
> > "[perf_event]".
> >
> > Example:
> >
> > type uffd_t;
> > type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
> > allow sysadm_t uffd_t:anon_inode { create };
> >
> > (The next patch in this series is necessary for making userfaultfd
> > support this new interface.  The example above is just
> > for exposition.)
> >
> > Signed-off-by: Daniel Colascione 
> > Acked-by: Casey Schaufler 
> > Acked-by: Stephen Smalley 
> > Cc: Al Viro 
> > Cc: Andrew Morton 
> > ---
> >  security/selinux/hooks.c| 53 +
> >  security/selinux/include/classmap.h |  2 ++
> >  2 files changed, 55 insertions(+)
> >
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index a340986aa92e..b83f56e5ef40 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
> > return 0;
> >  }
> >
> > +static int selinux_inode_init_security_anon(struct inode *inode,
> > +   const struct qstr *name,
> > +   const struct inode *context_inode)
> > +{
> > +   const struct task_security_struct *tsec = selinux_cred(current_cred());
> > +   struct common_audit_data ad;
> > +   struct inode_security_struct *isec;
> > +   int rc;
> > +
> > +   if (unlikely(!selinux_state.initialized))
>
> This should use selinux_initialized(&selinux_state) instead.

Thanks for the review. I'll make the change in the next version.

Kindly have a look at other patches in the series as well.


[PATCH v8 3/3] Wire UFFD up to SELinux

2020-08-27 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..918535b49475 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,14 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
if (fd < 0)
return fd;
 
@@ -995,7 +995,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1106,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1166,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1174,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1996,8 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v8 0/3] SELinux support for anonymous inodes and UFFD

2020-08-27 Thread Lokesh Gidra
Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and, in the future, other kinds of
anonymous-inode-based file descriptors.  SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.

With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.

This patch series is one of two forks of [1] and is an
alternative to [2].

The primary difference between the two patch series is that this
patch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.

I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.

The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].

This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.

Changes from the first version of the patch:

  - Removed some error checks
  - Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
  - Inherit sclass as well as descriptor from context inode

Changes from the second version of the patch:

  - Fixed example policy in the commit message to reflect the use of
the new anon_inode class.

Changes from the third version of the patch:

  - Dropped the fops parameter to the LSM hook
  - Documented hook parameters
  - Fixed incorrect class used for SELinux transition
  - Removed stray UFFD changes early in the series
  - Removed a redundant ERR_PTR(PTR_ERR())

Changes from the fourth version of the patch:

  - Removed an unused parameter from an internal function
  - Fixed function documentation

Changes from the fifth version of the patch:

  - Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
  - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.

Changes from the sixth version of the patch:

  - Removed definition of anon_inode_getfile_secure() as there are no
callers.
  - Simplified function description of anon_inode_getfd_secure().
  - Elaborated more on the purpose of 'context_inode' in commit message.

Changes from the seventh version of the patch:

  - Fixed error handling in _anon_inode_getfile().
  - Fixed minor comment and indentation related issues.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] 
https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] 
https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  Add a new LSM-supporting anonymous inode interface
  Teach SELinux about anonymous inodes
  Wire UFFD up to SELinux

 fs/anon_inodes.c| 147 
 fs/userfaultfd.c|  19 ++--
 include/linux/anon_inodes.h |   8 ++
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   9 ++
 include/linux/security.h|  10 ++
 security/security.c |   8 ++
 security/selinux/hooks.c|  53 ++
 security/selinux/include/classmap.h |   2 +
 9 files changed, 209 insertions(+), 49 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v8 1/3] Add a new LSM-supporting anonymous inode interface

2020-08-27 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates an
anonymous-inode file with an individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.
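
The role of context_inode can be sketched with a small userspace model
in C. This mirrors, in simplified form, the branch that the SELinux
hook in patch 2/3 takes: inherit class and SID from the context inode
when one is given, otherwise ask policy for a name-based transition.
All toy_* names, the numeric SIDs and the toy_policy() lookup are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for SECCLASS_ANON_INODE; the value is arbitrary. */
enum { TOY_CLASS_ANON_INODE = 1 };

struct toy_label {
	unsigned int sclass;
	unsigned int sid;
};

/* Hypothetical policy lookup: (creator SID, class, name) -> new SID. */
typedef int (*toy_transition_fn)(unsigned int creator_sid,
				 unsigned int sclass, const char *name,
				 unsigned int *new_sid);

int toy_label_anon_inode(struct toy_label *label,
			 const struct toy_label *context,
			 unsigned int creator_sid, const char *name,
			 toy_transition_fn transition)
{
	assert(label != NULL);

	if (context) {
		/* "Logical child": copy class and SID from the context. */
		*label = *context;
		return 0;
	}

	/* No context: label by name-based transition from the creator. */
	label->sclass = TOY_CLASS_ANON_INODE;
	return transition(creator_sid, label->sclass, name, &label->sid);
}

/* Example policy: "[userfaultfd]" gets SID 100, others keep the creator's. */
int toy_policy(unsigned int creator_sid, unsigned int sclass,
	       const char *name, unsigned int *new_sid)
{
	(void)sclass;
	*new_sid = strcmp(name, "[userfaultfd]") == 0 ? 100 : creator_sid;
	return 0;
}
```

In the userfaultfd case, the syscall path would correspond to the
NULL-context branch, and the fork-event path to the inheriting branch.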

Signed-off-by: Daniel Colascione 

[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make _anon_inode_getfile() static]
[Use correct error cast in _anon_inode_getfile()]
[Fix error handling in _anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c  | 147 +-
 include/linux/anon_inodes.h   |   8 ++
 include/linux/lsm_hook_defs.h |   2 +
 include/linux/lsm_hooks.h |   9 +++
 include/linux/security.h  |  10 +++
 security/security.c   |   8 ++
 6 files changed, 144 insertions(+), 40 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..c3f16deda211 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,79 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
 {
-   struct file *file;
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+static struct file *_anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags,
+   const struct inode *context_inode,
+   bool secure)
+{
+   struct inode *inode;
+   struct file *file;
 
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+   if (secure) {
+   inode = anon_inode_make_secure_inode(name, context_inode);
+   if (IS_ERR(inode)) {
+   file = ERR_CAST(inode);
+   goto err;
+   }
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode)) {
+   file = ERR_PTR(-ENODEV);
+   goto err;
+   }
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+ 

[PATCH v8 2/3] Teach SELinux about anonymous inodes

2020-08-27 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new anon_inode_getfd_secure()
function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione 
Acked-by: Casey Schaufler 
Acked-by: Stephen Smalley 
Cc: Al Viro 
Cc: Andrew Morton 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a340986aa92e..b83f56e5ef40 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_state.initialized))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6987,6 +7039,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v3 0/2] Control over userfaultfd kernel-fault handling

2020-08-25 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that their creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] to similar effect.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.
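
For illustration only, a caller that is willing to give up kernel-mode
fault handling might create its descriptor roughly as follows. This is a
userspace sketch, not part of the series: the helper name
uffd_open_user_mode_only() is made up, and the flag value is the one
proposed in patch 1/2, defined locally in case the installed uapi headers
do not carry it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value proposed in patch 1/2; only defined here if the headers lack it. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper: create a userfaultfd that refuses to handle
 * kernel-mode faults. Returns the new fd, or -1 with errno set
 * (for example EINVAL on a kernel that does not know the flag).
 */
static int uffd_open_user_mode_only(void)
{
#ifdef SYS_userfaultfd
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#else
	/* No syscall number available on this platform. */
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would fall back to its non-userfaultfd path when this returns -1,
since the syscall may be unavailable or forbidden by policy.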

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 10 +++---
 fs/userfaultfd.c| 16 +---
 include/uapi/linux/userfaultfd.h|  9 +
 kernel/sysctl.c |  2 +-
 4 files changed, 30 insertions(+), 7 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v3 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-08-25 Thread Lokesh Gidra
A third option is added to the 'unprivileged_userfaultfd' sysctl knob.
When the knob is set to 2, unprivileged users may call userfaultfd, as
when it is set to 1, but only page faults from user mode can be
handled. In this mode, an unprivileged user (without the CAP_SYS_PTRACE
capability) must pass UFFD_USER_MODE_ONLY to userfaultfd or the call
will fail with EPERM.

This facility allows administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.
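
The resulting decision can be modelled in userspace as below. This is only
a sketch of the check added to userfaultfd(2) in this patch: the function
name and its parameters are illustrative stand-ins for
sysctl_unprivileged_userfaultfd and capable(CAP_SYS_PTRACE).

```c
#include <errno.h>
#include <stdbool.h>

/* Value proposed in patch 1/2 of this series. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Model of the permission check at the top of userfaultfd(2).
 * knob:           stand-in for sysctl_unprivileged_userfaultfd (0, 1 or 2)
 * flags:          flags passed to userfaultfd(2)
 * has_ptrace_cap: stand-in for capable(CAP_SYS_PTRACE)
 * Returns 0 if creation may proceed, -EPERM otherwise.
 */
static int uffd_create_permitted(int knob, int flags, bool has_ptrace_cap)
{
	switch (knob) {
	case 2:
		/* User-mode-only descriptors need no privilege. */
		if (flags & UFFD_USER_MODE_ONLY)
			break;
		/* Without the flag, knob 2 behaves like knob 0. */
		/* fall through */
	case 0:
		if (!has_ptrace_cap)
			return -EPERM;
		break;
	default:
		/* knob == 1: no restriction. */
		break;
	}
	return 0;
}
```

Note that the case 2 branch deliberately falls through to case 0 when the
flag is absent, so a privileged caller is never restricted by the knob.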

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 10 +++---
 fs/userfaultfd.c| 10 --
 kernel/sysctl.c |  2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..23d6feb79f5c 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -872,9 +872,13 @@ unprivileged_userfaultfd
 
 
 This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+system calls.  Set this to 0 to restrict userfaultfd to only privileged
+users (with SYS_CAP_PTRACE capability), set this to 1 to allow unprivileged
+users to use the userfaultfd system calls, or set this to 2 to restrict
+unprivileged users to handle page faults in user mode only. In the last case,
+users without SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for
+userfaultfd to succeed.  Prohibiting use of userfaultfd for handling faults
+from kernel mode may make certain vulnerabilities more difficult to exploit.
 
 The default value is 1.
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..26cb87cf492d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1972,8 +1972,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
-   return -EPERM;
+   switch (sysctl_unprivileged_userfaultfd) {
+   case 2:
+   if (flags & UFFD_USER_MODE_ONLY)
+   break;
+   case 0:
+   if (!capable(CAP_SYS_PTRACE))
+   return -EPERM;
+   }
 
BUG_ON(!current->mm);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 287862f91717..7e94215dfff5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3119,7 +3119,7 @@ static struct ctl_table vm_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
-   .extra2 = SYSCTL_ONE,
+   .extra2 = &two,
},
 #endif
{ }
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v3 1/2] Add UFFD_USER_MODE_ONLY

2020-08-25 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.
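
As a rough userspace model of the new early exit in handle_userfault(),
assuming stand-in bit values (the MODEL_* constants below are not the
kernel's definitions, except that the user-mode-only bit matches the value
in this patch):

```c
#include <stdbool.h>

/*
 * Stand-in bit values for this model only. The real FAULT_FLAG_USER and
 * UFFD_FEATURE_SIGBUS live in kernel headers and may differ.
 */
#define MODEL_FAULT_FLAG_USER      0x1u
#define MODEL_UFFD_FEATURE_SIGBUS  0x2u
#define MODEL_UFFD_USER_MODE_ONLY  0x1u	/* same value as in this patch */

/*
 * Returns true when the fault is not handed to the userfaultfd monitor
 * and is instead failed as if SIGBUS were raised.
 */
static bool uffd_refuses_fault(unsigned int fault_flags,
			       unsigned int ctx_features,
			       unsigned int ctx_flags)
{
	/* Existing behaviour: SIGBUS mode never blocks the faulting thread. */
	if (ctx_features & MODEL_UFFD_FEATURE_SIGBUS)
		return true;
	/* New behaviour: kernel-mode faults on a user-mode-only context. */
	if (!(fault_flags & MODEL_FAULT_FLAG_USER) &&
	    (ctx_flags & MODEL_UFFD_USER_MODE_ONLY))
		return true;
	return false;
}
```

User-mode faults on such a context are still delivered to the monitor as
before; only faults taken while running kernel code are refused.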

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 6 +-
 include/uapi/linux/userfaultfd.h | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3191434057f3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY)
+   goto out;
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1978,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.297.g1956fa8f8d-goog



Re: [PATCH v7 1/3] Add a new LSM-supporting anonymous inode interface

2020-08-25 Thread Lokesh Gidra
On Mon, Aug 24, 2020 at 8:50 PM Eric Biggers  wrote:
>
> On Fri, Aug 21, 2020 at 11:56:43AM -0700, Lokesh Gidra wrote:
> > From: Daniel Colascione 
> >
> > This change adds a new function, anon_inode_getfd_secure, that creates
> > an anonymous-inode file with an individual non-S_PRIVATE inode to which
> > security modules can apply policy. Existing callers continue using the
> > original singleton-inode kind of anonymous-inode file. We can transition
> > anonymous inode users to the new kind of anonymous inode in individual
> > patches for the sake of bisection and review.
> >
> > The new function accepts an optional context_inode parameter that
> > callers can use to provide additional contextual information to
> > security modules for granting/denying permission to create an anon inode
> > of the same type.
> >
> > For example, in case of userfaultfd, the created inode is a
> > 'logical child' of the context_inode (userfaultfd inode of the
> > parent process) in the sense that it provides the security context
> > required during creation of the child process' userfaultfd inode.
> >
> > Signed-off-by: Daniel Colascione 
> >
> > [Fix comment documenting return values of inode_init_security_anon()]
> > [Add context_inode description in comments to anon_inode_getfd_secure()]
> > [Remove definition of anon_inode_getfile_secure() as there are no callers]
> > [Make _anon_inode_getfile() static]
> > [Use correct error cast in _anon_inode_getfile()]
> >
> > Signed-off-by: Lokesh Gidra 
> > ---
> >  fs/anon_inodes.c  | 148 --
> >  include/linux/anon_inodes.h   |  13 +++
> >  include/linux/lsm_hook_defs.h |   2 +
> >  include/linux/lsm_hooks.h |   7 ++
> >  include/linux/security.h  |   3 +
> >  security/security.c   |   9 +++
> >  6 files changed, 141 insertions(+), 41 deletions(-)
> >
> > diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> > index 89714308c25b..2aa8b57be895 100644
> > --- a/fs/anon_inodes.c
> > +++ b/fs/anon_inodes.c
> > @@ -55,61 +55,78 @@ static struct file_system_type anon_inode_fs_type = {
> >   .kill_sb= kill_anon_super,
> >  };
> >
> > -/**
> > - * anon_inode_getfile - creates a new file instance by hooking it up to an
> > - *  anonymous inode, and a dentry that describe the 
> > "class"
> > - *  of the file
> > - *
> > - * @name:[in]name of the "class" of the new file
> > - * @fops:[in]file operations for the new file
> > - * @priv:[in]private data for the new file (will be file's 
> > private_data)
> > - * @flags:   [in]flags
> > - *
> > - * Creates a new file by hooking it on a single inode. This is useful for 
> > files
> > - * that do not need to have a full-fledged inode in order to operate 
> > correctly.
> > - * All the files created with anon_inode_getfile() will share a single 
> > inode,
> > - * hence saving memory and avoiding code duplication for the 
> > file/inode/dentry
> > - * setup.  Returns the newly created file* or an error pointer.
> > - */
> > -struct file *anon_inode_getfile(const char *name,
> > - const struct file_operations *fops,
> > - void *priv, int flags)
> > +static struct inode *anon_inode_make_secure_inode(
> > + const char *name,
> > + const struct inode *context_inode)
> > +{
> > + struct inode *inode;
> > + const struct qstr qname = QSTR_INIT(name, strlen(name));
> > + int error;
> > +
> > + inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
> > + if (IS_ERR(inode))
> > + return inode;
> > + inode->i_flags &= ~S_PRIVATE;
> > + error = security_inode_init_security_anon(
> > + inode, &qname, context_inode);
>
> Weird indentation here.  The call to security_inode_init_security_anon()
> fits on one line.
>
> > + if (error) {
> > + iput(inode);
> > + return ERR_PTR(error);
> > + }
> > + return inode;
> > +}
> > +
> > +static struct file *_anon_inode_getfile(const char *name,
> > + const struct file_operations *fops,
> > + void *priv, int flags,
> > + const struct inode *context_inode,
> > + bool secure)
> >  {
> &g

Re: [PATCH v2 1/2] Add UFFD_USER_MODE_ONLY

2020-08-25 Thread Lokesh Gidra
On Mon, Aug 24, 2020 at 5:32 AM Sebastian Andrzej Siewior
 wrote:
>
> On 2020-08-21 18:40:17 [-0700], Lokesh Gidra wrote:
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1966,6 +1969,7 @@ static void init_once_userfaultfd_ctx(void *mem)
> >
> >  SYSCALL_DEFINE1(userfaultfd, int, flags)
> >  {
> > + static const int uffd_flags = UFFD_USER_MODE_ONLY;
> >   struct userfaultfd_ctx *ctx;
> >   int fd;
> Why?

Not sure! I guess Daniel didn't want to repeat the long flag name twice.

Thanks for catching that. I'll send another version fixing this.
>
> Sebastian


[PATCH v2 0/2] Control over userfaultfd kernel-fault handling

2020-08-21 Thread Lokesh Gidra
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that their creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] to similar effect.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] 
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v1:

  - Added external references to the threats from allowing unprivileged
users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
faults from kernel-mode, and added an option for the same
in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 10 +++---
 fs/userfaultfd.c| 17 ++---
 include/uapi/linux/userfaultfd.h|  9 +
 kernel/sysctl.c |  2 +-
 4 files changed, 31 insertions(+), 7 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v2 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

2020-08-21 Thread Lokesh Gidra
A third option is added to the 'unprivileged_userfaultfd' sysctl knob.
When the knob is set to 2, unprivileged users may call userfaultfd, as
when it is set to 1, but only page faults from user mode can be
handled. In this mode, an unprivileged user (without the CAP_SYS_PTRACE
capability) must pass UFFD_USER_MODE_ONLY to userfaultfd or the call
will fail with EPERM.

This facility allows administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.
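
From the administrator's side, a tool could inspect the knob along these
lines. This is a sketch only: the enum and function names are invented for
the example, and the three values are the ones proposed in this patch. The
path a caller would normally pass is /proc/sys/vm/unprivileged_userfaultfd.

```c
#include <stdio.h>

/* Illustrative names for the three knob values proposed in this patch. */
enum uffd_policy {
	UFFD_POLICY_UNKNOWN = -1,
	UFFD_POLICY_PRIVILEGED_ONLY = 0,
	UFFD_POLICY_UNPRIVILEGED = 1,
	UFFD_POLICY_USER_MODE_ONLY = 2,
};

/* Parse the textual knob value; anything outside 0..2 is unknown. */
static enum uffd_policy uffd_policy_parse(const char *text)
{
	int value;

	if (sscanf(text, "%d", &value) != 1 || value < 0 || value > 2)
		return UFFD_POLICY_UNKNOWN;
	return (enum uffd_policy)value;
}

/* Read and parse the knob from the given path. */
static enum uffd_policy uffd_policy_read(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");
	enum uffd_policy policy = UFFD_POLICY_UNKNOWN;

	if (!f)
		return UFFD_POLICY_UNKNOWN;
	if (fgets(buf, sizeof(buf), f))
		policy = uffd_policy_parse(buf);
	fclose(f);
	return policy;
}
```

An unprivileged program seeing UFFD_POLICY_USER_MODE_ONLY would know it has
to pass UFFD_USER_MODE_ONLY for userfaultfd(2) to succeed.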

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 Documentation/admin-guide/sysctl/vm.rst | 10 +++---
 fs/userfaultfd.c| 10 --
 kernel/sysctl.c |  2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst 
b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..23d6feb79f5c 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -872,9 +872,13 @@ unprivileged_userfaultfd
 
 
 This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+system calls.  Set this to 0 to restrict userfaultfd to only privileged
+users (with SYS_CAP_PTRACE capability), set this to 1 to allow unprivileged
+users to use the userfaultfd system calls, or set this to 2 to restrict
+unprivileged users to handle page faults in user mode only. In the last case,
+users without SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for
+userfaultfd to succeed.  Prohibiting use of userfaultfd for handling faults
+from kernel mode may make certain vulnerabilities more difficult to exploit.
 
 The default value is 1.
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3e4ae6145112..2fcdeb28c960 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1973,8 +1973,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
struct userfaultfd_ctx *ctx;
int fd;
 
-   if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
-   return -EPERM;
+   switch (sysctl_unprivileged_userfaultfd) {
+   case 2:
+   if (flags & UFFD_USER_MODE_ONLY)
+   break;
+   case 0:
+   if (!capable(CAP_SYS_PTRACE))
+   return -EPERM;
+   }
 
BUG_ON(!current->mm);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 287862f91717..7e94215dfff5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3119,7 +3119,7 @@ static struct ctl_table vm_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
-   .extra2 = SYSCTL_ONE,
+   .extra2 = &two,
},
 #endif
{ }
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v2 1/2] Add UFFD_USER_MODE_ONLY

2020-08-21 Thread Lokesh Gidra
userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione 
Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 7 ++-
 include/uapi/linux/userfaultfd.h | 9 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3e4ae6145112 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
+   if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+   ctx->flags & UFFD_USER_MODE_ONLY)
+   goto out;
 
/*
 * If it's already released don't get it. This avoids to loop
@@ -1966,6 +1969,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
 {
+   static const int uffd_flags = UFFD_USER_MODE_ONLY;
struct userfaultfd_ctx *ctx;
int fd;
 
@@ -1975,10 +1979,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
BUG_ON(!current->mm);
 
/* Check the UFFD_* constants for consistency.  */
+   BUILD_BUG_ON(uffd_flags & UFFD_SHARED_FCNTL_FLAGS);
BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-   if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+   if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags))
return -EINVAL;
 
ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.297.g1956fa8f8d-goog



Re: [PATCH v7 1/3] Add a new LSM-supporting anonymous inode interface

2020-08-21 Thread Lokesh Gidra
On Fri, Aug 21, 2020 at 11:57 AM Lokesh Gidra  wrote:
>
> From: Daniel Colascione 
>
> This change adds a new function, anon_inode_getfd_secure, that creates
> an anonymous-inode file with an individual non-S_PRIVATE inode to which
> security modules can apply policy. Existing callers continue using the
> original singleton-inode kind of anonymous-inode file. We can transition
> anonymous inode users to the new kind of anonymous inode in individual
> patches for the sake of bisection and review.
>
> The new function accepts an optional context_inode parameter that
> callers can use to provide additional contextual information to
> security modules for granting/denying permission to create an anon inode
> of the same type.
>
> For example, in case of userfaultfd, the created inode is a
> 'logical child' of the context_inode (userfaultfd inode of the
> parent process) in the sense that it provides the security context
> required during creation of the child process' userfaultfd inode.
>
> Signed-off-by: Daniel Colascione 
>
> [Fix comment documenting return values of inode_init_security_anon()]
> [Add context_inode description in comments to anon_inode_getfd_secure()]
> [Remove definition of anon_inode_getfile_secure() as there are no callers]
> [Make _anon_inode_getfile() static]
> [Use correct error cast in _anon_inode_getfile()]
>
> Signed-off-by: Lokesh Gidra 
> ---
>  fs/anon_inodes.c  | 148 --
>  include/linux/anon_inodes.h   |  13 +++
>  include/linux/lsm_hook_defs.h |   2 +
>  include/linux/lsm_hooks.h |   7 ++
>  include/linux/security.h  |   3 +
>  security/security.c   |   9 +++
>  6 files changed, 141 insertions(+), 41 deletions(-)
>
> diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> index 89714308c25b..2aa8b57be895 100644
> --- a/fs/anon_inodes.c
> +++ b/fs/anon_inodes.c
> @@ -55,61 +55,78 @@ static struct file_system_type anon_inode_fs_type = {
> .kill_sb= kill_anon_super,
>  };
>
> -/**
> - * anon_inode_getfile - creates a new file instance by hooking it up to an
> - *  anonymous inode, and a dentry that describe the 
> "class"
> - *  of the file
> - *
> - * @name:[in]name of the "class" of the new file
> - * @fops:[in]file operations for the new file
> - * @priv:[in]private data for the new file (will be file's 
> private_data)
> - * @flags:   [in]flags
> - *
> - * Creates a new file by hooking it on a single inode. This is useful for 
> files
> - * that do not need to have a full-fledged inode in order to operate 
> correctly.
> - * All the files created with anon_inode_getfile() will share a single inode,
> - * hence saving memory and avoiding code duplication for the 
> file/inode/dentry
> - * setup.  Returns the newly created file* or an error pointer.
> - */
> -struct file *anon_inode_getfile(const char *name,
> -   const struct file_operations *fops,
> -   void *priv, int flags)
> +static struct inode *anon_inode_make_secure_inode(
> +   const char *name,
> +   const struct inode *context_inode)
> +{
> +   struct inode *inode;
> +   const struct qstr qname = QSTR_INIT(name, strlen(name));
> +   int error;
> +
> +   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
> +   if (IS_ERR(inode))
> +   return inode;
> +   inode->i_flags &= ~S_PRIVATE;
> +   error = security_inode_init_security_anon(
> +   inode, &qname, context_inode);
> +   if (error) {
> +   iput(inode);
> +   return ERR_PTR(error);
> +   }
> +   return inode;
> +}
> +
> +static struct file *_anon_inode_getfile(const char *name,
> +   const struct file_operations *fops,
> +   void *priv, int flags,
> +   const struct inode *context_inode,
> +   bool secure)
>  {
> +   struct inode *inode;
> struct file *file;
>
> -   if (IS_ERR(anon_inode_inode))
> -   return ERR_PTR(-ENODEV);
> +   if (secure) {
> +   inode = anon_inode_make_secure_inode(
> +   name, context_inode);
> +   if (IS_ERR(inode))
> +   return ERR_CAST(inode);
> +   } else {
> +   inode = anon_inode_inode;
> +   if (IS_ERR(inode))
> +   return ERR_PTR(-ENODEV);
> +   /*
> +* We know the anon_inode inode count i

[PATCH v7 2/3] Teach SELinux about anonymous inodes

2020-08-21 Thread Lokesh Gidra
From: Daniel Colascione 

This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new anon_inode_getfd_secure()
function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)
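
To make the name-based lookup concrete, here is a toy userspace model of
how such a rule selects a type. It is not SELinux code: the rule table,
struct and function names are invented, and falling back to the creating
domain's type when no rule matches is an assumption of this model.

```c
#include <stddef.h>
#include <string.h>

/* One name-based type_transition rule, as in the policy example above. */
struct type_transition_rule {
	const char *source;	/* domain creating the inode */
	const char *target;	/* target type (same domain here) */
	const char *tclass;	/* object class */
	const char *name;	/* anonymous inode name */
	const char *new_type;	/* type assigned on a match */
};

static const struct type_transition_rule model_rules[] = {
	{ "sysadm_t", "sysadm_t", "anon_inode", "[userfaultfd]", "uffd_t" },
};

/*
 * Resolve the type for an anonymous inode created by "domain" with the
 * given name. Model assumption: with no matching rule the inode keeps
 * the creating domain's type.
 */
static const char *model_anon_inode_type(const char *domain, const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_rules) / sizeof(model_rules[0]); i++) {
		const struct type_transition_rule *r = &model_rules[i];

		if (!strcmp(r->source, domain) &&
		    !strcmp(r->target, domain) &&
		    !strcmp(r->tclass, "anon_inode") &&
		    !strcmp(r->name, name))
			return r->new_type;
	}
	return domain;
}
```

With the single rule above, only "[userfaultfd]" inodes created in sysadm_t
receive uffd_t, which is the type the allow rule then refers to.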

Signed-off-by: Daniel Colascione 
Acked-by: Casey Schaufler 
Acked-by: Stephen Smalley 
Cc: Al Viro 
Cc: Andrew Morton 
---
 security/selinux/hooks.c| 53 +
 security/selinux/include/classmap.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ca901025802a..5b403ad44aad 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode 
*inode, struct inode *dir,
return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+   const struct qstr *name,
+   const struct inode *context_inode)
+{
+   const struct task_security_struct *tsec = selinux_cred(current_cred());
+   struct common_audit_data ad;
+   struct inode_security_struct *isec;
+   int rc;
+
+   if (unlikely(!selinux_state.initialized))
+   return 0;
+
+   isec = selinux_inode(inode);
+
+   /*
+* We only get here once per ephemeral inode.  The inode has
+* been initialized via inode_alloc_security but is otherwise
+* untouched.
+*/
+
+   if (context_inode) {
+   struct inode_security_struct *context_isec =
+   selinux_inode(context_inode);
+   isec->sclass = context_isec->sclass;
+   isec->sid = context_isec->sid;
+   } else {
+   isec->sclass = SECCLASS_ANON_INODE;
+   rc = security_transition_sid(
+   &selinux_state, tsec->sid, tsec->sid,
+   isec->sclass, name, &isec->sid);
+   if (rc)
+   return rc;
+   }
+
+   isec->initialized = LABEL_INITIALIZED;
+
+   /*
+* Now that we've initialized security, check whether we're
+* allowed to actually create this type of anonymous inode.
+*/
+
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+
+   return avc_has_perm(&selinux_state,
+   tsec->sid,
+   isec->sid,
+   isec->sclass,
+   FILE__CREATE,
+   &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, 
umode_t mode)
 {
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6993,6 +7045,7 @@ static struct security_hook_list selinux_hooks[] 
__lsm_ro_after_init = {
 
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+   LSM_HOOK_INIT(inode_init_security_anon, 
selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
  {"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
  { "integrity", "confidentiality", NULL } },
+   { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
   };
 
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v7 3/3] Wire UFFD up to SELinux

2020-08-21 Thread Lokesh Gidra
From: Daniel Colascione 

This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.

Signed-off-by: Daniel Colascione 

[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
 in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]

Signed-off-by: Lokesh Gidra 
---
 fs/userfaultfd.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..46ea552fe7c4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,16 @@ static __poll_t userfaultfd_poll(struct file *file, 
poll_table *wait)
 
 static const struct file_operations userfaultfd_fops;
 
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
  struct uffd_msg *msg)
 {
int fd;
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure(
+   "[userfaultfd]", &userfaultfd_fops, new,
+   O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS),
+   inode);
if (fd < 0)
return fd;
 
@@ -995,7 +997,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx 
*ctx,
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-   struct uffd_msg *msg)
+   struct uffd_msg *msg, struct inode *inode)
 {
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1108,7 @@ static ssize_t userfaultfd_ctx_read(struct 
userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
 
if (!ret && msg->event == UFFD_EVENT_FORK) {
-   ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+   ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1168,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+   struct inode *inode = file_inode(file);
 
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1176,7 @@ static ssize_t userfaultfd_read(struct file *file, char 
__user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
-   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+   _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1998,10 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
 
-   fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+   fd = anon_inode_getfd_secure("[userfaultfd]",
+   &userfaultfd_fops, ctx,
+   O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS),
+   NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v7 0/3] SELinux support for anonymous inodes and UFFD

2020-08-21 Thread Lokesh Gidra
Userfaultfd in unprivileged contexts could potentially be very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and, in the future, other kinds of
anonymous-inode-based file descriptors.  SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.

With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.
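
To make the difference between the two modes concrete, here is a small
user-space model in C. It is only a sketch under stated assumptions: every
model_* name is invented for illustration and is not a kernel API, and the
hook signature is a simplification, not the one this series adds. It shows
the shared singleton path, the per-file "secure" path, and a hook that can
veto creation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for an inode; not a kernel type. */
struct model_inode {
	bool is_private;	/* models S_PRIVATE: invisible to security modules */
	int refcount;
};

/* Hypothetical hook type: return 0 to allow, a negative errno to veto. */
typedef int (*model_anon_hook)(struct model_inode *inode, const char *name);

/* The one dummy inode that all classic anonymous files share. */
static struct model_inode singleton = { .is_private = true, .refcount = 1 };

/* Classic mode: take another reference on the singleton; no hook runs. */
struct model_inode *model_get_shared(void)
{
	singleton.refcount++;
	return &singleton;
}

/*
 * "Secure" mode: allocate a fresh inode, clear the private flag so a
 * security module can see it, then let the hook accept or veto it.
 * Returns NULL and stores a negative errno in *err on failure.
 */
struct model_inode *model_get_secure(const char *name, model_anon_hook hook,
				     int *err)
{
	struct model_inode *inode;

	assert(name != NULL && err != NULL);

	inode = calloc(1, sizeof(*inode));
	if (!inode) {
		*err = -ENOMEM;
		return NULL;
	}
	inode->is_private = false;
	inode->refcount = 1;

	*err = hook ? hook(inode, name) : 0;
	if (*err) {
		free(inode);	/* vetoed: drop the ephemeral inode again */
		return NULL;
	}
	return inode;
}

/* Example policy (an assumption): only "[userfaultfd]" may be created. */
int model_hook_uffd_only(struct model_inode *inode, const char *name)
{
	(void)inode;
	return strcmp(name, "[userfaultfd]") == 0 ? 0 : -EACCES;
}
```

In this model, two secure calls for "[userfaultfd]" yield distinct,
non-private inodes, whereas every shared call returns the same singleton
and a secure call for any other name is refused with -EACCES.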

This patch series is one of two forks of [1] and is an
alternative to [2].

The primary difference between the two patch series is that this
patch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.

I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.

The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].

This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.

Changes from the first version of the patch:

  - Removed some error checks
  - Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
  - Inherit sclass as well as descriptor from context inode

Changes from the second version of the patch:

  - Fixed example policy in the commit message to reflect the use of
the new anon_inode class.

Changes from the third version of the patch:

  - Dropped the fops parameter to the LSM hook
  - Documented hook parameters
  - Fixed incorrect class used for SELinux transition
  - Removed stray UFFD changes early in the series
  - Removed a redundant ERR_PTR(PTR_ERR())

Changes from the fourth version of the patch:

  - Removed an unused parameter from an internal function
  - Fixed function documentation

Changes from the fifth version of the patch:

  - Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
  - Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.

Changes from the sixth version of the patch:

  - Removed definition of anon_inode_getfile_secure() as there are no
callers.
  - Simplified function description of anon_inode_getfd_secure().
  - Elaborated more on the purpose of 'context_inode' in commit message.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dan...@google.com/
[2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-...@tycho.nsa.gov/
[3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba...@tycho.nsa.gov/

Daniel Colascione (3):
  Add a new LSM-supporting anonymous inode interface
  Teach SELinux about anonymous inodes
  Wire UFFD up to SELinux

 fs/anon_inodes.c| 148 
 fs/userfaultfd.c|  23 +++--
 include/linux/anon_inodes.h |  13 +++
 include/linux/lsm_hook_defs.h   |   2 +
 include/linux/lsm_hooks.h   |   7 ++
 include/linux/security.h|   3 +
 security/security.c |   9 ++
 security/selinux/hooks.c|  53 ++
 security/selinux/include/classmap.h |   2 +
 9 files changed, 210 insertions(+), 50 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



[PATCH v7 1/3] Add a new LSM-supporting anonymous inode interface

2020-08-21 Thread Lokesh Gidra
From: Daniel Colascione 

This change adds a new function, anon_inode_getfd_secure, that creates
an anonymous-inode file with an individual non-S_PRIVATE inode to which
security modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.

The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.

For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.
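
As a rough illustration of how a security module might use context_inode,
the following user-space C model picks a label for a new anonymous inode.
It is a sketch, not the SELinux implementation: the model_* names, the SID
numbers, the one-entry rule table, and the fallback to the task's SID when
no rule matches are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object classes; real SELinux classes live in classmap.h. */
enum model_class { MODEL_CLASS_NONE = 0, MODEL_CLASS_ANON_INODE, MODEL_CLASS_FILE };

/* Hypothetical label: an object class plus a numeric security ID. */
struct model_label {
	enum model_class sclass;
	unsigned int sid;
};

/* Toy name-based transition rule: (task SID, inode name) -> new SID. */
struct model_rule {
	unsigned int task_sid;
	const char *name;
	unsigned int new_sid;
};

/* Assumed policy: tasks with SID 100 get SID 200 for "[userfaultfd]". */
static const struct model_rule model_rules[] = {
	{ 100, "[userfaultfd]", 200 },
};

/*
 * Choose the label for a new anonymous inode.
 * With a context label, copy both its class and its SID.
 * Without one, look up a name-based transition for the creating task
 * and fall back to the task's own SID if no rule matches (assumption).
 */
struct model_label model_label_anon_inode(unsigned int task_sid,
					  const char *name,
					  const struct model_label *context)
{
	struct model_label out;
	size_t i;

	assert(name != NULL);

	if (context)
		return *context;

	out.sclass = MODEL_CLASS_ANON_INODE;
	out.sid = task_sid;
	for (i = 0; i < sizeof(model_rules) / sizeof(model_rules[0]); i++) {
		if (model_rules[i].task_sid == task_sid &&
		    strcmp(model_rules[i].name, name) == 0) {
			out.sid = model_rules[i].new_sid;
			break;
		}
	}
	return out;
}
```

The context label is only read while the new label is computed; the model
keeps no pointer to it afterwards, so it implies no lifetime relationship
between the two objects.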

Signed-off-by: Daniel Colascione 

[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make _anon_inode_getfile() static]
[Use correct error cast in _anon_inode_getfile()]

Signed-off-by: Lokesh Gidra 
---
 fs/anon_inodes.c  | 148 --
 include/linux/anon_inodes.h   |  13 +++
 include/linux/lsm_hook_defs.h |   2 +
 include/linux/lsm_hooks.h |   7 ++
 include/linux/security.h  |   3 +
 security/security.c   |   9 +++
 6 files changed, 141 insertions(+), 41 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..2aa8b57be895 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,78 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb= kill_anon_super,
 };
 
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *  anonymous inode, and a dentry that describe the "class"
- *  of the file
- *
- * @name:[in]name of the "class" of the new file
- * @fops:[in]file operations for the new file
- * @priv:[in]private data for the new file (will be file's private_data)
- * @flags:   [in]flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
-   const struct file_operations *fops,
-   void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+   const char *name,
+   const struct inode *context_inode)
+{
+   struct inode *inode;
+   const struct qstr qname = QSTR_INIT(name, strlen(name));
+   int error;
+
+   inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+   if (IS_ERR(inode))
+   return inode;
+   inode->i_flags &= ~S_PRIVATE;
+   error = security_inode_init_security_anon(
+   inode, &qname, context_inode);
+   if (error) {
+   iput(inode);
+   return ERR_PTR(error);
+   }
+   return inode;
+}
+
+static struct file *_anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags,
+   const struct inode *context_inode,
+   bool secure)
 {
+   struct inode *inode;
struct file *file;
 
-   if (IS_ERR(anon_inode_inode))
-   return ERR_PTR(-ENODEV);
+   if (secure) {
+   inode = anon_inode_make_secure_inode(
+   name, context_inode);
+   if (IS_ERR(inode))
+   return ERR_CAST(inode);
+   } else {
+   inode = anon_inode_inode;
+   if (IS_ERR(inode))
+   return ERR_PTR(-ENODEV);
+   /*
+* We know the anon_inode inode count is always
+* greater than zero, so ihold() is safe.
+*/
+   ihold(inode);
+   }
 
-   if (fops->owner && !try_module_get(fops->owner))
-   return ERR_PTR(-ENOENT);
+   if (fops->owner && !try_module_get(fops->owner)) {
+   file = ERR_PTR(-ENOENT);
+   goto err;
+   }
 
-   /*
-* We know the anon_inode inode count is always greater than zero,
-* so ihold() is safe.
-*/
-   ihold(anon_inode_inode);
-   file = alloc_file_pseudo(anon_inode_inode, an

Re: [PATCH v6 0/3] SELinux support for anonymous inodes and UFFD

2020-08-20 Thread Lokesh Gidra
On Thu, Aug 20, 2020 at 11:36 AM James Morris  wrote:
>
> On Fri, 7 Aug 2020, Lokesh Gidra wrote:
>
> > Userfaultfd in unprivileged contexts could be potentially very
> > useful. We'd like to harden userfaultfd to make such unprivileged use
> > less risky. This patch series allows SELinux to manage userfaultfd
> > file descriptors and in the future, other kinds of
> > anonymous-inode-based file descriptor.  SELinux policy authors can
> > apply policy types to anonymous inodes by providing name-based
> > transition rules keyed off the anonymous inode internal name (
> > "[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
> > applying policy to the new SIDs thus produced.
>
> Can you expand more on why this would be useful, e.g. use-cases?
>
With SELinux-managed userfaultfd file descriptors, an administrator
can control their creation and movement. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.

I'll add this use case in the cover letter too in the next version.


Re: [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only

2020-08-17 Thread Lokesh Gidra
On Wed, Aug 5, 2020 at 10:44 PM Michael S. Tsirkin  wrote:
>
> On Wed, Aug 05, 2020 at 05:43:02PM -0700, Nick Kralevich wrote:
> > On Fri, Jul 24, 2020 at 6:40 AM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Jul 23, 2020 at 05:13:28PM -0700, Nick Kralevich wrote:
> > > > > On Thu, Jul 23, 2020 at 10:30 AM Lokesh Gidra wrote:
> > > > > From the discussion so far it seems that there is a consensus that
> > > > > patch 1/2 in this series should be upstreamed in any case. Is there
> > > > > anything that is pending on that patch?
> > > >
> > > > That's my reading of this thread too.
> > > >
> > > > > > > > Unless I'm mistaken that you can already enforce bit 1 of the second
> > > > > > > > parameter of the userfaultfd syscall to be set with seccomp-bpf, this
> > > > > > > > would be more a question to the Android userland team.
> > > > > > > >
> > > > > > > > The question would be: does it ever happen that a seccomp filter isn't
> > > > > > > > already applied to unprivileged software running without
> > > > > > > > SYS_CAP_PTRACE capability?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > Android uses selinux as our primary sandboxing mechanism. We do use
> > > > > > seccomp on a few processes, but we have found that it has a
> > > > > > > surprisingly high performance cost [1] on arm64 devices so turning it
> > > > > > > on system wide is not a good option.
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/linux-security-module/20200606.3F7109A@keescook/T/#m82ace19539ac595682affabdf652c0ffa5d27dad
> > > >
> > > > As Jeff mentioned, seccomp is used strategically on Android, but is
> > > > not applied to all processes. It's too expensive and impractical when
> > > > simpler implementations (such as this sysctl) can exist. It's also
> > > > significantly simpler to test a sysctl value for correctness as
> > > > opposed to a seccomp filter.
> > >
> > > Given that selinux is already used system-wide on Android, what is wrong
> > > with using selinux to control userfaultfd as opposed to seccomp?
> >
> > Userfaultfd file descriptors will be generally controlled by SELinux.
> > You can see the patchset at
> > https://lore.kernel.org/lkml/20200401213903.182112-3-dan...@google.com/
> > (which is also referenced in the original commit message for this
> > patchset). However, the SELinux patchset doesn't include the ability
> > to control FAULT_FLAG_USER / UFFD_USER_MODE_ONLY directly.
> >
> > SELinux already has the ability to control who gets CAP_SYS_PTRACE,
> > which combined with this patch, is largely equivalent to direct
> > UFFD_USER_MODE_ONLY checks. Additionally, with the SELinux patch
> > above, movement of userfaultfd file descriptors can be mediated by
> > SELinux, preventing one process from acquiring userfaultfd descriptors
> > of other processes unless allowed by security policy.
> >
> > It's an interesting question whether finer-grain SELinux support for
> > controlling UFFD_USER_MODE_ONLY should be added. I can see some
> > advantages to implementing this. However, we don't need to decide that
> > now.
> >
> > Kernel security checks generally break down into DAC (discretionary
> > access control) and MAC (mandatory access control) controls. Most
> > kernel security features check via both of these mechanisms. Security
> > attributes of the system should be settable without necessarily
> > relying on an LSM such as SELinux. This patch follows the same basic
> > model -- system wide control of a hardening feature is provided by the
> > unprivileged_userfaultfd_user_mode_only sysctl (DAC), and if needed,
> > SELinux support for this can also be implemented on top of the DAC
> > controls.
> >
> > This DAC/MAC split has been successful in several other security
> > features. For example, the ability to map at page zero is controlled
> > in DAC via the mmap_min_addr sysctl [1], and via SELinux via the
> > mmap_zero access vector [2]. Similarly, access to the kernel ring
> > buffer is controlled both via DAC as the dmesg_restrict sysctl [3], as
> > well as the SELinux syslog_read [2] check. Inde

Re: [PATCH v6 1/3] Add a new LSM-supporting anonymous inode interface

2020-08-17 Thread Lokesh Gidra
On Fri, Aug 7, 2020 at 4:02 PM Al Viro  wrote:
>
> On Fri, Aug 07, 2020 at 03:49:39PM -0700, Lokesh Gidra wrote:
>
> > The new functions accept an optional context_inode parameter that
> > callers can use to provide additional contextual information to
> > security modules, e.g., indicating that one anonymous struct file is a
> > logical child of another, allowing a security model to propagate
> > security information from one to the other.
>
> What the hell is "logical child" and what are the lifetime rules implied
> by that relationship?

context_inode provides the security context required by the security
modules for granting/denying permission to create an anon inode of the
same type.

In case of userfaultfd, the relationship between the context_inode and
the created inode is described as that of ‘logical child’ because the
context_inode (userfaultfd inode of the parent process) provides the
security context required for creation of child process’ userfaultfd
inode. But there is no relationship beyond this point. Therefore, no
reference to context_inode is held anywhere.

