Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Dave Hansen
On 10/22/2015 03:25 PM, Jerome Glisse wrote:
> On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote:
...
>> Can you give an example of where a process might be doing a gup and it
>> is completely separate from the CPU context that it's being executed under?
> 
> In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also
> believe that in infiniband one can have GUP call from workqueue that can
> run at any time. In GPU driver we also use GUP thought at this point we
> do not allow another process from accessing a buffer that is populated
> by GUP from another process.

>From quick grepping, there are only a couple of callers that do
get_user_pages() on something that isn't current->mm.

We can fairly easily introduce something new, like

get_foreign_user_pages()

That sets a flag to tell us to ignore the current PKRU state.

I've attached a patch that at creates a variant of get_user_pages() for
when you're going after another process's mm.  This even makes a few of
the gup call sites look nicer because they're not passing 'current,
current->mm'.

>>> So as first i would just allow GUP to always work and then come up with
>>> syscall to allow to set pkey on device file. This obviously is a lot more
>>> work as you need to go over all device driver using GUP.
>>
>> I wouldn't be opposed to adding some context to the thread (like
>> pagefault_disable()) that indicates whether we should enforce protection
>> keys.  If we are in some asynchronous context, disassociated from the
>> running CPU's protection keys, we could set a flag.
> 
> I was simply thinking of having a global set of pkeys against the process
> mm struct which would be the default global setting for all device GUP
> access. This global set could be override by userspace on a per device
> basis allowing some device to have more access than others.

For now, I think leaving it permissive by default is probably OK.  A
device's access to memory is permissive after a gup anyway.

As you note, doing this is going to require another whole set of user
interfaces, so I'd rather revisit it later once we have a more concrete
need for it.

1. Store a common PKRU value somewhere and activate when servicing work
   outside of the context of the actual process.  Set this PKRU value
   with input from userspace and new user APIs.
2. When work is queued, copy the PKRU value and use it while servicing
   the work.
3. Do all out-of-context work with PKRU=0, or by disabling the PKRU
   checks conditionally.

>> I'd really appreciate if you could point to some concrete examples here
>> which could actually cause a problem, like workqueues doing gups.
> 
> Well i could grep for all current user of GUP, but i can tell you that this
> is gonna be the model for GPU thread ie a kernel workqueue gonna handle
> page fault on behalf of GPU and will perform equivalent of GUP. Also apply
> for infiniband ODP thing which is upstream.




---

 b/arch/x86/mm/mpx.c   |4 +-
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |4 +-
 b/drivers/gpu/drm/i915/i915_gem_userptr.c |2 -
 b/drivers/gpu/drm/radeon/radeon_ttm.c |4 +-
 b/drivers/gpu/drm/via/via_dmablit.c   |3 --
 b/drivers/infiniband/core/umem.c  |2 -
 b/drivers/infiniband/core/umem_odp.c  |8 ++---
 b/drivers/infiniband/hw/mthca/mthca_memfree.c |3 --
 b/drivers/infiniband/hw/qib/qib_user_pages.c  |3 --
 b/drivers/infiniband/hw/usnic/usnic_uiom.c|2 -
 b/drivers/media/pci/ivtv/ivtv-yuv.c   |8 ++---
 b/drivers/media/v4l2-core/videobuf-dma-sg.c   |3 --
 b/drivers/virt/fsl_hypervisor.c   |5 +--
 b/fs/exec.c   |8 -
 b/include/linux/mm.h  |   14 +
 b/mm/frame_vector.c   |2 -
 b/mm/gup.c|   39 --
 b/mm/mempolicy.c  |6 ++--
 b/security/tomoyo/domain.c|9 +-
 19 files changed, 79 insertions(+), 50 deletions(-)

diff -puN mm/gup.c~get_current_user_pages mm/gup.c
--- a/mm/gup.c~get_current_user_pages	2015-10-22 16:03:24.957026355 -0700
+++ b/mm/gup.c	2015-10-22 16:46:58.181109179 -0700
@@ -752,11 +752,12 @@ EXPORT_SYMBOL(get_user_pages_locked);
  * according to the parameters "pages", "write", "force"
  * respectively.
  */
-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-	   unsigned long start, unsigned long nr_pages,
+__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 	   int write, int force, struct page **pages,
 	   unsigned int gup_flags)
 {
+	struct task_struct *tsk = curent;
+	struct mm_struct *mm = tsk->mm
 	long ret;
 	int locked = 1;
 	down_read(>mmap_sem);
@@ -795,7 +796,7 @@ long get_user_pages_unlocked(struct task
 

Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Jerome Glisse
On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote:
> On 10/22/2015 01:57 PM, Jerome Glisse wrote:
> > I have not read all the patches, but here i assume that for GUP you do
> > not first call arch_vma_access_permitted(). So issue i see is that GUP
> > for a process might happen inside another process and that process might
> > have different pkru protection keys, effectively randomly allowing or
> > forbidding a device driver to perform a GUP from say some workqueue that
> > just happen to be schedule against a different processor/thread than the
> > one against which it is doing the GUP for.
> 
> There are some places where there is no real context from which we can
> determine access rights.  ptrace is a good example.  We don't enforce
> PKEYs when walking _another_ process's page tables.
> 
> Can you give an example of where a process might be doing a gup and it
> is completely separate from the CPU context that it's being executed under?

In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also
believe that in infiniband one can have GUP call from workqueue that can
run at any time. In GPU driver we also use GUP thought at this point we
do not allow another process from accessing a buffer that is populated
by GUP from another process.

I am also here mainly talking about what future GPU will do where you will
have the CPU service page fault from GPU inside a workqueue that can run
at any point in time.

> 
> > Second and more fundamental thing i have issue with is that this whole
> > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
> > believe that device driver should be forbidden to do GUP base on pkru
> > keys.
> 
> I don't think of it as something necessarily central to the CPU, but
> something central to things that walk page tables.  We mark page tables
> with PKEYs and things that walk them will have certain rights.

My point is that we are seing devices that want to walk the page table and
they do it from a work queue inside the kernel which can run against another
process than the one they are doing the walk from.

I am sure there is already upstream device driver that does so, i have not
check all of them to confirm thought.


> > Tying this to the pkru reg value of whatever processor happens to be
> > running some device driver kernel function that try to do a GUP seems
> > broken to me.
> 
> That's one way to look at it.  Another way is that PKRU is specifying
> some real _intent_ about whether we want access to be allowed to some
> memory.

I think i misexpress myself here, yes PKRU is about specifying intent but
specifying it for CPU thread not for device thread. GPU for instance have
threads that run on behalf of a given process and i would rather see some
kind of coherent way to specify that for each devices like you allow it
to specify it on per CPU thread basis.


> > So as first i would just allow GUP to always work and then come up with
> > syscall to allow to set pkey on device file. This obviously is a lot more
> > work as you need to go over all device driver using GUP.
> 
> I wouldn't be opposed to adding some context to the thread (like
> pagefault_disable()) that indicates whether we should enforce protection
> keys.  If we are in some asynchronous context, disassociated from the
> running CPU's protection keys, we could set a flag.

I was simply thinking of having a global set of pkeys against the process
mm struct which would be the default global setting for all device GUP
access. This global set could be override by userspace on a per device
basis allowing some device to have more access than others.


> I'd really appreciate if you could point to some concrete examples here
> which could actually cause a problem, like workqueues doing gups.

Well i could grep for all current user of GUP, but i can tell you that this
is gonna be the model for GPU thread ie a kernel workqueue gonna handle
page fault on behalf of GPU and will perform equivalent of GUP. Also apply
for infiniband ODP thing which is upstream.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Dave Hansen
On 10/22/2015 01:57 PM, Jerome Glisse wrote:
> I have not read all the patches, but here i assume that for GUP you do
> not first call arch_vma_access_permitted(). So issue i see is that GUP
> for a process might happen inside another process and that process might
> have different pkru protection keys, effectively randomly allowing or
> forbidding a device driver to perform a GUP from say some workqueue that
> just happen to be schedule against a different processor/thread than the
> one against which it is doing the GUP for.

There are some places where there is no real context from which we can
determine access rights.  ptrace is a good example.  We don't enforce
PKEYs when walking _another_ process's page tables.

Can you give an example of where a process might be doing a gup and it
is completely separate from the CPU context that it's being executed under?

> Second and more fundamental thing i have issue with is that this whole
> pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
> believe that device driver should be forbidden to do GUP base on pkru
> keys.

I don't think of it as something necessarily central to the CPU, but
something central to things that walk page tables.  We mark page tables
with PKEYs and things that walk them will have certain rights.

> Tying this to the pkru reg value of whatever processor happens to be
> running some device driver kernel function that try to do a GUP seems
> broken to me.

That's one way to look at it.  Another way is that PKRU is specifying
some real _intent_ about whether we want access to be allowed to some
memory.

> So as first i would just allow GUP to always work and then come up with
> syscall to allow to set pkey on device file. This obviously is a lot more
> work as you need to go over all device driver using GUP.

I wouldn't be opposed to adding some context to the thread (like
pagefault_disable()) that indicates whether we should enforce protection
keys.  If we are in some asynchronous context, disassociated from the
running CPU's protection keys, we could set a flag.

I'd really appreciate if you could point to some concrete examples here
which could actually cause a problem, like workqueues doing gups.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Jerome Glisse
On Mon, Sep 28, 2015 at 12:18:23PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen 
> 
> Today, for normal faults and page table walks, we check the VMA
> and/or PTE to ensure that it is compatible with the action.  For
> instance, if we get a write fault on a non-writeable VMA, we
> SIGSEGV.
> 
> We try to do the same thing for protection keys.  Basically, we
> try to make sure that if a user does this:
> 
>   mprotect(ptr, size, PROT_NONE);
>   *ptr = foo;
> 
> they see the same effects with protection keys when they do this:
> 
>   mprotect(ptr, size, PROT_READ|PROT_WRITE);
>   set_pkey(ptr, size, 4);
>   wrpkru(0xff3f); // access disable pkey 4
>   *ptr = foo;
> 
> The state to do that checking is in the VMA, but we also
> sometimes have to do it on the page tables only, like when doing
> a get_user_pages_fast() where we have no VMA.
> 
> We add two functions and expose them to generic code:
> 
>   arch_pte_access_permitted(pte, write)
>   arch_vma_access_permitted(vma, write)
> 
> These are, of course, backed up in x86 arch code with checks
> against the PTE or VMA's protection key.
> 
> But, there are also cases where we do not want to respect
> protection keys.  When we ptrace(), for instance, we do not want
> to apply the tracer's PKRU permissions to the PTEs from the
> process being traced.


Well i am bit puzzle here because this will not provide consistant
protection as far as GUP (get_user_pages) is concern, assuming i
understand the pkru thing properly. Those are register local to CPU
and they are writeable by userspace thread so thread can temporarily
revoke access to range while executing untrusted subfunctions.

I have not read all the patches, but here i assume that for GUP you do
not first call arch_vma_access_permitted(). So issue i see is that GUP
for a process might happen inside another process and that process might
have different pkru protection keys, effectively randomly allowing or
forbidding a device driver to perform a GUP from say some workqueue that
just happen to be schedule against a different processor/thread than the
one against which it is doing the GUP for.

Second and more fundamental thing i have issue with is that this whole
pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
believe that device driver should be forbidden to do GUP base on pkru
keys.

Tying this to the pkru reg value of whatever processor happens to be
running some device driver kernel function that try to do a GUP seems
broken to me.

Sadly setting properties like pkru keys per device is not something that
is easy to do. I would do it on a per device file basis and allow user
space program to change them against the device file, then device driver
doing GUP would use that to check against the pte key and allow forbid
GUP.

Also doing it on per device file makes it harder for program to leverage
this feature as now they have to think about all device file they have
open. Maybe we need to keep a list of device that are use by a process
in the task struct and allow to set pkey globaly for all devices, while
allowing overriding this common default on per device basis.

So as first i would just allow GUP to always work and then come up with
syscall to allow to set pkey on device file. This obviously is a lot more
work as you need to go over all device driver using GUP.

This are my thoughts so far.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Jerome Glisse
On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote:
> On 10/22/2015 01:57 PM, Jerome Glisse wrote:
> > I have not read all the patches, but here i assume that for GUP you do
> > not first call arch_vma_access_permitted(). So issue i see is that GUP
> > for a process might happen inside another process and that process might
> > have different pkru protection keys, effectively randomly allowing or
> > forbidding a device driver to perform a GUP from say some workqueue that
> > just happen to be schedule against a different processor/thread than the
> > one against which it is doing the GUP for.
> 
> There are some places where there is no real context from which we can
> determine access rights.  ptrace is a good example.  We don't enforce
> PKEYs when walking _another_ process's page tables.
> 
> Can you give an example of where a process might be doing a gup and it
> is completely separate from the CPU context that it's being executed under?

In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also
believe that in infiniband one can have GUP call from workqueue that can
run at any time. In GPU driver we also use GUP thought at this point we
do not allow another process from accessing a buffer that is populated
by GUP from another process.

I am also here mainly talking about what future GPU will do where you will
have the CPU service page fault from GPU inside a workqueue that can run
at any point in time.

> 
> > Second and more fundamental thing i have issue with is that this whole
> > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
> > believe that device driver should be forbidden to do GUP base on pkru
> > keys.
> 
> I don't think of it as something necessarily central to the CPU, but
> something central to things that walk page tables.  We mark page tables
> with PKEYs and things that walk them will have certain rights.

My point is that we are seing devices that want to walk the page table and
they do it from a work queue inside the kernel which can run against another
process than the one they are doing the walk from.

I am sure there is already upstream device driver that does so, i have not
check all of them to confirm thought.


> > Tying this to the pkru reg value of whatever processor happens to be
> > running some device driver kernel function that try to do a GUP seems
> > broken to me.
> 
> That's one way to look at it.  Another way is that PKRU is specifying
> some real _intent_ about whether we want access to be allowed to some
> memory.

I think i misexpress myself here, yes PKRU is about specifying intent but
specifying it for CPU thread not for device thread. GPU for instance have
threads that run on behalf of a given process and i would rather see some
kind of coherent way to specify that for each devices like you allow it
to specify it on per CPU thread basis.


> > So as first i would just allow GUP to always work and then come up with
> > syscall to allow to set pkey on device file. This obviously is a lot more
> > work as you need to go over all device driver using GUP.
> 
> I wouldn't be opposed to adding some context to the thread (like
> pagefault_disable()) that indicates whether we should enforce protection
> keys.  If we are in some asynchronous context, disassociated from the
> running CPU's protection keys, we could set a flag.

I was simply thinking of having a global set of pkeys against the process
mm struct which would be the default global setting for all device GUP
access. This global set could be override by userspace on a per device
basis allowing some device to have more access than others.


> I'd really appreciate if you could point to some concrete examples here
> which could actually cause a problem, like workqueues doing gups.

Well i could grep for all current user of GUP, but i can tell you that this
is gonna be the model for GPU thread ie a kernel workqueue gonna handle
page fault on behalf of GPU and will perform equivalent of GUP. Also apply
for infiniband ODP thing which is upstream.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Dave Hansen
On 10/22/2015 01:57 PM, Jerome Glisse wrote:
> I have not read all the patches, but here i assume that for GUP you do
> not first call arch_vma_access_permitted(). So issue i see is that GUP
> for a process might happen inside another process and that process might
> have different pkru protection keys, effectively randomly allowing or
> forbidding a device driver to perform a GUP from say some workqueue that
> just happen to be schedule against a different processor/thread than the
> one against which it is doing the GUP for.

There are some places where there is no real context from which we can
determine access rights.  ptrace is a good example.  We don't enforce
PKEYs when walking _another_ process's page tables.

Can you give an example of where a process might be doing a gup and it
is completely separate from the CPU context that it's being executed under?

> Second and more fundamental thing i have issue with is that this whole
> pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
> believe that device driver should be forbidden to do GUP base on pkru
> keys.

I don't think of it as something necessarily central to the CPU, but
something central to things that walk page tables.  We mark page tables
with PKEYs and things that walk them will have certain rights.

> Tying this to the pkru reg value of whatever processor happens to be
> running some device driver kernel function that try to do a GUP seems
> broken to me.

That's one way to look at it.  Another way is that PKRU is specifying
some real _intent_ about whether we want access to be allowed to some
memory.

> So as first i would just allow GUP to always work and then come up with
> syscall to allow to set pkey on device file. This obviously is a lot more
> work as you need to go over all device driver using GUP.

I wouldn't be opposed to adding some context to the thread (like
pagefault_disable()) that indicates whether we should enforce protection
keys.  If we are in some asynchronous context, disassociated from the
running CPU's protection keys, we could set a flag.

I'd really appreciate if you could point to some concrete examples here
which could actually cause a problem, like workqueues doing gups.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Jerome Glisse
On Mon, Sep 28, 2015 at 12:18:23PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen 
> 
> Today, for normal faults and page table walks, we check the VMA
> and/or PTE to ensure that it is compatible with the action.  For
> instance, if we get a write fault on a non-writeable VMA, we
> SIGSEGV.
> 
> We try to do the same thing for protection keys.  Basically, we
> try to make sure that if a user does this:
> 
>   mprotect(ptr, size, PROT_NONE);
>   *ptr = foo;
> 
> they see the same effects with protection keys when they do this:
> 
>   mprotect(ptr, size, PROT_READ|PROT_WRITE);
>   set_pkey(ptr, size, 4);
>   wrpkru(0xff3f); // access disable pkey 4
>   *ptr = foo;
> 
> The state to do that checking is in the VMA, but we also
> sometimes have to do it on the page tables only, like when doing
> a get_user_pages_fast() where we have no VMA.
> 
> We add two functions and expose them to generic code:
> 
>   arch_pte_access_permitted(pte, write)
>   arch_vma_access_permitted(vma, write)
> 
> These are, of course, backed up in x86 arch code with checks
> against the PTE or VMA's protection key.
> 
> But, there are also cases where we do not want to respect
> protection keys.  When we ptrace(), for instance, we do not want
> to apply the tracer's PKRU permissions to the PTEs from the
> process being traced.


Well i am bit puzzle here because this will not provide consistant
protection as far as GUP (get_user_pages) is concern, assuming i
understand the pkru thing properly. Those are register local to CPU
and they are writeable by userspace thread so thread can temporarily
revoke access to range while executing untrusted subfunctions.

I have not read all the patches, but here i assume that for GUP you do
not first call arch_vma_access_permitted(). So issue i see is that GUP
for a process might happen inside another process and that process might
have different pkru protection keys, effectively randomly allowing or
forbidding a device driver to perform a GUP from say some workqueue that
just happen to be schedule against a different processor/thread than the
one against which it is doing the GUP for.

Second and more fundamental thing i have issue with is that this whole
pkru keys are centric to CPU POV ie this is a CPU feature. So i do not
believe that device driver should be forbidden to do GUP base on pkru
keys.

Tying this to the pkru reg value of whatever processor happens to be
running some device driver kernel function that try to do a GUP seems
broken to me.

Sadly setting properties like pkru keys per device is not something that
is easy to do. I would do it on a per device file basis and allow user
space program to change them against the device file, then device driver
doing GUP would use that to check against the pte key and allow forbid
GUP.

Also doing it on per device file makes it harder for program to leverage
this feature as now they have to think about all device file they have
open. Maybe we need to keep a list of device that are use by a process
in the task struct and allow to set pkey globaly for all devices, while
allowing overriding this common default on per device basis.

So as first i would just allow GUP to always work and then come up with
syscall to allow to set pkey on device file. This obviously is a lot more
work as you need to go over all device driver using GUP.

This are my thoughts so far.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-10-22 Thread Dave Hansen
On 10/22/2015 03:25 PM, Jerome Glisse wrote:
> On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote:
...
>> Can you give an example of where a process might be doing a gup and it
>> is completely separate from the CPU context that it's being executed under?
> 
> In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also
> believe that in infiniband one can have GUP call from workqueue that can
> run at any time. In GPU driver we also use GUP thought at this point we
> do not allow another process from accessing a buffer that is populated
> by GUP from another process.

>From quick grepping, there are only a couple of callers that do
get_user_pages() on something that isn't current->mm.

We can fairly easily introduce something new, like

get_foreign_user_pages()

That sets a flag to tell us to ignore the current PKRU state.

I've attached a patch that at creates a variant of get_user_pages() for
when you're going after another process's mm.  This even makes a few of
the gup call sites look nicer because they're not passing 'current,
current->mm'.

>>> So as first i would just allow GUP to always work and then come up with
>>> syscall to allow to set pkey on device file. This obviously is a lot more
>>> work as you need to go over all device driver using GUP.
>>
>> I wouldn't be opposed to adding some context to the thread (like
>> pagefault_disable()) that indicates whether we should enforce protection
>> keys.  If we are in some asynchronous context, disassociated from the
>> running CPU's protection keys, we could set a flag.
> 
> I was simply thinking of having a global set of pkeys against the process
> mm struct which would be the default global setting for all device GUP
> access. This global set could be override by userspace on a per device
> basis allowing some device to have more access than others.

For now, I think leaving it permissive by default is probably OK.  A
device's access to memory is permissive after a gup anyway.

As you note, doing this is going to require another whole set of user
interfaces, so I'd rather revisit it later once we have a more concrete
need for it.

1. Store a common PKRU value somewhere and activate when servicing work
   outside of the context of the actual process.  Set this PKRU value
   with input from userspace and new user APIs.
2. When work is queued, copy the PKRU value and use it while servicing
   the work.
3. Do all out-of-context work with PKRU=0, or by disabling the PKRU
   checks conditionally.

>> I'd really appreciate if you could point to some concrete examples here
>> which could actually cause a problem, like workqueues doing gups.
> 
> Well i could grep for all current user of GUP, but i can tell you that this
> is gonna be the model for GPU thread ie a kernel workqueue gonna handle
> page fault on behalf of GPU and will perform equivalent of GUP. Also apply
> for infiniband ODP thing which is upstream.




---

 b/arch/x86/mm/mpx.c   |4 +-
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |4 +-
 b/drivers/gpu/drm/i915/i915_gem_userptr.c |2 -
 b/drivers/gpu/drm/radeon/radeon_ttm.c |4 +-
 b/drivers/gpu/drm/via/via_dmablit.c   |3 --
 b/drivers/infiniband/core/umem.c  |2 -
 b/drivers/infiniband/core/umem_odp.c  |8 ++---
 b/drivers/infiniband/hw/mthca/mthca_memfree.c |3 --
 b/drivers/infiniband/hw/qib/qib_user_pages.c  |3 --
 b/drivers/infiniband/hw/usnic/usnic_uiom.c|2 -
 b/drivers/media/pci/ivtv/ivtv-yuv.c   |8 ++---
 b/drivers/media/v4l2-core/videobuf-dma-sg.c   |3 --
 b/drivers/virt/fsl_hypervisor.c   |5 +--
 b/fs/exec.c   |8 -
 b/include/linux/mm.h  |   14 +
 b/mm/frame_vector.c   |2 -
 b/mm/gup.c|   39 --
 b/mm/mempolicy.c  |6 ++--
 b/security/tomoyo/domain.c|9 +-
 19 files changed, 79 insertions(+), 50 deletions(-)

diff -puN mm/gup.c~get_current_user_pages mm/gup.c
--- a/mm/gup.c~get_current_user_pages	2015-10-22 16:03:24.957026355 -0700
+++ b/mm/gup.c	2015-10-22 16:46:58.181109179 -0700
@@ -752,11 +752,12 @@ EXPORT_SYMBOL(get_user_pages_locked);
  * according to the parameters "pages", "write", "force"
  * respectively.
  */
-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-	   unsigned long start, unsigned long nr_pages,
+__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 	   int write, int force, struct page **pages,
 	   unsigned int gup_flags)
 {
+	struct task_struct *tsk = curent;
+	struct mm_struct *mm = tsk->mm
 	long ret;
 	int locked = 1;
 	down_read(>mmap_sem);
@@ -795,7 +796,7 @@ long get_user_pages_unlocked(struct task
 

[PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-09-28 Thread Dave Hansen

From: Dave Hansen 

Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

mprotect(ptr, size, PROT_NONE);
*ptr = foo;

they see the same effects with protection keys when they do this:

mprotect(ptr, size, PROT_READ|PROT_WRITE);
set_pkey(ptr, size, 4);
wrpkru(0xff3f); // access disable pkey 4
*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

arch_pte_access_permitted(pte, write)
arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.

Signed-off-by: Dave Hansen 
---

 b/arch/powerpc/include/asm/mmu_context.h   |   11 ++
 b/arch/s390/include/asm/mmu_context.h  |   11 ++
 b/arch/unicore32/include/asm/mmu_context.h |   11 ++
 b/arch/x86/include/asm/mmu_context.h   |   51 -
 b/arch/x86/include/asm/pgtable.h   |   29 
 b/arch/x86/mm/fault.c  |   21 +++
 b/arch/x86/mm/gup.c|3 +
 b/include/asm-generic/mm_hooks.h   |   11 ++
 b/mm/gup.c |   17 -
 b/mm/memory.c  |4 ++
 10 files changed, 165 insertions(+), 4 deletions(-)

diff -puN arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/powerpc/include/asm/mmu_context.h
--- a/arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 
11:39:47.619258875 -0700
+++ b/arch/powerpc/include/asm/mmu_context.h2015-09-28 11:39:47.637259694 
-0700
@@ -148,5 +148,16 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff -puN arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/s390/include/asm/mmu_context.h
--- a/arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault2015-09-28 
11:39:47.621258967 -0700
+++ b/arch/s390/include/asm/mmu_context.h   2015-09-28 11:39:47.638259740 
-0700
@@ -130,4 +130,15 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif /* __S390_MMU_CONTEXT_H */
diff -puN arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/unicore32/include/asm/mmu_context.h
--- a/arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault   
2015-09-28 11:39:47.622259012 -0700
+++ b/arch/unicore32/include/asm/mmu_context.h  2015-09-28 11:39:47.638259740 
-0700
@@ -97,4 +97,15 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif
diff -puN arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 
11:39:47.624259103 -0700
+++ b/arch/x86/include/asm/mmu_context.h2015-09-28 11:39:47.638259740 
-0700
@@ -263,4 +263,53 @@ static inline u16 vma_pkey(struct vm_are
return pkey;
 }
 
-#endif /* _ASM_X86_MMU_CONTEXT_H */
+static inline bool __pkru_allows_pkey(u16 pkey, bool write)
+{
+   u32 pkru = read_pkru();
+
+   if (!__pkru_allows_read(pkru, pkey))
+   return false;
+   if (write && !__pkru_allows_write(pkru, pkey))
+   return false;
+
+   return true;
+}
+
+/*
+ * We only want to enforce protection keys on the current process
+ * because we effectively have no access to PKRU for other
+ * processes or any way to tell *which * PKRU in a threaded
+ * process we could use.
+ *
+ * So do not enforce things if the VMA 

[PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

2015-09-28 Thread Dave Hansen

From: Dave Hansen 

Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

mprotect(ptr, size, PROT_NONE);
*ptr = foo;

they see the same effects with protection keys when they do this:

mprotect(ptr, size, PROT_READ|PROT_WRITE);
set_pkey(ptr, size, 4);
wrpkru(0xff3f); // access disable pkey 4
*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

arch_pte_access_permitted(pte, write)
arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.

Signed-off-by: Dave Hansen 
---

 b/arch/powerpc/include/asm/mmu_context.h   |   11 ++
 b/arch/s390/include/asm/mmu_context.h  |   11 ++
 b/arch/unicore32/include/asm/mmu_context.h |   11 ++
 b/arch/x86/include/asm/mmu_context.h   |   51 -
 b/arch/x86/include/asm/pgtable.h   |   29 
 b/arch/x86/mm/fault.c  |   21 +++
 b/arch/x86/mm/gup.c|3 +
 b/include/asm-generic/mm_hooks.h   |   11 ++
 b/mm/gup.c |   17 -
 b/mm/memory.c  |4 ++
 10 files changed, 165 insertions(+), 4 deletions(-)

diff -puN arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/powerpc/include/asm/mmu_context.h
--- a/arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 
11:39:47.619258875 -0700
+++ b/arch/powerpc/include/asm/mmu_context.h2015-09-28 11:39:47.637259694 
-0700
@@ -148,5 +148,16 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff -puN arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/s390/include/asm/mmu_context.h
--- a/arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault2015-09-28 
11:39:47.621258967 -0700
+++ b/arch/s390/include/asm/mmu_context.h   2015-09-28 11:39:47.638259740 
-0700
@@ -130,4 +130,15 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif /* __S390_MMU_CONTEXT_H */
diff -puN arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/unicore32/include/asm/mmu_context.h
--- a/arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault   
2015-09-28 11:39:47.622259012 -0700
+++ b/arch/unicore32/include/asm/mmu_context.h  2015-09-28 11:39:47.638259740 
-0700
@@ -97,4 +97,15 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool 
write)
+{
+   /* by default, allow everything */
+   return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+   /* by default, allow everything */
+   return true;
+}
 #endif
diff -puN arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 
arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 
11:39:47.624259103 -0700
+++ b/arch/x86/include/asm/mmu_context.h2015-09-28 11:39:47.638259740 
-0700
@@ -263,4 +263,53 @@ static inline u16 vma_pkey(struct vm_are
return pkey;
 }
 
-#endif /* _ASM_X86_MMU_CONTEXT_H */
+static inline bool __pkru_allows_pkey(u16 pkey, bool write)
+{
+   u32 pkru = read_pkru();
+
+   if (!__pkru_allows_read(pkru, pkey))
+   return false;
+   if (write && !__pkru_allows_write(pkru, pkey))
+   return false;
+
+   return true;
+}
+
+/*
+ * We only want to enforce protection keys on the current process
+ * because we effectively have no access to PKRU for other
+ * processes or any way to tell *which * PKRU in a threaded
+ * process