Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On 10/22/2015 03:25 PM, Jerome Glisse wrote: > On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote: ... >> Can you give an example of where a process might be doing a gup and it >> is completely separate from the CPU context that it's being executed under? > > In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also > believe that in infiniband one can have GUP call from workqueue that can > run at any time. In GPU driver we also use GUP thought at this point we > do not allow another process from accessing a buffer that is populated > by GUP from another process. >From quick grepping, there are only a couple of callers that do get_user_pages() on something that isn't current->mm. We can fairly easily introduce something new, like get_foreign_user_pages() That sets a flag to tell us to ignore the current PKRU state. I've attached a patch that at creates a variant of get_user_pages() for when you're going after another process's mm. This even makes a few of the gup call sites look nicer because they're not passing 'current, current->mm'. >>> So as first i would just allow GUP to always work and then come up with >>> syscall to allow to set pkey on device file. This obviously is a lot more >>> work as you need to go over all device driver using GUP. >> >> I wouldn't be opposed to adding some context to the thread (like >> pagefault_disable()) that indicates whether we should enforce protection >> keys. If we are in some asynchronous context, disassociated from the >> running CPU's protection keys, we could set a flag. > > I was simply thinking of having a global set of pkeys against the process > mm struct which would be the default global setting for all device GUP > access. This global set could be override by userspace on a per device > basis allowing some device to have more access than others. For now, I think leaving it permissive by default is probably OK. A device's access to memory is permissive after a gup anyway. As you note, doing this is going to require another whole set of user interfaces, so I'd rather revisit it later once we have a more concrete need for it. 1. Store a common PKRU value somewhere and activate when servicing work outside of the context of the actual process. Set this PKRU value with input from userspace and new user APIs. 2. When work is queued, copy the PKRU value and use it while servicing the work. 3. Do all out-of-context work with PKRU=0, or by disabling the PKRU checks conditionally. >> I'd really appreciate if you could point to some concrete examples here >> which could actually cause a problem, like workqueues doing gups. > > Well i could grep for all current user of GUP, but i can tell you that this > is gonna be the model for GPU thread ie a kernel workqueue gonna handle > page fault on behalf of GPU and will perform equivalent of GUP. Also apply > for infiniband ODP thing which is upstream. --- b/arch/x86/mm/mpx.c |4 +- b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |4 +- b/drivers/gpu/drm/i915/i915_gem_userptr.c |2 - b/drivers/gpu/drm/radeon/radeon_ttm.c |4 +- b/drivers/gpu/drm/via/via_dmablit.c |3 -- b/drivers/infiniband/core/umem.c |2 - b/drivers/infiniband/core/umem_odp.c |8 ++--- b/drivers/infiniband/hw/mthca/mthca_memfree.c |3 -- b/drivers/infiniband/hw/qib/qib_user_pages.c |3 -- b/drivers/infiniband/hw/usnic/usnic_uiom.c|2 - b/drivers/media/pci/ivtv/ivtv-yuv.c |8 ++--- b/drivers/media/v4l2-core/videobuf-dma-sg.c |3 -- b/drivers/virt/fsl_hypervisor.c |5 +-- b/fs/exec.c |8 - b/include/linux/mm.h | 14 + b/mm/frame_vector.c |2 - b/mm/gup.c| 39 -- b/mm/mempolicy.c |6 ++-- b/security/tomoyo/domain.c|9 +- 19 files changed, 79 insertions(+), 50 deletions(-) diff -puN mm/gup.c~get_current_user_pages mm/gup.c --- a/mm/gup.c~get_current_user_pages 2015-10-22 16:03:24.957026355 -0700 +++ b/mm/gup.c 2015-10-22 16:46:58.181109179 -0700 @@ -752,11 +752,12 @@ EXPORT_SYMBOL(get_user_pages_locked); * according to the parameters "pages", "write", "force" * respectively. */ -__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, - unsigned long start, unsigned long nr_pages, +__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, int write, int force, struct page **pages, unsigned int gup_flags) { + struct task_struct *tsk = curent; + struct mm_struct *mm = tsk->mm long ret; int locked = 1; down_read(>mmap_sem); @@ -795,7 +796,7 @@ long get_user_pages_unlocked(struct task
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote: > On 10/22/2015 01:57 PM, Jerome Glisse wrote: > > I have not read all the patches, but here i assume that for GUP you do > > not first call arch_vma_access_permitted(). So issue i see is that GUP > > for a process might happen inside another process and that process might > > have different pkru protection keys, effectively randomly allowing or > > forbidding a device driver to perform a GUP from say some workqueue that > > just happen to be schedule against a different processor/thread than the > > one against which it is doing the GUP for. > > There are some places where there is no real context from which we can > determine access rights. ptrace is a good example. We don't enforce > PKEYs when walking _another_ process's page tables. > > Can you give an example of where a process might be doing a gup and it > is completely separate from the CPU context that it's being executed under? In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also believe that in infiniband one can have GUP call from workqueue that can run at any time. In GPU driver we also use GUP thought at this point we do not allow another process from accessing a buffer that is populated by GUP from another process. I am also here mainly talking about what future GPU will do where you will have the CPU service page fault from GPU inside a workqueue that can run at any point in time. > > > Second and more fundamental thing i have issue with is that this whole > > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not > > believe that device driver should be forbidden to do GUP base on pkru > > keys. > > I don't think of it as something necessarily central to the CPU, but > something central to things that walk page tables. We mark page tables > with PKEYs and things that walk them will have certain rights. My point is that we are seing devices that want to walk the page table and they do it from a work queue inside the kernel which can run against another process than the one they are doing the walk from. I am sure there is already upstream device driver that does so, i have not check all of them to confirm thought. > > Tying this to the pkru reg value of whatever processor happens to be > > running some device driver kernel function that try to do a GUP seems > > broken to me. > > That's one way to look at it. Another way is that PKRU is specifying > some real _intent_ about whether we want access to be allowed to some > memory. I think i misexpress myself here, yes PKRU is about specifying intent but specifying it for CPU thread not for device thread. GPU for instance have threads that run on behalf of a given process and i would rather see some kind of coherent way to specify that for each devices like you allow it to specify it on per CPU thread basis. > > So as first i would just allow GUP to always work and then come up with > > syscall to allow to set pkey on device file. This obviously is a lot more > > work as you need to go over all device driver using GUP. > > I wouldn't be opposed to adding some context to the thread (like > pagefault_disable()) that indicates whether we should enforce protection > keys. If we are in some asynchronous context, disassociated from the > running CPU's protection keys, we could set a flag. I was simply thinking of having a global set of pkeys against the process mm struct which would be the default global setting for all device GUP access. This global set could be override by userspace on a per device basis allowing some device to have more access than others. > I'd really appreciate if you could point to some concrete examples here > which could actually cause a problem, like workqueues doing gups. Well i could grep for all current user of GUP, but i can tell you that this is gonna be the model for GPU thread ie a kernel workqueue gonna handle page fault on behalf of GPU and will perform equivalent of GUP. Also apply for infiniband ODP thing which is upstream. Cheers, Jérôme -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On 10/22/2015 01:57 PM, Jerome Glisse wrote: > I have not read all the patches, but here i assume that for GUP you do > not first call arch_vma_access_permitted(). So issue i see is that GUP > for a process might happen inside another process and that process might > have different pkru protection keys, effectively randomly allowing or > forbidding a device driver to perform a GUP from say some workqueue that > just happen to be schedule against a different processor/thread than the > one against which it is doing the GUP for. There are some places where there is no real context from which we can determine access rights. ptrace is a good example. We don't enforce PKEYs when walking _another_ process's page tables. Can you give an example of where a process might be doing a gup and it is completely separate from the CPU context that it's being executed under? > Second and more fundamental thing i have issue with is that this whole > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not > believe that device driver should be forbidden to do GUP base on pkru > keys. I don't think of it as something necessarily central to the CPU, but something central to things that walk page tables. We mark page tables with PKEYs and things that walk them will have certain rights. > Tying this to the pkru reg value of whatever processor happens to be > running some device driver kernel function that try to do a GUP seems > broken to me. That's one way to look at it. Another way is that PKRU is specifying some real _intent_ about whether we want access to be allowed to some memory. > So as first i would just allow GUP to always work and then come up with > syscall to allow to set pkey on device file. This obviously is a lot more > work as you need to go over all device driver using GUP. I wouldn't be opposed to adding some context to the thread (like pagefault_disable()) that indicates whether we should enforce protection keys. If we are in some asynchronous context, disassociated from the running CPU's protection keys, we could set a flag. I'd really appreciate if you could point to some concrete examples here which could actually cause a problem, like workqueues doing gups. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On Mon, Sep 28, 2015 at 12:18:23PM -0700, Dave Hansen wrote: > > From: Dave Hansen > > Today, for normal faults and page table walks, we check the VMA > and/or PTE to ensure that it is compatible with the action. For > instance, if we get a write fault on a non-writeable VMA, we > SIGSEGV. > > We try to do the same thing for protection keys. Basically, we > try to make sure that if a user does this: > > mprotect(ptr, size, PROT_NONE); > *ptr = foo; > > they see the same effects with protection keys when they do this: > > mprotect(ptr, size, PROT_READ|PROT_WRITE); > set_pkey(ptr, size, 4); > wrpkru(0xff3f); // access disable pkey 4 > *ptr = foo; > > The state to do that checking is in the VMA, but we also > sometimes have to do it on the page tables only, like when doing > a get_user_pages_fast() where we have no VMA. > > We add two functions and expose them to generic code: > > arch_pte_access_permitted(pte, write) > arch_vma_access_permitted(vma, write) > > These are, of course, backed up in x86 arch code with checks > against the PTE or VMA's protection key. > > But, there are also cases where we do not want to respect > protection keys. When we ptrace(), for instance, we do not want > to apply the tracer's PKRU permissions to the PTEs from the > process being traced. Well i am bit puzzle here because this will not provide consistant protection as far as GUP (get_user_pages) is concern, assuming i understand the pkru thing properly. Those are register local to CPU and they are writeable by userspace thread so thread can temporarily revoke access to range while executing untrusted subfunctions. I have not read all the patches, but here i assume that for GUP you do not first call arch_vma_access_permitted(). So issue i see is that GUP for a process might happen inside another process and that process might have different pkru protection keys, effectively randomly allowing or forbidding a device driver to perform a GUP from say some workqueue that just happen to be schedule against a different processor/thread than the one against which it is doing the GUP for. Second and more fundamental thing i have issue with is that this whole pkru keys are centric to CPU POV ie this is a CPU feature. So i do not believe that device driver should be forbidden to do GUP base on pkru keys. Tying this to the pkru reg value of whatever processor happens to be running some device driver kernel function that try to do a GUP seems broken to me. Sadly setting properties like pkru keys per device is not something that is easy to do. I would do it on a per device file basis and allow user space program to change them against the device file, then device driver doing GUP would use that to check against the pte key and allow forbid GUP. Also doing it on per device file makes it harder for program to leverage this feature as now they have to think about all device file they have open. Maybe we need to keep a list of device that are use by a process in the task struct and allow to set pkey globaly for all devices, while allowing overriding this common default on per device basis. So as first i would just allow GUP to always work and then come up with syscall to allow to set pkey on device file. This obviously is a lot more work as you need to go over all device driver using GUP. This are my thoughts so far. Cheers, Jérôme -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote: > On 10/22/2015 01:57 PM, Jerome Glisse wrote: > > I have not read all the patches, but here i assume that for GUP you do > > not first call arch_vma_access_permitted(). So issue i see is that GUP > > for a process might happen inside another process and that process might > > have different pkru protection keys, effectively randomly allowing or > > forbidding a device driver to perform a GUP from say some workqueue that > > just happen to be schedule against a different processor/thread than the > > one against which it is doing the GUP for. > > There are some places where there is no real context from which we can > determine access rights. ptrace is a good example. We don't enforce > PKEYs when walking _another_ process's page tables. > > Can you give an example of where a process might be doing a gup and it > is completely separate from the CPU context that it's being executed under? In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also believe that in infiniband one can have GUP call from workqueue that can run at any time. In GPU driver we also use GUP thought at this point we do not allow another process from accessing a buffer that is populated by GUP from another process. I am also here mainly talking about what future GPU will do where you will have the CPU service page fault from GPU inside a workqueue that can run at any point in time. > > > Second and more fundamental thing i have issue with is that this whole > > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not > > believe that device driver should be forbidden to do GUP base on pkru > > keys. > > I don't think of it as something necessarily central to the CPU, but > something central to things that walk page tables. We mark page tables > with PKEYs and things that walk them will have certain rights. My point is that we are seing devices that want to walk the page table and they do it from a work queue inside the kernel which can run against another process than the one they are doing the walk from. I am sure there is already upstream device driver that does so, i have not check all of them to confirm thought. > > Tying this to the pkru reg value of whatever processor happens to be > > running some device driver kernel function that try to do a GUP seems > > broken to me. > > That's one way to look at it. Another way is that PKRU is specifying > some real _intent_ about whether we want access to be allowed to some > memory. I think i misexpress myself here, yes PKRU is about specifying intent but specifying it for CPU thread not for device thread. GPU for instance have threads that run on behalf of a given process and i would rather see some kind of coherent way to specify that for each devices like you allow it to specify it on per CPU thread basis. > > So as first i would just allow GUP to always work and then come up with > > syscall to allow to set pkey on device file. This obviously is a lot more > > work as you need to go over all device driver using GUP. > > I wouldn't be opposed to adding some context to the thread (like > pagefault_disable()) that indicates whether we should enforce protection > keys. If we are in some asynchronous context, disassociated from the > running CPU's protection keys, we could set a flag. I was simply thinking of having a global set of pkeys against the process mm struct which would be the default global setting for all device GUP access. This global set could be override by userspace on a per device basis allowing some device to have more access than others. > I'd really appreciate if you could point to some concrete examples here > which could actually cause a problem, like workqueues doing gups. Well i could grep for all current user of GUP, but i can tell you that this is gonna be the model for GPU thread ie a kernel workqueue gonna handle page fault on behalf of GPU and will perform equivalent of GUP. Also apply for infiniband ODP thing which is upstream. Cheers, Jérôme -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On 10/22/2015 01:57 PM, Jerome Glisse wrote: > I have not read all the patches, but here i assume that for GUP you do > not first call arch_vma_access_permitted(). So issue i see is that GUP > for a process might happen inside another process and that process might > have different pkru protection keys, effectively randomly allowing or > forbidding a device driver to perform a GUP from say some workqueue that > just happen to be schedule against a different processor/thread than the > one against which it is doing the GUP for. There are some places where there is no real context from which we can determine access rights. ptrace is a good example. We don't enforce PKEYs when walking _another_ process's page tables. Can you give an example of where a process might be doing a gup and it is completely separate from the CPU context that it's being executed under? > Second and more fundamental thing i have issue with is that this whole > pkru keys are centric to CPU POV ie this is a CPU feature. So i do not > believe that device driver should be forbidden to do GUP base on pkru > keys. I don't think of it as something necessarily central to the CPU, but something central to things that walk page tables. We mark page tables with PKEYs and things that walk them will have certain rights. > Tying this to the pkru reg value of whatever processor happens to be > running some device driver kernel function that try to do a GUP seems > broken to me. That's one way to look at it. Another way is that PKRU is specifying some real _intent_ about whether we want access to be allowed to some memory. > So as first i would just allow GUP to always work and then come up with > syscall to allow to set pkey on device file. This obviously is a lot more > work as you need to go over all device driver using GUP. I wouldn't be opposed to adding some context to the thread (like pagefault_disable()) that indicates whether we should enforce protection keys. If we are in some asynchronous context, disassociated from the running CPU's protection keys, we could set a flag. I'd really appreciate if you could point to some concrete examples here which could actually cause a problem, like workqueues doing gups. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On Mon, Sep 28, 2015 at 12:18:23PM -0700, Dave Hansen wrote: > > From: Dave Hansen> > Today, for normal faults and page table walks, we check the VMA > and/or PTE to ensure that it is compatible with the action. For > instance, if we get a write fault on a non-writeable VMA, we > SIGSEGV. > > We try to do the same thing for protection keys. Basically, we > try to make sure that if a user does this: > > mprotect(ptr, size, PROT_NONE); > *ptr = foo; > > they see the same effects with protection keys when they do this: > > mprotect(ptr, size, PROT_READ|PROT_WRITE); > set_pkey(ptr, size, 4); > wrpkru(0xff3f); // access disable pkey 4 > *ptr = foo; > > The state to do that checking is in the VMA, but we also > sometimes have to do it on the page tables only, like when doing > a get_user_pages_fast() where we have no VMA. > > We add two functions and expose them to generic code: > > arch_pte_access_permitted(pte, write) > arch_vma_access_permitted(vma, write) > > These are, of course, backed up in x86 arch code with checks > against the PTE or VMA's protection key. > > But, there are also cases where we do not want to respect > protection keys. When we ptrace(), for instance, we do not want > to apply the tracer's PKRU permissions to the PTEs from the > process being traced. Well i am bit puzzle here because this will not provide consistant protection as far as GUP (get_user_pages) is concern, assuming i understand the pkru thing properly. Those are register local to CPU and they are writeable by userspace thread so thread can temporarily revoke access to range while executing untrusted subfunctions. I have not read all the patches, but here i assume that for GUP you do not first call arch_vma_access_permitted(). So issue i see is that GUP for a process might happen inside another process and that process might have different pkru protection keys, effectively randomly allowing or forbidding a device driver to perform a GUP from say some workqueue that just happen to be schedule against a different processor/thread than the one against which it is doing the GUP for. Second and more fundamental thing i have issue with is that this whole pkru keys are centric to CPU POV ie this is a CPU feature. So i do not believe that device driver should be forbidden to do GUP base on pkru keys. Tying this to the pkru reg value of whatever processor happens to be running some device driver kernel function that try to do a GUP seems broken to me. Sadly setting properties like pkru keys per device is not something that is easy to do. I would do it on a per device file basis and allow user space program to change them against the device file, then device driver doing GUP would use that to check against the pte key and allow forbid GUP. Also doing it on per device file makes it harder for program to leverage this feature as now they have to think about all device file they have open. Maybe we need to keep a list of device that are use by a process in the task struct and allow to set pkey globaly for all devices, while allowing overriding this common default on per device basis. So as first i would just allow GUP to always work and then come up with syscall to allow to set pkey on device file. This obviously is a lot more work as you need to go over all device driver using GUP. This are my thoughts so far. Cheers, Jérôme -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
On 10/22/2015 03:25 PM, Jerome Glisse wrote: > On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote: ... >> Can you give an example of where a process might be doing a gup and it >> is completely separate from the CPU context that it's being executed under? > > In drivers/iommu/amd_iommu_v2.c thought this is on AMD platform. I also > believe that in infiniband one can have GUP call from workqueue that can > run at any time. In GPU driver we also use GUP thought at this point we > do not allow another process from accessing a buffer that is populated > by GUP from another process. >From quick grepping, there are only a couple of callers that do get_user_pages() on something that isn't current->mm. We can fairly easily introduce something new, like get_foreign_user_pages() That sets a flag to tell us to ignore the current PKRU state. I've attached a patch that at creates a variant of get_user_pages() for when you're going after another process's mm. This even makes a few of the gup call sites look nicer because they're not passing 'current, current->mm'. >>> So as first i would just allow GUP to always work and then come up with >>> syscall to allow to set pkey on device file. This obviously is a lot more >>> work as you need to go over all device driver using GUP. >> >> I wouldn't be opposed to adding some context to the thread (like >> pagefault_disable()) that indicates whether we should enforce protection >> keys. If we are in some asynchronous context, disassociated from the >> running CPU's protection keys, we could set a flag. > > I was simply thinking of having a global set of pkeys against the process > mm struct which would be the default global setting for all device GUP > access. This global set could be override by userspace on a per device > basis allowing some device to have more access than others. For now, I think leaving it permissive by default is probably OK. A device's access to memory is permissive after a gup anyway. As you note, doing this is going to require another whole set of user interfaces, so I'd rather revisit it later once we have a more concrete need for it. 1. Store a common PKRU value somewhere and activate when servicing work outside of the context of the actual process. Set this PKRU value with input from userspace and new user APIs. 2. When work is queued, copy the PKRU value and use it while servicing the work. 3. Do all out-of-context work with PKRU=0, or by disabling the PKRU checks conditionally. >> I'd really appreciate if you could point to some concrete examples here >> which could actually cause a problem, like workqueues doing gups. > > Well i could grep for all current user of GUP, but i can tell you that this > is gonna be the model for GPU thread ie a kernel workqueue gonna handle > page fault on behalf of GPU and will perform equivalent of GUP. Also apply > for infiniband ODP thing which is upstream. --- b/arch/x86/mm/mpx.c |4 +- b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |4 +- b/drivers/gpu/drm/i915/i915_gem_userptr.c |2 - b/drivers/gpu/drm/radeon/radeon_ttm.c |4 +- b/drivers/gpu/drm/via/via_dmablit.c |3 -- b/drivers/infiniband/core/umem.c |2 - b/drivers/infiniband/core/umem_odp.c |8 ++--- b/drivers/infiniband/hw/mthca/mthca_memfree.c |3 -- b/drivers/infiniband/hw/qib/qib_user_pages.c |3 -- b/drivers/infiniband/hw/usnic/usnic_uiom.c|2 - b/drivers/media/pci/ivtv/ivtv-yuv.c |8 ++--- b/drivers/media/v4l2-core/videobuf-dma-sg.c |3 -- b/drivers/virt/fsl_hypervisor.c |5 +-- b/fs/exec.c |8 - b/include/linux/mm.h | 14 + b/mm/frame_vector.c |2 - b/mm/gup.c| 39 -- b/mm/mempolicy.c |6 ++-- b/security/tomoyo/domain.c|9 +- 19 files changed, 79 insertions(+), 50 deletions(-) diff -puN mm/gup.c~get_current_user_pages mm/gup.c --- a/mm/gup.c~get_current_user_pages 2015-10-22 16:03:24.957026355 -0700 +++ b/mm/gup.c 2015-10-22 16:46:58.181109179 -0700 @@ -752,11 +752,12 @@ EXPORT_SYMBOL(get_user_pages_locked); * according to the parameters "pages", "write", "force" * respectively. */ -__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, - unsigned long start, unsigned long nr_pages, +__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, int write, int force, struct page **pages, unsigned int gup_flags) { + struct task_struct *tsk = curent; + struct mm_struct *mm = tsk->mm long ret; int locked = 1; down_read(>mmap_sem); @@ -795,7 +796,7 @@ long get_user_pages_unlocked(struct task
[PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
From: Dave Hansen Today, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen --- b/arch/powerpc/include/asm/mmu_context.h | 11 ++ b/arch/s390/include/asm/mmu_context.h | 11 ++ b/arch/unicore32/include/asm/mmu_context.h | 11 ++ b/arch/x86/include/asm/mmu_context.h | 51 - b/arch/x86/include/asm/pgtable.h | 29 b/arch/x86/mm/fault.c | 21 +++ b/arch/x86/mm/gup.c|3 + b/include/asm-generic/mm_hooks.h | 11 ++ b/mm/gup.c | 17 - b/mm/memory.c |4 ++ 10 files changed, 165 insertions(+), 4 deletions(-) diff -puN arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault arch/powerpc/include/asm/mmu_context.h --- a/arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.619258875 -0700 +++ b/arch/powerpc/include/asm/mmu_context.h2015-09-28 11:39:47.637259694 -0700 @@ -148,5 +148,16 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff -puN arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault arch/s390/include/asm/mmu_context.h --- a/arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault2015-09-28 11:39:47.621258967 -0700 +++ b/arch/s390/include/asm/mmu_context.h 2015-09-28 11:39:47.638259740 -0700 @@ -130,4 +130,15 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif /* __S390_MMU_CONTEXT_H */ diff -puN arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault arch/unicore32/include/asm/mmu_context.h --- a/arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.622259012 -0700 +++ b/arch/unicore32/include/asm/mmu_context.h 2015-09-28 11:39:47.638259740 -0700 @@ -97,4 +97,15 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif diff -puN arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault arch/x86/include/asm/mmu_context.h --- a/arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.624259103 -0700 +++ b/arch/x86/include/asm/mmu_context.h2015-09-28 11:39:47.638259740 -0700 @@ -263,4 +263,53 @@ static inline u16 vma_pkey(struct vm_are return pkey; } -#endif /* _ASM_X86_MMU_CONTEXT_H */ +static inline bool __pkru_allows_pkey(u16 pkey, bool write) +{ + u32 pkru = read_pkru(); + + if (!__pkru_allows_read(pkru, pkey)) + return false; + if (write && !__pkru_allows_write(pkru, pkey)) + return false; + + return true; +} + +/* + * We only want to enforce protection keys on the current process + * because we effectively have no access to PKRU for other + * processes or any way to tell *which * PKRU in a threaded + * process we could use. + * + * So do not enforce things if the VMA
[PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys
From: Dave HansenToday, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen --- b/arch/powerpc/include/asm/mmu_context.h | 11 ++ b/arch/s390/include/asm/mmu_context.h | 11 ++ b/arch/unicore32/include/asm/mmu_context.h | 11 ++ b/arch/x86/include/asm/mmu_context.h | 51 - b/arch/x86/include/asm/pgtable.h | 29 b/arch/x86/mm/fault.c | 21 +++ b/arch/x86/mm/gup.c|3 + b/include/asm-generic/mm_hooks.h | 11 ++ b/mm/gup.c | 17 - b/mm/memory.c |4 ++ 10 files changed, 165 insertions(+), 4 deletions(-) diff -puN arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault arch/powerpc/include/asm/mmu_context.h --- a/arch/powerpc/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.619258875 -0700 +++ b/arch/powerpc/include/asm/mmu_context.h2015-09-28 11:39:47.637259694 -0700 @@ -148,5 +148,16 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff -puN arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault arch/s390/include/asm/mmu_context.h --- a/arch/s390/include/asm/mmu_context.h~pkeys-11-pte-fault2015-09-28 11:39:47.621258967 -0700 +++ b/arch/s390/include/asm/mmu_context.h 2015-09-28 11:39:47.638259740 -0700 @@ -130,4 +130,15 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif /* __S390_MMU_CONTEXT_H */ diff -puN arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault arch/unicore32/include/asm/mmu_context.h --- a/arch/unicore32/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.622259012 -0700 +++ b/arch/unicore32/include/asm/mmu_context.h 2015-09-28 11:39:47.638259740 -0700 @@ -97,4 +97,15 @@ static inline void arch_bprm_mm_init(str { } +static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write) +{ + /* by default, allow everything */ + return true; +} + +static inline bool arch_pte_access_permitted(pte_t pte, bool write) +{ + /* by default, allow everything */ + return true; +} #endif diff -puN arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault arch/x86/include/asm/mmu_context.h --- a/arch/x86/include/asm/mmu_context.h~pkeys-11-pte-fault 2015-09-28 11:39:47.624259103 -0700 +++ b/arch/x86/include/asm/mmu_context.h2015-09-28 11:39:47.638259740 -0700 @@ -263,4 +263,53 @@ static inline u16 vma_pkey(struct vm_are return pkey; } -#endif /* _ASM_X86_MMU_CONTEXT_H */ +static inline bool __pkru_allows_pkey(u16 pkey, bool write) +{ + u32 pkru = read_pkru(); + + if (!__pkru_allows_read(pkru, pkey)) + return false; + if (write && !__pkru_allows_write(pkru, pkey)) + return false; + + return true; +} + +/* + * We only want to enforce protection keys on the current process + * because we effectively have no access to PKRU for other + * processes or any way to tell *which * PKRU in a threaded + * process