Re: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-21 Thread Jia He

[On behalf of justin...@arm.com]

Hi Matthew

On 2019/9/20 23:53, Matthew Wilcox wrote:

On Fri, Sep 20, 2019 at 09:54:37PM +0800, Jia He wrote:

-static inline void cow_user_page(struct page *dst, struct page *src, unsigned 
long va, struct vm_area_struct *vma)
+static inline int cow_user_page(struct page *dst, struct page *src,
+   struct vm_fault *vmf)
  {

Can we talk about the return type here?


+   } else {
+   /* Other thread has already handled the fault
+* and we don't need to do anything. If it's
+* not the case, the fault will be triggered
+* again on the same address.
+*/
+   pte_unmap_unlock(vmf->pte, vmf->ptl);
+   return -1;

...

+   return 0;
  }

So -1 for "try again" and 0 for "succeeded".


+   if (cow_user_page(new_page, old_page, vmf)) {

Then we use it like a bool.  But it's kind of backwards from a bool because
false is success.


+   /* COW failed, if the fault was solved by other,
+* it's fine. If not, userspace would re-fault on
+* the same address and we will handle the fault
+* from the second attempt.
+*/
+   put_page(new_page);
+   if (old_page)
+   put_page(old_page);
+   return 0;

And we don't use the return value; in fact we invert it.

Would this make more sense:

static inline bool cow_user_page(struct page *dst, struct page *src,
struct vm_fault *vmf)
...
pte_unmap_unlock(vmf->pte, vmf->ptl);
return false;
...
return true;
...
if (!cow_user_page(new_page, old_page, vmf)) {

That reads more sensibly for me.  We could also go with returning a
vm_fault_t, but that would be more complex than needed today, I think.


Ok, will change the return type to bool as you suggested.
Thanks

---
Cheers,
Justin (Jia He)



Re: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-20 Thread Kirill A. Shutemov
On Fri, Sep 20, 2019 at 08:53:00AM -0700, Matthew Wilcox wrote:
> On Fri, Sep 20, 2019 at 09:54:37PM +0800, Jia He wrote:
> > -static inline void cow_user_page(struct page *dst, struct page *src, 
> > unsigned long va, struct vm_area_struct *vma)
> > +static inline int cow_user_page(struct page *dst, struct page *src,
> > +   struct vm_fault *vmf)
> >  {
> 
> Can we talk about the return type here?
> 
> > +   } else {
> > +   /* Other thread has already handled the fault
> > +* and we don't need to do anything. If it's
> > +* not the case, the fault will be triggered
> > +* again on the same address.
> > +*/
> > +   pte_unmap_unlock(vmf->pte, vmf->ptl);
> > +   return -1;
> ...
> > +   return 0;
> >  }
> 
> So -1 for "try again" and 0 for "succeeded".
> 
> > +   if (cow_user_page(new_page, old_page, vmf)) {
> 
> Then we use it like a bool.  But it's kind of backwards from a bool because
> false is success.
> 
> > +   /* COW failed, if the fault was solved by other,
> > +* it's fine. If not, userspace would re-fault on
> > +* the same address and we will handle the fault
> > +* from the second attempt.
> > +*/
> > +   put_page(new_page);
> > +   if (old_page)
> > +   put_page(old_page);
> > +   return 0;
> 
> And we don't use the return value; in fact we invert it.
> 
> Would this make more sense:
> 
> static inline bool cow_user_page(struct page *dst, struct page *src,
>   struct vm_fault *vmf)
> ...
>   pte_unmap_unlock(vmf->pte, vmf->ptl);
>   return false;
> ...
>   return true;
> ...
>   if (!cow_user_page(new_page, old_page, vmf)) {
> 
> That reads more sensibly for me.

I like this idea too.

-- 
 Kirill A. Shutemov


Re: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-20 Thread Matthew Wilcox
On Fri, Sep 20, 2019 at 09:54:37PM +0800, Jia He wrote:
> -static inline void cow_user_page(struct page *dst, struct page *src, 
> unsigned long va, struct vm_area_struct *vma)
> +static inline int cow_user_page(struct page *dst, struct page *src,
> + struct vm_fault *vmf)
>  {

Can we talk about the return type here?

> + } else {
> + /* Other thread has already handled the fault
> +  * and we don't need to do anything. If it's
> +  * not the case, the fault will be triggered
> +  * again on the same address.
> +  */
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> + return -1;
...
> + return 0;
>  }

So -1 for "try again" and 0 for "succeeded".

> + if (cow_user_page(new_page, old_page, vmf)) {

Then we use it like a bool.  But it's kind of backwards from a bool because
false is success.

> + /* COW failed, if the fault was solved by other,
> +  * it's fine. If not, userspace would re-fault on
> +  * the same address and we will handle the fault
> +  * from the second attempt.
> +  */
> + put_page(new_page);
> + if (old_page)
> + put_page(old_page);
> + return 0;

And we don't use the return value; in fact we invert it.

Would this make more sense:

static inline bool cow_user_page(struct page *dst, struct page *src,
struct vm_fault *vmf)
...
pte_unmap_unlock(vmf->pte, vmf->ptl);
return false;
...
return true;
...
if (!cow_user_page(new_page, old_page, vmf)) {

That reads more sensibly for me.  We could also go with returning a
vm_fault_t, but that would be more complex than needed today, I think.


RE: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-20 Thread Justin He (Arm Technology China)
Thanks for your patent review 

--
Cheers,
Justin (Jia He)



> -Original Message-
> From: Kirill A. Shutemov 
> Sent: 2019年9月20日 22:21
> To: Justin He (Arm Technology China) 
> Cc: Catalin Marinas ; Will Deacon
> ; Mark Rutland ; James Morse
> ; Marc Zyngier ; Matthew
> Wilcox ; Kirill A. Shutemov
> ; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux...@kvack.org; Suzuki Poulose
> ; Punit Agrawal ;
> Anshuman Khandual ; Alex Van Brunt
> ; Robin Murphy ;
> Thomas Gleixner ; Andrew Morton  foundation.org>; Jérôme Glisse ; Ralph Campbell
> ; hejia...@gmail.com; Kaly Xin (Arm Technology
> China) ; nd 
> Subject: Re: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is
> cleared
> 
> On Fri, Sep 20, 2019 at 09:54:37PM +0800, Jia He wrote:
> > When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest,
> there
> > will be a double page fault in __copy_from_user_inatomic of
> cow_user_page.
> >
> > Below call trace is from arm64 do_page_fault for debugging purpose
> > [  110.016195] Call trace:
> > [  110.016826]  do_page_fault+0x5a4/0x690
> > [  110.017812]  do_mem_abort+0x50/0xb0
> > [  110.018726]  el1_da+0x20/0xc4
> > [  110.019492]  __arch_copy_from_user+0x180/0x280
> > [  110.020646]  do_wp_page+0xb0/0x860
> > [  110.021517]  __handle_mm_fault+0x994/0x1338
> > [  110.022606]  handle_mm_fault+0xe8/0x180
> > [  110.023584]  do_page_fault+0x240/0x690
> > [  110.024535]  do_mem_abort+0x50/0xb0
> > [  110.025423]  el0_da+0x20/0x24
> >
> > The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> > [9b007000] pgd=00023d4f8003, pud=00023da9b003,
> pmd=00023d4b3003, pte=36298607bd3
> >
> > As told by Catalin: "On arm64 without hardware Access Flag, copying from
> > user will fail because the pte is old and cannot be marked young. So we
> > always end up with zeroed page after fork() + CoW for pfn mappings. we
> > don't always have a hardware-managed access flag on arm64."
> >
> > This patch fix it by calling pte_mkyoung. Also, the parameter is
> > changed because vmf should be passed to cow_user_page()
> >
> > Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
> > in case there can be some obscure use-case.(by Kirill)
> >
> > [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> >
> > Reported-by: Yibo Cai 
> > Signed-off-by: Jia He 
> 
> Acked-by: Kirill A. Shutemov 
> 
> --
>  Kirill A. Shutemov


Re: [PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-20 Thread Kirill A. Shutemov
On Fri, Sep 20, 2019 at 09:54:37PM +0800, Jia He wrote:
> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
> will be a double page fault in __copy_from_user_inatomic of cow_user_page.
> 
> Below call trace is from arm64 do_page_fault for debugging purpose
> [  110.016195] Call trace:
> [  110.016826]  do_page_fault+0x5a4/0x690
> [  110.017812]  do_mem_abort+0x50/0xb0
> [  110.018726]  el1_da+0x20/0xc4
> [  110.019492]  __arch_copy_from_user+0x180/0x280
> [  110.020646]  do_wp_page+0xb0/0x860
> [  110.021517]  __handle_mm_fault+0x994/0x1338
> [  110.022606]  handle_mm_fault+0xe8/0x180
> [  110.023584]  do_page_fault+0x240/0x690
> [  110.024535]  do_mem_abort+0x50/0xb0
> [  110.025423]  el0_da+0x20/0x24
> 
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [9b007000] pgd=00023d4f8003, pud=00023da9b003, 
> pmd=00023d4b3003, pte=36298607bd3
> 
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with zeroed page after fork() + CoW for pfn mappings. we
> don't always have a hardware-managed access flag on arm64."
> 
> This patch fix it by calling pte_mkyoung. Also, the parameter is
> changed because vmf should be passed to cow_user_page()
> 
> Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
> in case there can be some obscure use-case.(by Kirill)
> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> Reported-by: Yibo Cai 
> Signed-off-by: Jia He 

Acked-by: Kirill A. Shutemov 

-- 
 Kirill A. Shutemov


[PATCH v7 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-20 Thread Jia He
When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

Below call trace is from arm64 do_page_fault for debugging purpose
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[9b007000] pgd=00023d4f8003, pud=00023da9b003, 
pmd=00023d4b3003, pte=36298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fix it by calling pte_mkyoung. Also, the parameter is
changed because vmf should be passed to cow_user_page()

Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
in case there can be some obscure use-case.(by Kirill)

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Reported-by: Yibo Cai 
Signed-off-by: Jia He 
---
 mm/memory.c | 67 -
 1 file changed, 61 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..3e39e40fee87 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+   return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
randomize_va_space = 0;
@@ -2140,8 +2147,13 @@ static inline int pte_unmap_same(struct mm_struct *mm, 
pmd_t *pmd,
return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned 
long va, struct vm_area_struct *vma)
+static inline int cow_user_page(struct page *dst, struct page *src,
+   struct vm_fault *vmf)
 {
+   struct vm_area_struct *vma = vmf->vma;
+   struct mm_struct *mm = vma->vm_mm;
+   unsigned long addr = vmf->address;
+
debug_dma_assert_idle(src);
 
/*
@@ -2151,21 +2163,53 @@ static inline void cow_user_page(struct page *dst, 
struct page *src, unsigned lo
 * fails, we just zero-fill it. Live with it.
 */
if (unlikely(!src)) {
-   void *kaddr = kmap_atomic(dst);
-   void __user *uaddr = (void __user *)(va & PAGE_MASK);
+   void *kaddr;
+   pte_t entry;
+   void __user *uaddr = (void __user *)(addr & PAGE_MASK);
 
+   /* On architectures with software "accessed" bits, we would
+* take a double page fault, so mark it accessed here.
+*/
+   if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+   vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr,
+  >ptl);
+   if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+   entry = pte_mkyoung(vmf->orig_pte);
+   if (ptep_set_access_flags(vma, addr,
+ vmf->pte, entry, 0))
+   update_mmu_cache(vma, addr, vmf->pte);
+   } else {
+   /* Other thread has already handled the fault
+* and we don't need to do anything. If it's
+* not the case, the fault will be triggered
+* again on the same address.
+*/
+   pte_unmap_unlock(vmf->pte, vmf->ptl);
+   return -1;
+   }
+   pte_unmap_unlock(vmf->pte, vmf->ptl);
+   }
+
+   kaddr = kmap_atomic(dst);
/*
 * This really shouldn't fail, because the page is there
 * in the page tables. But it might just be unreadable,
 * in which case we just give up and fill the result with
 * zeroes.
 */
-   if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
+   if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+   /* Give a warn in case there can be some obscure
+* use-case
+*/
+   WARN_ON_ONCE(1);