Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
On 2018/7/25 18:44, Laurent Dufour wrote:
> On 25/07/2018 11:04, zhong jiang wrote:
>> On 2018/7/25 0:10, Laurent Dufour wrote:
>>> On 24/07/2018 16:26, zhong jiang wrote:
>>>> On 2018/5/17 19:06, Laurent Dufour wrote:
>>>>> [...]
Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
On 25/07/2018 11:04, zhong jiang wrote:
> On 2018/7/25 0:10, Laurent Dufour wrote:
>> On 24/07/2018 16:26, zhong jiang wrote:
>>> On 2018/5/17 19:06, Laurent Dufour wrote:
>>>> [...]
Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
On 2018/7/25 0:10, Laurent Dufour wrote:
> On 24/07/2018 16:26, zhong jiang wrote:
>> On 2018/5/17 19:06, Laurent Dufour wrote:
>>> [...]
Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
On 2018/5/17 19:06, Laurent Dufour wrote:
> [...]
Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
On 24/07/2018 16:26, zhong jiang wrote:
> On 2018/5/17 19:06, Laurent Dufour wrote:
>> [...]
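The scheme in this patch's commit message (snapshot the VMA sequence count, do the work without mmap_sem, then validate before updating the PTE and fall back with a retry if anything changed) can be illustrated with a minimal user-space sketch. This is not the kernel code: every toy_* name below is hypothetical, the "PTE" is just an unsigned long, and the SRCU protection and PTE locking that the real patch relies on are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA: a sequence counter plus one field. */
struct toy_vma {
	atomic_uint sequence;	/* odd while a writer is modifying the VMA */
	unsigned long flags;
};

/* Hypothetical stand-in for struct vm_fault. */
struct toy_fault {
	struct toy_vma *vma;
	unsigned int sequence;	/* snapshot taken when the fault started */
};

enum toy_result { TOY_DONE, TOY_RETRY };

/* Writers bump the counter before and after changing the VMA. */
static void toy_write_begin(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->sequence, 1);
}

static void toy_write_end(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->sequence, 1);
}

/* Start a speculative fault; refuse (rather than wait) if a writer is active. */
static bool toy_fault_begin(struct toy_fault *f, struct toy_vma *vma)
{
	unsigned int seq = atomic_load(&vma->sequence);

	if (seq & 1)
		return false;
	f->vma = vma;
	f->sequence = seq;
	return true;
}

/* True if the VMA was touched since the snapshot was taken. */
static bool toy_vma_has_changed(const struct toy_fault *f)
{
	return atomic_load(&f->vma->sequence) != f->sequence;
}

/* Validate the snapshot, then either install the new value or ask for a retry. */
static enum toy_result toy_fault_commit(struct toy_fault *f,
					unsigned long *pte, unsigned long val)
{
	if (toy_vma_has_changed(f))
		return TOY_RETRY;
	*pte = val;
	return TOY_DONE;
}
```

A caller that gets TOY_RETRY (or a failed toy_fault_begin()) would fall back to the ordinary, lock-holding fault path, which is the role VM_FAULT_RETRY plays in the patch.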
[PATCH v11 19/26] mm: provide speculative fault infrastructure
From: Peter Zijlstra

Provide infrastructure to do a speculative fault (not holding
mmap_sem).

Not holding mmap_sem means we can race against VMA change/removal and
page-table destruction. We use the SRCU VMA freeing to keep the VMA
around. We use the VMA seqcount to detect change (including unmapping /
page-table deletion) and we use gup_fast() style page-table walking to
deal with page-table races.

Once we've obtained the page and are ready to update the PTE, we
validate that the state we started the fault with is still valid. If
not, we fail the fault with VM_FAULT_RETRY; otherwise we update the
PTE and we're done.

Signed-off-by: Peter Zijlstra (Intel)

[Manage the newly introduced pte_spinlock() for speculative page
 fault to fail if the VMA is touched behind our back]
[Rename vma_is_dead() to vma_has_changed() and declare it here]
[Fetch p4d and pud]
[Set vmf.sequence in __handle_mm_fault()]
[Abort speculative path when handle_userfault() has to be called]
[Add additional VMA's flags checks in handle_speculative_fault()]
[Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
[Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
[Remove warning comment about waiting for !seq&1 since we don't want
 to wait]
[Remove warning about no huge page support, mention it explicitly]
[Don't call do_fault() in the speculative path as __do_fault() calls
 vma->vm_ops->fault() which may want to release mmap_sem]
[Only vm_fault pointer argument for vma_has_changed()]
[Fix check against huge page, calling pmd_trans_huge()]
[Use READ_ONCE() when reading VMA's fields in the speculative path]
[Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support the
 processing done in vm_normal_page()]
[Check that vma->anon_vma is already set when starting the speculative
 path]
[Check for memory policy as we can't support MPOL_INTERLEAVE case due to
 the processing done in mpol_misplaced()]
[Don't support VMA growing up or down]
[Move check on vm_sequence just before calling handle_pte_fault()]
[Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
[Add mem cgroup oom check]
[Use READ_ONCE to access p*d entries]
[Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
[Don't fetch pte again in handle_pte_fault() when running the speculative
 path]
[Check PMD against concurrent collapsing operation]
[Try spin lock the pte during the speculative path to avoid deadlock with
 other CPUs invalidating the TLB and requiring this CPU to catch the
 inter-processor interrupt]
[Move define of FAULT_FLAG_SPECULATIVE here]
[Introduce __handle_speculative_fault() and add a check against
 mm->mm_users in handle_speculative_fault() defined in mm.h]
Signed-off-by: Laurent Dufour
---
 include/linux/hugetlb_inline.h |   2 +-
 include/linux/mm.h             |  30
 include/linux/pagemap.h        |   4 +-
 mm/internal.h                  |  16 +-
 mm/memory.c                    | 340 -
 5 files changed, 385 insertions(+), 7 deletions(-)

diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index 0660a03d37d9..9e25283d6fc9 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -8,7 +8,7 @@
 
 static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
 {
-	return !!(vma->vm_flags & VM_HUGETLB);
+	return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
 }
 
 #else
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 05cbba70104b..31acf98a7d92 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -315,6 +315,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
 #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION	0x100	/* The fault was during an instruction fetch */
+#define FAULT_FLAG_SPECULATIVE	0x200	/* Speculative fault, not holding mmap_sem */
 
 #define FAULT_FLAG_TRACE \
 	{ FAULT_FLAG_WRITE,		"WRITE" }, \
@@ -343,6 +344,10 @@ struct vm_fault {
 	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 	pgoff_t pgoff;			/* Logical page offset based on vma */
 	unsigned long address;		/* Faulting virtual address */
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+	unsigned int sequence;
+	pmd_t orig_pmd;			/* value of PMD at the time of fault */
+#endif
 	pmd_t *pmd;			/* Pointer to pmd entry matching
 					 * the 'address' */
 	pud_t *pud;			/* Pointer to pud entry matching
@@ -1415,6 +1420,31 @@ int invalidate_inode_page(struct page *page);
 #ifdef CONFIG_MMU
 extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags);
+
+#ifdef