Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-21 Thread Jay Patel
On Jul 19 2023, Aneesh Kumar K V wrote:
> On 7/19/23 10:34 AM, Hugh Dickins wrote:
> > On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> >> Hugh Dickins  writes:
> >>
> >>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> >>> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> >>> stricter than the previous implementation, which skipped when pmd_none()
> >>> (with a comment on khugepaged collapse transitions): but wouldn't we want
> >>> to know, if an assert_pte_locked() caller can be racing such transitions?
> >>>
> >>
> >> The reason we had that pmd_none check there was to handle khugpaged. In
> >> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> >> ppc64 had the assert_pte_locked check inside that ptep_clear.
> >>
> >> _pmd = pmdp_collapse_flush(vma, address, pmd);
> >> ..
> >> ptep_clear()
> >> -> asset_ptep_locked()
> >> ---> pmd_none
> >> -> BUG
> >>
> >>
> >> The problem is how assert_pte_locked() verify whether we are holding
> >> ptl. It does that by walking the page table again and in this specific
> >> case by the time we call the function we already had cleared pmd .
> > 
> > Aneesh, please clarify, I've spent hours on this.
> > 
> > From all your use of past tense ("had"), I thought you were Acking my
> > patch; but now, after looking again at v3.11 source and today's,
> > I think you are NAKing my patch in its present form.
> > 
> 
> Sorry for the confusion my reply created. 
> 
> > You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> > uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> > *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> > Is that your point?
> > 
> 
> Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
> But a code inspection tells me we will hit that BUG on powerpc because of
> the above details.
>
Hi Aneesh,

After testing it, I can confirm that it encountered a BUG on powerpc.
Log report as attached

Thanks,
Jay Patel 
> > I can easily restore that khugepaged comment (which had appeared to me
> > out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> > check: but please clarify.
> > 
> 
> That is correct. if we add that pmd_none check back we should be good here.
> 
> 
> -aneesh
[   53.513058][  T105] [ cut here ]
[   53.513080][  T105] kernel BUG at arch/powerpc/mm/pgtable.c:327!
[   53.513090][  T105] Oops: Exception in kernel mode, sig: 5 [#1]
[   53.513099][  T105] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   53.513109][  T105] Modules linked in: bonding pseries_rng rng_core 
vmx_crypto gf128mul ibmveth crc32c_vpmsum fuse autofs4  

[   53.513135][  T105] CPU: 3 PID: 105 Comm: khugepaged Not tainted 
6.5.0-rc1-gebfaf626e99f-dirty #1
[   53.513146][  T105] Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 
0xf05 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries   
  
[   53.513156][  T105] NIP:  c0079478 LR: c007946c CTR: 

[   53.513165][  T105] REGS: c8e9b930 TRAP: 0700   Not tainted  
(6.5.0-rc1-gebfaf626e99f-dirty) 
   
[   53.513175][  T105] MSR:  8282b033  
 CR: 24002882  XER: 2004
   
[   53.513202][  T105] CFAR: c0412a30 IRQMASK: 0
[   53.513202][  T105] GPR00: c007946c c8e9bbd0 
c12d3500 0001
[   53.513202][  T105] GPR04: 1100 c8e9bbb0 
c8e9bbf0 
[   53.513202][  T105] GPR08: 03ff  
 000a
[   53.513202][  T105] GPR12: 497b c0001ec9d480 
c00c0016fe00 c00051455000
[   53.513202][  T105] GPR16:   
0001 0001
[   53.513202][  T105] GPR20: c2912430 c00051455000 
 c946e650
[   53.513202][  T105] GPR24: c29800e8 1100 
c00c00145168 c2980180
[   53.513202][  T105] GPR28: 1100 8603f85b80c0 
c8e9bc70 c0001bd0d680
[   53.513304][  T105] NIP [c0079478] 
assert_pte_locked.part.18+0x168/0x1b0
[   53.513318][  T105] LR [c007946c] 
assert_pte_locked.part.18+0x15c/0x1b0
[   53.513329][  T105] Call Trace:
[   53.513335][  T105] [c8e9bbd0] [c007946c] 
assert_pte_locked.part.18+0x15c/0x1b0 (unreliable)  
  
[   53.513350][  T105] [c8e9bc00] [c048e10c] 
collapse_huge_page+0x11dc/0x1700
[   53.513362][  T105] [c8e9bd40] 

Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Aneesh Kumar K V
On 7/19/23 10:34 AM, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
>> Hugh Dickins  writes:
>>
>>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
>>> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
>>> stricter than the previous implementation, which skipped when pmd_none()
>>> (with a comment on khugepaged collapse transitions): but wouldn't we want
>>> to know, if an assert_pte_locked() caller can be racing such transitions?
>>>
>>
>> The reason we had that pmd_none check there was to handle khugpaged. In
>> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
>> ppc64 had the assert_pte_locked check inside that ptep_clear.
>>
>> _pmd = pmdp_collapse_flush(vma, address, pmd);
>> ..
>> ptep_clear()
>> -> asset_ptep_locked()
>> ---> pmd_none
>> -> BUG
>>
>>
>> The problem is how assert_pte_locked() verify whether we are holding
>> ptl. It does that by walking the page table again and in this specific
>> case by the time we call the function we already had cleared pmd .
> 
> Aneesh, please clarify, I've spent hours on this.
> 
> From all your use of past tense ("had"), I thought you were Acking my
> patch; but now, after looking again at v3.11 source and today's,
> I think you are NAKing my patch in its present form.
> 

Sorry for the confusion my reply created. 

> You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> Is that your point?
> 

Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
But a code inspection tells me we will hit that BUG on powerpc because of
the above details.

> I can easily restore that khugepaged comment (which had appeared to me
> out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> check: but please clarify.
> 

That is correct. if we add that pmd_none check back we should be good here.


-aneesh


Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Hugh Dickins
On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> Hugh Dickins  writes:
> 
> > Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> > in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> > stricter than the previous implementation, which skipped when pmd_none()
> > (with a comment on khugepaged collapse transitions): but wouldn't we want
> > to know, if an assert_pte_locked() caller can be racing such transitions?
> >
> 
> The reason we had that pmd_none check there was to handle khugpaged. In
> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> ppc64 had the assert_pte_locked check inside that ptep_clear.
> 
> _pmd = pmdp_collapse_flush(vma, address, pmd);
> ..
> ptep_clear()
> -> asset_ptep_locked()
> ---> pmd_none
> -> BUG
> 
> 
> The problem is how assert_pte_locked() verify whether we are holding
> ptl. It does that by walking the page table again and in this specific
> case by the time we call the function we already had cleared pmd .

Aneesh, please clarify, I've spent hours on this.

>From all your use of past tense ("had"), I thought you were Acking my
patch; but now, after looking again at v3.11 source and today's,
I think you are NAKing my patch in its present form.

You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
*pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
Is that your point?

I can easily restore that khugepaged comment (which had appeared to me
out of date at the time, but now looks still relevant) and pmd_none(*pmd)
check: but please clarify.

Thanks,
Hugh

> >
> > This mod might cause new crashes: which either expose my ignorance, or
> > indicate issues to be fixed, or limit the usage of assert_pte_locked().
> >
> > Signed-off-by: Hugh Dickins 
> > ---
> >  arch/powerpc/mm/pgtable.c | 16 ++--
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index cb2dcdb18f8e..16b061af86d7 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> > long addr)
> > p4d_t *p4d;
> > pud_t *pud;
> > pmd_t *pmd;
> > +   pte_t *pte;
> > +   spinlock_t *ptl;
> >  
> > if (mm == _mm)
> > return;
> > @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> > long addr)
> > pud = pud_offset(p4d, addr);
> > BUG_ON(pud_none(*pud));
> > pmd = pmd_offset(pud, addr);
> > -   /*
> > -* khugepaged to collapse normal pages to hugepage, first set
> > -* pmd to none to force page fault/gup to take mmap_lock. After
> > -* pmd is set to none, we do a pte_clear which does this assertion
> > -* so if we find pmd none, return.
> > -*/
> > -   if (pmd_none(*pmd))
> > -   return;
> > -   BUG_ON(!pmd_present(*pmd));
> > -   assert_spin_locked(pte_lockptr(mm, pmd));
> > +   pte = pte_offset_map_nolock(mm, pmd, addr, );
> > +   BUG_ON(!pte);
> > +   assert_spin_locked(ptl);
> > +   pte_unmap(pte);
> >  }
> >  #endif /* CONFIG_DEBUG_VM */
> >  
> > -- 
> > 2.35.3


Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-18 Thread Aneesh Kumar K.V
Hugh Dickins  writes:

> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> stricter than the previous implementation, which skipped when pmd_none()
> (with a comment on khugepaged collapse transitions): but wouldn't we want
> to know, if an assert_pte_locked() caller can be racing such transitions?
>

The reason we had that pmd_none check there was to handle khugpaged. In
case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
ppc64 had the assert_pte_locked check inside that ptep_clear.

_pmd = pmdp_collapse_flush(vma, address, pmd);
..
ptep_clear()
-> asset_ptep_locked()
---> pmd_none
-> BUG


The problem is how assert_pte_locked() verify whether we are holding
ptl. It does that by walking the page table again and in this specific
case by the time we call the function we already had cleared pmd .
>
> This mod might cause new crashes: which either expose my ignorance, or
> indicate issues to be fixed, or limit the usage of assert_pte_locked().
>
> Signed-off-by: Hugh Dickins 
> ---
>  arch/powerpc/mm/pgtable.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..16b061af86d7 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> long addr)
>   p4d_t *p4d;
>   pud_t *pud;
>   pmd_t *pmd;
> + pte_t *pte;
> + spinlock_t *ptl;
>  
>   if (mm == _mm)
>   return;
> @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
> long addr)
>   pud = pud_offset(p4d, addr);
>   BUG_ON(pud_none(*pud));
>   pmd = pmd_offset(pud, addr);
> - /*
> -  * khugepaged to collapse normal pages to hugepage, first set
> -  * pmd to none to force page fault/gup to take mmap_lock. After
> -  * pmd is set to none, we do a pte_clear which does this assertion
> -  * so if we find pmd none, return.
> -  */
> - if (pmd_none(*pmd))
> - return;
> - BUG_ON(!pmd_present(*pmd));
> - assert_spin_locked(pte_lockptr(mm, pmd));
> + pte = pte_offset_map_nolock(mm, pmd, addr, );
> + BUG_ON(!pte);
> + assert_spin_locked(ptl);
> + pte_unmap(pte);
>  }
>  #endif /* CONFIG_DEBUG_VM */
>  
> -- 
> 2.35.3


[PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

2023-07-11 Thread Hugh Dickins
Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
stricter than the previous implementation, which skipped when pmd_none()
(with a comment on khugepaged collapse transitions): but wouldn't we want
to know, if an assert_pte_locked() caller can be racing such transitions?

This mod might cause new crashes: which either expose my ignorance, or
indicate issues to be fixed, or limit the usage of assert_pte_locked().

Signed-off-by: Hugh Dickins 
---
 arch/powerpc/mm/pgtable.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..16b061af86d7 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long 
addr)
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
+   pte_t *pte;
+   spinlock_t *ptl;
 
if (mm == _mm)
return;
@@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned 
long addr)
pud = pud_offset(p4d, addr);
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, addr);
-   /*
-* khugepaged to collapse normal pages to hugepage, first set
-* pmd to none to force page fault/gup to take mmap_lock. After
-* pmd is set to none, we do a pte_clear which does this assertion
-* so if we find pmd none, return.
-*/
-   if (pmd_none(*pmd))
-   return;
-   BUG_ON(!pmd_present(*pmd));
-   assert_spin_locked(pte_lockptr(mm, pmd));
+   pte = pte_offset_map_nolock(mm, pmd, addr, );
+   BUG_ON(!pte);
+   assert_spin_locked(ptl);
+   pte_unmap(pte);
 }
 #endif /* CONFIG_DEBUG_VM */
 
-- 
2.35.3