Re: [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-30 Thread David Hildenbrand

On 30.01.24 09:45, Ryan Roberts wrote:

On 30/01/2024 08:37, David Hildenbrand wrote:

On 30.01.24 09:31, Ryan Roberts wrote:

On 29/01/2024 14:32, David Hildenbrand wrote:

We don't need up-to-date accessed-dirty information for anon folios and can
simply work with the ptent we already have. Also, we know the RSS counter
we want to update.

We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() +
zap_install_uffd_wp_if_needed() after updating the folio and RSS.

While at it, only call zap_install_uffd_wp_if_needed() if there is even
any chance that pte_install_uffd_wp_if_needed() would do *something*.
That is, just don't bother if uffd-wp does not apply.

Signed-off-by: David Hildenbrand 
---
   mm/memory.c | 16 +++-
   1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 69502cdc0a7d..20bc13ab8db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1552,12 +1552,9 @@ static inline void zap_present_pte(struct mmu_gather
*tlb,
   folio = page_folio(page);
   if (unlikely(!should_zap_folio(details, folio)))
   return;
-    ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
-    arch_check_zapped_pte(vma, ptent);
-    tlb_remove_tlb_entry(tlb, pte, addr);
-    zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
     if (!folio_test_anon(folio)) {
+    ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
   if (pte_dirty(ptent)) {
   folio_mark_dirty(folio);
   if (tlb_delay_rmap(tlb)) {
@@ -1567,8 +1564,17 @@ static inline void zap_present_pte(struct mmu_gather
*tlb,
   }
   if (pte_young(ptent) && likely(vma_has_recency(vma)))
   folio_mark_accessed(folio);
+    rss[mm_counter(folio)]--;
+    } else {
+    /* We don't need up-to-date accessed/dirty bits. */
+    ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
+    rss[MM_ANONPAGES]--;
   }
-    rss[mm_counter(folio)]--;
+    arch_check_zapped_pte(vma, ptent);


Isn't the x86 (only) implementation of this relying on the dirty bit? So doesn't
that imply you still need get_and_clear for anon? (And in hindsight I think that
logic would apply to the previous patch too?)


x86 uses the encoding !writable && dirty to indicate special shadow stacks. That
is, the hw dirty bit is set by software (to create that combination), not by
hardware.

So you don't have to sync against any hw changes of the hw dirty bit. What you
had in the original PTE you read is sufficient.



Right, got it. In that case:


Thanks a lot for paying that much attention during your reviews! Highly 
appreciated!




Reviewed-by: Ryan Roberts 




--
Cheers,

David / dhildenb



Re: [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-30 Thread Ryan Roberts
On 30/01/2024 08:37, David Hildenbrand wrote:
> On 30.01.24 09:31, Ryan Roberts wrote:
>> On 29/01/2024 14:32, David Hildenbrand wrote:
>>> We don't need up-to-date accessed-dirty information for anon folios and can
>>> simply work with the ptent we already have. Also, we know the RSS counter
>>> we want to update.
>>>
>>> We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() +
>>> zap_install_uffd_wp_if_needed() after updating the folio and RSS.
>>>
>>> While at it, only call zap_install_uffd_wp_if_needed() if there is even
>>> any chance that pte_install_uffd_wp_if_needed() would do *something*.
>>> That is, just don't bother if uffd-wp does not apply.
>>>
>>> Signed-off-by: David Hildenbrand 
>>> ---
>>>   mm/memory.c | 16 +++-
>>>   1 file changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 69502cdc0a7d..20bc13ab8db2 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -1552,12 +1552,9 @@ static inline void zap_present_pte(struct mmu_gather
>>> *tlb,
>>>   folio = page_folio(page);
>>>   if (unlikely(!should_zap_folio(details, folio)))
>>>   return;
>>> -    ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
>>> -    arch_check_zapped_pte(vma, ptent);
>>> -    tlb_remove_tlb_entry(tlb, pte, addr);
>>> -    zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
>>>     if (!folio_test_anon(folio)) {
>>> +    ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
>>>   if (pte_dirty(ptent)) {
>>>   folio_mark_dirty(folio);
>>>   if (tlb_delay_rmap(tlb)) {
>>> @@ -1567,8 +1564,17 @@ static inline void zap_present_pte(struct mmu_gather
>>> *tlb,
>>>   }
>>>   if (pte_young(ptent) && likely(vma_has_recency(vma)))
>>>   folio_mark_accessed(folio);
>>> +    rss[mm_counter(folio)]--;
>>> +    } else {
>>> +    /* We don't need up-to-date accessed/dirty bits. */
>>> +    ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
>>> +    rss[MM_ANONPAGES]--;
>>>   }
>>> -    rss[mm_counter(folio)]--;
>>> +    arch_check_zapped_pte(vma, ptent);
>>
>> Isn't the x86 (only) implementation of this relying on the dirty bit? So doesn't
>> that imply you still need get_and_clear for anon? (And in hindsight I think that
>> logic would apply to the previous patch too?)
> 
> x86 uses the encoding !writable && dirty to indicate special shadow stacks. That
> is, the hw dirty bit is set by software (to create that combination), not by
> hardware.
> 
> So you don't have to sync against any hw changes of the hw dirty bit. What you
> had in the original PTE you read is sufficient.
> 

Right, got it. In that case:

Reviewed-by: Ryan Roberts 




Re: [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-30 Thread David Hildenbrand

On 30.01.24 09:31, Ryan Roberts wrote:

On 29/01/2024 14:32, David Hildenbrand wrote:

We don't need up-to-date accessed-dirty information for anon folios and can
simply work with the ptent we already have. Also, we know the RSS counter
we want to update.

We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() +
zap_install_uffd_wp_if_needed() after updating the folio and RSS.

While at it, only call zap_install_uffd_wp_if_needed() if there is even
any chance that pte_install_uffd_wp_if_needed() would do *something*.
That is, just don't bother if uffd-wp does not apply.

Signed-off-by: David Hildenbrand 
---
  mm/memory.c | 16 +++-
  1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 69502cdc0a7d..20bc13ab8db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1552,12 +1552,9 @@ static inline void zap_present_pte(struct mmu_gather 
*tlb,
folio = page_folio(page);
if (unlikely(!should_zap_folio(details, folio)))
return;
-   ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
-   arch_check_zapped_pte(vma, ptent);
-   tlb_remove_tlb_entry(tlb, pte, addr);
-   zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
  
  	if (!folio_test_anon(folio)) {

+   ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
if (pte_dirty(ptent)) {
folio_mark_dirty(folio);
if (tlb_delay_rmap(tlb)) {
@@ -1567,8 +1564,17 @@ static inline void zap_present_pte(struct mmu_gather 
*tlb,
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
folio_mark_accessed(folio);
+   rss[mm_counter(folio)]--;
+   } else {
+   /* We don't need up-to-date accessed/dirty bits. */
+   ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
+   rss[MM_ANONPAGES]--;
}
-   rss[mm_counter(folio)]--;
+   arch_check_zapped_pte(vma, ptent);


Isn't the x86 (only) implementation of this relying on the dirty bit? So doesn't
that imply you still need get_and_clear for anon? (And in hindsight I think that
logic would apply to the previous patch too?)


x86 uses the encoding !writable && dirty to indicate special shadow 
stacks. That is, the hw dirty bit is set by software (to create that 
combination), not by hardware.


So you don't have to sync against any hw changes of the hw dirty bit. 
What you had in the original PTE you read is sufficient.
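
In code, that encoding check is essentially x86's pte_shstk(), quoted here
only as a reference (it is not part of this patch):

	static inline bool pte_shstk(pte_t pte)
	{
		/* Shadow-stack encoding: Write=0 && Dirty=1 */
		return cpu_feature_enabled(X86_FEATURE_SHSTK) &&
		       (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
	}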


--
Cheers,

David / dhildenb



Re: [PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-30 Thread Ryan Roberts
On 29/01/2024 14:32, David Hildenbrand wrote:
> We don't need up-to-date accessed-dirty information for anon folios and can
> simply work with the ptent we already have. Also, we know the RSS counter
> we want to update.
> 
> We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() +
> zap_install_uffd_wp_if_needed() after updating the folio and RSS.
> 
> While at it, only call zap_install_uffd_wp_if_needed() if there is even
> any chance that pte_install_uffd_wp_if_needed() would do *something*.
> That is, just don't bother if uffd-wp does not apply.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  mm/memory.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 69502cdc0a7d..20bc13ab8db2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1552,12 +1552,9 @@ static inline void zap_present_pte(struct mmu_gather 
> *tlb,
>   folio = page_folio(page);
>   if (unlikely(!should_zap_folio(details, folio)))
>   return;
> - ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
> - arch_check_zapped_pte(vma, ptent);
> - tlb_remove_tlb_entry(tlb, pte, addr);
> - zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
>  
>   if (!folio_test_anon(folio)) {
> + ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
>   if (pte_dirty(ptent)) {
>   folio_mark_dirty(folio);
>   if (tlb_delay_rmap(tlb)) {
> @@ -1567,8 +1564,17 @@ static inline void zap_present_pte(struct mmu_gather 
> *tlb,
>   }
>   if (pte_young(ptent) && likely(vma_has_recency(vma)))
>   folio_mark_accessed(folio);
> + rss[mm_counter(folio)]--;
> + } else {
> + /* We don't need up-to-date accessed/dirty bits. */
> + ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
> + rss[MM_ANONPAGES]--;
>   }
> - rss[mm_counter(folio)]--;
> + arch_check_zapped_pte(vma, ptent);

Isn't the x86 (only) implementation of this relying on the dirty bit? So doesn't
that imply you still need get_and_clear for anon? (And in hindsight I think that
logic would apply to the previous patch too?)

Impl:

void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte)
{
/*
 * Hardware before shadow stack can (rarely) set Dirty=1
 * on a Write=0 PTE. So the below condition
 * only indicates a software bug when shadow stack is
 * supported by the HW. This checking is covered in
 * pte_shstk().
 */
VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
pte_shstk(pte));
}

static inline bool pte_shstk(pte_t pte)
{
return cpu_feature_enabled(X86_FEATURE_SHSTK) &&
   (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
}


> + tlb_remove_tlb_entry(tlb, pte, addr);
> + if (unlikely(userfaultfd_pte_wp(vma, ptent)))
> + zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
> +
>   if (!delay_rmap) {
>   folio_remove_rmap_pte(folio, page, vma);
>   if (unlikely(page_mapcount(page) < 0))



[PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-29 Thread David Hildenbrand
We don't need up-to-date accessed-dirty information for anon folios and can
simply work with the ptent we already have. Also, we know the RSS counter
we want to update.

We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() +
zap_install_uffd_wp_if_needed() after updating the folio and RSS.

While at it, only call zap_install_uffd_wp_if_needed() if there is even
any chance that pte_install_uffd_wp_if_needed() would do *something*.
That is, just don't bother if uffd-wp does not apply.

Signed-off-by: David Hildenbrand 
---
 mm/memory.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 69502cdc0a7d..20bc13ab8db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1552,12 +1552,9 @@ static inline void zap_present_pte(struct mmu_gather 
*tlb,
folio = page_folio(page);
if (unlikely(!should_zap_folio(details, folio)))
return;
-   ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
-   arch_check_zapped_pte(vma, ptent);
-   tlb_remove_tlb_entry(tlb, pte, addr);
-   zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
 
if (!folio_test_anon(folio)) {
+   ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
if (pte_dirty(ptent)) {
folio_mark_dirty(folio);
if (tlb_delay_rmap(tlb)) {
@@ -1567,8 +1564,17 @@ static inline void zap_present_pte(struct mmu_gather 
*tlb,
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
folio_mark_accessed(folio);
+   rss[mm_counter(folio)]--;
+   } else {
+   /* We don't need up-to-date accessed/dirty bits. */
+   ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
+   rss[MM_ANONPAGES]--;
}
-   rss[mm_counter(folio)]--;
+   arch_check_zapped_pte(vma, ptent);
+   tlb_remove_tlb_entry(tlb, pte, addr);
+   if (unlikely(userfaultfd_pte_wp(vma, ptent)))
+   zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
+
if (!delay_rmap) {
folio_remove_rmap_pte(folio, page, vma);
if (unlikely(page_mapcount(page) < 0))
-- 
2.43.0
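
For readability, here is roughly how the touched region of zap_present_pte()
reads with the patch applied, reassembled from the hunks above (lines the diff
does not show are elided):

	folio = page_folio(page);
	if (unlikely(!should_zap_folio(details, folio)))
		return;

	if (!folio_test_anon(folio)) {
		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
		if (pte_dirty(ptent)) {
			folio_mark_dirty(folio);
			if (tlb_delay_rmap(tlb)) {
				... /* delayed-rmap handling, not shown in the diff */
			}
		}
		if (pte_young(ptent) && likely(vma_has_recency(vma)))
			folio_mark_accessed(folio);
		rss[mm_counter(folio)]--;
	} else {
		/* We don't need up-to-date accessed/dirty bits. */
		ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
		rss[MM_ANONPAGES]--;
	}
	arch_check_zapped_pte(vma, ptent);
	tlb_remove_tlb_entry(tlb, pte, addr);
	if (unlikely(userfaultfd_pte_wp(vma, ptent)))
		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);

	/* ... rmap removal etc. follows, unchanged by this patch ... */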