Re: [PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-18 Thread Aneesh Kumar K.V
Aneesh Kumar K V  writes:

> On 10/18/23 11:25 AM, Christophe Leroy wrote:
>> 
>> 
>> Le 18/10/2023 à 06:55, Aneesh Kumar K.V a écrit :
>>> With commit 9fee28baa601 ("powerpc: implement the new page table range
>>> API") we added set_ptes to powerpc architecture but the implementation
>>> missed calling the pte filter for all the ptes we are setting in the
>>> range. set_pte_filter can be used for filter pte values and on some
>>> platforms which don't support coherent icache it clears the exec bit so
>>> that we can flush the icache on exec fault
>>>
>>> The patch also removes the usage of arch_enter/leave_lazy_mmu() because
>>> set_pte is not supposed to be used when updating a pte entry. Powerpc
>>> architecture uses this rule to skip the expensive tlb invalidate which
>>> is not needed when you are setting up the pte for the first time. See
>>> commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
>>> _PAGE_NUMA bit") for more details
>>>
>>> Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   arch/powerpc/mm/pgtable.c | 33 -
>>>   1 file changed, 20 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>>> index 3ba9fe411604..95ab20cca2da 100644
>>> --- a/arch/powerpc/mm/pgtable.c
>>> +++ b/arch/powerpc/mm/pgtable.c
>>> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long 
>>> addr, pte_t *ptep,
>>> pte_t pte, unsigned int nr)
>>>   {
>>> /*
>>> -* Make sure hardware valid bit is not set. We don't do
>>> -* tlb flush for this update.
>>> +* We don't need to call arch_enter/leave_lazy_mmu_mode()
>>> +* because we expect set_ptes to be only be used on not present
>>> +* and not hw_valid ptes. Hence there is not translation cache flush
>>> +* involved that need to be batched.
>>>  */
>>> -   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>> +   for (;;) {
>>>   
>>> -   /* Note: mm->context.id might not yet have been assigned as
>>> -* this context might not have been activated yet when this
>>> -* is called.
>>> -*/
>>> -   pte = set_pte_filter(pte);
>>> +   /*
>>> +* Make sure hardware valid bit is not set. We don't do
>>> +* tlb flush for this update.
>>> +*/
>>> +   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>>   
>>> -   /* Perform the setting of the PTE */
>>> -   arch_enter_lazy_mmu_mode();
>>> -   for (;;) {
>>> +   /* Note: mm->context.id might not yet have been assigned as
>>> +* this context might not have been activated yet when this
>>> +* is called.
>>> +*/
>>> +   pte = set_pte_filter(pte);
>> 
>> Why do you need to call set_pte_filter() inside the loop ?
>> The only difference between previous pte and next pte is the RPN, other 
>> flags remain untouched so I can't see why you need to call 
>> set_pte_filter() again.
>> 
>
> I missed the fact that we use the filtered pte in all the ptes in the range. 
> One other details
> that made me look at calling the filter in the loop was we clearing the 
> struct page->flags.
> The only flag right now we care about the PG_dcache_clean and that moved to 
> folio. So we might be
> good here. May be we add a comment in set_pte_filter saying can operate only 
> on folio->flags ? 
>
>>> +
>>> +   /* Perform the setting of the PTE */
>>> __set_pte_at(mm, addr, ptep, pte, 0);
>>> if (--nr == 0)
>>> break;
>>> ptep++;
>>> -   pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>>> addr += PAGE_SIZE;
>>> +   /* increment the pfn */
>>> +   pte = __pte(pte_val(pte) + PAGE_SIZE);
>> 
>> PAGE_SIZE doesn't work on all platforms, see for instance e500.
>> 
>> see comment at 
>> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147
>> 
>> And then you see 
>> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63
>> 
>
> Didn't know that. I actually wanted to do pfn_pte(pte_pfn(pte) + 1) . But 
> that needs pgprot_t. I
> can move it back to PTE_RPN_SHIFT with details of the above documented. 
>

something like this ?

>From 62825870d4b48ffb53e9837dfb4cf7c0422732ec Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" 
Date: Fri, 6 Oct 2023 22:47:00 +0530
Subject: [PATCH] powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in
 set_ptes

With commit 9fee28baa601 ("powerpc: implement the new page table range
API") we added set_ptes to powerpc architecture. The implementation
included calling arch_enter/leave_lazy_mmu() calls.

The patch removes the usage of arch_enter/leave_lazy_mmu() because
set_pte is not supposed to be used when updating a pte entry. Powerpc
architecture uses this rule to skip the expensive tlb invalida

Re: [PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-17 Thread Aneesh Kumar K V
On 10/18/23 11:25 AM, Christophe Leroy wrote:
> 
> 
> Le 18/10/2023 à 06:55, Aneesh Kumar K.V a écrit :
>> With commit 9fee28baa601 ("powerpc: implement the new page table range
>> API") we added set_ptes to powerpc architecture but the implementation
>> missed calling the pte filter for all the ptes we are setting in the
>> range. set_pte_filter can be used for filter pte values and on some
>> platforms which don't support coherent icache it clears the exec bit so
>> that we can flush the icache on exec fault
>>
>> The patch also removes the usage of arch_enter/leave_lazy_mmu() because
>> set_pte is not supposed to be used when updating a pte entry. Powerpc
>> architecture uses this rule to skip the expensive tlb invalidate which
>> is not needed when you are setting up the pte for the first time. See
>> commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
>> _PAGE_NUMA bit") for more details
>>
>> Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>   arch/powerpc/mm/pgtable.c | 33 -
>>   1 file changed, 20 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index 3ba9fe411604..95ab20cca2da 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long 
>> addr, pte_t *ptep,
>>  pte_t pte, unsigned int nr)
>>   {
>>  /*
>> - * Make sure hardware valid bit is not set. We don't do
>> - * tlb flush for this update.
>> + * We don't need to call arch_enter/leave_lazy_mmu_mode()
>> + * because we expect set_ptes to be only be used on not present
>> + * and not hw_valid ptes. Hence there is not translation cache flush
>> + * involved that need to be batched.
>>   */
>> -VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>> +for (;;) {
>>   
>> -/* Note: mm->context.id might not yet have been assigned as
>> - * this context might not have been activated yet when this
>> - * is called.
>> - */
>> -pte = set_pte_filter(pte);
>> +/*
>> + * Make sure hardware valid bit is not set. We don't do
>> + * tlb flush for this update.
>> + */
>> +VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>   
>> -/* Perform the setting of the PTE */
>> -arch_enter_lazy_mmu_mode();
>> -for (;;) {
>> +/* Note: mm->context.id might not yet have been assigned as
>> + * this context might not have been activated yet when this
>> + * is called.
>> + */
>> +pte = set_pte_filter(pte);
> 
> Why do you need to call set_pte_filter() inside the loop ?
> The only difference between previous pte and next pte is the RPN, other 
> flags remain untouched so I can't see why you need to call 
> set_pte_filter() again.
> 

I missed the fact that we use the filtered pte in all the ptes in the range. 
One other details
that made me look at calling the filter in the loop was we clearing the struct 
page->flags.
The only flag right now we care about the PG_dcache_clean and that moved to 
folio. So we might be
good here. May be we add a comment in set_pte_filter saying can operate only on 
folio->flags ? 

>> +
>> +/* Perform the setting of the PTE */
>>  __set_pte_at(mm, addr, ptep, pte, 0);
>>  if (--nr == 0)
>>  break;
>>  ptep++;
>> -pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>>  addr += PAGE_SIZE;
>> +/* increment the pfn */
>> +pte = __pte(pte_val(pte) + PAGE_SIZE);
> 
> PAGE_SIZE doesn't work on all platforms, see for instance e500.
> 
> see comment at 
> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147
> 
> And then you see 
> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63
> 

Didn't know that. I actually wanted to do pfn_pte(pte_pfn(pte) + 1) . But that 
needs pgprot_t. I
can move it back to PTE_RPN_SHIFT with details of the above documented. 

>> +
>>  }
>> -arch_leave_lazy_mmu_mode();
>>   }
>>   
>>   void unmap_kernel_page(unsigned long va)
> 
> Christophe

-aneesh


Re: [PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-17 Thread Christophe Leroy


Le 18/10/2023 à 06:55, Aneesh Kumar K.V a écrit :
> With commit 9fee28baa601 ("powerpc: implement the new page table range
> API") we added set_ptes to powerpc architecture but the implementation
> missed calling the pte filter for all the ptes we are setting in the
> range. set_pte_filter can be used for filter pte values and on some
> platforms which don't support coherent icache it clears the exec bit so
> that we can flush the icache on exec fault
> 
> The patch also removes the usage of arch_enter/leave_lazy_mmu() because
> set_pte is not supposed to be used when updating a pte entry. Powerpc
> architecture uses this rule to skip the expensive tlb invalidate which
> is not needed when you are setting up the pte for the first time. See
> commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
> _PAGE_NUMA bit") for more details
> 
> Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
> Signed-off-by: Aneesh Kumar K.V 
> ---
>   arch/powerpc/mm/pgtable.c | 33 -
>   1 file changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 3ba9fe411604..95ab20cca2da 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
> pte_t *ptep,
>   pte_t pte, unsigned int nr)
>   {
>   /*
> -  * Make sure hardware valid bit is not set. We don't do
> -  * tlb flush for this update.
> +  * We don't need to call arch_enter/leave_lazy_mmu_mode()
> +  * because we expect set_ptes to be only be used on not present
> +  * and not hw_valid ptes. Hence there is not translation cache flush
> +  * involved that need to be batched.
>*/
> - VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> + for (;;) {
>   
> - /* Note: mm->context.id might not yet have been assigned as
> -  * this context might not have been activated yet when this
> -  * is called.
> -  */
> - pte = set_pte_filter(pte);
> + /*
> +  * Make sure hardware valid bit is not set. We don't do
> +  * tlb flush for this update.
> +  */
> + VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>   
> - /* Perform the setting of the PTE */
> - arch_enter_lazy_mmu_mode();
> - for (;;) {
> + /* Note: mm->context.id might not yet have been assigned as
> +  * this context might not have been activated yet when this
> +  * is called.
> +  */
> + pte = set_pte_filter(pte);

Why do you need to call set_pte_filter() inside the loop ?
The only difference between previous pte and next pte is the RPN, other 
flags remain untouched so I can't see why you need to call 
set_pte_filter() again.

> +
> + /* Perform the setting of the PTE */
>   __set_pte_at(mm, addr, ptep, pte, 0);
>   if (--nr == 0)
>   break;
>   ptep++;
> - pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>   addr += PAGE_SIZE;
> + /* increment the pfn */
> + pte = __pte(pte_val(pte) + PAGE_SIZE);

PAGE_SIZE doesn't work on all platforms, see for instance e500.

see comment at 
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147

And then you see 
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63

> +
>   }
> - arch_leave_lazy_mmu_mode();
>   }
>   
>   void unmap_kernel_page(unsigned long va)

Christophe


[PATCH] powerpc/mm: Update set_ptes to call pte_filter for all the ptes

2023-10-17 Thread Aneesh Kumar K.V
With commit 9fee28baa601 ("powerpc: implement the new page table range
API") we added set_ptes to powerpc architecture but the implementation
missed calling the pte filter for all the ptes we are setting in the
range. set_pte_filter can be used for filter pte values and on some
platforms which don't support coherent icache it clears the exec bit so
that we can flush the icache on exec fault

The patch also removes the usage of arch_enter/leave_lazy_mmu() because
set_pte is not supposed to be used when updating a pte entry. Powerpc
architecture uses this rule to skip the expensive tlb invalidate which
is not needed when you are setting up the pte for the first time. See
commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating
_PAGE_NUMA bit") for more details

Fixes: 9fee28baa601 ("powerpc: implement the new page table range API")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 3ba9fe411604..95ab20cca2da 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -191,28 +191,35 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
pte_t pte, unsigned int nr)
 {
/*
-* Make sure hardware valid bit is not set. We don't do
-* tlb flush for this update.
+* We don't need to call arch_enter/leave_lazy_mmu_mode()
+* because we expect set_ptes to be only be used on not present
+* and not hw_valid ptes. Hence there is not translation cache flush
+* involved that need to be batched.
 */
-   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
+   for (;;) {
 
-   /* Note: mm->context.id might not yet have been assigned as
-* this context might not have been activated yet when this
-* is called.
-*/
-   pte = set_pte_filter(pte);
+   /*
+* Make sure hardware valid bit is not set. We don't do
+* tlb flush for this update.
+*/
+   VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Perform the setting of the PTE */
-   arch_enter_lazy_mmu_mode();
-   for (;;) {
+   /* Note: mm->context.id might not yet have been assigned as
+* this context might not have been activated yet when this
+* is called.
+*/
+   pte = set_pte_filter(pte);
+
+   /* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
if (--nr == 0)
break;
ptep++;
-   pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
addr += PAGE_SIZE;
+   /* increment the pfn */
+   pte = __pte(pte_val(pte) + PAGE_SIZE);
+
}
-   arch_leave_lazy_mmu_mode();
 }
 
 void unmap_kernel_page(unsigned long va)
-- 
2.41.0