[PATCH] powerpc/mm: move setting pte specific flags to pfn_pmd

2020-10-22 Thread Aneesh Kumar K.V
powerpc used to set the pte specific flags in set_pte_at().  This is
different from other architectures. To be consistent with other
architectures, powerpc updated pfn_pte() to set _PAGE_PTE with
commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte")

That commit didn't do the same w.r.t pfn_pmd() because we expected pmd_mkhuge()
to do it. But as per Linus that is a bad rule [1].
Hence update pfn_pmd() to also set _PAGE_PTE.

[1]
" The rule that you must use "pmd_mkhuge()" seems _completely_ wrong.
It's insane. The only valid use to ever make a pmd out of a pfn is to
make a huge-page."

message-id: CAHk-=whG+Z2mBFTT026PZAdjn=gsslk9bk0wnyj5peyuvgf...@mail.gmail.com
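For illustration (not part of the patch): a THP fault path typically builds
the pmd as

	pmd_t entry = pmd_mkhuge(mk_pmd(page, prot));	/* mk_pmd() -> pfn_pmd() */

With this change pfn_pmd()/mk_pmd() already return an entry with _PAGE_PTE set,
and pmd_mkhuge() reduces to a (debug-only) sanity check.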

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 ++++++++++++++++-
 arch/powerpc/mm/book3s64/pgtable.c           |  8 +++++++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cd3feeac6e87..a39886681629 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1231,13 +1231,28 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
return hash__pmd_same(pmd_a, pmd_b);
 }
 
-static inline pmd_t pmd_mkhuge(pmd_t pmd)
+static inline pmd_t __pmd_mkhuge(pmd_t pmd)
 {
if (radix_enabled())
return radix__pmd_mkhuge(pmd);
return hash__pmd_mkhuge(pmd);
 }
 
+/*
+ * pfn_pmd return a pmd_t that can be used as pmd pte entry.
+ */
+static inline pmd_t pmd_mkhuge(pmd_t pmd)
+{
+#ifdef CONFIG_DEBUG_VM
+   if (radix_enabled())
+   WARN_ON((pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE)) == 0);
+   else
+   WARN_ON((pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE | 
H_PAGE_THP_HUGE)) !=
+   cpu_to_be64(_PAGE_PTE | H_PAGE_THP_HUGE));
+#endif
+   return pmd;
+}
+
 #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 extern int pmdp_set_access_flags(struct vm_area_struct *vma,
 unsigned long address, pmd_t *pmdp,
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index e18ae50a275c..5b3a3bae21aa 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -136,12 +136,18 @@ static pmd_t pmd_set_protbits(pmd_t pmd, pgprot_t pgprot)
return __pmd(pmd_val(pmd) | pgprot_val(pgprot));
 }
 
+/*
+ * At some point we should be able to get rid of
+ * pmd_mkhuge() and mk_huge_pmd() when we update all the
+ * other archs to mark the pmd huge in pfn_pmd()
+ */
 pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
 {
unsigned long pmdv;
 
pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
-   return pmd_set_protbits(__pmd(pmdv), pgprot);
+
+   return __pmd_mkhuge(pmd_set_protbits(__pmd(pmdv), pgprot));
 }
 
 pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
-- 
2.26.2



Re: [PATCH v2] powerpc/mm: Add mask of always present MMU features

2020-10-21 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> On 12/10/2020 at 17:39, Christophe Leroy wrote:
>> On the same principle as commit 773edeadf672 ("powerpc/mm: Add mask
>> of possible MMU features"), add mask for MMU features that are
>> always there in order to optimise out dead branches.
>> 
>> Signed-off-by: Christophe Leroy 
>> ---
>> v2: Features must be anded with MMU_FTRS_POSSIBLE instead of ~0, otherwise
>>  MMU_FTRS_ALWAYS is ~0 when no #ifdef matches.
>
> This is still not enough. For BOOK3S/32, MMU_FTRS_POSSIBLE is still too much.
> We need a #ifdef CONFIG_PPC_BOOK3S_32 with 0.
>
> Christophe
>
>> ---
>>   arch/powerpc/include/asm/mmu.h | 25 +
>>   1 file changed, 25 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
>> index 255a1837e9f7..64e7e7f7cda9 100644
>> --- a/arch/powerpc/include/asm/mmu.h
>> +++ b/arch/powerpc/include/asm/mmu.h
>> @@ -201,8 +201,30 @@ enum {
>>  0,
>>   };
>>   
>> +enum {
>> +MMU_FTRS_ALWAYS =
>> +#ifdef CONFIG_PPC_8xx
>> +MMU_FTR_TYPE_8xx &
>> +#endif
>> +#ifdef CONFIG_40x
>> +MMU_FTR_TYPE_40x &
>> +#endif
>> +#ifdef CONFIG_PPC_47x
>> +MMU_FTR_TYPE_47x &
>> +#elif defined(CONFIG_44x)
>> +MMU_FTR_TYPE_44x &
>> +#endif
>> +#if defined(CONFIG_E200) || defined(CONFIG_E500)
>> +MMU_FTR_TYPE_FSL_E &
>> +#endif
>> +MMU_FTRS_POSSIBLE,
>> +};

Would it be simpler if we make it a #define like below?

#ifdef CONFIG_PPC_8xx
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_8xx & MMU_FTRS_POSSIBLE)
#endif
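The full thing would then look something like this (sketch only, assuming the
platform type options are mutually exclusive within any given build; the #else
covers book3s/32 and anything else with no always-present feature):

#ifdef CONFIG_PPC_8xx
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_8xx & MMU_FTRS_POSSIBLE)
#elif defined(CONFIG_40x)
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_40x & MMU_FTRS_POSSIBLE)
#elif defined(CONFIG_PPC_47x)
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_47x & MMU_FTRS_POSSIBLE)
#elif defined(CONFIG_44x)
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_44x & MMU_FTRS_POSSIBLE)
#elif defined(CONFIG_E200) || defined(CONFIG_E500)
#define MMU_FTRS_ALWAYS	(MMU_FTR_TYPE_FSL_E & MMU_FTRS_POSSIBLE)
#else
#define MMU_FTRS_ALWAYS	0
#endif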



>> +
>>   static inline bool early_mmu_has_feature(unsigned long feature)
>>   {
>> +if (MMU_FTRS_ALWAYS & feature)
>> +return true;
>> +
>>  return !!(MMU_FTRS_POSSIBLE & cur_cpu_spec->mmu_features & feature);
>>   }
>>   
>> @@ -231,6 +253,9 @@ static __always_inline bool mmu_has_feature(unsigned 
>> long feature)
>>  }
>>   #endif
>>   
>> +if (MMU_FTRS_ALWAYS & feature)
>> +return true;
>> +
>>  if (!(MMU_FTRS_POSSIBLE & feature))
>>  return false;
>>   
>> 


Re: [PATCH v6 02/11] mm/gup: Use functions to track lockless pgtbl walks on gup_pgd_range

2020-10-15 Thread Aneesh Kumar K.V

Hi Michal,

On 10/15/20 8:16 PM, Michal Suchánek wrote:

Hello,

On Thu, Feb 06, 2020 at 12:25:18AM -0300, Leonardo Bras wrote:

On Thu, 2020-02-06 at 00:08 -0300, Leonardo Bras wrote:

 gup_pgd_range(addr, end, gup_flags, pages, &nr);
-   local_irq_enable();
+   end_lockless_pgtbl_walk(IRQS_ENABLED);
 ret = nr;
 }
  


Just noticed IRQS_ENABLED is not available on other archs than ppc64.
I will fix this for v7.


Has there been a v7?

I cannot find it.

Thanks

Michal



https://lore.kernel.org/linuxppc-dev/20200505071729.54912-1-aneesh.ku...@linux.ibm.com

This series should help here.

-aneesh


[PATCH] powerpc/opal_elog: Handle multiple writes to ack attribute

2020-10-14 Thread Aneesh Kumar K.V
Even though we use the self-removing sysfs helper, we still need
to make sure we do the final kobject delete conditionally.
sysfs_remove_file_self() will handle parallel calls to remove
the sysfs attribute file and returns true only in the caller
that removed the attribute file; the other parallel callers
get false. Do the final kobject delete based on
the return value of sysfs_remove_file_self().

Cc: Mahesh Salgaonkar 
Cc: Oliver O'Halloran 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/powernv/opal-elog.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-elog.c 
b/arch/powerpc/platforms/powernv/opal-elog.c
index 5e33b1fc67c2..37b380eef41a 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -72,9 +72,14 @@ static ssize_t elog_ack_store(struct elog_obj *elog_obj,
  const char *buf,
  size_t count)
 {
-   opal_send_ack_elog(elog_obj->id);
-   sysfs_remove_file_self(&elog_obj->kobj, &attr->attr);
-   kobject_put(&elog_obj->kobj);
+   /*
+* Try to self remove this attribute. If we are successful,
+* delete the kobject itself.
+*/
+   if (sysfs_remove_file_self(&elog_obj->kobj, &attr->attr)) {
+   opal_send_ack_elog(elog_obj->id);
+   kobject_put(&elog_obj->kobj);
+   }
return count;
 }
 
-- 
2.26.2



Re: [PATCH] powerpc/features: Remove CPU_FTR_NODSISRALIGN

2020-10-13 Thread Aneesh Kumar K.V

On 10/13/20 3:45 PM, Michael Ellerman wrote:

Christophe Leroy  writes:

On 13/10/2020 at 09:23, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


CPU_FTR_NODSISRALIGN has not been used since
commit 31bfdb036f12 ("powerpc: Use instruction emulation
infrastructure to handle alignment faults")

Remove it.

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/include/asm/cputable.h | 22 ++
   arch/powerpc/kernel/dt_cpu_ftrs.c   |  8 
   arch/powerpc/kernel/prom.c  |  2 +-
   3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 1098863e17ee..c598961d9f15 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -273,13 +273,6 @@ static int __init feat_enable_idle_nap(struct 
dt_cpu_feature *f)
return 1;
   }
   
-static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)

-{
-   cur_cpu_spec->cpu_features &= ~CPU_FTR_NODSISRALIGN;
-
-   return 1;
-}
-
   static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
   {
u64 lpcr;
@@ -641,7 +634,6 @@ static struct dt_cpu_feature_match __initdata
{"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
{"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
{"idle-nap", feat_enable_idle_nap, 0},
-   {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},


Rather than removing it entirely, I'd rather we left a comment, so that
it's obvious that we are ignoring that feature on purpose, not because
we forgot about it.

eg:

diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index f204ad79b6b5..45cb7e59bd13 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -640,7 +640,7 @@ static struct dt_cpu_feature_match __initdata
{"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
{"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
{"idle-nap", feat_enable_idle_nap, 0},
-   {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
+   // "alignment-interrupt-dsisr" ignored
{"idle-stop", feat_enable_idle_stop, 0},
{"machine-check-power8", feat_enable_mce_power8, 0},
{"performance-monitor-power8", feat_enable_pmu_power8, 0},




Why not do it as:

static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
{
	/* This feature should not be enabled */
#ifdef DEBUG
	WARN_ON(1);
#endif
	return 1;
}


-aneesh


Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes

2020-10-13 Thread Aneesh Kumar K.V

On 10/14/20 2:28 AM, Andrew Morton wrote:

On Wed,  2 Sep 2020 17:12:09 +0530 "Aneesh Kumar K.V" 
 wrote:


This patch series includes fixes for the debug_vm_pgtable test code so that
it follows the page table update rules correctly. The first two patches introduce
changes w.r.t ppc64. The patches are included in this series for completeness.
We can merge them via the ppc64 tree if required.


Do you think this series is ready to be merged?


Hopefully, except for the RISC-V crash.



Possibly-unresolved issues which I have recorded are

Against
mm-debug_vm_pgtable-locks-move-non-page-table-modifying-test-together.patch:

https://lkml.kernel.org/r/56830efb-887e--a46e-ae015e585...@arm.com


I guess the full series does boot fine on arm.


https://lkml.kernel.org/r/20200910075752.GC26874@shao2-debian


This should be fixed by

https://ozlabs.org/~akpm/mmots/broken-out/mm-debug_vm_pgtable-avoid-doing-memory-allocation-with-pgtable_t-mapped.patch



Against mm-debug_vm_pgtable-avoid-none-pte-in-pte_clear_test.patch:

https://lkml.kernel.org/r/87zh5wx51b@linux.ibm.com



Yes, this one we should get fixed. I was hoping someone familiar with
the RISC-V pte update rules would pitch in. IIUC we need to update
RANDOM_ORVALUE similar to how we updated it for s390 and ppc64.
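Roughly, that would mean adding a RISC-V mask to the skip-mask scheme in
mm/debug_vm_pgtable.c, alongside the existing s390/ppc64 masks there (sketch
only; the RISCV_SKIP_MASK bits below are a placeholder, someone who knows the
RISC-V pte format needs to pick the bits its accessors interpret):

#define RISCV_SKIP_MASK		GENMASK(3, 0)	/* placeholder bits */
#define ARCH_SKIP_MASK		(S390_SKIP_MASK | PPC64_SKIP_MASK | RISCV_SKIP_MASK)
#define RANDOM_ORVALUE		(GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)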



 Alternatively we can do this

modified   mm/debug_vm_pgtable.c
@@ -548,7 +548,7 @@ static void __init pte_clear_tests(struct mm_struct 
*mm, pte_t *ptep,

pte_t pte = pfn_pte(pfn, prot);

pr_debug("Validating PTE clear\n");
-   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+// pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);

till we get that feedback from the RISC-V team?


https://lkml.kernel.org/r/37a9facc-ca36-290f-3748-16c4a7a77...@arm.com


same as the above.


https://lkml.kernel.org/r/20201011200258.ga91...@roeck-us.net



same as the above.

-aneesh


Re: [PATCH] powerpc/features: Remove CPU_FTR_NODSISRALIGN

2020-10-13 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> CPU_FTR_NODSISRALIGN has not been used since
> commit 31bfdb036f12 ("powerpc: Use instruction emulation
> infrastructure to handle alignment faults")
>
> Remove it.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/cputable.h | 22 ++
>  arch/powerpc/kernel/dt_cpu_ftrs.c   |  8 
>  arch/powerpc/kernel/prom.c  |  2 +-
>  3 files changed, 11 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h 
> b/arch/powerpc/include/asm/cputable.h
> index 9780c55f9811..accdc1286f37 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -137,7 +137,6 @@ static inline void cpu_feature_keys_init(void) { }
>  #define CPU_FTR_DBELLASM_CONST(0x0004)
>  #define CPU_FTR_CAN_NAP  ASM_CONST(0x0008)
>  #define CPU_FTR_DEBUG_LVL_EXCASM_CONST(0x0010)
> -#define CPU_FTR_NODSISRALIGN ASM_CONST(0x0020)
>  #define CPU_FTR_FPU_UNAVAILABLE  ASM_CONST(0x0040)
>  #define CPU_FTR_LWSYNC   ASM_CONST(0x0080)
>  #define CPU_FTR_NOEXECUTEASM_CONST(0x0100)
> @@ -219,7 +218,7 @@ static inline void cpu_feature_keys_init(void) { }
>  
>  #ifndef __ASSEMBLY__
>  
> -#define CPU_FTR_PPCAS_ARCH_V2(CPU_FTR_NOEXECUTE | 
> CPU_FTR_NODSISRALIGN)
> +#define CPU_FTR_PPCAS_ARCH_V2(CPU_FTR_NOEXECUTE)
>  
>  #define MMU_FTR_PPCAS_ARCH_V2(MMU_FTR_TLBIEL | MMU_FTR_16M_PAGE)
>  
> @@ -378,33 +377,33 @@ static inline void cpu_feature_keys_init(void) { }
>   CPU_FTR_COMMON | CPU_FTR_FPU_UNAVAILABLE  | CPU_FTR_NOEXECUTE)
>  #define CPU_FTRS_CLASSIC32   (CPU_FTR_COMMON)
>  #define CPU_FTRS_8XX (CPU_FTR_NOEXECUTE)
> -#define CPU_FTRS_40X (CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
> -#define CPU_FTRS_44X (CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
> -#define CPU_FTRS_440x6   (CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE | \
> +#define CPU_FTRS_40X (CPU_FTR_NOEXECUTE)
> +#define CPU_FTRS_44X (CPU_FTR_NOEXECUTE)
> +#define CPU_FTRS_440x6   (CPU_FTR_NOEXECUTE | \
>   CPU_FTR_INDEXED_DCR)
>  #define CPU_FTRS_47X (CPU_FTRS_440x6)
>  #define CPU_FTRS_E200(CPU_FTR_SPE_COMP | \
> - CPU_FTR_NODSISRALIGN | CPU_FTR_COHERENT_ICACHE | \
> + CPU_FTR_COHERENT_ICACHE | \
>   CPU_FTR_NOEXECUTE | \
>   CPU_FTR_DEBUG_LVL_EXC)
>  #define CPU_FTRS_E500(CPU_FTR_MAYBE_CAN_DOZE | \
> - CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN | \
> + CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | \
>   CPU_FTR_NOEXECUTE)
>  #define CPU_FTRS_E500_2  (CPU_FTR_MAYBE_CAN_DOZE | \
>   CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | \
> - CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
> -#define CPU_FTRS_E500MC  (CPU_FTR_NODSISRALIGN | \
> + CPU_FTR_NOEXECUTE)
> +#define CPU_FTRS_E500MC  ( \
>   CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE | \
>   CPU_FTR_DBELL | CPU_FTR_DEBUG_LVL_EXC | CPU_FTR_EMB_HV)
>  /*
>   * e5500/e6500 erratum A-006958 is a timebase bug that can use the
>   * same workaround as CPU_FTR_CELL_TB_BUG.
>   */
> -#define CPU_FTRS_E5500   (CPU_FTR_NODSISRALIGN | \
> +#define CPU_FTRS_E5500   ( \
>   CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE | \
>   CPU_FTR_DBELL | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>   CPU_FTR_DEBUG_LVL_EXC | CPU_FTR_EMB_HV | CPU_FTR_CELL_TB_BUG)
> -#define CPU_FTRS_E6500   (CPU_FTR_NODSISRALIGN | \
> +#define CPU_FTRS_E6500   ( \
>   CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE | \
>   CPU_FTR_DBELL | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>   CPU_FTR_DEBUG_LVL_EXC | CPU_FTR_EMB_HV | CPU_FTR_ALTIVEC_COMP | \
> @@ -554,7 +553,6 @@ enum {
>  #define CPU_FTRS_DT_CPU_BASE \
>   (CPU_FTR_LWSYNC |   \
>CPU_FTR_FPU_UNAVAILABLE |  \
> -  CPU_FTR_NODSISRALIGN | \
>CPU_FTR_NOEXECUTE |\
>CPU_FTR_COHERENT_ICACHE |  \
>CPU_FTR_STCX_CHECKS_ADDRESS |  \
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 1098863e17ee..c598961d9f15 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -273,13 +273,6 @@ static int __init feat_enable_idle_nap(struct 
> dt_cpu_feature *f)
>   return 1;
>  }
>  
> -static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
> -{
> - cur_cpu_spec->cpu_features &= ~CPU_FTR_NODSISRALIGN;
> -
> - return 1;
> -}
> -
>  static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
>  {
>   u64 lpcr;
> @@ -641,7 +634,6 @@ static struct dt_cpu_feature_match __initdata
>   {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
>   {"tm-suspend-xer-so-bug", feat_enable, 

Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test

2020-10-11 Thread Aneesh Kumar K.V
Guenter Roeck  writes:

> On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
>> pte_clear_tests operate on an existing pte entry. Make sure that
>> is not a none pte entry.
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>
> This patch causes all riscv64 images to crash. Reverting it
> as well as the follow-up patch fixes the problem, but there are
> still several warning messages starting with
>   BUG kmem_cache (Not tainted): Freechain corrupt
> I did not try to track down this other problem.
>
> A detailed crash log is at
>   
> https://kerneltests.org/builders/qemu-riscv64-next/builds/523/steps/qemubuildcommand/logs/stdio
>
> Bisect log is attached.


https://lore.kernel.org/linux-mm/87zh5wx51b@linux.ibm.com

This was mentioned earlier. The RANDOM_ORVALUE used is interacting with
some of the RISC-V page table accessors.

-aneesh


Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Aneesh Kumar K.V

On 10/8/20 10:32 PM, Linus Torvalds wrote:

On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
 wrote:


In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the dirty
bit on reset.


No, we'll just remove that entirely.

Do you have a test-case that shows a problem? I have a patch that I
was going to delay until 5.10 because I didn't think it mattered in
practice..



Unfortunately, I don't have a test case. That was observed by code
inspection while I was fixing a syzkaller report.



The second part of this patch would be to add a sequence count
protection to fast-GUP pinning, so that GUP and fork() couldn't race,
but I haven't written that part.

Here's the first patch anyway. If you actually have a test-case where
this matters, I guess I need to apply it now..

Linus




-aneesh


[RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Aneesh Kumar K.V
In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the dirty
bit on reset.

Also, avoid marking the pte write-protected again if copy_present_page
already marked it write-protected.

Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Kirill Shutemov 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Signed-off-by: Aneesh Kumar K.V 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index bfe202ef6244..f57b1f04d50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -848,6 +848,9 @@ copy_present_page(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
if (likely(!page_maybe_dma_pinned(page)))
return 1;
 
+   if (pte_dirty(*src_pte))
+   pte = pte_mkdirty(pte);
+
/*
 * Uhhuh. It looks like the page might be a pinned page,
 * and we actually need to copy it. Now we can set the
@@ -904,6 +907,11 @@ copy_present_pte(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
if (retval <= 0)
return retval;
 
+   /*
+* Fetch the src pte value again, copy_present_page
+* could modify it.
+*/
+   pte = *src_pte;
get_page(page);
page_dup_rmap(page, false);
rss[mm_counter(page)]++;
-- 
2.26.2



[PATCH] mm: Avoid using set_pte_at when updating a present pte

2020-10-08 Thread Aneesh Kumar K.V
This avoids the below warning

WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 
set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 30613 Comm: syz-executor.0 Not tainted 
5.9.0-rc8-syzkaller-00156-gc85fb28b6f99 #0
Call Trace:
 [c01cd1f0] panic+0x29c/0x75c kernel/panic.c:231
 [c01cce24] __warn+0x104/0x1b8 kernel/panic.c:600
 [c0d829e4] report_bug+0x1d4/0x380 lib/bug.c:198
 [c0036800] program_check_exception+0x4e0/0x750 
arch/powerpc/kernel/traps.c:1508
 [c00098a8] program_check_common_virt+0x308/0x360
--- interrupt: 700 at set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
LR = set_pte_at+0x2a4/0x3a0 arch/powerpc/mm/pgtable.c:185
 [c05d2a7c] copy_present_page mm/memory.c:857 [inline]
 [c05d2a7c] copy_present_pte mm/memory.c:899 [inline]
 [c05d2a7c] copy_pte_range mm/memory.c:1014 [inline]
 [c05d2a7c] copy_pmd_range mm/memory.c:1092 [inline]
 [c05d2a7c] copy_pud_range mm/memory.c:1127 [inline]
 [c05d2a7c] copy_p4d_range mm/memory.c:1150 [inline]
 [c05d2a7c] copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
 [c01c63cc] dup_mmap kernel/fork.c:592 [inline]
 [c01c63cc] dup_mm+0x77c/0xab0 kernel/fork.c:1355
 [c01c8f70] copy_mm kernel/fork.c:1411 [inline]
 [c01c8f70] copy_process+0x1f00/0x2740 kernel/fork.c:2070
 [c01c9b54] _do_fork+0xc4/0x10b0 kernel/fork.c:2429
 [c01caf54] __do_sys_clone3+0x1d4/0x2b0 kernel/fork.c:27

Architectures like ppc64 expect set_pte_at() not to be used for updating a
valid pte. This is further explained in
commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")

Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Kirill Shutemov 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Signed-off-by: Aneesh Kumar K.V 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..bfe202ef6244 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -854,7 +854,7 @@ copy_present_page(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
 * source pte back to being writable.
 */
if (pte_write(pte))
-   set_pte_at(src_mm, addr, src_pte, pte);
+   ptep_set_access_flags(vma, addr, src_pte, pte, 1);
 
new_page = *prealloc;
if (!new_page)
-- 
2.26.2



[PATCH v3 4/4] powerpc/lmb-size: Use addr #size-cells value when fetching lmb-size

2020-10-07 Thread Aneesh Kumar K.V
Make it consistent with other usages: read "ibm,lmb-size" using the device
tree #size-cells value instead of assuming a 64-bit (__be64) property.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/radix_pgtable.c|  7 ---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 13 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 78c5afe98359..f8e9eb49d46b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -498,7 +498,7 @@ static int __init probe_memory_block_size(unsigned long 
node, const char *uname,
  depth, void *data)
 {
unsigned long *mem_block_size = (unsigned long *)data;
-   const __be64 *prop;
+   const __be32 *prop;
int len;
 
if (depth != 1)
@@ -508,13 +508,14 @@ static int __init probe_memory_block_size(unsigned long 
node, const char *uname,
return 0;
 
 prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
-   if (!prop || len < sizeof(__be64))
+
+   if (!prop || len < dt_root_size_cells * sizeof(__be32))
/*
 * Nothing in the device tree
 */
*mem_block_size = MIN_MEMORY_BLOCK_SIZE;
else
-   *mem_block_size = be64_to_cpup(prop);
+   *mem_block_size = of_read_number(prop, dt_root_size_cells);
return 1;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 843db91e39aa..f8aef06b29ec 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -30,12 +30,17 @@ unsigned long pseries_memory_block_size(void)
 
np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
if (np) {
-   const __be64 *size;
+   int len;
+   int size_cells;
+   const __be32 *prop;
 
-   size = of_get_property(np, "ibm,lmb-size", NULL);
-   if (size)
-   memblock_size = be64_to_cpup(size);
+   size_cells = of_n_size_cells(np);
+
+   prop = of_get_property(np, "ibm,lmb-size", &len);
+   if (prop && len >= size_cells * sizeof(__be32))
+   memblock_size = of_read_number(prop, size_cells);
of_node_put(np);
+
} else  if (machine_is(pseries)) {
/* This fallback really only applies to pseries */
unsigned int memzero_size = 0;
-- 
2.26.2



[PATCH v3 3/4] powerpc/book3s64/radix: Make radix_mem_block_size 64bit

2020-10-07 Thread Aneesh Kumar K.V
Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block 
panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

Fixes: af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for 
hot-plugged memory")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index ddc414ab3c4d..e0b52940e43c 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -70,7 +70,7 @@ extern unsigned int mmu_base_pid;
 /*
  * memory block size used with radix translation.
  */
-extern unsigned int __ro_after_init radix_mem_block_size;
+extern unsigned long __ro_after_init radix_mem_block_size;
 
 #define PRTB_SIZE_SHIFT(mmu_pid_bits + 4)
 #define PRTB_ENTRIES   (1ul << mmu_pid_bits)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5c8adeb8c955..78c5afe98359 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -34,7 +34,7 @@
 
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
-unsigned int radix_mem_block_size __ro_after_init;
+unsigned long radix_mem_block_size __ro_after_init;
 
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
unsigned long region_start, unsigned long region_end)
-- 
2.26.2



[PATCH v3 2/4] powerpc/memhotplug: Make lmb size 64bit

2020-10-07 Thread Aneesh Kumar K.V
Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block 
panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

This was found by code audit.

Cc: sta...@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V 
---
 .../platforms/pseries/hotplug-memory.c| 43 +--
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac4..843db91e39aa 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -277,7 +277,7 @@ static int dlpar_offline_lmb(struct drmem_lmb *lmb)
return dlpar_change_lmb_state(lmb, false);
 }
 
-static int pseries_remove_memblock(unsigned long base, unsigned int 
memblock_size)
+static int pseries_remove_memblock(unsigned long base, unsigned long 
memblock_size)
 {
unsigned long block_sz, start_pfn;
int sections_per_block;
@@ -308,10 +308,11 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
 
 static int pseries_remove_mem_node(struct device_node *np)
 {
-   const __be32 *regs;
+   const __be32 *prop;
unsigned long base;
-   unsigned int lmb_size;
+   unsigned long lmb_size;
int ret = -EINVAL;
+   int addr_cells, size_cells;
 
/*
 * Check to see if we are actually removing memory
@@ -322,12 +323,19 @@ static int pseries_remove_mem_node(struct device_node *np)
/*
 * Find the base address and size of the memblock
 */
-   regs = of_get_property(np, "reg", NULL);
-   if (!regs)
+   prop = of_get_property(np, "reg", NULL);
+   if (!prop)
return ret;
 
-   base = be64_to_cpu(*(unsigned long *)regs);
-   lmb_size = be32_to_cpu(regs[3]);
+   addr_cells = of_n_addr_cells(np);
+   size_cells = of_n_size_cells(np);
+
+   /*
+* "reg" property represents (addr,size) tuple.
+*/
+   base = of_read_number(prop, addr_cells);
+   prop += addr_cells;
+   lmb_size = of_read_number(prop, size_cells);
 
pseries_remove_memblock(base, lmb_size);
return 0;
@@ -564,7 +572,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, 
u32 drc_index)
 
 #else
 static inline int pseries_remove_memblock(unsigned long base,
- unsigned int memblock_size)
+ unsigned long memblock_size)
 {
return -EOPNOTSUPP;
 }
@@ -886,10 +894,11 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 
 static int pseries_add_mem_node(struct device_node *np)
 {
-   const __be32 *regs;
+   const __be32 *prop;
unsigned long base;
-   unsigned int lmb_size;
+   unsigned long lmb_size;
int ret = -EINVAL;
+   int addr_cells, size_cells;
 
/*
 * Check to see if we are actually adding memory
@@ -900,12 +909,18 @@ static int pseries_add_mem_node(struct device_node *np)
/*
 * Find the base and size of the memblock
 */
-   regs = of_get_property(np, "reg", NULL);
-   if (!regs)
+   prop = of_get_property(np, "reg", NULL);
+   if (!prop)
return ret;
 
-   base = be64_to_cpu(*(unsigned long *)regs);
-   lmb_size = be32_to_cpu(regs[3]);
+   addr_cells = of_n_addr_cells(np);
+   size_cells = of_n_size_cells(np);
+   /*
+* "reg" property represents (addr,size) tuple.
+*/
+   base = of_read_number(prop, addr_cells);
+   prop += addr_cells;
+   lmb_size = of_read_number(prop, size_cells);
 
/*
 * Update memory region to represent the memory add
-- 
2.26.2



[PATCH v3 1/4] powerpc/drmem: Make lmb_size 64 bit

2020-10-07 Thread Aneesh Kumar K.V
Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block 
panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

This was found by code audit.

Cc: sta...@vger.kernel.org
Acked-by: Nathan Lynch 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/drmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 030a19d92213..bf2402fed3e0 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -20,7 +20,7 @@ struct drmem_lmb {
 struct drmem_lmb_info {
struct drmem_lmb*lmbs;
int n_lmbs;
-   u32 lmb_size;
+   u64 lmb_size;
 };
 
 extern struct drmem_lmb_info *drmem_info;
@@ -80,7 +80,7 @@ struct of_drconf_cell_v2 {
 #define DRCONF_MEM_RESERVED0x0080
 #define DRCONF_MEM_HOTREMOVABLE0x0100
 
-static inline u32 drmem_lmb_size(void)
+static inline u64 drmem_lmb_size(void)
 {
return drmem_info->lmb_size;
 }
-- 
2.26.2



[PATCH v3 0/4] Enable usage of larger LMB ( > 4G)

2020-10-07 Thread Aneesh Kumar K.V
Changes from v2:
* Don't use root addr and size cells during runtime. Walk up the
  device tree and use the first addr and size cells value (of_n_addr_cells()/
  of_n_size_cells())

Aneesh Kumar K.V (4):
  powerpc/drmem: Make lmb_size 64 bit
  powerpc/memhotplug: Make lmb size 64bit
  powerpc/book3s64/radix: Make radix_mem_block_size 64bit
  powerpc/lmb-size: Use addr #size-cells value when fetching lmb-size

 arch/powerpc/include/asm/book3s/64/mmu.h  |  2 +-
 arch/powerpc/include/asm/drmem.h  |  4 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  9 +--
 .../platforms/pseries/hotplug-memory.c| 56 +--
 4 files changed, 46 insertions(+), 25 deletions(-)

-- 
2.26.2



[PATCH v2] powerpc/mm: Update tlbiel loop on POWER10

2020-10-06 Thread Aneesh Kumar K.V
With POWER10, a single tlbiel instruction invalidates all the congruence
classes of the TLB and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kvm/book3s_hv.c |  7 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++-
 arch/powerpc/mm/book3s64/radix_tlb.c | 23 ---
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3bd3118c7633..00b5c5981db5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4939,7 +4939,12 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 * Work out how many sets the TLB has, for the use of
 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
 */
-   if (radix_enabled())
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   /*
+* P10 will flush all the congruence class with a single tlbiel
+*/
+   kvm->arch.tlb_sets = 1;
+   } else if (radix_enabled())
kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX; /* 128 */
else if (cpu_has_feature(CPU_FTR_ARCH_300))
kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;  /* 256 */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..2803a4b01109 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -702,6 +702,7 @@ static void wait_for_sync(struct kvm_split_mode *sip, int 
phase)
 
 void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
 {
+   int num_sets;
unsigned long rb, set;
 
/* wait for every other thread to get to real mode */
@@ -712,11 +713,19 @@ void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
mtspr(SPRN_LPID, sip->lpidr_req);
isync();
 
+   /*
+* P10 will flush all the congruence class with a single tlbiel
+*/
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   num_sets =  1;
+   else
+   num_sets = POWER9_TLB_SETS_RADIX;
+
/* Invalidate the TLB on thread 0 */
if (local_paca->kvm_hstate.tid == 0) {
sip->do_set = 0;
asm volatile("ptesync" : : : "memory");
-   for (set = 0; set < POWER9_TLB_SETS_RADIX; ++set) {
+   for (set = 0; set < num_sets; ++set) {
rb = TLBIEL_INVAL_SET_LPID +
(set << TLBIEL_INVAL_SET_SHIFT);
asm volatile(PPC_TLBIEL(%0, %1, 0, 0, 0) : :
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 143b4fd396f0..9e76ba766b3c 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -56,14 +56,21 @@ static void tlbiel_all_isa300(unsigned int num_sets, 
unsigned int is)
if (early_cpu_has_feature(CPU_FTR_HVMODE)) {
/* MSR[HV] should flush partition scope translations first. */
tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 0);
-   for (set = 1; set < num_sets; set++)
-   tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 0);
+
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
+   for (set = 1; set < num_sets; set++)
+   tlbiel_radix_set_isa300(set, is, 0,
+   RIC_FLUSH_TLB, 0);
+   }
}
 
/* Flush process scoped entries. */
tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 1);
-   for (set = 1; set < num_sets; set++)
-   tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
+
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
+   for (set = 1; set < num_sets; set++)
+   tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
+   }
 
asm volatile("ptesync": : :"memory");
 }
@@ -300,9 +307,11 @@ static __always_inline void _tlbiel_pid(unsigned long pid, 
unsigned long ric)
return;
}
 
-   /* For the remaining sets, just flush the TLB */
-   for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
-   __tlbiel_pid(pid, set, RIC_FLUSH_TLB);
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   /* For the remaining sets, just flush the TLB */
+   for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
+   __tlbiel_pid(pid, set, RIC_FLUSH_TLB);
+   }
 
asm volatile("ptesync": : :"memory");
asm volatile(PPC_RADIX_INVALIDATE_ERAT_USER "; isync" : : :"memory");
-- 
2.26.2



[RFC PATCH] powerpc/mm: Support tlbiel set value of 1 on POWER10

2020-10-06 Thread Aneesh Kumar K.V
With POWER10, tlbiel invalidates all the congruence classes of the TLB
and hence we need to issue only one tlbiel with SET=0. Update
POWER10_TLB_SETS to 1 and use that in the rest of the code.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 +
 arch/powerpc/kvm/book3s_hv.c  |  4 +++-
 arch/powerpc/kvm/book3s_hv_builtin.c  |  8 +++-
 arch/powerpc/mm/book3s64/hash_native.c|  4 +++-
 arch/powerpc/mm/book3s64/radix_tlb.c  | 13 ++---
 5 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 683a9c7d1b03..755ae1ea910a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -129,6 +129,7 @@
 #define POWER8_TLB_SETS512 /* # sets in POWER8 TLB */
 #define POWER9_TLB_SETS_HASH   256 /* # sets in POWER9 TLB Hash mode */
 #define POWER9_TLB_SETS_RADIX  128 /* # sets in POWER9 TLB Radix mode */
+#define POWER10_TLB_SETS   1   /* # sets in POWER10 TLB */
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3bd3118c7633..12553cb55ede 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4939,7 +4939,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 * Work out how many sets the TLB has, for the use of
 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
 */
-   if (radix_enabled())
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   kvm->arch.tlb_sets = POWER10_TLB_SETS;  /* 1 */
+   else if (radix_enabled())
kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX; /* 128 */
else if (cpu_has_feature(CPU_FTR_ARCH_300))
kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;  /* 256 */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..7dfe38771f3c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -702,6 +702,7 @@ static void wait_for_sync(struct kvm_split_mode *sip, int 
phase)
 
 void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
 {
+   int num_sets;
unsigned long rb, set;
 
/* wait for every other thread to get to real mode */
@@ -712,11 +713,16 @@ void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
mtspr(SPRN_LPID, sip->lpidr_req);
isync();
 
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   num_sets = POWER10_TLB_SETS;
+   else
+   num_sets = POWER9_TLB_SETS_RADIX;
+
/* Invalidate the TLB on thread 0 */
if (local_paca->kvm_hstate.tid == 0) {
sip->do_set = 0;
asm volatile("ptesync" : : : "memory");
-   for (set = 0; set < POWER9_TLB_SETS_RADIX; ++set) {
+   for (set = 0; set < num_sets; ++set) {
rb = TLBIEL_INVAL_SET_LPID +
(set << TLBIEL_INVAL_SET_SHIFT);
asm volatile(PPC_TLBIEL(%0, %1, 0, 0, 0) : :
diff --git a/arch/powerpc/mm/book3s64/hash_native.c 
b/arch/powerpc/mm/book3s64/hash_native.c
index cf20e5229ce1..abea64c804b2 100644
--- a/arch/powerpc/mm/book3s64/hash_native.c
+++ b/arch/powerpc/mm/book3s64/hash_native.c
@@ -130,7 +130,9 @@ void hash__tlbiel_all(unsigned int action)
BUG();
}
 
-   if (early_cpu_has_feature(CPU_FTR_ARCH_300))
+   if (early_cpu_has_feature(CPU_FTR_ARCH_31))
+   tlbiel_all_isa300(POWER10_TLB_SETS, is);
+   else if (early_cpu_has_feature(CPU_FTR_ARCH_300))
tlbiel_all_isa300(POWER9_TLB_SETS_HASH, is);
else if (early_cpu_has_feature(CPU_FTR_ARCH_207S))
tlbiel_all_isa206(POWER8_TLB_SETS, is);
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index 143b4fd396f0..47db637755c4 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -83,7 +83,9 @@ void radix__tlbiel_all(unsigned int action)
BUG();
}
 
-   if (early_cpu_has_feature(CPU_FTR_ARCH_300))
+   if (early_cpu_has_feature(CPU_FTR_ARCH_31))
+   tlbiel_all_isa300(POWER10_TLB_SETS, is);
+   else if (early_cpu_has_feature(CPU_FTR_ARCH_300))
tlbiel_all_isa300(POWER9_TLB_SETS_RADIX, is);
else
WARN(1, "%s called on pre-POWER9 CPU\n", __func__);
@@ -284,7 +286,7 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
  */
 static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 {
-   int set;
+   int set, num_sets;
 
asm volatile("ptesync": : :"memory");
 
@@ -300,8 +302,13 @@ static __always_inlin

Re: [PATCH v2] powerpc/tm: Save and restore AMR on treclaim and trechkpt

2020-09-19 Thread Aneesh Kumar K.V
Gustavo Romero  writes:

> Although AMR is stashed in the checkpoint area, currently we don't save
> it to the per thread checkpoint struct after a treclaim and so we don't
> restore it either from that struct when we trechkpt. As a consequence, when
> the transaction is later rolled back, the kernel-space AMR value at the time
> of the trechkpt appears in userspace.
>
> This commit saves and restores AMR accordingly on treclaim and trechkpt.
> Since the AMR value is also used in kernel space in other functions, it also
> takes care of stashing the kernel live AMR into the stack before treclaim and
> before trechkpt, restoring it later, just before returning from tm_reclaim
> and __tm_recheckpoint.
>
> It also fixes two unrelated comments about CR and MSR.
>

Tested-by: Aneesh Kumar K.V 

> Signed-off-by: Gustavo Romero 
> ---
>  arch/powerpc/include/asm/processor.h |  1 +
>  arch/powerpc/kernel/asm-offsets.c|  1 +
>  arch/powerpc/kernel/tm.S | 35 
>  3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index ed0d633ab5aa..9f4f6cc033ac 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -220,6 +220,7 @@ struct thread_struct {
>   unsigned long   tm_tar;
>   unsigned long   tm_ppr;
>   unsigned long   tm_dscr;
> + unsigned long   tm_amr;
>  
>   /*
>* Checkpointed FP and VSX 0-31 register set.
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 8711c2164b45..c2722ff36e98 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -176,6 +176,7 @@ int main(void)
>   OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
>   OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
>   OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
> + OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
>   OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
>   OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
>   OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
> diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
> index 6ba0fdd1e7f8..2b91f233b05d 100644
> --- a/arch/powerpc/kernel/tm.S
> +++ b/arch/powerpc/kernel/tm.S
> @@ -122,6 +122,13 @@ _GLOBAL(tm_reclaim)
>   std r3, STK_PARAM(R3)(r1)
>   SAVE_NVGPRS(r1)
>  
> + /*
> +  * Save kernel live AMR since it will be clobbered by treclaim
> +  * but can be used elsewhere later in kernel space.
> +  */
> + mfspr   r3, SPRN_AMR
> + std r3, TM_FRAME_L1(r1)
> +
>   /* We need to setup MSR for VSX register save instructions. */
>   mfmsr   r14
>   mr  r15, r14
> @@ -245,7 +252,7 @@ _GLOBAL(tm_reclaim)
>* but is used in signal return to 'wind back' to the abort handler.
>*/
>  
> - /*  CR,LR,CCR,MSR ** */
> + /* * CTR, LR, CR, XER ** */
>   mfctr   r3
>   mflrr4
>   mfcrr5
> @@ -256,7 +263,6 @@ _GLOBAL(tm_reclaim)
>   std r5, _CCR(r7)
>   std r6, _XER(r7)
>  
> -
>   /*  TAR, DSCR ** */
>   mfspr   r3, SPRN_TAR
>   mfspr   r4, SPRN_DSCR
> @@ -264,6 +270,10 @@ _GLOBAL(tm_reclaim)
>   std r3, THREAD_TM_TAR(r12)
>   std r4, THREAD_TM_DSCR(r12)
>  
> +/*  AMR  */
> +mfsprr3, SPRN_AMR
> +std  r3, THREAD_TM_AMR(r12)
> +
>   /*
>* MSR and flags: We don't change CRs, and we don't need to alter MSR.
>*/
> @@ -308,7 +318,9 @@ _GLOBAL(tm_reclaim)
>   std r3, THREAD_TM_TFHAR(r12)
>   std r4, THREAD_TM_TFIAR(r12)
>  
> - /* AMR is checkpointed too, but is unsupported by Linux. */
> + /* Restore kernel live AMR */
> + ld  r8, TM_FRAME_L1(r1)
> + mtspr   SPRN_AMR, r8
>  
>   /* Restore original MSR/IRQ state & clear TM mode */
>   ld  r14, TM_FRAME_L0(r1)/* Orig MSR */
> @@ -355,6 +367,13 @@ _GLOBAL(__tm_recheckpoint)
>*/
>   SAVE_NVGPRS(r1)
>  
> + /*
> +  * Save kernel live AMR since it will be clobbered for trechkpt
> +  * but can be used elsewhere later in kernel space.
> +  */
> + mfspr   r8, SPRN_AMR
> + std r8, TM_FRAME_L0(r1)
> +
>   /* Load complete register state from ts_ckpt* registers */
>  
>   addir7, r3, PT_CKPT_REGS/* Thread's ckpt_regs */
> @@ -404,7 +423,7 @@ _GLOBAL(

Re: [PATCH] powerpc/tm: Save and restore AMR on treclaim and trechkpt

2020-09-17 Thread Aneesh Kumar K.V

On 9/18/20 9:35 AM, Gustavo Romero wrote:

Although AMR is stashed on the checkpoint area, currently we don't save
it to the per thread checkpoint struct after a treclaim and so we don't
restore it either from that struct when we trechkpt. As a consequence, when
the transaction is later rolled back, the kernel-space AMR value at the time
of the trechkpt appears in userspace.

This commit saves and restores AMR accordingly on treclaim and trechkpt.
Since the AMR value is also used in kernel space in other functions, it also
takes care of stashing the kernel live AMR into the PACA before treclaim and
before trechkpt, restoring it later, just before returning from tm_reclaim and
__tm_recheckpoint.

It also fixes two unrelated comments about CR and MSR.

Signed-off-by: Gustavo Romero 
---
  arch/powerpc/include/asm/paca.h  |  1 +
  arch/powerpc/include/asm/processor.h |  1 +
  arch/powerpc/kernel/asm-offsets.c|  2 ++
  arch/powerpc/kernel/tm.S | 31 +++-
  4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..44c605181529 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -179,6 +179,7 @@ struct paca_struct {
u64 sprg_vdso;  /* Saved user-visible sprg */
  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
+   u64 tm_amr; /* Saved Kernel AMR for 
treclaim/trechkpt */
  #endif
  
  #ifdef CONFIG_PPC_POWERNV

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..9f4f6cc033ac 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -220,6 +220,7 @@ struct thread_struct {
unsigned long   tm_tar;
unsigned long   tm_ppr;
unsigned long   tm_dscr;
+   unsigned long   tm_amr;
  
  	/*

 * Checkpointed FP and VSX 0-31 register set.
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..cf1a6d68a91f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -170,12 +170,14 @@ int main(void)
  
  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM

OFFSET(PACATMSCRATCH, paca_struct, tm_scratch);
+   OFFSET(PACATMAMR, paca_struct, tm_amr);
OFFSET(THREAD_TM_TFHAR, thread_struct, tm_tfhar);
OFFSET(THREAD_TM_TEXASR, thread_struct, tm_texasr);
OFFSET(THREAD_TM_TFIAR, thread_struct, tm_tfiar);
OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
+   OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 6ba0fdd1e7f8..e178ddb43619 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -152,6 +152,10 @@ _GLOBAL(tm_reclaim)
li  r5, 0
mtmsrd  r5, 1
  
+/* Save AMR since it's used elsewhere in kernel space */

+   mfspr   r8, SPRN_AMR
+   std r8, PACATMAMR(r13)



Can we save this on the stack instead of in the PACA?
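Something like using a slot in the tm_reclaim() stack frame, e.g. (sketch
only, TM_FRAME_L1 used here just as an example of a free stack slot):

	mfspr	r3, SPRN_AMR
	std	r3, TM_FRAME_L1(r1)	/* stash kernel AMR on the stack */
	...
	ld	r8, TM_FRAME_L1(r1)	/* restore it before returning */
	mtspr	SPRN_AMR, r8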



+
/*
 * BE CAREFUL HERE:
 * At this point we can't take an SLB miss since we have MSR_RI
@@ -245,7 +249,7 @@ _GLOBAL(tm_reclaim)
 * but is used in signal return to 'wind back' to the abort handler.
 */
  
-	/*  CR,LR,CCR,MSR ** */

+   /* * CTR, LR, CR, XER ** */
mfctr   r3
mflrr4
mfcrr5
@@ -256,7 +260,6 @@ _GLOBAL(tm_reclaim)
std r5, _CCR(r7)
std r6, _XER(r7)
  
-

/*  TAR, DSCR ** */
mfspr   r3, SPRN_TAR
mfspr   r4, SPRN_DSCR
@@ -264,6 +267,10 @@ _GLOBAL(tm_reclaim)
std r3, THREAD_TM_TAR(r12)
std r4, THREAD_TM_DSCR(r12)
  
+/*  AMR  */

+mfspr  r3, SPRN_AMR
+stdr3, THREAD_TM_AMR(r12)
+
/*
 * MSR and flags: We don't change CRs, and we don't need to alter MSR.
 */
@@ -308,8 +315,6 @@ _GLOBAL(tm_reclaim)
std r3, THREAD_TM_TFHAR(r12)
std r4, THREAD_TM_TFIAR(r12)
  
-	/* AMR is checkpointed too, but is unsupported by Linux. */

-
/* Restore original MSR/IRQ state & clear TM mode */
ld  r14, TM_FRAME_L0(r1)/* Orig MSR */
  
@@ -330,6 +335,10 @@ _GLOBAL(tm_reclaim)

ld  r0, PACA_DSCR_DEFAULT(r13)
mtspr   SPRN_DSCR, r0
  
+/* Restore kernel saved AMR */

+   ld  r4, PACATMAMR(r13)
+   mtspr   SPRN_AMR, r4
+
blr
  
  
@@ -355,6 

[PATCH] mm/debug_vm_pgtable: Avoid doing memory allocation with pgtable_t mapped.

2020-09-13 Thread Aneesh Kumar K.V
With highmem, pte_alloc_map() keeps the level-4 (pte) page table mapped using
kmap_atomic(). Avoid doing new memory allocation while the page table is
mapped like that.

[9.409233] BUG: sleeping function called from invalid context at 
mm/page_alloc.c:4822
[9.410557] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: 
swapper
[9.411932] no locks held by swapper/1.
[9.412595] CPU: 0 PID: 1 Comm: swapper Not tainted 
5.9.0-rc3-00323-gc50eb1ed654b5 #2
[9.413824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[9.415207] Call Trace:
[9.415651]  ? ___might_sleep.cold+0xa7/0xcc
[9.416367]  ? __alloc_pages_nodemask+0x14c/0x5b0
[9.417055]  ? swap_migration_tests+0x50/0x293
[9.417704]  ? debug_vm_pgtable+0x4bc/0x708
[9.418287]  ? swap_migration_tests+0x293/0x293
[9.418911]  ? do_one_initcall+0x82/0x3cb
[9.419465]  ? parse_args+0x1bd/0x280
[9.419983]  ? rcu_read_lock_sched_held+0x36/0x60
[9.420673]  ? trace_initcall_level+0x1f/0xf3
[9.421279]  ? trace_initcall_level+0xbd/0xf3
[9.421881]  ? do_basic_setup+0x9d/0xdd
[9.422410]  ? do_basic_setup+0xc3/0xdd
[9.422938]  ? kernel_init_freeable+0x72/0xa3
[9.423539]  ? rest_init+0x134/0x134
[9.424055]  ? kernel_init+0x5/0x12c
[9.424574]  ? ret_from_fork+0x19/0x30

Reported-by: kernel test robot 
Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index d12bde82ae95..612c665a1136 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -994,7 +994,13 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map(mm, pmdp, vaddr);
+   /*
+* Allocate pgtable_t
+*/
+   if (pte_alloc(mm, pmdp)) {
+   pr_err("pgtable allocation failed\n");
+   return 1;
+   }
 
/*
 * Save all the page table page addresses as the page table
@@ -1048,8 +1054,7 @@ static int __init debug_vm_pgtable(void)
 * proper page table lock.
 */
 
-   ptl = pte_lockptr(mm, pmdp);
-   spin_lock(ptl);
+   ptep = pte_offset_map_lock(mm, pmdp, vaddr, &ptl);
pte_clear_tests(mm, ptep, pte_aligned, vaddr, prot);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
pte_unmap_unlock(ptep, ptl);
-- 
2.26.2



Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test

2020-09-10 Thread Aneesh Kumar K.V
Nathan Chancellor  writes:

> On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
>> pte_clear_tests operate on an existing pte entry. Make sure that
>> is not a none pte entry.
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  mm/debug_vm_pgtable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 9afa1354326b..c36530c69e33 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -542,9 +542,10 @@ static void __init pgd_populate_tests(struct mm_struct 
>> *mm, pgd_t *pgdp,
>>  #endif /* PAGETABLE_P4D_FOLDED */
>>  
>>  static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>> -   unsigned long vaddr)
>> +   unsigned long pfn, unsigned long vaddr,
>> +   pgprot_t prot)
>>  {
>> -pte_t pte = ptep_get(ptep);
>> +pte_t pte = pfn_pte(pfn, prot);
>>  
>>  pr_debug("Validating PTE clear\n");
>>  pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>> @@ -1049,7 +1050,7 @@ static int __init debug_vm_pgtable(void)
>>  
>>  ptl = pte_lockptr(mm, pmdp);
>>  spin_lock(ptl);
>> -pte_clear_tests(mm, ptep, vaddr);
>> +pte_clear_tests(mm, ptep, pte_aligned, vaddr, prot);
>>  pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>  pte_unmap_unlock(ptep, ptl);
>>  
>> -- 
> This patch causes a panic at boot for RISC-V defconfig. The rootfs is here if 
> it is needed:
> https://github.com/ClangBuiltLinux/boot-utils/blob/3b21a5b71451742866349ba4f18638c5a754e660/images/riscv/rootfs.cpio.zst
>
> $ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux- O=out/riscv 
> distclean defconfig Image
>
> $ qemu-system-riscv64 -bios default -M virt -display none -initrd rootfs.cpio 
> -kernel Image -m 512m -nodefaults -serial mon:stdio
> ...
>
> OpenSBI v0.6
>_  _
>   / __ \  / |  _ \_   _|
>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>  | |__| | |_) |  __/ | | |) | |_) || |_
>   \/| .__/ \___|_| |_|_/|/_|
> | |
> |_|
>
> Platform Name  : QEMU Virt Machine
> Platform HART Features : RV64ACDFIMSU
> Platform Max HARTs : 8
> Current Hart   : 0
> Firmware Base  : 0x8000
> Firmware Size  : 120 KB
> Runtime SBI Version: 0.2
>
> MIDELEG : 0x0222
> MEDELEG : 0xb109
> PMP0: 0x8000-0x8001 (A)
> PMP1: 0x-0x (A,R,W,X)
> [0.00] Linux version 5.9.0-rc4-next-20200910 
> (nathan@ubuntu-n2-xlarge-x86) (riscv64-linux-gcc (GCC) 10.2.0, GNU ld (GNU 
> Binutils) 2.35) #1 SMP Thu Sep 10 19:10:43 MST 2020
> ...
> [0.294593] NET: Registered protocol family 17
> [0.295781] 9pnet: Installing 9P2000 support
> [0.296153] Key type dns_resolver registered
> [0.296694] debug_vm_pgtable: [debug_vm_pgtable ]: Validating 
> architecture page table helpers
> [0.297635] Unable to handle kernel paging request at virtual address 
> 0a7fffe01dafefc8
> [0.298029] Oops [#1]
> [0.298153] Modules linked in:
> [0.298433] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc4-next-20200910 #1
> [0.298792] epc: ffe000205afc ra : ffe0008be0aa sp : 
> ffe01ae73d40
> [0.299078]  gp : ffe0010b9b48 tp : ffe01ae68000 t0 : 
> ffe008152000
> [0.299362]  t1 :  t2 :  s0 : 
> ffe01ae73d60
> [0.299648]  s1 : bffb a0 : 0a7fffe01dafefc8 a1 : 
> bffb
> [0.299948]  a2 : ffe0010a2698 a3 : 0001 a4 : 
> 0003
> [0.300231]  a5 : 0800 a6 : f080 a7 : 
> 1b642000
> [0.300521]  s2 : ffe0081517b8 s3 : ffe008150a80 s4 : 
> ffe01af3
> [0.300806]  s5 : ffe01f8ca9b8 s6 : ffe00815 s7 : 
> ffe0010bb100
> [0.301161]  s8 : ffe0010bb108 s9 : 00080202 s10: 
> ffe0010bb928
> [0.301481]  s11: 2008085b t3 :  t4 : 
> 
> [0.301722]  t5 :  t6 : ffe00815
> [0.301947] status: 0120 badaddr: 0a7fffe01dafefc8 cause: 
> 000f
> [0.302569] ---[ end trace 7ffb153d816164cf ]---
> [0.302797] note: swapper/0[1] exited with preempt_count 1
> [0.303101] Kernel panic - not

Re: Flushing transparent hugepages

2020-09-09 Thread Aneesh Kumar K.V
Matthew Wilcox  writes:

> PowerPC has special handling of hugetlbfs pages.  Well, that's what
> the config option says, but actually it handles THP as well.  If
> the config option is enabled.
>
> #ifdef CONFIG_HUGETLB_PAGE
> if (PageCompound(page)) {
> flush_dcache_icache_hugepage(page);
> return;
> }
> #endif

I do have a change posted some time back to avoid that confusion.
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200320103256.229365-1-aneesh.ku...@linux.ibm.com/

But IIUC we use the head page flag (PG_arch_1) to track whether we need
the flush or not.
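Roughly, the scheme looks like this (sketch only, not the exact powerpc code;
maybe_flush() is a made-up helper standing in for the set_pte_at()/
update_mmu_cache() style paths):

	void flush_dcache_page(struct page *page)
	{
		/* mark "needs flush"; state is kept once per compound page */
		clear_bit(PG_arch_1, &compound_head(page)->flags);
	}

	static void maybe_flush(struct page *page)
	{
		page = compound_head(page);
		if (!test_bit(PG_arch_1, &page->flags)) {
			flush_dcache_icache_page(page);
			set_bit(PG_arch_1, &page->flags);
		}
	}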

>
> By the way, THPs can be mapped askew -- that is, at an offset which
> means you can't use a PMD to map a PMD sized page.
>
> Anyway, we don't really have consensus between the various architectures
> on how to handle either THPs or hugetlb pages.  It's not contemplated
> in Documentation/core-api/cachetlb.rst so there's no real surprise
> we've diverged.
>
> What would you _like_ to see?  Would you rather flush_dcache_page()
> were called once for each subpage, or would you rather maintain
> the page-needs-flushing state once per compound page?  We could also
> introduce flush_dcache_thp() if some architectures would prefer it one
> way and one the other, although that brings into question what to do
> for hugetlbfs pages.
>
> It might not be a bad idea to centralise the handling of all this stuff
> somewhere.  Sounds like the kind of thing Arnd would like to do ;-) I'll
> settle for getting enough clear feedback about what the various arch
> maintainers want that I can write a documentation update for cachetlb.rst.


Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes

2020-09-09 Thread Aneesh Kumar K.V
Gerald Schaefer  writes:

> On Fri, 4 Sep 2020 18:01:15 +0200
> Gerald Schaefer  wrote:
>
> [...]
>> 
>> BTW2, a quick test with this change (so far) made the issues on s390
>> go away:
>> 
>> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>> spin_unlock(ptl);
>> 
>>  #ifndef CONFIG_PPC_BOOK3S_64
>> -   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>> +   hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, 
>> prot);
>>  #endif
>> 
>> spin_lock(&mm->page_table_lock);
>> 
>> That would more match the "pte_t pointer" usage for hugetlb code,
>> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
>> but I think the root cause is the pte_t pointer.
>> 
>> Not entirely sure though if that would really be the correct fix.
>> I somehow lost whatever little track I had about what these tests
>> really want to check, and if that would still be valid with that
>> change.
>
> Uh oh, wasn't aware that this (or some predecessor) already went
> upstream, and broke our debug kernel today.

Not sure I followed the above. Are you seeing the s390 kernel crash after
this patch series or with the original patchset? As noted in my patch, the
hugetlb test is broken and we should fix that. A quick fix is to comment out
that test for s390 too, as I have done for PPC64.
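
For example (only a sketch of that quick fix, not a proposed patch), the
existing ppc64-only guard in mm/debug_vm_pgtable.c could be widened:

	/* sketch: skip the broken advanced hugetlb test on s390 as well */
	#if !defined(CONFIG_PPC_BOOK3S_64) && !defined(CONFIG_S390)
		hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
	#endif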


-aneesh


Re: [PATCH v1 4/5] powerpc/fault: Avoid heavy search_exception_tables() verification

2020-09-09 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> search_exception_tables() is an heavy operation, we have to avoid it.
> When KUAP is selected, we'll know the fault has been blocked by KUAP.
> Otherwise, it behaves just as if the address was already in the TLBs
> and no fault was generated.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/fault.c | 20 +---
>  1 file changed, 5 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 525e0c2b5406..edde169ba3a6 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -214,24 +214,14 @@ static bool bad_kernel_fault(struct pt_regs *regs, 
> unsigned long error_code,
>   if (address >= TASK_SIZE)
>   return true;
>  
> - if (!is_exec && (error_code & DSISR_PROTFAULT) &&
> - !search_exception_tables(regs->nip)) {
> + // Read/write fault blocked by KUAP is bad, it can never succeed.
> + if (bad_kuap_fault(regs, address, is_write)) {
>   pr_crit_ratelimited("Kernel attempted to access user page (%lx) 
> - exploit attempt? (uid: %d)\n",
> - address,
> - from_kuid(_user_ns, current_uid()));
> - }
> -
> - // Fault on user outside of certain regions (eg. copy_tofrom_user()) is 
> bad
> - if (!search_exception_tables(regs->nip))
> - return true;

Do we still need to keep this? Without it we detect the lack of
exception tables pretty late.
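
A rough, untested sketch of the ordering being suggested here: keep the cheap
KUAP check first, but still fail kernel faults on user addresses that have no
exception table entry:

	// Read/write fault blocked by KUAP is bad, it can never succeed.
	if (bad_kuap_fault(regs, address, is_write)) {
		pr_crit_ratelimited("Kernel attempted to access user page (%lx) - exploit attempt? (uid: %d)\n",
				    address, from_kuid(&init_user_ns, current_uid()));
		return true;
	}

	// Fault on user outside of certain regions (eg. copy_tofrom_user()) is bad
	if (!search_exception_tables(regs->nip))
		return true;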



> -
> - // Read/write fault in a valid region (the exception table search passed
> - // above), but blocked by KUAP is bad, it can never succeed.
> - if (bad_kuap_fault(regs, address, is_write))
> + address, from_kuid(_user_ns, 
> current_uid()));
>   return true;
> + }
>  
> - // What's left? Kernel fault on user in well defined regions (extable
> - // matched), and allowed by KUAP in the faulting context.
> + // What's left? Kernel fault on user and allowed by KUAP in the 
> faulting context.
>   return false;
>  }
>  
> -- 
> 2.25.0


[PATCH] powerpc/book3s64/hash: Align start/end address correctly with bolt mapping

2020-09-07 Thread Aneesh Kumar K.V
This ensures we don't do a partial mapping of memory. With nvdimm, when
creating namespaces with a size not aligned to 16MB, the kernel ends up
partially mapping the pages. This can result in the kernel adding multiple
hash page table entries for the same range. A new namespace will result in
create_section_mapping() with start and end overlapping an already existing
bolted hash page table entry.

commit: 6acd7d5ef264 ("libnvdimm/namespace: Enforce memremap_compat_align()")
made sure that we always create namespaces aligned to 16MB. But we can do
better by avoiding mapping pages that are not aligned. This helps to catch
access to these partially mapped pages early.
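
As a worked example of the intended trimming (sketch only, with a 16MB bolt
step and made-up addresses):

	unsigned long step   = 0x1000000UL;		/* 16MB */
	unsigned long vstart = 0xc000000001900000UL;	/* not step aligned */
	unsigned long vend   = vstart + 0x2400000UL;	/* 36MB request */
	unsigned long vaddr;

	vaddr = ALIGN(vstart, step);	/* rounds up to the next 16MB boundary */
	vend  = ALIGN_DOWN(vend, step);	/* rounds down to the previous boundary */
	/*
	 * Only the fully covered [vaddr, vend) is bolted; the unaligned
	 * head and tail are skipped instead of being partially mapped.
	 */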

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/hash_utils.c| 12 +---
 arch/powerpc/mm/book3s64/radix_pgtable.c |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index c663e7ba801f..7185bc43b24f 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -260,8 +260,12 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long 
vend,
DBG("htab_bolt_mapping(%lx..%lx -> %lx (%lx,%d,%d)\n",
vstart, vend, pstart, prot, psize, ssize);
 
-   for (vaddr = vstart, paddr = pstart; vaddr < vend;
-vaddr += step, paddr += step) {
+   /* Carefully map only the possible range */
+   vaddr = ALIGN(vstart, step);
+   paddr = ALIGN(pstart, step);
+   vend  = ALIGN_DOWN(vend, step);
+
+   for (; vaddr < vend; vaddr += step, paddr += step) {
unsigned long hash, hpteg;
unsigned long vsid = get_kernel_vsid(vaddr, ssize);
unsigned long vpn  = hpt_vpn(vaddr, vsid, ssize);
@@ -343,7 +347,9 @@ int htab_remove_mapping(unsigned long vstart, unsigned long 
vend,
if (!mmu_hash_ops.hpte_removebolted)
return -ENODEV;
 
-   for (vaddr = vstart; vaddr < vend; vaddr += step) {
+   /* Unmap the full range specified */
+   vaddr = ALIGN_DOWN(vstart, step);
+   for (;vaddr < vend; vaddr += step) {
rc = mmu_hash_ops.hpte_removebolted(vaddr, psize, ssize);
if (rc == -ENOENT) {
ret = -ENOENT;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index d5f0c10d752a..5c8adeb8c955 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -276,6 +276,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
int psize;
 
start = ALIGN(start, PAGE_SIZE);
+   end   = ALIGN_DOWN(end, PAGE_SIZE);
for (addr = start; addr < end; addr += mapping_size) {
unsigned long gap, previous_size;
int rc;
-- 
2.26.2



Re: [PATCH v3 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-09-02 Thread Aneesh Kumar K.V
Anshuman Khandual  writes:

> On 09/01/2020 12:00 PM, Aneesh Kumar K.V wrote:
>> On 9/1/20 9:33 AM, Anshuman Khandual wrote:
>>>
>>>
>>> On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:
>>>> The seems to be missing quite a lot of details w.r.t allocating
>>>> the correct pgtable_t page (huge_pte_alloc()), holding the right
>>>> lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA.
>>>>
>>>> ppc64 do have runtime checks within CONFIG_DEBUG_VM for most of these.
>>>> Hence disable the test on ppc64.
>>>
>>> Would really like this to get resolved in an uniform and better way
>>> instead, i.e a modified hugetlb_advanced_tests() which works on all
>>> platforms including ppc64.
>>>
>>> In absence of a modified version, I do realize the situation here,
>>> where DEBUG_VM_PGTABLE test either runs on ppc64 or just completely
>>> remove hugetlb_advanced_tests() from other platforms as well.
>>>
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V 
>>>> ---
>>>>   mm/debug_vm_pgtable.c | 4 
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>> index a188b6e4e37e..21329c7d672f 100644
>>>> --- a/mm/debug_vm_pgtable.c
>>>> +++ b/mm/debug_vm_pgtable.c
>>>> @@ -813,6 +813,7 @@ static void __init hugetlb_basic_tests(unsigned long 
>>>> pfn, pgprot_t prot)
>>>>   #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>>>>   }
>>>>   +#ifndef CONFIG_PPC_BOOK3S_64
>>>>   static void __init hugetlb_advanced_tests(struct mm_struct *mm,
>>>>     struct vm_area_struct *vma,
>>>>     pte_t *ptep, unsigned long pfn,
>>>> @@ -855,6 +856,7 @@ static void __init hugetlb_advanced_tests(struct 
>>>> mm_struct *mm,
>>>>   pte = huge_ptep_get(ptep);
>>>>   WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
>>>>   }
>>>> +#endif
>>>
>>> In the worst case if we could not get a new hugetlb_advanced_tests() test
>>> that works on all platforms, this might be the last fallback option. In
>>> which case, it will require a proper comment section with a "FIXME: ",
>>> explaining the current situation (and that #ifdef is temporary in nature)
>>> and a hugetlb_advanced_tests() stub when CONFIG_PPC_BOOK3S_64 is enabled.
>>>
>>>>   #else  /* !CONFIG_HUGETLB_PAGE */
>>>>   static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) 
>>>> { }
>>>>   static void __init hugetlb_advanced_tests(struct mm_struct *mm,
>>>> @@ -1065,7 +1067,9 @@ static int __init debug_vm_pgtable(void)
>>>>   pud_populate_tests(mm, pudp, saved_pmdp);
>>>>   spin_unlock(ptl);
>>>>   +#ifndef CONFIG_PPC_BOOK3S_64
>>>>   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>>> +#endif
>>>
>> 
>> I actually wanted to add #ifdef BROKEN. That test is completely broken.
>> In fact, I would suggest removing that test completely.
>> 
>> 
>> 
>>> #ifdef will not be required here as there would be a stub definition
>>> for hugetlb_advanced_tests() when CONFIG_PPC_BOOK3S_64 is enabled.
>>>
>>>>     spin_lock(>page_table_lock);
>>>>   p4d_clear_tests(mm, p4dp);
>>>>
>>>
>>> But again, we should really try and avoid taking this path.
>>>
>> 
>> To be frank, I am kind of frustrated with how this patch series is being
>> looked at. We pushed a completely broken test upstream, and right now we
>> have code upstream that crashes when booted on ppc64. My attempt has been
>> to make progress here and you definitely seem not to be in agreement with
>> that.
>> 
>
> I am afraid, this does not accurately represent the situation.
>
> - The second set patch series got merged in it's V5 after accommodating almost
>   all reviews and objections during previous discussion cycles. For a complete
>   development log, please refer https://patchwork.kernel.org/cover/11658627/.
>
> - The series has been repeatedly tested on arm64 and x86 platforms for 
> multiple
>   configurations but build tested on all other enabled platforms. I have 
> always
>   been dependent on voluntary help from folks on the list to get this tested 
> on
>   other enabled platforms.

Re: [PATCH v4 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-09-02 Thread Aneesh Kumar K.V

On 9/2/20 6:10 PM, Christophe Leroy wrote:



On 02/09/2020 at 13:42, Aneesh Kumar K.V wrote:
ppc64 supports huge vmap only with radix translation. Hence use arch helper
to determine the huge vmap support.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 00649b47f6e0..4c73e63b4ceb 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
@@ -206,11 +207,12 @@ static void __init pmd_leaf_tests(unsigned long 
pfn, pgprot_t prot)

  WARN_ON(!pmd_leaf(pmd));
  }
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
  static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, 
pgprot_t prot)

  {
  pmd_t pmd;
-    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+    if (!arch_ioremap_pmd_supported())


What about moving ioremap_pmd_enabled() from mm/ioremap.c to some .h, 
and using it ?

As ioremap_pmd_enabled() is defined at all time, no need of #ifdef



Yes. This was discussed earlier too. IMHO we should do that outside this
series. I guess figuring out ioremap_pmd/pud support can definitely be
simplified, with a generic version like:


#ifndef arch_ioremap_pmd_supported
static inline bool arch_ioremap_pmd_supported(void)
{
	return false;
}
#endif
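
Presumably the PUD level would get the same treatment; a hypothetical
companion following the same pattern:

#ifndef arch_ioremap_pud_supported
static inline bool arch_ioremap_pud_supported(void)
{
	return false;
}
#endif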



  return;
  pr_debug("Validating PMD huge\n");
@@ -224,6 +226,9 @@ static void __init pmd_huge_tests(pmd_t *pmdp, 
unsigned long pfn, pgprot_t prot)

  pmd = READ_ONCE(*pmdp);
  WARN_ON(!pmd_none(pmd));
  }
+#else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, 
pgprot_t prot) { }

+#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t 
prot)

  {
@@ -320,11 +325,12 @@ static void __init pud_leaf_tests(unsigned long 
pfn, pgprot_t prot)

  WARN_ON(!pud_leaf(pud));
  }
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
  static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, 
pgprot_t prot)

  {
  pud_t pud;
-    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+    if (!arch_ioremap_pud_supported())


What about moving ioremap_pud_enabled() from mm/ioremap.c to some .h, 
and using it ?

As ioremap_pud_enabled() is defined at all time, no need of #ifdef


  return;
  pr_debug("Validating PUD huge\n");
@@ -338,6 +344,10 @@ static void __init pud_huge_tests(pud_t *pudp, 
unsigned long pfn, pgprot_t prot)

  pud = READ_ONCE(*pudp);
  WARN_ON(!pud_none(pud));
  }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, 
pgprot_t prot) { }

+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
  #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
  static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) 
{ }

  static void __init pud_advanced_tests(struct mm_struct *mm,



Christophe




[PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test

2020-09-02 Thread Aneesh Kumar K.V
pte_clear_tests operates on an existing pte entry. Make sure that it
is not a none pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 9afa1354326b..c36530c69e33 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -542,9 +542,10 @@ static void __init pgd_populate_tests(struct mm_struct 
*mm, pgd_t *pgdp,
 #endif /* PAGETABLE_P4D_FOLDED */
 
 static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
-  unsigned long vaddr)
+  unsigned long pfn, unsigned long vaddr,
+  pgprot_t prot)
 {
-   pte_t pte = ptep_get(ptep);
+   pte_t pte = pfn_pte(pfn, prot);
 
pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -1049,7 +1050,7 @@ static int __init debug_vm_pgtable(void)
 
ptl = pte_lockptr(mm, pmdp);
spin_lock(ptl);
-   pte_clear_tests(mm, ptep, vaddr);
+   pte_clear_tests(mm, ptep, pte_aligned, vaddr, prot);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
pte_unmap_unlock(ptep, ptl);
 
-- 
2.26.2



[PATCH v4 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-09-02 Thread Aneesh Kumar K.V
The test seems to be missing quite a lot of details w.r.t allocating
the correct pgtable_t page (huge_pte_alloc()), holding the right
lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA.

ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these.
Hence disable the test on ppc64.
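
For reference only, a rough sketch of what a correct setup for the advanced
hugetlb test would need (allocate the pte with the hugetlb helper and take
the hugetlb-aware lock); hypothetical code, not part of this patch:

	struct hstate *h = hstate_vma(vma);
	spinlock_t *ptl;
	pte_t *hptep;

	hptep = huge_pte_alloc(mm, vaddr, huge_page_size(h));
	ptl = huge_pte_lock(h, mm, hptep);
	/* ... exercise the huge_ptep_* helpers on hptep here ... */
	spin_unlock(ptl);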

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index b53903fdee85..9afa1354326b 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -811,6 +811,7 @@ static void __init hugetlb_basic_tests(unsigned long pfn, 
pgprot_t prot)
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 }
 
+#ifndef CONFIG_PPC_BOOK3S_64
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma,
  pte_t *ptep, unsigned long pfn,
@@ -853,6 +854,7 @@ static void __init hugetlb_advanced_tests(struct mm_struct 
*mm,
pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
 }
+#endif
 #else  /* !CONFIG_HUGETLB_PAGE */
 static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
@@ -1065,7 +1067,9 @@ static int __init debug_vm_pgtable(void)
pud_populate_tests(mm, pudp, saved_pmdp);
spin_unlock(ptl);
 
+#ifndef CONFIG_PPC_BOOK3S_64
hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+#endif
 
spin_lock(>page_table_lock);
p4d_clear_tests(mm, p4dp);
-- 
2.26.2



[PATCH v4 11/13] mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries

2020-09-02 Thread Aneesh Kumar K.V
pmd_clear() should not be used to clear pmd level pte entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 26023d990bd0..b53903fdee85 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -196,6 +196,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
 
+   /*  Clear the pte entries  */
+   pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
@@ -319,6 +321,8 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pudp_test_and_clear_young(vma, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_young(pud));
+
+   pudp_huge_get_and_clear(mm, vaddr, pudp);
 }
 
 static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -442,8 +446,6 @@ static void __init pud_populate_tests(struct mm_struct *mm, 
pud_t *pudp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pud_bad().
 */
-   pmd_clear(pmdp);
-   pud_clear(pudp);
pud_populate(mm, pudp, pmdp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_bad(pud));
@@ -575,7 +577,6 @@ static void __init pmd_populate_tests(struct mm_struct *mm, 
pmd_t *pmdp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pmd_bad().
 */
-   pmd_clear(pmdp);
pmd_populate(mm, pmdp, pgtable);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_bad(pmd));
-- 
2.26.2



[PATCH v4 10/13] mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP

2020-09-02 Thread Aneesh Kumar K.V
Architectures like ppc64 use a deposited page table while updating
huge pte entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 2bc1952e5f83..26023d990bd0 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -154,7 +154,7 @@ static void __init pmd_basic_tests(unsigned long pfn, 
pgprot_t prot)
 static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
 {
pmd_t pmd;
 
@@ -165,6 +165,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
+   pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+
pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
@@ -193,6 +195,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_test_and_clear_young(vma, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
 static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -371,7 +375,7 @@ static void __init pud_basic_tests(unsigned long pfn, 
pgprot_t prot) { }
 static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
 {
 }
 static void __init pud_advanced_tests(struct mm_struct *mm,
@@ -1048,7 +1052,7 @@ static int __init debug_vm_pgtable(void)
 
ptl = pmd_lock(mm, pmdp);
pmd_clear_tests(mm, pmdp);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pmd_huge_tests(pmdp, pmd_aligned, prot);
pmd_populate_tests(mm, pmdp, saved_ptep);
spin_unlock(ptl);
-- 
2.26.2



[PATCH v4 09/13] mm/debug_vm_pgtable/locks: Take correct page table lock

2020-09-02 Thread Aneesh Kumar K.V
Make sure we call pte accessors with the correct lock held.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index f59cf6a9b05e..2bc1952e5f83 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1035,30 +1035,39 @@ static int __init debug_vm_pgtable(void)
 
hugetlb_basic_tests(pte_aligned, prot);
 
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
+   /*
+* Page table modifying tests. They need to hold
+* proper page table lock.
+*/
 
ptl = pte_lockptr(mm, pmdp);
spin_lock(ptl);
-
+   pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
+   pte_unmap_unlock(ptep, ptl);
 
+   ptl = pmd_lock(mm, pmdp);
+   pmd_clear_tests(mm, pmdp);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   spin_unlock(ptl);
+
+   ptl = pud_lock(mm, pudp);
+   pud_clear_tests(mm, pudp);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
pud_huge_tests(pudp, pud_aligned, prot);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_unlock(ptl);
 
-   pte_unmap_unlock(ptep, ptl);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_lock(>page_table_lock);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
p4d_populate_tests(mm, p4dp, saved_pudp);
pgd_populate_tests(mm, pgdp, saved_p4dp);
+   spin_unlock(>page_table_lock);
 
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
-- 
2.26.2



[PATCH v4 08/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-02 Thread Aneesh Kumar K.V
This will help in adding proper locks in a later patch

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 51 ---
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index de333871f407..f59cf6a9b05e 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -986,7 +986,7 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, );
+   ptep = pte_alloc_map(mm, pmdp, vaddr);
 
/*
 * Save all the page table page addresses as the page table
@@ -1006,33 +1006,12 @@ static int __init debug_vm_pgtable(void)
p4d_basic_tests(p4d_aligned, prot);
pgd_basic_tests(pgd_aligned, prot);
 
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
-
-   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
 
-   pmd_huge_tests(pmdp, pmd_aligned, prot);
-   pud_huge_tests(pudp, pud_aligned, prot);
-
pte_savedwrite_tests(pte_aligned, protnone);
pmd_savedwrite_tests(pmd_aligned, protnone);
 
-   pte_unmap_unlock(ptep, ptl);
-
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
-   p4d_populate_tests(mm, p4dp, saved_pudp);
-   pgd_populate_tests(mm, pgdp, saved_p4dp);
-
pte_special_tests(pte_aligned, prot);
pte_protnone_tests(pte_aligned, protnone);
pmd_protnone_tests(pmd_aligned, protnone);
@@ -1050,11 +1029,37 @@ static int __init debug_vm_pgtable(void)
pmd_swap_tests(pmd_aligned, prot);
 
swap_migration_tests();
-   hugetlb_basic_tests(pte_aligned, prot);
 
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
 
+   hugetlb_basic_tests(pte_aligned, prot);
+
+   pte_clear_tests(mm, ptep, vaddr);
+   pmd_clear_tests(mm, pmdp);
+   pud_clear_tests(mm, pudp);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
+
+   ptl = pte_lockptr(mm, pmdp);
+   spin_lock(ptl);
+
+   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+
+   pte_unmap_unlock(ptep, ptl);
+
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   p4d_populate_tests(mm, p4dp, saved_pudp);
+   pgd_populate_tests(mm, pgdp, saved_p4dp);
+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);
-- 
2.26.2



[PATCH v4 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry

2020-09-02 Thread Aneesh Kumar K.V
set_pte_at() should not be used to set a pte entry at locations that
already hold a valid pte entry. Architectures like ppc64 don't do a TLB
invalidate in set_pte_at() and hence expect it to be used only on locations
that do not already hold a valid PTE.
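
In other words, a caller that needs to replace a live entry is expected to
follow a clear-then-set pattern along these lines (a sketch, not part of the
patch):

	/* tear the old translation down first ... */
	ptep_get_and_clear(mm, vaddr, ptep);
	/* ... then install the new one; the entry is now none, so
	 * set_pte_at() is legal even where no TLB flush is done here */
	set_pte_at(mm, vaddr, ptep, pfn_pte(pfn, prot));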

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 9cafed39c236..de333871f407 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -79,15 +79,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   /*
+* Architectures optimize set_pte_at by avoiding TLB flush.
+* This requires set_pte_at to be not used to update an
+* existing pte entry. Clear pte before we do set_pte_at
+*/
+
pr_debug("Validating PTE advanced\n");
pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);
ptep_set_wrprotect(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(pte_write(pte));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
@@ -101,13 +104,11 @@ static void __init pte_advanced_tests(struct mm_struct 
*mm,
ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
pte = ptep_get(ptep);
WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear_full(mm, vaddr, ptep, 1);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
 
+   pte = pfn_pte(pfn, prot);
pte = pte_mkyoung(pte);
set_pte_at(mm, vaddr, ptep, pte);
ptep_test_and_clear_young(vma, vaddr, ptep);
@@ -169,9 +170,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
@@ -185,13 +183,11 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_mkyoung(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_test_and_clear_young(vma, vaddr, pmdp);
@@ -292,17 +288,9 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
-
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
-   pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
-   pud = READ_ONCE(*pudp);
-   WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
 
pud = pud_mkhuge(pfn_pud(pfn, prot));
@@ -315,6 +303,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pud = READ_ONCE(*pudp);
WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
 
+#ifndef __PAGETABLE_PMD_FOLDED
+   pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
+   pud = READ_ONCE(*pudp);
+   WARN_ON(!pud_none(pud));
+#endif /* __PAGETABLE_PMD_FOLDED */
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_mkyoung(pud);
set_pud_at(mm, vaddr, pudp, pud);
pudp_test_and_clear_young(vma, vaddr, pudp);
-- 
2.26.2



[PATCH v4 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at

2020-09-02 Thread Aneesh Kumar K.V
The kernel expects entries to be marked huge before we use
set_pmd_at()/set_pud_at().

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 8704901f6bd8..9cafed39c236 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -155,7 +155,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd;
 
if (!has_transparent_hugepage())
return;
@@ -164,19 +164,19 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
@@ -236,7 +236,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
 
if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
return;
@@ -276,7 +276,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pud_t pud = pfn_pud(pfn, prot);
+   pud_t pud;
 
if (!has_transparent_hugepage())
return;
@@ -285,25 +285,27 @@ static void __init pud_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PUD_SIZE */
vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
 
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_set_wrprotect(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-   pud = pfn_pud(pfn, prot);
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 
-   pud = pfn_pud(pfn, prot);
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
-   pud = pfn_pud(pfn, prot);
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(mm, vaddr, pudp, pud);
-- 
2.26.2



[PATCH v4 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING

2020-09-02 Thread Aneesh Kumar K.V
Saved write support was added to track the write bit of a pte after
marking the pte protnone. This was done so that AUTONUMA can convert
a write pte to protnone and still track the old write bit. When converting
it back we set the pte write bit correctly thereby avoiding a write fault
again. Hence enable the test only when CONFIG_NUMA_BALANCING is enabled and
use protnone protflags.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 4c73e63b4ceb..8704901f6bd8 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -119,10 +119,14 @@ static void __init pte_savedwrite_tests(unsigned long 
pfn, pgprot_t prot)
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PTE saved write\n");
WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte;
WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte;
 }
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -234,6 +238,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, 
pgprot_t prot)
 {
pmd_t pmd = pfn_pmd(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PMD saved write\n");
WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd;
WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd;
@@ -1019,8 +1026,8 @@ static int __init debug_vm_pgtable(void)
pmd_huge_tests(pmdp, pmd_aligned, prot);
pud_huge_tests(pudp, pud_aligned, prot);
 
-   pte_savedwrite_tests(pte_aligned, prot);
-   pmd_savedwrite_tests(pmd_aligned, prot);
+   pte_savedwrite_tests(pte_aligned, protnone);
+   pmd_savedwrite_tests(pmd_aligned, protnone);
 
pte_unmap_unlock(ptep, ptl);
 
-- 
2.26.2



[PATCH v4 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-09-02 Thread Aneesh Kumar K.V
ppc64 supports huge vmap only with radix translation. Hence use arch helper
to determine the huge vmap support.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 00649b47f6e0..4c73e63b4ceb 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -206,11 +207,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pmd_leaf(pmd));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot)
 {
pmd_t pmd;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pmd_supported())
return;
 
pr_debug("Validating PMD huge\n");
@@ -224,6 +226,9 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 }
+#else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -320,11 +325,12 @@ static void __init pud_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pud_leaf(pud));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot)
 {
pud_t pud;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pud_supported())
return;
 
pr_debug("Validating PUD huge\n");
@@ -338,6 +344,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned 
long pfn, pgprot_t prot)
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init pud_advanced_tests(struct mm_struct *mm,
-- 
2.26.2



[PATCH v4 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-09-02 Thread Aneesh Kumar K.V
ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting
that bit in the random value.
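
A quick sanity check of the intent (hypothetical, not part of the patch):
with bit 62 excluded from the mask, the random OR-value can never set
_PAGE_PTE:

	BUILD_BUG_ON(RANDOM_ORVALUE & BIT_ULL(62));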

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..00649b47f6e0 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
  * entry type. But these bits might affect the ability to clear entries with
  * pxx_clear() because of how dynamic page table folding works on s390. So
  * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
  */
-#define S390_MASK_BITS 4
-#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#if __BITS_PER_LONG == 64
+#define PPC64_SKIP_MASKGENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK0x0
+#endif
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
 #define RANDOM_NZVALUE GENMASK(7, 0)
 
 static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
-- 
2.26.2



[PATCH v4 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear

2020-09-02 Thread Aneesh Kumar K.V
With the hash page table, the kernel should not use pmd_clear for clearing
huge pte entries. Add a DEBUG_VM WARN to catch the wrong usage.
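
For context, a huge PTE mapped at the PMD level is expected to be torn down
with the dedicated helper instead (sketch; old_pmd is just an illustrative
local):

	old_pmd = pmdp_huge_get_and_clear(mm, addr, pmdp);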

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..079211968987 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
+   WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == 
(H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pmdp = __pmd(0);
 }
 
@@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
 
 static inline void pud_clear(pud_t *pudp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
+   WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == 
(H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pudp = __pud(0);
 }
 
-- 
2.26.2



[PATCH v4 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte

2020-09-02 Thread Aneesh Kumar K.V
powerpc used to set the pte specific flags in set_pte_at(). This is
different from other architectures. To be consistent with other
architectures, update pfn_pte to set _PAGE_PTE on ppc64. Also, drop the
now unused pte_mkpte.

We add a VM_WARN_ON() to catch the usage of calling set_pte_at()
without setting _PAGE_PTE bit. We will remove that after a few releases.

With respect to huge pmd entries, pmd_mkhuge() takes care of adding the
_PAGE_PTE bit.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +--
 arch/powerpc/include/asm/nohash/pgtable.h|  5 -
 arch/powerpc/mm/pgtable.c|  5 -
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 079211968987..2382fd516f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t 
pgprot)
VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
 
-   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
+   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | 
_PAGE_PTE);
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
@@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pte_t pte_mkwrite(pte_t pte)
 {
/*
@@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
 {
+
+   VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
+   /*
+* Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
+* in all the callers.
+*/
+pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
+
if (radix_enabled())
return radix__set_pte_at(mm, addr, ptep, pte, percpu);
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..6277e7596ae5 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return pte;
-}
-
 static inline pte_t pte_mkspecial(pte_t pte)
 {
return __pte(pte_val(pte) | _PAGE_SPECIAL);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9c0547d77af3..ab57b07ef39a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Add the pte bit when trying to set a pte */
-   pte = pte_mkpte(pte);
-
/* Note: mm->context.id might not yet have been assigned as
 * this context might not have been activated yet when this
 * is called.
@@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep, pte_
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   pte = pte_mkpte(pte);
-
pte = set_pte_filter(pte);
 
val = pte_val(pte);
-- 
2.26.2



[PATCH v4 00/13] mm/debug_vm_pgtable fixes

2020-09-02 Thread Aneesh Kumar K.V
This patch series includes fixes for the debug_vm_pgtable test code so that
it follows page table update rules correctly. The first two patches introduce
changes w.r.t ppc64. The patches are included in this series for completeness.
We can merge them via the ppc64 tree if required.

The hugetlb test is disabled on ppc64 because that needs a larger change to
satisfy page table update rules.

These tests are broken w.r.t page table update rules and result in kernel
crashes as below.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

Changes from v3:
* Address review feedback
* Move page table deposit and withdraw patch after adding pmd lock to avoid bisect failure.

Changes from v2:
* Fix build failure with different configs and architecture.

Changes from v1:
* Address review feedback
* drop test specific pfn_pte and pfn_pmd.
* Update ppc64 page table helper to add _PAGE_PTE 


Aneesh Kumar K.V (13):
  powerpc/mm: Add DEBUG_VM WARN for pmd_clear
  powerpc/mm: Move setting pte specific flags to pfn_pte
  mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
  mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
vmap support.
  mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
CONFIG_NUMA_BALANCING
  mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
set_pmd/pud_at
  mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
existing pte entry
  mm/debug_vm_pgtable/locks: Move non page table modifying test together
  mm/debug_vm_pgtable/locks: Take correct page table lock
  mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP
  mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
  mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
  mm/debug_vm_pgtable: Avoid none pte in pte_clear_test

 arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
 arch/powerpc/include/asm/nohash/pgtable.h|   5 -
 arch/powerpc/mm/pgtable.c|   5 -
 mm/debug_vm_pgtable.c| 171 ---
 4 files changed, 131 insertions(+), 79 deletions(-)

-- 
2.26.2



Re: [PATCH] powerpc: Fix random segfault when freeing hugetlb range

2020-09-02 Thread Aneesh Kumar K.V

On 9/2/20 1:41 PM, Christophe Leroy wrote:



On 02/09/2020 at 05:23, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


The following random segfault is observed from time to time with
map_hugetlb selftest:

root@localhost:~# ./map_hugetlb 1 19
524288 kB hugepages
Mapping 1 Mbytes
Segmentation fault

[   31.219972] map_hugetlb[365]: segfault (11) at 117 nip 77974f8c lr 
779a6834 code 1 in ld-2.23.so[77966000+21000]
[   31.220192] map_hugetlb[365]: code: 9421ffc0 480318d1 93410028 
90010044 9361002c 93810030 93a10034 93c10038
[   31.220307] map_hugetlb[365]: code: 93e1003c 93210024 8123007c 
81430038 <80e90004> 814a0004 7f443a14 813a0004
[   31.221911] BUG: Bad rss-counter state mm:(ptrval) 
type:MM_FILEPAGES val:33
[   31.229362] BUG: Bad rss-counter state mm:(ptrval) 
type:MM_ANONPAGES val:5


This fault is due to hugetlb_free_pgd_range() freeing page tables
that are also used by regular pages.

As explain in the comment at the beginning of
hugetlb_free_pgd_range(), the verification done in free_pgd_range()
on floor and ceiling is not done here, which means
hugetlb_free_pte_range() can free outside the expected range.

As the verification cannot be done in hugetlb_free_pgd_range(), it
must be done in hugetlb_free_pte_range().



Reviewed-by: Aneesh Kumar K.V 

Fixes: b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard 
pages.")

Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/mm/hugetlbpage.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c 
b/arch/powerpc/mm/hugetlbpage.c

index 26292544630f..e7ae2a2c4545 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -330,10 +330,24 @@ static void free_hugepd_range(struct mmu_gather 
*tlb, hugepd_t *hpdp, int pdshif

   get_hugepd_cache_index(pdshift - shift));
  }
-static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t 
*pmd, unsigned long addr)

+static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
+   unsigned long addr, unsigned long end,
+   unsigned long floor, unsigned long ceiling)
  {
+    unsigned long start = addr;
  pgtable_t token = pmd_pgtable(*pmd);
+    start &= PMD_MASK;
+    if (start < floor)
+    return;
+    if (ceiling) {
+    ceiling &= PMD_MASK;
+    if (!ceiling)
+    return;
+    }
+    if (end - 1 > ceiling - 1)
+    return;
+


We do repeat that for pud/pmd/pte hugetlb_free_range. Can we consolidate
that with comment explaining we are checking if the pgtable entry is
mapping outside the range?


I was thinking about refactoring that into a helper and add all the 
necessary comments to explain what it does.


Will do that in a followup series if you are OK. This patch is a bug fix
and also has to go through stable.




agreed.

Thanks.
-aneesh


[PATCH v3] powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t page table update rules and results in a kernel
crash as below. Disable the support until we get the tests updated.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 2 +-
 arch/powerpc/Kconfig   | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt 
b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
index 53da483c8326..1c49723e7534 100644
--- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -22,7 +22,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: |  ok  |
+| powerpc: | TODO |
 |   riscv: |  ok  |
 |s390: |  ok  |
 |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Aneesh Kumar K.V

On 9/2/20 9:19 AM, Anshuman Khandual wrote:



On 09/01/2020 03:28 PM, Aneesh Kumar K.V wrote:

On 9/1/20 1:08 PM, Anshuman Khandual wrote:



On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:

On 9/1/20 8:55 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
    mm/debug_vm_pgtable.c | 6 --
    1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
    static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr)
    {
-    pte_t pte = ptep_get(ptep);
+    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);


Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at() which otherwise would have been a problem on
ppc64 as you had pointed out earlier i.e set_pte_at() should not update an
existing valid entry. So the commit message here is bit misleading.



and also fetch the pte value which is used further.



      pr_debug("Validating PTE clear\n");
    pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
    p4d_t *p4dp, *saved_p4dp;
    pud_t *pudp, *saved_pudp;
    pmd_t *pmdp, *saved_pmdp, pmd;
-    pte_t *ptep;
+    pte_t *ptep, pte;
    pgtable_t saved_ptep;
    pgprot_t prot, protnone;
    phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
     */
      ptep = pte_alloc_map_lock(mm, pmdp, vaddr, );
+    pte = pfn_pte(pte_aligned, prot);
+    set_pte_at(mm, vaddr, ptep, pte);


Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()


and fetch the old value set previously.


In that case, please move above two lines i.e

pte = pfn_pte(pte_aligned, prot);
set_pte_at(mm, vaddr, ptep, pte);

from debug_vm_pgtable() to pte_clear_tests() and update it's arguments
as required.



Frankly, I don't understand what these tests are testing. It all looks like 
some random clear and set.


The idea here is to have some value with some randomness preferably, in
a given PTEP before attempting to clear the entry, in order to make sure
that pte_clear() is indeed clearing something of non-zero value.



static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
				   unsigned long vaddr, unsigned long pfn,
				   pgprot_t prot)
{
	pte_t pte = pfn_pte(pfn, prot);

	set_pte_at(mm, vaddr, ptep, pte);
	pte = ptep_get_and_clear(mm, vaddr, ptep);


Looking at this again, this preceding pfn_pte() followed by set_pte_at()
is not really required. Its reasonable to start with what ever was there
in the PTEP as a seed value which anyway gets added with RANDOM_ORVALUE.
s/ptep_get/ptep_get_and_clear is sufficient to take care of the powerpc
set_pte_at() constraint.



But the way the test is written, we had a none pte before. That is why I
added that set_pte_at() to put something there. With a none pte, the below
sequence fails.


pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);


because nobody is marking a _PAGE_PTE there.

pte_t pte = pfn_pte(pfn, prot);

pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));

will that work for you?

-aneesh


Re: [PATCH] powerpc: Fix random segfault when freeing hugetlb range

2020-09-01 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> The following random segfault is observed from time to time with
> map_hugetlb selftest:
>
> root@localhost:~# ./map_hugetlb 1 19
> 524288 kB hugepages
> Mapping 1 Mbytes
> Segmentation fault
>
> [   31.219972] map_hugetlb[365]: segfault (11) at 117 nip 77974f8c lr 
> 779a6834 code 1 in ld-2.23.so[77966000+21000]
> [   31.220192] map_hugetlb[365]: code: 9421ffc0 480318d1 93410028 90010044 
> 9361002c 93810030 93a10034 93c10038
> [   31.220307] map_hugetlb[365]: code: 93e1003c 93210024 8123007c 81430038 
> <80e90004> 814a0004 7f443a14 813a0004
> [   31.221911] BUG: Bad rss-counter state mm:(ptrval) type:MM_FILEPAGES val:33
> [   31.229362] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:5
>
> This fault is due to hugetlb_free_pgd_range() freeing page tables
> that are also used by regular pages.
>
> As explain in the comment at the beginning of
> hugetlb_free_pgd_range(), the verification done in free_pgd_range()
> on floor and ceiling is not done here, which means
> hugetlb_free_pte_range() can free outside the expected range.
>
> As the verification cannot be done in hugetlb_free_pgd_range(), it
> must be done in hugetlb_free_pte_range().
>

Reviewed-by: Aneesh Kumar K.V 

> Fixes: b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages.")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/hugetlbpage.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 26292544630f..e7ae2a2c4545 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -330,10 +330,24 @@ static void free_hugepd_range(struct mmu_gather *tlb, 
> hugepd_t *hpdp, int pdshif
>get_hugepd_cache_index(pdshift - shift));
>  }
>  
> -static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd, 
> unsigned long addr)
> +static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
> +unsigned long addr, unsigned long end,
> +unsigned long floor, unsigned long ceiling)
>  {
> + unsigned long start = addr;
>   pgtable_t token = pmd_pgtable(*pmd);
>  
> + start &= PMD_MASK;
> + if (start < floor)
> + return;
> + if (ceiling) {
> + ceiling &= PMD_MASK;
> + if (!ceiling)
> + return;
> + }
> + if (end - 1 > ceiling - 1)
> + return;
> +

We do repeat that for pud/pmd/pte hugetlb_free_range. Can we consolidate
that with comment explaining we are checking if the pgtable entry is
mapping outside the range?
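
For illustration, the repeated floor/ceiling check could be factored into something
like the helper below (hypothetical name, not part of the patch; mask would be
PMD_MASK, PUD_MASK or PGDIR_MASK depending on the level):

static bool hugetlb_pgtable_outside_range(unsigned long addr, unsigned long end,
                                          unsigned long floor, unsigned long ceiling,
                                          unsigned long mask)
{
        unsigned long start = addr & mask;

        /* The page table page also maps addresses below floor ... */
        if (start < floor)
                return true;
        if (ceiling) {
                ceiling &= mask;
                if (!ceiling)
                        return true;
        }
        /* ... or at/above ceiling, so it must not be freed. */
        return end - 1 > ceiling - 1;
}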

>   pmd_clear(pmd);
>   pte_free_tlb(tlb, token, addr);
>   mm_dec_nr_ptes(tlb->mm);
> @@ -363,7 +377,7 @@ static void hugetlb_free_pmd_range(struct mmu_gather 
> *tlb, pud_t *pud,
>*/
>   WARN_ON(!IS_ENABLED(CONFIG_PPC_8xx));
>  
> - hugetlb_free_pte_range(tlb, pmd, addr);
> + hugetlb_free_pte_range(tlb, pmd, addr, end, floor, ceiling);
>  
>   continue;
>   }
> -- 
> 2.25.0


Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:08 PM, Anshuman Khandual wrote:



On 09/01/2020 12:07 PM, Aneesh Kumar K.V wrote:

On 9/1/20 8:55 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
   mm/debug_vm_pgtable.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
   static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
  unsigned long vaddr)
   {
-    pte_t pte = ptep_get(ptep);
+    pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);


Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at() which otherwise would have been a problem on
ppc64, as you had pointed out earlier, i.e. set_pte_at() should not update an
existing valid entry. So the commit message here is a bit misleading.



and also fetch the pte value which is used further.



     pr_debug("Validating PTE clear\n");
   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
   p4d_t *p4dp, *saved_p4dp;
   pud_t *pudp, *saved_pudp;
   pmd_t *pmdp, *saved_pmdp, pmd;
-    pte_t *ptep;
+    pte_t *ptep, pte;
   pgtable_t saved_ptep;
   pgprot_t prot, protnone;
   phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
    */
      ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+    pte = pfn_pte(pte_aligned, prot);
+    set_pte_at(mm, vaddr, ptep, pte);


Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()


and fetch the old value set previously.


In that case, please move above two lines i.e

pte = pfn_pte(pte_aligned, prot);
set_pte_at(mm, vaddr, ptep, pte);

from debug_vm_pgtable() to pte_clear_tests() and update its arguments
as required.



Frankly, I don't understand what these tests are testing. It all looks 
like some random clear and set.


static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr, unsigned long pfn,
   pgprot_t prot)
{

pte_t pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);

pte =  ptep_get_and_clear(mm, vaddr, ptep);

pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
}


-aneesh


[PATCH v2] powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.

[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Aneesh Kumar K.V





[   17.080644] [ cut here ]
[   17.081342] kernel BUG at mm/pgtable-generic.c:164!
[   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   17.082977] Modules linked in:
[   17.083481] CPU: 79 PID: 1 Comm: swapper/0 Tainted: G    W 
5.9.0-rc2-00105-g740e72cd6717 #14
[   17.084998] Hardware name: linux,dummy-virt (DT)
[   17.085745] pstate: 6045 (nZCv daif +PAN -UAO BTYPE=--)
[   17.086645] pc : pgtable_trans_huge_deposit+0x58/0x60
[   17.087462] lr : debug_vm_pgtable+0x4dc/0x8f0
[   17.088168] sp : 80001219bcf0
[   17.088710] x29: 80001219bcf0 x28: 8000114ac000
[   17.089574] x27: 8000114ac000 x26: 00200fd3
[   17.090431] x25: 002081400fd3 x24: fe00175c12c0
[   17.091286] x23: 0005df04d1a8 x22: f18d6e035000
[   17.092143] x21: 0005dd79 x20: 0005df18e1a8
[   17.093003] x19: 0005df04cb80 x18: 0014
[   17.093859] x17: b76667d0 x16: fd4e6611
[   17.094716] x15: 0001 x14: 0002
[   17.095572] x13: 0055d966 x12: fe001755e400
[   17.096429] x11: 0008 x10: 0005fcb6e210
[   17.097292] x9 : 0005fcb6e210 x8 : 002081590fd3
[   17.098149] x7 : 0005dd71e0c0 x6 : 0005df18e1a8
[   17.099006] x5 : 002081590fd3 x4 : 002081590fd3
[   17.099862] x3 : 0005df18e1a8 x2 : fe00175c12c0
[   17.100718] x1 : fe00175c1300 x0 : 
[   17.101583] Call trace:
[   17.101993]  pgtable_trans_huge_deposit+0x58/0x60
[   17.102754]  do_one_initcall+0x74/0x1cc
[   17.103381]  kernel_init_freeable+0x1d0/0x238
[   17.104089]  kernel_init+0x14/0x118
[   17.104658]  ret_from_fork+0x10/0x34
[   17.105251] Code: f9000443 f9000843 f9000822 d65f03c0 (d421)
[   17.106303] ---[ end trace e63471e00f8c147f ]---



IIUC, this should happen even without the series right? This is

 assert_spin_locked(pmd_lockptr(mm, pmdp));


The crash does not happen without this series. A previous patch [PATCH v3 08/13]
added pgtable_trans_huge_deposit/withdraw() in pmd_advanced_tests(). arm64
does not define __HAVE_ARCH_PGTABLE_DEPOSIT and hence falls back on the generic
pgtable_trans_huge_deposit(), which asserts that the lock is held.
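
For reference, the generic fallback in mm/pgtable-generic.c starts roughly like this
(body abbreviated), and the first line is the assertion that fires:

#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
                                pgtable_t pgtable)
{
        assert_spin_locked(pmd_lockptr(mm, pmdp));
        /* ... queue the deposited page table page (FIFO) ... */
}
#endif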




I fixed that by moving the pgtable deposit after adding the pmd locks 
correctly.


mm/debug_vm_pgtable/locks: Move non page table modifying test together
mm/debug_vm_pgtable/locks: Take correct page table lock
mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

-aneesh




Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 2:40 PM, Christophe Leroy wrote:



Le 01/09/2020 à 10:15, Christophe Leroy a écrit :



Le 01/09/2020 à 10:12, Aneesh Kumar K.V a écrit :

On 9/1/20 1:40 PM, Christophe Leroy wrote:



Le 01/09/2020 à 10:02, Aneesh Kumar K.V a écrit :

The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for 
both PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it to be
broken there too.


I was wondering. I have just started a build to test that on my 8xx. 
I'll tell you before noon (Paris).




There are warnings but it boots OK. So up to you, but if you deactivate
it for both PPC32 and PPC64, say so in the subject line.




I will update the subject line to indicate powerpc instead of ppc64

-aneesh


Re: [PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:40 PM, Christophe Leroy wrote:



Le 01/09/2020 à 10:02, Aneesh Kumar K.V a écrit :

The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.


Signed-off-by: Aneesh Kumar K.V 


Any Fixes: tag ?


---
  arch/powerpc/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
  #
  select ARCH_32BIT_OFF_T if PPC32
  select ARCH_HAS_DEBUG_VIRTUAL
-    select ARCH_HAS_DEBUG_VM_PGTABLE



You say you remove support for ppc64 but you are removing it for both 
PPC64 and PPC32. Should you replace the line by:


Does it work on PPC32 with DEBUG_VM enabled? I was expecting it to be
broken there too.




 select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32


  select ARCH_HAS_DEVMEM_IS_ALLOWED
  select ARCH_HAS_ELF_RANDOMIZE
  select ARCH_HAS_FORTIFY_SOURCE



What about Documentation/features/debug/debug-vm-pgtable/arch-support.txt ?



I am hoping we can enable the config once we resolve the test issues,
maybe in the next merge window.


-aneesh





[PATCH] powerpc/mm: Remove DEBUG_VM_PGTABLE support on ppc64

2020-09-01 Thread Aneesh Kumar K.V
The test is broken w.r.t page table update rules and results in kernel
crash as below. Disable the support until we get the tests updated.

[   21.083506] [ cut here ]
[   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
cpu 0x0: Vector: 700 (Program Check) at [c00c6d1e76c0]
pc: c009a5ec: assert_pte_locked+0x14c/0x380
lr: c05c: pte_update+0x11c/0x190
sp: c00c6d1e7950
   msr: 82029033
  current = 0xc00c6d172c80
  paca= 0xc3ba   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
kernel BUG at arch/powerpc/mm/pgtable.c:304!
Linux version 5.9.0-rc2-34902-g4da73871507c (kvaneesh@ltczz75-lp2) (gcc (Ubuntu 
9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #301 SMP Tue Sep 
1 02:28:29 CDT 2020
enter ? for help
[link register   ] c05c pte_update+0x11c/0x190
[c00c6d1e7950] 0001 (unreliable)
[c00c6d1e79b0] c05eee14 pte_update+0x44/0x190
[c00c6d1e7a10] c1a2ca9c pte_advanced_tests+0x160/0x3d8
[c00c6d1e7ab0] c1a2d4fc debug_vm_pgtable+0x7e8/0x1338
[c00c6d1e7ba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d1e7c80] c19e4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d1e7db0] c0012474 kernel_init+0x24/0x160
[c00c6d1e7e20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c

With DEBUG_VM disabled

[   20.530152] BUG: Kernel NULL pointer dereference on read at 0x
[   20.530183] Faulting instruction address: 0xc00df330
cpu 0x33: Vector: 380 (Data SLB Access) at [c00c6d19f700]
pc: c00df330: memset+0x68/0x104
lr: c009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
sp: c00c6d19f990
   msr: 82009033
   dar: 0
  current = 0xc00c6d177480
  paca= 0xc0001ec4f400   irqmask: 0x03   irq_happened: 0x01
pid   = 1, comm = swapper/0
Linux version 5.9.0-rc2-34902-g4da73871507c (kvaneesh@ltczz75-lp2) (gcc (Ubuntu 
9.3.0-10ubuntu2) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #302 SMP Tue Sep 
1 02:56:02 CDT 2020
enter ? for help
[link register   ] c009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
[c00c6d19f990] c009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 
(unreliable)
[c00c6d19fa10] c19ebf30 pmd_advanced_tests+0x1f0/0x378
[c00c6d19fab0] c19ed088 debug_vm_pgtable+0x79c/0x1244
[c00c6d19fba0] c00116ec do_one_initcall+0xac/0x5f0
[c00c6d19fc80] c19a4fac kernel_init_freeable+0x4dc/0x5a4
[c00c6d19fdb0] c0012474 kernel_init+0x24/0x160
[c00c6d19fe20] c000cbd0 ret_from_kernel_thread+0x5c/0x6c
33:mon>

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65bed1fdeaad..787e829b6f25 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,7 +116,6 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
-   select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
-- 
2.26.2



Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:16 PM, Anshuman Khandual wrote:



On 09/01/2020 01:06 PM, Aneesh Kumar K.V wrote:

On 9/1/20 1:02 PM, Anshuman Khandual wrote:



On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:

On 9/1/20 8:45 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
random value.

Signed-off-by: Aneesh Kumar K.V 
---
    mm/debug_vm_pgtable.c | 13 ++---
    1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
     * entry type. But these bits might affect the ability to clear entries with
     * pxx_clear() because of how dynamic page table folding works on s390. So
     * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
     */
-#define S390_MASK_BITS    4
-#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK    GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASK    GENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK    0x0
+#endif


Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.




that is needed so that it can be built on 32 bit architectures. I did face build
errors with arch-linux


Could not (#if __BITS_PER_LONG == 32) be used instead, or something like
that? But it should be a generic conditional check identifying a 32-bit arch,
not anything platform specific.



that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64. Not sure why
other architectures need to be bothered about ignoring bit 62.


That's okay as long as it does not adversely affect other architectures; ignoring
some more bits is acceptable. Like the existing S390_MASK_BITS, which gets ignored on
all other platforms even if it is a s390 specific constraint. Not having a platform
specific #ifdef here is essential.



Why is it essential?

-aneesh


Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 12:20 PM, Christophe Leroy wrote:



Le 01/09/2020 à 08:25, Aneesh Kumar K.V a écrit :

On 9/1/20 8:52 AM, Anshuman Khandual wrote:




There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 
chars per line)

#7:
Architectures like ppc64 use deposited page table while updating the 
huge pte


total: 0 errors, 1 warnings, 40 lines checked



I will ignore all these, because they are not really important IMHO.



When doing a git log in an 80-char terminal window, having wrapping
lines is not really convenient. It should be easy to avoid.




We have been ignoring that for a long time, haven't we?

For example ppc64 checkpatch already had
--max-line-length=90


There was also a recent discussion on whether the 80-character limit is valid any
more. But I do keep it restricted to 80 characters wherever it is
easy / makes sense.


-aneesh



Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 1:02 PM, Anshuman Khandual wrote:



On 09/01/2020 11:51 AM, Aneesh Kumar K.V wrote:

On 9/1/20 8:45 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
random value.

Signed-off-by: Aneesh Kumar K.V 
---
   mm/debug_vm_pgtable.c | 13 ++---
   1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
    * entry type. But these bits might affect the ability to clear entries with
    * pxx_clear() because of how dynamic page table folding works on s390. So
    * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
    */
-#define S390_MASK_BITS    4
-#define RANDOM_ORVALUE    GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK    GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASK    GENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK    0x0
+#endif


Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.




that is needed so that it can be built on 32 bit architectures. I did face build
errors with arch-linux


Could not (#if __BITS_PER_LONG == 32) be used instead, or something like
that? But it should be a generic conditional check identifying a 32-bit arch,
not anything platform specific.



that _PAGE_PTE bit is pretty much specific to PPC BOOK3S_64. Not sure
why other architectures need to be bothered about ignoring bit 62.


-aneesh


Re: [PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 9:11 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

This will help in adding proper locks in a later patch


It really makes sense to classify these tests here as static and dynamic.
Static are the ones that test via page table entry values modification
without changing anything on the actual page table itself. Where as the
dynamic tests do change the page table. Static tests would probably never
require any lock synchronization but the dynamic ones will do.



Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 52 ---
  1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 0ce5c6a24c5b..78c8af3445ac 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -992,7 +992,7 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   ptep = pte_alloc_map(mm, pmdp, vaddr);
  
  	/*

 * Save all the page table page addresses as the page table
@@ -1012,33 +1012,12 @@ static int __init debug_vm_pgtable(void)
p4d_basic_tests(p4d_aligned, prot);
pgd_basic_tests(pgd_aligned, prot);
  
-	pte_clear_tests(mm, ptep, vaddr);

-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
-
-   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
  
-	pmd_huge_tests(pmdp, pmd_aligned, prot);

-   pud_huge_tests(pudp, pud_aligned, prot);
-
pte_savedwrite_tests(pte_aligned, protnone);
pmd_savedwrite_tests(pmd_aligned, protnone);
  
-	pte_unmap_unlock(ptep, ptl);

-
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
-   p4d_populate_tests(mm, p4dp, saved_pudp);
-   pgd_populate_tests(mm, pgdp, saved_p4dp);
-
pte_special_tests(pte_aligned, prot);
pte_protnone_tests(pte_aligned, protnone);
pmd_protnone_tests(pmd_aligned, protnone);
@@ -1056,11 +1035,38 @@ static int __init debug_vm_pgtable(void)
pmd_swap_tests(pmd_aligned, prot);
  
  	swap_migration_tests();

-   hugetlb_basic_tests(pte_aligned, prot);
  
  	pmd_thp_tests(pmd_aligned, prot);

pud_thp_tests(pud_aligned, prot);
  
+	/*

+* Page table modifying tests
+*/


In this comment, please do add some more details about how these tests
need proper locking mechanism unlike the previous static ones. Also add
a similar comment section for the static tests that dont really change
page table and need not require corresponding locking mechanism. Both
comment sections will make this classification clear.


+   pte_clear_tests(mm, ptep, vaddr);
+   pmd_clear_tests(mm, pmdp);
+   pud_clear_tests(mm, pudp);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
+
+   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+
+   pte_unmap_unlock(ptep, ptl);
+
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   p4d_populate_tests(mm, p4dp, saved_pudp);
+   pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+   hugetlb_basic_tests(pte_aligned, prot);


hugetlb_basic_tests() is a non page table modifying static test and
should be classified accordingly.


+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);



Changes up to this patch fail to boot on the arm64 platform. This should be
folded into the next patch.

[   17.080644] [ cut here ]
[   17.081342] kernel BUG at mm/pgtable-generic.c:164!
[   17.082091] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   17.082977] Modules linked in:
[   17.083481] CPU: 79 PID: 1 Comm: swapper/0 Tainted: GW 
5.9.0-rc2-00105-g740e72cd6717 #14
[   17.084998] Hardware name: linux,dummy-virt (DT)
[   17.085745] pstate: 6045 (nZCv daif +PAN -UAO BTYPE=--)
[   17.086645] pc : pgtable_trans_huge_deposit+0x58/0x60
[   17.087462] lr : debug_vm_pgtable+0x4dc/0x8f0
[   17.088168] sp : 80001219bcf0
[   17.088710

Re: [PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 8:55 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
  static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr)
  {
-   pte_t pte = ptep_get(ptep);
+   pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);


Seems like ptep_get_and_clear() here just clears the entry in preparation
for a following set_pte_at() which otherwise would have been a problem on
ppc64, as you had pointed out earlier, i.e. set_pte_at() should not update an
existing valid entry. So the commit message here is a bit misleading.



and also fetch the pte value which is used further.


  
  	pr_debug("Validating PTE clear\n");

pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
p4d_t *p4dp, *saved_p4dp;
pud_t *pudp, *saved_pudp;
pmd_t *pmdp, *saved_pmdp, pmd;
-   pte_t *ptep;
+   pte_t *ptep, pte;
pgtable_t saved_ptep;
pgprot_t prot, protnone;
phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
 */
  
   	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);

+   pte = pfn_pte(pte_aligned, prot);
+   set_pte_at(mm, vaddr, ptep, pte);


Not here, creating and populating an entry must be done in respective
test functions itself. Besides, this seems bit redundant as well. The
test pte_clear_tests() with the above change added, already

- Clears the PTEP entry with ptep_get_and_clear()


and fetch the old value set previously.


- Creates and populates the PTEP with set_pte_at()
- Clears with pte_clear()
- Checks for pte_none()

If the objective is to clear the PTEP entry before calling set_pte_at(),
then only the first chunk is required and it should also be merged with
a previous patch.

[PATCH v3 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to 
update an existing pte entry



pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
pte_unmap_unlock(ptep, ptl);



There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
#7:
pte_clear_tests operate on an existing pte entry. Make sure that is not a none

total: 0 errors, 1 warnings, 24 lines checked

There is also a build failure on x86 reported from kernel test robot.





Re: [PATCH v3 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 9:33 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

The test seems to be missing quite a lot of details w.r.t allocating
the correct pgtable_t page (huge_pte_alloc()), holding the right
lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA.

ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these.
Hence disable the test on ppc64.


Would really like this to get resolved in a uniform and better way
instead, i.e. a modified hugetlb_advanced_tests() which works on all
platforms including ppc64.

In the absence of a modified version, I do realize the situation here,
where the DEBUG_VM_PGTABLE test either runs on ppc64 or we just completely
remove hugetlb_advanced_tests() from other platforms as well.



Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index a188b6e4e37e..21329c7d672f 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -813,6 +813,7 @@ static void __init hugetlb_basic_tests(unsigned long pfn, 
pgprot_t prot)
  #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
  }
  
+#ifndef CONFIG_PPC_BOOK3S_64

  static void __init hugetlb_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma,
  pte_t *ptep, unsigned long pfn,
@@ -855,6 +856,7 @@ static void __init hugetlb_advanced_tests(struct mm_struct 
*mm,
pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
  }
+#endif


In the worst case if we could not get a new hugetlb_advanced_tests() test
that works on all platforms, this might be the last fallback option. In
which case, it will require a proper comment section with a "FIXME: ",
explaining the current situation (and that #ifdef is temporary in nature)
and a hugetlb_advanced_tests() stub when CONFIG_PPC_BOOK3S_64 is enabled.


  #else  /* !CONFIG_HUGETLB_PAGE */
  static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
  static void __init hugetlb_advanced_tests(struct mm_struct *mm,
@@ -1065,7 +1067,9 @@ static int __init debug_vm_pgtable(void)
pud_populate_tests(mm, pudp, saved_pmdp);
spin_unlock(ptl);
  
+#ifndef CONFIG_PPC_BOOK3S_64

hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+#endif




I actually wanted to add #ifdef BROKEN. That test is completely broken.
In fact, I would suggest removing that test completely.





#ifdef will not be required here as there would be a stub definition
for hugetlb_advanced_tests() when CONFIG_PPC_BOOK3S_64 is enabled.

  
  	spin_lock(&mm->page_table_lock);

p4d_clear_tests(mm, p4dp);



But again, we should really try and avoid taking this path.



To be frank, I am kind of frustrated with how this patch series is being
looked at. We pushed a completely broken test upstream and right now
we have code in upstream that crashes when booted on ppc64. My attempt
has been to make progress here and you definitely seem to not be in
agreement with that.


At this point I am tempted to suggest we remove the DEBUG_VM_PGTABLE 
support on ppc64 because AFAIU it doesn't add any value.



-aneesh


Re: [PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 8:52 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

Architectures like ppc64 use deposited page table while updating the huge pte
entries.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index f9f6358899a8..0ce5c6a24c5b 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -154,7 +154,7 @@ static void __init pmd_basic_tests(unsigned long pfn, 
pgprot_t prot)
  static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
  {
pmd_t pmd;
  
@@ -165,6 +165,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,

/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
  
+	pgtable_trans_huge_deposit(mm, pmdp, pgtable);

+
pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
@@ -193,6 +195,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_test_and_clear_young(vma, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmdp);


Should the call sites here be wrapped with __HAVE_ARCH_PGTABLE_DEPOSIT and
__HAVE_ARCH_PGTABLE_WITHDRAW respectively. Though there are generic fallback
definitions, wondering whether they are indeed essential for all platforms.



No, because any page table helper operating on a pmd-level THP entry can expect
a deposited page table.


Not defining __HAVE_ARCH_PGTABLE_DEPOSIT just indicates a fallback to the generic version.


  }
  
  static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)

@@ -373,7 +377,7 @@ static void __init pud_basic_tests(unsigned long pfn, 
pgprot_t prot) { }
  static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
  {
  }
  static void __init pud_advanced_tests(struct mm_struct *mm,
@@ -1015,7 +1019,7 @@ static int __init debug_vm_pgtable(void)
pgd_clear_tests(mm, pgdp);
  
  	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);

-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
  



There is a checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
#7:
Architectures like ppc64 use deposited page table while updating the huge pte

total: 0 errors, 1 warnings, 40 lines checked



I will ignore all these, because they are not really important IMHO.

-aneesh


Re: [PATCH v3 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 8:51 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

kernel expects entries to be marked huge before we use 
set_pmd_at()/set_pud_at().

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 21 -
  1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5c0680836fe9..de83a20c1b30 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -155,7 +155,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
  {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd;
  
  	if (!has_transparent_hugepage())

return;
@@ -164,19 +164,19 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
  
-	pmd = pfn_pmd(pfn, prot);

+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
  
-	pmd = pfn_pmd(pfn, prot);

+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
  
-	pmd = pfn_pmd(pfn, prot);

+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
@@ -237,7 +237,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
  
  static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)

  {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));


There is no set_pmd_at() in this particular test, why change ?



because if you are building a hugepage you should use pmd_mkhuge(). That
is what is setting _PAGE_PTE with this series. We don't make pfn_pmd() set
_PAGE_PTE.



-aneesh


Re: [PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-09-01 Thread Aneesh Kumar K.V

On 9/1/20 8:45 AM, Anshuman Khandual wrote:



On 08/27/2020 01:34 PM, Aneesh Kumar K.V wrote:

ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
random value.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
   * entry type. But these bits might affect the ability to clear entries with
   * pxx_clear() because of how dynamic page table folding works on s390. So
   * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
   */
-#define S390_MASK_BITS 4
-#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASKGENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK0x0
+#endif


Please drop the #ifdef CONFIG_PPC_BOOK3S_64 here. We already accommodate skip
bits for a s390 platform requirement and can also do so for ppc64 as well. As
mentioned before, please avoid adding any platform specific constructs in the
test.




that is needed so that it can be built on 32 bit architectures. I did
face build errors with arch-linux



+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
  #define RANDOM_NZVALUEGENMASK(7, 0)


Please fix the alignments here. Feel free to consider following changes after
this patch.

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 122416464e0f..f969031152bb 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -48,14 +48,11 @@
   * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
   * used to mark a pte entry.
   */
-#define S390_SKIP_MASK GENMASK(3, 0)
-#ifdef CONFIG_PPC_BOOK3S_64
-#define PPC64_SKIP_MASKGENMASK(62, 62)
-#else
-#define PPC64_SKIP_MASK0x0
-#endif
-#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
-#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#define PPC64_SKIP_MASKGENMASK(62, 62)
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
+
  #define RANDOM_NZVALUE GENMASK(7, 0)
  
  
  static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)




Besides, there is also one checkpatch.pl warning here.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
#7:
ppc64 use bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in

total: 0 errors, 1 warnings, 20 lines checked




These warnings are not valid. They are mostly long lines (up to 100 chars), or
some details mentioned in the parentheses as above.


-aneesh


[PATCH v2] powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

2020-08-28 Thread Aneesh Kumar K.V
If the hypervisor doesn't support hugepages, the kernel ends up allocating a 
large
number of page table pages. The early page table allocation was wrongly
setting the max memblock limit to ppc64_rma_size with radix translation
which resulted in boot failure as shown below.

Kernel panic - not syncing:
early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x100 nid=-1 
from=0x max_addr=0x
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
 Call Trace:
 [c16f3d00] [c07c6470] dump_stack+0xc4/0x114 (unreliable)
 [c16f3d40] [c014c78c] panic+0x164/0x418
 [c16f3dd0] [c0098890] early_alloc_pgtable+0xe0/0xec
 [c16f3e60] [c10a5440] radix__early_init_mmu+0x360/0x4b4
 [c16f3ef0] [c1099bac] early_init_mmu+0x1c/0x3c
 [c16f3f10] [c109a320] early_setup+0x134/0x170

This was because the kernel was checking for the radix feature before we enable 
the
feature via mmu_features. This resulted in the kernel using hash restrictions on
radix.

Rework the early init code such that the kernel boots with the memblock restrictions
imposed by hash. At that point, the kernel still hasn't finalized the
translation it will use.

We have three different ways of detecting radix.

1. dt_cpu_ftrs_scan -> used only in case of PowerNV
2. ibm,pa-features -> Used when we don't use cpu_dt_ftr_scan
3. CAS -> Where we negotiate with hypervisor about the supported translation.

We look at 1 or 2 early in the boot and after that, we look at the CAS vector to
finalize the translation the kernel will use. We also support a kernel command
line option (disable_radix) to switch to hash.

Update the memblock limit after mmu_early_init_devtree() if the kernel is going
to use radix translation. This forces some of the memblock allocations we do 
before
mmu_early_init_devtree() to be within the RMA limit.
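
The hunk that implements this is truncated further below in this archive; roughly,
the resulting flow in mmu_early_init_devtree() is (a sketch based on the description
above, not the literal patch):

	if (early_radix_enabled()) {
		radix__early_init_devtree();
		/*
		 * The translation is finalized by now; radix is not limited
		 * by RMA / VRMA addressing, so lift the memblock limit that
		 * was applied for hash earlier in boot.
		 */
		ppc64_rma_size = ULONG_MAX;
		memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
	} else
		hash__early_init_devtree();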

Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init 
routines")
Reviewed-by: Hari Bathini 
Reported-by: Shirisha Ganta 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 10 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 15 ---
 arch/powerpc/mm/init_64.c| 11 +--
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 55442d45c597..b392384a3b15 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -239,14 +239,14 @@ static inline void early_init_mmu_secondary(void)
 
 extern void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 phys_addr_t first_memblock_size);
-extern void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
-phys_addr_t first_memblock_size);
 static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
  phys_addr_t first_memblock_size)
 {
-   if (early_radix_enabled())
-   return radix__setup_initial_memory_limit(first_memblock_base,
-  first_memblock_size);
+   /*
+* Hash has more strict restrictions. At this point we don't
+* know which translations we will pick. Hence go with hash
+* restrictions.
+*/
return hash__setup_initial_memory_limit(first_memblock_base,
   first_memblock_size);
 }
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 28c784976bed..d5f0c10d752a 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -734,21 +734,6 @@ void radix__mmu_cleanup_all(void)
}
 }
 
-void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
-   phys_addr_t first_memblock_size)
-{
-   /*
-* We don't currently support the first MEMBLOCK not mapping 0
-* physical on those processors
-*/
-   BUG_ON(first_memblock_base != 0);
-
-   /*
-* Radix mode is not limited by RMA / VRMA addressing.
-*/
-   ppc64_rma_size = ULONG_MAX;
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void free_pte_table(pte_t *pte_start, pmd_t *pmd)
 {
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 02e127fa5777..8459056cce67 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -433,9 +433,16 @@ void __init mmu_early_init_devtree(void)
if (!(mfmsr() & MSR_HV))
early_check_vec5();
 
-   if (early_radix_enabled())
+   if (early_radix_enabled()) {
radix__early_init_devtree();
-   else
+   /*
+* We have final

[PATCH v3 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-08-27 Thread Aneesh Kumar K.V
pte_clear_tests operate on an existing pte entry. Make sure that is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 21329c7d672f..8527ebb75f2c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -546,7 +546,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
 static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr)
 {
-   pte_t pte = ptep_get(ptep);
+   pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);
 
pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -944,7 +944,7 @@ static int __init debug_vm_pgtable(void)
p4d_t *p4dp, *saved_p4dp;
pud_t *pudp, *saved_pudp;
pmd_t *pmdp, *saved_pmdp, pmd;
-   pte_t *ptep;
+   pte_t *ptep, pte;
pgtable_t saved_ptep;
pgprot_t prot, protnone;
phys_addr_t paddr;
@@ -1049,6 +1049,8 @@ static int __init debug_vm_pgtable(void)
 */
 
	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte = pfn_pte(pte_aligned, prot);
+   set_pte_at(mm, vaddr, ptep, pte);
pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
pte_unmap_unlock(ptep, ptl);
-- 
2.26.2



[PATCH v3 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-08-27 Thread Aneesh Kumar K.V
The test seems to be missing quite a lot of details w.r.t allocating
the correct pgtable_t page (huge_pte_alloc()), holding the right
lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA.

ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these.
Hence disable the test on ppc64.
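
Roughly, a hugetlb-aware version of the test would need a setup along the lines below
before exercising the huge_ptep_*() helpers (a hypothetical sketch reusing the test's
mm/vma/vaddr locals, not part of this patch; it also assumes vma is a real hugetlb VMA):

	ptep = huge_pte_alloc(mm, vaddr, HPAGE_SIZE);
	ptl = huge_pte_lock(hstate_vma(vma), mm, ptep);
	/* ... huge_ptep_get(), huge_ptep_set_wrprotect(), etc. under ptl ... */
	spin_unlock(ptl);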

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index a188b6e4e37e..21329c7d672f 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -813,6 +813,7 @@ static void __init hugetlb_basic_tests(unsigned long pfn, 
pgprot_t prot)
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 }
 
+#ifndef CONFIG_PPC_BOOK3S_64
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma,
  pte_t *ptep, unsigned long pfn,
@@ -855,6 +856,7 @@ static void __init hugetlb_advanced_tests(struct mm_struct 
*mm,
pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
 }
+#endif
 #else  /* !CONFIG_HUGETLB_PAGE */
 static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
@@ -1065,7 +1067,9 @@ static int __init debug_vm_pgtable(void)
pud_populate_tests(mm, pudp, saved_pmdp);
spin_unlock(ptl);
 
+#ifndef CONFIG_PPC_BOOK3S_64
hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+#endif
 
	spin_lock(&mm->page_table_lock);
p4d_clear_tests(mm, p4dp);
-- 
2.26.2



[PATCH v3 11/13] mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries

2020-08-27 Thread Aneesh Kumar K.V
pmd_clear() should not be used to clear pmd level pte entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 0a6e771ebd13..a188b6e4e37e 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -196,6 +196,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
 
+   /*  Clear the pte entries  */
+   pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
@@ -321,6 +323,8 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pudp_test_and_clear_young(vma, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_young(pud));
+
+   pudp_huge_get_and_clear(mm, vaddr, pudp);
 }
 
 static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -444,8 +448,6 @@ static void __init pud_populate_tests(struct mm_struct *mm, 
pud_t *pudp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pud_bad().
 */
-   pmd_clear(pmdp);
-   pud_clear(pudp);
pud_populate(mm, pudp, pmdp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_bad(pud));
@@ -577,7 +579,6 @@ static void __init pmd_populate_tests(struct mm_struct *mm, 
pmd_t *pmdp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pmd_bad().
 */
-   pmd_clear(pmdp);
pmd_populate(mm, pmdp, pgtable);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_bad(pmd));
-- 
2.26.2



[PATCH v3 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-08-27 Thread Aneesh Kumar K.V
This will help in adding proper locks in a later patch

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 52 ---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 0ce5c6a24c5b..78c8af3445ac 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -992,7 +992,7 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   ptep = pte_alloc_map(mm, pmdp, vaddr);
 
/*
 * Save all the page table page addresses as the page table
@@ -1012,33 +1012,12 @@ static int __init debug_vm_pgtable(void)
p4d_basic_tests(p4d_aligned, prot);
pgd_basic_tests(pgd_aligned, prot);
 
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
-
-   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
 
-   pmd_huge_tests(pmdp, pmd_aligned, prot);
-   pud_huge_tests(pudp, pud_aligned, prot);
-
pte_savedwrite_tests(pte_aligned, protnone);
pmd_savedwrite_tests(pmd_aligned, protnone);
 
-   pte_unmap_unlock(ptep, ptl);
-
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
-   p4d_populate_tests(mm, p4dp, saved_pudp);
-   pgd_populate_tests(mm, pgdp, saved_p4dp);
-
pte_special_tests(pte_aligned, prot);
pte_protnone_tests(pte_aligned, protnone);
pmd_protnone_tests(pmd_aligned, protnone);
@@ -1056,11 +1035,38 @@ static int __init debug_vm_pgtable(void)
pmd_swap_tests(pmd_aligned, prot);
 
swap_migration_tests();
-   hugetlb_basic_tests(pte_aligned, prot);
 
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
 
+   /*
+* Page table modifying tests
+*/
+   pte_clear_tests(mm, ptep, vaddr);
+   pmd_clear_tests(mm, pmdp);
+   pud_clear_tests(mm, pudp);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
+
+   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+
+   pte_unmap_unlock(ptep, ptl);
+
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   p4d_populate_tests(mm, p4dp, saved_pudp);
+   pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+   hugetlb_basic_tests(pte_aligned, prot);
+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);
-- 
2.26.2



[PATCH v3 10/13] mm/debug_vm_pgtable/locks: Take correct page table lock

2020-08-27 Thread Aneesh Kumar K.V
Make sure we call pte accessors with correct lock held.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 78c8af3445ac..0a6e771ebd13 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1039,33 +1039,39 @@ static int __init debug_vm_pgtable(void)
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
 
+   hugetlb_basic_tests(pte_aligned, prot);
+
/*
 * Page table modifying tests
 */
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
 
	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
+   pte_unmap_unlock(ptep, ptl);
 
+   ptl = pmd_lock(mm, pmdp);
+   pmd_clear_tests(mm, pmdp);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   spin_unlock(ptl);
+
+   ptl = pud_lock(mm, pudp);
+   pud_clear_tests(mm, pudp);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
pud_huge_tests(pudp, pud_aligned, prot);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_unlock(ptl);
 
-   pte_unmap_unlock(ptep, ptl);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_lock(&mm->page_table_lock);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
p4d_populate_tests(mm, p4dp, saved_pudp);
pgd_populate_tests(mm, pgdp, saved_p4dp);
-
-   hugetlb_basic_tests(pte_aligned, prot);
+   spin_unlock(&mm->page_table_lock);
 
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
-- 
2.26.2



[PATCH v3 08/13] mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP

2020-08-27 Thread Aneesh Kumar K.V
Architectures like ppc64 use deposited page table while updating the huge pte
entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index f9f6358899a8..0ce5c6a24c5b 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -154,7 +154,7 @@ static void __init pmd_basic_tests(unsigned long pfn, 
pgprot_t prot)
 static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
 {
pmd_t pmd;
 
@@ -165,6 +165,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
+   pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+
pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
@@ -193,6 +195,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_test_and_clear_young(vma, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
 static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -373,7 +377,7 @@ static void __init pud_basic_tests(unsigned long pfn, 
pgprot_t prot) { }
 static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
 {
 }
 static void __init pud_advanced_tests(struct mm_struct *mm,
@@ -1015,7 +1019,7 @@ static int __init debug_vm_pgtable(void)
pgd_clear_tests(mm, pgdp);
 
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-- 
2.26.2



[PATCH v3 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry

2020-08-27 Thread Aneesh Kumar K.V
set_pte_at() should not be used to set a pte entry at a location that
already holds a valid pte entry. Architectures like ppc64 don't do a TLB
invalidate in set_pte_at() and hence expect it to be used only on locations
that do not hold a valid PTE.
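
A minimal sketch of the rule the tests now follow (using the same helpers as
the diff below): install an entry with set_pte_at() only when the slot is
empty, and clear it before installing a new one:

        pte = pfn_pte(pfn, prot);
        set_pte_at(mm, vaddr, ptep, pte);      /* slot was empty: fine */
        ptep_set_wrprotect(mm, vaddr, ptep);
        WARN_ON(pte_write(ptep_get(ptep)));

        ptep_get_and_clear(mm, vaddr, ptep);   /* clear before the next set_pte_at() */
        WARN_ON(!pte_none(ptep_get(ptep)));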

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index de83a20c1b30..f9f6358899a8 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -79,15 +79,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   /*
+* Architectures optimize set_pte_at by avoiding TLB flush.
+* This requires set_pte_at to be not used to update an
+* existing pte entry. Clear pte before we do set_pte_at
+*/
+
pr_debug("Validating PTE advanced\n");
pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);
ptep_set_wrprotect(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(pte_write(pte));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
@@ -101,13 +104,11 @@ static void __init pte_advanced_tests(struct mm_struct 
*mm,
ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
pte = ptep_get(ptep);
WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear_full(mm, vaddr, ptep, 1);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
 
+   pte = pfn_pte(pfn, prot);
pte = pte_mkyoung(pte);
set_pte_at(mm, vaddr, ptep, pte);
ptep_test_and_clear_young(vma, vaddr, ptep);
@@ -169,9 +170,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
@@ -185,13 +183,11 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_mkyoung(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_test_and_clear_young(vma, vaddr, pmdp);
@@ -293,18 +289,10 @@ static void __init pud_advanced_tests(struct mm_struct 
*mm,
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
-   pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
-   pud = READ_ONCE(*pudp);
-   WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
 
pud = pud_mkhuge(pfn_pud(pfn, prot));
@@ -317,6 +305,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pud = READ_ONCE(*pudp);
WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
 
+#ifndef __PAGETABLE_PMD_FOLDED
+   pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
+   pud = READ_ONCE(*pudp);
+   WARN_ON(!pud_none(pud));
+#endif /* __PAGETABLE_PMD_FOLDED */
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_mkyoung(pud);
set_pud_at(mm, vaddr, pudp, pud);
pudp_test_and_clear_young(vma, vaddr, pudp);
-- 
2.26.2



[PATCH v3 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at

2020-08-27 Thread Aneesh Kumar K.V
The kernel expects entries to be marked huge before we use
set_pmd_at()/set_pud_at().
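
A short sketch of the expected order (helpers as in the diff below; the pud
variants exist only with CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD):

        pmd = pmd_mkhuge(pfn_pmd(pfn, prot));  /* mark huge first ... */
        set_pmd_at(mm, vaddr, pmdp, pmd);      /* ... then install   */

        pud = pud_mkhuge(pfn_pud(pfn, prot));
        set_pud_at(mm, vaddr, pudp, pud);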

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5c0680836fe9..de83a20c1b30 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -155,7 +155,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd;
 
if (!has_transparent_hugepage())
return;
@@ -164,19 +164,19 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
@@ -237,7 +237,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
 
if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
return;
@@ -277,7 +277,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pud_t pud = pfn_pud(pfn, prot);
+   pud_t pud;
 
if (!has_transparent_hugepage())
return;
@@ -286,25 +286,28 @@ static void __init pud_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PUD_SIZE */
vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
 
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_set_wrprotect(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-   pud = pfn_pud(pfn, prot);
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 
-   pud = pfn_pud(pfn, prot);
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
-   pud = pfn_pud(pfn, prot);
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(mm, vaddr, pudp, pud);
-- 
2.26.2



[PATCH v3 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING

2020-08-27 Thread Aneesh Kumar K.V
Saved write support was added to track the write bit of a pte after marking the
pte protnone. This was done so that AUTONUMA can convert a write pte to protnone
and still track the old write bit. When converting it back, we set the pte write
bit correctly, thereby avoiding a write fault again. Hence enable the test only
when CONFIG_NUMA_BALANCING is enabled, and use protnone protflags.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 28f9d0558c20..5c0680836fe9 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -119,10 +119,14 @@ static void __init pte_savedwrite_tests(unsigned long 
pfn, pgprot_t prot)
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PTE saved write\n");
	WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
	WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
 }
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -235,6 +239,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, 
pgprot_t prot)
 {
pmd_t pmd = pfn_pmd(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PMD saved write\n");
	WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
	WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
@@ -1020,8 +1027,8 @@ static int __init debug_vm_pgtable(void)
pmd_huge_tests(pmdp, pmd_aligned, prot);
pud_huge_tests(pudp, pud_aligned, prot);
 
-   pte_savedwrite_tests(pte_aligned, prot);
-   pmd_savedwrite_tests(pmd_aligned, prot);
+   pte_savedwrite_tests(pte_aligned, protnone);
+   pmd_savedwrite_tests(pmd_aligned, protnone);
 
pte_unmap_unlock(ptep, ptl);
 
-- 
2.26.2



[PATCH v3 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-08-27 Thread Aneesh Kumar K.V
ppc64 supports huge vmap only with radix translation. Hence use the arch helper
to determine huge vmap support.
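
A minimal sketch of the resulting guard (same helpers as the diff below): the
compile-time check decides whether the test exists at all, and the runtime
helper decides whether this particular configuration supports huge vmap:

#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
{
        if (!arch_ioremap_pmd_supported())
                return;        /* e.g. ppc64 running with hash translation */
        /* ... huge vmap pmd test body ... */
}
#else
static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { }
#endif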

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index bbf9df0e64c6..28f9d0558c20 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -206,11 +207,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pmd_leaf(pmd));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot)
 {
pmd_t pmd;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pmd_supported())
return;
 
pr_debug("Validating PMD huge\n");
@@ -224,6 +226,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -320,11 +326,12 @@ static void __init pud_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pud_leaf(pud));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot)
 {
pud_t pud;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pud_supported())
return;
 
pr_debug("Validating PUD huge\n");
@@ -338,6 +345,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned 
long pfn, pgprot_t prot)
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init pud_advanced_tests(struct mm_struct *mm,
-- 
2.26.2



[PATCH v3 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-08-27 Thread Aneesh Kumar K.V
ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in
the random value.
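
For illustration (a sketch following the existing *_clear_tests pattern, not a
new test): the random bits are OR'ed into an entry before clearing it, so any
bit that says "this is a pte" must be kept out of RANDOM_ORVALUE:

        p4d_t p4d = READ_ONCE(*p4dp);

        p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);  /* bit 62 stays clear on ppc64 */
        WRITE_ONCE(*p4dp, p4d);
        p4d_clear(p4dp);
        WARN_ON(!p4d_none(READ_ONCE(*p4dp)));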

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..bbf9df0e64c6 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,17 @@
  * entry type. But these bits might affect the ability to clear entries with
  * pxx_clear() because of how dynamic page table folding works on s390. So
  * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
  */
-#define S390_MASK_BITS 4
-#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#ifdef CONFIG_PPC_BOOK3S_64
+#define PPC64_SKIP_MASKGENMASK(62, 62)
+#else
+#define PPC64_SKIP_MASK0x0
+#endif
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
 #define RANDOM_NZVALUE GENMASK(7, 0)
 
 static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
-- 
2.26.2



[PATCH v3 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte

2020-08-27 Thread Aneesh Kumar K.V
powerpc used to set the pte specific flags in set_pte_at(). This is different
from other architectures. To be consistent with other architectures, update
pfn_pte to set _PAGE_PTE on ppc64. Also, drop the now-unused pte_mkpte.

We add a VM_WARN_ON() to catch callers of set_pte_at() that don't set the
_PAGE_PTE bit. We will remove that after a few releases.

With respect to huge pmd entries, pmd_mkhuge() takes care of adding the
_PAGE_PTE bit.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +--
 arch/powerpc/include/asm/nohash/pgtable.h|  5 -
 arch/powerpc/mm/pgtable.c|  5 -
 3 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 079211968987..2382fd516f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t 
pgprot)
VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
 
-   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
+   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | 
_PAGE_PTE);
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
@@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pte_t pte_mkwrite(pte_t pte)
 {
/*
@@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
 {
+
+   VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
+   /*
+* Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
+* in all the callers.
+*/
+   pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
+
if (radix_enabled())
return radix__set_pte_at(mm, addr, ptep, pte, percpu);
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..6277e7596ae5 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return pte;
-}
-
 static inline pte_t pte_mkspecial(pte_t pte)
 {
return __pte(pte_val(pte) | _PAGE_SPECIAL);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9c0547d77af3..ab57b07ef39a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Add the pte bit when trying to set a pte */
-   pte = pte_mkpte(pte);
-
/* Note: mm->context.id might not yet have been assigned as
 * this context might not have been activated yet when this
 * is called.
@@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep, pte_
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   pte = pte_mkpte(pte);
-
pte = set_pte_filter(pte);
 
val = pte_val(pte);
-- 
2.26.2



[PATCH v3 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear

2020-08-27 Thread Aneesh Kumar K.V
With the hash page table, the kernel should not use pmd_clear() to clear
huge pte entries. Add a DEBUG_VM WARN to catch such wrong usage.
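
As a hedged illustration of the intent (not new kernel code): a huge entry
mapped via hash should be torn down with the huge helpers, which also deal
with the hash page table, rather than with pmd_clear():

        /* correct for a huge entry mapped via hash */
        pmdp_huge_get_and_clear(mm, addr, pmdp);

        /* pmd_clear(pmdp) on such an entry now trips the DEBUG_VM warning */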

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..079211968987 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
+   WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == 
(H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pmdp = __pmd(0);
 }
 
@@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
 
 static inline void pud_clear(pud_t *pudp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
+   WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == 
(H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pudp = __pud(0);
 }
 
-- 
2.26.2



[PATCH v3 00/13] mm/debug_vm_pgtable fixes

2020-08-27 Thread Aneesh Kumar K.V
This patch series includes fixes for the debug_vm_pgtable test code so that
it follows the page table update rules correctly. The first two patches
introduce changes w.r.t ppc64. They are included in this series for
completeness; we can merge them via the ppc64 tree if required.

The hugetlb test is disabled on ppc64 because it needs a larger change to
satisfy the page table update rules.

The patches are on top of 15bc20c6af4ceee97a1f90b43c0e386643c071b4 
(linus/master)

Changes from v2:
* Fix build failure with different configs and architecture.

Changes from v1:
* Address review feedback
* drop test specific pfn_pte and pfn_pmd.
* Update ppc64 page table helper to add _PAGE_PTE 


Aneesh Kumar K.V (13):
  powerpc/mm: Add DEBUG_VM WARN for pmd_clear
  powerpc/mm: Move setting pte specific flags to pfn_pte
  mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
  mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
vmap support.
  mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
CONFIG_NUMA_BALANCING
  mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
set_pmd/pud_at
  mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
existing pte entry
  mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
  mm/debug_vm_pgtable/locks: Move non page table modifying test together
  mm/debug_vm_pgtable/locks: Take correct page table lock
  mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
  mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
  mm/debug_vm_pgtable: populate a pte entry before fetching it

 arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
 arch/powerpc/include/asm/nohash/pgtable.h|   5 -
 arch/powerpc/mm/pgtable.c|   5 -
 mm/debug_vm_pgtable.c| 170 ---
 4 files changed, 131 insertions(+), 78 deletions(-)

-- 
2.26.2



[PATCH v5 23/23] powerpc/book3s64/pkeys: Optimize FTR_KUAP and FTR_KUEP disabled case

2020-08-26 Thread Aneesh Kumar K.V
If FTR_KUAP is disabled, the kernel will continue to run with the same AMR
value with which it was entered. Hence there is a high chance that
we can return without restoring the AMR value. This also helps the case
when applications are not using the pkey feature. In that case, different
applications will have the same AMR values and hence we can avoid restoring
AMR there too.

Also avoid isync() if not really needed.

Do the same for IAMR.

null-syscall benchmark results:

With smap/smep disabled:
Without patch:
957.95 ns   2778.17 cycles
With patch:
858.38 ns   2489.30 cycles

With smap/smep enabled:
Without patch:
1017.26 ns   2950.36 cycles
With patch:
1021.51 ns   2962.44 cycles

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 61 +---
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/syscall_64.c | 12 +++--
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 34a412d2a65b..d71e9a958eb5 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -12,28 +12,54 @@
 
 #ifdef __ASSEMBLY__
 
-.macro kuap_restore_user_amr gpr1
+.macro kuap_restore_user_amr gpr1, gpr2
 #if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
+   b   100f  // skip_restore_amr
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
/*
 * AMR and IAMR are going to be different when
 * returning to userspace.
 */
ld  \gpr1, STACK_REGS_KUAP(r1)
+
+   /*
+* If kuap feature is not enabled, do the mtspr
+* only if AMR value is different.
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(68)
+   mfspr   \gpr2, SPRN_AMR
+   cmpd\gpr1, \gpr2
+   beq 99f
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUAP, 68)
+
isync
mtspr   SPRN_AMR, \gpr1
+99:
/*
 * Restore IAMR only when returning to userspace
 */
ld  \gpr1, STACK_REGS_KUEP(r1)
+
+   /*
+* If kuep feature is not enabled, do the mtspr
+* only if IAMR value is different.
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(69)
+   mfspr   \gpr2, SPRN_IAMR
+   cmpd\gpr1, \gpr2
+   beq 100f
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_KUEP, 69)
+
+   isync
mtspr   SPRN_IAMR, \gpr1
 
+100: //skip_restore_amr
/* No isync required, see kuap_restore_user_amr() */
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
 #endif
 .endm
 
-.macro kuap_restore_kernel_amr gpr1, gpr2
+.macro kuap_restore_kernel_amr gpr1, gpr2
 #if defined(CONFIG_PPC_PKEY)
 
BEGIN_MMU_FTR_SECTION_NESTED(67)
@@ -190,18 +216,41 @@ static inline u64 current_thread_iamr(void)
 
 static inline void kuap_restore_user_amr(struct pt_regs *regs)
 {
+   bool restore_amr = false, restore_iamr = false;
+   unsigned long amr, iamr;
+
if (!mmu_has_feature(MMU_FTR_PKEY))
return;
 
-   isync();
-   mtspr(SPRN_AMR, regs->kuap);
-   mtspr(SPRN_IAMR, regs->kuep);
+   if (!mmu_has_feature(MMU_FTR_KUAP)) {
+   amr = mfspr(SPRN_AMR);
+   if (amr != regs->kuap)
+   restore_amr = true;
+   } else
+   restore_amr = true;
+
+   if (!mmu_has_feature(MMU_FTR_KUEP)) {
+   iamr = mfspr(SPRN_IAMR);
+   if (iamr != regs->kuep)
+   restore_iamr = true;
+   } else
+   restore_iamr = true;
+
+
+   if (restore_amr || restore_iamr) {
+   isync();
+   if (restore_amr)
+   mtspr(SPRN_AMR, regs->kuap);
+   if (restore_iamr)
+   mtspr(SPRN_IAMR, regs->kuep);
+   }
/*
 * No isync required here because we are about to rfi
 * back to previous context before any user accesses
 * would be made, which is a CSI.
 */
 }
+
 static inline void kuap_restore_kernel_amr(struct pt_regs *regs,
   unsigned long amr)
 {
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 68171689db5d..ac6c84a53ff8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -667,7 +667,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return)
bne-.Lrestore_nvgprs
 
 .Lfast_user_interrupt_return_amr:
-   kuap_restore_user_amr r3
+   kuap_restore_user_amr r3, r4
 .Lfast_user_interrupt_return:
ld  r11,_NIP(r1)
ld  r12,_MSR(r1)
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index e49d604b811b..945a14e41898 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -38,6 +38,7 @@ notrace long system_call_exceptio

[PATCH v5 22/23] powerpc/book3s64/hash/kup: Don't hardcode kup key

2020-08-26 Thread Aneesh Kumar K.V
Make the KUAP/KUEP key a variable and also check whether the platform
limits the max key such that we can't use the key for KUAP/KUEP.

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 22 +---
 arch/powerpc/include/asm/book3s/64/pkeys.h|  1 +
 arch/powerpc/mm/book3s64/pkeys.c  | 53 ---
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h 
b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
index 9f44e208f036..ff9907c72ee3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-pkey.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -2,9 +2,7 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 
-/*  We use key 3 for KERNEL */
-#define HASH_DEFAULT_KERNEL_KEY (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)
-
+u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags);
 static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
@@ -14,24 +12,6 @@ static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
-{
-   unsigned long pte_pkey;
-
-   pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
-
-   if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
-   if ((pte_pkey == 0) && (flags & HPTE_USE_KERNEL_KEY))
-   return HASH_DEFAULT_KERNEL_KEY;
-   }
-
-   return pte_pkey;
-}
-
 static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
 {
return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h 
b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 3b8640498f5b..a2b6c4a7275f 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -8,6 +8,7 @@
 extern u64 __ro_after_init default_uamor;
 extern u64 __ro_after_init default_amr;
 extern u64 __ro_after_init default_iamr;
+extern int kup_key;
 
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index b862d5cd78ff..cb1d7d39e801 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -37,7 +37,10 @@ u64 default_uamor __ro_after_init;
  */
 static int execute_only_key = 2;
 static bool pkey_execute_disable_supported;
-
+/*
+ * key used to implement KUAP/KUEP with hash translation.
+ */
+int kup_key = 3;
 
 #define AMR_BITS_PER_PKEY 2
 #define AMR_RD_BIT 0x1UL
@@ -185,6 +188,25 @@ void __init pkey_early_init_devtree(void)
default_uamor &= ~(0x3ul << pkeyshift(execute_only_key));
}
 
+   if (unlikely(num_pkey <= kup_key)) {
+   /*
+* Insufficient number of keys to support
+* KUAP/KUEP feature.
+*/
+   kup_key = -1;
+   } else {
+   /*  handle key which is used by kernel for KAUP */
+   reserved_allocation_mask |= (0x1 << kup_key);
+   /*
+* Mark access for kup_key in default amr so that
+* we continue to operate with that AMR in
+* copy_to/from_user().
+*/
+   default_amr   &= ~(0x3ul << pkeyshift(kup_key));
+   default_iamr  &= ~(0x1ul << pkeyshift(kup_key));
+   default_uamor &= ~(0x3ul << pkeyshift(kup_key));
+   }
+
/*
 * Allow access for only key 0. And prevent any other modification.
 */
@@ -205,9 +227,6 @@ void __init pkey_early_init_devtree(void)
reserved_allocation_mask |= (0x1 << 1);
default_uamor &= ~(0x3ul << pkeyshift(1));
 
-   /*  handle key 3 which is used by kernel for KAUP */
-   reserved_allocation_mask |= (0x1 << 3);
-   default_uamor &= ~(0x3ul << pkeyshift(3));
 
/*
 * Prevent the usage of OS reserved keys. Update UAMOR
@@ -236,7 +255,7 @@ void __init pkey_early_init_devtree(void)
 #ifdef CONFIG_PPC_KUEP
 void __init setup_kuep(bool disabled)
 {
-   if (disabled)
+   if (disabled || kup_key == -1)
return;
/*
 * On hash if PKEY feature is not enabled, disable KUAP too.
@@ -262,7 +281,7 @@ void __init setup_kuep(bool disabled)
 #ifdef CONFIG_PPC_KUAP

[PATCH v5 21/23] powerpc/book3s64/hash/kuep: Enable KUEP on hash

2020-08-26 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 16ea0b2f0ea5..b862d5cd78ff 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -236,7 +236,12 @@ void __init pkey_early_init_devtree(void)
 #ifdef CONFIG_PPC_KUEP
 void __init setup_kuep(bool disabled)
 {
-   if (disabled || !early_radix_enabled())
+   if (disabled)
+   return;
+   /*
+* On hash if PKEY feature is not enabled, disable KUAP too.
+*/
+   if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
return;
 
if (smp_processor_id() == boot_cpuid) {
-- 
2.26.2



[PATCH v5 20/23] powerpc/book3s64/hash/kuap: Enable kuap on hash

2020-08-26 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 391230f93da2..16ea0b2f0ea5 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -257,7 +257,12 @@ void __init setup_kuep(bool disabled)
 #ifdef CONFIG_PPC_KUAP
 void __init setup_kuap(bool disabled)
 {
-   if (disabled || !early_radix_enabled())
+   if (disabled)
+   return;
+   /*
+* On hash if PKEY feature is not enabled, disable KUAP too.
+*/
+   if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
return;
 
if (smp_processor_id() == boot_cpuid) {
-- 
2.26.2



[PATCH v5 19/23] powerpc/book3s64/kuep: Use Key 3 to implement KUEP with hash translation.

2020-08-26 Thread Aneesh Kumar K.V
Radix uses IAMR key 0 and hash translation uses IAMR key 3.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index f326be9e0db7..34a412d2a65b 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -7,7 +7,7 @@
 
 #define AMR_KUAP_BLOCK_READUL(0x5455)
 #define AMR_KUAP_BLOCK_WRITE   UL(0xa8aa)
-#define AMR_KUEP_BLOCKED   (1UL << 62)
+#define AMR_KUEP_BLOCKED   UL(0x5455)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 
 #ifdef __ASSEMBLY__
-- 
2.26.2



[PATCH v5 18/23] powerpc/book3s64/kuap: Use Key 3 to implement KUAP with hash translation.

2020-08-26 Thread Aneesh Kumar K.V
Radix uses AMR key 0 and hash translation uses AMR key 3.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 9c85e4397b2d..f326be9e0db7 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -5,11 +5,10 @@
 #include 
 #include 
 
-#define AMR_KUAP_BLOCK_READUL(0x4000)
-#define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
+#define AMR_KUAP_BLOCK_READUL(0x5455)
+#define AMR_KUAP_BLOCK_WRITE   UL(0xa8aa)
 #define AMR_KUEP_BLOCKED   (1UL << 62)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
-#define AMR_KUAP_SHIFT 62
 
 #ifdef __ASSEMBLY__
 
@@ -61,8 +60,8 @@
 #ifdef CONFIG_PPC_KUAP_DEBUG
BEGIN_MMU_FTR_SECTION_NESTED(67)
mfspr   \gpr1, SPRN_AMR
-   li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
-   sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
+   /* Prevent access to userspace using any key values */
+   LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
 999:   tdne\gpr1, \gpr2
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
-- 
2.26.2



[PATCH v5 16/23] powerpc/book3s64/kuap: Restrict access to userspace based on userspace AMR

2020-08-26 Thread Aneesh Kumar K.V
If an application has configured address protection such that read/write is
denied using a pkey, even the kernel should receive a FAULT when accessing the
same address.

This patch uses the user AMR value stored in pt_regs.kuap to achieve that.
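
A condensed sketch of the idea (mirroring the diff below): the user's own AMR
is folded into the value programmed for the uaccess window, so a pkey-denied
mapping stays denied even inside copy_to/from_user():

        unsigned long thread_amr = 0;

        if (mmu_has_feature(MMU_FTR_PKEY))
                thread_amr = current_thread_amr();

        set_kuap(thread_amr | AMR_KUAP_BLOCK_WRITE);   /* KUAP_READ case */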

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 4e1d666032f6..878cd84922d8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -292,14 +292,20 @@ static inline void set_kuap(unsigned long value)
 static __always_inline void allow_user_access(void __user *to, const void 
__user *from,
  unsigned long size, unsigned long 
dir)
 {
+   unsigned long thread_amr = 0;
+
// This is written so we can resolve to a single case at build time
BUILD_BUG_ON(!__builtin_constant_p(dir));
+
+   if (mmu_has_feature(MMU_FTR_PKEY))
+   thread_amr = current_thread_amr();
+
if (dir == KUAP_READ)
-   set_kuap(AMR_KUAP_BLOCK_WRITE);
+   set_kuap(thread_amr | AMR_KUAP_BLOCK_WRITE);
else if (dir == KUAP_WRITE)
-   set_kuap(AMR_KUAP_BLOCK_READ);
+   set_kuap(thread_amr | AMR_KUAP_BLOCK_READ);
else if (dir == KUAP_READ_WRITE)
-   set_kuap(0);
+   set_kuap(thread_amr);
else
BUILD_BUG();
 }
-- 
2.26.2



[PATCH v5 17/23] powerpc/book3s64/kuap: Improve error reporting with KUAP

2020-08-26 Thread Aneesh Kumar K.V
With hash translation, use DSISR_KEYFAULT to identify a wrong access.
With radix, we look at the AMR value and the type of fault.
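
A condensed sketch of the resulting check in bad_kuap_fault() (mirroring the
diff below, with the WARN reporting elided):

        if (radix_enabled())
                /* storage protection fault: inspect the AMR saved in pt_regs */
                return regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE
                                              : RADIX_KUAP_BLOCK_READ);

        /* hash: the hardware reports key faults directly via DSISR */
        return !!(error_code & DSISR_KEYFAULT);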

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/kup.h |  4 +--
 arch/powerpc/include/asm/book3s/64/kup.h | 27 
 arch/powerpc/include/asm/kup.h   |  4 +--
 arch/powerpc/include/asm/nohash/32/kup-8xx.h |  4 +--
 arch/powerpc/mm/fault.c  |  2 +-
 5 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
index 32fd4452e960..b18cd931e325 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long flags)
allow_user_access(to, to, end - addr, KUAP_READ_WRITE);
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
unsigned long begin = regs->kuap & 0xf000;
unsigned long end = regs->kuap << 28;
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 878cd84922d8..9c85e4397b2d 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -330,12 +330,29 @@ static inline void restore_user_access(unsigned long 
flags)
set_kuap(flags);
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+#define RADIX_KUAP_BLOCK_READ  UL(0x4000)
+#define RADIX_KUAP_BLOCK_WRITE UL(0x8000)
+
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
-   return WARN(mmu_has_feature(MMU_FTR_KUAP) &&
-   (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
-   "Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
+   if (!mmu_has_feature(MMU_FTR_KUAP))
+   return false;
+
+   if (radix_enabled()) {
+   /*
+* Will be a storage protection fault.
+* Only check the details of AMR[0]
+*/
+   return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : 
RADIX_KUAP_BLOCK_READ)),
+   "Bug: %s fault blocked by AMR!", is_write ? "Write" 
: "Read");
+   }
+   /*
+* We don't want to WARN here because userspace can setup
+* keys such that a kernel access to user address can cause
+* fault
+*/
+   return !!(error_code & DSISR_KEYFAULT);
 }
 #endif /* CONFIG_PPC_KUAP */
 
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 6c3ee976ee15..8f5e2d820723 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -67,8 +67,8 @@ static inline void prevent_user_access(void __user *to, const 
void __user *from,
   unsigned long size, unsigned long dir) { 
}
 static inline unsigned long prevent_user_access_return(void) { return 0UL; }
 static inline void restore_user_access(unsigned long flags) { }
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
return false;
 }
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h 
b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 85ed2390fb99..c401e4e404d4 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -60,8 +60,8 @@ static inline void restore_user_access(unsigned long flags)
mtspr(SPRN_MD_AP, flags);
 }
 
-static inline bool
-bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
+static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
+ bool is_write, unsigned long error_code)
 {
return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xf000),
"Bug: fault blocked by AP register !");
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0add963a849b..c91621df0c61 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -227,7 +227,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned 
long error_code,
 
// Read/write fault in a valid region (the exception table search passed
// above), but blocked by KUAP is bad, it can never succe

[PATCH v5 15/23] powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.

2020-08-26 Thread Aneesh Kumar K.V
Now that the kernel correctly stores/restores userspace AMR/IAMR values, avoid
manipulating AMR and IAMR from the kernel on behalf of userspace.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 18 
 arch/powerpc/include/asm/processor.h |  4 --
 arch/powerpc/kernel/process.c|  4 --
 arch/powerpc/kernel/traps.c  |  6 ---
 arch/powerpc/mm/book3s64/pkeys.c | 57 +---
 5 files changed, 28 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 3f5b97b2a3d8..4e1d666032f6 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -171,6 +171,24 @@
 #include 
 #include 
 
+/*
+ * For kernel thread that doesn't have thread.regs return
+ * default AMR/IAMR values.
+ */
+static inline u64 current_thread_amr(void)
+{
+   if (current->thread.regs)
+   return current->thread.regs->kuap;
+   return AMR_KUAP_BLOCKED;
+}
+
+static inline u64 current_thread_iamr(void)
+{
+   if (current->thread.regs)
+   return current->thread.regs->kuep;
+   return AMR_KUEP_BLOCKED;
+}
+
 static inline void kuap_restore_user_amr(struct pt_regs *regs)
 {
if (!mmu_has_feature(MMU_FTR_PKEY))
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..8adf44a7e54f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -234,10 +234,6 @@ struct thread_struct {
struct thread_vr_state ckvr_state; /* Checkpointed VR state */
unsigned long   ckvrsave; /* Checkpointed VRSAVE */
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#ifdef CONFIG_PPC_MEM_KEYS
-   unsigned long   amr;
-   unsigned long   iamr;
-#endif
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
 #endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 75fd30e023bd..c8f57afba3a0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -603,7 +603,6 @@ static void save_all(struct task_struct *tsk)
__giveup_spe(tsk);
 
msr_check_and_clear(msr_all_available);
-   thread_pkey_regs_save(>thread);
 }
 
 void flush_all_to_thread(struct task_struct *tsk)
@@ -1127,8 +1126,6 @@ static inline void save_sprs(struct thread_struct *t)
t->tar = mfspr(SPRN_TAR);
}
 #endif
-
-   thread_pkey_regs_save(t);
 }
 
 static inline void restore_sprs(struct thread_struct *old_thread,
@@ -1169,7 +1166,6 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
mtspr(SPRN_TIDR, new_thread->tidr);
 #endif
 
-   thread_pkey_regs_restore(new_thread, old_thread);
 }
 
 struct task_struct *__switch_to(struct task_struct *prev,
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d1ebe152f210..5bda54454a2d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -347,12 +347,6 @@ static bool exception_common(int signr, struct pt_regs 
*regs, int code,
 
current->thread.trap_nr = code;
 
-   /*
-* Save all the pkey registers AMR/IAMR/UAMOR. Eg: Core dumps need
-* to capture the content, if the task gets killed.
-*/
-   thread_pkey_regs_save(>thread);
-
return true;
 }
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f47d11f2743d..391230f93da2 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -273,30 +273,17 @@ void __init setup_kuap(bool disabled)
 }
 #endif
 
-static inline u64 read_amr(void)
+static inline void update_current_thread_amr(u64 value)
 {
-   return mfspr(SPRN_AMR);
+   current->thread.regs->kuap = value;
 }
 
-static inline void write_amr(u64 value)
-{
-   mtspr(SPRN_AMR, value);
-}
-
-static inline u64 read_iamr(void)
-{
-   if (!likely(pkey_execute_disable_supported))
-   return 0x0UL;
-
-   return mfspr(SPRN_IAMR);
-}
-
-static inline void write_iamr(u64 value)
+static inline void update_current_thread_iamr(u64 value)
 {
if (!likely(pkey_execute_disable_supported))
return;
 
-   mtspr(SPRN_IAMR, value);
+   current->thread.regs->kuep = value;
 }
 
 #ifdef CONFIG_PPC_MEM_KEYS
@@ -311,17 +298,17 @@ void pkey_mm_init(struct mm_struct *mm)
 static inline void init_amr(int pkey, u8 init_bits)
 {
u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
-   u64 old_amr = read_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
+   u64 old_amr = current_thread_amr() & ~((u64)(0x3ul) << pkeyshift(pkey));
 
-   write_amr(old_amr | new_amr_bits);
+   update_current_thread_amr(old_amr

[PATCH v5 14/23] powerpc/ptrace-view: Use pt_regs values instead of thread_struct based one.

2020-08-26 Thread Aneesh Kumar K.V
We will remove thread.amr/iamr/uamor in a later patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/ptrace/ptrace-view.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c 
b/arch/powerpc/kernel/ptrace/ptrace-view.c
index 7e6478e7ed07..c719e29aff76 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -470,12 +470,12 @@ static int pkey_active(struct task_struct *target, const 
struct user_regset *reg
 static int pkey_get(struct task_struct *target, const struct user_regset 
*regset,
struct membuf to)
 {
-   BUILD_BUG_ON(TSO(amr) + sizeof(unsigned long) != TSO(iamr));
 
if (!arch_pkeys_enabled())
return -ENODEV;
 
-   membuf_write(&to, &target->thread.amr, 2 * sizeof(unsigned long));
+   membuf_store(&to, target->thread.regs->kuap);
+   membuf_store(&to, target->thread.regs->kuep);
    return membuf_store(&to, default_uamor);
 }
 
@@ -508,7 +508,8 @@ static int pkey_set(struct task_struct *target, const 
struct user_regset *regset
 * Pick the AMR values for the keys that kernel is using. This
 * will be indicated by the ~default_uamor bits.
 */
-   target->thread.amr = (new_amr & default_uamor) | (target->thread.amr & 
~default_uamor);
+   target->thread.regs->kuap = (new_amr & default_uamor) |
+   (target->thread.regs->kuap & ~default_uamor);
 
return 0;
 }
-- 
2.26.2



[PATCH v5 13/23] powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec

2020-08-26 Thread Aneesh Kumar K.V
On fork, we inherit from the parent; on exec, we should switch to the
default_amr values.

Also, avoid changing the AMR register value within the kernel. The kernel now
runs with different AMR values.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pkeys.h |  2 ++
 arch/powerpc/kernel/process.c  |  6 +-
 arch/powerpc/mm/book3s64/pkeys.c   | 16 ++--
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h 
b/arch/powerpc/include/asm/book3s/64/pkeys.h
index b7d9f4267bcd..3b8640498f5b 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -6,6 +6,8 @@
 #include 
 
 extern u64 __ro_after_init default_uamor;
+extern u64 __ro_after_init default_amr;
+extern u64 __ro_after_init default_iamr;
 
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 725fd1bed2b6..75fd30e023bd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1506,6 +1506,11 @@ void arch_setup_new_exec(void)
current->thread.regs = regs - 1;
}
 
+#ifdef CONFIG_PPC_MEM_KEYS
+   current->thread.regs->kuap  = default_amr;
+   current->thread.regs->kuep  = default_iamr;
+#endif
+
 }
 #else
 void arch_setup_new_exec(void)
@@ -1866,7 +1871,6 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
current->thread.load_tm = 0;
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-   thread_pkey_regs_init(>thread);
 }
 EXPORT_SYMBOL(start_thread);
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 640f090b9f9d..f47d11f2743d 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -28,8 +28,8 @@ static u32 initial_allocation_mask __ro_after_init;
  * Even if we allocate keys with sys_pkey_alloc(), we need to make sure
  * other thread still find the access denied using the same keys.
  */
-static u64 default_amr = ~0x0UL;
-static u64 default_iamr = 0xUL;
+u64 default_amr __ro_after_init  = ~0x0UL;
+u64 default_iamr __ro_after_init = 0xUL;
 u64 default_uamor __ro_after_init;
 /*
  * Key used to implement PROT_EXEC mmap. Denies READ/WRITE
@@ -388,18 +388,6 @@ void thread_pkey_regs_restore(struct thread_struct 
*new_thread,
write_iamr(new_thread->iamr);
 }
 
-void thread_pkey_regs_init(struct thread_struct *thread)
-{
-   if (!mmu_has_feature(MMU_FTR_PKEY))
-   return;
-
-   thread->amr   = default_amr;
-   thread->iamr  = default_iamr;
-
-   write_amr(default_amr);
-   write_iamr(default_iamr);
-}
-
 int execute_only_pkey(struct mm_struct *mm)
 {
return mm->context.execute_only_pkey;
-- 
2.26.2



[PATCH v5 12/23] powerpc/book3s64/pkeys: Inherit correctly on fork.

2020-08-26 Thread Aneesh Kumar K.V
The child's thread.kuap value is inherited from the parent in copy_thread_tls.
We still need to make sure that when the child returns from a fork in the
kernel, we start with the kernel default AMR value.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/process.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4633924ea77f..725fd1bed2b6 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1732,6 +1732,15 @@ int copy_thread(unsigned long clone_flags, unsigned long 
usp,
childregs->ppr = DEFAULT_PPR;
 
p->thread.tidr = 0;
+#endif
+   /*
+* Run with the current AMR value of the kernel
+*/
+#if defined(CONFIG_PPC_MEM_KEYS)
+   if (mmu_has_feature(MMU_FTR_KUAP))
+   kregs->kuap = AMR_KUAP_BLOCKED;
+   if (mmu_has_feature(MMU_FTR_KUEP))
+   kregs->kuep = AMR_KUEP_BLOCKED;
 #endif
kregs->nip = ppc_function_entry(f);
return 0;
-- 
2.26.2



[PATCH v5 11/23] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel

2020-08-26 Thread Aneesh Kumar K.V
This prepares the kernel to operate with AMR/IAMR values different from those
of userspace. For this, AMR/IAMR need to be saved and restored on entry to and
return from the kernel.

With KUAP we modify the kernel AMR when accessing user addresses from the
kernel via the copy_to/from_user interfaces. We don't need to modify the IAMR
value in a similar fashion.

If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering the
kernel from userspace. If not, we can assume that AMR/IAMR is not modified
from userspace.

We need to save AMR if we have the MMU_FTR_KUAP feature enabled and we are
interrupted within the kernel. This is required so that if we get interrupted
within copy_to/from_user we continue with the right AMR value.

If we have MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace
because the kernel will be running with a different IAMR value.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 177 ---
 arch/powerpc/include/asm/ptrace.h|   4 +-
 arch/powerpc/kernel/asm-offsets.c|   2 +
 arch/powerpc/kernel/entry_64.S   |   6 +-
 arch/powerpc/kernel/exceptions-64s.S |   4 +-
 arch/powerpc/kernel/syscall_64.c |  30 +++-
 6 files changed, 192 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 5cec202dc42f..3f5b97b2a3d8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -13,17 +13,46 @@
 
 #ifdef __ASSEMBLY__
 
-.macro kuap_restore_amrgpr1, gpr2
-#ifdef CONFIG_PPC_KUAP
+.macro kuap_restore_user_amr gpr1
+#if defined(CONFIG_PPC_PKEY)
BEGIN_MMU_FTR_SECTION_NESTED(67)
-   mfspr   \gpr1, SPRN_AMR
+   /*
+* AMR and IAMR are going to be different when
+* returning to userspace.
+*/
+   ld  \gpr1, STACK_REGS_KUAP(r1)
+   isync
+   mtspr   SPRN_AMR, \gpr1
+   /*
+* Restore IAMR only when returning to userspace
+*/
+   ld  \gpr1, STACK_REGS_KUEP(r1)
+   mtspr   SPRN_IAMR, \gpr1
+
+   /* No isync required, see kuap_restore_user_amr() */
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
+#endif
+.endm
+
+.macro kuap_restore_kernel_amr gpr1, gpr2
+#if defined(CONFIG_PPC_PKEY)
+
+   BEGIN_MMU_FTR_SECTION_NESTED(67)
+   /*
+* AMR is going to be mostly the same since we are
+* returning to the kernel. Compare and do a mtspr.
+*/
ld  \gpr2, STACK_REGS_KUAP(r1)
+   mfspr   \gpr1, SPRN_AMR
cmpd\gpr1, \gpr2
-   beq 998f
+   beq 100f
isync
mtspr   SPRN_AMR, \gpr2
-   /* No isync required, see kuap_restore_amr() */
-998:
+   /*
+* No isync required, see kuap_restore_amr()
+* No need to restore IAMR when returning to kernel space.
+*/
+100:
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
@@ -40,23 +69,98 @@
 #endif
 .endm
 
+/*
+ * if (pkey) {
+ *
+ * save AMR -> stack;
+ * if (kuap) {
+ * if (AMR != BLOCKED)
+ * KUAP_BLOCKED -> AMR;
+ * }
+ * if (from_user) {
+ * save IAMR -> stack;
+ * if (kuep) {
+ * KUEP_BLOCKED ->IAMR
+ * }
+ * }
+ * return;
+ * }
+ *
+ * if (kuap) {
+ * if (from_kernel) {
+ * save AMR -> stack;
+ * if (AMR != BLOCKED)
+ * KUAP_BLOCKED -> AMR;
+ * }
+ *
+ * }
+ */
 .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr
-#ifdef CONFIG_PPC_KUAP
+#if defined(CONFIG_PPC_PKEY)
+
+   /*
+* if both pkey and kuap is disabled, nothing to do
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(68)
+   b   100f  // skip_save_amr
+   END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68)
+
+   /*
+* if pkey is disabled and we are entering from userspace
+* don't do anything.
+*/
BEGIN_MMU_FTR_SECTION_NESTED(67)
.ifnb \msr_pr_cr
-   bne \msr_pr_cr, 99f
+   /*
+* Without pkey we are not changing AMR outside the kernel
+* hence skip this completely.
+*/
+   bne \msr_pr_cr, 100f  // from userspace
.endif
+END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
+
+   /*
+* pkey is enabled or pkey is disabled but entering from kernel
+*/
mfspr   \gpr1, SPRN_AMR
std \gpr1, STACK_REGS_KUAP(r1)
-   li  \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
-   sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
+
+   /*
+* update kernel AMR with AMR_KUAP_BLOCKED only
+* if KUAP feature is enabled
+*/
+   BEGIN_MMU_FTR_SECTION_NESTED(69)
+   

[PATCH v5 10/23] powerpc/exec: Set thread.regs early during exec

2020-08-26 Thread Aneesh Kumar K.V
In later patches, during exec we would like to access the default regs.kuap
value to control access to the user mapping. Having thread.regs set early
makes the code changes simpler.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/process.c  | 37 +-
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index ca6c97025704..9418dff1cfe1 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -77,10 +77,8 @@ struct thread_info {
 /* how to get the thread information struct from C */
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct 
*src);
 
-#ifdef CONFIG_PPC_BOOK3S_64
 void arch_setup_new_exec(void);
 #define arch_setup_new_exec arch_setup_new_exec
-#endif
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..4633924ea77f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1494,10 +1494,32 @@ void flush_thread(void)
 #ifdef CONFIG_PPC_BOOK3S_64
 void arch_setup_new_exec(void)
 {
-   if (radix_enabled())
-   return;
-   hash__setup_new_exec();
+   if (!radix_enabled())
+   hash__setup_new_exec();
+
+   /*
+* If we exec out of a kernel thread then thread.regs will not be
+* set.  Do it now.
+*/
+   if (!current->thread.regs) {
+   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
+   current->thread.regs = regs - 1;
+   }
+
 }
+#else
+void arch_setup_new_exec(void)
+{
+   /*
+* If we exec out of a kernel thread then thread.regs will not be
+* set.  Do it now.
+*/
+   if (!current->thread.regs) {
+   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
+   current->thread.regs = regs - 1;
+   }
+}
+
 #endif
 
 #ifdef CONFIG_PPC64
@@ -1731,15 +1753,6 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
 #endif
 #endif
 
-   /*
-* If we exec out of a kernel thread then thread.regs will not be
-* set.  Do it now.
-*/
-   if (!current->thread.regs) {
-   struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;
-   current->thread.regs = regs - 1;
-   }
-
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
 * Clear any transactional state, we're exec()ing. The cause is
-- 
2.26.2



[PATCH v5 08/23] powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP to MMU_FTR_KUAP

2020-08-26 Thread Aneesh Kumar K.V
This is in preparation for adding support for kuap with hash translation.
For that, rename/move kuap related functions to non-radix names. Also move
the feature bit closer to MMU_FTR_KUEP.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 18 +-
 arch/powerpc/include/asm/mmu.h   | 16 
 arch/powerpc/mm/book3s64/pkeys.c |  2 +-
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 918a2fcceee7..5cec202dc42f 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -24,7 +24,7 @@
mtspr   SPRN_AMR, \gpr2
/* No isync required, see kuap_restore_amr() */
 998:
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 
@@ -36,7 +36,7 @@
sldi\gpr2, \gpr2, AMR_KUAP_SHIFT
 999:   tdne\gpr1, \gpr2
EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | 
BUGFLAG_ONCE)
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 
@@ -56,7 +56,7 @@
mtspr   SPRN_AMR, \gpr2
isync
 99:
-   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
+   END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
 #endif
 .endm
 
@@ -69,7 +69,7 @@
 
 static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
 {
-   if (mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) 
{
+   if (mmu_has_feature(MMU_FTR_KUAP) && unlikely(regs->kuap != amr)) {
isync();
mtspr(SPRN_AMR, regs->kuap);
/*
@@ -82,7 +82,7 @@ static inline void kuap_restore_amr(struct pt_regs *regs, 
unsigned long amr)
 
 static inline unsigned long kuap_get_and_check_amr(void)
 {
-   if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
+   if (mmu_has_feature(MMU_FTR_KUAP)) {
unsigned long amr = mfspr(SPRN_AMR);
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */
WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED);
@@ -93,7 +93,7 @@ static inline unsigned long kuap_get_and_check_amr(void)
 
 static inline void kuap_check_amr(void)
 {
-   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_KUAP))
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
 }
 
@@ -122,7 +122,7 @@ static inline unsigned long kuap_get_and_check_amr(void)
 
 static inline unsigned long get_kuap(void)
 {
-   if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (!early_mmu_has_feature(MMU_FTR_KUAP))
return 0;
 
return mfspr(SPRN_AMR);
@@ -130,7 +130,7 @@ static inline unsigned long get_kuap(void)
 
 static inline void set_kuap(unsigned long value)
 {
-   if (!early_mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (!early_mmu_has_feature(MMU_FTR_KUAP))
return;
 
/*
@@ -180,7 +180,7 @@ static inline void restore_user_access(unsigned long flags)
 static inline bool
 bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {
-   return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
+   return WARN(mmu_has_feature(MMU_FTR_KUAP) &&
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 255a1837e9f7..04e7a65637fb 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -29,7 +29,12 @@
  */
 
 /*
- * Support for KUEP feature.
+ * Supports KUAP (key 0 controlling userspace addresses) on radix
+ */
+#define MMU_FTR_KUAP   ASM_CONST(0x0200)
+
+/*
+ * Support for KUEP feature.
  */
 #define MMU_FTR_KUEP   ASM_CONST(0x0400)
 
@@ -120,11 +125,6 @@
  */
 #define MMU_FTR_1T_SEGMENT ASM_CONST(0x4000)
 
-/*
- * Supports KUAP (key 0 controlling userspace addresses) on radix
- */
-#define MMU_FTR_RADIX_KUAP ASM_CONST(0x8000)
-
 /* MMU feature bit sets for various CPUs */
 #define MMU_FTRS_DEFAULT_HPTE_ARCH_V2  \
MMU_FTR_HPTE_TABLE | MMU_FTR_PPCAS_ARCH_V2
@@ -187,10 +187,10 @@ enum {
 #ifdef CONFIG_PPC_RADIX_MMU
MMU_FTR_TYPE_RADIX |
MMU_FTR_GTSE |
+#endif /* CONFIG_PPC_RADIX_MMU */
 #ifdef CONFIG_PPC_KUAP
-   MMU_FTR_RADIX_KUAP |
+   MMU_FTR_KUAP |
 #endif /* CONFIG_PPC_KUAP */
-#endif /* CONFIG_PPC_RADIX_MMU */
 #ifdef CONFIG_PPC_MEM_KEYS
MMU_FTR_PKEY |
 #endif
diff 

[PATCH v5 09/23] powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation

2020-08-26 Thread Aneesh Kumar K.V
This patch updates the kernel hash page table entries to use storage key 3
for their mappings. This implies that all kernel accesses will now use key 3
to control READ/WRITE. The patch also prevents userspace from allocating
key 3, and the UAMOR value is updated so that userspace cannot modify key 3.
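
The heart of the change is the new fallback in pte_to_hpte_pkey_bits(): when a
mapping carries no pkey of its own and the caller passes HPTE_USE_KERNEL_KEY, the
HPTE is given key 3 (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1). Below is a self-contained
sketch of that decision, with illustrative stand-in values for the HPTE_R_KEY_BIT*
constants rather than the real mmu-hash.h definitions:

/* Stand-in values; the real bit positions live in mmu-hash.h. */
#define HPTE_R_KEY_BIT0		0x0008UL
#define HPTE_R_KEY_BIT1		0x0010UL
#define HPTE_USE_KERNEL_KEY	0x4UL
#define HASH_DEFAULT_KERNEL_KEY	(HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)	/* key 3 */

static unsigned long hpte_pkey_bits_model(unsigned long pte_pkey, unsigned long flags,
					  int kuap_or_kuep_active)
{
	/* Kernel mappings carry no per-pte key; force them to key 3 so the
	 * AMR/IAMR settings for key 3 govern all kernel accesses. */
	if (kuap_or_kuep_active && pte_pkey == 0 && (flags & HPTE_USE_KERNEL_KEY))
		return HASH_DEFAULT_KERNEL_KEY;

	return pte_pkey;
}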

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 24 ++-
 arch/powerpc/include/asm/book3s/64/hash.h |  2 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 +
 arch/powerpc/include/asm/mmu_context.h|  2 +-
 arch/powerpc/mm/book3s64/hash_4k.c|  2 +-
 arch/powerpc/mm/book3s64/hash_64k.c   |  4 ++--
 arch/powerpc/mm/book3s64/hash_hugepage.c  |  2 +-
 arch/powerpc/mm/book3s64/hash_hugetlbpage.c   |  2 +-
 arch/powerpc/mm/book3s64/hash_pgtable.c   |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c | 10 
 arch/powerpc/mm/book3s64/pkeys.c  |  4 
 11 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h 
b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
index 795010897e5d..9f44e208f036 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-pkey.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -2,6 +2,9 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 #define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
 
+/*  We use key 3 for KERNEL */
+#define HASH_DEFAULT_KERNEL_KEY (HPTE_R_KEY_BIT0 | HPTE_R_KEY_BIT1)
+
 static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
@@ -11,13 +14,22 @@ static inline u64 hash__vmflag_to_pte_pkey_bits(u64 
vm_flags)
((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
 {
-   return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+   unsigned long pte_pkey;
+
+   pte_pkey = (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+
+   if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_KUEP)) {
+   if ((pte_pkey == 0) && (flags & HPTE_USE_KERNEL_KEY))
+   return HASH_DEFAULT_KERNEL_KEY;
+   }
+
+   return pte_pkey;
 }
 
 static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 73ad038ed10b..d959b0195ad9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -145,7 +145,7 @@ extern void hash__mark_initmem_nx(void);
 
 extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
-extern unsigned long htab_convert_pte_flags(unsigned long pteflags);
+unsigned long htab_convert_pte_flags(unsigned long pteflags, unsigned long 
flags);
 /* Atomic PTE updates */
 static inline unsigned long hash__pte_update(struct mm_struct *mm,
 unsigned long addr,
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 93d18da5e7ec..fa8a1c51b8f1 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -452,6 +452,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
 
 #define HPTE_LOCAL_UPDATE  0x1
 #define HPTE_NOHPTE_UPDATE 0x2
+#define HPTE_USE_KERNEL_KEY0x4
 
 extern int __hash_page_4K(unsigned long ea, unsigned long access,
  unsigned long vsid, pte_t *ptep, unsigned long trap,
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 7f3658a97384..ece806a590d6 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -284,7 +284,7 @@ static inline bool arch_vma_access_permitted(struct 
vm_area_struct *vma,
 #define thread_pkey_regs_init(thread)
 #define arch_dup_pkeys(oldmm, mm)
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags, unsigned long flags)
 {
return 0x0UL;
 }
diff --git a/arch/powerpc

[PATCH v5 07/23] powerpc/book3s64/kuep: Move KUEP related function outside radix

2020-08-26 Thread Aneesh Kumar K.V
The next set of patches adds support for KUEP with hash translation.
In preparation for that, rename/move the KUEP related functions to
non-radix names.

Also set MMU_FTR_KUEP and add the missing isync().
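
For reference, each key owns a 2-bit field in the AMR/IAMR, with key 0 occupying the
two most significant bits; that is why AMR_KUEP_BLOCKED is 1UL << 62 - it falls
inside key 0's field at the top of the register. The standalone sketch below only
illustrates that bit placement; pkeyshift() mirrors the helper used in pkeys.c but
is reproduced here purely as an illustration.

#include <stdio.h>

#define AMR_BITS_PER_PKEY	2
#define PKEY_REG_BITS		64

/* Shift of the 2-bit AMR/IAMR field owned by a key (key 0 is topmost). */
static int pkeyshift(int pkey)
{
	return PKEY_REG_BITS - ((pkey + 1) * AMR_BITS_PER_PKEY);
}

int main(void)
{
	printf("key 0 field: shift %d, mask %#lx\n", pkeyshift(0), 3UL << pkeyshift(0));
	printf("AMR_KUEP_BLOCKED = %#lx\n", 1UL << 62);
	return 0;
}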

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kup.h |  1 +
 arch/powerpc/mm/book3s64/pkeys.c | 21 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 20 
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index f4008f8be8e3..918a2fcceee7 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -7,6 +7,7 @@
 
 #define AMR_KUAP_BLOCK_READUL(0x4000)
 #define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
+#define AMR_KUEP_BLOCKED   (1UL << 62)
 #define AMR_KUAP_BLOCKED   (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 #define AMR_KUAP_SHIFT 62
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index c75994cf50a7..82c722fbce52 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -229,6 +229,27 @@ void __init pkey_early_init_devtree(void)
return;
 }
 
+#ifdef CONFIG_PPC_KUEP
+void __init setup_kuep(bool disabled)
+{
+   if (disabled || !early_radix_enabled())
+   return;
+
+   if (smp_processor_id() == boot_cpuid) {
+   pr_info("Activating Kernel Userspace Execution Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
+   }
+
+   /*
+* Radix always uses key0 of the IAMR to determine if an access is
+* allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
+* fetch.
+*/
+   mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
+   isync();
+}
+#endif
+
 #ifdef CONFIG_PPC_KUAP
 void __init setup_kuap(bool disabled)
 {
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5c0c74e131ca..ace662231be6 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -587,26 +587,6 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, (3ul << 62));
 }
 
-#ifdef CONFIG_PPC_KUEP
-void setup_kuep(bool disabled)
-{
-   if (disabled || !early_radix_enabled())
-   return;
-
-   if (smp_processor_id() == boot_cpuid) {
-   pr_info("Activating Kernel Userspace Execution Prevention\n");
-   cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
-   }
-
-   /*
-* Radix always uses key0 of the IAMR to determine if an access is
-* allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
-* fetch.
-*/
-   mtspr(SPRN_IAMR, (1ul << 62));
-}
-#endif
-
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
-- 
2.26.2



[PATCH v5 06/23] powerpc/book3s64/kup: Use the correct #ifdef when including headers

2020-08-26 Thread Aneesh Kumar K.V
Use CONFIG_PPC_BOOK3S_64 instead of CONFIG_PPC64. This avoids wrongly
including the header on other 64-bit platforms. To fix the BookE 64-bit
build error, add an empty kuap_check_amr macro.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/kup.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 1cff92953384..6c3ee976ee15 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -15,8 +15,16 @@
 #define KUAP_CURRENT   (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
 
 #ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S_64
 #include 
+#else
+#ifdef __ASSEMBLY__
+.macro kuap_check_amr gpr1, gpr2
+.endm
 #endif
+#endif
+#endif /* CONFIG_PPC64 */
+
 #ifdef CONFIG_PPC_8xx
 #include 
 #endif
-- 
2.26.2



[PATCH v5 05/23] powerpc/book3s64/kuap: Move KUAP related function outside radix

2020-08-26 Thread Aneesh Kumar K.V
The next set of patches adds support for KUAP with hash translation.
In preparation for that, rename/move the KUAP related functions to
non-radix names.

Signed-off-by: Aneesh Kumar K.V 
---
 .../asm/book3s/64/{kup-radix.h => kup.h}  |  6 ++---
 arch/powerpc/include/asm/kup.h|  2 +-
 arch/powerpc/kernel/syscall_64.c  |  2 +-
 arch/powerpc/mm/book3s64/pkeys.c  | 22 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c  | 19 
 5 files changed, 27 insertions(+), 24 deletions(-)
 rename arch/powerpc/include/asm/book3s/64/{kup-radix.h => kup.h} (97%)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
similarity index 97%
rename from arch/powerpc/include/asm/book3s/64/kup-radix.h
rename to arch/powerpc/include/asm/book3s/64/kup.h
index 19a8e640a4e5..f4008f8be8e3 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
-#define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
+#ifndef _ASM_POWERPC_BOOK3S_64_KUP_H
+#define _ASM_POWERPC_BOOK3S_64_KUP_H
 
 #include 
 #include 
@@ -187,4 +187,4 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, 
bool is_write)
 
 #endif /* __ASSEMBLY__ */
 
-#endif /* _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H */
+#endif /* _ASM_POWERPC_BOOK3S_64_KUP_H */
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 1d0f7d838b2e..1cff92953384 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -15,7 +15,7 @@
 #define KUAP_CURRENT   (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
 
 #ifdef CONFIG_PPC64
-#include 
+#include 
 #endif
 #ifdef CONFIG_PPC_8xx
 #include 
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 8e50818aa50b..22a31a988264 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -2,7 +2,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 7dc71f85683d..c75994cf50a7 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -9,9 +9,12 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 
+
 int  num_pkey; /* Max number of pkeys supported */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -226,6 +229,25 @@ void __init pkey_early_init_devtree(void)
return;
 }
 
+#ifdef CONFIG_PPC_KUAP
+void __init setup_kuap(bool disabled)
+{
+   if (disabled || !early_radix_enabled())
+   return;
+
+   if (smp_processor_id() == boot_cpuid) {
+   pr_info("Activating Kernel Userspace Access Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
+   }
+
+   /*
+* Set the default kernel AMR values on all cpus.
+*/
+   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
+   isync();
+}
+#endif
+
 static inline u64 read_amr(void)
 {
return mfspr(SPRN_AMR);
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 730e2771a2c8..5c0c74e131ca 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -607,25 +607,6 @@ void setup_kuep(bool disabled)
 }
 #endif
 
-#ifdef CONFIG_PPC_KUAP
-void setup_kuap(bool disabled)
-{
-   if (disabled || !early_radix_enabled())
-   return;
-
-   if (smp_processor_id() == boot_cpuid) {
-   pr_info("Activating Kernel Userspace Access Prevention\n");
-   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
-   }
-
-   /*
-* Set the default kernel AMR values on all cpus.
-*/
-   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
-   isync();
-}
-#endif
-
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
-- 
2.26.2



[PATCH v5 04/23] powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init

2020-08-26 Thread Aneesh Kumar K.V
This patch consolidates the UAMOR update across the pkey, kuap and kuep
features. The boot CPU initializes UAMOR via pkey init, and both radix and
hash do the secondary CPU UAMOR init in early_init_mmu_secondary().

We don't check for the MMU feature in the radix secondary init because UAMOR
is a supported SPR on all CPUs that support radix translation.
The old code was not updating UAMOR if KUAP (smap) was disabled and KUEP
(smep) was enabled. This change handles that case.
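
UAMOR is effectively a write mask for problem-state updates of the AMR: only key
bits that are 1 in UAMOR can be changed from userspace, so writing 0 locks every
key. The snippet below is just a model of that architected masking, not kernel code:

/* Model of a problem-state mtspr to the AMR being filtered by UAMOR:
 * bits that are 0 in UAMOR keep their old value. */
static unsigned long user_amr_write_model(unsigned long amr, unsigned long uamor,
					  unsigned long new_value)
{
	return (amr & ~uamor) | (new_value & uamor);
}

/* With UAMOR == 0 the result is always the old AMR, which is why the
 * secondary CPU init above writes 0 to SPRN_UAMOR. */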

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 28c784976bed..730e2771a2c8 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -618,9 +618,6 @@ void setup_kuap(bool disabled)
cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
}
 
-   /* Make sure userspace can't change the AMR */
-   mtspr(SPRN_UAMOR, 0);
-
/*
 * Set the default kernel AMR values on all cpus.
 */
@@ -719,6 +716,11 @@ void radix__early_init_mmu_secondary(void)
 
radix__switch_mmu_context(NULL, _mm);
tlbiel_all();
+
+#ifdef CONFIG_PPC_PKEY
+   /* Make sure userspace can't change the AMR */
+   mtspr(SPRN_UAMOR, 0);
+#endif
 }
 
 void radix__mmu_cleanup_all(void)
-- 
2.26.2



[PATCH v5 03/23] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS

2020-08-26 Thread Aneesh Kumar K.V
The next set of patches adds support for KUAP with hash translation.
Hence make KUAP a BOOK3S_64 feature. Also make it a subfeature of
PPC_MEM_KEYS. Hash translation is going to use pkeys to support
KUAP/KUEP. Adding this dependency reduces the code complexity and
enables us to move some of the initialization code to pkeys.c.

Signed-off-by: Aneesh Kumar K.V 
---
 .../powerpc/include/asm/book3s/64/kup-radix.h | 33 +++
 arch/powerpc/include/asm/book3s/64/mmu.h  |  2 +-
 arch/powerpc/include/asm/ptrace.h |  2 +-
 arch/powerpc/kernel/asm-offsets.c |  2 +-
 arch/powerpc/mm/book3s64/Makefile |  2 +-
 arch/powerpc/mm/book3s64/pkeys.c  | 24 +-
 arch/powerpc/platforms/Kconfig.cputype|  4 +++
 7 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 3ee1ec60be84..19a8e640a4e5 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -61,7 +61,7 @@
 
 #else /* !__ASSEMBLY__ */
 
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_PKEY
 
 #include 
 #include 
@@ -96,6 +96,24 @@ static inline void kuap_check_amr(void)
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
 }
 
+#else /* CONFIG_PPC_PKEY */
+
+static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
+{
+}
+
+static inline void kuap_check_amr(void)
+{
+}
+
+static inline unsigned long kuap_get_and_check_amr(void)
+{
+   return 0;
+}
+#endif /* CONFIG_PPC_PKEY */
+
+
+#ifdef CONFIG_PPC_KUAP
 /*
  * We support individually allowing read or write, but we don't support nesting
  * because that would require an expensive read/modify write of the AMR.
@@ -165,19 +183,6 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long 
address, bool is_write)
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
-#else /* CONFIG_PPC_KUAP */
-static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
-{
-}
-
-static inline void kuap_check_amr(void)
-{
-}
-
-static inline unsigned long kuap_get_and_check_amr(void)
-{
-   return 0;
-}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 55442d45c597..381146afaf80 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -214,7 +214,7 @@ extern int mmu_io_psize;
 void mmu_early_init_devtree(void);
 void hash__early_init_devtree(void);
 void radix__early_init_devtree(void);
-#ifdef CONFIG_PPC_MEM_KEYS
+#ifdef CONFIG_PPC_PKEY
 void pkey_early_init_devtree(void);
 #else
 static inline void pkey_early_init_devtree(void) {}
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 155a197c0aa1..5f62ce579a8b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -53,7 +53,7 @@ struct pt_regs
 #ifdef CONFIG_PPC64
unsigned long ppr;
 #endif
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_PKEY
unsigned long kuap;
 #endif
};
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..63548992b5ab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -353,7 +353,7 @@ int main(void)
STACK_PT_REGS_OFFSET(_PPR, ppr);
 #endif /* CONFIG_PPC64 */
 
-#ifdef CONFIG_PPC_KUAP
+#ifdef CONFIG_PPC_PKEY
STACK_PT_REGS_OFFSET(STACK_REGS_KUAP, kuap);
 #endif
 
diff --git a/arch/powerpc/mm/book3s64/Makefile 
b/arch/powerpc/mm/book3s64/Makefile
index fd393b8be14f..1b56d3af47d4 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -17,7 +17,7 @@ endif
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += hash_hugepage.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage_prot.o
 obj-$(CONFIG_SPAPR_TCE_IOMMU)  += iommu_api.o
-obj-$(CONFIG_PPC_MEM_KEYS) += pkeys.o
+obj-$(CONFIG_PPC_PKEY) += pkeys.o
 
 # Instrumenting the SLB fault path can lead to duplicate SLB entries
 KCOV_INSTRUMENT_slb.o := n
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index b1d091a97611..7dc71f85683d 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -89,12 +89,14 @@ static int scan_pkey_feature(void)
}
}
 
+#ifdef CONFIG_PPC_MEM_KEYS
/*
 * Adjust the upper limit, based on the number of bits supported by
 * arch-neutral code.
 */
pkeys_total = min_t(int, pkeys_total,
((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
+#endif
return pkeys_total;
 }
 
@@ -102,6 +104,7 

[PATCH v5 02/23] KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR

2020-08-26 Thread Aneesh Kumar K.V
With POWER7 and above we expect the CPU to support keys. The
number of keys is firmware controlled, based on the device tree.
PR KVM does not expose key details via the device tree, hence when running
with PR KVM we run with MMU_FTR_PKEY support disabled. But we can still
get updates to UAMOR. Hence ignore accesses to these SPRs, and for mfspr
return 0, indicating that no AMR/IAMR update is allowed.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kvm/book3s_emulate.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 0effd48c8f4d..b08cc15f31c7 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -840,6 +840,9 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
case SPRN_MMCR1:
case SPRN_MMCR2:
case SPRN_UMMCR2:
+   case SPRN_UAMOR:
+   case SPRN_IAMR:
+   case SPRN_AMR:
 #endif
break;
 unprivileged:
@@ -1004,6 +1007,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_val
case SPRN_MMCR2:
case SPRN_UMMCR2:
case SPRN_TIR:
+   case SPRN_UAMOR:
+   case SPRN_IAMR:
+   case SPRN_AMR:
 #endif
*spr_val = 0;
break;
-- 
2.26.2



[PATCH v5 01/23] powerpc: Add new macro to handle NESTED_IFCLR

2020-08-26 Thread Aneesh Kumar K.V
This will be used by the following patches.
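
For context, a nested MMU feature section is kept only when (mmu_features & mask)
equals the required value: the existing IFSET variant asks for all mask bits set,
and the new IFCLR variant asks for them all clear. A compact C model of that
decision, illustrative only (the real work is done by the boot-time feature fixup
over the __mmu_ftr_fixup table):

/* Model of the feature-fixup decision: the guarded code survives only
 * when (features & mask) == value, otherwise it is patched out. */
static int ftr_section_kept(unsigned long features, unsigned long mask,
			    unsigned long value)
{
	return (features & mask) == value;
}

/* IFSET passes value == mask (all bits set); the new IFCLR variant
 * passes value == 0 (all bits clear). */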

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/feature-fixups.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/feature-fixups.h 
b/arch/powerpc/include/asm/feature-fixups.h
index b0af97add751..122c22161268 100644
--- a/arch/powerpc/include/asm/feature-fixups.h
+++ b/arch/powerpc/include/asm/feature-fixups.h
@@ -100,6 +100,9 @@ label##5:   
\
 #define END_MMU_FTR_SECTION_NESTED_IFSET(msk, label)   \
END_MMU_FTR_SECTION_NESTED((msk), (msk), label)
 
+#define END_MMU_FTR_SECTION_NESTED_IFCLR(msk, label)   \
+   END_MMU_FTR_SECTION_NESTED((msk), 0, label)
+
 #define END_MMU_FTR_SECTION_IFSET(msk) END_MMU_FTR_SECTION((msk), (msk))
 #define END_MMU_FTR_SECTION_IFCLR(msk) END_MMU_FTR_SECTION((msk), 0)
 
-- 
2.26.2


