Re: [PATCH 2/2] powerpc/64: Only warn for kuap locked when KCSAN not present

2024-04-10 Thread Rohan McLure
On Thu, 2024-04-04 at 06:14 +, Christophe Leroy wrote:
> 
> 
> Le 04/04/2024 à 06:45, Rohan McLure a écrit :
> > Arbitrary instrumented locations, including syscall handlers, can call
> > arch_local_irq_restore() transitively when KCSAN is enabled, and in turn
> > also replay_soft_interrupts_irqrestore(). The precondition checked on
> > entry to this routine is that KUAP is enabled (user access prohibited).
> > Failure to meet this condition only triggers a warning, however, and
> > afterwards KUAP is enabled anyway. That is, KUAP being disabled on entry
> > is in fact permissible, just not possible on an uninstrumented kernel.
> > 
> > Disable this assertion only when KCSAN is enabled.
> 
> Please elaborate on that arbitrary call to arch_local_irq_restore() 
> transitively, when does it happen and why, and why only when KCSAN is
> enabled.

The implementation of KCSAN depends on this_cpu_* routines, which in
turn need to manage irqs for correctness. This means that the presence
of KCSAN instrumentation within a uaccess-enabled window can introduce
calls to arch_local_irq_restore().

For this reason, the warning really only applies to uninstrumented
code, so to prevent this assertion from producing a false positive on
KCSAN kernels it makes sense to issue it only when KCSAN is not
present.
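
To make the transitive path concrete: the generic this_cpu_*() fallback
masks interrupts around the per-cpu operation, and restoring them goes
through arch_local_irq_restore(). A rough sketch of the shape of that
fallback (paraphrasing include/asm-generic/percpu.h, not a verbatim
quote):

#define this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	unsigned long __flags;						\
	raw_local_irq_save(__flags);					\
	raw_cpu_generic_to_op(pcp, val, op);				\
	/* -> arch_local_irq_restore(), which on ppc64 may replay	\
	 * soft-masked interrupts via					\
	 * replay_soft_interrupts_irqrestore()				\
	 */								\
	raw_local_irq_restore(__flags);					\
} while (0)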

> 
> I don't understand the reasoning, if it is permissible as you say,
> just 
> drop the warning. If the warning is there, it should stay also with 
> KCSAN. You should fix the root cause instead.

By dropping this assertion when KCSAN is enabled, we open up the
opportunity for KCSAN to warn only when data races are actually
observed, rather than just instances where an unblocked AMR state is
inherited into an IRQ context.

> 
> > 
> > Suggested-by: Nicholas Piggin 
> > Signed-off-by: Rohan McLure 
> > ---
> >   arch/powerpc/kernel/irq_64.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/irq_64.c
> > b/arch/powerpc/kernel/irq_64.c
> > index d5c48d1b0a31..18b2048389a2 100644
> > --- a/arch/powerpc/kernel/irq_64.c
> > +++ b/arch/powerpc/kernel/irq_64.c
> > @@ -189,7 +189,8 @@ static inline __no_kcsan void
> > replay_soft_interrupts_irqrestore(void)
> >      * and re-locking AMR but we shouldn't get here in the
> > first place,
> >      * hence the warning.
> >      */
> > -   kuap_assert_locked();
> > +   if (!IS_ENABLED(CONFIG_KCSAN))
> > +   kuap_assert_locked();
> >   
> >     if (kuap_state != AMR_KUAP_BLOCKED)
> >     set_kuap(AMR_KUAP_BLOCKED);



[PATCH 2/2] powerpc/64: Only warn for kuap locked when KCSAN not present

2024-04-03 Thread Rohan McLure
Arbitrary instrumented locations, including syscall handlers, can call
arch_local_irq_restore() transitively when KCSAN is enabled, and in turn
also replay_soft_interrupts_irqrestore(). The precondition checked on
entry to this routine is that KUAP is enabled (user access prohibited).
Failure to meet this condition only triggers a warning, however, and
afterwards KUAP is enabled anyway. That is, KUAP being disabled on entry
is in fact permissible, just not possible on an uninstrumented kernel.

Disable this assertion only when KCSAN is enabled.

Suggested-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/irq_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index d5c48d1b0a31..18b2048389a2 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -189,7 +189,8 @@ static inline __no_kcsan void 
replay_soft_interrupts_irqrestore(void)
 * and re-locking AMR but we shouldn't get here in the first place,
 * hence the warning.
 */
-   kuap_assert_locked();
+   if (!IS_ENABLED(CONFIG_KCSAN))
+   kuap_assert_locked();
 
if (kuap_state != AMR_KUAP_BLOCKED)
set_kuap(AMR_KUAP_BLOCKED);
-- 
2.44.0



[PATCH 1/2] powerpc: Apply __always_inline to interrupt_{enter,exit}_prepare()

2024-04-03 Thread Rohan McLure
In keeping with the advice given by Documentation/core-api/entry.rst,
entry and exit handlers for interrupts should not be instrumented.
Guarantee that the interrupt_{enter,exit}_prepare() routines are inlined
so that they will inherit instrumentation from their caller.

KCSAN kernels were observed to compile without inlining these routines,
which would lead to grief on NMI handlers.
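
A minimal sketch (not from the patch) of why __always_inline matters
here: "static inline" is only a hint, and a helper the compiler chooses
to emit out of line becomes a separate, instrumentable function even if
every caller is noinstr. Forcing inlining makes the helper take on
whatever instrumentation policy applies to its caller. All names below
are illustrative only.

#define __demo_always_inline inline __attribute__((__always_inline__))

static inline void demo_prepare_hint_only(void)
{
	/* May be emitted out of line, in which case it is instrumented
	 * like any other function and can trip up an NMI/entry path. */
}

static __demo_always_inline void demo_prepare_forced(void)
{
	/* Always folded into the caller, so it inherits the caller's
	 * instrumentation policy (e.g. a noinstr entry handler). */
}

void demo_interrupt_entry(void)
{
	demo_prepare_hint_only();
	demo_prepare_forced();
}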

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/interrupt.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 7b610864b364..f4343e0bfb13 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -150,7 +150,7 @@ static inline void booke_restore_dbcr0(void)
 #endif
 }
 
-static inline void interrupt_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
irq_soft_mask_set(IRQS_ALL_DISABLED);
@@ -215,11 +215,11 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs)
  * However interrupt_nmi_exit_prepare does return directly to regs, because
  * NMIs do not do "exit work" or replay soft-masked interrupts.
  */
-static inline void interrupt_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_exit_prepare(struct pt_regs *regs)
 {
 }
 
-static inline void interrupt_async_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
/* Ensure interrupt_enter_prepare does not enable MSR[EE] */
@@ -238,7 +238,7 @@ static inline void interrupt_async_enter_prepare(struct 
pt_regs *regs)
irq_enter();
 }
 
-static inline void interrupt_async_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_exit_prepare(struct pt_regs *regs)
 {
/*
 * Adjust at exit so the main handler sees the true NIA. This must
@@ -278,7 +278,7 @@ static inline bool nmi_disables_ftrace(struct pt_regs *regs)
return true;
 }
 
-static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
 #ifdef CONFIG_PPC64
state->irq_soft_mask = local_paca->irq_soft_mask;
@@ -340,7 +340,7 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
nmi_enter();
 }
 
-static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
if (mfmsr() & MSR_DR) {
// nmi_exit if relocations are on
-- 
2.44.0



[PATCH] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2024-04-03 Thread Rohan McLure
Prior to this patch, data races are detectable by KCSAN of the following
forms:

[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock

In case [1], the calling context does not need an mmiowb() call to be
issued, otherwise it would do so itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.

In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to its return, so
nesting_count will remain balanced. In the worst case, a critical
section observes an mmiowb to be pending during a mmiowb_spin_unlock()
call and is then interrupted, leading to an extraneous call to
mmiowb(). This data race is clearly innocuous.

Resolve KCSAN warnings of type [1] by means of READ_ONCE() and
WRITE_ONCE(). As increments and decrements to nesting_count are
balanced across interrupt contexts, resolve type [2] warnings by simply
revoking instrumentation, using data_race() rather than READ_ONCE() and
WRITE_ONCE(); the memory consistency semantics of plain accesses will
still lead to correct
behaviour.
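
As a side note on the two annotation strategies (a minimal sketch, not
part of the patch): READ_ONCE()/WRITE_ONCE() mark the access so that
KCSAN treats it as an intentional concurrent access, while data_race()
suppresses reporting for that access entirely; in both cases the
generated access remains an ordinary load or store.

#include <linux/compiler.h>

static int demo_pending;	/* stands in for ms->mmiowb_pending */
static int demo_nesting;	/* stands in for ms->nesting_count */

static void demo_marked_write(void)
{
	/* Marked access: concurrent marked readers are expected, so
	 * KCSAN will not report them as data races. */
	WRITE_ONCE(demo_pending, 1);
}

static void demo_tolerated_race(void)
{
	/* Known-benign race: instrumentation is revoked with data_race(),
	 * but the plain increment itself is unchanged. */
	data_race(demo_nesting++);
}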

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reported-by: Gautam Menghani 
Tested-by: Gautam Menghani 
Acked-by: Arnd Bergmann 
---
Previously discussed here:
https://lore.kernel.org/linuxppc-dev/20230510033117.1395895-4-rmcl...@linux.ibm.com/
But pushed back due to affecting other architectures. Reissuing to
linuxppc-dev, as it does not enact a functional change.
---
 include/asm-generic/mmiowb.h | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..f8c7c8a84e9e 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -37,25 +37,28 @@ static inline void mmiowb_set_pending(void)
struct mmiowb_state *ms = __mmiowb_state();
 
if (likely(ms->nesting_count))
-   ms->mmiowb_pending = ms->nesting_count;
+   WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
 }
 
 static inline void mmiowb_spin_lock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
-   ms->nesting_count++;
+
+   /* Increment need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count++);
 }
 
 static inline void mmiowb_spin_unlock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 pending = READ_ONCE(ms->mmiowb_pending);
 
-   if (unlikely(ms->mmiowb_pending)) {
-   ms->mmiowb_pending = 0;
+   WRITE_ONCE(ms->mmiowb_pending, 0);
+   if (unlikely(pending))
mmiowb();
-   }
 
-   ms->nesting_count--;
+   /* Decrement need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count--);
 }
 #else
 #define mmiowb_set_pending()   do { } while (0)
-- 
2.44.0



[PATCH v12 11/11] powerpc: mm: Support page table check

2024-04-01 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
v12: Add instrumentation to ptep_get_and_clear() for nohash
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/include/asm/nohash/pgtable.h|  8 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 8 files changed, 68 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..56de6c2b6b98 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 83f7b98ef49f..703deb5749e6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d8640ddbcad1..6199d2b4bded 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1306,19 +1316,34 @@ extern int pudp_test_and_clear_young(struct 
vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = r

[PATCH v12 04/11] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pud_clear()

2024-04-01 Thread Rohan McLure
This reverts commit 931c38e16499 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 82bbe115a1a4..e35b2b4f5ea1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1356,7 +1356,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pud_t pud = native_pudp_get_and_clear(pudp);
 
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, addr, pud);
 
return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
__page_table_check_pmd_clear(mm, pmd);
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_clear(mm, pud);
+   __page_table_check_pud_clear(mm, addr, pud);
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
 {
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b2b4c1160d4a..6a5c44c2208e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,7 +570,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
pud_t pud = *pudp;
 
pud_clear(pudp);
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, address, pud);
 
return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
 
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud)
 {
if (&init_mm == mm)
return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pud_clear(mm, *pudp);
+   __page_table_check_pud_clear(mm, addr, *pudp);
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 pud_write(pud));
-- 
2.44.0



[PATCH v12 10/11] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-04-01 Thread Rohan McLure
In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_pte_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page
table check. set_ptes() is itself implemented by calls to
__set_pte_at(), so this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages which should not be
instrumented.
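
The new helper's definition is not visible in the excerpt below; as a
sketch only, and assuming the existing powerpc helpers set_pte_filter()
and __set_pte_at() keep their current roles, the intent described above
amounts to roughly the following (the exact assertion is an assumption
based on the v11 note):

void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
			  pte_t *ptep, pte_t pte)
{
	/* Same sanity check as set_pte_at(): a hw-valid pte should not
	 * also be protnone (assumed from the v11 changelog note). */
	VM_WARN_ON(pte_hw_valid(pte) && pte_protnone(pte));
	/* Apply the usual pte filters... */
	pte = set_pte_filter(pte, addr);
	/* ...but skip page_table_check_ptes_set(), hence "unchecked". */
	__set_pte_at(mm, addr, ptep, pte, 0);
}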

Signed-off-by: Rohan McLure 
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
v11: Include the assertion that hwvalid => !protnone. It is possible that
some of these calls can be safely replaced with __set_pte_at(), however
that will have to be done at a later stage.
---
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c| 8 
 arch/powerpc/mm/pgtable_32.c | 2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 26be61b00259..9dffd9313242 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -46,6 +46,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..f7be5fa058e8 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_leaf(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_leaf(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
-   return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 15e88f1439ec..e8da30536bd5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
asm volatile("ptesync": : :"memory");
  

[PATCH v12 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-04-01 Thread Rohan McLure
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations of
pte_user_accessible_page() are specialised.

Signed-off-by: Rohan McLure 
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
 arch/powerpc/include/asm/nohash/pgtable.h|  5 +
 arch/powerpc/include/asm/pgtable.h   |  8 
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..83f7b98ef49f 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fac5615e6bc5..d8640ddbcad1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..413d01a51e6f 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index baa2c4cd35db..26be61b00259 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -220,6 +220,14 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page(pmd, addr)false
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page(pud, addr)false
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v12 08/11] powerpc: mm: Add pud_pfn() stub

2024-04-01 Thread Rohan McLure
The page table check feature requires that pud_pfn() be defined
on each consuming architecture. Since only 64-bit Book3S platforms
allow for hugepages at this upper level, and since on all other
platforms the calling code is gated by a call to
pud_user_accessible_page(), which will return false,
include this stub as a BUILD_BUG().
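
A sketch of why a BUILD_BUG() stub is safe in this pattern (illustrative
names only, not the patch itself): the only caller is gated on a
condition that is constant-false on these platforms, so the compiler
eliminates the call and the build-time assertion never survives into
object code.

#include <linux/build_bug.h>

#define demo_pud_user_accessible_page(pud, addr)	0

static inline unsigned long demo_pud_pfn(unsigned long pud)
{
	BUILD_BUG();	/* any surviving call site is a build error */
	return 0;
}

static void demo_caller(unsigned long pud, unsigned long addr)
{
	if (demo_pud_user_accessible_page(pud, addr))
		(void)demo_pud_pfn(pud);	/* dead code, eliminated */
}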

Signed-off-by: Rohan McLure 
---
v11: pud_pfn() stub has been removed upstream as it has valid users now
in transparent hugepages. Create a BUG_ON() for other, non Book3S64
platforms.
v12: Add missing return line to stub.
---
 arch/powerpc/include/asm/pgtable.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 239709a2f68e..baa2c4cd35db 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -211,6 +211,15 @@ static inline bool arch_supports_memmap_on_memory(unsigned 
long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   BUILD_BUG();
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v12 07/11] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()

2024-04-01 Thread Rohan McLure
On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this by the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c| 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 040c2e664cff..f698b30463f3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1074,17 +1074,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || 
pmd_user_exec(pmd));
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 92bf5c309055..b9663e03475b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -724,17 +724,17 @@ static inline void set_pud_at(struct mm_struct *mm, 
unsigned long addr,
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && pte_user(pte);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && pmd_user(pmd);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b2b3902f8df4..e898813fce01 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1688,17 +1688,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) 
& _PAGE_USER);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) 
& _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pte_user_accessible_page(pte)) {
+   if (pte_user_accessible_page(pte, addr)) {
page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pmd_user_accessible_page(pmd)) {
+   if (pmd_user_accessible_page(pmd, addr)) {
page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pud_user_accessible_page(pud)) {
+   if (pud_user_accessible_page(pud, addr)) {
page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, 
unsigned long addr,
 
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-   if (pte_use

[PATCH v12 06/11] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pte_clear()

2024-04-01 Thread Rohan McLure
This reverts commit aa232204c468 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pte_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d20afcfae530..040c2e664cff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1145,7 +1145,7 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct 
*mm,
 {
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0066626159a5..92bf5c309055 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -563,7 +563,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9876e6d92799..b2b3902f8df4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1276,7 +1276,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
   pte_t *ptep)
 {
pte_t pte = native_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
return pte;
 }
 
@@ -1292,7 +1292,7 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
 * care about updates and native needs no locking
 */
pte = native_local_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
__page_table_check_zero(page, order);
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pte_clear(mm, pte);
+   __page_table_check_pte_clear(mm, addr, pte);
 }
 
 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
 {
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d17fbca4da7b..7c18a1e55696 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -454,7 +454,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149

[PATCH v12 03/11] mm/page_table_check: Provide addr parameter to page_table_check_pte_set()

2024-04-01 Thread Rohan McLure
To provide support for powerpc platforms, provide an addr parameter to
the page_table_check_pte_set() routine. This parameter is needed on some
powerpc platforms which do not encode whether a mapping is for user or
kernel in the pte. On such platforms, this can be inferred from the
addr parameter.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++-
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 995cc6213d0d..b3938f80a1b6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -376,7 +376,7 @@ static inline void __set_ptes(struct mm_struct *mm,
  unsigned long __always_unused addr,
  pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
 
for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7b4053ff597e..a153d3d143d2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -532,7 +532,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pteval, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
for (;;) {
__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int 
order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_ptes_set(mm, ptep, pte, nr);
+   __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..b2b4c1160d4a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -264,7 +264,7 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned 
long nr)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
arch_enter_lazy_mmu_mode();
for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr)
 {
unsigned int i;
 
-- 
2.44.0



[PATCH v12 05/11] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pmd_clear()

2024-04-01 Thread Rohan McLure
This reverts commit 1831414cd729 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_clear").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3938f80a1b6..d20afcfae530 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1188,7 +1188,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a153d3d143d2..0066626159a5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -767,7 +767,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e35b2b4f5ea1..9876e6d92799 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1345,7 +1345,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm, unsigned long
 {
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, addr, pmd);
 
return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
__page_table_check_pte_clear(mm, pte);
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pmd_clear(mm, pmd);
+   __page_table_check_pmd_clear(mm, addr, pmd);
 }
 
 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
 {
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 6a5c44c2208e..d17fbca4da7b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -557,7 +557,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
pmd_t pmd = *pmdp;
 
pmd_clear(pmdp);
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd)
 {
if (&init_mm == mm)
return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)

[PATCH v12 01/11] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pud_set()

2024-04-01 Thread Rohan McLure
This reverts commit 6d144436d954 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pud_set").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7334e5526185 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -568,7 +568,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..1e0c0717b3f9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -719,7 +719,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 315535ffb258..09db55fa8856 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
native_set_pud(pudp, pud);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t 
pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+   pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
unsigned long addr,
pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
__page_table_check_pmd_set(mm, pmdp, pmd);
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_set(mm, pudp, pud);
+   __page_table_check_pud_set(mm, addr, pudp, pud);
 }
 
 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
 {
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t 
*pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_p

[PATCH v12 00/11] Support page table check PowerPC

2024-04-01 Thread Rohan McLure
Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table
entries at each level in order to ensure that anonymous mappings
have at most one writable consumer, and likewise that file-backed
mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, we provide separate set_pte_at()
and set_pte_at_unchecked(), to allow for internal, uninstrumented mappings.
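
As a distillation of the invariant being enforced (a hypothetical
sketch; the real accounting lives in mm/page_table_check.c and tracks
per-pfn counters in page_ext):

#include <linux/bug.h>

/* All names below are made up for illustration. */
struct demo_pfn_state {
	int anon_writable_maps;	/* writable mappings of an anonymous page */
	int file_maps;		/* mappings of a file-backed page */
};

static void demo_page_table_check(const struct demo_pfn_state *s)
{
	/* An anonymous page may have at most one writable consumer... */
	BUG_ON(s->anon_writable_maps > 1);
	/* ...and a page must not be anonymous and file-backed at once. */
	BUG_ON(s->anon_writable_maps && s->file_maps);
}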

v12:
 * Rename commits that revert changes to instead reflect that we are
   reinstating old behaviour due to it providing more flexibility
 * Add return line to pud_pfn() stub
 * Instrument ptep_get_and_clear() for nohash

v11:
 * The pud_pfn() stub, which previously had no legitimate users on any
   powerpc platform, now has users in Book3s64 with transparent pages.
   Include a stub of the same name for each platform that does not
   define their own.
 * Drop patch that standardised use of p*d_leaf(), as already included
   upstream in v6.9.
 * Provide fallback definitions of p{m,u}d_user_accessible_page() that
   do not reference p*d_leaf(), p*d_pte(), as they are defined after
   powerpc/mm headers by linux/mm headers.
 * Ensure that set_pte_at_unchecked() has the same checks as
   set_pte_at().
Link: 
https://lore.kernel.org/linuxppc-dev/20240328045535.194800-14-rmcl...@linux.ibm.com/
 

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including consuming code from arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the
   address (on platforms where the pte does not imply whether the
   mapping is for user or kernel) 
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.
Link: 
https://lore.kernel.org/linuxppc-dev/20240313042118.230397-9-rmcl...@linux.ibm.com/T/

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we need
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}_is_leaf().
 * Add p{m,u}_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.
Link: 
https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (11):
  mm/page_table_check: Reinstate address parameter in
[__]page_table_check_pud_set()
  mm/page_table_check: Reinstate address parameter in
[__]page_table_check_pmd_set()
  mm/page_table_check: Provide addr parameter to
page_table_check_pte_set()
  mm/page_table_check: Reinstate address parameter in
[__]page_table_check_pud_clear()
  mm/page_table_check: Reinstate address parameter in
[__]page_table_check_pmd_clear()
  mm/page_table_check: Reinstate address parameter in
[__]page_table_check_pte_clear()
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Add pud_pfn() stub
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal
usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h | 18 +++---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 12 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 62 +++---
 arch/powerpc/include/asm/nohash/pgtable.h| 13 +++-
 arch/powerpc/include/asm/pgtable.h   | 19 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch

[PATCH v12 02/11] mm/page_table_check: Reinstate address parameter in [__]page_table_check_pmd_set()

2024-04-01 Thread Rohan McLure
This reverts commit a3b837130b58 ("mm/page_table_check: remove unused
parameter in [__]page_table_check_pmd_set").

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7334e5526185..995cc6213d0d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -560,7 +560,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1239,7 +1239,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1e0c0717b3f9..7b4053ff597e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -712,7 +712,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -783,7 +783,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 09db55fa8856..82bbe115a1a4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1238,7 +1238,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t 
*pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
set_pmd(pmdp, pmd);
 }
 
@@ -1383,7 +1383,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pmdp, pmd);
} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t 
pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+   pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct 
mm_struct *mm,
__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static in

Re: [PATCH v11 00/11] Support page table check PowerPC

2024-04-01 Thread Rohan McLure
On Thu, 2024-03-28 at 10:28 +0100, Ingo Molnar wrote:
> 
> * Rohan McLure  wrote:
> 
> > Rohan McLure (11):
> >   Revert "mm/page_table_check: remove unused parameter in
> > [__]page_table_check_pud_set"
> >   Revert "mm/page_table_check: remove unused parameter in
> > [__]page_table_check_pmd_set"
> >   Revert "mm/page_table_check: remove unused parameter in
> > [__]page_table_check_pud_clear"
> >   Revert "mm/page_table_check: remove unused parameter in
> > [__]page_table_check_pmd_clear"
> >   Revert "mm/page_table_check: remove unused parameter in
> > [__]page_table_check_pte_clear"
> 
> Just a process request: please give these commits proper titles, they
> are not really 'reverts' in the classical sense, and this title hides
> what is being done in the commit. The typical use of reverts is to 
> revert a bad change because it broke something. Here the goal is to 
> reintroduce functionality.
> 
> So please name these 5 patches accordingly, to shed light on what is 
> being reintroduced. You can mention it at the end of the changelog
> that 
> it's a functional revert of commit XYZ, but that's not the primary 
> purpose of the commit.

Thanks for your email, I'll do just that.

> 
> Thanks,
> 
>   Ingo

Cheers,
Rohan



Re: [PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-27 Thread Rohan McLure
On Thu, 2024-03-28 at 05:40 +, Christophe Leroy wrote:
> 
> 
> Le 28/03/2024 à 05:55, Rohan McLure a écrit :
> > Page table checking depends on architectures providing an
> > implementation of p{te,md,ud}_user_accessible_page. With
> > refactorisations made on powerpc/mm, the pte_access_permitted() and
> > similar methods verify whether a userland page is accessible with
> > the
> > required permissions.
> > 
> > Since page table checking is the only user of
> > p{te,md,ud}_user_accessible_page(), implement these for all
> > platforms,
> > using some of the same preliminary checks taken by
> > pte_access_permitted()
> > on that platform.
> > 
> > Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by
> > pte_read()")
> > pte_user() is no longer required to be present on all platforms as
> > it
> > may be equivalent to or implied by pte_read(). Hence
> > implementations of
> > pte_user_accessible_page() are specialised.
> > 
> > Signed-off-by: Rohan McLure 
> > ---
> > v9: New implementation
> > v10: Let book3s/64 use pte_user(), but otherwise default other
> > platforms
> > to using the address provided with the call to infer whether it is
> > a
> > user page or not. pmd/pud variants will warn on all other
> > platforms, as
> > they should not be used for user page mappings
> > v11: Conditionally define p{m,u}d_user_accessible_page(), as not
> > all
> > platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
> 
> See my comment to v10 patch 10.
> 
> p{m,u}d_leaf() is defined for all platforms (There is a fallback 
> definition in include/linux/pgtable.h) so
> p{m,u}d_user_accessible_page() 
> can be defined for all platforms, no need for a conditionally define.

The issue I see is that the definition in include/linux/pgtable.h
occurs after this header is included. Prior to the removal of the local
definitions of p{m,u,4}d_leaf() etc. we didn't run into this issue, but
we do now.

Not insistent on doing it this way with ifndef, so amenable to
suggestions if you have a preference.
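
For readers following along, a condensed sketch of the ordering problem
under discussion (file contents heavily abbreviated; these are not the
real headers):

/* arch/powerpc/include/asm/pgtable.h -- preprocessed first. A definition
 * here cannot rely on pmd_leaf(), because the generic fallback below has
 * not been seen yet; hence the conditional false stubs in this series. */
#ifndef pmd_user_accessible_page
#define pmd_user_accessible_page(pmd, addr)	false
#endif

/* include/linux/pgtable.h -- preprocessed afterwards */
#ifndef pmd_leaf
#define pmd_leaf(pmd)	false	/* arrives too late for the arch header above */
#endif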

> 
> > ---
> >   arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
> >   arch/powerpc/include/asm/book3s/64/pgtable.h | 17
> > +
> >   arch/powerpc/include/asm/nohash/pgtable.h    |  5 +
> >   arch/powerpc/include/asm/pgtable.h   |  8 
> >   4 files changed, 35 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h
> > b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > index 52971ee30717..83f7b98ef49f 100644
> > --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > @@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t
> > pte, bool write)
> >     return true;
> >   }
> >   
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned
> > long addr)
> > +{
> > +   return pte_present(pte) && !is_kernel_addr(addr);
> > +}
> > +
> >   /* Conversion functions: convert a page and protection to a page
> > entry,
> >    * and a page entry and page directory to the page they refer to.
> >    *
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index fac5615e6bc5..d8640ddbcad1 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t
> > pte, bool write)
> >     return arch_pte_access_permitted(pte_val(pte), write, 0);
> >   }
> >   
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned
> > long addr)
> > +{
> > +   return pte_present(pte) && pte_user(pte);
> > +}
> > +
> >   /*
> >    * Conversion functions: convert a page and protection to a page
> > entry,
> >    * and a page entry and page directory to the page they refer to.
> > @@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
> >     return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> >   }
> >   
> > +#define pmd_user_accessible_page pmd_user_accessible_page
> > +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned
> > long addr)
> > +{
> > +   return pmd_leaf(pmd) &&
> > pte_user_accessible_page(pmd_pte(pmd), addr);
> > +}
> > +
> > +#define pud_user_accessible_page pud_user_accessible_page
> > +static inline bool pud_user_accessible_page(pud_t pud,

[PATCH v11 02/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"

2024-03-27 Thread Rohan McLure
This reverts commit a3b837130b5865521fa8662aceaa6ebc8d29389a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7334e5526185..995cc6213d0d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -560,7 +560,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1239,7 +1239,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1e0c0717b3f9..7b4053ff597e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -712,7 +712,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -783,7 +783,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 09db55fa8856..82bbe115a1a4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1238,7 +1238,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t 
*pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
set_pmd(pmdp, pmd);
 }
 
@@ -1383,7 +1383,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pmdp, pmd);
} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t 
pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+   pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct 
mm_struct *mm,
__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *p

[PATCH v11 08/11] powerpc: mm: Add pud_pfn() stub

2024-03-27 Thread Rohan McLure
The page table check feature requires that pud_pfn() be defined
on each consuming architecture. Since only 64-bit Book3S platforms
allow for hugepages at this upper level, and since on all other
platforms the calling code is gated by a call to
pud_user_accessible_page(), which will return false, include this stub
as a BUILD_BUG().
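
To make this concrete, the only page table check caller that can reach
pud_pfn() is gated as sketched below (simplified from the
mm/page_table_check.c code elsewhere in this series); with
pud_user_accessible_page() defined as constant false on the affected
platforms, the branch is dead code, so the BUILD_BUG() stub is never
expanded into a build error:

void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
				  pud_t pud)
{
	if (pud_user_accessible_page(pud, addr))	/* constant false here */
		page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}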

Signed-off-by: Rohan McLure 
---
v11: pud_pfn() stub has been removed upstream as it has valid users now
in transparent hugepages. Create a BUILD_BUG() for other, non-Book3S64
platforms.
---
 arch/powerpc/include/asm/pgtable.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 239709a2f68e..ee8c82c0528f 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -211,6 +211,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned 
long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   BUILD_BUG();
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v11 03/11] mm: Provide addr parameter to page_table_check_pte_set()

2024-03-27 Thread Rohan McLure
To provide support for powerpc platforms, provide an addr parameter to
the page_table_check_pte_set() routine. This parameter is needed on some
powerpc platforms which do not encode whether a mapping is for user or
kernel in the pte. On such platforms, this can be inferred from the
addr parameter.
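
For context, the way the new parameter ends up being used on such
platforms is shown by a later patch in this series (patch 09), which
classifies the mapping by its virtual address:

static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
	/* no user/kernel bit in the pte: classify by the virtual address */
	return pte_present(pte) && !is_kernel_addr(addr);
}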

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++-
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 995cc6213d0d..b3938f80a1b6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -376,7 +376,7 @@ static inline void __set_ptes(struct mm_struct *mm,
  unsigned long __always_unused addr,
  pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
 
for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7b4053ff597e..a153d3d143d2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -532,7 +532,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pteval, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
for (;;) {
__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int 
order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_ptes_set(mm, ptep, pte, nr);
+   __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..b2b4c1160d4a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -264,7 +264,7 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned 
long nr)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
arch_enter_lazy_mmu_mode();
for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr)
 {
unsigned int i;
 
-- 
2.44.0



[PATCH v11 11/11] powerpc: mm: Support page table check

2024-03-27 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a68b9e637eda..66a72f9078f5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 83f7b98ef49f..703deb5749e6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include <linux/page_table_check.h>
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS		0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d8640ddbcad1..6199d2b4bded 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX	__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include <linux/page_table_check.h>
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1306,19 +1316,34 @@ extern int pudp_test_and_clear_young(struct 
vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   } else {
+   old_pmd = hash__

[PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-27 Thread Rohan McLure
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations of
pte_user_accessible_page() are specialised.

Signed-off-by: Rohan McLure 
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
 arch/powerpc/include/asm/nohash/pgtable.h|  5 +
 arch/powerpc/include/asm/pgtable.h   |  8 
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..83f7b98ef49f 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fac5615e6bc5..d8640ddbcad1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..413d01a51e6f 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ee8c82c0528f..f1ceae778cb1 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -219,6 +219,14 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page(pmd, addr)	false
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page(pud, addr)	false
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v11 10/11] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-03-27 Thread Rohan McLure
In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_ptes_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page
table check. set_ptes() is itself implemented by calls to
__set_pte_at(), so this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages which should not be
instrumented.
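
As an illustration only (the exact powerpc hunks are in the page table
check patch that follows), the double-instrumentation hazard this
avoids looks roughly like:

void set_pmd_at(struct mm_struct *mm, unsigned long addr,
		pmd_t *pmdp, pmd_t pmd)
{
	page_table_check_pmd_set(mm, addr, pmdp, pmd);	/* checked once here */
	/*
	 * Calling set_pte_at() for the underlying store would also run
	 * page_table_check_ptes_set(), accounting the same pfn range a
	 * second time; hence the switch to set_pte_at_unchecked().
	 */
	set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
}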

Signed-off-by: Rohan McLure 
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
v11: Include the assertion that hwvalid => !protnone. It is possible that
some of these calls can be safely replaced with __set_pte_at(), however
that will have to be done at a later stage.
---
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c| 8 
 arch/powerpc/mm/pgtable_32.c | 2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index f1ceae778cb1..ad0c1451502d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -46,6 +46,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-	set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+	set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..f7be5fa058e8 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_leaf(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_leaf(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
-   return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 15e88f1439ec..e8da30536bd5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-	set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+	set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
asm volatile("ptesync": : :"memory");
  

[PATCH v11 07/11] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()

2024-03-27 Thread Rohan McLure
On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this from the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c| 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 040c2e664cff..f698b30463f3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1074,17 +1074,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || 
pmd_user_exec(pmd));
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 92bf5c309055..b9663e03475b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -724,17 +724,17 @@ static inline void set_pud_at(struct mm_struct *mm, 
unsigned long addr,
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && pte_user(pte);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && pmd_user(pmd);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b2b3902f8df4..e898813fce01 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1688,17 +1688,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) 
& _PAGE_USER);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) 
& _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
unsigned long addr,
	if (&init_mm == mm)
return;
 
-   if (pte_user_accessible_page(pte)) {
+   if (pte_user_accessible_page(pte, addr)) {
page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
unsigned long addr,
	if (&init_mm == mm)
return;
 
-   if (pmd_user_accessible_page(pmd)) {
+   if (pmd_user_accessible_page(pmd, addr)) {
page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
unsigned long addr,
	if (&init_mm == mm)
return;
 
-   if (pud_user_accessible_page(pud)) {
+   if (pud_user_accessible_page(pud, addr)) {
page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, 
unsigned long addr,
 
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-   if (pte_use

[PATCH v11 06/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"

2024-03-27 Thread Rohan McLure
This reverts commit aa232204c4689427cefa55fe975692b57291523a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d20afcfae530..040c2e664cff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1145,7 +1145,7 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct 
*mm,
 {
	pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0066626159a5..92bf5c309055 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -563,7 +563,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9876e6d92799..b2b3902f8df4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1276,7 +1276,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
   pte_t *ptep)
 {
pte_t pte = native_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
return pte;
 }
 
@@ -1292,7 +1292,7 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
 * care about updates and native needs no locking
 */
pte = native_local_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
__page_table_check_zero(page, order);
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pte_clear(mm, pte);
+   __page_table_check_pte_clear(mm, addr, pte);
 }
 
 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
 {
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d17fbca4da7b..7c18a1e55696 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -454,7 +454,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149,8 @@ void __page_table_check_zero(struct page *page, unsigned 
int order

[PATCH v11 05/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"

2024-03-27 Thread Rohan McLure
This reverts commit 1831414cd729a34af937d56ad684a66599de6344.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3938f80a1b6..d20afcfae530 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1188,7 +1188,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
	pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a153d3d143d2..0066626159a5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -767,7 +767,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e35b2b4f5ea1..9876e6d92799 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1345,7 +1345,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm, unsigned long
 {
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, addr, pmd);
 
return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
__page_table_check_pte_clear(mm, pte);
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pmd_clear(mm, pmd);
+   __page_table_check_pmd_clear(mm, addr, pmd);
 }
 
 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
 {
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 6a5c44c2208e..d17fbca4da7b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -557,7 +557,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
pmd_t pmd = *pmdp;
 
pmd_clear(pmdp);
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd)
 {
	if (&init_mm == mm)
return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, 
unsigned long addr,
	if (&init_mm == mm)
return;
 
-   __page_table_check_pmd_clear(mm, *pmdp

[PATCH v11 04/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"

2024-03-27 Thread Rohan McLure
This reverts commit 931c38e16499a057e30a3033f4d6a9c242f0f156.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 82bbe115a1a4..e35b2b4f5ea1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1356,7 +1356,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pud_t pud = native_pudp_get_and_clear(pudp);
 
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, addr, pud);
 
return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
__page_table_check_pmd_clear(mm, pmd);
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_clear(mm, pud);
+   __page_table_check_pud_clear(mm, addr, pud);
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
 {
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b2b4c1160d4a..6a5c44c2208e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,7 +570,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
pud_t pud = *pudp;
 
pud_clear(pudp);
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, address, pud);
 
return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
 
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud)
 {
	if (&init_mm == mm)
return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, 
unsigned long addr,
	if (&init_mm == mm)
return;
 
-   __page_table_check_pud_clear(mm, *pudp);
+   __page_table_check_pud_clear(mm, addr, *pudp);
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 pud_write(pud));
-- 
2.44.0



[PATCH v11 00/11] Support page table check PowerPC

2024-03-27 Thread Rohan McLure
Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table
entries at each level in order to ensure that anonymous mappings
have at most one writable consumer, and likewise that file-backed
mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, separate set_pte_at()
and set_pte_at_unchecked() routines are provided, to allow for internal,
uninstrumented mappings.
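
For orientation, a rough sketch of the split this relies on is below; it
is simplified and not the literal powerpc implementation, which lives in
arch/powerpc/mm/pgtable.c and threads extra parameters through
__set_pte_at():

/* Instrumented path: every API-facing pte store is visible to the checker. */
void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
	      pte_t pte, unsigned int nr)
{
	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
	/* ... perform the actual store(s) via __set_pte_at() ... */
}

/*
 * Uninstrumented path: the same store with no page table check, used for
 * early-boot mappings and for updates already checked at pmd/pud level.
 */
void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
			  pte_t *ptep, pte_t pte)
{
	/* ... perform the actual store via __set_pte_at() ... */
}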

v11:
 * The pud_pfn() stub, which previously had no legitimate users on any
   powerpc platform, now has users in Book3s64 with transparent hugepages.
   Include a stub of the same name for each platform that does not
   define its own.
 * Drop patch that standardised use of p*d_leaf(), as already included
   upstream in v6.9.
 * Provide fallback definitions of p{m,u}d_user_accessible_page() that
   do not reference p*d_leaf(), p*d_pte(), as they are defined after
   powerpc/mm headers by linux/mm headers.
 * Ensure that set_pte_at_unchecked() has the same checks as
   set_pte_at().

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including consuming code from arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the
   address (on platforms where the pte does not imply whether the
   mapping is for user or kernel) 
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.
Link: https://lore.kernel.org/linuxppc-dev/20240313042118.230397-9-rmcl...@linux.ibm.com/T/

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}d_is_leaf().
 * Add p{m,u}d_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.
Link: https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (11):
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_set"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_set"
  mm: Provide addr parameter to page_table_check_pte_set()
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pte_clear"
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Add pud_pfn() stub
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal
usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h | 18 +++---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 12 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 62 +++---
 arch/powerpc/include/asm/nohash/pgtable.h|  5 ++
 arch/powerpc/include/asm/pgtable.h   | 18 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 11 ++--
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c| 12 
 arch/powerpc/mm/pgtable_32.c |  2 +-

[PATCH v11 01/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"

2024-03-27 Thread Rohan McLure
This reverts commit 6d144436d954311f2dbacb5bf7b084042448d83e.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7334e5526185 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -568,7 +568,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..1e0c0717b3f9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -719,7 +719,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 315535ffb258..09db55fa8856 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
native_set_pud(pudp, pud);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t 
pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+   pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
unsigned long addr,
pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
__page_table_check_pmd_set(mm, pmdp, pmd);
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_set(mm, pudp, pud);
+   __page_table_check_pud_set(mm, addr, pudp, pud);
 }
 
 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
 {
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t 
*pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_set);
 
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t

[PATCH v10 07/12] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()

2024-03-12 Thread Rohan McLure
On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this by the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.
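
For illustration, the two shapes that the extra argument allows look roughly
as follows. Both appear later in this series; they are alternatives for
different platforms, not code introduced by this patch.

/* Ownership is encoded in the pte itself (e.g. book3s64): addr is unused */
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
        return pte_present(pte) && pte_user(pte);
}

/* Ownership is inferred from the address (platforms without a user bit) */
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
        return pte_present(pte) && !is_kernel_addr(addr);
}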

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c| 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 74bb81744df2..8ea22deff9a3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -874,17 +874,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || 
pmd_user_exec(pmd));
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 2fa6625f591a..f5c937007590 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -722,17 +722,17 @@ static inline void set_pud_at(struct mm_struct *mm, 
unsigned long addr,
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && pte_user(pte);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && pmd_user(pmd);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e8fd625de280..514374a27124 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1695,17 +1695,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) 
& _PAGE_USER);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) 
& _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pte_user_accessible_page(pte)) {
+   if (pte_user_accessible_page(pte, addr)) {
page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pmd_user_accessible_page(pmd)) {
+   if (pmd_user_accessible_page(pmd, addr)) {
page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pud_user_accessible_page(pud)) {
+   if (pud_user_accessible_page(pud, addr)) {
page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, 
unsigned long addr,
 
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-   if (pte_use

[PATCH v10 03/12] mm: Provide addr parameter to page_table_check_pte_set()

2024-03-12 Thread Rohan McLure
To provide support for powerpc platforms, provide an address parameter
to the page_table_check_pte_set() routine. This parameter is needed on
some powerpc platforms which do not encode whether a mapping is for
user or kernel in the pte. On such platforms, this can be inferred
from the addr parameter.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++-
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e97c1b7e3ee1..965e35adb206 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -345,7 +345,7 @@ static inline void set_ptes(struct mm_struct *mm,
unsigned long __always_unused addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
 
for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4e1ef3a77879..a4b5da7f0704 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -530,7 +530,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pteval, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
for (;;) {
__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int 
order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_ptes_set(mm, ptep, pte, nr);
+   __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct 
mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f6d0e3513948..5da04d056bc3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -238,7 +238,7 @@ static inline pte_t pte_next_pfn(pte_t pte)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
arch_enter_lazy_mmu_mode();
for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr)
 {
unsigned int i;
 
-- 
2.44.0



[PATCH v10 05/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"

2024-03-12 Thread Rohan McLure
This reverts commit 1831414cd729a34af937d56ad684a66599de6344.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 965e35adb206..4210d3b071ec 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -965,7 +965,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a4b5da7f0704..cf8e18f27649 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -765,7 +765,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3e7003b01c9d..26722f553c43 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1352,7 +1352,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm, unsigned long
 {
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, addr, pmd);
 
return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
__page_table_check_pte_clear(mm, pte);
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pmd_clear(mm, pmd);
+   __page_table_check_pmd_clear(mm, addr, pmd);
 }
 
 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
 {
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 77877ae8abef..d0d1a0bbf905 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -531,7 +531,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
pmd_t pmd = *pmdp;
 
pmd_clear(pmdp);
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd)
 {
if (&init_mm == mm)
return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pmd_clear(mm, *pmdp

[PATCH v10 00/12] Support page table check PowerPC

2024-03-12 Thread Rohan McLure
Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table
entries at each level in order to ensure that anonymous mappings
have at most one writable consumer, and likewise that file-backed
mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, separate set_pte_at()
and set_pte_at_unchecked() routines are provided, to allow for internal,
uninstrumented mappings.
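
For background, the invariant the checker enforces can be sketched roughly as
below. This is illustrative only: track_user_mapping() and the *_map_count()
helpers are paraphrased stand-ins rather than the actual mm/page_table_check.c
code, but the rules match the description above.

/*
 * Conceptual sketch of the bookkeeping performed for each pfn whenever a
 * user-accessible mapping is established (the clear paths decrement the
 * same counters).
 */
static void track_user_mapping(struct page *page, bool writable)
{
        if (PageAnon(page)) {
                /* an anonymous page must not also be mapped as file-backed */
                BUG_ON(file_map_count(page));
                /* and may have at most one writable consumer */
                BUG_ON(writable && anon_map_count(page) > 0);
                inc_anon_map_count(page);
        } else {
                /* a file-backed page must not also be mapped anonymously */
                BUG_ON(anon_map_count(page));
                inc_file_map_count(page);
        }
}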

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including the consuming code in arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the
   address (on platforms where the pte does not imply whether the
   mapping is for user or kernel).
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}d_is_leaf().
 * Add p{m,u}d_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.
Link: 
https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (12):
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_set"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_set"
  mm: Provide addr parameter to page_table_check_pte_set()
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pte_clear"
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
  powerpc: mm: Add common pud_pfn stub for all platforms
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal
usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h | 18 ++---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 74 +++-
 arch/powerpc/include/asm/pgtable.h   | 66 ++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 25 ---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c| 17 -
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/xmon/xmon.c |  6 +-
 arch/riscv/include/asm/pgtable.h | 18 ++---
 arch/x86/include/asm/pgtable.h   | 20 +++---
 include/linux/page_table_check.h | 67 +++---
 include/linux/pgtable.h  |  8 +--
 mm/page_table_check.c| 39 ++-
 19 files changed, 261 insertions(+), 150 deletions(-)

-- 
2.44.0



[PATCH v10 08/12] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf

2024-03-12 Thread Rohan McLure
Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}d_leaf, as the
latter is the name given to checking that a higher-level entry in
multi-level paging contains a page translation entry (pte) throughout
all other archs.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: No longer required in order to implement page table check, just a
refactor.
v10: Fix more occurrences, and just delete p{u,m,4}d_is_leaf() stubs as
equivalent p{u,m,4}d_leaf() stubs already exist.
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 
 arch/powerpc/include/asm/pgtable.h   | 24 
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 ++--
 arch/powerpc/mm/pgtable.c|  6 ++---
 arch/powerpc/mm/pgtable_64.c |  6 ++---
 arch/powerpc/xmon/xmon.c |  6 ++---
 7 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 62c43d3d80ec..382724c5e872 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1443,16 +1443,14 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
 /*
  * Like pmd_huge() and pmd_large(), but works regardless of config options
  */
-#define pmd_is_leaf pmd_is_leaf
-#define pmd_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
 }
 
-#define pud_is_leaf pud_is_leaf
-#define pud_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9224f23065ff..0c0ffbe7a3b5 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -180,30 +180,6 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
 }
 #endif
 
-#ifndef pmd_is_leaf
-#define pmd_is_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
-{
-   return false;
-}
-#endif
-
-#ifndef pud_is_leaf
-#define pud_is_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
-{
-   return false;
-}
-#endif
-
-#ifndef p4d_is_leaf
-#define p4d_is_leaf p4d_is_leaf
-static inline bool p4d_is_leaf(p4d_t p4d)
-{
-   return false;
-}
-#endif
-
 #define pmd_pgtable pmd_pgtable
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 4a1abb9f7c05..408d98f8a514 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -503,7 +503,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -532,7 +532,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t 
*pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
-   if (pud_is_leaf(*p)) {
+   if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@@ -635,12 +635,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
 
pmd = NULL;
-   if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
+   if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -658,7 +658,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
 
/* Check if we raced and someone else has set the same thing */
@@ -709,7 +709,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset

[PATCH v10 12/12] powerpc: mm: Support page table check

2024-03-12 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b9fc064d38d2..2dfa5ccb25cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..a97edbc09984 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ca765331e21d..4ad88d4ede88 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1334,19 +1344,34 @@ extern int pudp_test_and_clear_young(struct 
vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   } else {
+   old_pmd = hash__

[PATCH v10 02/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"

2024-03-12 Thread Rohan McLure
This reverts commit a3b837130b5865521fa8662aceaa6ebc8d29389a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a965b59401b3..e97c1b7e3ee1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -540,7 +540,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1001,7 +1001,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4fc99dd3bb42..4e1ef3a77879 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -710,7 +710,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -781,7 +781,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a9b1e8e6d4b9..6a7dc2524344 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t 
*pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
set_pmd(pmdp, pmd);
 }
 
@@ -1390,7 +1390,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct 
*mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pmdp, pmd);
} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t 
pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+   pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct 
mm_struct *mm,
__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *p

[PATCH v10 10/12] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-12 Thread Rohan McLure
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations are
specialised.

Signed-off-by: Rohan McLure 
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++
 arch/powerpc/include/asm/pgtable.h   | 26 
 2 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 382724c5e872..ca765331e21d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,12 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+#define pte_user_accessible_page pte_user_accessible_page
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -881,6 +887,7 @@ static inline int pud_present(pud_t pud)
 
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
+
 static inline pte_t pud_pte(pud_t pud)
 {
return __pte_raw(pud_raw(pud));
@@ -926,6 +933,12 @@ static inline bool pud_access_permitted(pud_t pud, bool 
write)
return pte_access_permitted(pud_pte(pud), write);
 }
 
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   return pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #define __p4d_raw(x)   ((p4d_t) { __pgd_raw(x) })
 static inline __be64 p4d_raw(p4d_t x)
 {
@@ -1091,6 +1104,12 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool 
write)
return pte_access_permitted(pmd_pte(pmd), write);
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   return pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 13f661831333..3741a63fb82e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -227,6 +227,32 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pte_user_accessible_page
+#define pte_user_accessible_page pte_user_accessible_page
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+#endif
+
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   WARN_ONCE(1, "pmd: platform does not use pmd entries directly");
+   return false;
+}
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return false;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v10 11/12] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-03-12 Thread Rohan McLure
In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_pte_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page
table check. set_ptes() is itself implemented by calls to
__set_pte_at(), so this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages which should not be
instrumented.
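
The body of set_pte_at_unchecked() is not visible below (the
arch/powerpc/mm/pgtable.c hunk is truncated in this archive), but from the
description it is expected to look something like the following sketch. This
is a guess at its shape rather than the landed code; in particular the pte
filtering step is only noted in a comment, since its helper is not shown here.

void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
                          pte_t *ptep, pte_t pte)
{
        /* same hardware-valid sanity check as set_ptes()/set_pte_at() */
        VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));

        /*
         * The real routine also applies the same pte filtering as set_ptes()
         * here (per the v10 note further down); it is elided in this sketch.
         * What it deliberately does not do is call
         * page_table_check_ptes_set().
         */

        __set_pte_at(mm, addr, ptep, pte, 0);
}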

Signed-off-by: Rohan McLure 
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
---
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c| 7 +++
 arch/powerpc/mm/pgtable_32.c | 2 +-
 7 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 3741a63fb82e..6ff1d8cfa216 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -44,6 +44,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 3438ab72c346..25082ab6018b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_large(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_large(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
-   return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 46fa46ce6526..c661e42bb2f1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
asm volatile("ptesync": : :"memory");
return 0;
 }
@@ -1522,7 +1522,7 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct 
*vma,
(atomic_read(&mm->context.copros) > 0))
radix__flush_tlb_pa

[PATCH v10 06/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"

2024-03-12 Thread Rohan McLure
This reverts commit aa232204c4689427cefa55fe975692b57291523a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4210d3b071ec..74bb81744df2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -953,7 +953,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index cf8e18f27649..2fa6625f591a 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -561,7 +561,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 26722f553c43..e8fd625de280 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1283,7 +1283,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
   pte_t *ptep)
 {
pte_t pte = native_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
return pte;
 }
 
@@ -1299,7 +1299,7 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
 * care about updates and native needs no locking
 */
pte = native_local_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
__page_table_check_zero(page, order);
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pte_clear(mm, pte);
+   __page_table_check_pte_clear(mm, addr, pte);
 }
 
 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
 {
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d0d1a0bbf905..89af325129f2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -428,7 +428,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149,8 @@ void __page_table_check_zero(struct page *page, unsigned 
int order

[PATCH v10 04/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"

2024-03-12 Thread Rohan McLure
This reverts commit 931c38e16499a057e30a3033f4d6a9c242f0f156.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6a7dc2524344..3e7003b01c9d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1363,7 +1363,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pud_t pud = native_pudp_get_and_clear(pudp);
 
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, addr, pud);
 
return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
__page_table_check_pmd_clear(mm, pmd);
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_clear(mm, pud);
+   __page_table_check_pud_clear(mm, addr, pud);
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
 {
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5da04d056bc3..77877ae8abef 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -544,7 +544,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
pud_t pud = *pudp;
 
pud_clear(pudp);
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, address, pud);
 
return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
 
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud)
 {
if (&init_mm == mm)
return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pud_clear(mm, *pudp);
+   __page_table_check_pud_clear(mm, addr, *pudp);
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 pud_write(pud));
-- 
2.44.0



[PATCH v10 01/12] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"

2024-03-12 Thread Rohan McLure
This reverts commit 6d144436d954311f2dbacb5bf7b084042448d83e.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..a965b59401b3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -548,7 +548,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4fc99dd3bb42 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -717,7 +717,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9d077bca6a10..a9b1e8e6d4b9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1252,7 +1252,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
native_set_pud(pudp, pud);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t 
pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+   pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
unsigned long addr,
pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
__page_table_check_pmd_set(mm, pmdp, pmd);
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_set(mm, pudp, pud);
+   __page_table_check_pud_set(mm, addr, pudp, pud);
 }
 
 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct 
mm_struct *mm, pmd_t *pmdp,
 {
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t 
*pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_set);
 
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t

[PATCH v10 09/12] powerpc: mm: Add common pud_pfn stub for all platforms

2024-03-12 Thread Rohan McLure
Prior to this commit, pud_pfn was implemented with BUILD_BUG as the inline
function for 64-bit Book3S systems but is never included, as its
invocations in generic code are guarded by calls to pud_devmap which return
zero on such systems. A future patch will provide support for page table
checks, the generic code for which depends on a pud_pfn stub being
implemented, even while the patch will not interact with puds directly.

Remove the 64-bit Book3S stub and define pud_pfn to warn on all
platforms. pud_pfn may be defined properly on a per-platform basis
should it grow real usages in future.
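
The generic call sites being guarded look roughly like the following (a
simplified sketch; the wrapper name is an assumption, not a quote from
generic code):

static unsigned long devmap_pud_pfn(pud_t pud)
{
	/* pud_pfn() is only reached behind pud_devmap(), which returns
	 * zero on these platforms, so the WARN_ONCE() stub never fires
	 * at runtime */
	if (pud_devmap(pud))
		return pud_pfn(pud);
	return 0;
}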

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 0c0ffbe7a3b5..13f661831333 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -213,6 +213,20 @@ static inline bool arch_supports_memmap_on_memory(unsigned 
long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
+ * and sets to page table entries at any level are done through
+ * page_table_check_pte_{set,clear}, provide stub implementation.
+ */
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



Re: [PATCH v9 4/7] powerpc: mm: Default p{m,u}d_pte implementations

2023-12-10 Thread Rohan McLure
On 11/30/23 18:35, Christophe Leroy wrote:
>
> Le 30/11/2023 à 03:53, Rohan McLure a écrit :
>> For 32-bit systems which have no usecases for p{m,u}d_pte() prior to
>> page table checking, implement default stubs.
> Is that the best solution ?
>
> If I understand correctly, it is only needed for 
> pmd_user_accessible_page(). Why not provide a stub 
> pmd_user_accessible_page() that returns false on those architectures ?
Yep, this seems reasonable to me.
>
> Same for pud_user_accessible_page()
>
> But if you decide to keep it I think that:
> - It should be squashed with following patch to make it clear it's 
> needed for that only.
> - Remove the WARN_ONCE().
I might however move those WARN_ONCE() calls to the default, false-returning
p{m,u}d_user_accessible_page() implementations, to be consistent with
pud_pfn().
> - Only have a special one for books/64 and a generic only common to he 3 
> others.
>
>> Signed-off-by: Rohan McLure 
>> ---
>> v9: New patch
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
>>   arch/powerpc/include/asm/nohash/64/pgtable.h |  2 ++
>>   arch/powerpc/include/asm/pgtable.h   | 17 +
>>   3 files changed, 22 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 8fdb7667c509..2454174b26cb 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -887,6 +887,8 @@ static inline int pud_present(pud_t pud)
>>   
>>   extern struct page *pud_page(pud_t pud);
>>   extern struct page *pmd_page(pmd_t pmd);
>> +
>> +#define pud_pte pud_pte
>>   static inline pte_t pud_pte(pud_t pud)
>>   {
>>  return __pte_raw(pud_raw(pud));
>> @@ -1043,6 +1045,7 @@ static inline void __kernel_map_pages(struct page 
>> *page, int numpages, int enabl
>>   }
>>   #endif
>>   
>> +#define pmd_pte pmd_pte
>>   static inline pte_t pmd_pte(pmd_t pmd)
>>   {
>>  return __pte_raw(pmd_raw(pmd));
>> diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
>> b/arch/powerpc/include/asm/nohash/64/pgtable.h
>> index f58cbebde26e..09a34fe196ae 100644
>> --- a/arch/powerpc/include/asm/nohash/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
>> @@ -93,6 +93,7 @@ static inline void pmd_clear(pmd_t *pmdp)
>>  *pmdp = __pmd(0);
>>   }
>>   
>> +#define pmd_pte pmd_pte
>>   static inline pte_t pmd_pte(pmd_t pmd)
>>   {
>>  return __pte(pmd_val(pmd));
>> @@ -134,6 +135,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
>>   
>>   extern struct page *pud_page(pud_t pud);
>>   
>> +#define pud_pte pud_pte
>>   static inline pte_t pud_pte(pud_t pud)
>>   {
>>  return __pte(pud_val(pud));
>> diff --git a/arch/powerpc/include/asm/pgtable.h 
>> b/arch/powerpc/include/asm/pgtable.h
>> index 9c0f2151f08f..d7d0f47760d3 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -233,6 +233,23 @@ static inline int pud_pfn(pud_t pud)
>>   }
>>   #endif
>>   
>> +#ifndef pmd_pte
>> +#define pmd_pte pmd_pte
>> +static inline pte_t pmd_pte(pmd_t pmd)
>> +{
>> +WARN_ONCE(1, "pmd: platform does not use pmd entries directly");
>> +return __pte(pmd_val(pmd));
>> +}
>> +#endif
>> +
>> +#ifndef pud_pte
>> +#define pud_pte pud_pte
>> +static inline pte_t pud_pte(pud_t pud)
>> +{
>> +WARN_ONCE(1, "pud: platform does not use pud entries directly");
>> +return __pte(pud_val(pud));
>> +}
>> +#endif
>>   #endif /* __ASSEMBLY__ */
>>   
>>   #endif /* _ASM_POWERPC_PGTABLE_H */



[PATCH v9 6/7] powerpc: mm: Use __set_pte_at() for early-boot / internal usages

2023-11-29 Thread Rohan McLure
In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_pte_at().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set().

Cause API-facing routines that call set_pte_at() to instead call
__set_pte_at(), which will remain uninstrumented by page table check.
set_ptes() is itself implemented by calls to __set_pte_at(), so this
eliminates redundant code.

Also prefer __set_pte_at() in early-boot usages which should not be
instrumented.
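
The relationship relied upon is roughly the following (a simplified
sketch of the instrumented versus raw paths, not the exact powerpc
definitions):

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
			      pte_t *ptep, pte_t pte)
{
	/* API-facing path: page table check sees the new entry once */
	page_table_check_ptes_set(mm, ptep, pte, 1);
	__set_pte_at(mm, addr, ptep, pte, 0);	/* raw write, unchecked */
}

Internal users such as early-boot mappings call __set_pte_at() directly
and therefore bypass the check by construction.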

Signed-off-by: Rohan McLure 
---
v9: New patch
---
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   |  4 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 10 +-
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..ae52c8db45b7 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+   __set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, 
prot), 0);
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index be229290a6a7..9a0a2accb261 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_large(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return __set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd), 0);
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   __set_pte_at(vma->vm_mm, addr, ptep, pte, 0);
 }
 
 /*
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 1f8db10693e3..ae4a5f66ccd2 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   __set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags), 0);
asm volatile("ptesync": : :"memory");
return 0;
 }
@@ -169,7 +169,7 @@ static int __map_kernel_page(unsigned long ea, unsigned 
long pa,
return -ENOMEM;
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   __set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags), 0);
asm volatile("ptesync": : :"memory");
return 0;
 }
@@ -1536,7 +1536,7 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct 
*vma,
(atomic_read(&mm->context.copros) > 0))
radix__flush_tlb_page(vma, addr);
 
-   set_pte_at(mm, addr, ptep, pte);
+   __set_pte_at(mm, addr, ptep, pte, 0);
 }
 
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
@@ -1547,7 +1547,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t 
prot)
if (!radix_enabled())
return 0;
 
-   set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
+   __set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud, 0);
 
return 1;
 }
@@ -1594,7 +1594,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t 
prot)
if (!radix_enabled())
return 0;
 
-   set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd);
+   __set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd, 0);
 
return 1;
 }
diff --git a/arch/powerpc/mm/nohash/book3e_pgtable.c 
b/arch/powerpc/mm/nohash/book3e_pgtable.c
index 1c5e4ecbebeb..4de91b54fd89 100644
--- a/arch/powerpc/mm/nohash/book3e_pgtable.c
+++ b/arch/powerpc/mm/nohash/book3e_pgtable.c
@@ -111,7 +111,7 @@ int __ref 

[PATCH v9 2/7] powerpc: mm: Implement p{m,u,4}d_leaf on all platforms

2023-11-29 Thread Rohan McLure
The check that a higher-level entry in multi-level pages contains a page
translation entry (pte) is performed by p{m,u,4}d_leaf stubs, which may
be specialised for each choice of mmu. In a prior commit, we replaced
uses of the catch-all stubs, p{m,u,4}d_is_leaf, with p{m,u,4}d_leaf.

Replace the catch-all stub definitions for p{m,u,4}d_is_leaf with
definitions for p{m,u,4}d_leaf. A future patch will assume that
p{m,u,4}d_leaf is defined on all platforms.

In particular, implement pud_leaf for Book3E-64, pmd_leaf for all Book3E
and Book3S-64 platforms, with a catch-all definition for p4d_leaf.
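
As a reminder of how generic code consumes these, a walker's descend/stop
decision is typically of the following shape (an illustrative sketch, not
a quote from the tree):

static bool pmd_needs_pte_walk(pmd_t pmd)
{
	/* a present entry either maps a huge page itself (leaf) or
	 * points to a pte table that must be walked further */
	return pmd_present(pmd) && !pmd_leaf(pmd);
}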

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: No longer required in order implement page table check, just
refactoring.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 -
 arch/powerpc/include/asm/nohash/64/pgtable.h |  6 ++
 arch/powerpc/include/asm/pgtable.h   | 22 ++--
 4 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..9cc95a61d2a6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -223,6 +223,11 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+   return false;
+}
 
 /*
  * When flushing the tlb entry for a page, we also need to flush the hash
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb77eddca54b..8fdb7667c509 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1459,16 +1459,14 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
 /*
  * Like pmd_huge() and pmd_large(), but works regardless of config options
  */
-#define pmd_is_leaf pmd_is_leaf
-#define pmd_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
 }
 
-#define pud_is_leaf pud_is_leaf
-#define pud_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 2202c78730e8..f58cbebde26e 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -116,6 +116,12 @@ static inline void pud_clear(pud_t *pudp)
*pudp = __pud(0);
 }
 
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+   return false;
+}
+
 #define pud_none(pud)  (!pud_val(pud))
#define pud_bad(pud)	(!is_kernel_addr(pud_val(pud)) \
 || (pud_val(pud) & PUD_BAD_BITS))
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9224f23065ff..db0231afca9d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -180,29 +180,11 @@ static inline void pte_frag_set(mm_context_t *ctx, void 
*p)
 }
 #endif
 
-#ifndef pmd_is_leaf
-#define pmd_is_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define p4d_leaf p4d_leaf
+static inline bool p4d_leaf(p4d_t p4d)
 {
return false;
 }
-#endif
-
-#ifndef pud_is_leaf
-#define pud_is_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
-{
-   return false;
-}
-#endif
-
-#ifndef p4d_is_leaf
-#define p4d_is_leaf p4d_is_leaf
-static inline bool p4d_is_leaf(p4d_t p4d)
-{
-   return false;
-}
-#endif
 
 #define pmd_pgtable pmd_pgtable
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
-- 
2.43.0



[PATCH v9 3/7] powerpc: mm: Add common pud_pfn stub for all platforms

2023-11-29 Thread Rohan McLure
Prior to this commit, pud_pfn was implemented with BUILD_BUG as the inline
function for 64-bit Book3S systems but is never included, as its
invocations in generic code are guarded by calls to pud_devmap which return
zero on such systems. A future patch will provide support for page table
checks, the generic code for which depends on a pud_pfn stub being
implemented, even while the patch will not interact with puds directly.

Remove the 64-bit Book3S stub and define pud_pfn to warn on all
platforms. pud_pfn may be defined properly on a per-platform basis
should it grow real usages in future.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index db0231afca9d..9c0f2151f08f 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -219,6 +219,20 @@ static inline bool arch_supports_memmap_on_memory(unsigned 
long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
+ * and sets to page table entries at any level are done through
+ * page_table_check_pte_{set,clear}, provide stub implementation.
+ */
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.43.0



[PATCH v9 1/7] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf

2023-11-29 Thread Rohan McLure
Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}d_leaf, as the
latter is the name given to checking that a higher-level entry in
multi-level paging contains a page translation entry (pte) throughout
all other archs.

A future patch will implement p{u,m,4}d_leaf stubs on all platforms so
that they may be referenced in generic code.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: No longer required in order to implement page table check, just a
refactor.
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 +++---
 arch/powerpc/mm/pgtable.c|  6 +++---
 arch/powerpc/mm/pgtable_64.c |  6 +++---
 arch/powerpc/xmon/xmon.c |  6 +++---
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 175a8eb2681f..fdb178fe754c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -498,7 +498,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -527,7 +527,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t 
*pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
-   if (pud_is_leaf(*p)) {
+   if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@@ -630,12 +630,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
 
pmd = NULL;
-   if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
+   if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -653,7 +653,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
 
/* Check if we raced and someone else has set the same thing */
@@ -704,7 +704,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
 
/* Check if we raced and someone else has set the same thing */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c6a4ac766b2b..1f8db10693e3 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -204,14 +204,14 @@ static void radix__change_memory_range(unsigned long 
start, unsigned long end,
pudp = pud_alloc(&init_mm, p4dp, idx);
if (!pudp)
continue;
-   if (pud_is_leaf(*pudp)) {
+   if (pud_leaf(*pudp)) {
ptep = (pte_t *)pudp;
goto update_the_pte;
}
pmdp = pmd_alloc(&init_mm, pudp, idx);
if (!pmdp)
continue;
-   if (pmd_is_leaf(*pmdp)) {
+   if (pmd_leaf(*pmdp)) {
ptep = pmdp_ptep(pmdp);
goto update_the_pte;
}
@@ -767,7 +767,7 @@ static void __meminit remove_pmd_table(pmd_t *pmd_start, 
unsigned long addr,
if (!pmd_present(*pmd))
continue;
 
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
if (!direct)
@@ -807,7 +807,7 @@ static void __meminit remove_pud_table(pud_t *pud_start, 
unsigned long addr,
if (!pud_present(*pud))
continue;
 
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
 

[PATCH v9 7/7] powerpc: mm: Support page table check

2023-11-29 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.
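
For context, the invariant the checker enforces is roughly the following
(a conceptual sketch with illustrative names, not the mm/page_table_check.c
source):

struct pfn_check {
	atomic_t anon_map_count;
	atomic_t file_map_count;
};

static void check_one_mapping(struct pfn_check *pc, bool anon, bool writable)
{
	if (anon) {
		/* an anonymous page must not also be file-backed, and may
		 * have at most one writable mapping */
		BUG_ON(atomic_read(&pc->file_map_count));
		BUG_ON(atomic_inc_return(&pc->anon_map_count) > 1 && writable);
	} else {
		/* a file-backed page must not also be anonymous */
		BUG_ON(atomic_read(&pc->anon_map_count));
		atomic_inc(&pc->file_map_count);
	}
}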

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 39 
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 13 +--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 7 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..5bd6d367ef40 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index bd6f8cdd25aa..48f4e7b98340 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /* Bits to mask out from a PGD to get to the PUD page */
#define PGD_MASKED_BITS		0
@@ -319,7 +320,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index dd3e7b190ab7..834c997ba657 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -151,6 +151,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -421,8 +423,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -431,11 +436,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1339,17 +1349,30 @@ extern int pudp_test_and_clear_young(struct 
vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   } else {
+   old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   }
+
+   page_table_check_pmd_clear(mm, old_pmd);
+

[PATCH v9 5/7] powerpc: mm: Implement *_user_accessible_page() for ptes

2023-11-29 Thread Rohan McLure
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations are
specialised.
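
On the consumer side, only entries for which these predicates return true
participate in the accounting; the generic clear path is roughly the
following (a simplified sketch, not a quote of mm/page_table_check.c):

void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
{
	if (&init_mm == mm)
		return;

	/* kernel-only mappings are ignored by the checker */
	if (pte_user_accessible_page(pte))
		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}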

Signed-off-by: Rohan McLure 
---
v9: New implementation
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  5 +
 arch/powerpc/include/asm/nohash/pgtable.h|  5 +
 arch/powerpc/include/asm/pgtable.h   | 15 +++
 4 files changed, 30 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 9cc95a61d2a6..bd6f8cdd25aa 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -441,6 +441,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return pte_present(pte) && pte_read(pte);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 2454174b26cb..dd3e7b190ab7 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -544,6 +544,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return pte_present(pte) && pte_user(pte) && pte_read(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..33b4a4267f66 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return pte_present(pte) && pte_read(pte);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index d7d0f47760d3..661bf3afca37 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -250,6 +250,21 @@ static inline pte_t pud_pte(pud_t pud)
return __pte(pud_val(pud));
 }
 #endif
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+   pte_t pte = pmd_pte(pmd);
+
+   return pte_user_accessible_page(pte);
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+   pte_t pte = pud_pte(pud);
+
+   return pte_user_accessible_page(pte);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.43.0



[PATCH v9 4/7] powerpc: mm: Default p{m,u}d_pte implementations

2023-11-29 Thread Rohan McLure
For 32-bit systems which have no usecases for p{m,u}d_pte() prior to
page table checking, implement default stubs.

Signed-off-by: Rohan McLure 
---
v9: New patch
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  3 +++
 arch/powerpc/include/asm/nohash/64/pgtable.h |  2 ++
 arch/powerpc/include/asm/pgtable.h   | 17 +
 3 files changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8fdb7667c509..2454174b26cb 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -887,6 +887,8 @@ static inline int pud_present(pud_t pud)
 
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
+
+#define pud_pte pud_pte
 static inline pte_t pud_pte(pud_t pud)
 {
return __pte_raw(pud_raw(pud));
@@ -1043,6 +1045,7 @@ static inline void __kernel_map_pages(struct page *page, 
int numpages, int enabl
 }
 #endif
 
+#define pmd_pte pmd_pte
 static inline pte_t pmd_pte(pmd_t pmd)
 {
return __pte_raw(pmd_raw(pmd));
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index f58cbebde26e..09a34fe196ae 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -93,6 +93,7 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+#define pmd_pte pmd_pte
 static inline pte_t pmd_pte(pmd_t pmd)
 {
return __pte(pmd_val(pmd));
@@ -134,6 +135,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 
 extern struct page *pud_page(pud_t pud);
 
+#define pud_pte pud_pte
 static inline pte_t pud_pte(pud_t pud)
 {
return __pte(pud_val(pud));
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9c0f2151f08f..d7d0f47760d3 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -233,6 +233,23 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pmd_pte
+#define pmd_pte pmd_pte
+static inline pte_t pmd_pte(pmd_t pmd)
+{
+   WARN_ONCE(1, "pmd: platform does not use pmd entries directly");
+   return __pte(pmd_val(pmd));
+}
+#endif
+
+#ifndef pud_pte
+#define pud_pte pud_pte
+static inline pte_t pud_pte(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return __pte(pud_val(pud));
+}
+#endif
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.43.0



[PATCH v9 0/7] Support page table check

2023-11-29 Thread Rohan McLure
Support the page table check sanitiser on all PowerPC platforms. This
sanitiser works by serialising assignments, reassignments and clears of
page table entries at each level in order to ensure that anonymous
mappings have at most one writable consumer, and likewise that
file-backed mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, separate set_pte_at
and set_pte, to allow for internal, uninstrumented mappings.

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we need
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}_is_leaf().
 * Add p{m,u}_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (7):
  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
  powerpc: mm: Implement p{m,u,4}d_leaf on all platforms
  powerpc: mm: Add common pud_pfn stub for all platforms
  powerpc: mm: Default p{m,u}d_pte implementations
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use __set_pte_at() for early-boot / internal usages
  powerpc: mm: Support page table check

 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 17 -
 arch/powerpc/include/asm/book3s/64/pgtable.h | 57 
 arch/powerpc/include/asm/nohash/64/pgtable.h |  8 +++
 arch/powerpc/include/asm/nohash/pgtable.h|  5 ++
 arch/powerpc/include/asm/pgtable.h   | 68 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 27 
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c| 10 ++-
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/xmon/xmon.c |  6 +-
 15 files changed, 173 insertions(+), 71 deletions(-)

-- 
2.43.0



[PATCH 3/3] powerpc/64: Only warn for kuap locked when KCSAN not present

2023-11-26 Thread Rohan McLure
Arbitrary instrumented locations, including syscall handlers, can call
arch_local_irq_restore() transitively when KCSAN is enabled, and in turn
also replay_soft_interrupts_irqrestore(). The precondition on entry to
this routine that is checked is that KUAP is enabled (user access
prohibited). Failure to meet this condition only triggers a warning
however, and afterwards KUAP is enabled anyway. That is, KUAP being
disabled on entry is in fact permissible, but not possible on an
uninstrumented kernel.

Disable this assertion only when KCSAN is enabled.

Suggested-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/irq_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index 938e66829eae..1b7e8ebb052a 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -189,7 +189,8 @@ static inline __no_kcsan void 
replay_soft_interrupts_irqrestore(void)
 * and re-locking AMR but we shouldn't get here in the first place,
 * hence the warning.
 */
-   kuap_assert_locked();
+   if (!IS_ENABLED(CONFIG_KCSAN))
+   kuap_assert_locked();
 
if (kuap_state != AMR_KUAP_BLOCKED)
set_kuap(AMR_KUAP_BLOCKED);
-- 
2.43.0



[PATCH 2/3] powerpc: Apply __always_inline to interrupt_{enter,exit}_prepare()

2023-11-26 Thread Rohan McLure
In keeping with the advice given by Documentation/core-api/entry.rst,
entry and exit handlers for interrupts should not be instrumented.
Guarantee that the interrupt_{enter,exit}_prepare() routines are inlined
so that they will inherit instrumentation from their caller.

KCSAN kernels were observed to compile without inlining these routines,
which would lead to grief on NMI handlers.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/interrupt.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index a4196ab1d016..5f9be87c01ca 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -150,7 +150,7 @@ static inline void booke_restore_dbcr0(void)
 #endif
 }
 
-static inline void interrupt_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
irq_soft_mask_set(IRQS_ALL_DISABLED);
@@ -215,11 +215,11 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs)
  * However interrupt_nmi_exit_prepare does return directly to regs, because
  * NMIs do not do "exit work" or replay soft-masked interrupts.
  */
-static inline void interrupt_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_exit_prepare(struct pt_regs *regs)
 {
 }
 
-static inline void interrupt_async_enter_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_enter_prepare(struct pt_regs *regs)
 {
 #ifdef CONFIG_PPC64
/* Ensure interrupt_enter_prepare does not enable MSR[EE] */
@@ -238,7 +238,7 @@ static inline void interrupt_async_enter_prepare(struct 
pt_regs *regs)
irq_enter();
 }
 
-static inline void interrupt_async_exit_prepare(struct pt_regs *regs)
+static __always_inline void interrupt_async_exit_prepare(struct pt_regs *regs)
 {
/*
 * Adjust at exit so the main handler sees the true NIA. This must
@@ -278,7 +278,7 @@ static inline bool nmi_disables_ftrace(struct pt_regs *regs)
return true;
 }
 
-static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
 #ifdef CONFIG_PPC64
state->irq_soft_mask = local_paca->irq_soft_mask;
@@ -340,7 +340,7 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
nmi_enter();
 }
 
-static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct 
interrupt_nmi_state *state)
+static __always_inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, 
struct interrupt_nmi_state *state)
 {
if (mfmsr() & MSR_DR) {
// nmi_exit if relocations are on
-- 
2.43.0



[PATCH 1/3] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-11-26 Thread Rohan McLure
Prior to this patch, data races are detectable by KCSAN of the following
forms:

[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock

In case [1], calling context does not need an mmiowb() call to be
issued, otherwise it would do so itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.

In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to its return, nesting_count
will continue balanced. In the worst case, the interrupted critical
section during a mmiowb_spin_unlock() call observes an mmiowb to be
pending and afterward is interrupted, leading to an extraneous call to
mmiowb(). This data race is clearly innocuous.
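
Spelled out, the worst case for [2] is the following interleaving on one
CPU (an assumed schedule, shown only to illustrate why the race is benign):

	/*
	 * task context                      interrupt context
	 *
	 * mmiowb_spin_unlock()
	 *   observes mmiowb_pending != 0
	 *                                    spin_lock()
	 *                                    ...
	 *                                    spin_unlock()
	 *                                      issues mmiowb(),
	 *                                      clears mmiowb_pending
	 *   clears mmiowb_pending
	 *   issues mmiowb()   <- extraneous but harmless
	 *   nesting_count--
	 */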

Resolve KCSAN warnings of type [1] by means of READ_ONCE, WRITE_ONCE.
As increments and decrements to nesting_count are balanced by interrupt
contexts, resolve type [2] warnings by simply revoking instrumentation,
with data_race() rather than READ_ONCE() and WRITE_ONCE(); the memory
consistency semantics of plain accesses will still lead to correct
behaviour.

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reported-by: Gautam Menghani 
Tested-by: Gautam Menghani 
Acked-by: Arnd Bergmann 
---
Previously discussed here:
https://lore.kernel.org/linuxppc-dev/20230510033117.1395895-4-rmcl...@linux.ibm.com/
But pushed back due to affecting other architectures. Reissuing, to
linuxppc-dev, as it does not enact a functional change.
---
 include/asm-generic/mmiowb.h | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..f8c7c8a84e9e 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -37,25 +37,28 @@ static inline void mmiowb_set_pending(void)
struct mmiowb_state *ms = __mmiowb_state();
 
if (likely(ms->nesting_count))
-   ms->mmiowb_pending = ms->nesting_count;
+   WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
 }
 
 static inline void mmiowb_spin_lock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
-   ms->nesting_count++;
+
+   /* Increment need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count++);
 }
 
 static inline void mmiowb_spin_unlock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 pending = READ_ONCE(ms->mmiowb_pending);
 
-   if (unlikely(ms->mmiowb_pending)) {
-   ms->mmiowb_pending = 0;
+   WRITE_ONCE(ms->mmiowb_pending, 0);
+   if (unlikely(pending))
mmiowb();
-   }
 
-   ms->nesting_count--;
+   /* Decrement need not be atomic. Nestedness is balanced over 
interrupts. */
+   data_race(ms->nesting_count--);
 }
 #else
 #define mmiowb_set_pending()   do { } while (0)
-- 
2.43.0



Re: [PATCH v2 03/11] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-05-22 Thread Rohan McLure
On Wed May 10, 2023 at 1:31 PM AEST, Rohan McLure wrote:
> Prior to this patch, data races are detectable by KCSAN of the following
> forms:
> 
> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
>or otherwise outside of a critical section
> [2] Interrupted critical sections, where the interrupt will itself
>acquire a lock
> 
> In case [1], calling context does not need an mmiowb() call to be
> issued, otherwise it would do so itself. Such calls to
> mmiowb_set_pending() are either idempotent or no-ops.
> 
> In case [2], irrespective of when the interrupt occurs, the interrupt
> will acquire and release its locks prior to its return, nesting_count
> will continue balanced. In the worst case, the interrupted critical
> section during a mmiowb_spin_unlock() call observes an mmiowb to be
> pending and afterward is interrupted, leading to an extraneous call to
> mmiowb(). This data race is clearly innocuous.
> 
> Mark all potentially asynchronous memory accesses with READ_ONCE or
> WRITE_ONCE, including increments and decrements to nesting_count. This
> has the effect of removing KCSAN warnings at consumer's callsites.
> 
> Signed-off-by: Rohan McLure 
> Reported-by: Michael Ellerman 
> Reported-by: Gautam Menghani 
> Tested-by: Gautam Menghani 
> Acked-by: Arnd Bergmann 
> ---
> v2: Remove extraneous READ_ONCE in mmiowb_set_pending for nesting_count
> ---
> include/asm-generic/mmiowb.h | 14 +-
> 1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> index 5698fca3bf56..6dea28c8835b 100644
> --- a/include/asm-generic/mmiowb.h
> +++ b/include/asm-generic/mmiowb.h
> @@ -37,25 +37,29 @@ static inline void mmiowb_set_pending(void)
>   struct mmiowb_state *ms = __mmiowb_state();
> 
>   if (likely(ms->nesting_count))
> - ms->mmiowb_pending = ms->nesting_count;
> + WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
> }
> 
> static inline void mmiowb_spin_lock(void)
> {
>   struct mmiowb_state *ms = __mmiowb_state();
> - ms->nesting_count++;
> +
> + /* Increment need not be atomic. Nestedness is balanced over 
> interrupts. */
> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
> }
> 
> static inline void mmiowb_spin_unlock(void)
> {
>   struct mmiowb_state *ms = __mmiowb_state();
> + u16 pending = READ_ONCE(ms->mmiowb_pending);
> 
> - if (unlikely(ms->mmiowb_pending)) {
> - ms->mmiowb_pending = 0;
> + WRITE_ONCE(ms->mmiowb_pending, 0);
> + if (unlikely(pending)) {
>   mmiowb();
>   }
> 
> - ms->nesting_count--;
> + /* Decrement need not be atomic. Nestedness is balanced over 
> interrupts. */
> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);

Still think the nesting_counts don't need WRITE_ONCE/READ_ONCE.
data_race() maybe but I don't know if it's even classed as a data
race. How does KCSAN handle/annotate preempt_count, for example?

Thanks,
Nick

Re: [PATCH v2 03/11] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-05-22 Thread Rohan McLure
On 23 May 2023, at 10:28 am, Rohan McLure  wrote:
> 
> On Wed May 10, 2023 at 1:31 PM AEST, Rohan McLure wrote:
>> Prior to this patch, data races are detectable by KCSAN of the following
>> forms:
>> 
>> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
>>or otherwise outside of a critical section
>> [2] Interrupted critical sections, where the interrupt will itself
>>acquire a lock
>> 
>> In case [1], calling context does not need an mmiowb() call to be
>> issued, otherwise it would do so itself. Such calls to
>> mmiowb_set_pending() are either idempotent or no-ops.
>> 
>> In case [2], irrespective of when the interrupt occurs, the interrupt
>> will acquire and release its locks prior to its return, nesting_count
>> will continue balanced. In the worst case, the interrupted critical
>> section during a mmiowb_spin_unlock() call observes an mmiowb to be
>> pending and afterward is interrupted, leading to an extraneous call to
>> mmiowb(). This data race is clearly innocuous.
>> 
>> Mark all potentially asynchronous memory accesses with READ_ONCE or
>> WRITE_ONCE, including increments and decrements to nesting_count. This
>> has the effect of removing KCSAN warnings at consumer's callsites.
>> 
>> Signed-off-by: Rohan McLure 
>> Reported-by: Michael Ellerman 
>> Reported-by: Gautam Menghani 
>> Tested-by: Gautam Menghani 
>> Acked-by: Arnd Bergmann 
>> ---
>> v2: Remove extraneous READ_ONCE in mmiowb_set_pending for nesting_count
>> ---
>> include/asm-generic/mmiowb.h | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
>> index 5698fca3bf56..6dea28c8835b 100644
>> --- a/include/asm-generic/mmiowb.h
>> +++ b/include/asm-generic/mmiowb.h
>> @@ -37,25 +37,29 @@ static inline void mmiowb_set_pending(void)
>> struct mmiowb_state *ms = __mmiowb_state();
>> 
>> if (likely(ms->nesting_count))
>> - ms->mmiowb_pending = ms->nesting_count;
>> + WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
>> }
>> 
>> static inline void mmiowb_spin_lock(void)
>> {
>> struct mmiowb_state *ms = __mmiowb_state();
>> - ms->nesting_count++;
>> +
>> + /* Increment need not be atomic. Nestedness is balanced over interrupts. */
>> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
>> }
>> 
>> static inline void mmiowb_spin_unlock(void)
>> {
>> struct mmiowb_state *ms = __mmiowb_state();
>> + u16 pending = READ_ONCE(ms->mmiowb_pending);
>> 
>> - if (unlikely(ms->mmiowb_pending)) {
>> - ms->mmiowb_pending = 0;
>> + WRITE_ONCE(ms->mmiowb_pending, 0);
>> + if (unlikely(pending)) {
>> mmiowb();
>> }
>> 
>> - ms->nesting_count--;
>> + /* Decrement need not be atomic. Nestedness is balanced over interrupts. */
>> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
> 
> Still think the nesting_counts don't need WRITE_ONCE/READ_ONCE.
> data_race() maybe but I don't know if it's even classed as a data
> race. How does KCSAN handle/annotate preempt_count, for example?

Wow sorry my mail client has some unhelpful keybindings - I don’t know why it
thought I’d want to resend your last item!

Yeah I agree, we don’t need the compiler guarantees of WRITE_ONCE/READ_ONCE, and
yet it’s also not a real data-race. I think I’ll apply data_race() and comment as
I’m still seeing KCSAN warnings here.

Just from inspection, it appears as if __preempt_count_{add,sub} are unmarked 
and
so likely to generate KCSAN warnings also, but also asm-generic/preempt.h I 
think
hasn’t been updated to address any such warnings.

> 
> Thanks,
> Nick




Re: [PATCH v2 05/11] powerpc: Mark accesses to power_save callback in arch_cpu_idle

2023-05-15 Thread Rohan McLure
> On 15 May 2023, at 3:50 pm, Nicholas Piggin  wrote:
> 
> On Wed May 10, 2023 at 1:31 PM AEST, Rohan McLure wrote:
>> The power_save callback can be overwritten by another core at boot time.
>> Specifically, null values will be replaced exactly once with the callback
>> suitable for the particular platform (PowerNV / pseries lpars), making
>> this value a good candidate for __ro_after_init.
>> 
>> Even with this the case, KCSAN sees unmarked reads to the callback
>> variable, and notices that unfortunate compiler reorderings could lead
>> to distinct function pointers being read. In reality this is impossible,
>> so don't instrument at this read.
>> 
>> Signed-off-by: Rohan McLure 
>> ---
>> v2: Mark instances at init where the callback is written to, and
>> data_race() read as there is no capacity for the value to change
>> underneath.
>> ---
>> arch/powerpc/kernel/idle.c | 9 ++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
>> index b1c0418b25c8..43d96c0e3b96 100644
>> --- a/arch/powerpc/kernel/idle.c
>> +++ b/arch/powerpc/kernel/idle.c
>> @@ -35,7 +35,7 @@ EXPORT_SYMBOL(cpuidle_disable);
>> 
>> static int __init powersave_off(char *arg)
>> {
>> - ppc_md.power_save = NULL;
>> + WRITE_ONCE(ppc_md.power_save, NULL);
>> cpuidle_disable = IDLE_POWERSAVE_OFF;
>> return 1;
>> }
> 
> Shouldn't need the WRITE_ONCE if you don't need a READ_ONCE. Does
> data_race work here too? What about the other writers? Does
> KCSAN know it's single threaded in early boot so skips marking,
> but perhaps this comes later? Would be good to have a little
> comment if so.

Apologies, yep I was meant to remove this WRITE_ONCE now that
the read-side has data_race. Sorry for the confusion.

> 
> Thanks,
> Nick
> 
>> @@ -43,10 +43,13 @@ __setup("powersave=off", powersave_off);
>> 
>> void arch_cpu_idle(void)
>> {
>> + /* power_save callback assigned only at init so no data race */
>> + void (*power_save)(void) = data_race(ppc_md.power_save);
>> +
>> ppc64_runlatch_off();
>> 
>> - if (ppc_md.power_save) {
>> - ppc_md.power_save();
>> + if (power_save) {
>> + power_save();
>> /*
>> * Some power_save functions return with
>> * interrupts enabled, some don't.
>> -- 
>> 2.37.2
> 



Re: [PATCH v2 08/11] powerpc: Mark writes registering ipi to host cpu through kvm and polling

2023-05-15 Thread Rohan McLure
> On 15 May 2023, at 3:53 pm, Nicholas Piggin  wrote:
> 
> On Wed May 10, 2023 at 1:31 PM AEST, Rohan McLure wrote:
>> Mark writes to hypervisor ipi state so that KCSAN recognises these
>> asynchronous issue of kvmppc_{set,clear}_host_ipi to be intended, with
>> atomic writes. Mark asynchronous polls to this variable in
>> kvm_ppc_read_one_intr().
>> 
>> Signed-off-by: Rohan McLure 
> 
> What's the go with accesses in asm? Does it just assume you know
> what you're doing?

Exactly, KCSAN only emits instrumentation calls to around load/store 
instructions that the compiler itself generated. So by default, asm
accesses are not instrumented.

Thanks

> 
> Reviewed-by: Nicholas Piggin 
> 
>> ---
>> v2: Add read-side annotations to both polling locations in
>> kvm_ppc_read_one_intr().
>> ---
>> arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
>> arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++--
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index bc57d058ad5b..d701df006c08 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -548,12 +548,12 @@ static inline void kvmppc_set_host_ipi(int cpu)
>> * pairs with the barrier in kvmppc_clear_host_ipi()
>> */
>> smp_mb();
>> - paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
>> + WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 1);
>> }
>> 
>> static inline void kvmppc_clear_host_ipi(int cpu)
>> {
>> - paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
>> + WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 0);
>> /*
>> * order clearing of host_ipi flag vs. processing of IPI messages
>> *
>> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
>> b/arch/powerpc/kvm/book3s_hv_builtin.c
>> index da85f046377a..0f5b021fa559 100644
>> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
>> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
>> @@ -406,7 +406,7 @@ static long kvmppc_read_one_intr(bool *again)
>> return 1;
>> 
>> /* see if a host IPI is pending */
>> - host_ipi = local_paca->kvm_hstate.host_ipi;
>> + host_ipi = READ_ONCE(local_paca->kvm_hstate.host_ipi);
>> if (host_ipi)
>> return 1;
>> 
>> @@ -466,7 +466,7 @@ static long kvmppc_read_one_intr(bool *again)
>> * meantime. If it's clear, we bounce the interrupt to the
>> * guest
>> */
>> - host_ipi = local_paca->kvm_hstate.host_ipi;
>> + host_ipi = READ_ONCE(local_paca->kvm_hstate.host_ipi);
>> if (unlikely(host_ipi != 0)) {
>> /* We raced with the host,
>> * we need to resend that IPI, bummer
>> -- 
>> 2.37.2
> 



[PATCH v2 09/11] powerpc: powernv: Annotate data races in opal events

2023-05-09 Thread Rohan McLure
The kopald thread handles opal events as they appear, but by polling a
static bit-vector in last_outstanding_events. Annotate these data races
accordingly. We are not at risk of missing events, but use of READ_ONCE,
WRITE_ONCE will assist readers in seeing that kopald only consumes the
events it is aware of when it is scheduled. Also removes extraneous
KCSAN warnings.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/opal-irqchip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c 
b/arch/powerpc/platforms/powernv/opal-irqchip.c
index d55652b5f6fa..f9a7001dacb7 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -59,7 +59,7 @@ void opal_handle_events(void)
 
cond_resched();
}
-   last_outstanding_events = 0;
+   WRITE_ONCE(last_outstanding_events, 0);
if (opal_poll_events(&events) != OPAL_SUCCESS)
return;
e = be64_to_cpu(events) & opal_event_irqchip.mask;
@@ -69,7 +69,7 @@ void opal_handle_events(void)
 
 bool opal_have_pending_events(void)
 {
-   if (last_outstanding_events & opal_event_irqchip.mask)
+   if (READ_ONCE(last_outstanding_events) & opal_event_irqchip.mask)
return true;
return false;
 }
@@ -124,7 +124,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
__be64 events;
 
opal_handle_interrupt(virq_to_hw(irq), &events);
-   last_outstanding_events = be64_to_cpu(events);
+   WRITE_ONCE(last_outstanding_events, be64_to_cpu(events));
if (opal_have_pending_events())
opal_wake_poller();
 
-- 
2.37.2



[PATCH v2 11/11] powerpc: Mark asynchronous accesses to irq_data

2023-05-09 Thread Rohan McLure
KCSAN revealed that while irq_data entries are written to either from
behind a mutex, or otherwise atomically, accesses to irq_data->hwirq can
occur asynchronously, without volatile annotation. Mark these accesses
with READ_ONCE to avoid unfortunate compiler reorderings and remove
KCSAN warnings.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/irq.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++--
 include/linux/irq.h   |  2 +-
 kernel/irq/irqdomain.c|  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6f7d4edaa0bc..4ac192755510 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -353,7 +353,7 @@ void do_softirq_own_stack(void)
 irq_hw_number_t virq_to_hw(unsigned int virq)
 {
struct irq_data *irq_data = irq_get_irq_data(virq);
-   return WARN_ON(!irq_data) ? 0 : irq_data->hwirq;
+   return WARN_ON(!irq_data) ? 0 : READ_ONCE(irq_data->hwirq);
 }
 EXPORT_SYMBOL_GPL(virq_to_hw);
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f851f4983423..141491e86bba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1986,7 +1986,7 @@ int64_t pnv_opal_pci_msi_eoi(struct irq_data *d)
struct pci_controller *hose = 
irq_data_get_irq_chip_data(d->parent_data);
struct pnv_phb *phb = hose->private_data;
 
-   return opal_pci_msi_eoi(phb->opal_id, d->parent_data->hwirq);
+   return opal_pci_msi_eoi(phb->opal_id, READ_ONCE(d->parent_data->hwirq));
 }
 
 /*
@@ -2162,11 +2162,11 @@ static void pnv_msi_compose_msg(struct irq_data *d, 
struct msi_msg *msg)
struct pnv_phb *phb = hose->private_data;
int rc;
 
-   rc = __pnv_pci_ioda_msi_setup(phb, pdev, d->hwirq,
+   rc = __pnv_pci_ioda_msi_setup(phb, pdev, READ_ONCE(d->hwirq),
  entry->pci.msi_attrib.is_64, msg);
if (rc)
dev_err(&pdev->dev, "Failed to setup %s-bit MSI #%ld : %d\n",
-   entry->pci.msi_attrib.is_64 ? "64" : "32", d->hwirq, 
rc);
+   entry->pci.msi_attrib.is_64 ? "64" : "32", 
data_race(d->hwirq), rc);
 }
 
 /*
@@ -2184,7 +2184,7 @@ static void pnv_msi_eoi(struct irq_data *d)
 * since it is translated into a vector number in
 * OPAL, use that directly.
 */
-   WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, d->hwirq));
+   WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, 
READ_ONCE(d->hwirq)));
}
 
irq_chip_eoi_parent(d);
@@ -2263,9 +2263,9 @@ static void pnv_irq_domain_free(struct irq_domain 
*domain, unsigned int virq,
struct pnv_phb *phb = hose->private_data;
 
pr_debug("%s bridge %pOF %d/%lx #%d\n", __func__, hose->dn,
-virq, d->hwirq, nr_irqs);
+virq, data_race(d->hwirq), nr_irqs);
 
-   msi_bitmap_free_hwirqs(&phb->msi_bmp, d->hwirq, nr_irqs);
+   msi_bitmap_free_hwirqs(&phb->msi_bmp, READ_ONCE(d->hwirq), nr_irqs);
/* XIVE domain is cleared through ->msi_free() */
 }
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..a6888bcb3c5b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -452,7 +452,7 @@ static inline bool irqd_affinity_on_activate(struct 
irq_data *d)
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
 {
-   return d->hwirq;
+   return READ_ONCE(d->hwirq);
 }
 
 /**
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index f34760a1e222..dd9054494f84 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -549,7 +549,7 @@ static void irq_domain_disassociate(struct irq_domain 
*domain, unsigned int irq)
 "virq%i doesn't exist; cannot disassociate\n", irq))
return;
 
-   hwirq = irq_data->hwirq;
+   hwirq = READ_ONCE(irq_data->hwirq);
 
mutex_lock(&domain->root->mutex);
 
@@ -948,7 +948,7 @@ struct irq_desc *__irq_resolve_mapping(struct irq_domain 
*domain,
if (irq_domain_is_nomap(domain)) {
if (hwirq < domain->hwirq_max) {
data = irq_domain_get_irq_data(domain, hwirq);
-   if (data && data->hwirq == hwirq)
+   if (data && READ_ONCE(data->hwirq) == hwirq)
desc = irq_data_to_desc(data);
if (irq && desc)
*irq = hwirq;
-- 
2.37.2



[PATCH v2 04/11] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid

2023-05-09 Thread Rohan McLure
Checks to see if the [H]SRR registers have been clobbered by (soft)
NMI interrupts imply the possibility for a data race on the
[h]srr_valid entries in the PACA. Annotate accesses to these fields with
READ_ONCE, removing the need for the barrier.

The diagnostic can use plain-access reads and writes, but annotate with
data_race.
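
(Illustrative aside, not part of the patch: the two annotation styles the message distinguishes, on hypothetical variables.)

#include <linux/compiler.h>
#include <linux/types.h>

static u8 valid;	/* hypothetical: read in control flow, racy */
static bool warned;	/* hypothetical: diagnostic only */

static void sketch_check(void)
{
	/* Racy read feeding control flow: use READ_ONCE. */
	if (!READ_ONCE(valid))
		return;

	/* Purely diagnostic race: data_race() documents it without
	 * instrumenting the access. */
	if (!data_race(warned))
		data_race(warned = true);

	WRITE_ONCE(valid, 0);	/* pair the fixup write with the marked reads */
}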

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h |  4 ++--
 arch/powerpc/kernel/interrupt.c   | 14 ++
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 0eb90a013346..9db8b16567e2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -180,8 +180,8 @@ void do_syscall_trace_leave(struct pt_regs *regs);
 static inline void set_return_regs_changed(void)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
-   local_paca->hsrr_valid = 0;
-   local_paca->srr_valid = 0;
+   WRITE_ONCE(local_paca->hsrr_valid, 0);
+   WRITE_ONCE(local_paca->srr_valid, 0);
 #endif
 }
 
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index e34c72285b4e..1f033f11b871 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -125,7 +125,7 @@ static notrace void check_return_regs_valid(struct pt_regs 
*regs)
case 0x1600:
case 0x1800:
validp = &local_paca->hsrr_valid;
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
srr0 = mfspr(SPRN_HSRR0);
@@ -135,7 +135,7 @@ static notrace void check_return_regs_valid(struct pt_regs 
*regs)
break;
default:
validp = &local_paca->srr_valid;
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
srr0 = mfspr(SPRN_SRR0);
@@ -161,19 +161,17 @@ static notrace void check_return_regs_valid(struct 
pt_regs *regs)
 * such things will get caught most of the time, statistically
 * enough to be able to get a warning out.
 */
-   barrier();
-
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
-   if (!warned) {
-   warned = true;
+   if (!data_race(warned)) {
+   data_race(warned = true);
printk("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
printk("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
show_regs(regs);
}
 
-   *validp = 0; /* fixup */
+   WRITE_ONCE(*validp, 0); /* fixup */
 #endif
 }
 
-- 
2.37.2



[PATCH v2 08/11] powerpc: Mark writes registering ipi to host cpu through kvm and polling

2023-05-09 Thread Rohan McLure
Mark writes to hypervisor ipi state so that KCSAN recognises these
asynchronous issue of kvmppc_{set,clear}_host_ipi to be intended, with
atomic writes. Mark asynchronous polls to this variable in
kvm_ppc_read_one_intr().
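
(Illustrative aside, not part of the patch: how a marked write pairs with an existing barrier and with marked reads at the polling sites, on a hypothetical flag.)

#include <linux/compiler.h>
#include <asm/barrier.h>

static int host_ipi_flag;	/* hypothetical stand-in for kvm_hstate.host_ipi */

static void sketch_set_ipi(void)
{
	smp_mb();			/* order earlier stores before the flag, as before */
	WRITE_ONCE(host_ipi_flag, 1);	/* marked: polled concurrently elsewhere */
}

static int sketch_poll_ipi(void)
{
	return READ_ONCE(host_ipi_flag);	/* marked read at each polling site */
}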

Signed-off-by: Rohan McLure 
---
v2: Add read-side annotations to both polling locations in
kvm_ppc_read_one_intr().
---
 arch/powerpc/include/asm/kvm_ppc.h   | 4 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index bc57d058ad5b..d701df006c08 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -548,12 +548,12 @@ static inline void kvmppc_set_host_ipi(int cpu)
 * pairs with the barrier in kvmppc_clear_host_ipi()
 */
smp_mb();
-   paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
+   WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 1);
 }
 
 static inline void kvmppc_clear_host_ipi(int cpu)
 {
-   paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
+   WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 0);
/*
 * order clearing of host_ipi flag vs. processing of IPI messages
 *
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index da85f046377a..0f5b021fa559 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -406,7 +406,7 @@ static long kvmppc_read_one_intr(bool *again)
return 1;
 
/* see if a host IPI is pending */
-   host_ipi = local_paca->kvm_hstate.host_ipi;
+   host_ipi = READ_ONCE(local_paca->kvm_hstate.host_ipi);
if (host_ipi)
return 1;
 
@@ -466,7 +466,7 @@ static long kvmppc_read_one_intr(bool *again)
 * meantime. If it's clear, we bounce the interrupt to the
 * guest
 */
-   host_ipi = local_paca->kvm_hstate.host_ipi;
+   host_ipi = READ_ONCE(local_paca->kvm_hstate.host_ipi);
if (unlikely(host_ipi != 0)) {
/* We raced with the host,
 * we need to resend that IPI, bummer
-- 
2.37.2



[PATCH v2 03/11] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-05-09 Thread Rohan McLure
Prior to this patch, data races are detectable by KCSAN of the following
forms:

[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock

In case [1], calling context does not need an mmiowb() call to be
issued, otherwise it would do so itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.

In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to its return, so nesting_count
will remain balanced. In the worst case, the interrupted critical
section during a mmiowb_spin_unlock() call observes an mmiowb to be
pending and afterward is interrupted, leading to an extraneous call to
mmiowb(). This data race is clearly innocuous.

Mark all potentially asynchronous memory accesses with READ_ONCE or
WRITE_ONCE, including increments and decrements to nesting_count. This
has the effect of removing KCSAN warnings at consumer's callsites.
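
(Illustrative aside, not part of the patch: the nesting described in case [2], assuming an architecture that uses the asm-generic mmiowb tracking; the locks and register are hypothetical.)

#include <linux/spinlock.h>
#include <linux/io.h>

static DEFINE_SPINLOCK(lock_a);	/* hypothetical outer lock */
static DEFINE_SPINLOCK(lock_b);	/* hypothetical lock taken by an interrupt */

static void sketch_outer(void __iomem *reg)
{
	spin_lock(&lock_a);	/* mmiowb_spin_lock(): nesting_count 0 -> 1 */
	writel(0x1, reg);	/* the MMIO write marks mmiowb_pending via mmiowb_set_pending() */
	/*
	 * An interrupt here that takes and releases lock_b moves
	 * nesting_count 1 -> 2 -> 1, so it stays balanced.
	 */
	spin_unlock(&lock_a);	/* mmiowb_spin_unlock(): pending was set, so mmiowb() is issued */
}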

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reported-by: Gautam Menghani 
Tested-by: Gautam Menghani 
Acked-by: Arnd Bergmann 
---
v2: Remove extraneous READ_ONCE in mmiowb_set_pending for nesting_count
---
 include/asm-generic/mmiowb.h | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..6dea28c8835b 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -37,25 +37,29 @@ static inline void mmiowb_set_pending(void)
struct mmiowb_state *ms = __mmiowb_state();
 
if (likely(ms->nesting_count))
-   ms->mmiowb_pending = ms->nesting_count;
+   WRITE_ONCE(ms->mmiowb_pending, ms->nesting_count);
 }
 
 static inline void mmiowb_spin_lock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
-   ms->nesting_count++;
+
+   /* Increment need not be atomic. Nestedness is balanced over 
interrupts. */
+   WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
 }
 
 static inline void mmiowb_spin_unlock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 pending = READ_ONCE(ms->mmiowb_pending);
 
-   if (unlikely(ms->mmiowb_pending)) {
-   ms->mmiowb_pending = 0;
+   WRITE_ONCE(ms->mmiowb_pending, 0);
+   if (unlikely(pending)) {
mmiowb();
}
 
-   ms->nesting_count--;
+   /* Decrement need not be atomic. Nestedness is balanced over 
interrupts. */
+   WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
 }
 #else
 #define mmiowb_set_pending()   do { } while (0)
-- 
2.37.2



[PATCH v2 07/11] powerpc: Annotate accesses to ipi message flags

2023-05-09 Thread Rohan McLure
IPI message flags are observed and consequently consumed in the
smp_ipi_demux_relaxed function, which handles these message sources
until it observes none more arriving. Mark the checked loop guard with
READ_ONCE, to signal to KCSAN that the read is known to be volatile, and
that non-determinism is expected. Mark write for message source in
smp_muxed_ipi_set_message().
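
(Illustrative aside, not part of the patch: the loop-guard annotation on a hypothetical message word.)

#include <linux/atomic.h>
#include <linux/compiler.h>

static unsigned long messages;	/* hypothetical stand-in for info->messages */

static void sketch_set_message(void)
{
	WRITE_ONCE(messages, 1);	/* marked write from the sending side */
}

static void sketch_demux(void)
{
	do {
		unsigned long all = xchg(&messages, 0);

		/* ... act on each message bit set in 'all' ... */
		(void)all;
	} while (READ_ONCE(messages));	/* marked re-check: polling is intentional */
}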

Signed-off-by: Rohan McLure 
---
v2: Add missing WRITE_ONCE() in smp_muxed_ipi_set_message().
---
 arch/powerpc/kernel/smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6b90f10a6c81..fb35a147b4fa 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -289,7 +289,7 @@ void smp_muxed_ipi_set_message(int cpu, int msg)
 * Order previous accesses before accesses in the IPI handler.
 */
smp_mb();
-   message[msg] = 1;
+   WRITE_ONCE(message[msg], 1);
 }
 
 void smp_muxed_ipi_message_pass(int cpu, int msg)
@@ -348,7 +348,7 @@ irqreturn_t smp_ipi_demux_relaxed(void)
if (all & IPI_MESSAGE(PPC_MSG_NMI_IPI))
nmi_ipi_action(0, NULL);
 #endif
-   } while (info->messages);
+   } while (READ_ONCE(info->messages));
 
return IRQ_HANDLED;
 }
-- 
2.37.2



[PATCH v2 06/11] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention

2023-05-09 Thread Rohan McLure
The idle_state entry in the PACA on PowerNV features a bit which is
atomically tested and set through ldarx/stdcx. to be used as a spinlock.
This lock then guards access to other bit fields of idle_state. KCSAN
cannot differentiate between any of these bitfield accesses as they all
are implemented by 8-byte store/load instructions, thus cores contending
on the bit-lock appear to data race with modifications to idle_state.

Separate the bit-lock entry from the data guarded by the lock to avoid
the possibility of data races being detected by KCSAN.
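
(Illustrative aside, not part of the patch: separating the bit-lock from the data it guards so the two live at distinct addresses; the names and the bit number are hypothetical.)

#include <linux/bitops.h>
#include <linux/compiler.h>

#define SKETCH_LOCK_BIT 0	/* hypothetical lock bit number */

struct sketch_core_idle {
	unsigned long lock;	/* only SKETCH_LOCK_BIT is used here */
	unsigned long state;	/* data guarded by the bit above */
};

static void sketch_lock(struct sketch_core_idle *c)
{
	while (test_and_set_bit_lock(SKETCH_LOCK_BIT, &c->lock))
		barrier();
}

static void sketch_unlock(struct sketch_core_idle *c)
{
	clear_bit_unlock(SKETCH_LOCK_BIT, &c->lock);
}

With the lock bit in its own word, KCSAN no longer sees the 8-byte lock accesses as conflicting with updates to state.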

Suggested-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
v2: Remove extraneous WRITE_ONCE on paca thread_idle_state, which are
only read diagnostically.
---
 arch/powerpc/include/asm/paca.h   |  1 +
 arch/powerpc/platforms/powernv/idle.c | 16 +---
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index da0377f46597..cb325938766a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -191,6 +191,7 @@ struct paca_struct {
 #ifdef CONFIG_PPC_POWERNV
/* PowerNV idle fields */
/* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
+   unsigned long idle_lock; /* A value of 1 means acquired */
unsigned long idle_state;
union {
/* P7/P8 specific fields */
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 841cb7f31f4f..c1e0ecb014a5 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
 {
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
-   unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
 
-   while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, 
state)))
+   while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
barrier();
 }
 
@@ -258,29 +258,31 @@ static inline void 
atomic_unlock_and_stop_thread_idle(void)
int first = cpu_first_thread_sibling(cpu);
unsigned long thread = 1UL << cpu_thread_in_core(cpu);
unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
u64 s = READ_ONCE(*state);
u64 new, tmp;
 
-   BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
+   BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
BUG_ON(s & thread);
 
 again:
-   new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
+   new = s | thread;
tmp = cmpxchg(state, s, new);
if (unlikely(tmp != s)) {
s = tmp;
goto again;
}
+   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
 }
 
 static inline void atomic_unlock_thread_idle(void)
 {
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
-   unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
 
-   BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
-   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
+   BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
+   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
 }
 
 /* P7 and P8 */
-- 
2.37.2



[PATCH v2 05/11] powerpc: Mark accesses to power_save callback in arch_cpu_idle

2023-05-09 Thread Rohan McLure
The power_save callback can be overwritten by another core at boot time.
Specifically, null values will be replaced exactly once with the callback
suitable for the particular platform (PowerNV / pseries lpars), making
this value a good candidate for __ro_after_init.

Even with this being the case, KCSAN sees unmarked reads of the callback
variable, and notices that unfortunate compiler reorderings could lead
to distinct function pointers being read. In reality this is impossible,
so don't instrument at this read.
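
(Illustrative aside, not part of the patch: reading a set-once callback through data_race() and a local copy; the variable name is hypothetical.)

#include <linux/compiler.h>

static void (*power_save_cb)(void);	/* hypothetical; assigned once during boot */

static void sketch_idle(void)
{
	/*
	 * The pointer is only ever written at init, so the race is benign:
	 * read it once without instrumentation and call through the local
	 * copy, so the NULL check and the call see the same value.
	 */
	void (*power_save)(void) = data_race(power_save_cb);

	if (power_save)
		power_save();
}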

Signed-off-by: Rohan McLure 
---
v2: Mark instances at init where the callback is written to, and
data_race() read as there is no capacity for the value to change
underneath.
---
 arch/powerpc/kernel/idle.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b1c0418b25c8..43d96c0e3b96 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -35,7 +35,7 @@ EXPORT_SYMBOL(cpuidle_disable);
 
 static int __init powersave_off(char *arg)
 {
-   ppc_md.power_save = NULL;
+   WRITE_ONCE(ppc_md.power_save, NULL);
cpuidle_disable = IDLE_POWERSAVE_OFF;
return 1;
 }
@@ -43,10 +43,13 @@ __setup("powersave=off", powersave_off);
 
 void arch_cpu_idle(void)
 {
+   /* power_save callback assigned only at init so no data race */
+   void (*power_save)(void) = data_race(ppc_md.power_save);
+
ppc64_runlatch_off();
 
-   if (ppc_md.power_save) {
-   ppc_md.power_save();
+   if (power_save) {
+   power_save();
/*
 * Some power_save functions return with
 * interrupts enabled, some don't.
-- 
2.37.2



[PATCH v2 01/11] powerpc: qspinlock: Mark accesses to qnode lock checks

2023-05-09 Thread Rohan McLure
The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.

Signed-off-by: Rohan McLure 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/lib/qspinlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index e4bd145255d0..b76c1f6acce5 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock 
*lock, struct qnode *
 
smp_rmb(); /* See __yield_to_locked_owner comment */
 
-   if (!node->locked) {
+   if (!READ_ONCE(node->locked)) {
yield_to_preempted(prev_cpu, yield_count);
spin_begin();
return preempted;
@@ -584,7 +584,7 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
 
/* Wait for mcs node lock to be released */
spin_begin();
-   while (!node->locked) {
+   while (!READ_ONCE(node->locked)) {
spec_barrier();
 
if (yield_to_prev(lock, node, old, paravirt))
-- 
2.37.2



[PATCH v2 10/11] powerpc: powernv: Annotate asynchronous access to opal tokens

2023-05-09 Thread Rohan McLure
The opal-async.c unit contains code for polling event sources, which
implies intentional data races. Ensure that the compiler will atomically
access such variables by means of {READ,WRITE}_ONCE calls, which in turn
inform KCSAN that polling behaviour is intended.
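
(Illustrative aside, not part of the patch: marking the racy read inside a wait_event() condition; the wait queue and state variable are hypothetical.)

#include <linux/wait.h>
#include <linux/compiler.h>

static DECLARE_WAIT_QUEUE_HEAD(sketch_wq);
static int sketch_state;	/* hypothetical token state, written by the completion path */

static void sketch_wait_for_completion(void)
{
	/* The condition is re-evaluated on every wakeup; the marked read
	 * tells KCSAN the concurrent write below is expected. */
	wait_event(sketch_wq, READ_ONCE(sketch_state) == 1);
}

static void sketch_complete(void)
{
	WRITE_ONCE(sketch_state, 1);
	wake_up(&sketch_wq);
}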

Signed-off-by: Rohan McLure 
---
 arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index c094fdf5825c..282d2ac6fbb0 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -146,7 +146,7 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 * functional.
 */
opal_wake_poller();
-   wait_event(opal_async_wait, opal_async_tokens[token].state
+   wait_event(opal_async_wait, READ_ONCE(opal_async_tokens[token].state)
== ASYNC_TOKEN_COMPLETED);
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
 
@@ -185,7 +185,7 @@ int opal_async_wait_response_interruptible(uint64_t token, 
struct opal_msg *msg)
 * interruptible version before doing anything else with the
 * token.
 */
-   if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED) {
+   if (READ_ONCE(opal_async_tokens[token].state) == ASYNC_TOKEN_ALLOCATED) 
{
spin_lock_irqsave(&opal_async_comp_lock, flags);
if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED)
opal_async_tokens[token].state = ASYNC_TOKEN_DISPATCHED;
@@ -199,7 +199,7 @@ int opal_async_wait_response_interruptible(uint64_t token, 
struct opal_msg *msg)
 */
opal_wake_poller();
ret = wait_event_interruptible(opal_async_wait,
-   opal_async_tokens[token].state ==
+   READ_ONCE(opal_async_tokens[token].state) ==
ASYNC_TOKEN_COMPLETED);
if (!ret)
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
-- 
2.37.2



[PATCH v2 02/11] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue

2023-05-09 Thread Rohan McLure
Annotate the release barrier and memory clobber (in effect, producing a
compiler barrier) in the publish_tail_cpu call. These barriers have the
effect of ensuring that qnode attributes are all written to prior to
publishing the node to the waitqueue.

Even while the initial write to the 'locked' attribute is guaranteed to
terminate prior to the node being visible, KCSAN still complains that
the write is reorderable by the compiler. Issue a kcsan_release() to
inform KCSAN of the release barrier contained in publish_tail_cpu().
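
(Illustrative aside, not part of the patch: telling KCSAN about a release barrier that lives inside inline asm; the node type and publish step are hypothetical.)

#include <linux/kcsan-checks.h>

struct sketch_node {
	int locked;	/* hypothetical field initialised before publication */
};

static void sketch_publish(struct sketch_node *node)
{
	node->locked = 1;	/* plain write, ordered by the release below */

	/*
	 * KCSAN cannot see through the asm block that follows, so state the
	 * release ordering explicitly before it.
	 */
	kcsan_release();

	/* ... lwarx/stwcx. publish with PPC_RELEASE_BARRIER and a "memory"
	 * clobber would go here ... */
}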

Signed-off-by: Rohan McLure 
---
v2: Remove extraneous compiler barrier, but annotate release-barrier
contained in call publish_tail_cpu(), and include kcsan_release().
---
 arch/powerpc/lib/qspinlock.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index b76c1f6acce5..253620979d0c 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -161,6 +161,8 @@ static __always_inline u32 publish_tail_cpu(struct 
qspinlock *lock, u32 tail)
 {
u32 prev, tmp;
 
+   kcsan_release();
+
asm volatile(
 "\t"   PPC_RELEASE_BARRIER "   \n"
 "1:lwarx   %0,0,%2 # publish_tail_cpu  \n"
@@ -570,6 +572,11 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
 
tail = encode_tail_cpu(node->cpu);
 
+   /*
+* Assign all attributes of a node before it can be published.
+* Issues an lwsync, serving as a release barrier, as well as a
+* compiler barrier.
+*/
old = publish_tail_cpu(lock, tail);
 
/*
-- 
2.37.2



[PATCH v2 00/11] powerpc: KCSAN fix warnings and mark accesses

2023-05-09 Thread Rohan McLure
v1 of this patch series available here:
Link: 
https://lore.kernel.org/linuxppc-dev/20230508020120.218494-1-rmcl...@linux.ibm.com/

The KCSAN sanitiser notifies programmers of instances where unmarked
accesses to shared state have led to a data race, or when the compiler
is at liberty to reorder an unmarked access and so generate a data race.
This patch series deals with benign data races, which nonetheless need
annotation in order to ensure the correctness of the emitted code.

In keeping with the principles given in
tools/memory-model/Documentation/access-marking.txt, racing reads of
shared state for purely diagnostic/debug purposes are annotated with
data_race, while reads/writes that are examples of intentional polling of
shared variables are performed with READ_ONCE, WRITE_ONCE.
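
(Illustrative aside, not part of the series: the two marking styles side by side, on a hypothetical shared variable.)

#include <linux/compiler.h>
#include <linux/printk.h>

static int shared;	/* hypothetical variable with an intended data race */

static void sketch_writer(void)
{
	WRITE_ONCE(shared, 1);		/* intentional racy write: mark it */
}

static int sketch_reader(void)
{
	return READ_ONCE(shared);	/* intentional racy read: mark it */
}

static void sketch_debug(void)
{
	/* Diagnostic-only read: data_race() tolerates the race without
	 * instrumenting the access. */
	pr_debug("shared=%d\n", data_race(shared));
}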

These changes remove the majority of warnings observable on pseries and
powernv, where for development, I was able to narrow down to only power
relevant bugs by temporarily disabling sanitisation for all other files.
Future patch series will deal with the subtler bugs which persist under
this configuration.

KCSAN races addressed:
 - qspinlock: assignment of qnode->locked and polling
 - check_return_regs_valid [h]srr_valid
 - arch_cpu_idle idle callback
 - powernv idle_state paca entry (polling the bit-lock is viewed by
   KCSAN as asynchronous access to the fields it protects)
 - Asynchronous access to irq_data->hwirq
 - Opal asynchronous event handling
 - IPIs

Miscellaneous other changes:

 - Annotate the asm-generic/mmiowb code, which riscv and powerpc each
   consume
 - Update usages of qnode->locked in powerpc's qspinlock interpretation
   to reflect the comment beside this field

v2:
 - Match READ_ONCE with WRITE_ONCE and vice versa where required
 - In arch/powerpc/lib/qspinlock.c, use kcsan_release() to notify KCSAN
   of locked being assigned prior to publish, and remove extraneous
   compiler barrier (publish_tail_cpu features memory clobber).
 - Keep polarity for locked variable in qspinlock
 - Remove extraneous READ_ONCE in mmiowb()
 - Use data_race() for power_save callback to remove instrumentation, as
   there is no real data race

Rohan McLure (11):
  powerpc: qspinlock: Mark accesses to qnode lock checks
  powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
  asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
  powerpc: Mark [h]srr_valid accesses in check_return_regs_valid
  powerpc: Mark accesses to power_save callback in arch_cpu_idle
  powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
  powerpc: Annotate accesses to ipi message flags
  powerpc: Mark writes registering ipi to host cpu through kvm and
polling
  powerpc: powernv: Annotate data races in opal events
  powerpc: powernv: Annotate asynchronous access to opal tokens
  powerpc: Mark asynchronous accesses to irq_data

 arch/powerpc/include/asm/kvm_ppc.h|  4 ++--
 arch/powerpc/include/asm/paca.h   |  1 +
 arch/powerpc/include/asm/ptrace.h |  4 ++--
 arch/powerpc/kernel/idle.c|  9 ++---
 arch/powerpc/kernel/interrupt.c   | 14 ++
 arch/powerpc/kernel/irq.c |  2 +-
 arch/powerpc/kernel/smp.c |  4 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c  |  4 ++--
 arch/powerpc/lib/qspinlock.c  | 11 +--
 arch/powerpc/platforms/powernv/idle.c | 16 +---
 arch/powerpc/platforms/powernv/opal-async.c   |  6 +++---
 arch/powerpc/platforms/powernv/opal-irqchip.c |  6 +++---
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++--
 include/asm-generic/mmiowb.h  | 14 +-
 include/linux/irq.h   |  2 +-
 kernel/irq/irqdomain.c|  4 ++--
 16 files changed, 64 insertions(+), 49 deletions(-)

-- 
2.37.2



Re: [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention

2023-05-09 Thread Rohan McLure
> On 9 May 2023, at 12:26 pm, Nicholas Piggin  wrote:
> 
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> The idle_state entry in the PACA on PowerNV features a bit which is
>> atomically tested and set through ldarx/stdcx. to be used as a spinlock.
>> This lock then guards access to other bit fields of idle_state. KCSAN
>> cannot differentiate between any of these bitfield accesses as they all
>> are implemented by 8-byte store/load instructions, thus cores contending
>> on the bit-lock appear to data race with modifications to idle_state.
>> 
>> Separate the bit-lock entry from the data guarded by the lock to avoid
>> the possibility of data races being detected by KCSAN.
>> 
>> Suggested-by: Nicholas Piggin 
>> Signed-off-by: Rohan McLure 
>> ---
>> arch/powerpc/include/asm/paca.h   |  1 +
>> arch/powerpc/platforms/powernv/idle.c | 20 +++-
>> 2 files changed, 12 insertions(+), 9 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/paca.h 
>> b/arch/powerpc/include/asm/paca.h
>> index da0377f46597..cb325938766a 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -191,6 +191,7 @@ struct paca_struct {
>> #ifdef CONFIG_PPC_POWERNV
>> /* PowerNV idle fields */
>> /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
>> + unsigned long idle_lock; /* A value of 1 means acquired */
>> unsigned long idle_state;
>> union {
>> /* P7/P8 specific fields */
>> diff --git a/arch/powerpc/platforms/powernv/idle.c 
>> b/arch/powerpc/platforms/powernv/idle.c
>> index 841cb7f31f4f..97dbb7bc2b00 100644
>> --- a/arch/powerpc/platforms/powernv/idle.c
>> +++ b/arch/powerpc/platforms/powernv/idle.c
>> @@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
>> {
>> int cpu = raw_smp_processor_id();
>> int first = cpu_first_thread_sibling(cpu);
>> - unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>> 
>> - while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, state)))
>> + while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
>> barrier();
>> }
>> 
>> @@ -258,29 +258,31 @@ static inline void 
>> atomic_unlock_and_stop_thread_idle(void)
>> int first = cpu_first_thread_sibling(cpu);
>> unsigned long thread = 1UL << cpu_thread_in_core(cpu);
>> unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>> u64 s = READ_ONCE(*state);
>> u64 new, tmp;
>> 
>> - BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
>> + BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
>> BUG_ON(s & thread);
>> 
>> again:
>> - new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
>> + new = s | thread;
>> tmp = cmpxchg(state, s, new);
>> if (unlikely(tmp != s)) {
>> s = tmp;
>> goto again;
>> }
>> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
> 
> Sigh, another atomic. It's in a slow path though so I won't get too
> upset. Would be nice to add a comment here and revert it when KCSAN
> can be taught about this pattern though, so we don't lose it.
> 
>> }
>> 
>> static inline void atomic_unlock_thread_idle(void)
>> {
>> int cpu = raw_smp_processor_id();
>> int first = cpu_first_thread_sibling(cpu);
>> - unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>> 
>> - BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
>> - clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
>> + BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
>> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
>> }
>> 
>> /* P7 and P8 */
>> @@ -380,9 +382,9 @@ static unsigned long power7_idle_insn(unsigned long type)
>> sprs.uamor = mfspr(SPRN_UAMOR);
>> }
>> 
>> - local_paca->thread_idle_state = type;
>> + WRITE_ONCE(local_paca->thread_idle_state, type);
>> srr1 = isa206_idle_insn_mayloss(type); /* go idle */
>> - local_paca->thread_idle_state = PNV_THREAD_RUNNING;
>> + WRITE_ONCE(local_paca->thread_idle_state, PNV_THREAD_RUNNING);
> 
> Where is the thread_idle_state concurrency coming from?

Yeah, I agree, WRITE_ONCE isn’t necessary here, as all reads of this variable
by xmon are purely diagnostic (data races permitted), and the 
isa206_idle_insn_mayloss() call is a compiler barrier. So write instructions
will be emitted on each side of the call.

> 
> Thanks,
> Nick




Re: [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue

2023-05-08 Thread Rohan McLure
> On 9 May 2023, at 12:04 pm, Nicholas Piggin  wrote:
> 
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> Use a compiler barrier to enforce that all fields of a new struct qnode
>> be written to (especially the lock value) before publishing the qnode to
>> the waitqueue.
> 
> publish_tail_cpu is the release barrier for this and includes the memory
> clobber there. Can we annotate that instead?

Got it, I see that one now.

On another note though, it looks like the memory clobber doesn’t serve
to squash KCSAN warnings here.

==
BUG: KCSAN: data-race in queued_spin_lock_slowpath / queued_spin_lock_slowpath

write (marked) to 0xc00ff3790b20 of 1 bytes by task 1045 on cpu 64:

write (reordered) to 0xc00ff3790b20 of 1 bytes by task 1063 on cpu 31:
  |
  +-> reordered to: queued_spin_lock_slowpath+0xcec/0x1b70

Reported by Kernel Concurrency Sanitizer on:
==

The one-byte memory access in question is the write to ‘locked’. KCSAN
takes issue with that assignment being unmarked, even though the node is
inaccessible to other CPUs during the window in which the assignment can
occur. Looks like I’ll have to issue a data_race, even though there is no
capacity for a real data race.

> 
> Thanks,
> Nick
> 
>> 
>> Signed-off-by: Rohan McLure 
>> ---
>> arch/powerpc/lib/qspinlock.c | 4 
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
>> index 579290d55abf..d548001a86be 100644
>> --- a/arch/powerpc/lib/qspinlock.c
>> +++ b/arch/powerpc/lib/qspinlock.c
>> @@ -567,6 +567,10 @@ static __always_inline void 
>> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> node->cpu = smp_processor_id();
>> node->yield_cpu = -1;
>> node->locked = 1;
>> + /*
>> + * Assign all attributes of a node before it can be published.
>> + */
>> + barrier();
>> 
>> tail = encode_tail_cpu(node->cpu);
>> 
>> -- 
>> 2.37.2
> 



Re: [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation

2023-05-08 Thread Rohan McLure
> On 9 May 2023, at 12:01 pm, Nicholas Piggin  wrote:
> 
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> A comment accompanying the locked attribute of a qnode assigns a value
>> of 1 to mean that the lock has been acquired. The usages of this
>> variable however assume opposite semantics. Update usages so that the
>> assertions of this comment are reflected in this file.
> 
> 1 actually means if the lock is acquired for this waiter. The
> previous owner sets it to 1 which means we now have the lock.
> It's slightly confusing but that's how generic qspinlock calls
> it too.
> 
> It actually doesn't even really mean we have acquired the lock
> though, it means we got through the MCS queue. "Waiting" or
> "released" or something like that might be a better name.

This makes more sense. Seemed pretty unlikely to me that swapped
polarity would have gone unnoticed, so glad to have that cleared up.

> 
> Could change the name or comment to make that a bit clearer, but
> while it'the same as kernel/locking/qspinlock.c then better
> keep polarity the same.

Yeah since ‘locked’ is an mcs intrinsic I think I’d rather keep
the name from kernel/locking/qspinlock.c.

> 
> Thanks,
> Nick
> 
>> 
>> Signed-off-by: Rohan McLure 
>> ---
>> arch/powerpc/lib/qspinlock.c | 10 +-
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
>> index e4bd145255d0..9cf93963772b 100644
>> --- a/arch/powerpc/lib/qspinlock.c
>> +++ b/arch/powerpc/lib/qspinlock.c
>> @@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct 
>> qspinlock *lock, struct qnode *
>> 
>> smp_rmb(); /* See __yield_to_locked_owner comment */
>> 
>> - if (!node->locked) {
>> + if (node->locked) {
>> yield_to_preempted(prev_cpu, yield_count);
>> spin_begin();
>> return preempted;
>> @@ -566,7 +566,7 @@ static __always_inline void 
>> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> node->lock = lock;
>> node->cpu = smp_processor_id();
>> node->yield_cpu = -1;
>> - node->locked = 0;
>> + node->locked = 1;
>> 
>> tail = encode_tail_cpu(node->cpu);
>> 
>> @@ -584,7 +584,7 @@ static __always_inline void 
>> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> 
>> /* Wait for mcs node lock to be released */
>> spin_begin();
>> - while (!node->locked) {
>> + while (node->locked) {
>> spec_barrier();
>> 
>> if (yield_to_prev(lock, node, old, paravirt))
>> @@ -693,13 +693,13 @@ static __always_inline void 
>> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> */
>> if (paravirt && pv_prod_head) {
>> int next_cpu = next->cpu;
>> - WRITE_ONCE(next->locked, 1);
>> + WRITE_ONCE(next->locked, 0);
>> if (_Q_SPIN_MISO)
>> asm volatile("miso" ::: "memory");
>> if (vcpu_is_preempted(next_cpu))
>> prod_cpu(next_cpu);
>> } else {
>> - WRITE_ONCE(next->locked, 1);
>> + WRITE_ONCE(next->locked, 0);
>> if (_Q_SPIN_MISO)
>> asm volatile("miso" ::: "memory");
>> }
>> -- 
>> 2.37.2
> 



[PATCH 10/12] powerpc: powernv: Annotate data races in opal events

2023-05-07 Thread Rohan McLure
The kopald thread handles opal events as they appear, but by polling a
static bit-vector in last_outstanding_events. Annotate these data races
accordingly. We are not at risk of missing events, but use of READ_ONCE,
WRITE_ONCE will assist readers in seeing that kopald only consumes the
events it is aware of when it is scheduled. Also removes extraneous
KCSAN warnings.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/platforms/powernv/opal-irqchip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c 
b/arch/powerpc/platforms/powernv/opal-irqchip.c
index d55652b5f6fa..f9a7001dacb7 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -59,7 +59,7 @@ void opal_handle_events(void)
 
cond_resched();
}
-   last_outstanding_events = 0;
+   WRITE_ONCE(last_outstanding_events, 0);
if (opal_poll_events() != OPAL_SUCCESS)
return;
e = be64_to_cpu(events) & opal_event_irqchip.mask;
@@ -69,7 +69,7 @@ void opal_handle_events(void)
 
 bool opal_have_pending_events(void)
 {
-   if (last_outstanding_events & opal_event_irqchip.mask)
+   if (READ_ONCE(last_outstanding_events) & opal_event_irqchip.mask)
return true;
return false;
 }
@@ -124,7 +124,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
__be64 events;
 
opal_handle_interrupt(virq_to_hw(irq), &events);
-   last_outstanding_events = be64_to_cpu(events);
+   WRITE_ONCE(last_outstanding_events, be64_to_cpu(events));
if (opal_have_pending_events())
opal_wake_poller();
 
-- 
2.37.2



[PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks

2023-05-07 Thread Rohan McLure
The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/lib/qspinlock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 9cf93963772b..579290d55abf 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock 
*lock, struct qnode *
 
smp_rmb(); /* See __yield_to_locked_owner comment */
 
-   if (node->locked) {
+   if (READ_ONCE(node->locked)) {
yield_to_preempted(prev_cpu, yield_count);
spin_begin();
return preempted;
@@ -584,7 +584,7 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
 
/* Wait for mcs node lock to be released */
spin_begin();
-   while (node->locked) {
+   while (READ_ONCE(node->locked)) {
spec_barrier();
 
if (yield_to_prev(lock, node, old, paravirt))
-- 
2.37.2



[PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle

2023-05-07 Thread Rohan McLure
The power_save callback can be overwritten by another core at boot time.
Specifically, null values will be replaced exactly once with the callback
suitable for the particular platform (PowerNV / pseries lpars). Mark
reads to this variable with READ_ONCE to signal to KCSAN that this race
is acceptable, as well as to rule out the possibility of compiler reorderings
leading to calling a null pointer.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/idle.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b1c0418b25c8..a1589bb97c98 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -43,10 +43,12 @@ __setup("powersave=off", powersave_off);
 
 void arch_cpu_idle(void)
 {
+   void (*power_save)(void) = READ_ONCE(ppc_md.power_save);
+
ppc64_runlatch_off();
 
-   if (ppc_md.power_save) {
-   ppc_md.power_save();
+   if (power_save) {
+   power_save();
/*
 * Some power_save functions return with
 * interrupts enabled, some don't.
-- 
2.37.2



[PATCH 11/12] powerpc: powernv: Annotate asynchronous access to opal tokens

2023-05-07 Thread Rohan McLure
The opal-async.c unit contains code for polling event sources, which
implies intentional data races. Ensure that the compiler will atomically
access such variables by means of {READ,WRITE}_ONCE calls, which in turn
inform KCSAN that polling behaviour is intended.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-async.c 
b/arch/powerpc/platforms/powernv/opal-async.c
index c094fdf5825c..282d2ac6fbb0 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -146,7 +146,7 @@ int opal_async_wait_response(uint64_t token, struct 
opal_msg *msg)
 * functional.
 */
opal_wake_poller();
-   wait_event(opal_async_wait, opal_async_tokens[token].state
+   wait_event(opal_async_wait, READ_ONCE(opal_async_tokens[token].state)
== ASYNC_TOKEN_COMPLETED);
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
 
@@ -185,7 +185,7 @@ int opal_async_wait_response_interruptible(uint64_t token, 
struct opal_msg *msg)
 * interruptible version before doing anything else with the
 * token.
 */
-   if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED) {
+   if (READ_ONCE(opal_async_tokens[token].state) == ASYNC_TOKEN_ALLOCATED) 
{
spin_lock_irqsave(&opal_async_comp_lock, flags);
if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED)
opal_async_tokens[token].state = ASYNC_TOKEN_DISPATCHED;
@@ -199,7 +199,7 @@ int opal_async_wait_response_interruptible(uint64_t token, 
struct opal_msg *msg)
 */
opal_wake_poller();
ret = wait_event_interruptible(opal_async_wait,
-   opal_async_tokens[token].state ==
+   READ_ONCE(opal_async_tokens[token].state) ==
ASYNC_TOKEN_COMPLETED);
if (!ret)
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
-- 
2.37.2



[PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid

2023-05-07 Thread Rohan McLure
Checks to see if the [H]SRR registers have been clobbered by (soft)
NMI interrupts imply the possibility for a data race on the
[h]srr_valid entries in the PACA. Annotate accesses to these fields with
READ_ONCE, removing the need for the barrier.

The diagnostic can use plain-access reads and writes, but annotate with
data_race.

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
---
 arch/powerpc/include/asm/ptrace.h |  4 ++--
 arch/powerpc/kernel/interrupt.c   | 14 ++
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 0eb90a013346..9db8b16567e2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -180,8 +180,8 @@ void do_syscall_trace_leave(struct pt_regs *regs);
 static inline void set_return_regs_changed(void)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
-   local_paca->hsrr_valid = 0;
-   local_paca->srr_valid = 0;
+   WRITE_ONCE(local_paca->hsrr_valid, 0);
+   WRITE_ONCE(local_paca->srr_valid, 0);
 #endif
 }
 
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index e34c72285b4e..1f033f11b871 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -125,7 +125,7 @@ static notrace void check_return_regs_valid(struct pt_regs 
*regs)
case 0x1600:
case 0x1800:
validp = &local_paca->hsrr_valid;
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
srr0 = mfspr(SPRN_HSRR0);
@@ -135,7 +135,7 @@ static notrace void check_return_regs_valid(struct pt_regs 
*regs)
break;
default:
validp = &local_paca->srr_valid;
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
srr0 = mfspr(SPRN_SRR0);
@@ -161,19 +161,17 @@ static notrace void check_return_regs_valid(struct 
pt_regs *regs)
 * such things will get caught most of the time, statistically
 * enough to be able to get a warning out.
 */
-   barrier();
-
-   if (!*validp)
+   if (!READ_ONCE(*validp))
return;
 
-   if (!warned) {
-   warned = true;
+   if (!data_race(warned)) {
+   data_race(warned = true);
printk("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
printk("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
show_regs(regs);
}
 
-   *validp = 0; /* fixup */
+   WRITE_ONCE(*validp, 0); /* fixup */
 #endif
 }
 
-- 
2.37.2



[PATCH 12/12] powerpc: Mark asynchronous accesses to irq_data

2023-05-07 Thread Rohan McLure
KCSAN revealed that while irq_data entries are written to either from
behind a mutex, or otherwise atomically, accesses to irq_data->hwirq can
occur asynchronously, without volatile annotation. Mark these accesses
with READ_ONCE to avoid unfortunate compiler reorderings and remove
KCSAN warnings.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/irq.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++--
 include/linux/irq.h   |  2 +-
 kernel/irq/irqdomain.c|  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6f7d4edaa0bc..4ac192755510 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -353,7 +353,7 @@ void do_softirq_own_stack(void)
 irq_hw_number_t virq_to_hw(unsigned int virq)
 {
struct irq_data *irq_data = irq_get_irq_data(virq);
-   return WARN_ON(!irq_data) ? 0 : irq_data->hwirq;
+   return WARN_ON(!irq_data) ? 0 : READ_ONCE(irq_data->hwirq);
 }
 EXPORT_SYMBOL_GPL(virq_to_hw);
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f851f4983423..141491e86bba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1986,7 +1986,7 @@ int64_t pnv_opal_pci_msi_eoi(struct irq_data *d)
struct pci_controller *hose = 
irq_data_get_irq_chip_data(d->parent_data);
struct pnv_phb *phb = hose->private_data;
 
-   return opal_pci_msi_eoi(phb->opal_id, d->parent_data->hwirq);
+   return opal_pci_msi_eoi(phb->opal_id, READ_ONCE(d->parent_data->hwirq));
 }
 
 /*
@@ -2162,11 +2162,11 @@ static void pnv_msi_compose_msg(struct irq_data *d, 
struct msi_msg *msg)
struct pnv_phb *phb = hose->private_data;
int rc;
 
-   rc = __pnv_pci_ioda_msi_setup(phb, pdev, d->hwirq,
+   rc = __pnv_pci_ioda_msi_setup(phb, pdev, READ_ONCE(d->hwirq),
  entry->pci.msi_attrib.is_64, msg);
if (rc)
dev_err(&pdev->dev, "Failed to setup %s-bit MSI #%ld : %d\n",
-   entry->pci.msi_attrib.is_64 ? "64" : "32", d->hwirq, 
rc);
+   entry->pci.msi_attrib.is_64 ? "64" : "32", 
data_race(d->hwirq), rc);
 }
 
 /*
@@ -2184,7 +2184,7 @@ static void pnv_msi_eoi(struct irq_data *d)
 * since it is translated into a vector number in
 * OPAL, use that directly.
 */
-   WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, d->hwirq));
+   WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, 
READ_ONCE(d->hwirq)));
}
 
irq_chip_eoi_parent(d);
@@ -2263,9 +2263,9 @@ static void pnv_irq_domain_free(struct irq_domain 
*domain, unsigned int virq,
struct pnv_phb *phb = hose->private_data;
 
pr_debug("%s bridge %pOF %d/%lx #%d\n", __func__, hose->dn,
-virq, d->hwirq, nr_irqs);
+virq, data_race(d->hwirq), nr_irqs);
 
-   msi_bitmap_free_hwirqs(&phb->msi_bmp, d->hwirq, nr_irqs);
+   msi_bitmap_free_hwirqs(&phb->msi_bmp, READ_ONCE(d->hwirq), nr_irqs);
/* XIVE domain is cleared through ->msi_free() */
 }
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..a6888bcb3c5b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -452,7 +452,7 @@ static inline bool irqd_affinity_on_activate(struct 
irq_data *d)
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
 {
-   return d->hwirq;
+   return READ_ONCE(d->hwirq);
 }
 
 /**
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index f34760a1e222..dd9054494f84 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -549,7 +549,7 @@ static void irq_domain_disassociate(struct irq_domain 
*domain, unsigned int irq)
 "virq%i doesn't exist; cannot disassociate\n", irq))
return;
 
-   hwirq = irq_data->hwirq;
+   hwirq = READ_ONCE(irq_data->hwirq);
 
mutex_lock(&domain->root->mutex);
 
@@ -948,7 +948,7 @@ struct irq_desc *__irq_resolve_mapping(struct irq_domain 
*domain,
if (irq_domain_is_nomap(domain)) {
if (hwirq < domain->hwirq_max) {
data = irq_domain_get_irq_data(domain, hwirq);
-   if (data && data->hwirq == hwirq)
+   if (data && READ_ONCE(data->hwirq) == hwirq)
desc = irq_data_to_desc(data);
if (irq && desc)
*irq = hwirq;
-- 
2.37.2



[PATCH 08/12] powerpc: Annotate accesses to ipi message flags

2023-05-07 Thread Rohan McLure
IPI message flags are observed and consequently consumed in the
smp_ipi_demux_relaxed function, which handles these message sources
until it observes none more arriving. Mark the checked loop guard with
READ_ONCE, to signal to KCSAN that the read is known to be volatile, and
that non-determinism is expected.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6b90f10a6c81..00b74d66b771 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -348,7 +348,7 @@ irqreturn_t smp_ipi_demux_relaxed(void)
if (all & IPI_MESSAGE(PPC_MSG_NMI_IPI))
nmi_ipi_action(0, NULL);
 #endif
-   } while (info->messages);
+   } while (READ_ONCE(info->messages));
 
return IRQ_HANDLED;
 }
-- 
2.37.2



[PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses

2023-05-07 Thread Rohan McLure
The KCSAN sanitiser notifies programmers of instances where unmarked
accesses to shared state have led to a data race, or when the compiler
is at liberty to reorder an unmarked access and so generate a data race.
This patch series deals with benign data races, which nonetheless need
annotation in order to ensure the correctness of the emitted code.

In keeping with the principles given in
tools/memory-model/Documentation/access-marking.txt, racing reads of
shared state for purely diagnostic/debug purposes are annotated with
data_race, while reads/writes that are examples of intentional polling of
shared variables are performed with READ_ONCE, WRITE_ONCE.

These changes remove the majority of warnings observable on pseries and
powernv, where for development, I was able to narrow down to only power
relevant bugs by temporarily disabling sanitisation for all other files.
Future patch series will deal with the subtler bugs which persist under
this configuration.

KCSAN races addressed:
 - qspinlock: assignment of qnode->locked and polling
 - check_return_regs_valid [h]srr_valid
 - arch_cpu_idle idle callback
 - powernv idle_state paca entry (polling the bit-lock is viewed by
   KCSAN as asynchronous access to the fields it protects)
 - Asynchronous access to irq_data->hwirq
 - Opal asynchronous event handling
 - IPIs

Miscellaneous other changes:

 - Annotate the asm-generic/mmiowb code, which riscv and powerpc each
   consume
 - Update usages of qnode->locked in powerpc's qspinlock interpretation
   to reflect the comment beside this field

Rohan McLure (12):
  powerpc: qspinlock: Fix qnode->locked value interpretation
  powerpc: qspinlock: Mark accesses to qnode lock checks
  powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
  asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
  powerpc: Mark [h]srr_valid accesses in check_return_regs_valid
  powerpc: Mark accesses to power_save callback in arch_cpu_idle
  powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
  powerpc: Annotate accesses to ipi message flags
  powerpc: Mark writes registering ipi to host cpu through kvm
  powerpc: powernv: Annotate data races in opal events
  powerpc: powernv: Annotate asynchronous access to opal tokens
  powerpc: Mark asynchronous accesses to irq_data

 arch/powerpc/include/asm/kvm_ppc.h|  4 ++--
 arch/powerpc/include/asm/paca.h   |  1 +
 arch/powerpc/include/asm/ptrace.h |  4 ++--
 arch/powerpc/kernel/idle.c|  6 --
 arch/powerpc/kernel/interrupt.c   | 14 ++---
 arch/powerpc/kernel/irq.c |  2 +-
 arch/powerpc/kernel/smp.c |  2 +-
 arch/powerpc/lib/qspinlock.c  | 14 -
 arch/powerpc/platforms/powernv/idle.c | 20 ++-
 arch/powerpc/platforms/powernv/opal-async.c   |  6 +++---
 arch/powerpc/platforms/powernv/opal-irqchip.c |  6 +++---
 arch/powerpc/platforms/powernv/pci-ioda.c | 12 +--
 include/asm-generic/mmiowb.h  | 17 ++--
 include/linux/irq.h   |  2 +-
 kernel/irq/irqdomain.c|  4 ++--
 15 files changed, 63 insertions(+), 51 deletions(-)

-- 
2.37.2



[PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm

2023-05-07 Thread Rohan McLure
Mark writes to hypervisor ipi state so that KCSAN recognises these
asynchronous issue of kvmppc_{set,clear}_host_ipi to be intended, with
atomic writes.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index bc57d058ad5b..d701df006c08 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -548,12 +548,12 @@ static inline void kvmppc_set_host_ipi(int cpu)
 * pairs with the barrier in kvmppc_clear_host_ipi()
 */
smp_mb();
-   paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
+   WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 1);
 }
 
 static inline void kvmppc_clear_host_ipi(int cpu)
 {
-   paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
+   WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 0);
/*
 * order clearing of host_ipi flag vs. processing of IPI messages
 *
-- 
2.37.2



[PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention

2023-05-07 Thread Rohan McLure
The idle_state entry in the PACA on PowerNV features a bit which is
atomically tested and set through ldarx/stdcx. to be used as a spinlock.
This lock then guards access to other bit fields of idle_state. KCSAN
cannot differentiate between any of these bitfield accesses as they all
are implemented by 8-byte store/load instructions, thus cores contending
on the bit-lock appear to data race with modifications to idle_state.

Separate the bit-lock entry from the data guarded by the lock to avoid
the possibility of data races being detected by KCSAN.

Suggested-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/paca.h   |  1 +
 arch/powerpc/platforms/powernv/idle.c | 20 +++-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index da0377f46597..cb325938766a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -191,6 +191,7 @@ struct paca_struct {
 #ifdef CONFIG_PPC_POWERNV
/* PowerNV idle fields */
/* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
+   unsigned long idle_lock; /* A value of 1 means acquired */
unsigned long idle_state;
union {
/* P7/P8 specific fields */
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 841cb7f31f4f..97dbb7bc2b00 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
 {
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
-   unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
 
-   while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, 
state)))
+   while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
barrier();
 }
 
@@ -258,29 +258,31 @@ static inline void 
atomic_unlock_and_stop_thread_idle(void)
int first = cpu_first_thread_sibling(cpu);
unsigned long thread = 1UL << cpu_thread_in_core(cpu);
unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
u64 s = READ_ONCE(*state);
u64 new, tmp;
 
-   BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
+   BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
BUG_ON(s & thread);
 
 again:
-   new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
+   new = s | thread;
tmp = cmpxchg(state, s, new);
if (unlikely(tmp != s)) {
s = tmp;
goto again;
}
+   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
 }
 
 static inline void atomic_unlock_thread_idle(void)
 {
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
-   unsigned long *state = &paca_ptrs[first]->idle_state;
+   unsigned long *lock = &paca_ptrs[first]->idle_lock;
 
-   BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
-   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
+   BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
+   clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
 }
 
 /* P7 and P8 */
@@ -380,9 +382,9 @@ static unsigned long power7_idle_insn(unsigned long type)
sprs.uamor  = mfspr(SPRN_UAMOR);
}
 
-   local_paca->thread_idle_state = type;
+   WRITE_ONCE(local_paca->thread_idle_state, type);
srr1 = isa206_idle_insn_mayloss(type);  /* go idle */
-   local_paca->thread_idle_state = PNV_THREAD_RUNNING;
+   WRITE_ONCE(local_paca->thread_idle_state, PNV_THREAD_RUNNING);
 
WARN_ON_ONCE(!srr1);
WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
-- 
2.37.2



[PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue

2023-05-07 Thread Rohan McLure
Use a compiler barrier to enforce that all fields of a new struct qnode
be written to (especially the lock value) before publishing the qnode to
the waitqueue.
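
The pattern being enforced, sketched as a standalone userspace analogue (the
type and function names below are illustrative only, and atomic_signal_fence()
stands in for the kernel's barrier()):

#include <stdatomic.h>
#include <stddef.h>

struct qnode_like {
    struct qnode_like *next;
    int cpu;
    int locked;
};

static _Atomic(struct qnode_like *) tail;

static void publish_node(struct qnode_like *node, int cpu)
{
    node->next = NULL;
    node->cpu = cpu;
    node->locked = 1;

    /* compiler barrier: every field above is written before publication */
    atomic_signal_fence(memory_order_seq_cst);

    /* the node is now reachable by other waiters walking the queue */
    atomic_store_explicit(&tail, node, memory_order_release);
}

int main(void)
{
    static struct qnode_like node;

    publish_node(&node, 0);
    return 0;
}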

Signed-off-by: Rohan McLure 
---
 arch/powerpc/lib/qspinlock.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 579290d55abf..d548001a86be 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -567,6 +567,10 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
node->cpu = smp_processor_id();
node->yield_cpu = -1;
node->locked = 1;
+   /*
+* Assign all attributes of a node before it can be published.
+*/
+   barrier();
 
tail = encode_tail_cpu(node->cpu);
 
-- 
2.37.2



[PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings

2023-05-07 Thread Rohan McLure
Prior to this patch, data races are detectable by KCSAN of the following
forms:

[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock

In case [1], the calling context does not require an mmiowb() to be
issued; if it did, it would issue one itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.

In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to returning, so nesting_count
remains balanced. In the worst case, the critical section in an
mmiowb_spin_unlock() call observes an mmiowb to be pending and is
afterward interrupted, leading to an extraneous call to mmiowb(). This
data race is clearly innocuous.

Mark all potentially asynchronous memory accesses with READ_ONCE or
WRITE_ONCE, including increments and decrements of nesting_count. This
has the effect of removing KCSAN warnings at consumers' call sites.
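
The marking pattern, as a self-contained userspace sketch (READ_ONCE and
WRITE_ONCE below are simplified stand-ins for the kernel macros, and the
struct is only an analogue of mmiowb_state, not the real definition):

#include <stdio.h>

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct mmiowb_state_like {
    unsigned short nesting_count;
    unsigned short mmiowb_pending;
};

static void spin_lock_side(struct mmiowb_state_like *ms)
{
    /* increment need not be atomic: nesting is balanced over interrupts */
    WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
}

static void spin_unlock_side(struct mmiowb_state_like *ms)
{
    unsigned short pending = READ_ONCE(ms->mmiowb_pending);

    WRITE_ONCE(ms->mmiowb_pending, 0);
    if (pending)
        puts("would issue mmiowb() here");

    /* decrement need not be atomic either, for the same reason */
    WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
}

int main(void)
{
    struct mmiowb_state_like ms = { 0, 0 };

    spin_lock_side(&ms);
    ms.mmiowb_pending = ms.nesting_count;  /* mmiowb_set_pending() analogue */
    spin_unlock_side(&ms);
    return 0;
}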

Signed-off-by: Rohan McLure 
Reported-by: Michael Ellerman 
Reported-by: Gautam Menghani 
---
 include/asm-generic/mmiowb.h | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..0b8b794150db 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -35,27 +35,32 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
 static inline void mmiowb_set_pending(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 nesting_count = READ_ONCE(ms->nesting_count);
 
-   if (likely(ms->nesting_count))
-   ms->mmiowb_pending = ms->nesting_count;
+   if (likely(nesting_count))
+   WRITE_ONCE(ms->mmiowb_pending, nesting_count);
 }
 
 static inline void mmiowb_spin_lock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
-   ms->nesting_count++;
+
+   /* Increment need not be atomic. Nestedness is balanced over 
interrupts. */
+   WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
 }
 
 static inline void mmiowb_spin_unlock(void)
 {
struct mmiowb_state *ms = __mmiowb_state();
+   u16 pending = READ_ONCE(ms->mmiowb_pending);
 
-   if (unlikely(ms->mmiowb_pending)) {
-   ms->mmiowb_pending = 0;
+   WRITE_ONCE(ms->mmiowb_pending, 0);
+   if (unlikely(pending)) {
mmiowb();
}
 
-   ms->nesting_count--;
+   /* Decrement need not be atomic. Nestedness is balanced over 
interrupts. */
+   WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
 }
 #else
 #define mmiowb_set_pending()   do { } while (0)
-- 
2.37.2



[PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation

2023-05-07 Thread Rohan McLure
A comment accompanying the locked attribute of a qnode assigns a value
of 1 to mean that the lock has been acquired. The usages of this
variable, however, assume the opposite semantics. Update the usages so
that they reflect what the comment asserts.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/lib/qspinlock.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index e4bd145255d0..9cf93963772b 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock 
*lock, struct qnode *
 
smp_rmb(); /* See __yield_to_locked_owner comment */
 
-   if (!node->locked) {
+   if (node->locked) {
yield_to_preempted(prev_cpu, yield_count);
spin_begin();
return preempted;
@@ -566,7 +566,7 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
node->lock = lock;
node->cpu = smp_processor_id();
node->yield_cpu = -1;
-   node->locked = 0;
+   node->locked = 1;
 
tail = encode_tail_cpu(node->cpu);
 
@@ -584,7 +584,7 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
 
/* Wait for mcs node lock to be released */
spin_begin();
-   while (!node->locked) {
+   while (node->locked) {
spec_barrier();
 
if (yield_to_prev(lock, node, old, paravirt))
@@ -693,13 +693,13 @@ static __always_inline void 
queued_spin_lock_mcs_queue(struct qspinlock *lock, b
 */
if (paravirt && pv_prod_head) {
int next_cpu = next->cpu;
-   WRITE_ONCE(next->locked, 1);
+   WRITE_ONCE(next->locked, 0);
if (_Q_SPIN_MISO)
asm volatile("miso" ::: "memory");
if (vcpu_is_preempted(next_cpu))
prod_cpu(next_cpu);
} else {
-   WRITE_ONCE(next->locked, 1);
+   WRITE_ONCE(next->locked, 0);
if (_Q_SPIN_MISO)
asm volatile("miso" ::: "memory");
}
-- 
2.37.2



Re: [PATCH v8 0/7] Support page table check

2023-03-29 Thread Rohan McLure
Anyone got time to review this one?

> On 16 Feb 2023, at 10:11 am, Rohan McLure  wrote:
> 
> Support the page table check sanitiser on all PowerPC platforms. This
> sanitiser works by serialising assignments, reassignments and clears of
> page table entries at each level in order to ensure that anonymous
> mappings have at most one writable consumer, and likewise that
> file-backed mappings are not simultaneously also anonymous mappings.
> 
> In order to support this infrastructure, a number of stubs must be
> defined for all powerpc platforms. Additionally, separate set_pte_at
> and set_pte, to allow for internal, uninstrumented mappings.
> 
> v8:
> * Fix linux/page_table_check.h include in asm/pgtable.h breaking
>   32-bit.
> 
> v7:
> * Remove use of extern in set_pte prototypes
> * Clean up pmdp_collapse_flush macro
> * Replace set_pte_at with static inline function
> * Fix commit message for patch 7
> Link: 
> https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/
> 
> v6:
> * Support huge pages and p{m,u}d accounting.
> * Remove instrumentation from set_pte from kernel internal pages.
> * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
>   as access to the mm_struct * is required.
> Link: 
> https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/
> 
> v5:
> Link: 
> https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/
> 
> Rohan McLure (7):
>  powerpc: mm: Separate set_pte, set_pte_at for internal, external use
>  powerpc/64s: mm: Introduce __pmdp_collapse_flush with mm_struct
>argument
>  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
>  powerpc: mm: Implement p{m,u,4}d_leaf on all platforms
>  powerpc: mm: Add common pud_pfn stub for all platforms
>  powerpc: mm: Add p{te,md,ud}_user_accessible_page helpers
>  powerpc: mm: Support page table check
> 
> arch/powerpc/Kconfig |  1 +
> arch/powerpc/include/asm/book3s/32/pgtable.h | 17 +++-
> arch/powerpc/include/asm/book3s/64/pgtable.h | 85 +---
> arch/powerpc/include/asm/book3s/pgtable.h|  3 +-
> arch/powerpc/include/asm/nohash/32/pgtable.h | 12 ++-
> arch/powerpc/include/asm/nohash/64/pgtable.h | 24 +-
> arch/powerpc/include/asm/nohash/pgtable.h|  9 ++-
> arch/powerpc/include/asm/pgtable.h   | 60 +-
> arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +--
> arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
> arch/powerpc/mm/book3s64/pgtable.c   | 16 ++--
> arch/powerpc/mm/book3s64/radix_pgtable.c | 24 +++---
> arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
> arch/powerpc/mm/pgtable.c|  9 +--
> arch/powerpc/mm/pgtable_32.c |  2 +-
> arch/powerpc/mm/pgtable_64.c |  6 +-
> arch/powerpc/xmon/xmon.c |  6 +-
> 17 files changed, 197 insertions(+), 93 deletions(-)
> 
> -- 
> 2.37.2
> 



Re: [PATCH 1/2] kcsan: xtensa: Add atomic builtin stubs for 32-bit systems

2023-02-16 Thread Rohan McLure
> On 16 Feb 2023, at 7:09 pm, Marco Elver  wrote:
> 
> On Thu, Feb 16, 2023 at 07:12AM +, Christophe Leroy wrote:
>> 
>> 
>> Le 16/02/2023 à 06:09, Rohan McLure a écrit :
>>> KCSAN instruments calls to atomic builtins, and will in turn call these
>>> builtins itself. As such, architectures supporting KCSAN must have
>>> compiler support for these atomic primitives.
>>> 
>>> Since 32-bit systems are unlikely to have 64-bit compiler builtins,
>>> provide a stub for each missing builtin, and use BUG() to assert
>>> unreachability.
>>> 
>>> In commit 725aea873261 ("xtensa: enable KCSAN"), xtensa implements these
>>> locally. Move these definitions to be accessible to all 32-bit
>>> architectures that do not provide the necessary builtins, with opt in
>>> for PowerPC and xtensa.
>>> 
>>> Signed-off-by: Rohan McLure 
>>> Reviewed-by: Max Filippov 
>> 
>> This series should also be addressed to KCSAN Maintainers, shouldn't it ?
>> 
>> KCSAN
>> M: Marco Elver 
>> R: Dmitry Vyukov 
>> L: kasan-...@googlegroups.com
>> S: Maintained
>> F: Documentation/dev-tools/kcsan.rst
>> F: include/linux/kcsan*.h
>> F: kernel/kcsan/
>> F: lib/Kconfig.kcsan
>> F: scripts/Makefile.kcsan
>> 
>> 
>>> ---
>>> Previously issued as a part of a patch series adding KCSAN support to
>>> 64-bit.
>>> Link: 
>>> https://lore.kernel.org/linuxppc-dev/167646486000.1421441.10070059569986228558.b4...@ellerman.id.au/T/#t
>>> v1: Remove __has_builtin check, as gcc is not obligated to inline
>>> builtins detected using this check, but instead is permitted to supply
>>> them in libatomic:
>>> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108734
>>> Instead, opt-in PPC32 and xtensa.
>>> ---
>>>  arch/xtensa/lib/Makefile  | 1 -
>>>  kernel/kcsan/Makefile | 2 ++
>>>  arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c | 0
>>>  3 files changed, 2 insertions(+), 1 deletion(-)
>>>  rename arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c (100%)
>>> 
>>> diff --git a/arch/xtensa/lib/Makefile b/arch/xtensa/lib/Makefile
>>> index 7ecef0519a27..d69356dc97df 100644
>>> --- a/arch/xtensa/lib/Makefile
>>> +++ b/arch/xtensa/lib/Makefile
>>> @@ -8,5 +8,4 @@ lib-y += memcopy.o memset.o checksum.o \
>>>  divsi3.o udivsi3.o modsi3.o umodsi3.o mulsi3.o umulsidi3.o \
>>>  usercopy.o strncpy_user.o strnlen_user.o
>>>  lib-$(CONFIG_PCI) += pci-auto.o
>>> -lib-$(CONFIG_KCSAN) += kcsan-stubs.o
>>>  KCSAN_SANITIZE_kcsan-stubs.o := n
>>> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
>>> index 8cf70f068d92..86dd713d8855 100644
>>> --- a/kernel/kcsan/Makefile
>>> +++ b/kernel/kcsan/Makefile
>>> @@ -12,6 +12,8 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
>>>   -fno-stack-protector -DDISABLE_BRANCH_PROFILING
>>> 
>>>  obj-y := core.o debugfs.o report.o
>>> +obj-$(CONFIG_PPC32) += stubs.o
>>> +obj-$(CONFIG_XTENSA) += stubs.o
>> 
>> Not sure it is acceptable to do it that way.
>> 
>> There should likely be something like a CONFIG_ARCH_WANTS_KCSAN_STUBS in 
>> KCSAN's Kconfig then PPC32 and XTENSA should select it.
> 
> The longer I think about it, since these stubs all BUG() anyway, perhaps
> we ought to just avoid them altogether. If you delete all the stubs from
> ppc and xtensa, but do this:
> 
> | diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> | index 54d077e1a2dc..8169d6dadd0e 100644
> | --- a/kernel/kcsan/core.c
> | +++ b/kernel/kcsan/core.c
> | @@ -1261,7 +1261,9 @@ static __always_inline void 
> kcsan_atomic_builtin_memorder(int memorder)
> |  DEFINE_TSAN_ATOMIC_OPS(8);
> |  DEFINE_TSAN_ATOMIC_OPS(16);
> |  DEFINE_TSAN_ATOMIC_OPS(32);
> | +#ifdef CONFIG_64BIT
> |  DEFINE_TSAN_ATOMIC_OPS(64);
> | +#endif
> |  
> |  void __tsan_atomic_thread_fence(int memorder);
> |  void __tsan_atomic_thread_fence(int memorder)
> 
> Does that work?

This makes much more sense. Rather than assuming that KCSAN is the only
consumer of __atomic_*_8 and stubbing accordingly, we should just avoid
referencing those builtins at all on the relevant sub-archs.




Re: [PATCH 2/2] powerpc/{32,book3e}: kcsan: Extend KCSAN Support

2023-02-16 Thread Rohan McLure
> On 16 Feb 2023, at 6:14 pm, Christophe Leroy  
> wrote:
> 
> 
> 
> Le 16/02/2023 à 06:09, Rohan McLure a écrit :
>> Enable HAVE_ARCH_KCSAN on all powerpc platforms, permitting use of the
>> kernel concurrency sanitiser through the CONFIG_KCSAN_* kconfig options.
>> 
>> Boots and passes selftests on 32-bit and 64-bit platforms. See
>> documentation in Documentation/dev-tools/kcsan.rst for more information.
>> 
>> Signed-off-by: Rohan McLure 
>> ---
>> New patch
>> ---
>>  arch/powerpc/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 2c9cdf1d8761..45771448d47a 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -197,7 +197,7 @@ config PPC
>>   select HAVE_ARCH_KASAN if PPC_RADIX_MMU
>>   select HAVE_ARCH_KASAN if PPC_BOOK3E_64
>>   select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
>> - select HAVE_ARCH_KCSAN if PPC_BOOK3S_64
>> + select HAVE_ARCH_KCSAN
> 
> So that's a followup of a not yet posted version v5 of the other series ?
> Why not just add patch 1 in that series and have KCSAN for all powerpc 
> at once ?

So the v3 was accepted upstream, and is likely to appear in 6.3. This patch
series just extends support to the other platforms, once KCSAN supports them.

Link: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230206021801.105268-1-rmcl...@linux.ibm.com/

> 
>>   select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
>>   select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>>   select HAVE_ARCH_KGDB




[PATCH 1/2] kcsan: xtensa: Add atomic builtin stubs for 32-bit systems

2023-02-15 Thread Rohan McLure
KCSAN instruments calls to atomic builtins, and will in turn call these
builtins itself. As such, architectures supporting KCSAN must have
compiler support for these atomic primitives.

Since 32-bit systems are unlikely to have 64-bit compiler builtins,
provide a stub for each missing builtin, and use BUG() to assert
unreachability.

In commit 725aea873261 ("xtensa: enable KCSAN"), xtensa implements these
locally. Move these definitions to be accessible to all 32-bit
architectures that do not provide the necessary builtins, with opt in
for PowerPC and xtensa.
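
For reference, the stubs being moved look roughly like the following (a
from-memory sketch of arch/xtensa/lib/kcsan-stubs.c, not a verbatim copy of
the file):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bug.h>
#include <linux/types.h>

/*
 * 64-bit atomic builtins that the compiler may emit calls to on a
 * 32-bit target; instrumented code should never actually reach them.
 */
void __atomic_store_8(volatile void *p, u64 v, int memorder)
{
    BUG();
}

u64 __atomic_load_8(const volatile void *p, int memorder)
{
    BUG();
}

/* ...and likewise for exchange, compare_exchange and the fetch_* ops. */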

Signed-off-by: Rohan McLure 
Reviewed-by: Max Filippov 
---
Previously issued as a part of a patch series adding KCSAN support to
64-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/167646486000.1421441.10070059569986228558.b4...@ellerman.id.au/T/#t
v1: Remove __has_builtin check, as gcc is not obligated to inline
builtins detected using this check, but instead is permitted to supply
them in libatomic:
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108734
Instead, opt-in PPC32 and xtensa.
---
 arch/xtensa/lib/Makefile  | 1 -
 kernel/kcsan/Makefile | 2 ++
 arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c | 0
 3 files changed, 2 insertions(+), 1 deletion(-)
 rename arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c (100%)

diff --git a/arch/xtensa/lib/Makefile b/arch/xtensa/lib/Makefile
index 7ecef0519a27..d69356dc97df 100644
--- a/arch/xtensa/lib/Makefile
+++ b/arch/xtensa/lib/Makefile
@@ -8,5 +8,4 @@ lib-y   += memcopy.o memset.o checksum.o \
   divsi3.o udivsi3.o modsi3.o umodsi3.o mulsi3.o umulsidi3.o \
   usercopy.o strncpy_user.o strnlen_user.o
 lib-$(CONFIG_PCI) += pci-auto.o
-lib-$(CONFIG_KCSAN) += kcsan-stubs.o
 KCSAN_SANITIZE_kcsan-stubs.o := n
diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
index 8cf70f068d92..86dd713d8855 100644
--- a/kernel/kcsan/Makefile
+++ b/kernel/kcsan/Makefile
@@ -12,6 +12,8 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
-fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 obj-y := core.o debugfs.o report.o
+obj-$(CONFIG_PPC32) += stubs.o
+obj-$(CONFIG_XTENSA) += stubs.o
 
 KCSAN_INSTRUMENT_BARRIERS_selftest.o := y
 obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
diff --git a/arch/xtensa/lib/kcsan-stubs.c b/kernel/kcsan/stubs.c
similarity index 100%
rename from arch/xtensa/lib/kcsan-stubs.c
rename to kernel/kcsan/stubs.c
-- 
2.37.2



[PATCH 2/2] powerpc/{32,book3e}: kcsan: Extend KCSAN Support

2023-02-15 Thread Rohan McLure
Enable HAVE_ARCH_KCSAN on all powerpc platforms, permitting use of the
kernel concurrency sanitiser through the CONFIG_KCSAN_* kconfig options.

Boots and passes selftests on 32-bit and 64-bit platforms. See
documentation in Documentation/dev-tools/kcsan.rst for more information.

Signed-off-by: Rohan McLure 
---
New patch
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2c9cdf1d8761..45771448d47a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -197,7 +197,7 @@ config PPC
select HAVE_ARCH_KASAN  if PPC_RADIX_MMU
select HAVE_ARCH_KASAN  if PPC_BOOK3E_64
select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
-   select HAVE_ARCH_KCSAN  if PPC_BOOK3S_64
+   select HAVE_ARCH_KCSAN
select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_KGDB
-- 
2.37.2



[PATCH v8 7/7] powerpc: mm: Support page table check

2023-02-15 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

Use set_pte internally, and cause this function to reassign a page table
entry without instrumentation. Generic code will be instrumented, as it
references set_pte_at.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V2: Update spacing and types assigned to pte_update calls.
V3: Update one last pte_update call to remove __pte invocation.
V5: Fix 32-bit nohash double set
V6: Omit __set_pte_at instrumentation - should be instrumented by
set_pte_at, with set_pte in between, performing all prior checks.
Instrument pmds. Use set_pte where needed.
V7: Make set_pte_at an inline function. Fix commit message.
Detail changes of internal references to set_pte_at, and its semantics.
V8: Move linux/page_table_check.h import to be below
{nohash,book3s}/pgtable.h includes.
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  8 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 44 
 arch/powerpc/include/asm/nohash/32/pgtable.h |  7 +++-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  8 +++-
 arch/powerpc/include/asm/pgtable.h   | 10 -
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 16 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 10 ++---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c |  2 +-
 11 files changed, 83 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2c9cdf1d8761..2474e2699037 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -154,6 +154,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index afd672e84791..8850b4fb22a4 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -53,6 +53,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 static inline bool pte_user(pte_t pte)
 {
return pte_val(pte) & _PAGE_USER;
@@ -338,7 +340,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a6ed93d01da1..0c6838875720 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -162,6 +162,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -431,8 +433,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -441,11 +446,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm

[PATCH v8 6/7] powerpc: mm: Add p{te,md,ud}_user_accessible_page helpers

2023-02-15 Thread Rohan McLure
Add the following helpers for detecting whether a page table entry
is a leaf and is accessible to user space.

 * pte_user_accessible_page
 * pmd_user_accessible_page
 * pud_user_accessible_page

Also implement missing pud_user definitions for both Book3S/nohash 64-bit
systems, and pmd_user for Book3S/nohash 32-bit systems.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V2: Provide missing pud_user implementations, use p{u,m}d_is_leaf.
V3: Provide missing pmd_user implementations as stubs in 32-bit.
V4: Use pmd_leaf, pud_leaf, and define pmd_user for 32 Book3E with
static inline method rather than macro.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  4 
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  5 +
 arch/powerpc/include/asm/nohash/64/pgtable.h | 10 ++
 arch/powerpc/include/asm/pgtable.h   | 15 +++
 5 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index a090cb13a4a0..afd672e84791 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -516,6 +516,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return 0;
+}
 
 
 /* This low level function performs the actual PTE insertion
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df5ee856444d..a6ed93d01da1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,16 @@ static inline bool pte_user(pte_t pte)
return !(pte_raw(pte) & cpu_to_be64(_PAGE_PRIVILEGED));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return !(pmd_raw(pmd) & cpu_to_be64(_PAGE_PRIVILEGED));
+}
+
+static inline bool pud_user(pud_t pud)
+{
+   return !(pud_raw(pud) & cpu_to_be64(_PAGE_PRIVILEGED));
+}
+
 #define pte_access_permitted pte_access_permitted
 static inline bool pte_access_permitted(pte_t pte, bool write)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 70edad44dff6..d953533c56ff 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -209,6 +209,11 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return false;
+}
+
 /*
  * PTE updates. This function is called whenever an existing
  * valid PTE is updated. This does -not- include set_pte_at()
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d391a45e0f11..14e69ebad31f 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -123,6 +123,11 @@ static inline pte_t pmd_pte(pmd_t pmd)
return __pte(pmd_val(pmd));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return (pmd_val(pmd) & _PAGE_USER) == _PAGE_USER;
+}
+
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(!is_kernel_addr(pmd_val(pmd)) \
 || (pmd_val(pmd) & PMD_BAD_BITS))
@@ -164,6 +169,11 @@ static inline pte_t pud_pte(pud_t pud)
return __pte(pud_val(pud));
 }
 
+static inline bool pud_user(pud_t pud)
+{
+   return (pud_val(pud) & _PAGE_USER) == _PAGE_USER;
+}
+
 static inline pud_t pte_pud(pte_t pte)
 {
return __pud(pte_val(pte));
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ad0829f816e9..b76fdb80b6c9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -167,6 +167,21 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+   return pmd_leaf(pmd) && pmd_present(pmd) && pmd_user(pmd);
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+   return pud_leaf(pud) && pud_present(pud) && pud_user(pud);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.37.2



[PATCH v8 5/7] powerpc: mm: Add common pud_pfn stub for all platforms

2023-02-15 Thread Rohan McLure
Prior to this commit, pud_pfn was implemented with BUILD_BUG as the inline
function for 64-bit Book3S systems, but the call is never emitted, as its
invocations in generic code are guarded by calls to pud_devmap which return
zero on such systems. A future patch will provide support for page table
checks, whose generic code depends on a pud_pfn stub being implemented, even
though the patch will not interact with puds directly.

Remove the 64-bit Book3S stub and define pud_pfn to warn on all
platforms. pud_pfn may be defined properly on a per-platform basis
should it grow real usages in future.

Signed-off-by: Rohan McLure 
---
V2: Remove conditional BUILD_BUG and BUG. Instead warn on usage.
V3: Replace WARN with WARN_ONCE, which should suffice to demonstrate
misuse of puds.
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
 arch/powerpc/include/asm/pgtable.h   | 14 ++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 589d2dbe3873..df5ee856444d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1327,16 +1327,6 @@ static inline int pgd_devmap(pgd_t pgd)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline int pud_pfn(pud_t pud)
-{
-   /*
-* Currently all calls to pud_pfn() are gated around a pud_devmap()
-* check so this should never be used. If it grows another user we
-* want to know about it.
-*/
-   BUILD_BUG();
-   return 0;
-}
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
 pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
 void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 284408829fa3..ad0829f816e9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -153,6 +153,20 @@ struct seq_file;
 void arch_report_meminfo(struct seq_file *m);
 #endif /* CONFIG_PPC64 */
 
+/*
+ * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
+ * and sets to page table entries at any level are done through
+ * page_table_check_pte_{set,clear}, provide stub implementation.
+ */
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.37.2



[PATCH v8 4/7] powerpc: mm: Implement p{m,u,4}d_leaf on all platforms

2023-02-15 Thread Rohan McLure
The check that a higher-level entry in a multi-level page table contains a
page translation entry (pte) is performed by the p{m,u,4}d_leaf stubs, which
may be specialised for each choice of MMU. In a prior commit, we replaced
uses of the catch-all stubs p{m,u,4}d_is_leaf with p{m,u,4}d_leaf.

Replace the catch-all stub definitions for p{m,u,4}d_is_leaf with
definitions for p{m,u,4}d_leaf. A future patch will assume that
p{m,u,4}d_leaf is defined on all platforms.

In particular, implement pud_leaf for Book3E-64, pmd_leaf for all Book3E
and Book3S-64 platforms, with a catch-all definition for p4d_leaf.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v5: Split patch that replaces p{m,u,4}d_is_leaf into two patches, first
replacing callsites and afterward providing generic definition.
Remove ifndef-defines implementing p{m,u}d_leaf in favour of
implementing stubs in headers belonging to the particular platforms
needing them.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 -
 arch/powerpc/include/asm/nohash/64/pgtable.h |  6 ++
 arch/powerpc/include/asm/nohash/pgtable.h|  6 ++
 arch/powerpc/include/asm/pgtable.h   | 22 ++--
 5 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 75823f39e042..a090cb13a4a0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -242,6 +242,11 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+   return false;
+}
 
 /*
  * When flushing the tlb entry for a page, we also need to flush the hash
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7e0d546f4b3c..589d2dbe3873 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1359,16 +1359,14 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
 /*
  * Like pmd_huge() and pmd_large(), but works regardless of config options
  */
-#define pmd_is_leaf pmd_is_leaf
-#define pmd_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
 }
 
-#define pud_is_leaf pud_is_leaf
-#define pud_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 879e9a6e5a87..d391a45e0f11 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -141,6 +141,12 @@ static inline void pud_clear(pud_t *pudp)
*pudp = __pud(0);
 }
 
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+   return false;
+}
+
 #define pud_none(pud)  (!pud_val(pud))
 #definepud_bad(pud)(!is_kernel_addr(pud_val(pud)) \
 || (pud_val(pud) & PUD_BAD_BITS))
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index f36dd2e2d591..43b50fd8d236 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -60,6 +60,12 @@ static inline bool pte_hw_valid(pte_t pte)
return pte_val(pte) & _PAGE_PRESENT;
 }
 
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+   return false;
+}
+
 /*
  * Don't just check for any non zero bits in __PAGE_USER, since for book3e
  * and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 17d30359d1f4..284408829fa3 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -128,29 +128,11 @@ static inline void pte_frag_set(mm_context_t *ctx, void 
*p)
 }
 #endif
 
-#ifndef pmd_is_leaf
-#define pmd_is_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define p4d_leaf p4d_leaf
+static inline bool p4d_leaf(p4d_t p4d)
 {
return false;
 }
-#endif
-
-#ifndef pud_is_leaf
-#define pud_is_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
-{
-   return false;
-}
-#endif
-
-#ifndef p4d_is_leaf
-#define p4d_is_leaf p4d_is_leaf
-static inline bool p4d_is_leaf(p4d_t p4d)
-{
-   return false;
-}
-#endif
 
 #define pmd_pgtable pmd_pgtable
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
-- 
2.37.2



[PATCH v8 3/7] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf

2023-02-15 Thread Rohan McLure
Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}d_leaf, as the
latter is the name used throughout all other archs for checking that a
higher-level entry in a multi-level page table contains a page
translation entry (pte).

A future patch will implement p{u,m,4}d_leaf stubs on all platforms so
that they may be referenced in generic code.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V4: New patch
V5: Previously replaced stub definition for *_is_leaf with *_leaf. Do
that in a later patch
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 +++---
 arch/powerpc/mm/pgtable.c|  6 +++---
 arch/powerpc/mm/pgtable_64.c |  6 +++---
 arch/powerpc/xmon/xmon.c |  6 +++---
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 9d3743ca16d5..0d24fd984d16 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -497,7 +497,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -526,7 +526,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t 
*pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
-   if (pud_is_leaf(*p)) {
+   if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@@ -629,12 +629,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
 
pmd = NULL;
-   if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
+   if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -652,7 +652,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
 
/* Check if we raced and someone else has set the same thing */
@@ -703,7 +703,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
 
/* Check if we raced and someone else has set the same thing */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 26245aaf12b8..4e46e001c3c3 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -205,14 +205,14 @@ static void radix__change_memory_range(unsigned long 
start, unsigned long end,
pudp = pud_alloc(_mm, p4dp, idx);
if (!pudp)
continue;
-   if (pud_is_leaf(*pudp)) {
+   if (pud_leaf(*pudp)) {
ptep = (pte_t *)pudp;
goto update_the_pte;
}
pmdp = pmd_alloc(_mm, pudp, idx);
if (!pmdp)
continue;
-   if (pmd_is_leaf(*pmdp)) {
+   if (pmd_leaf(*pmdp)) {
ptep = pmdp_ptep(pmdp);
goto update_the_pte;
}
@@ -786,7 +786,7 @@ static void __meminit remove_pmd_table(pmd_t *pmd_start, 
unsigned long addr,
if (!pmd_present(*pmd))
continue;
 
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
if (!IS_ALIGNED(addr, PMD_SIZE) ||
!IS_ALIGNED(next, PMD_SIZE)) {
WARN_ONCE(1, "%s: unaligned range\n", __func__);
@@ -816,7 +816,7 @@ static void __meminit remove_pud_table(pud_t *pud_start, 
unsigned long addr,
if (!pud_present(*pud))
continue;
 
-   if (pud

[PATCH v8 2/7] powerpc/64s: mm: Introduce __pmdp_collapse_flush with mm_struct argument

2023-02-15 Thread Rohan McLure
pmdp_collapse_flush is referenced in generic code with just three
parameters, as the mm context is implied by the vm_area_struct
parameter.

Define __pmdp_collapse_flush to accept an additional mm_struct *
parameter, with pmdp_collapse_flush an inline function that unpacks
the vma and calls __pmdp_collapse_flush. The mm_struct * parameter
is needed in a future patch providing Page Table Check support,
which is defined in terms of mm context objects.

Signed-off-by: Rohan McLure 
---
v6: New patch
v7: Remove explicit `return' in macro. Prefix macro args with __
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb4c67bf45d7..7e0d546f4b3c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1244,14 +1244,19 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
 }
 
-static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
-   unsigned long address, pmd_t *pmdp)
+static inline pmd_t __pmdp_collapse_flush(struct vm_area_struct *vma, struct 
mm_struct *mm,
+ unsigned long address, pmd_t *pmdp)
 {
if (radix_enabled())
return radix__pmdp_collapse_flush(vma, address, pmdp);
return hash__pmdp_collapse_flush(vma, address, pmdp);
 }
-#define pmdp_collapse_flush pmdp_collapse_flush
+#define pmdp_collapse_flush(__vma, __addr, __pmdp) \
+({ \
+   struct vm_area_struct *_vma = (__vma);  \
+   \
+   __pmdp_collapse_flush(_vma, _vma->vm_mm, (__addr), (__pmdp));   \
+})
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL
 pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
-- 
2.37.2



[PATCH v8 1/7] powerpc: mm: Separate set_pte, set_pte_at for internal, external use

2023-02-15 Thread Rohan McLure
Produce separate symbols for set_pte, which is to be used in
arch/powerpc for reassignment of PTEs, and set_pte_at, which is used in
generic code.

The reason for this distinction is to support the Page Table Check
sanitiser. Having this distinction allows set_pte_at to be
instrumented, but set_pte not to be, permitting uninstrumented
internal mappings. This distinction in names is also present in x86.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v6: new patch
v7: Remove extern, move set_pte args to be in a single line.
---
 arch/powerpc/include/asm/book3s/pgtable.h | 3 +--
 arch/powerpc/include/asm/nohash/pgtable.h | 3 +--
 arch/powerpc/include/asm/pgtable.h| 1 +
 arch/powerpc/mm/pgtable.c | 3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h 
b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..1386ed705e66 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -12,8 +12,7 @@
 /* Insert a PTE, top-level function is out of line. It uses an inline
  * low level function in the respective pgtable-* files
  */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-  pte_t pte);
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte);
 
 
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 69c3a050a3d8..f36dd2e2d591 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -154,8 +154,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 /* Insert a PTE, top-level function is out of line. It uses an inline
  * low level function in the respective pgtable-* files
  */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-  pte_t pte);
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte);
 
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..17d30359d1f4 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -48,6 +48,7 @@ struct mm_struct;
 /* Keep these as a macros to avoid include dependency mess */
 #define pte_page(x)pfn_to_page(pte_pfn(x))
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
+#define set_pte_at set_pte
 /*
  * Select all bits except the pfn
  */
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..d7cce317cef8 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -187,8 +187,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct 
vm_area_struct *vma,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-   pte_t pte)
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
 {
/*
 * Make sure hardware valid bit is not set. We don't do
-- 
2.37.2



[PATCH v8 0/7] Support page table check

2023-02-15 Thread Rohan McLure
Support the page table check sanitiser on all PowerPC platforms. This
sanitiser works by serialising assignments, reassignments and clears of
page table entries at each level in order to ensure that anonymous
mappings have at most one writable consumer, and likewise that
file-backed mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, set_pte_at and set_pte
are separated, to allow for internal, uninstrumented mappings.
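
For anyone wanting to try the checker, it is enabled the same way as on the
other supported architectures (taken from the generic page table check
documentation, not something added by this series):

CONFIG_PAGE_TABLE_CHECK=y
# optionally make it unconditional:
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
# otherwise, enable it at boot with the kernel parameter:
#   page_table_check=on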

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (7):
  powerpc: mm: Separate set_pte, set_pte_at for internal, external use
  powerpc/64s: mm: Introduce __pmdp_collapse_flush with mm_struct
argument
  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}_leaf
  powerpc: mm: Implement p{m,u,4}d_leaf on all platforms
  powerpc: mm: Add common pud_pfn stub for all platforms
  powerpc: mm: Add p{te,md,ud}_user_accessible_page helpers
  powerpc: mm: Support page table check

 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 17 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 85 +---
 arch/powerpc/include/asm/book3s/pgtable.h|  3 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h | 12 ++-
 arch/powerpc/include/asm/nohash/64/pgtable.h | 24 +-
 arch/powerpc/include/asm/nohash/pgtable.h|  9 ++-
 arch/powerpc/include/asm/pgtable.h   | 60 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +--
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 16 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 24 +++---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c|  9 +--
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/xmon/xmon.c |  6 +-
 17 files changed, 197 insertions(+), 93 deletions(-)

-- 
2.37.2


