date:20240314

Re: [PATCH v2] powerpc/pseries: fix max polling time in plpks_confirm_object_flushed() function

2024-03-14 Thread Andrew Donnellan

On Thu, 2024-03-14 at 00:17 -0400, Nayna Jain wrote:
> usleep_range() function takes input time and range in usec. However,
> currently it is assumed in msec in the function
> plpks_confirm_object_flushed().
> 
> Fix the total polling time for the object flushing from 5msec to
> 5sec.
> 
> Reported-by: Nageswara R Sastry 
> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform
> KeyStore")
> Suggested-by: Michael Ellerman 
> Signed-off-by: Nayna Jain 
> Tested-by: Nageswara R Sastry 

plpks_signed_update_var() also relies on PLPKS_MAX_TIMEOUT, and assumes
it to be in milliseconds rather than microseconds.

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited

Re: [PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-14 Thread Benjamin Gray

Also supersedes
https://lore.kernel.org/all/20240213043638.168048-1-bg...@linux.ibm.com/

[PATCH v1 2/2] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()

2024-03-14 Thread Benjamin Gray

The existing patching alias page setup and teardown sections can be
simplified to make use of the new open_patch_window() abstraction.

This eliminates the _mm variants of the helpers, consumers no longer
need to check mm_patch_enabled(), and consumers no longer need to worry
about synchronization and flushing beyond the changes they make in the
patching window.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/code-patching.c | 180 +++
 1 file changed, 16 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d1b812f84154..fd6f8576033a 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -66,40 +66,6 @@ static bool mm_patch_enabled(void)
return IS_ENABLED(CONFIG_SMP) && radix_enabled();
 }
 
-/*
- * The following applies for Radix MMU. Hash MMU has different requirements,
- * and so is not supported.
- *
- * Changing mm requires context synchronising instructions on both sides of
- * the context switch, as well as a hwsync between the last instruction for
- * which the address of an associated storage access was translated using
- * the current context.
- *
- * switch_mm_irqs_off() performs an isync after the context switch. It is
- * the responsibility of the caller to perform the CSI and hwsync before
- * starting/stopping the temp mm.
- */
-static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm)
-{
-   struct mm_struct *orig_mm = current->active_mm;
-
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(orig_mm, temp_mm, current);
-
-   WARN_ON(!mm_is_thread_local(temp_mm));
-
-   suspend_breakpoints();
-   return orig_mm;
-}
-
-static void stop_using_temp_mm(struct mm_struct *temp_mm,
-  struct mm_struct *orig_mm)
-{
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(temp_mm, orig_mm, current);
-   restore_breakpoints();
-}
-
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;
@@ -389,73 +355,20 @@ static void close_patch_window(struct patch_window *ctx)
pte_unmap_unlock(ctx->ptep, ctx->ptl);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
-{
-   int err;
-   u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-   struct mm_struct *patching_mm;
-   struct mm_struct *orig_mm;
-   spinlock_t *ptl;
-
-   patching_mm = __this_cpu_read(cpu_patching_context.mm);
-   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
-
-   pte = get_locked_pte(patching_mm, text_poke_addr, );
-   if (!pte)
-   return -ENOMEM;
-
-   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
-
-   /* order PTE update before use, also serves as the hwsync */
-   asm volatile("ptesync": : :"memory");
-
-   /* order context switch after arbitrary prior code */
-   isync();
-
-   orig_mm = start_using_temp_mm(patching_mm);
-
-   err = __patch_instruction(addr, instr, patch_addr);
-
-   /* context synchronisation performed by __patch_instruction (isync or 
exception) */
-   stop_using_temp_mm(patching_mm, orig_mm);
-
-   pte_clear(patching_mm, text_poke_addr, pte);
-   /*
-* ptesync to order PTE update before TLB invalidation done
-* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
-*/
-   local_flush_tlb_page_psize(patching_mm, text_poke_addr, 
mmu_virtual_psize);
-
-   pte_unmap_unlock(pte, ptl);
-
-   return err;
-}
-
 static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   int err;
+   struct patch_window ctx;
u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-
-   text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   int err;
 
-   pte = __this_cpu_read(cpu_patching_context.pte);
-   __set_pte_at(_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 
0);
-   /* See ptesync comment in radix__set_pte_at() */
-   if (radix_enabled())
-   asm volatile("ptesync": : :"memory");
+   err = open_patch_window(addr, );
+   if (err)
+   return err;
 
+   patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr));
err = __patch_instruction(addr, instr, patch_addr);
 
-   pte_clear(_mm, text_poke_addr, pte);
-   flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
+   close_patch_window();
 
return err;
 }
@@ -475,10 +388,7 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
return raw_patch_instruction(addr, instr);

[PATCH v1 1/2] powerpc/code-patching: Introduce open_patch_window()/close_patch_window()

2024-03-14 Thread Benjamin Gray

The code patching capabilities have grown, and the specific quirks for
setting up and tearing down the patching window are getting duplicated.

This commit introduces an abstraction for working with this patching
window. It defines open_patch_window() to set up the writable alias
page, and close_patch_window() to remove it and flush the TLB. The
actual mechanism for providing this patching window is an implementation
detail that consumers don't need to worry about. Consumers are still
responsible for flushing/syncing any changes they make through this
alias page though.

Signed-off-by: Benjamin Gray 

---

This design can be readily extended to remap the writable page to
another physical page without incurring all of the entry and exit
overhead. But that might have problems with spending too long in
an interrupts disabled context, so I've left it out for now.
---
 arch/powerpc/lib/code-patching.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ed450a32918c..d1b812f84154 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -276,6 +276,119 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
+/*
+ * Represents an active patching window.
+ */
+struct patch_window {
+   /*
+* Page aligned patching window address. The page is a writable alias of
+* the configured target page.
+*/
+   unsigned long text_poke_addr;
+   /*
+* Pointer to the PTE for the patching window page.
+*/
+   pte_t *ptep;
+   /*
+* (Temporary MM patching only) The original MM before creating the
+* patching window.
+*/
+   struct mm_struct *orig_mm;
+   /*
+* (Temporary MM patching only) The patching window MM.
+*/
+   struct mm_struct *patch_mm;
+   /*
+* (Temporary MM patching only) Lock for the patching window PTE.
+*/
+   spinlock_t *ptl;
+};
+
+/*
+ * Calling this function opens a 'patching window' that maps a
+ * page starting at ctx->text_poke_addr (which is set by this function)
+ * as a writable alias to the page starting at addr.
+ *
+ * Upon success, callers must invoke the corresponding close_patch_window()
+ * function to return to the original state. Callers are also responsible
+ * for syncing any changes they make inside the window.
+ *
+ * Interrupts must be disabled for the entire duration of the patching. The 
PIDR
+ * is potentially changed during this time.
+ */
+static int open_patch_window(void *addr, struct patch_window *ctx)
+{
+   unsigned long pfn = get_patch_pfn(addr);
+
+   lockdep_assert_irqs_disabled();
+
+   ctx->text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr);
+
+   if (!mm_patch_enabled()) {
+   ctx->ptep = __this_cpu_read(cpu_patching_context.pte);
+   __set_pte_at(_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* See ptesync comment in radix__set_pte_at() */
+   if (radix_enabled())
+   asm volatile("ptesync" ::: "memory");
+
+   return 0;
+   }
+
+   ctx->orig_mm = current->active_mm;
+   ctx->patch_mm = __this_cpu_read(cpu_patching_context.mm);
+
+   ctx->ptep = get_locked_pte(ctx->patch_mm, ctx->text_poke_addr, 
>ptl);
+   if (!ctx->ptep)
+   return -ENOMEM;
+
+   __set_pte_at(ctx->patch_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /*
+* Changing mm requires context synchronising instructions on both 
sides of
+* the context switch, as well as a hwsync between the last instruction 
for
+* which the address of an associated storage access was translated 
using
+* the current context. switch_mm_irqs_off() performs an isync after the
+* context switch.
+*/
+   isync();
+   switch_mm_irqs_off(ctx->orig_mm, ctx->patch_mm, current);
+
+   WARN_ON(!mm_is_thread_local(ctx->patch_mm));
+
+   suspend_breakpoints();
+   return 0;
+}
+
+static void close_patch_window(struct patch_window *ctx)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (!mm_patch_enabled()) {
+   pte_clear(_mm, ctx->text_poke_addr, ctx->ptep);
+   flush_tlb_kernel_range(ctx->text_poke_addr, ctx->text_poke_addr 
+ PAGE_SIZE);
+   return;
+   }
+
+   isync();
+   switch_mm_irqs_off(ctx->patch_mm, ctx->orig_mm, current);
+   restore_breakpoints();
+
+   pte_clear(ctx->patch_mm, ctx->text_poke_addr, ctx->ptep);
+   /*
+* ptesync to order PTE update before TLB invalidation done
+* by

[PATCH v1 3/3] powerpc/code-patching: Optimise patch_memcpy() to 4 byte chunks

2024-03-14 Thread Benjamin Gray

As we are patching instructions, we can assume the length is a multiple
of 4 and the destination address is aligned.

Atomicity of patching a prefixed instruction is not a concern, as the
original implementation doesn't provide it anyway.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/code-patching.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6633759b509..ed450a32918c 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -394,10 +394,10 @@ static int patch_memset32(u32 *addr, u32 val, size_t 
count)
return -EPERM;
 }
 
-static int patch_memcpy(void *dst, void *src, size_t len)
+static int patch_memcpy32(u32 *dst, u32 *src, size_t count)
 {
-   for (void *end = src + len; src < end; dst++, src++)
-   __put_kernel_nofault(dst, src, u8, failed);
+   for (u32 *end = src + count; src < end; dst++, src++)
+   __put_kernel_nofault(dst, src, u32, failed);
 
return 0;
 
@@ -424,7 +424,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, 
size_t len, bool rep
err = patch_memset32(patch_addr, val, len / 4);
}
} else {
-   err = patch_memcpy(patch_addr, code, len);
+   err = patch_memcpy32(patch_addr, code, len / 4);
}
 
smp_wmb();  /* smp write barrier */
-- 
2.44.0

[PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-14 Thread Benjamin Gray

The patching page set up as a writable alias may be in quadrant 1
(userspace) if the temporary mm path is used. This causes sanitiser
failures if so. Sanitiser failures also occur on the non-mm path
because the plain memset family is instrumented, and KASAN treats the
patching window as poisoned.

Introduce locally defined patch_* variants of memset that perform an
uninstrumented lower level set, as well as detecting write errors like
the original single patch variant does.

copy_to_user() is not correct here, as the PTE makes it a proper kernel
page (the EEA is privileged access only, RW). It just happens to be in
quadrant 1 because that's the hardware's mechanism for using the current
PID vs PID 0 in translations. Importantly, it's incorrect to allow user
page accesses.

Now that the patching memsets are used, we also propagate a failure up
to the caller as the single patch variant does.

Signed-off-by: Benjamin Gray 

---

The patch_memcpy() can be optimised to 4 bytes at a time assuming the
same requirements as regular instruction patching are being followed
for the 'copy sequence of instructions' mode (i.e., they actually are
instructions following instruction alignment rules).
---
 arch/powerpc/lib/code-patching.c | 42 +---
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..c6633759b509 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -372,9 +372,43 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int patch_memset64(u64 *addr, u64 val, size_t count)
+{
+   for (u64 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, , u64, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
+static int patch_memset32(u32 *addr, u32 val, size_t count)
+{
+   for (u32 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, , u32, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
+static int patch_memcpy(void *dst, void *src, size_t len)
+{
+   for (void *end = src + len; src < end; dst++, src++)
+   __put_kernel_nofault(dst, src, u8, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
 static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool 
repeat_instr)
 {
unsigned long start = (unsigned long)patch_addr;
+   int err;
 
/* Repeat instruction */
if (repeat_instr) {
@@ -383,19 +417,19 @@ static int __patch_instructions(u32 *patch_addr, u32 
*code, size_t len, bool rep
if (ppc_inst_prefixed(instr)) {
u64 val = ppc_inst_as_ulong(instr);
 
-   memset64((u64 *)patch_addr, val, len / 8);
+   err = patch_memset64((u64 *)patch_addr, val, len / 8);
} else {
u32 val = ppc_inst_val(instr);
 
-   memset32(patch_addr, val, len / 4);
+   err = patch_memset32(patch_addr, val, len / 4);
}
} else {
-   memcpy(patch_addr, code, len);
+   err = patch_memcpy(patch_addr, code, len);
}
 
smp_wmb();  /* smp write barrier */
flush_icache_range(start, start + len);
-   return 0;
+   return err;
 }
 
 /*
-- 
2.44.0

[PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot

2024-03-14 Thread Benjamin Gray

patch_instructions() introduces new behaviour with a couple of
variations. Test each case of

  * a repeated 32-bit instruction,
  * a repeated 64-bit instruction (ppc64), and
  * a copied sequence of instructions

for both on a single page and when it crosses a page boundary.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/test-code-patching.c | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c 
b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..35a3756272df 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void)
check(!memcmp(iptr, expected, sizeof(expected)));
 }
 
+static void __init test_multi_instruction_patching(void)
+{
+   u32 code[256];
+   void *buf;
+   u32 *addr32;
+   u64 *addr64;
+   ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL << 24, 
PPC_RAW_TRAP());
+   u32 inst32 = PPC_RAW_NOP();
+
+   buf = vzalloc(PAGE_SIZE * 8);
+   check(buf);
+   if (!buf)
+   return;
+
+   /* Test single page 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE;
+   check(!patch_instructions(addr32 + 1, , 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test single page 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 2;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test single page memcpy */
+   addr32 = buf + PAGE_SIZE * 3;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp([1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   /* Test multipage 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE * 4 - 8;
+   check(!patch_instructions(addr32 + 1, , 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test multipage 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 5 - 8;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test multipage memcpy */
+   addr32 = buf + PAGE_SIZE * 6 - 12;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp([1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
pr_info("Running code patching self-tests ...\n");
@@ -356,6 +447,7 @@ static int __init test_code_patching(void)
test_create_function_call();
test_translate_branch();
test_prefixed_patching();
+   test_multi_instruction_patching();
 
return 0;
 }
-- 
2.44.0

Re: [PATCH v10 11/12] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-03-14 Thread LTC IMAP

On Wed, 2024-03-13 at 11:30 +, Christophe Leroy wrote:
> 
> 
> Le 13/03/2024 à 05:21, Rohan McLure a écrit :
> > In the new set_ptes() API, set_pte_at() (a special case of
> > set_ptes())
> > is intended to be instrumented by the page table check facility.
> > There
> > are however several other routines that constitute the API for
> > setting
> > page table entries, including set_pmd_at() among others. Such
> > routines
> > are themselves implemented in terms of set_ptes_at().
> > 
> > A future patch providing support for page table checking on powerpc
> > must take care to avoid duplicate calls to
> > page_table_check_p{te,md,ud}_set(). Allow for assignment of pte
> > entries
> > without instrumentation through the set_pte_at_unchecked() routine
> > introduced in this patch.
> > 
> > Cause API-facing routines that call set_pte_at() to instead call
> > set_pte_at_unchecked(), which will remain uninstrumented by page
> > table check. set_ptes() is itself implemented by calls to
> > __set_pte_at(), so this eliminates redundant code.
> > 
> > Also prefer set_pte_at_unchecked() in early-boot usages which
> > should not be
> > instrumented.
> > 
> > Signed-off-by: Rohan McLure 
> > ---
> > v9: New patch
> > v10: don't reuse __set_pte_at(), as that will not apply filters.
> > Instead
> > use new set_pte_at_unchecked().
> 
> Are filters needed at all in those usecases ?

I'm just retaining the original semantics of these calls. I think
another patch can replace this call with __set_pte_at() if filters are
deemed unnecessary.

> 
> > ---
> >   arch/powerpc/include/asm/pgtable.h   | 2 ++
> >   arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
> >   arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
> >   arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
> >   arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
> >   arch/powerpc/mm/pgtable.c    | 7 +++
> >   arch/powerpc/mm/pgtable_32.c | 2 +-
> >   7 files changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 3741a63fb82e..6ff1d8cfa216 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -44,6 +44,8 @@ struct mm_struct;
> >   void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t
> > *ptep,
> >     pte_t pte, unsigned int nr);
> >   #define set_ptes set_ptes
> > +void set_pte_at_unchecked(struct mm_struct *mm, unsigned long
> > addr,
> > +     pte_t *ptep, pte_t pte);
> >   #define update_mmu_cache(vma, addr, ptep) \
> >     update_mmu_cache_range(NULL, vma, addr, ptep, 1)
> >   
> > diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > index 988948d69bc1..871472f99a01 100644
> > --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> > @@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea,
> > unsigned long pa, pgprot_t prot)
> >     ptep = pte_alloc_kernel(pmdp, ea);
> >     if (!ptep)
> >     return -ENOMEM;
> > -   set_pte_at(_mm, ea, ptep, pfn_pte(pa >>
> > PAGE_SHIFT, prot));
> > +   set_pte_at_unchecked(_mm, ea, ptep,
> > pfn_pte(pa >> PAGE_SHIFT, prot));
> >     } else {
> >     /*
> >      * If the mm subsystem is not fully up, we cannot
> > create a
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index 3438ab72c346..25082ab6018b 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned
> > long addr,
> >     WARN_ON(!(pmd_large(pmd)));
> >   #endif
> >     trace_hugepage_set_pmd(addr, pmd_val(pmd));
> > -   return set_pte_at(mm, addr, pmdp_ptep(pmdp),
> > pmd_pte(pmd));
> > +   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp),
> > pmd_pte(pmd));
> >   }
> >   
> >   void set_pud_at(struct mm_struct *mm, unsigned long addr,
> > @@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned
> > long addr,
> >     WARN_ON(!(pud_large(pud)));
> >   #endif
> >     trace_hugepage_set_pud(addr, pud_val(pud));
> > -   return set_pte_at(mm, addr, pudp_ptep(pudp),
> > pud_pte(pud));
> > +   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp),
> > pud_pte(pud));
> >   }
> >   
> >   static void do_serialize(void *arg)
> > @@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct
> > vm_area_struct *vma, unsigned long addr,
> >     if (radix_enabled())
> >     return radix__ptep_modify_prot_commit(vma, addr,
> >       ptep,
> > old_pte, pte);
> > -   set_pte_at(vma->vm_mm, addr, ptep, pte);
> > +   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
> >   }
> >   
> >   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c
> >

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread John Baldwin


On 3/14/24 10:10 AM, Dave Hansen wrote:

On 3/14/24 09:45, John Baldwin wrote:

On 3/14/24 8:37 AM, Dave Hansen wrote:

On 3/14/24 04:23, Vignesh Balasubramanian wrote:

Add a new .note section containing type, size, offset and flags of
every xfeature that is present.


Mechanically, I'd much rather have all of that info in the cover letter
in the actual changelog instead.

I'd also love to see a practical example of what an actual example core
dump looks like on two conflicting systems:

     * Total XSAVE size
     * XCR0 value
     * XSTATE_BV from the core dump
     * XFEATURE offsets for each feature


I noticed this when I bought an AMD Ryzen 9 5900X based system for
my desktop running FreeBSD and found that the XSAVE core dump notes
were not recognized by GDB (FreeBSD dumps an XSAVE register set note
that matches the same layout of NT_X86_XSTATE used by Linux).


I just want to make sure that you heard what I asked.  I'd like to see a
practical example of how the real-world enumeration changes between two
real world systems.

Is that possible?

Here's the raw CPUID data from the XSAVE region on my laptop:


0x000d 0x00: eax=0x02e7 ebx=0x0a88 ecx=0x0a88 edx=0x
0x000d 0x01: eax=0x000f ebx=0x0998 ecx=0x3900 edx=0x
0x000d 0x02: eax=0x0100 ebx=0x0240 ecx=0x edx=0x
0x000d 0x05: eax=0x0040 ebx=0x0440 ecx=0x edx=0x
0x000d 0x06: eax=0x0200 ebx=0x0480 ecx=0x edx=0x
0x000d 0x07: eax=0x0400 ebx=0x0680 ecx=0x edx=0x
0x000d 0x08: eax=0x0080 ebx=0x ecx=0x0001 edx=0x
0x000d 0x09: eax=0x0008 ebx=0x0a80 ecx=0x edx=0x
0x000d 0x0b: eax=0x0010 ebx=0x ecx=0x0001 edx=0x
0x000d 0x0c: eax=0x0018 ebx=0x ecx=0x0001 edx=0x
0x000d 0x0d: eax=0x0008 ebx=0x ecx=0x0001 edx=0x


Could we get that for an impacted AMD system, please?

cpuid -1 --raw | grep "   0x000d "

should do it.


   0x000d 0x00: eax=0x0207 ebx=0x0988 ecx=0x0988 edx=0x
   0x000d 0x01: eax=0x000f ebx=0x0348 ecx=0x1800 edx=0x
   0x000d 0x02: eax=0x0100 ebx=0x0240 ecx=0x edx=0x
   0x000d 0x09: eax=0x0008 ebx=0x0980 ecx=0x edx=0x
   0x000d 0x0b: eax=0x0010 ebx=0x ecx=0x0001 edx=0x
   0x000d 0x0c: eax=0x0018 ebx=0x ecx=0x0001 edx=0x

Here, I think the ebx value for the 0x09 leaf (PKRU) is the relevant difference
here, it is 0xa80 on your laptop and 0x980 on the AMD CPU.  (This is the
missing MPX gap on AMD.)


This is pretty close to just a raw dump of the XSAVE CPUID leaves.
Rather than come up with an XSAVE-specific ABI that depends on CPUID
*ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
should just bite the bullet and dump out (some of) the raw CPUID space.


So the current note I initially proposed and implemented for FreeBSD
(https://reviews.freebsd.org/D42136) and an initial patch set for GDB
(https://sourceware.org/pipermail/gdb-patches/2023-October/203083.html)
do indeed dump a raw set of CPUID leaves.  The version I have for FreeBSD
only dumps the raw leaf values for leaf 0x0d though the note format is
extensible should additional leaves be needed in the future.  One of the
questions if we wanted to use a CPUID leaf note is which leaves to dump
(e.g. do you dump all of them, or do you just dump the subset that is
currently needed).


You dump what is needed and add to the dump over time.


That is what I started with, yes, but am attempting to anticipate future
problems in my list of caveats.


Another quirky question is what to do about systems with hetergeneous
cores (E vs P for example).

That's irrelevant for now.  The cores may be heterogeneous but the
userspace ISA and (and thus XSAVE formats) are identical.  If they're
not, then we have bigger problems on our hands.


Yes, I agree on the bigger problems and hope we don't have to solve
them.


Currently those systems use the same XSAVE layout across all cores,
but other CPUID leaves do already vary across cores on those systems.


There shouldn't be any CPUID leaves that differ _and_ matter to
userspace and thus core dumps.


Today that is true, yes.  I'm fine with making that tradeoff (along
with only dumping a subset of leaves) so long as the consensus is that
is an acceptable tradeoff to make.


However, there are other wrinkles with the leaf approach.  Namely, one
of the use cases that I currently have an ugly hack for in GDB is if
you are using gdb against a remote host running gdbserver and then use
'gcore' to generate a core dump.  GDB needs to write out a NT_X86_XSTATE
note, but that note requires a layout.  What GDB does today is just pick
a known Intel layout based

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread John Baldwin


On 3/14/24 8:37 AM, Dave Hansen wrote:

On 3/14/24 04:23, Vignesh Balasubramanian wrote:

But this patch series depends on heuristics based on the total XSAVE
register set size and the XCR0 mask to infer the layouts of the
various register blocks for core dumps, and hence, is not a foolproof
mechanism to determine the layout of the XSAVE area.


It may not be theoretically foolproof.  But I'm struggling to think of a
case where it would matter in practice.  Is there any CPU from any
vendor where this is actually _needed_?

Sure, it's ugly as hell, but these notes aren't going to be available
universally _ever_, so it's not like the crummy heuristic code gets to
go away.


I forgot to mention one other use case for this note.

Today (and before my earlier patch series to add the ugly heuristic),
when the NT_X86_XSTATE core dump note grows because a CPU vendor adds
a new xfeature and OS's which just dump the entire XSAVE state start
including that, GDB fails to parse the entire note.

Having a note describing the layout (whichever format is chosen),
allows GDB to still pull registers for features it understands from
the larger note and ignoring the parts of the XSAVE block it doesn't
understand.

--
John Baldwin

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread John Baldwin


On 3/14/24 8:37 AM, Dave Hansen wrote:

On 3/14/24 04:23, Vignesh Balasubramanian wrote:

Add a new .note section containing type, size, offset and flags of
every xfeature that is present.


Mechanically, I'd much rather have all of that info in the cover letter
in the actual changelog instead.

I'd also love to see a practical example of what an actual example core
dump looks like on two conflicting systems:

* Total XSAVE size
* XCR0 value
* XSTATE_BV from the core dump
* XFEATURE offsets for each feature


I noticed this when I bought an AMD Ryzen 9 5900X based system for my desktop
running FreeBSD and found that the XSAVE core dump notes were not recognized
by GDB (FreeBSD dumps an XSAVE register set note that matches the same
layout of NT_X86_XSTATE used by Linux).

In particular, as the cover letter notes, on this AMD processor, there is
no "gap" for MPX, so the PKRU registers it provides are at a different offset
than on Intel CPUs.  Furthermore, my reading of the SDM is that there is no
guarantee of architectural offsets of a given XSAVE feature and that software
should be querying CPUID to determine the layout.

FWIW, the relevant CPUID leaves for my AMD system:

   XSAVE features (0xd/0):
  XCR0 valid bit field mask   = 0x0207
 x87 state= true
 SSE state= true
 AVX state= true
 MPX BNDREGS  = false
 MPX BNDCSR   = false
 AVX-512 opmask   = false
 AVX-512 ZMM_Hi256= false
 AVX-512 Hi16_ZMM = false
 PKRU state   = true
 XTILECFG state   = false
 XTILEDATA state  = false
  bytes required by fields in XCR0= 0x0988 (2440)
  bytes required by XSAVE/XRSTOR area = 0x0988 (2440)
  XSAVEOPT instruction= true
  XSAVEC instruction  = true
  XGETBV instruction  = true
  XSAVES/XRSTORS instructions = true
  XFD: extended feature disable supported = false
  SAVE area size in bytes = 0x0348 (840)
  IA32_XSS valid bit field mask   = 0x1800
 PT state = false
 PASID state  = false
 CET_U user state = true
 CET_S supervisor state   = true
 HDC state= false
 UINTR state  = false
 LBR state= false
 HWP state= false


Do you have any information about what other OSes are doing in this
area?  I thought Windows, for instance, was even less flexible about the
XSAVE format than Linux is.


I have an implementation of a similar note for FreeBSD already as well as
patches for GDB to make use of the note (for FreeBSD) and generate it
via 'gcore' (on FreeBSD).  However, I would very much like to reach
consensus on a shared format of the note to avoid gratuitous differences
between FreeBSD and Linux.  The AMD folks were gracious enough to work on
the Linux kernel implementation.  A bit more on that below though.


Why didn't LWP cause this problem?

 From the cover letter:


But this patch series depends on heuristics based on the total XSAVE
register set size and the XCR0 mask to infer the layouts of the
various register blocks for core dumps, and hence, is not a foolproof
mechanism to determine the layout of the XSAVE area.


It may not be theoretically foolproof.  But I'm struggling to think of a
case where it would matter in practice.  Is there any CPU from any
vendor where this is actually _needed_?

Sure, it's ugly as hell, but these notes aren't going to be available
universally _ever_, so it's not like the crummy heuristic code gets to
go away.

Have you seen the APX spec?




https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

It makes this even more fun because it adds a new XSAVE state component,
but reuses the MPX offsets.


This information will be used by the debuggers to understand the XSAVE
layout of the machine where the core file is dumped, and to read XSAVE
registers, especially during cross-platform debugging.


This is pretty close to just a raw dump of the XSAVE CPUID leaves.
Rather than come up with an XSAVE-specific ABI that depends on CPUID
*ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
should just bite the bullet and dump out (some of) the raw CPUID space.


So the current note I initially proposed and implemented for FreeBSD
(https://reviews.freebsd.org/D42136) and an initial patch set for GDB

RE: [PATCH 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

2024-03-14 Thread Willgerodt, Felix

> -Original Message-
> From: Vignesh Balasubramanian 
> Sent: Donnerstag, 14. März 2024 12:23
> To: linux-ker...@vger.kernel.org; linux-toolcha...@vger.kernel.org
> Cc: m...@ellerman.id.au; npig...@gmail.com; christophe.le...@csgroup.eu;
> aneesh.ku...@kernel.org; naveen.n@linux.ibm.com;
> ebied...@xmission.com; keesc...@chromium.org; x...@kernel.org; linuxppc-
> d...@lists.ozlabs.org; linux...@kvack.org; bpet...@amd.com;
> jinisusan.geo...@amd.com; m...@suse.de; binut...@sourceware.org;
> j...@freebsd.org; Willgerodt, Felix ; Vignesh
> Balasubramanian 
> Subject: [PATCH 0/1] Add XSAVE layout description to Core files for debuggers 
> to
> support varying XSAVE layouts
> 
> This patch proposes to add an extra .note section in the corefile to dump the
> CPUID information of a machine. This is being done to solve the issue of 
> tools like
> the debuggers having to deal with coredumps from machines with varying XSAVE
> layouts in spite of having the same XCR0 bits. The new proposed .note 
> section, at
> this point, consists of an array of records containing the information of each
> extended feature that is present. This provides details about the offsets and 
> the
> sizes of the various extended save state components of the machine where the
> application crash occurred. Requesting a review for this patch.
> 
> Some background:
> 
> The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory
> Protection Keys and the AVX-512 features have been inculcated into the AMD
> CPUs. This is since AMD never adopted (and hence never left room in the XSAVE
> layout for) the Intel MPX feature. Tools like GDB had assumed a fixed XSAVE
> layout matching that of Intel (based on the XCR0 mask). 

Hi,

I am a GDB developer and very much in favour of getting rid of the 
interim solution added to GDB. It doesn't scale well, as soon as we add new 
state
that has the same size as some existing state.

I am wondering if it wouldn't be easier for everyone if corefiles would just
contain space for all possible XSAVE components? Regardless of whether the CPU
supports it or whether or not AMD or Intel ever supported the feature.
Or if we would at least not drop some state from the middle, like in this case.

Regards,
Felix
Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de 
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

Re: [PATCH 00/14] Add support for suppressing warning backtraces

2024-03-14 Thread Maxime Ripard

On Thu, Mar 14, 2024 at 07:37:13AM -0700, Guenter Roeck wrote:
> On 3/14/24 06:36, Geert Uytterhoeven wrote:
> > Hi Günter,
> > 
> > On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck  wrote:
> > > Some unit tests intentionally trigger warning backtraces by passing bad
> > > parameters to kernel API functions. Such unit tests typically check the
> > > return value from such calls, not the existence of the warning backtrace.
> > > 
> > > Such intentionally generated warning backtraces are neither desirable
> > > nor useful for a number of reasons.
> > > - They can result in overlooked real problems.
> > > - A warning that suddenly starts to show up in unit tests needs to be
> > >investigated and has to be marked to be ignored, for example by
> > >adjusting filter scripts. Such filters are ad-hoc because there is
> > >no real standard format for warnings. On top of that, such filter
> > >scripts would require constant maintenance.
> > > 
> > > One option to address problem would be to add messages such as "expected
> > > warning backtraces start / end here" to the kernel log.  However, that
> > > would again require filter scripts, it might result in missing real
> > > problematic warning backtraces triggered while the test is running, and
> > > the irrelevant backtrace(s) would still clog the kernel log.
> > > 
> > > Solve the problem by providing a means to identify and suppress specific
> > > warning backtraces while executing test code. Support suppressing multiple
> > > backtraces while at the same time limiting changes to generic code to the
> > > absolute minimum. Architecture specific changes are kept at minimum by
> > > retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
> > > CONFIG_KUNIT are enabled.
> > > 
> > > The first patch of the series introduces the necessary infrastructure.
> > > The second patch introduces support for counting suppressed backtraces.
> > > This capability is used in patch three to implement unit tests.
> > > Patch four documents the new API.
> > > The next two patches add support for suppressing backtraces in drm_rect
> > > and dev_addr_lists unit tests. These patches are intended to serve as
> > > examples for the use of the functionality introduced with this series.
> > > The remaining patches implement the necessary changes for all
> > > architectures with GENERIC_BUG support.
> > 
> > Thanks for your series!
> > 
> > I gave it a try on m68k, just running backtrace-suppression-test,
> > and that seems to work fine.
> > 
> > > Design note:
> > >Function pointers are only added to the __bug_table section if both
> > >CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
> > >size increases if CONFIG_KUNIT=n. There would be some benefits to
> > >adding those pointers all the time (reduced complexity, ability to
> > >display function names in BUG/WARNING messages). That change, if
> > >desired, can be made later.
> > 
> > Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
> > case (ca. 80 KiB for atari_defconfig), making it less attractive to have
> > kunit and all tests enabled as modules in my standard kernel.
> > 
> 
> Good point. Indeed, it does. I wanted to avoid adding a configuration option,
> but maybe I should add it after all. How about something like this ?
> 
> +config KUNIT_SUPPRESS_BACKTRACE
> +   bool "KUnit - Enable backtrace suppression"
> +   default y
> +   help
> + Enable backtrace suppression for KUnit. If enabled, backtraces
> + generated intentionally by KUnit tests can be suppressed. Disable
> + to reduce kernel image size if image size is more important than
> + suppression of backtraces generated by KUnit tests.
> +

How are tests using that API supposed to handle it then?

Select the config option or put an ifdef?

If the former, we end up in the same situation than without the symbol.
If the latter, we end up in a similar situation than disabling KUNIT
entirely, with some tests not being run which is just terrible.

Maxime


signature.asc
Description: PGP signature

[PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Vignesh Balasubramanian

Add a new .note section containing type, size, offset and flags of
every xfeature that is present.

This information will be used by the debuggers to understand the XSAVE
layout of the machine where the core file is dumped, and to read XSAVE
registers, especially during cross-platform debugging.

Co-developed-by: Jini Susan George 
Signed-off-by: Jini Susan George 
Signed-off-by: Vignesh Balasubramanian 
---
 arch/Kconfig   |   9 +++
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/elf.h |   2 -
 arch/x86/Kconfig   |   1 +
 arch/x86/include/asm/elf.h |   7 +++
 arch/x86/kernel/fpu/xstate.c   | 101 +
 include/linux/elf.h|   2 +-
 include/uapi/linux/elf.h   |   1 +
 8 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index fd18b7db2c77..3bd8a0b2bba1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -502,6 +502,15 @@ config MMU_LAZY_TLB_SHOOTDOWN
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
 
+config ARCH_HAVE_EXTRA_ELF_NOTES
+   bool
+   help
+ An architecture should select this in order to enable adding an
+ arch-specific ELF note section to core files. It must provide two
+ functions: elf_coredump_extra_notes_size() and
+ elf_coredump_extra_notes_write() which are invoked by the ELF core
+ dumper.
+
 config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a91cb070ca4a..3b31bd7490e2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -156,6 +156,7 @@ config PPC
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UBSAN
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_HAVE_EXTRA_ELF_NOTESif SPU_BASE
select ARCH_KEEP_MEMBLOCK
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 79f1c480b5eb..bb4b9d3e 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -127,8 +127,6 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
 /* Notes used in ET_CORE. Note name is "SPU//". */
 #define NT_SPU 1
 
-#define ARCH_HAVE_EXTRA_ELF_NOTES
-
 #endif /* CONFIG_SPU_BASE */
 
 #ifdef CONFIG_PPC64
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 78050d5d7fac..35e8d1201099 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -104,6 +104,7 @@ config X86
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_HAVE_EXTRA_ELF_NOTES
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 1fb83d47711f..1b9f0b4bf6bc 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -13,6 +13,13 @@
 #include 
 #include 
 
+struct xfeat_component {
+   u32 xfeat_type;
+   u32 xfeat_sz;
+   u32 xfeat_off;
+   u32 xfeat_flags;
+} __packed;
+
 typedef unsigned long elf_greg_t;
 
 #define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t))
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 117e74c44e75..6e5ea483ec1d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1836,3 +1837,103 @@ int proc_pid_arch_status(struct seq_file *m, struct 
pid_namespace *ns,
return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+/*
+ * Dump type, size, offset and flag values for every xfeature that is present.
+ */
+static int dump_xsave_layout_desc(struct coredump_params *cprm)
+{
+
+   struct xfeat_component xc;
+   int num_records = 0;
+   int i;
+
+   /* XFEATURE_FPU and XFEATURE_SSE, both are fixed legacy states. */
+   for (i = 0; i < FIRST_EXTENDED_XFEATURE; i++) {
+   xc.xfeat_type = i;
+   xc.xfeat_sz = xstate_sizes[i];
+   xc.xfeat_off = xstate_offsets[i];
+   xc.xfeat_flags = xstate_flags[i];
+
+   if (!dump_emit(cprm, , sizeof(struct xfeat_component)))
+   return 0;
+   num_records++;
+   }
+
+   for_each_extended_xfeature(i, fpu_user_cfg.max_features) {
+   xc.xfeat_type = i;
+   xc.xfeat_sz = xstate_sizes[i];
+   xc.xfeat_off = xstate_offsets[i];
+   xc.xfeat_flags = xstate_flags[i];
+
+   if (!dump_emit(cprm, , sizeof(struct xfeat_component)))
+   return 0;
+   num_records++;
+   }
+
+   return num_records;
+}
+
+static int get_xsave_desc_size(void)
+{
+   /* XFEATURE_FP and XFEATURE_SSE, both

[PATCH 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

2024-03-14 Thread Vignesh Balasubramanian

This patch proposes to add an extra .note section in the corefile to dump the 
CPUID information of a machine. This is being done to solve the issue of tools 
like the debuggers having to deal with coredumps from machines with varying 
XSAVE layouts in spite of having the same XCR0 bits. The new proposed .note 
section, at this point, consists of an array of records containing the 
information of each extended feature that is present. This provides details 
about the offsets and the sizes of the various extended save state components 
of the machine where the application crash occurred. Requesting a review for 
this patch.

Some background:

The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory 
Protection Keys and the AVX-512 features have been inculcated into the AMD 
CPUs. This is since AMD never adopted (and hence never left room in the XSAVE 
layout for) the Intel MPX feature. Tools like GDB had assumed a fixed XSAVE 
layout matching that of Intel (based on the XCR0 mask). Hence, the core dumps 
from AMD CPUs didn't match the known size for the XCR0 mask. This resulted in 
GDB and other tools not being able to access the values of the AVX-512 and PKRU 
registers on AMD CPUs. To solve this, an interim solution has been accepted 
into GDB, and is already a part of GDB 14, thanks to these series of patches : 
[ https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html ].
But this patch series depends on heuristics based on the total XSAVE register 
set size and the XCR0 mask to infer the layouts of the various register blocks 
for core dumps, and hence, is not a foolproof mechanism to determine the layout 
of the XSAVE area.

Hence this new core dump note has been proposed as a more sturdy mechanism to 
allow GDB/LLDB and other relevant tools to determine the layout of the XSAVE 
area of the machine where the corefile was dumped.
The  new core dump note (which is being proposed as a per-process .note 
section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures.
Each structure describes an individual extended feature containing offset, size 
and flags (that is obtained through CPUID instruction) in a format roughly 
matching the follow C structure:

struct xfeat_component {
   u32 xfeat_type;
   u32 xfeat_sz;
   u32 xfeat_off;
   u32 xfeat_flags;
};


Vignesh Balasubramanian (1):
  x86/elf: Add a new .note section containing Xfeatures information to
x86 core files

 arch/Kconfig   |   9 +++
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/elf.h |   2 -
 arch/x86/Kconfig   |   1 +
 arch/x86/include/asm/elf.h |   7 +++
 arch/x86/kernel/fpu/xstate.c   | 101 +
 include/linux/elf.h|   2 +-
 include/uapi/linux/elf.h   |   1 +
 8 files changed, 121 insertions(+), 3 deletions(-)

-- 
2.43.0

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Michael Ellerman

Hi Vignesh,

Vignesh Balasubramanian  writes:
> Add a new .note section containing type, size, offset and flags of
> every xfeature that is present.
>
> This information will be used by the debuggers to understand the XSAVE
> layout of the machine where the core file is dumped, and to read XSAVE
> registers, especially during cross-platform debugging.
>
> Co-developed-by: Jini Susan George 
> Signed-off-by: Jini Susan George 
> Signed-off-by: Vignesh Balasubramanian 
> ---
>  arch/Kconfig   |   9 +++
>  arch/powerpc/Kconfig   |   1 +
>  arch/powerpc/include/asm/elf.h |   2 -
>  arch/x86/Kconfig   |   1 +
>  arch/x86/include/asm/elf.h |   7 +++
>  arch/x86/kernel/fpu/xstate.c   | 101 +
>  include/linux/elf.h|   2 +-
>  include/uapi/linux/elf.h   |   1 +
>  8 files changed, 121 insertions(+), 3 deletions(-)

IMHO you should split the changes to replace ARCH_HAVE_EXTRA_ELF_NOTES
with CONFIG_ARCH_HAVE_EXTRA_ELF_NOTES into a lead-up patch.

cheers

Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-03-14 Thread Steven Rostedt

On Thu, 14 Mar 2024 09:57:57 -0700
Alison Schofield  wrote:

> On Fri, Feb 23, 2024 at 12:56:34PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" 
> > 
> > [
> >This is a treewide change. I will likely re-create this patch again in
> >the second week of the merge window of v6.9 and submit it then. Hoping
> >to keep the conflicts that it will cause to a minimum.
> > ]

Note, change of plans. I plan on sending this in the next merge window, as
this merge window I have this patch:

https://lore.kernel.org/linux-trace-kernel/20240312113002.00031...@gandalf.local.home/

That will warn if the source string of __string() is different than the
source string of __assign_str(). I want to make sure they are identical
before just dropping one of them.

> 
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index bdf117a33744..07ba4e033347 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h  
> 
> snip to poison
> 
> > @@ -668,8 +668,8 @@ TRACE_EVENT(cxl_poison,
> > ),
> >  
> > TP_fast_assign(
> > -   __assign_str(memdev, dev_name(>dev));
> > -   __assign_str(host, dev_name(cxlmd->dev.parent));
> > +   __assign_str(memdev);
> > +   __assign_str(host);  
> 
> I think I get that the above changes work because the TP_STRUCT__entry for
> these did:
>   __string(memdev, dev_name(>dev))
>   __string(host, dev_name(cxlmd->dev.parent))

That's the point. They have to be identical or you will likely bug.

The __string(name, src) is used to find the string length of src which
allocates the necessary length on the ring buffer. The __assign_str(name, src)
will copy src into the ring buffer.

Similar to:

len = strlen(src);
buf = malloc(len);
strcpy(buf, str);

Where __string() is strlen() and __assign_str() is strcpy(). It doesn't
make sense to use two different strings, and if you did, it would likely be
a bug.

But the magic behind __string() does much more than just get the length of
the string, and it could easily save the pointer to the string (along with
its length) and have it copy that in the __assign_str() call, making the
src parameter of __assign_str() useless.

> 
> > __entry->serial = cxlmd->cxlds->serial;
> > __entry->overflow_ts = cxl_poison_overflow(flags, overflow_ts);
> > __entry->dpa = cxl_poison_record_dpa(record);
> > @@ -678,12 +678,12 @@ TRACE_EVENT(cxl_poison,
> > __entry->trace_type = trace_type;
> > __entry->flags = flags;
> > if (region) {
> > -   __assign_str(region, dev_name(>dev));
> > +   __assign_str(region);
> > memcpy(__entry->uuid, >params.uuid, 16);
> > __entry->hpa = cxl_trace_hpa(region, cxlmd,
> >  __entry->dpa);
> > } else {
> > -   __assign_str(region, "");
> > +   __assign_str(region);
> > memset(__entry->uuid, 0, 16);
> > __entry->hpa = ULLONG_MAX;  
> 
> For the above 2, there was no helper in TP_STRUCT__entry. A recently
> posted patch is fixing that up to be __string(region, NULL) See [1],
> with the actual assignment still happening in TP_fast_assign.

__string(region, NULL) doesn't make sense. It's like:

len = strlen(NULL);
buf = malloc(len);
strcpy(buf, NULL);

??

I'll reply to that email.

-- Steve

> 
> Does that assign logic need to move to the TP_STRUCT__entry definition
> when you merge these changes? I'm not clear how much logic is able to be
> included, ie like 'C' style code in the TP_STRUCT__entry.
> 
> [1]
> https://lore.kernel.org/linux-cxl/20240314044301.2108650-1-alison.schofi...@intel.com/

Re: [PATCH 2/4] PCI: Generalize TLP Header Log reading

2024-03-14 Thread Bjorn Helgaas

[+cc Greg, Jeff -- ancient history, I know, sorry!]

On Tue, Feb 06, 2024 at 03:57:15PM +0200, Ilpo Järvinen wrote:
> Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1
> secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is
> named after AER as the struct aer_header_log_regs. Also, not all places
> that handle TLP Header Log use the struct and the struct members are
> named individually.
> 
> Generalize the struct name and members, and use it consistently where
> TLP Header Log is being handled so that a pcie_read_tlp_log() helper
> can be easily added.
> 
> Signed-off-by: Ilpo Järvinen 

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index bd541527c8c7..5fdf37968b2d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright(c) 1999 - 2018 Intel Corporation. */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -391,22 +392,6 @@ u16 ixgbe_read_pci_cfg_word(struct ixgbe_hw *hw, u32 reg)
>   return value;
>  }
>  
> -#ifdef CONFIG_PCI_IOV
> -static u32 ixgbe_read_pci_cfg_dword(struct ixgbe_hw *hw, u32 reg)
> -{
> - struct ixgbe_adapter *adapter = hw->back;
> - u32 value;
> -
> - if (ixgbe_removed(hw->hw_addr))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - pci_read_config_dword(adapter->pdev, reg, );
> - if (value == IXGBE_FAILED_READ_CFG_DWORD &&
> - ixgbe_check_cfg_remove(hw, adapter->pdev))
> - return IXGBE_FAILED_READ_CFG_DWORD;
> - return value;
> -}
> -#endif /* CONFIG_PCI_IOV */
> -
>  void ixgbe_write_pci_cfg_word(struct ixgbe_hw *hw, u32 reg, u16 value)
>  {
>   struct ixgbe_adapter *adapter = hw->back;
> @@ -11332,8 +11317,8 @@ static pci_ers_result_t 
> ixgbe_io_error_detected(struct pci_dev *pdev,
>  #ifdef CONFIG_PCI_IOV
>   struct ixgbe_hw *hw = >hw;
>   struct pci_dev *bdev, *vfdev;
> - u32 dw0, dw1, dw2, dw3;
> - int vf, pos;
> + struct pcie_tlp_log tlp_log;
> + int vf, pos, ret;
>   u16 req_id, pf_func;
>  
>   if (adapter->hw.mac.type == ixgbe_mac_82598EB ||
> @@ -11351,14 +11336,13 @@ static pci_ers_result_t 
> ixgbe_io_error_detected(struct pci_dev *pdev,
>   if (!pos)
>   goto skip_bad_vf_detection;
>  
> - dw0 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG);
> - dw1 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 4);
> - dw2 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 8);
> - dw3 = ixgbe_read_pci_cfg_dword(hw, pos + PCI_ERR_HEADER_LOG + 12);
> - if (ixgbe_removed(hw->hw_addr))
> + ret = pcie_read_tlp_log(pdev, pos + PCI_ERR_HEADER_LOG, _log);
> + if (ret < 0) {
> + ixgbe_check_cfg_remove(hw, pdev);
>   goto skip_bad_vf_detection;
> + }
>  
> - req_id = dw1 >> 16;
> + req_id = tlp_log.dw[1] >> 16;
>   /* On the 82599 if bit 7 of the requestor ID is set then it's a VF */
>   if (!(req_id & 0x0080))
>   goto skip_bad_vf_detection;
> @@ -11369,9 +11353,8 @@ static pci_ers_result_t 
> ixgbe_io_error_detected(struct pci_dev *pdev,
>  
>   vf = FIELD_GET(0x7F, req_id);
>   e_dev_err("VF %d has caused a PCIe error\n", vf);
> - e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: "
> - "%8.8x\tdw3: %8.8x\n",
> - dw0, dw1, dw2, dw3);
> + e_dev_err("TLP: dw0: %8.8x\tdw1: %8.8x\tdw2: %8.8x\tdw3: 
> %8.8x\n",
> +   tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], 
> tlp_log.dw[3]);
>   switch (adapter->hw.mac.type) {
>   case ixgbe_mac_82599EB:
>   device_id = IXGBE_82599_VF_DEVICE_ID;

The rest of this patch is headed for v6.10, but I dropped this ixgbe
change for now.

These TLP Log registers are generic, not device-specific, and if
there's something lacking in the PCI core that leads to ixgbe reading
and dumping them itself, I'd rather improve the PCI core so all
drivers will benefit without having to add code like this.

83c61fa97a7d ("ixgbe: Add protection from VF invalid target DMA") [1]
added the ixgbe TLP Log dumping way back in v3.2 (2012).  It does do
some device-specific VF checking and so on, but even back then, it
looks like the PCI core would have dumped the log itself [2], so I
don't know why we needed the extra dumping in ixgbe.

So what I'd really like is to remove the TLP Log reading and printing
from ixgbe completely, but keep the VF checking.

Bjorn

[1] https://git.kernel.org/linus/83c61fa97a7d
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aer/aerdrv_errprint.c?id=83c61fa97a7d#n181

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Dave Hansen

On 3/14/24 09:45, John Baldwin wrote:
> On 3/14/24 8:37 AM, Dave Hansen wrote:
>> On 3/14/24 04:23, Vignesh Balasubramanian wrote:
>>> Add a new .note section containing type, size, offset and flags of
>>> every xfeature that is present.
>>
>> Mechanically, I'd much rather have all of that info in the cover letter
>> in the actual changelog instead.
>>
>> I'd also love to see a practical example of what an actual example core
>> dump looks like on two conflicting systems:
>>
>>     * Total XSAVE size
>>     * XCR0 value
>>     * XSTATE_BV from the core dump
>>     * XFEATURE offsets for each feature
> 
> I noticed this when I bought an AMD Ryzen 9 5900X based system for
> my desktop running FreeBSD and found that the XSAVE core dump notes
> were not recognized by GDB (FreeBSD dumps an XSAVE register set note
> that matches the same layout of NT_X86_XSTATE used by Linux).

I just want to make sure that you heard what I asked.  I'd like to see a
practical example of how the real-world enumeration changes between two
real world systems.

Is that possible?

Here's the raw CPUID data from the XSAVE region on my laptop:

>0x000d 0x00: eax=0x02e7 ebx=0x0a88 ecx=0x0a88 
> edx=0x
>0x000d 0x01: eax=0x000f ebx=0x0998 ecx=0x3900 
> edx=0x
>0x000d 0x02: eax=0x0100 ebx=0x0240 ecx=0x 
> edx=0x
>0x000d 0x05: eax=0x0040 ebx=0x0440 ecx=0x 
> edx=0x
>0x000d 0x06: eax=0x0200 ebx=0x0480 ecx=0x 
> edx=0x
>0x000d 0x07: eax=0x0400 ebx=0x0680 ecx=0x 
> edx=0x
>0x000d 0x08: eax=0x0080 ebx=0x ecx=0x0001 
> edx=0x
>0x000d 0x09: eax=0x0008 ebx=0x0a80 ecx=0x 
> edx=0x
>0x000d 0x0b: eax=0x0010 ebx=0x ecx=0x0001 
> edx=0x
>0x000d 0x0c: eax=0x0018 ebx=0x ecx=0x0001 
> edx=0x
>0x000d 0x0d: eax=0x0008 ebx=0x ecx=0x0001 
> edx=0x

Could we get that for an impacted AMD system, please?

cpuid -1 --raw | grep "   0x000d "

should do it.

> In particular, as the cover letter notes, on this AMD processor,
> there is no "gap" for MPX, so the PKRU registers it provides are at a
> different offset than on Intel CPUs.  Furthermore, my reading of the
> SDM is that there is no guarantee of architectural offsets of a given
> XSAVE feature and that software should be querying CPUID to determine
> the layout.

I'd say the SDM is an aspirational document.  Intel _aspires_ to be able
to change the layouts whenever it wants.  That doesn't mean that it
could actually pull it off in practice.

In practice, the offset are fixed and Intel can't change them.

> FWIW, the relevant CPUID leaves for my AMD system:
> 
>    XSAVE features (0xd/0):
>   XCR0 valid bit field mask   = 0x0207>  
> x87 state    = true
...

So, those actually aren't the relevant ones.  We need EAX (size) and EBX
(user offset) from all of the sub-leaves.

>> This is pretty close to just a raw dump of the XSAVE CPUID leaves.
>> Rather than come up with an XSAVE-specific ABI that depends on CPUID
>> *ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
>> should just bite the bullet and dump out (some of) the raw CPUID space.
> 
> So the current note I initially proposed and implemented for FreeBSD
> (https://reviews.freebsd.org/D42136) and an initial patch set for GDB
> (https://sourceware.org/pipermail/gdb-patches/2023-October/203083.html)
> do indeed dump a raw set of CPUID leaves.  The version I have for FreeBSD
> only dumps the raw leaf values for leaf 0x0d though the note format is
> extensible should additional leaves be needed in the future.  One of the
> questions if we wanted to use a CPUID leaf note is which leaves to dump
> (e.g. do you dump all of them, or do you just dump the subset that is
> currently needed).

You dump what is needed and add to the dump over time.

> Another quirky question is what to do about systems with hetergeneous
> cores (E vs P for example).
That's irrelevant for now.  The cores may be heterogeneous but the
userspace ISA and (and thus XSAVE formats) are identical.  If they're
not, then we have bigger problems on our hands.

> Currently those systems use the same XSAVE layout across all cores,
> but other CPUID leaves do already vary across cores on those systems.

There shouldn't be any CPUID leaves that differ _and_ matter to
userspace and thus core dumps.

> However, there are other wrinkles with the leaf approach.  Namely, one
> of the use cases that I currently have an ugly hack for in GDB is if
> you are using gdb against a remote host running gdbserver and then use
> 'gcore' to generate a core dump.  GDB needs to write out a NT_X86_XSTATE
> note, but that note requires a layout.  What GDB does today is just pick
> a known

Re: [FYI][PATCH] tracing/treewide: Remove second parameter of __assign_str()

2024-03-14 Thread Alison Schofield

On Fri, Feb 23, 2024 at 12:56:34PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> [
>This is a treewide change. I will likely re-create this patch again in
>the second week of the merge window of v6.9 and submit it then. Hoping
>to keep the conflicts that it will cause to a minimum.
> ]
> 
> With the rework of how the __string() handles dynamic strings where it
> saves off the source string in field in the helper structure[1], the
> assignment of that value to the trace event field is stored in the helper
> value and does not need to be passed in again.
> 
> This means that with:
> 
>   __string(field, mystring)
> 
> Which use to be assigned with __assign_str(field, mystring), no longer
> needs the second parameter and it is unused. With this, __assign_str()
> will now only get a single parameter.
> 
> There's over 700 users of __assign_str() and because coccinelle does not
> handle the TRACE_EVENT() macro I ended up using the following sed script:
> 
>   git grep -l __assign_str | while read a ; do
>   sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
>   mv /tmp/test-file $a;
>   done
> 
> I then searched for __assign_str() that did not end with ';' as those
> were multi line assignments that the sed script above would fail to catch.
> 
> Note, the same updates will need to be done for:
> 
>   __assign_str_len()
>   __assign_rel_str()
>   __assign_rel_str_len()
>   __assign_bitmask()
>   __assign_rel_bitmask()
>   __assign_cpumask()
>   __assign_rel_cpumask()
> 
> [1] 
> https://lore.kernel.org/linux-trace-kernel/2024011442.634192...@goodmis.org/
> 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  arch/arm64/kernel/trace-events-emulation.h|   2 +-
>  arch/powerpc/include/asm/trace.h  |   4 +-
>  arch/x86/kvm/trace.h  |   2 +-
>  drivers/base/regmap/trace.h   |  18 +--
>  drivers/base/trace.h  |   2 +-
>  drivers/block/rnbd/rnbd-srv-trace.h   |  12 +-
>  drivers/cxl/core/trace.h  |  24 ++--

snip to CXL


> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index bdf117a33744..07ba4e033347 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h

snip to poison

> @@ -668,8 +668,8 @@ TRACE_EVENT(cxl_poison,
>   ),
>  
>   TP_fast_assign(
> - __assign_str(memdev, dev_name(>dev));
> - __assign_str(host, dev_name(cxlmd->dev.parent));
> + __assign_str(memdev);
> + __assign_str(host);

I think I get that the above changes work because the TP_STRUCT__entry for
these did:
__string(memdev, dev_name(>dev))
__string(host, dev_name(cxlmd->dev.parent))

>   __entry->serial = cxlmd->cxlds->serial;
>   __entry->overflow_ts = cxl_poison_overflow(flags, overflow_ts);
>   __entry->dpa = cxl_poison_record_dpa(record);
> @@ -678,12 +678,12 @@ TRACE_EVENT(cxl_poison,
>   __entry->trace_type = trace_type;
>   __entry->flags = flags;
>   if (region) {
> - __assign_str(region, dev_name(>dev));
> + __assign_str(region);
>   memcpy(__entry->uuid, >params.uuid, 16);
>   __entry->hpa = cxl_trace_hpa(region, cxlmd,
>__entry->dpa);
>   } else {
> - __assign_str(region, "");
> + __assign_str(region);
>   memset(__entry->uuid, 0, 16);
>   __entry->hpa = ULLONG_MAX;

For the above 2, there was no helper in TP_STRUCT__entry. A recently
posted patch is fixing that up to be __string(region, NULL) See [1],
with the actual assignment still happening in TP_fast_assign.

Does that assign logic need to move to the TP_STRUCT__entry definition
when you merge these changes? I'm not clear how much logic is able to be
included, ie like 'C' style code in the TP_STRUCT__entry.

[1]
https://lore.kernel.org/linux-cxl/20240314044301.2108650-1-alison.schofi...@intel.com/

Thanks for helping,
Alison


>   }

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Dave Hansen

On 3/14/24 09:29, Borislav Petkov wrote:
> 
>> That argument breaks down a bit on the flags though:
>>
>>  xc.xfeat_flags = xstate_flags[i];
>>
>> Because it comes _directly_ from CPUID with zero filtering:
>>
>>  cpuid_count(XSTATE_CPUID, i, , , , );
>>  ...
>>  xstate_flags[i] = ecx;
>>
>> So this layout is quite dependent on what's in x86's CPUID.
> Yeah, no, this should not be copying CPUID flags - those flags should be
> *translated* to independently defined flags which describe those
> buffers.

Ditto for:

xc.xfeat_type = i;

Right now, that's bound to CPUID and XSAVE.  "feat_type==10" can only
ever be PKRU and that's derived from the XSAVE architecture.

If you want this to be extensible to things outside of the XSAVE
architecture, it needs to be something actually extensible and not
entangled with XSAVE.

In other words "xc.xfeat_type" can enumerate XSAVE state components
being in the dump, but it should not be limited to XSAVE.  Just as an
example:

enum feat_type {
FEATURE_XSAVE_PKRU,
FEATURE_XSAVE__YMM,
FEATURE_XSAVE_BNDREGS,
FEATURE_XSAVE_BNDCSR,
...
RANDOM_STATE_NOT_XSAVE
};

See how feat_type==1 is PKRU and *NOT* feat_type==10?  That opens the
door to RANDOM_STATE_NOT_XSAVE or anything else you want.  This would be
_actually_ extensible.

Re: [PATCH 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

2024-03-14 Thread Borislav Petkov

On Thu, Mar 14, 2024 at 04:25:44PM +, Willgerodt, Felix wrote:
> I am wondering if it wouldn't be easier for everyone if corefiles would just
> contain space for all possible XSAVE components?

You mean we should shuffle out from the kernel 8K of AMX state even if
nothing uses it or the machine doesn't even support it?

That's silly.

Please have a look at this:

+struct xfeat_component {
+   u32 xfeat_type;
+   u32 xfeat_sz;
+   u32 xfeat_off;
+   u32 xfeat_flags;
+} __packed;

What is wrong with having a blob of such xfeat_component things
describing the XSTATE buffer and parsing it in gdb?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Borislav Petkov

On Thu, Mar 14, 2024 at 09:19:15AM -0700, Dave Hansen wrote:
> Are you envisioning an *XSAVE* state component that's not described by
> CPUID?

I want to be prepared. You can imagine some of the short cuts and
corners cutting hw guys would do so I'd want to be prepared there and
not tie this to CPUID.

> Or some _other_ (non-XSAVE) component in a core dump that isn't
> described by CPUID?

Yes, that too.

Since the format of this buffer is so simple and machine-independent, it
can be extended as needed without issues.

> That argument breaks down a bit on the flags though:
> 
>   xc.xfeat_flags = xstate_flags[i];
> 
> Because it comes _directly_ from CPUID with zero filtering:
> 
>   cpuid_count(XSTATE_CPUID, i, , , , );
>   ...
>   xstate_flags[i] = ecx;
> 
> So this layout is quite dependent on what's in x86's CPUID.

Yeah, no, this should not be copying CPUID flags - those flags should be
*translated* to independently defined flags which describe those
buffers.

This is even more important if we change our xstate_flags[] machinery.
This buffer should not use any kernel-internal definitions and
structures but be completely self-describing.

Vignesh, pls fix that.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Dave Hansen

On 3/14/24 09:08, Borislav Petkov wrote:
> On Thu, Mar 14, 2024 at 08:37:09AM -0700, Dave Hansen wrote:
>> This is pretty close to just a raw dump of the XSAVE CPUID leaves.
>> Rather than come up with an XSAVE-specific ABI that depends on CPUID
>> *ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
>> should just bite the bullet and dump out (some of) the raw CPUID space.
> 
> Funny you should say that. This was what they had done originally but if
> you dump CPUID and you want to add another component in the future which
> is *not* described by CPUID, your scheme breaks.

Are you envisioning an *XSAVE* state component that's not described by
CPUID?

Or some _other_ (non-XSAVE) component in a core dump that isn't
described by CPUID?

> So the idea is to have a self-describing buffers layout, independent
> from any x86-ism. You can extend this in a straight-forward way then
> later.

That argument breaks down a bit on the flags though:

xc.xfeat_flags = xstate_flags[i];

Because it comes _directly_ from CPUID with zero filtering:

cpuid_count(XSTATE_CPUID, i, , , , );
...
xstate_flags[i] = ecx;

So this layout is quite dependent on what's in x86's CPUID.

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Borislav Petkov

On Thu, Mar 14, 2024 at 08:37:09AM -0700, Dave Hansen wrote:
> This is pretty close to just a raw dump of the XSAVE CPUID leaves.
> Rather than come up with an XSAVE-specific ABI that depends on CPUID
> *ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
> should just bite the bullet and dump out (some of) the raw CPUID space.

Funny you should say that. This was what they had done originally but if
you dump CPUID and you want to add another component in the future which
is *not* described by CPUID, your scheme breaks.

So the idea is to have a self-describing buffers layout, independent
from any x86-ism. You can extend this in a straight-forward way then
later.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Kees Cook

On Thu, Mar 14, 2024 at 04:53:28PM +0530, Vignesh Balasubramanian wrote:
> Add a new .note section containing type, size, offset and flags of
> every xfeature that is present.
> 
> This information will be used by the debuggers to understand the XSAVE
> layout of the machine where the core file is dumped, and to read XSAVE
> registers, especially during cross-platform debugging.

I see binutils in CC. Can someone from gdb confirm that this solution
can be used?

> 
> Co-developed-by: Jini Susan George 
> Signed-off-by: Jini Susan George 
> Signed-off-by: Vignesh Balasubramanian 
> ---
>  arch/Kconfig   |   9 +++
>  arch/powerpc/Kconfig   |   1 +
>  arch/powerpc/include/asm/elf.h |   2 -
>  arch/x86/Kconfig   |   1 +
>  arch/x86/include/asm/elf.h |   7 +++
>  arch/x86/kernel/fpu/xstate.c   | 101 +
>  include/linux/elf.h|   2 +-
>  include/uapi/linux/elf.h   |   1 +
>  8 files changed, 121 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index fd18b7db2c77..3bd8a0b2bba1 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -502,6 +502,15 @@ config MMU_LAZY_TLB_SHOOTDOWN
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>   bool
>  
> +config ARCH_HAVE_EXTRA_ELF_NOTES
> + bool
> + help
> +   An architecture should select this in order to enable adding an
> +   arch-specific ELF note section to core files. It must provide two
> +   functions: elf_coredump_extra_notes_size() and
> +   elf_coredump_extra_notes_write() which are invoked by the ELF core
> +   dumper.
> +
>  config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>   bool
>  
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a91cb070ca4a..3b31bd7490e2 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -156,6 +156,7 @@ config PPC
>   select ARCH_HAS_UACCESS_FLUSHCACHE
>   select ARCH_HAS_UBSAN
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
> + select ARCH_HAVE_EXTRA_ELF_NOTESif SPU_BASE
>   select ARCH_KEEP_MEMBLOCK
>   select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index 79f1c480b5eb..bb4b9d3e 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -127,8 +127,6 @@ extern int arch_setup_additional_pages(struct 
> linux_binprm *bprm,
>  /* Notes used in ET_CORE. Note name is "SPU//". */
>  #define NT_SPU   1
>  
> -#define ARCH_HAVE_EXTRA_ELF_NOTES
> -
>  #endif /* CONFIG_SPU_BASE */
>  
>  #ifdef CONFIG_PPC64
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 78050d5d7fac..35e8d1201099 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -104,6 +104,7 @@ config X86
>   select ARCH_HAS_DEBUG_WX
>   select ARCH_HAS_ZONE_DMA_SET if EXPERT
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
> + select ARCH_HAVE_EXTRA_ELF_NOTES
>   select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
>   select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 1fb83d47711f..1b9f0b4bf6bc 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -13,6 +13,13 @@
>  #include 
>  #include 
>  
> +struct xfeat_component {
> + u32 xfeat_type;
> + u32 xfeat_sz;
> + u32 xfeat_off;
> + u32 xfeat_flags;
> +} __packed;

While it is currently true, just for robustness, can you add
a _Static_assert that sizeof(struct xfeat_component) % 4 == 0 ?
Notes must be 4-byte aligned.

> +
>  typedef unsigned long elf_greg_t;
>  
>  #define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t))
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 117e74c44e75..6e5ea483ec1d 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1836,3 +1837,103 @@ int proc_pid_arch_status(struct seq_file *m, struct 
> pid_namespace *ns,
>   return 0;
>  }
>  #endif /* CONFIG_PROC_PID_ARCH_STATUS */
> +
> +/*
> + * Dump type, size, offset and flag values for every xfeature that is 
> present.
> + */
> +static int dump_xsave_layout_desc(struct coredump_params *cprm)
> +{
> +
> + struct xfeat_component xc;
> + int num_records = 0;
> + int i;
> +
> + /* XFEATURE_FPU and XFEATURE_SSE, both are fixed legacy states. */
> + for (i = 0; i < FIRST_EXTENDED_XFEATURE; i++) {
> + xc.xfeat_type = i;
> + xc.xfeat_sz = xstate_sizes[i];
> + xc.xfeat_off = xstate_offsets[i];
> + xc.xfeat_flags = xstate_flags[i];
> +
> + if (!dump_emit(cprm, , sizeof(struct xfeat_component)))
> + return 0;
> + num_records++;
> + }
> +
> +

Re: [PATCH v9 07/27] net: wan: Add support for QMC HDLC

2024-03-14 Thread Guenter Roeck


On 3/14/24 08:31, Christophe Leroy wrote:



Le 14/03/2024 à 16:21, Guenter Roeck a écrit :

On Wed, Nov 15, 2023 at 03:39:43PM +0100, Herve Codina wrote:

The QMC HDLC driver provides support for HDLC using the QMC (QUICC
Multichannel Controller) to transfer the HDLC data.

Signed-off-by: Herve Codina 
Reviewed-by: Christophe Leroy 
Acked-by: Jakub Kicinski 
---

[ ... ]


+
+static const struct of_device_id qmc_hdlc_id_table[] = {
+   { .compatible = "fsl,qmc-hdlc" },
+   {} /* sentinel */
+};
+MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);


I am a bit puzzled. How does this even compile ?


Because

#else  /* !MODULE */
#define MODULE_DEVICE_TABLE(type, name)
#endif



Ah, makes sense. We live and learn.



We should probably try to catch those errors when CONFIG_MODULE is not set.

By the way, a fix is available at
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20240314123346.461350-1-herve.cod...@bootlin.com/



Great, I'll add that to my testing branch for the time being.

Thanks!
Guenter

Re: [PATCH 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-03-14 Thread Dave Hansen

On 3/14/24 04:23, Vignesh Balasubramanian wrote:
> Add a new .note section containing type, size, offset and flags of
> every xfeature that is present.

Mechanically, I'd much rather have all of that info in the cover letter
in the actual changelog instead.

I'd also love to see a practical example of what an actual example core
dump looks like on two conflicting systems:

   * Total XSAVE size
   * XCR0 value
   * XSTATE_BV from the core dump
   * XFEATURE offsets for each feature

Do you have any information about what other OSes are doing in this
area?  I thought Windows, for instance, was even less flexible about the
XSAVE format than Linux is.

Why didn't LWP cause this problem?

>From the cover letter:

> But this patch series depends on heuristics based on the total XSAVE
> register set size and the XCR0 mask to infer the layouts of the
> various register blocks for core dumps, and hence, is not a foolproof
> mechanism to determine the layout of the XSAVE area.

It may not be theoretically foolproof.  But I'm struggling to think of a
case where it would matter in practice.  Is there any CPU from any
vendor where this is actually _needed_?

Sure, it's ugly as hell, but these notes aren't going to be available
universally _ever_, so it's not like the crummy heuristic code gets to
go away.

Have you seen the APX spec?

>
https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

It makes this even more fun because it adds a new XSAVE state component,
but reuses the MPX offsets.

> This information will be used by the debuggers to understand the XSAVE
> layout of the machine where the core file is dumped, and to read XSAVE
> registers, especially during cross-platform debugging.

This is pretty close to just a raw dump of the XSAVE CPUID leaves.
Rather than come up with an XSAVE-specific ABI that depends on CPUID
*ANYWAY* (because it dumps the "flags" register aka. ECX), maybe we
should just bite the bullet and dump out (some of) the raw CPUID space.

Re: [PATCH v9 07/27] net: wan: Add support for QMC HDLC

2024-03-14 Thread Christophe Leroy



Le 14/03/2024 à 16:21, Guenter Roeck a écrit :
> On Wed, Nov 15, 2023 at 03:39:43PM +0100, Herve Codina wrote:
>> The QMC HDLC driver provides support for HDLC using the QMC (QUICC
>> Multichannel Controller) to transfer the HDLC data.
>>
>> Signed-off-by: Herve Codina 
>> Reviewed-by: Christophe Leroy 
>> Acked-by: Jakub Kicinski 
>> ---
> [ ... ]
> 
>> +
>> +static const struct of_device_id qmc_hdlc_id_table[] = {
>> +{ .compatible = "fsl,qmc-hdlc" },
>> +{} /* sentinel */
>> +};
>> +MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
> 
> I am a bit puzzled. How does this even compile ?

Because

#else  /* !MODULE */
#define MODULE_DEVICE_TABLE(type, name)
#endif


We should probably try to catch those errors when CONFIG_MODULE is not set.

By the way, a fix is available at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20240314123346.461350-1-herve.cod...@bootlin.com/

Christophe

Re: [PATCH 00/14] Add support for suppressing warning backtraces

2024-03-14 Thread Guenter Roeck


On 3/14/24 08:02, Maxime Ripard wrote:

On Thu, Mar 14, 2024 at 07:37:13AM -0700, Guenter Roeck wrote:

On 3/14/24 06:36, Geert Uytterhoeven wrote:

Hi Günter,

On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck  wrote:

Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.

Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.

One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log.  However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.

Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.

The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.


Thanks for your series!

I gave it a try on m68k, just running backtrace-suppression-test,
and that seems to work fine.


Design note:
Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
size increases if CONFIG_KUNIT=n. There would be some benefits to
adding those pointers all the time (reduced complexity, ability to
display function names in BUG/WARNING messages). That change, if
desired, can be made later.


Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
case (ca. 80 KiB for atari_defconfig), making it less attractive to have
kunit and all tests enabled as modules in my standard kernel.



Good point. Indeed, it does. I wanted to avoid adding a configuration option,
but maybe I should add it after all. How about something like this ?

+config KUNIT_SUPPRESS_BACKTRACE
+   bool "KUnit - Enable backtrace suppression"
+   default y
+   help
+ Enable backtrace suppression for KUnit. If enabled, backtraces
+ generated intentionally by KUnit tests can be suppressed. Disable
+ to reduce kernel image size if image size is more important than
+ suppression of backtraces generated by KUnit tests.
+


How are tests using that API supposed to handle it then?

Select the config option or put an ifdef?

If the former, we end up in the same situation than without the symbol.
If the latter, we end up in a similar situation than disabling KUNIT
entirely, with some tests not being run which is just terrible.



The API definitions are themselves within #ifdef and dummies if
KUNIT_SUPPRESS_BACKTRACE (currently CONFIG_KUNIT) is disabled.
In include/kunit/bug.h:

#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
...
#else
#define DEFINE_SUPPRESSED_WARNING(func)
#define START_SUPPRESSED_WARNING(func)
#define END_SUPPRESSED_WARNING(func)
#define IS_SUPPRESSED_WARNING(func) (false)
#define SUPPRESSED_WARNING_COUNT(func) (0)
#endif

Only difference to the current patch series would be

- #if IS_ENABLED(CONFIG_KUNIT)
+ #ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE

in that file and elsewhere.

With KUNIT_SUPPRESS_BACKTRACE=n you'd still get warning backtraces
triggered by the tests, but the number of tests executed as well
as the test results would still be the same.

Thanks,
Guenter

Re: [PATCH v9 07/27] net: wan: Add support for QMC HDLC

2024-03-14 Thread Guenter Roeck

On Wed, Nov 15, 2023 at 03:39:43PM +0100, Herve Codina wrote:
> The QMC HDLC driver provides support for HDLC using the QMC (QUICC
> Multichannel Controller) to transfer the HDLC data.
> 
> Signed-off-by: Herve Codina 
> Reviewed-by: Christophe Leroy 
> Acked-by: Jakub Kicinski 
> ---
[ ... ]

> +
> +static const struct of_device_id qmc_hdlc_id_table[] = {
> + { .compatible = "fsl,qmc-hdlc" },
> + {} /* sentinel */
> +};
> +MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);

I am a bit puzzled. How does this even compile ?

Building powerpc:ppc32_allmodconfig ... failed
--
Error log:
In file included from include/linux/device/driver.h:21,
 from include/linux/device.h:32,
 from include/linux/dma-mapping.h:8,
 from drivers/net/wan/fsl_qmc_hdlc.c:14:
drivers/net/wan/fsl_qmc_hdlc.c:783:25: error: 'qmc_hdlc_driver' undeclared here 
(not in a function); did you mean 'qmc_hdlc_probe'?
  783 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);

Guenter

> +
> +static struct platform_driver qmc_hdlc_driver = {
> + .driver = {
> + .name = "fsl-qmc-hdlc",
> + .of_match_table = qmc_hdlc_id_table,
> + },
> + .probe = qmc_hdlc_probe,
> + .remove = qmc_hdlc_remove,
> +};
> +module_platform_driver(qmc_hdlc_driver);
> +
> +MODULE_AUTHOR("Herve Codina ");
> +MODULE_DESCRIPTION("QMC HDLC driver");
> +MODULE_LICENSE("GPL");
> -- 
> 2.41.0
>

Re: [PATCH 00/14] Add support for suppressing warning backtraces

2024-03-14 Thread Guenter Roeck


On 3/14/24 06:36, Geert Uytterhoeven wrote:

Hi Günter,

On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck  wrote:

Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.

Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
   investigated and has to be marked to be ignored, for example by
   adjusting filter scripts. Such filters are ad-hoc because there is
   no real standard format for warnings. On top of that, such filter
   scripts would require constant maintenance.

One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log.  However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.

Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.

The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.


Thanks for your series!

I gave it a try on m68k, just running backtrace-suppression-test,
and that seems to work fine.


Design note:
   Function pointers are only added to the __bug_table section if both
   CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
   size increases if CONFIG_KUNIT=n. There would be some benefits to
   adding those pointers all the time (reduced complexity, ability to
   display function names in BUG/WARNING messages). That change, if
   desired, can be made later.


Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
case (ca. 80 KiB for atari_defconfig), making it less attractive to have
kunit and all tests enabled as modules in my standard kernel.



Good point. Indeed, it does. I wanted to avoid adding a configuration option,
but maybe I should add it after all. How about something like this ?

+config KUNIT_SUPPRESS_BACKTRACE
+   bool "KUnit - Enable backtrace suppression"
+   default y
+   help
+ Enable backtrace suppression for KUnit. If enabled, backtraces
+ generated intentionally by KUnit tests can be suppressed. Disable
+ to reduce kernel image size if image size is more important than
+ suppression of backtraces generated by KUnit tests.
+

Thanks,
Guenter

[PATCH 1/1] ASoC: fsl: fsl_ssi: Add dev_err_probe if PCM DMA init fails

2024-03-14 Thread Alexander Stein

This happens especially if this driver is built-in, but SDMA driver
is configured as module.

Signed-off-by: Alexander Stein 
---
 sound/soc/fsl/fsl_ssi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index ab6ec1974807..4ca3a16f7ac0 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1401,8 +1401,10 @@ static int fsl_ssi_imx_probe(struct platform_device 
*pdev,
goto error_pcm;
} else {
ret = imx_pcm_dma_init(pdev);
-   if (ret)
+   if (ret) {
+   dev_err_probe(dev, ret, "Failed to init PCM DMA\n");
goto error_pcm;
+   }
}
 
return 0;
-- 
2.34.1

Re: [PATCH 12/13] mm/treewide: Remove pXd_huge()

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 08:56:59AM +, Christophe Leroy wrote:
> 
> 
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu 
> > 
> > This API is not used anymore, drop it for the whole tree.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >   arch/arm/mm/Makefile  |  1 -
> >   arch/arm/mm/hugetlbpage.c | 29 ---
> >   arch/arm64/mm/hugetlbpage.c   | 10 ---
> >   arch/loongarch/mm/hugetlbpage.c   | 10 ---
> >   arch/mips/include/asm/pgtable-32.h|  2 +-
> >   arch/mips/include/asm/pgtable-64.h|  2 +-
> >   arch/mips/mm/hugetlbpage.c| 10 ---
> >   arch/parisc/mm/hugetlbpage.c  | 11 ---
> >   .../include/asm/book3s/64/pgtable-4k.h| 10 ---
> >   .../include/asm/book3s/64/pgtable-64k.h   | 25 
> >   arch/powerpc/include/asm/nohash/pgtable.h | 10 ---
> >   arch/riscv/mm/hugetlbpage.c   | 10 ---
> >   arch/s390/mm/hugetlbpage.c| 10 ---
> >   arch/sh/mm/hugetlbpage.c  | 10 ---
> >   arch/sparc/mm/hugetlbpage.c   | 10 ---
> >   arch/x86/mm/hugetlbpage.c | 16 --
> >   include/linux/hugetlb.h   | 24 ---
> >   17 files changed, 2 insertions(+), 198 deletions(-)
> >   delete mode 100644 arch/arm/mm/hugetlbpage.c
> > 
> 
> > diff --git a/arch/mips/include/asm/pgtable-32.h 
> > b/arch/mips/include/asm/pgtable-32.h
> > index 0e196650f4f4..92b7591aac2a 100644
> > --- a/arch/mips/include/asm/pgtable-32.h
> > +++ b/arch/mips/include/asm/pgtable-32.h
> > @@ -129,7 +129,7 @@ static inline int pmd_none(pmd_t pmd)
> >   static inline int pmd_bad(pmd_t pmd)
> >   {
> >   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> > -   /* pmd_huge(pmd) but inline */
> > +   /* pmd_leaf(pmd) but inline */
> 
> Shouldn't this comment have been changed in patch 11 ?

IMHO it's fine to be here, as this is the patch to finally drop _huge().
Patch 11 only converts the callers to use _leaf()s.  So this comment is
still valid until this patch, because this patch removes that definition.

> 
> > if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
> 
> Unlike pmd_huge() which is an outline function, pmd_leaf() is a macro so 
> it could be used here instead of open coping.

I worry it will break things as pmd_leaf() can sometimes be defined after
arch *pgtable.h headers.  So I avoided touching it except what I think I'm
confident.  I had a feeling it's inlined just because of a similar reason
for the old _huge().

> 
> > return 0;
> >   #endif
> > diff --git a/arch/mips/include/asm/pgtable-64.h 
> > b/arch/mips/include/asm/pgtable-64.h
> > index 20ca48c1b606..7c28510b3768 100644
> > --- a/arch/mips/include/asm/pgtable-64.h
> > +++ b/arch/mips/include/asm/pgtable-64.h
> > @@ -245,7 +245,7 @@ static inline int pmd_none(pmd_t pmd)
> >   static inline int pmd_bad(pmd_t pmd)
> >   {
> >   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> > -   /* pmd_huge(pmd) but inline */
> > +   /* pmd_leaf(pmd) but inline */
> 
> Same
> 
> > if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
> 
> Same
> 
> > return 0;
> >   #endif
> 
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h 
> > b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> > index 2fce3498b000..579a7153857f 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> > @@ -4,31 +4,6 @@
> >   
> >   #ifndef __ASSEMBLY__
> >   #ifdef CONFIG_HUGETLB_PAGE
> > -/*
> > - * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
> > - * 16GB hugepage pte in PGD and 16MB hugepage pte at PMD;
> > - *
> > - * Defined in such a way that we can optimize away code block at build time
> > - * if CONFIG_HUGETLB_PAGE=n.
> > - *
> > - * returns true for pmd migration entries, THP, devmap, hugetlb
> > - * But compile time dependent on CONFIG_HUGETLB_PAGE
> > - */
> 
> Should we keep this comment somewhere for documentation ?

The 2nd/3rd paragraphs are definitely obsolete, so should be dropped.

OTOH, I'm not sure how much that will help if e.g. I move that over to
pmd_leaf(): a check over cpu_to_be64(_PAGE_PTE) is an implementation as
simple as it could be to explain itself with even no comment to me..

I also don't fully digest why that 1st paragraph discusses PGD entries: for
example, there's no pgd_huge() defined.  It may not mean that the comment
is wrong, perhaps it means that I may lack some knowledge around this area
on Power..

Would you suggest how I should move paragraph 1 (and help to explain what
it is describing)?  Or maybe we can provide a separate patch for Power's
huge page sizes but posted separately (and very possibly I'm not the best
candidate then..).

> 
> > -static inline int pmd_huge(pmd_t pmd)
> > -{
> > -   /*
> > -* leaf pte for huge page
> > -

Re: [PATCH 11/14] s390: Add support for suppressing warning backtraces

2024-03-14 Thread Guenter Roeck


On 3/14/24 00:57, Geert Uytterhoeven wrote:

Hi Günter,

On Tue, Mar 12, 2024 at 6:06 PM Guenter Roeck  wrote:

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
are enabled. Otherwise, the __func__ assembly parameter is replaced with a
(dummy) NULL parameter to avoid an image size increase due to unused
__func__ entries (this is necessary because __func__ is not a define but a
virtual variable).

Signed-off-by: Guenter Roeck 


Thanks for your patch!


--- a/arch/s390/include/asm/bug.h
+++ b/arch/s390/include/asm/bug.h
@@ -8,19 +8,30 @@

  #ifdef CONFIG_DEBUG_BUGVERBOSE

+#if IS_ENABLED(CONFIG_KUNIT)
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR"   .long   %0-.\n"
+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNC_PTR
+# define __BUG_FUNCNULL
+#endif /* IS_ENABLED(CONFIG_KUNIT) */
+
  #define __EMIT_BUG(x) do { \
 asm_inline volatile(\
 "0: mc  0,0\n"  \
 ".section .rodata.str,\"aMS\",@progbits,1\n"\
 "1: .asciz  \""__FILE__"\"\n"   \
 ".previous\n"   \
-   ".section __bug_table,\"awM\",@progbits,%2\n"   \
+   ".section __bug_table,\"awM\",@progbits,%3\n"   \


This change conflicts with commit 3938490e78f443fb ("s390/bug:
remove entry size from __bug_table section") in linus/master.
I guess it should just be dropped?



Yes, I know. I'll send v2 rebased to v6.9-rc1 once it is available and,
yes, the change will be gone after that.

Thanks,
Guenter

Re: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup

2024-03-14 Thread Tobias Huschle

On Fri, Mar 08, 2024 at 03:11:38PM +, Luis Machado wrote:
> On 2/28/24 16:10, Tobias Huschle wrote:
> > 
> > Questions:
> > 1. The kworker getting its negative lag occurs in the following scenario
> >- kworker and a cgroup are supposed to execute on the same CPU
> >- one task within the cgroup is executing and wakes up the kworker
> >- kworker with 0 lag, gets picked immediately and finishes its
> >  execution within ~5000ns
> >- on dequeue, kworker gets assigned a negative lag
> >Is this expected behavior? With this short execution time, I would
> >expect the kworker to be fine.
> 
> That strikes me as a bit odd as well. Have you been able to determine how a 
> negative lag
> is assigned to the kworker after such a short runtime?
> 

I did some more trace reading though and found something.

What I observed if everything runs regularly:
- vhost and kworker run alternating on the same CPU
- if the kworker is done, it leaves the runqueue
- vhost wakes up the kworker if it needs it
--> this means:
  - vhost starts alone on an otherwise empty runqueue
  - it seems like it never gets dequeued
(unless another unrelated task joins or migration hits)
  - if vhost wakes up the kworker, the kworker gets selected
  - vhost runtime > kworker runtime 
--> kworker gets positive lag and gets selected immediately next time

What happens if it does go wrong:
>From what I gather, there seem to be occasions where the vhost either
executes suprisingly quick, or the kworker surprinsingly slow. If these
outliers reach critical values, it can happen, that
   vhost runtime < kworker runtime
which now causes the kworker to get the negative lag.

In this case it seems like, that the vhost is very fast in waking up
the kworker. And coincidentally, the kworker takes, more time than usual
to finish. We speak of 4-digit to low 5-digit nanoseconds.

So, for these outliers, the scheduler extrapolates that the kworker 
out-consumes the vhost and should be slowed down, although in the majority
of other cases this does not happen.

Therefore this particular usecase would profit from being able to ignore
such outliers, or being able to ignore a certain amount of difference in the
lag values, i.e. introduce some grace value around the average runtime for
which lag is not accounted. But not sure if I like that idea.

So the negative lag can be somewhat justified, but for this particular case
it leads to a problem where one outlier can cause havoc. As mentioned in the
vhost discussion, it could also be argued that the vhost should not rely on 
the fact that the kworker gets always scheduled on wake up, since these
timing issues can always happen.

Hence, the two options:
- offer the alternative strategy which dismisses lag on wake up for workloads
  where we know that a task usually finishes faster than others but should
  not be punished by rare outliers (if that is predicatble, I don't know)
- require vhost to adress this issue on their side (if possible without 
  creating an armada of side effects)

(plus the third one mentioned above, but that requires a magic cutoff value, 
meh)

> I was looking at a different thread 
> (https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.c...@intel.com/) 
> that
> uncovers a potential overflow in the eligibility calculation. Though I don't 
> think that is the case for this particular
> vhost problem.

Yea, the numbers I see do not look very overflowy.

Re: [PATCH 00/14] Add support for suppressing warning backtraces

2024-03-14 Thread Geert Uytterhoeven

Hi Günter,

On Tue, Mar 12, 2024 at 6:03 PM Guenter Roeck  wrote:
> Some unit tests intentionally trigger warning backtraces by passing bad
> parameters to kernel API functions. Such unit tests typically check the
> return value from such calls, not the existence of the warning backtrace.
>
> Such intentionally generated warning backtraces are neither desirable
> nor useful for a number of reasons.
> - They can result in overlooked real problems.
> - A warning that suddenly starts to show up in unit tests needs to be
>   investigated and has to be marked to be ignored, for example by
>   adjusting filter scripts. Such filters are ad-hoc because there is
>   no real standard format for warnings. On top of that, such filter
>   scripts would require constant maintenance.
>
> One option to address problem would be to add messages such as "expected
> warning backtraces start / end here" to the kernel log.  However, that
> would again require filter scripts, it might result in missing real
> problematic warning backtraces triggered while the test is running, and
> the irrelevant backtrace(s) would still clog the kernel log.
>
> Solve the problem by providing a means to identify and suppress specific
> warning backtraces while executing test code. Support suppressing multiple
> backtraces while at the same time limiting changes to generic code to the
> absolute minimum. Architecture specific changes are kept at minimum by
> retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
> CONFIG_KUNIT are enabled.
>
> The first patch of the series introduces the necessary infrastructure.
> The second patch introduces support for counting suppressed backtraces.
> This capability is used in patch three to implement unit tests.
> Patch four documents the new API.
> The next two patches add support for suppressing backtraces in drm_rect
> and dev_addr_lists unit tests. These patches are intended to serve as
> examples for the use of the functionality introduced with this series.
> The remaining patches implement the necessary changes for all
> architectures with GENERIC_BUG support.

Thanks for your series!

I gave it a try on m68k, just running backtrace-suppression-test,
and that seems to work fine.

> Design note:
>   Function pointers are only added to the __bug_table section if both
>   CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
>   size increases if CONFIG_KUNIT=n. There would be some benefits to
>   adding those pointers all the time (reduced complexity, ability to
>   display function names in BUG/WARNING messages). That change, if
>   desired, can be made later.

Unfortunately this also increases kernel size in the CONFIG_KUNIT=m
case (ca. 80 KiB for atari_defconfig), making it less attractive to have
kunit and all tests enabled as modules in my standard kernel.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v6 1/9] locking/mutex: introduce devm_mutex_init

2024-03-14 Thread Waiman Long




On 3/14/24 04:45, George Stark wrote:

Using of devm API leads to a certain order of releasing resources.
So all dependent resources which are not devm-wrapped should be deleted
with respect to devm-release order. Mutex is one of such objects that
often is bound to other resources and has no own devm wrapping.
Since mutex_destroy() actually does nothing in non-debug builds
frequently calling mutex_destroy() is just ignored which is safe for now
but wrong formally and can lead to a problem if mutex_destroy() will be
extended so introduce devm_mutex_init()

Signed-off-by: George Stark 
Suggested by-by: Christophe Leroy 
---
  include/linux/mutex.h| 27 +++
  kernel/locking/mutex-debug.c | 11 +++
  2 files changed, 38 insertions(+)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 67edc4ca2bee..f57e005ded24 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -22,6 +22,8 @@
  #include 
  #include 
  
+struct device;

+
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  # define __DEP_MAP_MUTEX_INITIALIZER(lockname)\
, .dep_map = {  \
@@ -117,6 +119,31 @@ do {   
\
  } while (0)
  #endif /* CONFIG_PREEMPT_RT */
  
+#ifdef CONFIG_DEBUG_MUTEXES

+
+int __devm_mutex_init(struct device *dev, struct mutex *lock);
+
+#else
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   /*
+* When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
+* no really need to register it in devm subsystem.
+*/
+   return 0;
+}
+
+#endif
+
+#define devm_mutex_init(dev, mutex)\
+({ \
+   typeof(mutex) mutex_ = (mutex); \
+   \
+   mutex_init(mutex_); \
+   __devm_mutex_init(dev, mutex_); \
+})
+
  /*
   * See kernel/locking/mutex.c for detailed documentation of these APIs.
   * Also see Documentation/locking/mutex-design.rst.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index bc8abb8549d2..6aa77e3dc82e 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "mutex.h"
  
@@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char *name,

lock->magic = lock;
  }
  
+static void devm_mutex_release(void *res)

+{
+   mutex_destroy(res);
+}
+
+int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
+}
+
  /***
   * mutex_destroy - mark a mutex unusable
   * @lock: the mutex to be destroyed

Acked-by: Waiman Long

Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-14 Thread Christophe Leroy



Le 14/03/2024 à 13:53, Peter Xu a écrit :
> On Thu, Mar 14, 2024 at 08:45:34AM +, Christophe Leroy wrote:
>>
>>
>> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
>>> From: Peter Xu 
>>>
>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
>>> constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
>>> it will keep returning false.
>>>
>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
>>> such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
>>> mappings.
>>>
>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
>>> of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
>>
>> All kinds of huge mappings ?
>>
>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
>> those huge pages.
> 
> Ah yes, I should always mention this is in the context of leaf huge pages
> only.  Are the examples you provided all fall into hugepd category?  If so
> I can reword the commit message, as:

On powerpc 8xx, only the 8M huge pages fall into the hugepd case.

The 512k hugepages are at PTE level, they are handled more or less like 
CONT_PTE on ARM. see function set_huge_pte_at() for more context.

You can also look at pte_leaf_size() and pgd_leaf_size().

By the way pgd_leaf_size() looks odd because it is called only when 
pgd_leaf_size() returns true, which never happens for 8M pages.

> 
>  As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to
>  create such huge mappings for 4K hash MMUs.  Meanwhile, the major
>  powerpc hugetlb pgtable walker __find_linux_pte() already used
>  pXd_leaf() to check leaf hugetlb mappings.
> 
>  The goal should be that we will have one API pXd_leaf() to detect
>  all kinds of huge mappings except hugepd.  AFAICT we need to use
>  the pXd_leaf() impl (rather than pXd_huge() ones) to make sure
>  ie. THPs on hash MMU will also return true.
> 
> Does this look good to you?
> 
> Thanks,
>

Re: [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 08:50:20AM +, Christophe Leroy wrote:
> 
> 
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu 
> > 
> > Now after we're sure all pXd_huge() definitions are the same as pXd_leaf(),
> > reuse it.  Luckily, pXd_huge() isn't widely used.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >   arch/arm/include/asm/pgtable-3level.h | 2 +-
> >   arch/arm64/include/asm/pgtable.h  | 2 +-
> >   arch/arm64/mm/hugetlbpage.c   | 4 ++--
> >   arch/loongarch/mm/hugetlbpage.c   | 2 +-
> >   arch/mips/mm/tlb-r4k.c| 2 +-
> >   arch/powerpc/mm/pgtable_64.c  | 6 +++---
> >   arch/x86/mm/pgtable.c | 4 ++--
> >   mm/gup.c  | 4 ++--
> >   mm/hmm.c  | 2 +-
> >   mm/memory.c   | 2 +-
> >   10 files changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/pgtable-3level.h 
> > b/arch/arm/include/asm/pgtable-3level.h
> > index e7aecbef75c9..9e3c44f0aea2 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
> >   #define pmd_dirty(pmd)(pmd_isset((pmd), L_PMD_SECT_DIRTY))
> >   
> >   #define pmd_hugewillfault(pmd)(!pmd_young(pmd) || !pmd_write(pmd))
> > -#define pmd_thp_or_huge(pmd)   (pmd_huge(pmd) || pmd_trans_huge(pmd))
> > +#define pmd_thp_or_huge(pmd)   (pmd_leaf(pmd) || pmd_trans_huge(pmd))
> 
> Previous patch said pmd_trans_huge() implies pmd_leaf().

Ah here I remember I kept this arm definition there because I think we
should add a patch to drop pmd_thp_or_huge() completely.  If you won't mind
I can add one more patch instead of doing it here.  Then I keep this patch
purely as a replacement patch without further changes on arch-cleanups.

> 
> Or is that only for GUP ?

I think it should apply to all.

> 
> >   
> >   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >   #define pmd_trans_huge(pmd)   (pmd_val(pmd) && !pmd_table(pmd))
> 
> 
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index c95b9ec5d95f..93aebd9cc130 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
> > start, unsigned long end,
> > return hmm_vma_walk_hole(start, end, -1, walk);
> > }
> >   
> > -   if (pud_huge(pud) && pud_devmap(pud)) {
> > +   if (pud_leaf(pud) && pud_devmap(pud)) {
> 
> Didn't previous patch say devmap implies leaf ? Or is it only for GUP ?

This is an extra safety check that I didn't remove.  Devmap used separate
bits even though I'm not clear on why.  It should still imply a leaf though.

Thanks,

> 
> > unsigned long i, npages, pfn;
> > unsigned int required_fault;
> > unsigned long *hmm_pfns;
> 
> 

-- 
Peter Xu

Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-14 Thread Peter Xu

On Thu, Mar 14, 2024 at 08:45:34AM +, Christophe Leroy wrote:
> 
> 
> Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> > From: Peter Xu 
> > 
> > PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> > constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
> > it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> > it will keep returning false.
> > 
> > As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> > such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
> > pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> > mappings.
> > 
> > The goal should be that we will have one API pXd_leaf() to detect all kinds
> > of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
> > pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> 
> All kinds of huge mappings ?
> 
> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are 
> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages 
> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report 
> those huge pages.

Ah yes, I should always mention this is in the context of leaf huge pages
only.  Are the examples you provided all fall into hugepd category?  If so
I can reword the commit message, as:

As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to
create such huge mappings for 4K hash MMUs.  Meanwhile, the major
powerpc hugetlb pgtable walker __find_linux_pte() already used
pXd_leaf() to check leaf hugetlb mappings.

The goal should be that we will have one API pXd_leaf() to detect
all kinds of huge mappings except hugepd.  AFAICT we need to use
the pXd_leaf() impl (rather than pXd_huge() ones) to make sure
ie. THPs on hash MMU will also return true.

Does this look good to you?

Thanks,

-- 
Peter Xu

[PATCH] net: wan: fsl_qmc_hdlc: Fix module compilation

2024-03-14 Thread Herve Codina

The fsl_qmc_driver does not compile as module:
  error: ‘qmc_hdlc_driver’ undeclared here (not in a function);
405 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
| ^~~

Fix the typo.

Fixes: b40f00ecd463 ("net: wan: Add support for QMC HDLC")
Reported-by: Michael Ellerman 
Closes: https://lore.kernel.org/linux-kernel/87ttl93f7i.fsf@mail.lhotse/
Signed-off-by: Herve Codina 
---
 drivers/net/wan/fsl_qmc_hdlc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wan/fsl_qmc_hdlc.c b/drivers/net/wan/fsl_qmc_hdlc.c
index 960371df470a..f69b1f579a0c 100644
--- a/drivers/net/wan/fsl_qmc_hdlc.c
+++ b/drivers/net/wan/fsl_qmc_hdlc.c
@@ -780,7 +780,7 @@ static const struct of_device_id qmc_hdlc_id_table[] = {
{ .compatible = "fsl,qmc-hdlc" },
{} /* sentinel */
 };
-MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
+MODULE_DEVICE_TABLE(of, qmc_hdlc_id_table);
 
 static struct platform_driver qmc_hdlc_driver = {
.driver = {
-- 
2.44.0

Re: [PATCH v7 1/5] net: wan: Add support for QMC HDLC

2024-03-14 Thread Michael Ellerman

Herve Codina  writes:
> Hi Michael,
>
> On Thu, 14 Mar 2024 10:05:37 +1100
> Michael Ellerman  wrote:
>
>> Hi Herve,
>> 
>> Herve Codina  writes:
> ..
>> This breaks when building as a module (eg. ppc32_allmodconfig):
>> 
>>   In file included from ../include/linux/device/driver.h:21,
>>from ../include/linux/device.h:32,
>>from ../include/linux/dma-mapping.h:8,
>>from ../drivers/net/wan/fsl_qmc_hdlc.c:13:
>>   ../drivers/net/wan/fsl_qmc_hdlc.c:405:25: error: ‘qmc_hdlc_driver’ 
>> undeclared here (not in a function); did you mean ‘qmc_hdlc_probe’?
>> 405 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
>> | ^~~
>> 
>> 
>> IIUIC it should be pointing to the table, not the driver, so:
>> 
>> diff --git a/drivers/net/wan/fsl_qmc_hdlc.c b/drivers/net/wan/fsl_qmc_hdlc.c
>> index 5fd7ed325f5b..705c3681fb92 100644
>> --- a/drivers/net/wan/fsl_qmc_hdlc.c
>> +++ b/drivers/net/wan/fsl_qmc_hdlc.c
>> @@ -402,7 +402,7 @@ static const struct of_device_id qmc_hdlc_id_table[] = {
>> { .compatible = "fsl,qmc-hdlc" },
>> {} /* sentinel */
>>  };
>> -MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
>> +MODULE_DEVICE_TABLE(of, qmc_hdlc_id_table);
>> 
>>  static struct platform_driver qmc_hdlc_driver = {
>> .driver = {
>> 
>> 
>> Which then builds correctly.
>
> My bad, I missed that one.
> I fully agree with your modification.
>
> Do you want me to make a patch (copy/paste of your proposed modification)
> or do you plan to send the patch on your side ?

Yes if you can please turn it into a proper patch and submit it.

No need to add my SoB, it's trivial.

cheers

Re: [PATCH v6 1/9] locking/mutex: introduce devm_mutex_init

2024-03-14 Thread Marek Behún

On Thu, 14 Mar 2024 11:45:23 +0300
George Stark  wrote:

> Using of devm API leads to a certain order of releasing resources.
> So all dependent resources which are not devm-wrapped should be deleted
> with respect to devm-release order. Mutex is one of such objects that
> often is bound to other resources and has no own devm wrapping.
> Since mutex_destroy() actually does nothing in non-debug builds
> frequently calling mutex_destroy() is just ignored which is safe for now
> but wrong formally and can lead to a problem if mutex_destroy() will be
> extended so introduce devm_mutex_init()
> 
> Signed-off-by: George Stark 
> Suggested by-by: Christophe Leroy 

Reviewed-by: Marek Behún

Re: [PATCH v6 0/9] devm_led_classdev_register() usage problem

2024-03-14 Thread Andy Shevchenko

On Thu, Mar 14, 2024 at 10:46 AM George Stark  wrote:
>
> This patch series fixes the problem of devm_led_classdev_register misusing.
>
> The basic problem is described in [1]. Shortly when 
> devm_led_classdev_register()
> is used then led_classdev_unregister() called after driver's remove() 
> callback.
> led_classdev_unregister() calls driver's brightness_set callback and that 
> callback
> may use resources which were destroyed already in driver's remove().
>
> After discussion with maintainers [2] [3] we decided:
> 1) don't touch led subsytem core code and don't remove led_set_brightness() 
> from it

subsystem

> but fix drivers
> 2) don't use devm_led_classdev_unregister
>
> So the solution is to use devm wrappers for all resources
> driver's brightness_set() depends on. And introduce dedicated devm wrapper
> for mutex as it's often used resource.

The leds related changes (except the last one) LGTM, hence FWIW,
Reviewed-by: Andy Shevchenko 
(for patches 2-8)

> [1] 
> https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/
> [2] 
> https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mc132b9b350fa51931b4fcfe14705d9f06e91421f
> [3] 
> https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mdbf572a85c33f869a553caf986b6228bb65c8383

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v6 9/9] leds: powernv: use LED_RETAIN_AT_SHUTDOWN flag for leds

2024-03-14 Thread Andy Shevchenko

On Thu, Mar 14, 2024 at 10:46 AM George Stark  wrote:
>
> This driver wants to keep its LEDs state after module is removed
> and implemented it in its own way. LED subsystem supports dedicated
> flag LED_RETAIN_AT_SHUTDOWN for the same purpose so use the flag
> instead of custom implementation.

So, this change is not related to the main purpose of the series...

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v6 1/9] locking/mutex: introduce devm_mutex_init

2024-03-14 Thread Andy Shevchenko

On Thu, Mar 14, 2024 at 10:46 AM George Stark  wrote:
>
> Using of devm API leads to a certain order of releasing resources.
> So all dependent resources which are not devm-wrapped should be deleted
> with respect to devm-release order. Mutex is one of such objects that
> often is bound to other resources and has no own devm wrapping.
> Since mutex_destroy() actually does nothing in non-debug builds
> frequently calling mutex_destroy() is just ignored which is safe for now
> but wrong formally and can lead to a problem if mutex_destroy() will be
> extended so introduce devm_mutex_init()

Missing period at the end.



> Suggested by-by: Christophe Leroy 

Needs properly spelled tag.

...

> +static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
> +{
> +   /*
> +* When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so

mutex_destroy()

> +* no really need to register it in devm subsystem.

in the devm

> +*/
> +   return 0;
> +}

...

> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Without seeing much context can't say if there is a better (more
ordered) place to squeeze a new header to. Please, check.

...

After addressing the above comments
Reviewed-by: Andy Shevchenko 

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v6 1/9] locking/mutex: introduce devm_mutex_init

2024-03-14 Thread Christophe Leroy



Le 14/03/2024 à 09:45, George Stark a écrit :
> Using of devm API leads to a certain order of releasing resources.
> So all dependent resources which are not devm-wrapped should be deleted
> with respect to devm-release order. Mutex is one of such objects that
> often is bound to other resources and has no own devm wrapping.
> Since mutex_destroy() actually does nothing in non-debug builds
> frequently calling mutex_destroy() is just ignored which is safe for now
> but wrong formally and can lead to a problem if mutex_destroy() will be
> extended so introduce devm_mutex_init()
> 
> Signed-off-by: George Stark 
> Suggested by-by: Christophe Leroy 

s/Suggested by-by/Suggested-by:

Reviewed-by: Christophe Leroy 

> ---
>   include/linux/mutex.h| 27 +++
>   kernel/locking/mutex-debug.c | 11 +++
>   2 files changed, 38 insertions(+)
> 
> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> index 67edc4ca2bee..f57e005ded24 100644
> --- a/include/linux/mutex.h
> +++ b/include/linux/mutex.h
> @@ -22,6 +22,8 @@
>   #include 
>   #include 
>   
> +struct device;
> +
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   # define __DEP_MAP_MUTEX_INITIALIZER(lockname)  \
>   , .dep_map = {  \
> @@ -117,6 +119,31 @@ do { 
> \
>   } while (0)
>   #endif /* CONFIG_PREEMPT_RT */
>   
> +#ifdef CONFIG_DEBUG_MUTEXES
> +
> +int __devm_mutex_init(struct device *dev, struct mutex *lock);
> +
> +#else
> +
> +static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
> +{
> + /*
> +  * When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
> +  * no really need to register it in devm subsystem.
> +  */
> + return 0;
> +}
> +
> +#endif
> +
> +#define devm_mutex_init(dev, mutex)  \
> +({   \
> + typeof(mutex) mutex_ = (mutex); \
> + \
> + mutex_init(mutex_); \
> + __devm_mutex_init(dev, mutex_); \
> +})
> +
>   /*
>* See kernel/locking/mutex.c for detailed documentation of these APIs.
>* Also see Documentation/locking/mutex-design.rst.
> diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
> index bc8abb8549d2..6aa77e3dc82e 100644
> --- a/kernel/locking/mutex-debug.c
> +++ b/kernel/locking/mutex-debug.c
> @@ -19,6 +19,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   
>   #include "mutex.h"
>   
> @@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char *name,
>   lock->magic = lock;
>   }
>   
> +static void devm_mutex_release(void *res)
> +{
> + mutex_destroy(res);
> +}
> +
> +int __devm_mutex_init(struct device *dev, struct mutex *lock)
> +{
> + return devm_add_action_or_reset(dev, devm_mutex_release, lock);
> +}
> +
>   /***
>* mutex_destroy - mark a mutex unusable
>* @lock: the mutex to be destroyed

Re: [PATCH 12/13] mm/treewide: Remove pXd_huge()

2024-03-14 Thread Christophe Leroy



Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> From: Peter Xu 
> 
> This API is not used anymore, drop it for the whole tree.
> 
> Signed-off-by: Peter Xu 
> ---
>   arch/arm/mm/Makefile  |  1 -
>   arch/arm/mm/hugetlbpage.c | 29 ---
>   arch/arm64/mm/hugetlbpage.c   | 10 ---
>   arch/loongarch/mm/hugetlbpage.c   | 10 ---
>   arch/mips/include/asm/pgtable-32.h|  2 +-
>   arch/mips/include/asm/pgtable-64.h|  2 +-
>   arch/mips/mm/hugetlbpage.c| 10 ---
>   arch/parisc/mm/hugetlbpage.c  | 11 ---
>   .../include/asm/book3s/64/pgtable-4k.h| 10 ---
>   .../include/asm/book3s/64/pgtable-64k.h   | 25 
>   arch/powerpc/include/asm/nohash/pgtable.h | 10 ---
>   arch/riscv/mm/hugetlbpage.c   | 10 ---
>   arch/s390/mm/hugetlbpage.c| 10 ---
>   arch/sh/mm/hugetlbpage.c  | 10 ---
>   arch/sparc/mm/hugetlbpage.c   | 10 ---
>   arch/x86/mm/hugetlbpage.c | 16 --
>   include/linux/hugetlb.h   | 24 ---
>   17 files changed, 2 insertions(+), 198 deletions(-)
>   delete mode 100644 arch/arm/mm/hugetlbpage.c
> 

> diff --git a/arch/mips/include/asm/pgtable-32.h 
> b/arch/mips/include/asm/pgtable-32.h
> index 0e196650f4f4..92b7591aac2a 100644
> --- a/arch/mips/include/asm/pgtable-32.h
> +++ b/arch/mips/include/asm/pgtable-32.h
> @@ -129,7 +129,7 @@ static inline int pmd_none(pmd_t pmd)
>   static inline int pmd_bad(pmd_t pmd)
>   {
>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> - /* pmd_huge(pmd) but inline */
> + /* pmd_leaf(pmd) but inline */

Shouldn't this comment have been changed in patch 11 ?

>   if (unlikely(pmd_val(pmd) & _PAGE_HUGE))

Unlike pmd_huge() which is an outline function, pmd_leaf() is a macro so 
it could be used here instead of open coping.

>   return 0;
>   #endif
> diff --git a/arch/mips/include/asm/pgtable-64.h 
> b/arch/mips/include/asm/pgtable-64.h
> index 20ca48c1b606..7c28510b3768 100644
> --- a/arch/mips/include/asm/pgtable-64.h
> +++ b/arch/mips/include/asm/pgtable-64.h
> @@ -245,7 +245,7 @@ static inline int pmd_none(pmd_t pmd)
>   static inline int pmd_bad(pmd_t pmd)
>   {
>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
> - /* pmd_huge(pmd) but inline */
> + /* pmd_leaf(pmd) but inline */

Same

>   if (unlikely(pmd_val(pmd) & _PAGE_HUGE))

Same

>   return 0;
>   #endif

> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> index 2fce3498b000..579a7153857f 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-64k.h
> @@ -4,31 +4,6 @@
>   
>   #ifndef __ASSEMBLY__
>   #ifdef CONFIG_HUGETLB_PAGE
> -/*
> - * We have PGD_INDEX_SIZ = 12 and PTE_INDEX_SIZE = 8, so that we can have
> - * 16GB hugepage pte in PGD and 16MB hugepage pte at PMD;
> - *
> - * Defined in such a way that we can optimize away code block at build time
> - * if CONFIG_HUGETLB_PAGE=n.
> - *
> - * returns true for pmd migration entries, THP, devmap, hugetlb
> - * But compile time dependent on CONFIG_HUGETLB_PAGE
> - */

Should we keep this comment somewhere for documentation ?

> -static inline int pmd_huge(pmd_t pmd)
> -{
> - /*
> -  * leaf pte for huge page
> -  */
> - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> -static inline int pud_huge(pud_t pud)
> -{
> - /*
> -  * leaf pte for huge page
> -  */
> - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -}
>   
>   /*
>* With 64k page size, we have hugepage ptes in the pgd and pmd entries. We 
> don't

Re: [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()

2024-03-14 Thread Christophe Leroy



Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> From: Peter Xu 
> 
> Now after we're sure all pXd_huge() definitions are the same as pXd_leaf(),
> reuse it.  Luckily, pXd_huge() isn't widely used.
> 
> Signed-off-by: Peter Xu 
> ---
>   arch/arm/include/asm/pgtable-3level.h | 2 +-
>   arch/arm64/include/asm/pgtable.h  | 2 +-
>   arch/arm64/mm/hugetlbpage.c   | 4 ++--
>   arch/loongarch/mm/hugetlbpage.c   | 2 +-
>   arch/mips/mm/tlb-r4k.c| 2 +-
>   arch/powerpc/mm/pgtable_64.c  | 6 +++---
>   arch/x86/mm/pgtable.c | 4 ++--
>   mm/gup.c  | 4 ++--
>   mm/hmm.c  | 2 +-
>   mm/memory.c   | 2 +-
>   10 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/include/asm/pgtable-3level.h 
> b/arch/arm/include/asm/pgtable-3level.h
> index e7aecbef75c9..9e3c44f0aea2 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -190,7 +190,7 @@ static inline pte_t pte_mkspecial(pte_t pte)
>   #define pmd_dirty(pmd)  (pmd_isset((pmd), L_PMD_SECT_DIRTY))
>   
>   #define pmd_hugewillfault(pmd)  (!pmd_young(pmd) || !pmd_write(pmd))
> -#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
> +#define pmd_thp_or_huge(pmd) (pmd_leaf(pmd) || pmd_trans_huge(pmd))

Previous patch said pmd_trans_huge() implies pmd_leaf().

Or is that only for GUP ?

>   
>   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   #define pmd_trans_huge(pmd) (pmd_val(pmd) && !pmd_table(pmd))


> diff --git a/mm/hmm.c b/mm/hmm.c
> index c95b9ec5d95f..93aebd9cc130 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -429,7 +429,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
> start, unsigned long end,
>   return hmm_vma_walk_hole(start, end, -1, walk);
>   }
>   
> - if (pud_huge(pud) && pud_devmap(pud)) {
> + if (pud_leaf(pud) && pud_devmap(pud)) {

Didn't previous patch say devmap implies leaf ? Or is it only for GUP ?

>   unsigned long i, npages, pfn;
>   unsigned int required_fault;
>   unsigned long *hmm_pfns;

[PATCH v6 9/9] leds: powernv: use LED_RETAIN_AT_SHUTDOWN flag for leds

2024-03-14 Thread George Stark

This driver wants to keep its LEDs state after module is removed
and implemented it in its own way. LED subsystem supports dedicated
flag LED_RETAIN_AT_SHUTDOWN for the same purpose so use the flag
instead of custom implementation.

Signed-off-by: George Stark 
---
 drivers/leds/leds-powernv.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c
index 4f01acb75727..9c6fb7d6e0e7 100644
--- a/drivers/leds/leds-powernv.c
+++ b/drivers/leds/leds-powernv.c
@@ -30,15 +30,6 @@ static const struct led_type_map led_type_map[] = {
 };
 
 struct powernv_led_common {
-   /*
-* By default unload path resets all the LEDs. But on PowerNV
-* platform we want to retain LED state across reboot as these
-* are controlled by firmware. Also service processor can modify
-* the LEDs independent of OS. Hence avoid resetting LEDs in
-* unload path.
-*/
-   boolled_disabled;
-
/* Max supported LED type */
__be64  max_led_type;
 
@@ -178,10 +169,6 @@ static int powernv_brightness_set(struct led_classdev 
*led_cdev,
struct powernv_led_common *powernv_led_common = powernv_led->common;
int rc;
 
-   /* Do not modify LED in unload path */
-   if (powernv_led_common->led_disabled)
-   return 0;
-
mutex_lock(_led_common->lock);
rc = powernv_led_set(powernv_led, value);
mutex_unlock(_led_common->lock);
@@ -225,6 +212,14 @@ static int powernv_led_create(struct device *dev,
 
powernv_led->cdev.brightness_set_blocking = powernv_brightness_set;
powernv_led->cdev.brightness_get = powernv_brightness_get;
+   /*
+* By default unload path resets all the LEDs. But on PowerNV
+* platform we want to retain LED state across reboot as these
+* are controlled by firmware. Also service processor can modify
+* the LEDs independent of OS. Hence avoid resetting LEDs in
+* unload path.
+*/
+   powernv_led->cdev.flags = LED_RETAIN_AT_SHUTDOWN;
powernv_led->cdev.brightness = LED_OFF;
powernv_led->cdev.max_brightness = LED_FULL;
 
@@ -313,9 +308,7 @@ static void powernv_led_remove(struct platform_device *pdev)
 {
struct powernv_led_common *powernv_led_common;
 
-   /* Disable LED operation */
powernv_led_common = platform_get_drvdata(pdev);
-   powernv_led_common->led_disabled = true;
 
/* Destroy lock */
mutex_destroy(_led_common->lock);
-- 
2.25.1

[PATCH v6 7/9] leds: mlxreg: use devm_mutex_init for mutex initializtion

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses mutex which was destroyed already
in module's remove() so use devm API instead.

Signed-off-by: George Stark 
---
 drivers/leds/leds-mlxreg.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/leds/leds-mlxreg.c b/drivers/leds/leds-mlxreg.c
index d8e3d5d8d2d0..b1510cd32e47 100644
--- a/drivers/leds/leds-mlxreg.c
+++ b/drivers/leds/leds-mlxreg.c
@@ -257,6 +257,7 @@ static int mlxreg_led_probe(struct platform_device *pdev)
 {
struct mlxreg_core_platform_data *led_pdata;
struct mlxreg_led_priv_data *priv;
+   int err;
 
led_pdata = dev_get_platdata(>dev);
if (!led_pdata) {
@@ -268,26 +269,21 @@ static int mlxreg_led_probe(struct platform_device *pdev)
if (!priv)
return -ENOMEM;
 
-   mutex_init(>access_lock);
+   err = devm_mutex_init(>dev, >access_lock);
+   if (err)
+   return err;
+
priv->pdev = pdev;
priv->pdata = led_pdata;
 
return mlxreg_led_config(priv);
 }
 
-static void mlxreg_led_remove(struct platform_device *pdev)
-{
-   struct mlxreg_led_priv_data *priv = dev_get_drvdata(>dev);
-
-   mutex_destroy(>access_lock);
-}
-
 static struct platform_driver mlxreg_led_driver = {
.driver = {
.name = "leds-mlxreg",
},
.probe = mlxreg_led_probe,
-   .remove_new = mlxreg_led_remove,
 };
 
 module_platform_driver(mlxreg_led_driver);
-- 
2.25.1

[PATCH v6 8/9] leds: an30259a: use devm_mutext_init for mutext initialization

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses mutex which was destroyed already
in module's remove() so use devm API instead.

Signed-off-by: George Stark 
---
 drivers/leds/leds-an30259a.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/leds/leds-an30259a.c b/drivers/leds/leds-an30259a.c
index 0216afed3b6e..decfca447d8a 100644
--- a/drivers/leds/leds-an30259a.c
+++ b/drivers/leds/leds-an30259a.c
@@ -283,7 +283,10 @@ static int an30259a_probe(struct i2c_client *client)
if (err < 0)
return err;
 
-   mutex_init(>mutex);
+   err = devm_mutex_init(>dev, >mutex);
+   if (err)
+   return err;
+
chip->client = client;
i2c_set_clientdata(client, chip);
 
@@ -317,17 +320,9 @@ static int an30259a_probe(struct i2c_client *client)
return 0;
 
 exit:
-   mutex_destroy(>mutex);
return err;
 }
 
-static void an30259a_remove(struct i2c_client *client)
-{
-   struct an30259a *chip = i2c_get_clientdata(client);
-
-   mutex_destroy(>mutex);
-}
-
 static const struct of_device_id an30259a_match_table[] = {
{ .compatible = "panasonic,an30259a", },
{ /* sentinel */ },
@@ -347,7 +342,6 @@ static struct i2c_driver an30259a_driver = {
.of_match_table = an30259a_match_table,
},
.probe = an30259a_probe,
-   .remove = an30259a_remove,
.id_table = an30259a_id,
 };
 
-- 
2.25.1

[PATCH v6 5/9] leds: lm3532: use devm API to cleanup module's resources

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
---
 drivers/leds/leds-lm3532.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/leds/leds-lm3532.c b/drivers/leds/leds-lm3532.c
index 13662a4aa1f2..aa7966eb506f 100644
--- a/drivers/leds/leds-lm3532.c
+++ b/drivers/leds/leds-lm3532.c
@@ -542,6 +542,13 @@ static int lm3532_parse_als(struct lm3532_data *priv)
return ret;
 }
 
+static void gpio_set_low_action(void *data)
+{
+   struct lm3532_data *priv = (struct lm3532_data *)data;
+
+   gpiod_direction_output(priv->enable_gpio, 0);
+}
+
 static int lm3532_parse_node(struct lm3532_data *priv)
 {
struct fwnode_handle *child = NULL;
@@ -556,6 +563,12 @@ static int lm3532_parse_node(struct lm3532_data *priv)
if (IS_ERR(priv->enable_gpio))
priv->enable_gpio = NULL;
 
+   if (priv->enable_gpio) {
+   ret = devm_add_action(>client->dev, gpio_set_low_action, 
priv);
+   if (ret)
+   return ret;
+   }
+
priv->regulator = devm_regulator_get(>client->dev, "vin");
if (IS_ERR(priv->regulator))
priv->regulator = NULL;
@@ -691,7 +704,10 @@ static int lm3532_probe(struct i2c_client *client)
return ret;
}
 
-   mutex_init(>lock);
+   ret = devm_mutex_init(>dev, >lock);
+   if (ret)
+   return ret;
+
i2c_set_clientdata(client, drvdata);
 
ret = lm3532_parse_node(drvdata);
@@ -703,16 +719,6 @@ static int lm3532_probe(struct i2c_client *client)
return ret;
 }
 
-static void lm3532_remove(struct i2c_client *client)
-{
-   struct lm3532_data *drvdata = i2c_get_clientdata(client);
-
-   mutex_destroy(>lock);
-
-   if (drvdata->enable_gpio)
-   gpiod_direction_output(drvdata->enable_gpio, 0);
-}
-
 static const struct of_device_id of_lm3532_leds_match[] = {
{ .compatible = "ti,lm3532", },
{},
@@ -727,7 +733,6 @@ MODULE_DEVICE_TABLE(i2c, lm3532_id);
 
 static struct i2c_driver lm3532_i2c_driver = {
.probe = lm3532_probe,
-   .remove = lm3532_remove,
.id_table = lm3532_id,
.driver = {
.name = LM3532_NAME,
-- 
2.25.1

[PATCH v6 3/9] leds: aw200xx: use devm API to cleanup module's resources

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
---
 drivers/leds/leds-aw200xx.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/leds/leds-aw200xx.c b/drivers/leds/leds-aw200xx.c
index f584a7f98fc5..5cba52d07b38 100644
--- a/drivers/leds/leds-aw200xx.c
+++ b/drivers/leds/leds-aw200xx.c
@@ -530,6 +530,16 @@ static const struct regmap_config aw200xx_regmap_config = {
.disable_locking = true,
 };
 
+static void aw200xx_chip_reset_action(void *data)
+{
+   aw200xx_chip_reset(data);
+}
+
+static void aw200xx_disable_action(void *data)
+{
+   aw200xx_disable(data);
+}
+
 static int aw200xx_probe(struct i2c_client *client)
 {
const struct aw200xx_chipdef *cdef;
@@ -568,11 +578,17 @@ static int aw200xx_probe(struct i2c_client *client)
 
aw200xx_enable(chip);
 
+   ret = devm_add_action(>dev, aw200xx_disable_action, chip);
+   if (ret)
+   return ret;
+
ret = aw200xx_chip_check(chip);
if (ret)
return ret;
 
-   mutex_init(>mutex);
+   ret = devm_mutex_init(>dev, >mutex);
+   if (ret)
+   return ret;
 
/* Need a lock now since after call aw200xx_probe_fw, sysfs nodes 
created */
mutex_lock(>mutex);
@@ -581,6 +597,10 @@ static int aw200xx_probe(struct i2c_client *client)
if (ret)
goto out_unlock;
 
+   ret = devm_add_action(>dev, aw200xx_chip_reset_action, chip);
+   if (ret)
+   goto out_unlock;
+
ret = aw200xx_probe_fw(>dev, chip);
if (ret)
goto out_unlock;
@@ -595,15 +615,6 @@ static int aw200xx_probe(struct i2c_client *client)
return ret;
 }
 
-static void aw200xx_remove(struct i2c_client *client)
-{
-   struct aw200xx *chip = i2c_get_clientdata(client);
-
-   aw200xx_chip_reset(chip);
-   aw200xx_disable(chip);
-   mutex_destroy(>mutex);
-}
-
 static const struct aw200xx_chipdef aw20036_cdef = {
.channels = 36,
.display_size_rows_max = 3,
@@ -652,7 +663,6 @@ static struct i2c_driver aw200xx_driver = {
.of_match_table = aw200xx_match_table,
},
.probe = aw200xx_probe,
-   .remove = aw200xx_remove,
.id_table = aw200xx_id,
 };
 module_i2c_driver(aw200xx_driver);
-- 
2.25.1

[PATCH v6 6/9] leds: nic78bx: use devm API to cleanup module's resources

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
---
 drivers/leds/leds-nic78bx.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/leds/leds-nic78bx.c b/drivers/leds/leds-nic78bx.c
index a86b43dd995e..f3049fa14f04 100644
--- a/drivers/leds/leds-nic78bx.c
+++ b/drivers/leds/leds-nic78bx.c
@@ -118,6 +118,15 @@ static struct nic78bx_led nic78bx_leds[] = {
}
 };
 
+static void lock_led_reg_action(void *data)
+{
+   struct nic78bx_led_data *led_data = (struct nic78bx_led_data *)data;
+
+   /* Lock LED register */
+   outb(NIC78BX_LOCK_VALUE,
+led_data->io_base + NIC78BX_LOCK_REG_OFFSET);
+}
+
 static int nic78bx_probe(struct platform_device *pdev)
 {
struct device *dev = >dev;
@@ -152,6 +161,10 @@ static int nic78bx_probe(struct platform_device *pdev)
led_data->io_base = io_rc->start;
spin_lock_init(_data->lock);
 
+   ret = devm_add_action(dev, lock_led_reg_action, led_data);
+   if (ret)
+   return ret;
+
for (i = 0; i < ARRAY_SIZE(nic78bx_leds); i++) {
nic78bx_leds[i].data = led_data;
 
@@ -167,15 +180,6 @@ static int nic78bx_probe(struct platform_device *pdev)
return ret;
 }
 
-static void nic78bx_remove(struct platform_device *pdev)
-{
-   struct nic78bx_led_data *led_data = platform_get_drvdata(pdev);
-
-   /* Lock LED register */
-   outb(NIC78BX_LOCK_VALUE,
-led_data->io_base + NIC78BX_LOCK_REG_OFFSET);
-}
-
 static const struct acpi_device_id led_device_ids[] = {
{"NIC78B3", 0},
{"", 0},
@@ -184,7 +188,6 @@ MODULE_DEVICE_TABLE(acpi, led_device_ids);
 
 static struct platform_driver led_driver = {
.probe = nic78bx_probe,
-   .remove_new = nic78bx_remove,
.driver = {
.name = KBUILD_MODNAME,
.acpi_match_table = ACPI_PTR(led_device_ids),
-- 
2.25.1

[PATCH v6 1/9] locking/mutex: introduce devm_mutex_init

2024-03-14 Thread George Stark

Using of devm API leads to a certain order of releasing resources.
So all dependent resources which are not devm-wrapped should be deleted
with respect to devm-release order. Mutex is one of such objects that
often is bound to other resources and has no own devm wrapping.
Since mutex_destroy() actually does nothing in non-debug builds
frequently calling mutex_destroy() is just ignored which is safe for now
but wrong formally and can lead to a problem if mutex_destroy() will be
extended so introduce devm_mutex_init()

Signed-off-by: George Stark 
Suggested by-by: Christophe Leroy 
---
 include/linux/mutex.h| 27 +++
 kernel/locking/mutex-debug.c | 11 +++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 67edc4ca2bee..f57e005ded24 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+struct device;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
, .dep_map = {  \
@@ -117,6 +119,31 @@ do {   
\
 } while (0)
 #endif /* CONFIG_PREEMPT_RT */
 
+#ifdef CONFIG_DEBUG_MUTEXES
+
+int __devm_mutex_init(struct device *dev, struct mutex *lock);
+
+#else
+
+static inline int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   /*
+* When CONFIG_DEBUG_MUTEXES is off mutex_destroy is just a nop so
+* no really need to register it in devm subsystem.
+*/
+   return 0;
+}
+
+#endif
+
+#define devm_mutex_init(dev, mutex)\
+({ \
+   typeof(mutex) mutex_ = (mutex); \
+   \
+   mutex_init(mutex_); \
+   __devm_mutex_init(dev, mutex_); \
+})
+
 /*
  * See kernel/locking/mutex.c for detailed documentation of these APIs.
  * Also see Documentation/locking/mutex-design.rst.
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index bc8abb8549d2..6aa77e3dc82e 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mutex.h"
 
@@ -89,6 +90,16 @@ void debug_mutex_init(struct mutex *lock, const char *name,
lock->magic = lock;
 }
 
+static void devm_mutex_release(void *res)
+{
+   mutex_destroy(res);
+}
+
+int __devm_mutex_init(struct device *dev, struct mutex *lock)
+{
+   return devm_add_action_or_reset(dev, devm_mutex_release, lock);
+}
+
 /***
  * mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
-- 
2.25.1

[PATCH v6 4/9] leds: lp3952: use devm API to cleanup module's resources

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().
Also drop explicit turning LEDs off from remove() due to they will be off
anyway by led_classdev_unregister().

Signed-off-by: George Stark 
---
 drivers/leds/leds-lp3952.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/leds/leds-lp3952.c b/drivers/leds/leds-lp3952.c
index 5d18bbfd1f23..e24bfe686312 100644
--- a/drivers/leds/leds-lp3952.c
+++ b/drivers/leds/leds-lp3952.c
@@ -207,6 +207,13 @@ static const struct regmap_config lp3952_regmap = {
.cache_type = REGCACHE_MAPLE,
 };
 
+static void gpio_set_low_action(void *data)
+{
+   struct lp3952_led_array *priv = (struct lp3952_led_array *)data;
+
+   gpiod_set_value(priv->enable_gpio, 0);
+}
+
 static int lp3952_probe(struct i2c_client *client)
 {
int status;
@@ -226,6 +233,10 @@ static int lp3952_probe(struct i2c_client *client)
return status;
}
 
+   status = devm_add_action(>dev, gpio_set_low_action, priv);
+   if (status)
+   return status;
+
priv->regmap = devm_regmap_init_i2c(client, _regmap);
if (IS_ERR(priv->regmap)) {
int err = PTR_ERR(priv->regmap);
@@ -254,15 +265,6 @@ static int lp3952_probe(struct i2c_client *client)
return 0;
 }
 
-static void lp3952_remove(struct i2c_client *client)
-{
-   struct lp3952_led_array *priv;
-
-   priv = i2c_get_clientdata(client);
-   lp3952_on_off(priv, LP3952_LED_ALL, false);
-   gpiod_set_value(priv->enable_gpio, 0);
-}
-
 static const struct i2c_device_id lp3952_id[] = {
{LP3952_NAME, 0},
{}
@@ -274,7 +276,6 @@ static struct i2c_driver lp3952_i2c_driver = {
.name = LP3952_NAME,
},
.probe = lp3952_probe,
-   .remove = lp3952_remove,
.id_table = lp3952_id,
 };
 
-- 
2.25.1

[PATCH v6 0/9] devm_led_classdev_register() usage problem

2024-03-14 Thread George Stark

This patch series fixes the problem of devm_led_classdev_register misusing.

The basic problem is described in [1]. Shortly when devm_led_classdev_register()
is used then led_classdev_unregister() called after driver's remove() callback.
led_classdev_unregister() calls driver's brightness_set callback and that 
callback
may use resources which were destroyed already in driver's remove().

After discussion with maintainers [2] [3] we decided:
1) don't touch led subsytem core code and don't remove led_set_brightness() 
from it
but fix drivers
2) don't use devm_led_classdev_unregister

So the solution is to use devm wrappers for all resources
driver's brightness_set() depends on. And introduce dedicated devm wrapper
for mutex as it's often used resource.

[1] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/
[2] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mc132b9b350fa51931b4fcfe14705d9f06e91421f
[3] 
https://lore.kernel.org/lkml/8704539b-ed3b-44e6-aa82-586e2f895...@salutedevices.com/T/#mdbf572a85c33f869a553caf986b6228bb65c8383

Changelog:
v1->v2:
revise patch series completely

v2->v3:
locking: add define if mutex_destroy() is not an empty function
new patch, discussed here [8]

devm-helpers: introduce devm_mutex_init
previous version [4]
- revise code based on mutex_destroy define
- update commit message
- update devm_mutex_init()'s description

leds: aw2013: unlock mutex before destroying it
previous version [5]
- make this patch first in the series
- add tags Fixes and RvB by Andy

leds: aw2013: use devm API to cleanup module's resources
previous version [6]
- make aw2013_chip_disable_action()'s body oneline
- don't shadow devm_mutex_init() return code

leds: aw200xx: use devm API to cleanup module's resources
previous version [7]
- make aw200xx_*_action()'s bodies oneline
- don't shadow devm_mutex_init() return code

leds: lm3532: use devm API to cleanup module's resources
leds: nic78bx: use devm API to cleanup module's resources
leds: mlxreg: use devm_mutex_init for mutex initializtion
leds: an30259a: use devm_mutext_init for mutext initialization
leds: powernv: add LED_RETAIN_AT_SHUTDOWN flag for leds
- those pathes were planned but not sent in the series #2 due to mail 
server
problem on my side. I revised them according to the comments.

v3->v4:
locking: introduce devm_mutex_init
new patch
- move devm_mutex_init implementation completely from devm-helpers.h to 
mutex.h

locking: add define if mutex_destroy() is not an empty function
drop the patch [9]

devm-helpers: introduce devm_mutex_init
drop the patch [10]

leds: aw2013: use devm API to cleanup module's resources
- add tag Tested-by: Nikita Travkin 

v4->v5:
leds: aw2013: unlock mutex before destroying it
merged separately and removed from the series

locking/mutex: move mutex_destroy() definition lower
introduce optional refactoring patch

locking/mutex: introduce devm_mutex_init
leave only one devm_mutex_init definition
add tag Signed-off-by: Christophe Leroy 

leds* patches
remove #include  due to devm_mutext_init in 
mutex.h now

v5->v6:
locking/mutex: move mutex_destroy() definition lower [11]
drop the patch due to devm_mutex_init block is big enough to be 
declared standalone.

locking/mutex: introduce devm_mutex_init
redesign devm_mutex_init function to macro to keep lockdep feature 
working
use typeof to redeclare mutex var in macro body (thanks to checkpatch)
previous version [12]

[4] 
https://lore.kernel.org/lkml/20231204180603.470421-1-gnst...@salutedevices.com/T/#mf500af0eda2a9ffc95594607dbe4cb64f2e3c9a8
[5] 
https://lore.kernel.org/lkml/20231204180603.470421-1-gnst...@salutedevices.com/T/#mc92df4fb4f7d4187fb01cc1144acfa5fb5230dd2
[6] 
https://lore.kernel.org/lkml/20231204180603.470421-1-gnst...@salutedevices.com/T/#m300df89710c43cc2ab598baa16c68dd0a0d7d681
[7] 
https://lore.kernel.org/lkml/20231204180603.470421-1-gnst...@salutedevices.com/T/#m8e5c65e0c6b137c91fa00bb9320ad581164d1d0b
[8] 
https://lore.kernel.org/lkml/377e4437-7051-4d88-ae68-1460bcd69...@redhat.com/T/#m5f84a4a2f387d49678783e652b9e658e02c27450
[9] 
https://lore.kernel.org/lkml/20231213223020.2713164-1-gnst...@salutedevices.com/T/#m19ad1fc04c560012c1e27418e3156d0c9306dd84
[10] 
https://lore.kernel.org/lkml/20231213223020.2713164-1-gnst...@salutedevices.com/T/#m63126025f5d1bdcef69bcad50f2e58274d42e2d
[11] 
https://lore.kernel.org/lkml/20240307024034.1548605-2-gnst...@salutedevices.com/
[12] 
https://lore.kernel.org/lkml/20240307024034.1548605-3-gnst...@salutedevices.com/

George Stark (9):
  locking/mutex: introduce devm_mutex_init
  leds: aw2013: use devm API to cleanup module's resources
  leds: aw200xx: use devm API to cleanup module's

[PATCH v6 2/9] leds: aw2013: use devm API to cleanup module's resources

2024-03-14 Thread George Stark

In this driver LEDs are registered using devm_led_classdev_register()
so they are automatically unregistered after module's remove() is done.
led_classdev_unregister() calls module's led_set_brightness() to turn off
the LEDs and that callback uses resources which were destroyed already
in module's remove() so use devm API instead of remove().

Signed-off-by: George Stark 
Tested-by: Nikita Travkin 
---
 drivers/leds/leds-aw2013.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/leds/leds-aw2013.c b/drivers/leds/leds-aw2013.c
index 17235a5e576a..6475eadcb0df 100644
--- a/drivers/leds/leds-aw2013.c
+++ b/drivers/leds/leds-aw2013.c
@@ -320,6 +320,11 @@ static int aw2013_probe_dt(struct aw2013 *chip)
return 0;
 }
 
+static void aw2013_chip_disable_action(void *data)
+{
+   aw2013_chip_disable(data);
+}
+
 static const struct regmap_config aw2013_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -336,7 +341,10 @@ static int aw2013_probe(struct i2c_client *client)
if (!chip)
return -ENOMEM;
 
-   mutex_init(>mutex);
+   ret = devm_mutex_init(>dev, >mutex);
+   if (ret)
+   return ret;
+
mutex_lock(>mutex);
 
chip->client = client;
@@ -384,6 +392,10 @@ static int aw2013_probe(struct i2c_client *client)
goto error_reg;
}
 
+   ret = devm_add_action(>dev, aw2013_chip_disable_action, chip);
+   if (ret)
+   goto error_reg;
+
ret = aw2013_probe_dt(chip);
if (ret < 0)
goto error_reg;
@@ -406,19 +418,9 @@ static int aw2013_probe(struct i2c_client *client)
 
 error:
mutex_unlock(>mutex);
-   mutex_destroy(>mutex);
return ret;
 }
 
-static void aw2013_remove(struct i2c_client *client)
-{
-   struct aw2013 *chip = i2c_get_clientdata(client);
-
-   aw2013_chip_disable(chip);
-
-   mutex_destroy(>mutex);
-}
-
 static const struct of_device_id aw2013_match_table[] = {
{ .compatible = "awinic,aw2013", },
{ /* sentinel */ },
@@ -432,7 +434,6 @@ static struct i2c_driver aw2013_driver = {
.of_match_table = aw2013_match_table,
},
.probe = aw2013_probe,
-   .remove = aw2013_remove,
 };
 
 module_i2c_driver(aw2013_driver);
-- 
2.25.1

Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-14 Thread Christophe Leroy



Le 13/03/2024 à 22:47, pet...@redhat.com a écrit :
> From: Peter Xu 
> 
> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> it will keep returning false.
> 
> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> mappings.
> 
> The goal should be that we will have one API pXd_leaf() to detect all kinds
> of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.

All kinds of huge mappings ?

pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are 
also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages 
and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report 
those huge pages.

> 
> This helps to simplify a follow up patch to drop pXd_huge() treewide.
> 
> NOTE: *_leaf() definition need to be moved before the inclusion of
> asm/book3s/64/pgtable-4k.h, which defines pXd_huge() with it.
> 
> [1] https://lore.kernel.org/r/87v85zo6w7.fsf@mail.lhotse
> 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: "Aneesh Kumar K.V" 
> Cc: "Naveen N. Rao" 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Peter Xu 
> ---
>   .../include/asm/book3s/64/pgtable-4k.h| 14 ++
>   arch/powerpc/include/asm/book3s/64/pgtable.h  | 27 +--
>   2 files changed, 14 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> index 48f21820afe2..92545981bb49 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable-4k.h
> @@ -8,22 +8,12 @@
>   #ifdef CONFIG_HUGETLB_PAGE
>   static inline int pmd_huge(pmd_t pmd)
>   {
> - /*
> -  * leaf pte for huge page
> -  */
> - if (radix_enabled())
> - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> - return 0;
> + return pmd_leaf(pmd);
>   }
>   
>   static inline int pud_huge(pud_t pud)
>   {
> - /*
> -  * leaf pte for huge page
> -  */
> - if (radix_enabled())
> - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> - return 0;
> + return pud_leaf(pud);
>   }
>   
>   /*
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index df66dce8306f..fd7180fded75 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -262,6 +262,18 @@ extern unsigned long __kernel_io_end;
>   
>   extern struct page *vmemmap;
>   extern unsigned long pci_io_base;
> +
> +#define pmd_leaf pmd_leaf
> +static inline bool pmd_leaf(pmd_t pmd)
> +{
> + return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> +}
> +
> +#define pud_leaf pud_leaf
> +static inline bool pud_leaf(pud_t pud)
> +{
> + return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> +}
>   #endif /* __ASSEMBLY__ */
>   
>   #include 
> @@ -1436,20 +1448,5 @@ static inline bool is_pte_rw_upgrade(unsigned long 
> old_val, unsigned long new_va
>   return false;
>   }
>   
> -/*
> - * Like pmd_huge(), but works regardless of config options
> - */
> -#define pmd_leaf pmd_leaf
> -static inline bool pmd_leaf(pmd_t pmd)
> -{
> - return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
> -#define pud_leaf pud_leaf
> -static inline bool pud_leaf(pud_t pud)
> -{
> - return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> -}
> -
>   #endif /* __ASSEMBLY__ */
>   #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */

Re: [PATCH 11/14] s390: Add support for suppressing warning backtraces

2024-03-14 Thread Geert Uytterhoeven

Hi Günter,

On Tue, Mar 12, 2024 at 6:06 PM Guenter Roeck  wrote:
> Add name of functions triggering warning backtraces to the __bug_table
> object section to enable support for suppressing WARNING backtraces.
>
> To limit image size impact, the pointer to the function name is only added
> to the __bug_table section if both CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE
> are enabled. Otherwise, the __func__ assembly parameter is replaced with a
> (dummy) NULL parameter to avoid an image size increase due to unused
> __func__ entries (this is necessary because __func__ is not a define but a
> virtual variable).
>
> Signed-off-by: Guenter Roeck 

Thanks for your patch!

> --- a/arch/s390/include/asm/bug.h
> +++ b/arch/s390/include/asm/bug.h
> @@ -8,19 +8,30 @@
>
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
>
> +#if IS_ENABLED(CONFIG_KUNIT)
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR"   .long   %0-.\n"
> +# define __BUG_FUNC__func__
> +#else
> +# define __BUG_FUNC_PTR
> +# define __BUG_FUNCNULL
> +#endif /* IS_ENABLED(CONFIG_KUNIT) */
> +
>  #define __EMIT_BUG(x) do { \
> asm_inline volatile(\
> "0: mc  0,0\n"  \
> ".section .rodata.str,\"aMS\",@progbits,1\n"\
> "1: .asciz  \""__FILE__"\"\n"   \
> ".previous\n"   \
> -   ".section __bug_table,\"awM\",@progbits,%2\n"   \
> +   ".section __bug_table,\"awM\",@progbits,%3\n"   \

This change conflicts with commit 3938490e78f443fb ("s390/bug:
remove entry size from __bug_table section") in linus/master.
I guess it should just be dropped?

> "2: .long   0b-.\n" \
> "   .long   1b-.\n" \
> -   "   .short  %0,%1\n"\
> -   "   .org2b+%2\n"\
> +   __BUG_FUNC_PTR  \
> +   "   .short  %1,%2\n"\
> +   "   .org2b+%3\n"\
> ".previous\n"   \
> -   : : "i" (__LINE__), \
> +   : : "i" (__BUG_FUNC),   \
> +   "i" (__LINE__), \
> "i" (x),\
> "i" (sizeof(struct bug_entry)));\
>  } while (0)

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH v10 8/8] PCI: dwc: ep: Remove "core_init_notifier" flag

2024-03-14 Thread Manivannan Sadhasivam

"core_init_notifier" flag is set by the glue drivers requiring refclk from
the host to complete the DWC core initialization. Also, those drivers will
send a notification to the EPF drivers once the initialization is fully
completed using the pci_epc_init_notify() API. Only then, the EPF drivers
will start functioning.

For the rest of the drivers generating refclk locally, EPF drivers will
start functioning post binding with them. EPF drivers rely on the
'core_init_notifier' flag to differentiate between the drivers.
Unfortunately, this creates two different flows for the EPF drivers.

So to avoid that, let's get rid of the "core_init_notifier" flag and follow
a single initialization flow for the EPF drivers. This is done by calling
the dw_pcie_ep_init_notify() from all glue drivers after the completion of
dw_pcie_ep_init_registers() API. This will allow all the glue drivers to
send the notification to the EPF drivers once the initialization is fully
completed.

Only difference here is that, the drivers requiring refclk from host will
send the notification once refclk is received, while others will send it
during probe time itself.

But this also requires the EPC core driver to deliver the notification
after EPF driver bind. Because, the glue driver can send the notification
before the EPF drivers bind() and in those cases the EPF drivers will miss
the event. To accommodate this, EPC core is now caching the state of the
EPC initialization in 'init_complete' flag and pci-ep-cfs driver sends the
notification to EPF drivers based on that after each EPF driver bind.

Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  2 ++
 drivers/pci/controller/dwc/pci-imx6.c |  2 ++
 drivers/pci/controller/dwc/pci-keystone.c |  2 ++
 drivers/pci/controller/dwc/pci-layerscape-ep.c|  2 ++
 drivers/pci/controller/dwc/pcie-artpec6.c |  2 ++
 drivers/pci/controller/dwc/pcie-designware-plat.c |  2 ++
 drivers/pci/controller/dwc/pcie-keembay.c |  2 ++
 drivers/pci/controller/dwc/pcie-qcom-ep.c |  1 -
 drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  2 ++
 drivers/pci/controller/dwc/pcie-tegra194.c|  1 -
 drivers/pci/controller/dwc/pcie-uniphier-ep.c |  2 ++
 drivers/pci/endpoint/functions/pci-epf-test.c | 18 +-
 drivers/pci/endpoint/pci-ep-cfs.c |  9 +
 drivers/pci/endpoint/pci-epc-core.c   | 22 ++
 include/linux/pci-epc.h   |  7 ---
 15 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 395042b29ffc..d2d17d37d3e0 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -474,6 +474,8 @@ static int dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx,
return ret;
}
 
+   dw_pcie_ep_init_notify(ep);
+
return 0;
 }
 
diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
b/drivers/pci/controller/dwc/pci-imx6.c
index bfcafa440ddb..894b5de76e3a 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1144,6 +1144,8 @@ static int imx6_add_pcie_ep(struct imx6_pcie *imx6_pcie,
return ret;
}
 
+   dw_pcie_ep_init_notify(ep);
+
/* Start LTSSM. */
imx6_pcie_ltssm_enable(dev);
 
diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
b/drivers/pci/controller/dwc/pci-keystone.c
index 093dbb725e41..b7b30470b394 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -1293,6 +1293,8 @@ static int ks_pcie_probe(struct platform_device *pdev)
goto err_ep_init;
}
 
+   dw_pcie_ep_init_notify(>ep);
+
break;
default:
dev_err(dev, "INVALID device type %d\n", mode);
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index b712fdd06549..c513598a46d7 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -283,6 +283,8 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
return ret;
}
 
+   dw_pcie_ep_init_notify(>ep);
+
return ls_pcie_ep_interrupt_init(pcie, pdev);
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c 
b/drivers/pci/controller/dwc/pcie-artpec6.c
index a6095561db4a..a4630b92489b 100644
--- a/drivers/pci/controller/dwc/pcie-artpec6.c
+++ b/drivers/pci/controller/dwc/pcie-artpec6.c
@@ -452,6 +452,8 @@ static int artpec6_pcie_probe(struct platform_device *pdev)
return ret;
}
 
+   dw_pcie_ep_init_notify(>ep);
+
break;
default:
dev_err(dev, "INVALID device type %d\n",

[PATCH v10 7/8] PCI: dwc: ep: Call dw_pcie_ep_init_registers() API directly from all glue drivers

2024-03-14 Thread Manivannan Sadhasivam

Currently, dw_pcie_ep_init_registers() API is directly called by the glue
drivers requiring active refclk from host. But for the other drivers, it is
getting called implicitly by dw_pcie_ep_init(). This is due to the fact
that this API initializes DWC EP specific registers and that requires an
active refclk (either from host or generated locally by endpoint itsef).

But, this causes a discrepancy among the glue drivers. So to avoid this
confusion, let's call this API directly from all glue drivers irrespective
of refclk dependency. Only difference here is that the drivers requiring
refclk from host will call this API only after the refclk is received and
other drivers without refclk dependency will call this API right after
dw_pcie_ep_init().

With this change, the check for 'core_init_notifier' flag can now be
dropped from dw_pcie_ep_init() API. This will also allow us to remove the
'core_init_notifier' flag completely in the later commits.

Reviewed-by: Yoshihiro Shimoda 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  7 +++
 drivers/pci/controller/dwc/pci-imx6.c |  8 
 drivers/pci/controller/dwc/pci-keystone.c |  9 +
 drivers/pci/controller/dwc/pci-layerscape-ep.c|  7 +++
 drivers/pci/controller/dwc/pcie-artpec6.c | 13 -
 drivers/pci/controller/dwc/pcie-designware-ep.c   | 22 --
 drivers/pci/controller/dwc/pcie-designware-plat.c |  9 +
 drivers/pci/controller/dwc/pcie-keembay.c | 16 +++-
 drivers/pci/controller/dwc/pcie-rcar-gen4.c   | 12 +++-
 drivers/pci/controller/dwc/pcie-uniphier-ep.c | 13 -
 10 files changed, 90 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 0e406677060d..395042b29ffc 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -467,6 +467,13 @@ static int dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx,
return ret;
}
 
+   ret = dw_pcie_ep_init_registers(ep);
+   if (ret) {
+   dev_err(dev, "Failed to initialize DWC endpoint registers\n");
+   dw_pcie_ep_deinit(ep);
+   return ret;
+   }
+
return 0;
 }
 
diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
b/drivers/pci/controller/dwc/pci-imx6.c
index dc2c036ab28c..bfcafa440ddb 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1136,6 +1136,14 @@ static int imx6_add_pcie_ep(struct imx6_pcie *imx6_pcie,
dev_err(dev, "failed to initialize endpoint\n");
return ret;
}
+
+   ret = dw_pcie_ep_init_registers(ep);
+   if (ret) {
+   dev_err(dev, "Failed to initialize DWC endpoint registers\n");
+   dw_pcie_ep_deinit(ep);
+   return ret;
+   }
+
/* Start LTSSM. */
imx6_pcie_ltssm_enable(dev);
 
diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
b/drivers/pci/controller/dwc/pci-keystone.c
index c0c62533a3f1..093dbb725e41 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -1286,6 +1286,13 @@ static int ks_pcie_probe(struct platform_device *pdev)
ret = dw_pcie_ep_init(>ep);
if (ret < 0)
goto err_get_sync;
+
+   ret = dw_pcie_ep_init_registers(>ep);
+   if (ret) {
+   dev_err(dev, "Failed to initialize DWC endpoint 
registers\n");
+   goto err_ep_init;
+   }
+
break;
default:
dev_err(dev, "INVALID device type %d\n", mode);
@@ -1295,6 +1302,8 @@ static int ks_pcie_probe(struct platform_device *pdev)
 
return 0;
 
+err_ep_init:
+   dw_pcie_ep_deinit(>ep);
 err_get_sync:
pm_runtime_put(dev);
pm_runtime_disable(dev);
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 2e398494e7c0..b712fdd06549 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -276,6 +276,13 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
if (ret)
return ret;
 
+   ret = dw_pcie_ep_init_registers(>ep);
+   if (ret) {
+   dev_err(dev, "Failed to initialize DWC endpoint registers\n");
+   dw_pcie_ep_deinit(>ep);
+   return ret;
+   }
+
return ls_pcie_ep_interrupt_init(pcie, pdev);
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c 
b/drivers/pci/controller/dwc/pcie-artpec6.c
index 9ed0a9ba7619..a6095561db4a 100644
--- a/drivers/pci/controller/dwc/pcie-artpec6.c
+++ b/drivers/pci/controller/dwc/pcie-artpec6.c
@@ -441,7 +441,18 @@ static int

[PATCH v10 6/8] PCI: dwc: ep: Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers()

2024-03-14 Thread Manivannan Sadhasivam

The goal of the dw_pcie_ep_init_complete() API is to initialize the DWC
specific registers post registering the controller with the EP framework.

But the naming doesn't reflect its functionality and causes confusion. So,
let's rename it to dw_pcie_ep_init_registers() to make it clear that it
initializes the DWC specific registers.

Reviewed-by: Frank Li 
Reviewed-by: Niklas Cassel 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 14 +++---
 drivers/pci/controller/dwc/pcie-designware.h|  4 ++--
 drivers/pci/controller/dwc/pcie-qcom-ep.c   |  2 +-
 drivers/pci/controller/dwc/pcie-tegra194.c  |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 4c21a38245b6..9354671644b6 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -673,14 +673,14 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct 
dw_pcie *pci, int cap)
 }
 
 /**
- * dw_pcie_ep_init_complete - Complete DWC EP initialization
+ * dw_pcie_ep_init_registers - Initialize DWC EP specific registers
  * @ep: DWC EP device
  *
- * Complete the initialization of the registers (CSRs) specific to DWC EP. This
- * API should be called only when the endpoint receives an active refclk 
(either
- * from host or generated locally).
+ * Initialize the registers (CSRs) specific to DWC EP. This API should be 
called
+ * only when the endpoint receives an active refclk (either from host or
+ * generated locally).
  */
-int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+int dw_pcie_ep_init_registers(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct dw_pcie_ep_func *ep_func;
@@ -795,7 +795,7 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 
return ret;
 }
-EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
+EXPORT_SYMBOL_GPL(dw_pcie_ep_init_registers);
 
 /**
  * dw_pcie_ep_init - Initialize the endpoint device
@@ -874,7 +874,7 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 * (Ex: tegra194). Any hardware access on such platforms result
 * in system hang.
 */
-   ret = dw_pcie_ep_init_complete(ep);
+   ret = dw_pcie_ep_init_registers(ep);
if (ret)
goto err_free_epc_mem;
 
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index 351d2fe3ea4d..f8e5431a207b 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -669,7 +669,7 @@ static inline void __iomem *dw_pcie_own_conf_map_bus(struct 
pci_bus *bus,
 #ifdef CONFIG_PCIE_DW_EP
 void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init(struct dw_pcie_ep *ep);
-int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep);
+int dw_pcie_ep_init_registers(struct dw_pcie_ep *ep);
 void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep);
 void dw_pcie_ep_deinit(struct dw_pcie_ep *ep);
 void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep);
@@ -693,7 +693,7 @@ static inline int dw_pcie_ep_init(struct dw_pcie_ep *ep)
return 0;
 }
 
-static inline int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
+static inline int dw_pcie_ep_init_registers(struct dw_pcie_ep *ep)
 {
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c 
b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 59b1c0110288..3697b4a944cc 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -463,7 +463,7 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
  PARF_INT_ALL_LINK_UP | PARF_INT_ALL_EDMA;
writel_relaxed(val, pcie_ep->parf + PARF_INT_ALL_MASK);
 
-   ret = dw_pcie_ep_init_complete(_ep->pci.ep);
+   ret = dw_pcie_ep_init_registers(_ep->pci.ep);
if (ret) {
dev_err(dev, "Failed to complete initialization: %d\n", ret);
goto err_disable_resources;
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
b/drivers/pci/controller/dwc/pcie-tegra194.c
index 68bfeed3429b..264ee76bf008 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1897,7 +1897,7 @@ static void pex_ep_event_pex_rst_deassert(struct 
tegra_pcie_dw *pcie)
val = (upper_32_bits(ep->msi_mem_phys) & MSIX_ADDR_MATCH_HIGH_OFF_MASK);
dw_pcie_writel_dbi(pci, MSIX_ADDR_MATCH_HIGH_OFF, val);
 
-   ret = dw_pcie_ep_init_complete(ep);
+   ret = dw_pcie_ep_init_registers(ep);
if (ret) {
dev_err(dev, "Failed to complete initialization: %d\n", ret);
goto fail_init_complete;

-- 
2.25.1

[PATCH v10 5/8] PCI: dwc: ep: Introduce dw_pcie_ep_cleanup() API for drivers supporting PERST#

2024-03-14 Thread Manivannan Sadhasivam

For DWC glue drivers supporting PERST# (currently Qcom and Tegra194), some
of the DWC resources like eDMA should be cleaned up during the PERST#
assert time.

So let's introduce a dw_pcie_ep_cleanup() API that could be called by these
drivers to cleanup the DWC specific resources. Currently, it just removes
eDMA.

Reported-by: Niklas Cassel 
Closes: https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon
Reviewed-by: Frank Li 
Reviewed-by: Niklas Cassel 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 19 +--
 drivers/pci/controller/dwc/pcie-designware.h|  5 +
 drivers/pci/controller/dwc/pcie-qcom-ep.c   |  1 +
 drivers/pci/controller/dwc/pcie-tegra194.c  |  2 ++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index fa7b26da8718..4c21a38245b6 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -618,6 +618,22 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 
func_no,
return 0;
 }
 
+/**
+ * dw_pcie_ep_cleanup - Cleanup DWC EP resources after fundamental reset
+ * @ep: DWC EP device
+ *
+ * Cleans up the DWC EP specific resources like eDMA etc... after fundamental
+ * reset like PERST#. Note that this API is only applicable for drivers
+ * supporting PERST# or any other methods of fundamental reset.
+ */
+void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+
+   dw_pcie_edma_remove(pci);
+}
+EXPORT_SYMBOL_GPL(dw_pcie_ep_cleanup);
+
 /**
  * dw_pcie_ep_deinit - Deinitialize the endpoint device
  * @ep: DWC EP device
@@ -627,10 +643,9 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 
func_no,
  */
 void dw_pcie_ep_deinit(struct dw_pcie_ep *ep)
 {
-   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct pci_epc *epc = ep->epc;
 
-   dw_pcie_edma_remove(pci);
+   dw_pcie_ep_cleanup(ep);
 
pci_epc_mem_free_addr(epc, ep->msi_mem_phys, ep->msi_mem,
  epc->mem->window.page_size);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index 61465203bb60..351d2fe3ea4d 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -672,6 +672,7 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep);
 void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep);
 void dw_pcie_ep_deinit(struct dw_pcie_ep *ep);
+void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_raise_intx_irq(struct dw_pcie_ep *ep, u8 func_no);
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 u8 interrupt_num);
@@ -705,6 +706,10 @@ static inline void dw_pcie_ep_deinit(struct dw_pcie_ep *ep)
 {
 }
 
+static inline void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep)
+{
+}
+
 static inline int dw_pcie_ep_raise_intx_irq(struct dw_pcie_ep *ep, u8 func_no)
 {
return 0;
diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c 
b/drivers/pci/controller/dwc/pcie-qcom-ep.c
index 36e5e80cd22f..59b1c0110288 100644
--- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
+++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
@@ -507,6 +507,7 @@ static void qcom_pcie_perst_assert(struct dw_pcie *pci)
return;
}
 
+   dw_pcie_ep_cleanup(>ep);
qcom_pcie_disable_resources(pcie_ep);
pcie_ep->link_status = QCOM_PCIE_EP_LINK_DISABLED;
 }
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
b/drivers/pci/controller/dwc/pcie-tegra194.c
index 7afa9e9aabe2..68bfeed3429b 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -1715,6 +1715,8 @@ static void pex_ep_event_pex_rst_assert(struct 
tegra_pcie_dw *pcie)
if (ret)
dev_err(pcie->dev, "Failed to go Detect state: %d\n", ret);
 
+   dw_pcie_ep_cleanup(>pci.ep);
+
reset_control_assert(pcie->core_rst);
 
tegra_pcie_disable_phy(pcie);

-- 
2.25.1

[PATCH v10 4/8] PCI: dwc: ep: Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit()

2024-03-14 Thread Manivannan Sadhasivam

dw_pcie_ep_exit() API is undoing what the dw_pcie_ep_init() API has done
already (at least partly). But the API name dw_pcie_ep_exit() is not quite
reflecting that. So let's rename it to dw_pcie_ep_deinit() to make the
purpose of this API clear. This also aligns with the DWC host driver.

Reviewed-by: Frank Li 
Reviewed-by: Niklas Cassel 
Reviewed-by: Yoshihiro Shimoda 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 6 +++---
 drivers/pci/controller/dwc/pcie-designware.h| 4 ++--
 drivers/pci/controller/dwc/pcie-rcar-gen4.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index e59e35fd7251..fa7b26da8718 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -619,13 +619,13 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 
func_no,
 }
 
 /**
- * dw_pcie_ep_exit - Deinitialize the endpoint device
+ * dw_pcie_ep_deinit - Deinitialize the endpoint device
  * @ep: DWC EP device
  *
  * Deinitialize the endpoint device. EPC device is not destroyed since that 
will
  * taken care by Devres.
  */
-void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
+void dw_pcie_ep_deinit(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct pci_epc *epc = ep->epc;
@@ -637,7 +637,7 @@ void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
 
pci_epc_mem_exit(epc);
 }
-EXPORT_SYMBOL_GPL(dw_pcie_ep_exit);
+EXPORT_SYMBOL_GPL(dw_pcie_ep_deinit);
 
 static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int 
cap)
 {
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index ab7431a37209..61465203bb60 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -671,7 +671,7 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init(struct dw_pcie_ep *ep);
 int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep);
 void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep);
-void dw_pcie_ep_exit(struct dw_pcie_ep *ep);
+void dw_pcie_ep_deinit(struct dw_pcie_ep *ep);
 int dw_pcie_ep_raise_intx_irq(struct dw_pcie_ep *ep, u8 func_no);
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 u8 interrupt_num);
@@ -701,7 +701,7 @@ static inline void dw_pcie_ep_init_notify(struct dw_pcie_ep 
*ep)
 {
 }
 
-static inline void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
+static inline void dw_pcie_ep_deinit(struct dw_pcie_ep *ep)
 {
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-rcar-gen4.c 
b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
index ac97d594ea47..9d9d22e367bb 100644
--- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
+++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
@@ -430,7 +430,7 @@ static int rcar_gen4_add_dw_pcie_ep(struct rcar_gen4_pcie 
*rcar)
 
 static void rcar_gen4_remove_dw_pcie_ep(struct rcar_gen4_pcie *rcar)
 {
-   dw_pcie_ep_exit(>dw.ep);
+   dw_pcie_ep_deinit(>dw.ep);
rcar_gen4_pcie_ep_deinit(rcar);
 }
 

-- 
2.25.1

[PATCH v10 3/8] PCI: dwc: ep: Remove deinit() callback from struct dw_pcie_ep_ops

2024-03-14 Thread Manivannan Sadhasivam

deinit() callback was solely introduced for the pcie-rcar-gen4 driver where
it is used to do platform specific resource deallocation. And this callback
is called right at the end of the dw_pcie_ep_exit() API. So it doesn't
matter whether it is called within or outside of dw_pcie_ep_exit() API.

So let's remove this callback and directly call rcar_gen4_pcie_ep_deinit()
in pcie-rcar-gen4 driver to do resource deallocation after the completion
of dw_pcie_ep_exit() API in rcar_gen4_remove_dw_pcie_ep().

This simplifies the DWC layer.

Reviewed-by: Frank Li 
Reviewed-by: Niklas Cassel 
Reviewed-by: Yoshihiro Shimoda 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c |  9 +
 drivers/pci/controller/dwc/pcie-designware.h|  1 -
 drivers/pci/controller/dwc/pcie-rcar-gen4.c | 14 --
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index d7e8f2dda6ce..e59e35fd7251 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -636,9 +636,6 @@ void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
  epc->mem->window.page_size);
 
pci_epc_mem_exit(epc);
-
-   if (ep->ops->deinit)
-   ep->ops->deinit(ep);
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_exit);
 
@@ -838,7 +835,7 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
   ep->page_size);
if (ret < 0) {
dev_err(dev, "Failed to initialize address space\n");
-   goto err_ep_deinit;
+   return ret;
}
 
ep->msi_mem = pci_epc_mem_alloc_addr(epc, >msi_mem_phys,
@@ -875,10 +872,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 err_exit_epc_mem:
pci_epc_mem_exit(epc);
 
-err_ep_deinit:
-   if (ep->ops->deinit)
-   ep->ops->deinit(ep);
-
return ret;
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_init);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index 26dae4837462..ab7431a37209 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -333,7 +333,6 @@ struct dw_pcie_rp {
 struct dw_pcie_ep_ops {
void(*pre_init)(struct dw_pcie_ep *ep);
void(*init)(struct dw_pcie_ep *ep);
-   void(*deinit)(struct dw_pcie_ep *ep);
int (*raise_irq)(struct dw_pcie_ep *ep, u8 func_no,
 unsigned int type, u16 interrupt_num);
const struct pci_epc_features* (*get_features)(struct dw_pcie_ep *ep);
diff --git a/drivers/pci/controller/dwc/pcie-rcar-gen4.c 
b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
index e9166619b1f9..ac97d594ea47 100644
--- a/drivers/pci/controller/dwc/pcie-rcar-gen4.c
+++ b/drivers/pci/controller/dwc/pcie-rcar-gen4.c
@@ -352,11 +352,8 @@ static void rcar_gen4_pcie_ep_init(struct dw_pcie_ep *ep)
dw_pcie_ep_reset_bar(pci, bar);
 }
 
-static void rcar_gen4_pcie_ep_deinit(struct dw_pcie_ep *ep)
+static void rcar_gen4_pcie_ep_deinit(struct rcar_gen4_pcie *rcar)
 {
-   struct dw_pcie *dw = to_dw_pcie_from_ep(ep);
-   struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
-
writel(0, rcar->base + PCIEDMAINTSTSEN);
rcar_gen4_pcie_common_deinit(rcar);
 }
@@ -408,7 +405,6 @@ static unsigned int 
rcar_gen4_pcie_ep_get_dbi2_offset(struct dw_pcie_ep *ep,
 static const struct dw_pcie_ep_ops pcie_ep_ops = {
.pre_init = rcar_gen4_pcie_ep_pre_init,
.init = rcar_gen4_pcie_ep_init,
-   .deinit = rcar_gen4_pcie_ep_deinit,
.raise_irq = rcar_gen4_pcie_ep_raise_irq,
.get_features = rcar_gen4_pcie_ep_get_features,
.get_dbi_offset = rcar_gen4_pcie_ep_get_dbi_offset,
@@ -418,18 +414,24 @@ static const struct dw_pcie_ep_ops pcie_ep_ops = {
 static int rcar_gen4_add_dw_pcie_ep(struct rcar_gen4_pcie *rcar)
 {
struct dw_pcie_ep *ep = >dw.ep;
+   int ret;
 
if (!IS_ENABLED(CONFIG_PCIE_RCAR_GEN4_EP))
return -ENODEV;
 
ep->ops = _ep_ops;
 
-   return dw_pcie_ep_init(ep);
+   ret = dw_pcie_ep_init(ep);
+   if (ret)
+   rcar_gen4_pcie_ep_deinit(rcar);
+
+   return ret;
 }
 
 static void rcar_gen4_remove_dw_pcie_ep(struct rcar_gen4_pcie *rcar)
 {
dw_pcie_ep_exit(>dw.ep);
+   rcar_gen4_pcie_ep_deinit(rcar);
 }
 
 /* Common */

-- 
2.25.1

[PATCH v10 2/8] PCI: dwc: ep: Add Kernel-doc comments for APIs

2024-03-14 Thread Manivannan Sadhasivam

All of the APIs are missing the Kernel-doc comments. Hence, add them.

Reviewed-by: Frank Li 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 78 +
 1 file changed, 78 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index c05304eabb89..d7e8f2dda6ce 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -14,6 +14,10 @@
 #include 
 #include 
 
+/**
+ * dw_pcie_ep_linkup - Notify EPF drivers about link up event
+ * @ep: DWC EP device
+ */
 void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
 {
struct pci_epc *epc = ep->epc;
@@ -22,6 +26,11 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_linkup);
 
+/**
+ * dw_pcie_ep_init_notify - Notify EPF drivers about EPC initialization
+ * complete
+ * @ep: DWC EP device
+ */
 void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
 {
struct pci_epc *epc = ep->epc;
@@ -30,6 +39,14 @@ void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
 
+/**
+ * dw_pcie_ep_get_func_from_ep - Get the struct dw_pcie_ep_func corresponding 
to
+ *  the endpoint function
+ * @ep: DWC EP device
+ * @func_no: Function number of the endpoint device
+ *
+ * Return: struct dw_pcie_ep_func if success, NULL otherwise.
+ */
 struct dw_pcie_ep_func *
 dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
 {
@@ -60,6 +77,11 @@ static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 
func_no,
dw_pcie_dbi_ro_wr_dis(pci);
 }
 
+/**
+ * dw_pcie_ep_reset_bar - Reset endpoint BAR
+ * @pci: DWC PCI device
+ * @bar: BAR number of the endpoint
+ */
 void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
 {
u8 func_no, funcs;
@@ -439,6 +461,13 @@ static const struct pci_epc_ops epc_ops = {
.get_features   = dw_pcie_ep_get_features,
 };
 
+/**
+ * dw_pcie_ep_raise_intx_irq - Raise INTx IRQ to the host
+ * @ep: DWC EP device
+ * @func_no: Function number of the endpoint
+ *
+ * Return: 0 if success, errono otherwise.
+ */
 int dw_pcie_ep_raise_intx_irq(struct dw_pcie_ep *ep, u8 func_no)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
@@ -450,6 +479,14 @@ int dw_pcie_ep_raise_intx_irq(struct dw_pcie_ep *ep, u8 
func_no)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_raise_intx_irq);
 
+/**
+ * dw_pcie_ep_raise_msi_irq - Raise MSI IRQ to the host
+ * @ep: DWC EP device
+ * @func_no: Function number of the endpoint
+ * @interrupt_num: Interrupt number to be raised
+ *
+ * Return: 0 if success, errono otherwise.
+ */
 int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 u8 interrupt_num)
 {
@@ -498,6 +535,15 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 
func_no,
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_raise_msi_irq);
 
+/**
+ * dw_pcie_ep_raise_msix_irq_doorbell - Raise MSIX to the host using Doorbell
+ * method
+ * @ep: DWC EP device
+ * @func_no: Function number of the endpoint device
+ * @interrupt_num: Interrupt number to be raised
+ *
+ * Return: 0 if success, errno otherwise.
+ */
 int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
   u16 interrupt_num)
 {
@@ -517,6 +563,14 @@ int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep 
*ep, u8 func_no,
return 0;
 }
 
+/**
+ * dw_pcie_ep_raise_msix_irq - Raise MSIX to the host
+ * @ep: DWC EP device
+ * @func_no: Function number of the endpoint device
+ * @interrupt_num: Interrupt number to be raised
+ *
+ * Return: 0 if success, errno otherwise.
+ */
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
  u16 interrupt_num)
 {
@@ -564,6 +618,13 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 
func_no,
return 0;
 }
 
+/**
+ * dw_pcie_ep_exit - Deinitialize the endpoint device
+ * @ep: DWC EP device
+ *
+ * Deinitialize the endpoint device. EPC device is not destroyed since that 
will
+ * taken care by Devres.
+ */
 void dw_pcie_ep_exit(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
@@ -599,6 +660,14 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct 
dw_pcie *pci, int cap)
return 0;
 }
 
+/**
+ * dw_pcie_ep_init_complete - Complete DWC EP initialization
+ * @ep: DWC EP device
+ *
+ * Complete the initialization of the registers (CSRs) specific to DWC EP. This
+ * API should be called only when the endpoint receives an active refclk 
(either
+ * from host or generated locally).
+ */
 int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
@@ -716,6 +785,15 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
 
+/**
+ *

[PATCH v10 1/8] PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host

2024-03-14 Thread Manivannan Sadhasivam

The DWC glue drivers requiring an active reference clock from the PCIe host
for initializing their PCIe EP core, set a flag called 'core_init_notifier'
to let DWC driver know that these drivers need a special attention during
initialization. In these drivers, access to the hw registers (like DBI)
before receiving the active refclk from host will result in access failure
and also could cause a whole system hang.

But the current DWC EP driver doesn't honor the requirements of the drivers
setting 'core_init_notifier' flag and tries to access the DBI registers
during dw_pcie_ep_init(). This causes the system hang for glue drivers such
as Tegra194 and Qcom EP as they depend on refclk from host and have set the
above mentioned flag.

To workaround this issue, users of the affected platforms have to maintain
the dependency with the PCIe host by booting the PCIe EP after host boot.
But this won't provide a good user experience, since PCIe EP is _one_ of
the features of those platforms and it doesn't make sense to delay the
whole platform booting due to PCIe requiring active refclk.

So to fix this issue, let's move all the DBI access from
dw_pcie_ep_init() in the DWC EP driver to the dw_pcie_ep_init_complete()
API. This API will only be called by the drivers setting
'core_init_notifier' flag once refclk is received from host. For the rest
of the drivers that gets the refclk locally, this API will be called
within dw_pcie_ep_init().

Fixes: e966f7390da9 ("PCI: dwc: Refactor core initialization code for EP mode")
Co-developed-by: Vidya Sagar 
Signed-off-by: Vidya Sagar 
Reviewed-by: Frank Li 
Reviewed-by: Niklas Cassel 
Signed-off-by: Manivannan Sadhasivam 
---
 drivers/pci/controller/dwc/pcie-designware-ep.c | 120 ++--
 1 file changed, 71 insertions(+), 49 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c 
b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 5befed2dc02b..c05304eabb89 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -602,11 +602,16 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct 
dw_pcie *pci, int cap)
 int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
 {
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct dw_pcie_ep_func *ep_func;
+   struct device *dev = pci->dev;
+   struct pci_epc *epc = ep->epc;
unsigned int offset, ptm_cap_base;
unsigned int nbars;
u8 hdr_type;
+   u8 func_no;
+   int i, ret;
+   void *addr;
u32 reg;
-   int i;
 
hdr_type = dw_pcie_readb_dbi(pci, PCI_HEADER_TYPE) &
   PCI_HEADER_TYPE_MASK;
@@ -617,6 +622,58 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
return -EIO;
}
 
+   dw_pcie_version_detect(pci);
+
+   dw_pcie_iatu_detect(pci);
+
+   ret = dw_pcie_edma_detect(pci);
+   if (ret)
+   return ret;
+
+   if (!ep->ib_window_map) {
+   ep->ib_window_map = devm_bitmap_zalloc(dev, pci->num_ib_windows,
+  GFP_KERNEL);
+   if (!ep->ib_window_map)
+   goto err_remove_edma;
+   }
+
+   if (!ep->ob_window_map) {
+   ep->ob_window_map = devm_bitmap_zalloc(dev, pci->num_ob_windows,
+  GFP_KERNEL);
+   if (!ep->ob_window_map)
+   goto err_remove_edma;
+   }
+
+   if (!ep->outbound_addr) {
+   addr = devm_kcalloc(dev, pci->num_ob_windows, 
sizeof(phys_addr_t),
+   GFP_KERNEL);
+   if (!addr)
+   goto err_remove_edma;
+   ep->outbound_addr = addr;
+   }
+
+   for (func_no = 0; func_no < epc->max_functions; func_no++) {
+
+   ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
+   if (ep_func)
+   continue;
+
+   ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
+   if (!ep_func)
+   goto err_remove_edma;
+
+   ep_func->func_no = func_no;
+   ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
+ PCI_CAP_ID_MSI);
+   ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
+  PCI_CAP_ID_MSIX);
+
+   list_add_tail(_func->list, >func_list);
+   }
+
+   if (ep->ops->init)
+   ep->ops->init(ep);
+
offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
ptm_cap_base = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_PTM);
 
@@ -651,14 +708,17 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
dw_pcie_dbi_ro_wr_dis(pci);
 
return 0;
+
+err_remove_edma:
+

[PATCH v10 0/8] PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host

2024-03-14 Thread Manivannan Sadhasivam

Hello,

This series is the continuation of previous work by Vidya Sagar [1] to fix the
issues related to accessing DBI register space before completing the core
initialization in some EP platforms like Tegra194/234 and Qcom EP.

Since Vidya is busy, I took over the series based on his consent (off-list
discussion).

NOTE


Based on the comments received in v7 [2], I've heavily modified the series
to fix several other issues reported by Bjorn and Niklas. One noticeable
change is getting rid of the 'core_init_notifer' flag added to differentiate
between glue drivers requiring refclk from host and drivers getting refclk
locally.

By getting rid of this flag, now both the DWC EP driver and the EPF drivers
can use a single flow and need not distinguish between the glue drivers.

We can also get rid of the 'link_up_notifier' flag in the future by following
the same convention.

Testing
===

I've tested the series on Qcom SM8450 based dev board that depends on refclk
from host with EPF_MHI driver. It'd be good to test this series on platforms
that generate refclk locally and also with EPF_TEST driver.

- Mani

[1] https://lore.kernel.org/linux-pci/20221013175712.7539-1-vid...@nvidia.com/
[2] 
https://lore.kernel.org/linux-pci/20231120084014.108274-1-manivannan.sadhasi...@linaro.org/

Changes in v10:
- Reordered the commits by moving the independent fixes/cleanups first (Niklas)
- Addressed several comments from Niklas
- Moved PTM register setting out of dw_pcie_ep_init_non_sticky_registers() 
(Niklas)
- Addressed the issue that EPF drivers were missing init notification after the
  removal of core_init_notifier (Niklas)
- Dropped a few cleanup patches to be clubbed with the follow up series
- Collected review tags
- Dropped the review tags for patch 8/8 as it got changed 
- Link to v9: 
https://lore.kernel.org/r/20240304-pci-dbi-rework-v9-0-29d433d99...@linaro.org

Changes in v9:
- Incorporated changes for missing drivers (Niklas)
- Reworded the dw_pcie_ep_cleanup() API kdoc (Niklas)
- Reworded the description of patch 6/10 (Frank)
- Collected reviews
- Link to v8: 
https://lore.kernel.org/r/20240224-pci-dbi-rework-v8-0-64c7fd0cf...@linaro.org

Changes in v8:

- Rebased on top of v6.8-rc1
- Removed the deinit callback from struct dw_pcie_ep_ops
- Renamed dw_pcie_ep_exit() to dw_pcie_ep_deinit()
- Introduced dw_pcie_ep_cleanup() API for drivers supporting PERST#
- Renamed dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers()
- Called dw_pcie_ep_init_registers() API directly from all glue drivers
- Removed "core_init_notifier" flag
- Added a generic dw_pcie_ep_linkdown() API to handle LINK_DOWN event and used
  it in qcom driver
- Added Kernel-doc comments for DWC EP APIs

Changes in v7:

- Rebased on top of v6.7-rc1
- Kept the current dw_pcie_ep_init_complete() API instead of renaming it to
  dw_pcie_ep_init_late(), since changing the name causes a slight ambiguity.
- Splitted the change that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify() to help bisecting and also to avoid build issue.
- Added a new patch that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify().
- Took over the authorship and dropped the previous Ack as the patches are
  heavily modified.

Changes in v6:

- Rebased on top of pci/next (6e2fca71e187)
- removed ep_init_late() callback as it is no longer necessary

For previous changelog, please refer [1].

Signed-off-by: Manivannan Sadhasivam 
---
Manivannan Sadhasivam (8):
  PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from 
host
  PCI: dwc: ep: Add Kernel-doc comments for APIs
  PCI: dwc: ep: Remove deinit() callback from struct dw_pcie_ep_ops
  PCI: dwc: ep: Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit()
  PCI: dwc: ep: Introduce dw_pcie_ep_cleanup() API for drivers supporting 
PERST#
  PCI: dwc: ep: Rename dw_pcie_ep_init_complete() to 
dw_pcie_ep_init_registers()
  PCI: dwc: ep: Call dw_pcie_ep_init_registers() API directly from all glue 
drivers
  PCI: dwc: ep: Remove "core_init_notifier" flag

 drivers/pci/controller/dwc/pci-dra7xx.c   |   9 +
 drivers/pci/controller/dwc/pci-imx6.c |  10 +
 drivers/pci/controller/dwc/pci-keystone.c |  11 +
 drivers/pci/controller/dwc/pci-layerscape-ep.c|   9 +
 drivers/pci/controller/dwc/pcie-artpec6.c |  15 +-
 drivers/pci/controller/dwc/pcie-designware-ep.c   | 238 +++---
 drivers/pci/controller/dwc/pcie-designware-plat.c |  11 +
 drivers/pci/controller/dwc/pcie-designware.h  |  14 +-
 drivers/pci/controller/dwc/pcie-keembay.c |  18 +-
 drivers/pci/controller/dwc/pcie-qcom-ep.c |   4 +-
 drivers/pci/controller/dwc/pcie-rcar-gen4.c   |  28 ++-
 drivers/pci/controller/dwc/pcie-tegra194.c|   5 +-
 drivers/pci/controller/dwc/pcie-uniphier-ep.c |  15 +-
 drivers/pci/endpoint/functions/pci-epf-test.c |  18 +-
 drivers/pci/endpoint/pci-ep-cfs.c |   9 +

Re: [PATCH v9 06/10] PCI: dwc: ep: Call dw_pcie_ep_init_registers() API directly from all glue drivers

2024-03-14 Thread Manivannan Sadhasivam

On Fri, Mar 08, 2024 at 11:22:52AM +0100, Niklas Cassel wrote:
> On Fri, Mar 08, 2024 at 03:19:47PM +0530, Manivannan Sadhasivam wrote:
> > > > > > @@ -467,6 +467,13 @@ static int dra7xx_add_pcie_ep(struct 
> > > > > > dra7xx_pcie *dra7xx,
> > > > > > return ret;
> > > > > > }
> > > > > >  
> > > > > > +   ret = dw_pcie_ep_init_registers(ep);
> > > > > > +   if (ret) {
> > > > > 
> > > > > Here you are using if (ret) to error check the return from
> > > > > dw_pcie_ep_init_registers().
> > > > > 
> > > > > 
> > > > > > index c0c62533a3f1..8392894ed286 100644
> > > > > > --- a/drivers/pci/controller/dwc/pci-keystone.c
> > > > > > +++ b/drivers/pci/controller/dwc/pci-keystone.c
> > > > > > @@ -1286,6 +1286,13 @@ static int ks_pcie_probe(struct 
> > > > > > platform_device *pdev)
> > > > > > ret = dw_pcie_ep_init(>ep);
> > > > > > if (ret < 0)
> > > > > > goto err_get_sync;
> > > > > > +
> > > > > > +   ret = dw_pcie_ep_init_registers(>ep);
> > > > > > +   if (ret < 0) {
> > > > > 
> > > > > Here you are using if (ret < 0) to error check the return from
> > > > > dw_pcie_ep_init_registers(). Please be consistent.
> > > > > 
> > > > 
> > > > I maintained the consistency w.r.t individual drivers. Please check them
> > > > individually.
> > > > 
> > > > If I maintain consistency w.r.t this patch, then the style will change 
> > > > within
> > > > the drivers.
> > > 
> > > Personally, I disagree with that.
> > > 
> > > All glue drivers should use the same way of checking dw_pcie_ep_init(),
> > > depending on the kdoc of dw_pcie_ep_init().
> > > 
> > > If the kdoc for dw_pcie_ep_init() says returns 0 on success,
> > > then I think that it is strictly more correct to do:
> > > 
> > > ret = dw_pcie_ep_init()
> > > if (ret) {
> > >   
> > > }
> > > 
> > > And if a glue driver doesn't look like that, then I think we should change
> > > them. (Same reasoning for dw_pcie_ep_init_registers().)
> > > 
> > > 
> > > If you read code that looks like:
> > > ret = dw_pcie_ep_init()
> > > if (ret < 0) {
> > >   
> > > }
> > > 
> > > then you assume that is is a function with a kdoc that says it can return > > > 0
> > > or a positive value on success, e.g. a function that returns an index in 
> > > an
> > > array.
> > > 
> > 
> > But if you read the same function from the individual drivers, it could 
> > present
> > a different opinion because the samantics is different than others.
> 
> Is there any glue driver where a positive result from dw_pcie_ep_init() is
> considered valid?
> 
> 
> > 
> > I'm not opposed to keeping the API semantics consistent, but we have to take
> > account of the drivers style as well.
> 
> kdoc > "driver style"
> IMO, but you are the maintainer, I just offered my 50 cents :)
> 

Those valuable 50 cents :) Looking at it again, I think you are right. We
should honor the API over driver's own style.

I've changed the semantics in next version, thanks!

- Mani

-- 
மணிவண்ணன் சதாசிவம்

Re: [PATCH 00/14] Add support for suppressing warning backtraces

2024-03-14 Thread Naresh Kamboju

On Tue, 12 Mar 2024 at 22:33, Guenter Roeck  wrote:



> This series is based on the RFC patch and subsequent discussion at
> https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b08-ba81-d94f3b691c9a@moroto.mountain/
> and offers a more comprehensive solution of the problem discussed there.

Thanks for the patchset.
This patch series applied on top of Linux next and tested.

Tested-by: Linux Kernel Functional Testing 


--
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH v7 1/5] net: wan: Add support for QMC HDLC

2024-03-14 Thread Herve Codina

Hi Michael,

On Thu, 14 Mar 2024 10:05:37 +1100
Michael Ellerman  wrote:

> Hi Herve,
> 
> Herve Codina  writes:
...
> This breaks when building as a module (eg. ppc32_allmodconfig):
> 
>   In file included from ../include/linux/device/driver.h:21,
>from ../include/linux/device.h:32,
>from ../include/linux/dma-mapping.h:8,
>from ../drivers/net/wan/fsl_qmc_hdlc.c:13:
>   ../drivers/net/wan/fsl_qmc_hdlc.c:405:25: error: ‘qmc_hdlc_driver’ 
> undeclared here (not in a function); did you mean ‘qmc_hdlc_probe’?
> 405 | MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
> | ^~~
> 
> 
> IIUIC it should be pointing to the table, not the driver, so:
> 
> diff --git a/drivers/net/wan/fsl_qmc_hdlc.c b/drivers/net/wan/fsl_qmc_hdlc.c
> index 5fd7ed325f5b..705c3681fb92 100644
> --- a/drivers/net/wan/fsl_qmc_hdlc.c
> +++ b/drivers/net/wan/fsl_qmc_hdlc.c
> @@ -402,7 +402,7 @@ static const struct of_device_id qmc_hdlc_id_table[] = {
> { .compatible = "fsl,qmc-hdlc" },
> {} /* sentinel */
>  };
> -MODULE_DEVICE_TABLE(of, qmc_hdlc_driver);
> +MODULE_DEVICE_TABLE(of, qmc_hdlc_id_table);
> 
>  static struct platform_driver qmc_hdlc_driver = {
> .driver = {
> 
> 
> Which then builds correctly.

My bad, I missed that one.
I fully agree with your modification.

Do you want me to make a patch (copy/paste of your proposed modification)
or do you plan to send the patch on your side ?

Best regards,
Hervé

-- 
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

74 matches

Mail list logo