Re: [PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA

2023-05-04 Thread Guo Ren
On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann  wrote:
>
> From: Arnd Bergmann 
>
> For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned
> first to let the device see data written by the CPU, and invalidated
> after the transfer to let the CPU see data written by the device.
>
> riscv also invalidates the caches before the transfer, which does
> not appear to serve any purpose.
Yes, we can't guarantee the CPU won't randomly pre-load cache lines while
the DMA is in progress.

But I have two reasons to keep the invalidate before the DMA transfer:
 - We clearly tell the CPU these cache lines are invalid. The caching
algorithm would then fill these invalid slots first instead of replacing
valid ones.
 - Invalidating is very cheap. Actually, flush and clean have the same
performance on our machine.

So, how about:

diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index d919efab6eba..2c52fbc15064 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
 		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
 		break;
 	case DMA_FROM_DEVICE:
-		ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
-		break;
 	case DMA_BIDIRECTIONAL:
 		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
 		break;
@@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 		break;
 	case DMA_FROM_DEVICE:
 	case DMA_BIDIRECTIONAL:
 		/* I'm not sure all drivers have guaranteed cacheline
 		 * alignment. If not, this inval would cause problems */
-		ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+		ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
 		break;
 	default:
 		break;

>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/riscv/mm/dma-noncoherent.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 640f4c496d26..69c80b2155a1 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -25,7 +25,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> break;
> case DMA_BIDIRECTIONAL:
> -   ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> +   ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> break;
> default:
> break;
> --
> 2.39.2
>


-- 
Best Regards
 Guo Ren


Re: [PASEMI NEMO] Boot issue with the PowerPC updates 6.4-1

2023-05-04 Thread Christian Zigotzky

On 03 May 2023 at 08:15 pm, Rob Herring wrote:

On Wed, May 3, 2023 at 12:40 PM Christian Zigotzky wrote:




On 3. May 2023, at 18:51, Rob Herring  wrote:

On Wed, May 3, 2023 at 11:27 AM Christophe Leroy wrote:

+Rob as he's the commit's Author.


Le 03/05/2023 à 17:46, Christian Zigotzky a écrit :
On 02 May 2023 at 11:28 am, Michael Ellerman wrote:

Christian Zigotzky  writes:

Hello,

Our PASEMI Nemo board [1] doesn't boot with the PowerPC updates 6.4-1
[2].

The kernel hangs right after "booting Linux via __start() @ 0x ...".

I was able to revert the PowerPC updates 6.4-1 [2] with the following
command: git revert 70cc1b5307e8ee3076fdf2ecbeb89eb973aa0ff7 -m 1

After recompiling, the kernel boots without any problems without the
PowerPC updates 6.4-1 [2].

Could you please explain to me what you have changed in the boot area?

There's a few possibilities, but nothing obvious.

To begin with can you please test the following commits?

77e69ee7ce07
e4ab08be5b49
eeac8ede1755

cheers

git revert e4ab08be5b4902e5b350b0e1e1a3c25eb21d76d4

[master 0086e2cbbec0] Revert "powerpc/isa-bridge: Remove open coded
"ranges" parsing"
  1 file changed, 129 insertions(+), 37 deletions(-)

After recompiling, it boots without any problems.

e4ab08be5b49 -- powerpc/isa-bridge: Remove open coded "ranges" parsing
is the bad commit.

Could I get a DT file for this board?

In the meantime, just revert this commit. I don't think I'll be able
to fix it before I'm out on sabbatical.

Rob

FYI:

Darren Stevens wrote:

The dtb passed by the CFE firmware has a number of issues, which up till
now have been fixed by use of patches applied to the mainline kernel.
This occasionally causes problems with changes made to mainline.

Changing the firmware is not the only way to modify the DT. Perhaps a
DT overlay would work better than carrying patches if the patches
aren't upstreamable. It kind of depends on how early you'd need to
apply the overlay and whether you'd need external phandles (aka
__symbols__ node, which the base DTB wouldn't support).

Looking at the DT, I think this change might fix it. Can you test it:

diff --git a/drivers/of/address.c b/drivers/of/address.c
index e692809ff822..475b74413fdd 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -284,7 +284,7 @@ EXPORT_SYMBOL(of_range_to_resource);

  static int of_bus_isa_match(struct device_node *np)
  {
-   return of_node_name_eq(np, "isa");
+   return of_node_is_type(np, "isa") || of_node_name_eq(np, "isa");
  }

  static void of_bus_isa_count_cells(struct device_node *child,

I tested this patch today but unfortunately the Nemo board doesn't boot.

- Christian


[PATCH] mm: kfence: Fix false positives on big endian

2023-05-04 Thread Michael Ellerman
Since commit 1ba3cbf3ec3b ("mm: kfence: improve the performance of
__kfence_alloc() and __kfence_free()"), kfence reports failures in
random places at boot on big endian machines.

The problem is that the new KFENCE_CANARY_PATTERN_U64 encodes the
address of each byte in its value, so it needs to be byte swapped on big
endian machines.

The compiler is smart enough to do the le64_to_cpu() at compile time, so
there is no runtime overhead.

Fixes: 1ba3cbf3ec3b ("mm: kfence: improve the performance of __kfence_alloc() and __kfence_free()")
Signed-off-by: Michael Ellerman 
---
 mm/kfence/kfence.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index 2aafc46a4aaf..392fb273e7bd 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -29,7 +29,7 @@
  * canary of every 8 bytes is the same. 64-bit memory can be filled and checked
  * at a time instead of byte by byte to improve performance.
  */
-#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(0x0706050403020100))
+#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(le64_to_cpu(0x0706050403020100)))
 
 /* Maximum stack depth for reports. */
 #define KFENCE_STACK_DEPTH 64
-- 
2.40.1



Re: [PATCH v4 11/17] watchdog/hardlockup: Rename some "NMI watchdog" constants/function

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> Do a search and replace of:
> - NMI_WATCHDOG_ENABLED => HARD_WATCHDOG_ENABLED
> - watchdog_nmi_ => watchdog_hardlockup_

These are just making prefixes inconsistent again.

If you really want to do a prefix, I would call it hardlockup which
probably best matches existing code and sysctl / boot stuff, and
concentrate on non-static symbols.

No problem with minor things like this that touch arch/powerpc
going through Andrew's tree though. I'm sure sparc maintainers
wouldn't mind either.

Thanks,
Nick


Re: [PATCH v4 08/17] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / ..._is_lockedup()

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> These are tiny style changes:
> - Add a blank line before a "return".
> - Renames two globals to use the "watchdog_hld" prefix.

Particularly static ones don't really need the namespace prefixes.

Not sure if processed is better than warn. allcpu_dumped is better
than dumped_stacks though because the all-CPUs-dump is a particular
thing.

Other style bits seem fine.

Thanks,
Nick

> - Store processor id in "unsigned int" rather than "int".
> - Minor comment rewording.
> - Use "else" rather than extra returns since it seemed more symmetric.
>
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v4:
> - ("Style changes to watchdog_hardlockup_check ...") new for v4.
>
>  kernel/watchdog.c | 32 +++-
>  1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 2d319cdf64b9..f46669c1671d 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> -static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> -static unsigned long hardlockup_allcpu_dumped;
> +static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
> +static unsigned long watchdog_hardlockup_dumped_stacks;
>  
>  static bool watchdog_hardlockup_is_lockedup(void)
>  {
> @@ -100,6 +100,7 @@ static bool watchdog_hardlockup_is_lockedup(void)
>   return true;
>  
>   __this_cpu_write(hrtimer_interrupts_saved, hrint);
> +
>   return false;
>  }
>  
> @@ -110,21 +111,20 @@ static void watchdog_hardlockup_interrupt_count(void)
>  
>  void watchdog_hardlockup_check(struct pt_regs *regs)
>  {
> - /* check for a hardlockup
> -  * This is done by making sure our timer interrupt
> -  * is incrementing.  The timer interrupt should have
> -  * fired multiple times before we overflow'd.  If it hasn't
> + /*
> +  * Check for a hardlockup by making sure the CPU's timer
> +  * interrupt is incrementing. The timer interrupt should have
> +  * fired multiple times before we overflow'd. If it hasn't
>* then this is a good indication the cpu is stuck
>*/
>   if (watchdog_hardlockup_is_lockedup()) {
> - int this_cpu = smp_processor_id();
> + unsigned int this_cpu = smp_processor_id();
>  
> - /* only print hardlockups once */
> - if (__this_cpu_read(hard_watchdog_warn) == true)
> + /* Only handle hardlockups once. */
> + if (__this_cpu_read(watchdog_hardlockup_processed))
>   return;
>  
> - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
> -  this_cpu);
> + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
>   print_modules();
>   print_irqtrace_events(current);
>   if (regs)
> @@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
>* generating interleaving traces
>*/
>   if (sysctl_hardlockup_all_cpu_backtrace &&
> - !test_and_set_bit(0, &hardlockup_allcpu_dumped))
> + !test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks))
>   trigger_allbutself_cpu_backtrace();
>  
>   if (hardlockup_panic)
>   nmi_panic(regs, "Hard LOCKUP");
>  
> - __this_cpu_write(hard_watchdog_warn, true);
> - return;
> + __this_cpu_write(watchdog_hardlockup_processed, true);
> + } else {
> + __this_cpu_write(watchdog_hardlockup_processed, false);
>   }
> -
> - __this_cpu_write(hard_watchdog_warn, false);
> - return;
>  }
>  
>  #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
> -- 
> 2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> The perf hardlockup detector works by looking at interrupt counts and
> seeing if they change from run to run. The interrupt counts are
> managed by the common watchdog code via its watchdog_timer_fn().
>
> Currently the API between the perf detector and the common code is a
> function: is_hardlockup(). When the hard lockup detector sees that
> function return true then it handles printing out debug info and
> inducing a panic if necessary.
>
> Let's change the API a little bit in preparation for the buddy
> hardlockup detector. The buddy hardlockup detector wants to print

I think the name change is a gratuitous. Especially since it's now
static.

watchdog_hardlockup_ is a pretty long prefix too; hardlockup_
should be enough?

Seems okay otherwise though.

Thanks,
Nick

> nearly the same debug info and have nearly the same panic
> behavior. That means we want to move all that code to the common
> file. For now, the code in the common file will only be there if the
> perf hardlockup detector is enabled, but eventually it will be
> selected by a common config.
>
> Right now, this _just_ moves the code from the perf detector file to
> the common file and changes the names. It doesn't make the changes
> that the buddy hardlockup detector will need and doesn't do any style
> cleanups. A future patch will do cleanup to make it more obvious what
> changed.
>
> With the above, we no longer have any callers of is_hardlockup()
> outside of the "watchdog.c" file, so we can remove it from the header,
> make it static, move it to the same "#ifdef" block as our new
> watchdog_hardlockup_check(), and rename it to make it obvious it's
> just for hardlockup detectors. While doing this, it can be noted that
> even if no hardlockup detectors were configured the existing code used
> to still have the code for counting/checking "hrtimer_interrupts" even
> if the perf hardlockup detector wasn't configured. We didn't need to
> do that, so move all the "hrtimer_interrupts" counting to only be
> there if the perf hardlockup detector is configured as well.
>
> This change is expected to be a no-op.
>
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v4:
> - ("Move perf hardlockup checking/panic ...") new for v4.
>
>  include/linux/nmi.h|  5 ++-
>  kernel/watchdog.c  | 92 +-
>  kernel/watchdog_perf.c | 42 +--
>  3 files changed, 78 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 35d09d70f394..c6cb9bc5dc80 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -15,7 +15,6 @@
>  void lockup_detector_init(void);
>  void lockup_detector_soft_poweroff(void);
>  void lockup_detector_cleanup(void);
> -bool is_hardlockup(void);
>  
>  extern int watchdog_user_enabled;
>  extern int nmi_watchdog_user_enabled;
> @@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic;
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
>  
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +void watchdog_hardlockup_check(struct pt_regs *regs);
> +#endif
> +
>  #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
>  # define NMI_WATCHDOG_SYSCTL_PERM 0644
>  #else
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index c705a18b26bf..2d319cdf64b9 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
>  
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> +static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> +static unsigned long hardlockup_allcpu_dumped;
> +
> +static bool watchdog_hardlockup_is_lockedup(void)
> +{
> + unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> +
> + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> + return true;
> +
> + __this_cpu_write(hrtimer_interrupts_saved, hrint);
> + return false;
> +}
> +
> +static void watchdog_hardlockup_interrupt_count(void)
> +{
> + __this_cpu_inc(hrtimer_interrupts);
> +}
> +
> +void watchdog_hardlockup_check(struct pt_regs *regs)
> +{
> + /* check for a hardlockup
> +  * This is done by making sure our timer interrupt
> +  * is incrementing.  The timer interrupt should have
> +  * fired multiple times before we overflow'd.  If it hasn't
> +  * then this is a good indication the cpu is stuck
> +  */
> + if (watchdog_hardlockup_is_lockedup()) {
> + int this_cpu = smp_processor_id();
> +
> + /* only print hardlockups once */
> + if (__this_cpu_read(hard_watchdog_warn) == true)
> + return;
> +
> + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
> +  

Re: [PATCH v4 06/17] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> The code currently in "watchdog_hld.c" is for detecting hardlockups
> using perf, as evidenced by the line in the Makefile that only
> compiles this file if CONFIG_HARDLOCKUP_DETECTOR_PERF is
> defined. Rename the file to prepare for the buddy hardlockup detector,
> which doesn't use perf.
>
> It could be argued that the new name makes it less obvious that this
> is a hardlockup detector. While true, it's not hard to remember that
> the "perf" detector is always a hardlockup detector and it's nice not
> to have names that are too convoluted.

Better than the confusion that the perf version is *the* hardlockup
detector IMO.

Acked-by: Nicholas Piggin 

>
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v4:
> - ("Rename watchdog_hld.c to watchdog_perf.c") new for v4.
>
>  kernel/Makefile| 2 +-
>  kernel/{watchdog_hld.c => watchdog_perf.c} | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>  rename kernel/{watchdog_hld.c => watchdog_perf.c} (99%)
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 10ef068f598d..406c4dd3 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -91,7 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
>  obj-$(CONFIG_KGDB) += debug/
>  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>  obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
> -obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
>  obj-$(CONFIG_SECCOMP) += seccomp.o
>  obj-$(CONFIG_RELAY) += relay.o
>  obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_perf.c
> similarity index 99%
> rename from kernel/watchdog_hld.c
> rename to kernel/watchdog_perf.c
> index 96b717205952..c3d8ceb149da 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_perf.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
> - * Detect hard lockups on a system
> + * Detect hard lockups on a system using perf
>   *
>   * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
>   *
> -- 
> 2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v4 05/17] watchdog/hardlockup: Rename touch_nmi_watchdog() to touch_hardlockup_watchdog()

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector, rename
> touch_nmi_watchdog() to touch_hardlockup_watchdog() to make it clear
> that it will touch whatever hardlockup detector is configured. We'll
> add a #define for the old name (touch_nmi_watchdog) so that we don't
> have to touch every piece of code referring to the old name.

Is this really helpful? Now it's got two names. Could just leave it.
If you insist then it'd be better just to rename everything in one
go at the end of a merge window IMO. Conflicts would be trivial.

> Ideally this change would also rename the arch_touch_nmi_watchdog(),
> but that is harder since arch_touch_nmi_watchdog() is exported with
> EXPORT_SYMBOL() and thus is ABI. Add a comment next to the call to
> hopefully alleviate some of the confusion here.

We don't keep ABI fixed upstream.

Thanks,
Nick

>
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v4:
> - ("Rename touch_nmi_watchdog() to ...") new for v4.
>
>  include/linux/nmi.h | 27 ++-
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 454fe99c4874..35d09d70f394 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -125,18 +125,35 @@ void watchdog_nmi_disable(unsigned int cpu);
>  void lockup_detector_reconfigure(void);
>  
>  /**
> - * touch_nmi_watchdog - restart NMI watchdog timeout.
> + * touch_hardlockup_watchdog - manually pet the hardlockup watchdog.
>   *
> - * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
> - * may be used to reset the timeout - for code which intentionally
> - * disables interrupts for a long time. This call is stateless.
> + * If we support detecting hardlockups, touch_hardlockup_watchdog() may be
> + * used to pet the watchdog (reset the timeout) - for code which
> + * intentionally disables interrupts for a long time. This call is stateless.
>   */
> -static inline void touch_nmi_watchdog(void)
> +static inline void touch_hardlockup_watchdog(void)
>  {
> + /*
> +  * Pass on to the hardlockup detector selected via CONFIG_. Note that
> +  * the hardlockup detector may not be arch-specific nor using NMIs,
> +  * but arch_touch_nmi_watchdog() is exported with EXPORT_SYMBOL() and
> +  * is thus ABI.
> +  */
>   arch_touch_nmi_watchdog();
> +
> + /*
> +  * Touching the hardlockup detector implicitly pets the
> +  * softlockup detector too.
> +  */
>   touch_softlockup_watchdog();
>  }
>  
> +/*
> + * It's encouraged for code to refer to the new name, but allow the old
> + * name as well.
> + */
> +#define touch_nmi_watchdog   touch_hardlockup_watchdog
> +
>  /*
>   * Create trigger_all_cpu_backtrace() out of the arch-provided
>   * base function. Return whether such support was available,
> -- 
> 2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v4 03/17] watchdog/hardlockup: change watchdog_nmi_enable() to void

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> From: Lecopzer Chen 
>
> Nobody cares about the return value of watchdog_nmi_enable(),
> so change its prototype to void.
>

Acked-by: Nicholas Piggin 

> Signed-off-by: Pingfan Liu 
> Signed-off-by: Lecopzer Chen 
> Reviewed-by: Petr Mladek 
> Signed-off-by: Douglas Anderson 
> ---
> I yanked this patch from the mailing lists [1] into my series just to
> make it easier to avoid conflicts between my series and the one adding
> the arm64 perf hardlockup detector, in case someone wanted to test
> them both together. This is a nice cleanup and could land together
> with the rest of my series if that makes sense.
>
> I changed the patch prefix to match others in my series.
>
> [1] https://lore.kernel.org/r/20220903093415.15850-3-lecopzer.c...@mediatek.com/
>
> Changes in v4:
> - Pulled ("change watchdog_nmi_enable() to void") into my series for v4.
>
>  arch/sparc/kernel/nmi.c | 8 +++-
>  include/linux/nmi.h | 2 +-
>  kernel/watchdog.c   | 3 +--
>  3 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
> index 060fff95a305..5dcf31f7e81f 100644
> --- a/arch/sparc/kernel/nmi.c
> +++ b/arch/sparc/kernel/nmi.c
> @@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
>   * sparc specific NMI watchdog enable function.
>   * Enables watchdog if it is not enabled already.
>   */
> -int watchdog_nmi_enable(unsigned int cpu)
> +void watchdog_nmi_enable(unsigned int cpu)
>  {
>   if (atomic_read(&nmi_active) == -1) {
>   pr_warn("NMI watchdog cannot be enabled or disabled\n");
> - return -1;
> + return;
>   }
>  
>   /*
> @@ -295,11 +295,9 @@ int watchdog_nmi_enable(unsigned int cpu)
>* process first.
>*/
>   if (!nmi_init_done)
> - return 0;
> + return;
>  
>   smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
> -
> - return 0;
>  }
>  /*
>   * sparc specific NMI watchdog disable function.
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 771d77b62bc1..454fe99c4874 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
>  void watchdog_nmi_stop(void);
>  void watchdog_nmi_start(void);
>  int watchdog_nmi_probe(void);
> -int watchdog_nmi_enable(unsigned int cpu);
> +void watchdog_nmi_enable(unsigned int cpu);
>  void watchdog_nmi_disable(unsigned int cpu);
>  
>  void lockup_detector_reconfigure(void);
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 582d572e1379..c705a18b26bf 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -93,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>   * softlockup watchdog start and stop. The arch must select the
>   * SOFTLOCKUP_DETECTOR Kconfig.
>   */
> -int __weak watchdog_nmi_enable(unsigned int cpu)
> +void __weak watchdog_nmi_enable(unsigned int cpu)
>  {
>   hardlockup_detector_perf_enable();
> - return 0;
>  }
>  
>  void __weak watchdog_nmi_disable(unsigned int cpu)
> -- 
> 2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v4 01/17] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> The real watchdog_update_hrtimer_threshold() is defined in
> watchdog_hardlockup_perf.c. That file is included if

In kernel/watchdog_hld.c.

> CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that
> file if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP.
>
> The dummy version of the function in "nmi.h" didn't get that quite
> right. While this doesn't appear to be a huge deal, it's nice to make
> it consistent.

It doesn't break builds because CHECK_TIMESTAMP is only defined by
x86, so others don't get a double definition, and x86 uses the perf
lockup detector, so it gets the out-of-line version.

So has no functional change but should be fixed.

Reviewed-by: Nicholas Piggin 

>
> Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo 
> modes")
> Signed-off-by: Douglas Anderson 
> ---
>
> Changes in v4:
> - ("Define dummy watchdog_update_hrtimer_threshold() ...") new for v4.
>
>  include/linux/nmi.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 048c0b9aa623..771d77b62bc1 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -197,7 +197,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh);
>  #endif
>  
>  #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
> -defined(CONFIG_HARDLOCKUP_DETECTOR)
> +defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  void watchdog_update_hrtimer_threshold(u64 period);
>  #else
>  static inline void watchdog_update_hrtimer_threshold(u64 period) { }
> -- 
> 2.40.1.521.gf1e218fcd8-goog



Re: [PATCH v4 13/17] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs

2023-05-04 Thread Nicholas Piggin
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote:
> From: Colin Cross 
>
> Implement a hardlockup detector that doesn't need any extra
> arch-specific support code to detect lockups. Instead of using
> something arch-specific we will use the buddy system, where each CPU
> watches out for another one. Specifically, each CPU will use its
> softlockup hrtimer to check that the next CPU is processing hrtimer
> interrupts by verifying that a counter is increasing.

Powerpc's watchdog has an SMP checker, did you see it? It's all-to-all
rather than buddy, which makes it more complicated but arguably gives a
bit better functionality.

Thanks,
Nick


Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2023-05-04 Thread Michael Ellerman
Gaurav Batra  writes:
> Hello Michael,
>
> I agree with your concerns regarding a device being able to access memory
> that doesn't belong to it. We have that exposure today with 2MB TCEs.
> With 2MB TCEs, the DMA window will be big enough, for dedicated
> adapters, that the whole of memory is going to be mapped "direct", which
> essentially means that a "rogue" device/driver has the potential to
> corrupt LPAR-wide memory.

Yes that's always been a trade-off between performance and robustness,
and performance is generally the winner.

There have been various command line flags in the past to configure
stricter behaviour, disable bypass etc. Some of those are now generic,
iommu.strict/passthrough, it would be good to get them wired up to work
on powerpc at some point.

> I have sent you v3.

Thanks.

cheers


Re: [PATCH v8 2/7] PCI: Export PCI link retrain timeout

2023-05-04 Thread Bjorn Helgaas
On Thu, Apr 06, 2023 at 01:21:09AM +0100, Maciej W. Rozycki wrote:
> Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
> available via "pci.h" for PCI drivers to use.

> +#define PCIE_LINK_RETRAIN_TIMEOUT HZ

This is basically just a rename and move, but since we're touching it
anyway, can we make it "PCIE_LINK_RETRAIN_TIMEOUT_MS 1000" here and
use msecs_to_jiffies() below?

I know jiffies and HZ are probably idiomatic elsewhere in the kernel,
and this particular timeout is arbitrary and not based on anything in
the spec, but many of the delays in PCI *are* straight from a spec, so
I'd like to make the units more explicit.

>  extern const unsigned char pcie_link_speed[];
>  extern bool pci_early_dump;
>  
> Index: linux-macro/drivers/pci/pcie/aspm.c
> ===
> --- linux-macro.orig/drivers/pci/pcie/aspm.c
> +++ linux-macro/drivers/pci/pcie/aspm.c
> @@ -90,8 +90,6 @@ static const char *policy_str[] = {
>   [POLICY_POWER_SUPERSAVE] = "powersupersave"
>  };
>  
> -#define LINK_RETRAIN_TIMEOUT HZ
> -
>  /*
>   * The L1 PM substate capability is only implemented in function 0 in a
>   * multi function device.
> @@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
>   }
>  
>   /* Wait for link training end. Break out after waiting for timeout */
> - end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
> + end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
>   do {
>   pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
>   if (!(reg16 & PCI_EXP_LNKSTA_LT))


Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-04 Thread Bjorn Helgaas
On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote:
> Attempt to handle cases such as with a downstream port of the ASMedia 
> ASM2824 PCIe switch where link training never completes and the link 
> continues switching between speeds indefinitely with the data link layer 
> never reaching the active state.

We're going to land this series this cycle, come hell or high water.

We talked about reusing pcie_retrain_link() earlier.  IIRC that didn't
work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices
support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA
because the erratum makes PCI_EXP_LNKSTA_LT flap.

What if we made pcie_retrain_link() reusable by making it:

  bool pcie_retrain_link(struct pci_dev *pdev, u16 link_status_bit)

so ASPM could use pcie_retrain_link(link->pdev, PCI_EXP_LNKSTA_LT) and
you could use pcie_retrain_link(dev, PCI_EXP_LNKSTA_DLLLA)?

Maybe do it in two steps?

  1) Move pcie_retrain_link() just after pcie_wait_for_link() and make
  it take link->pdev instead of link.

  2) Add the bit parameter.

I'm OK with having pcie_retrain_link() in pci.c, but the surrounding
logic about restricting to 2.5GT/s, retraining, removing the
restriction, retraining again is stuff I'd rather have in quirks.c so
it doesn't clutter pci.c.

I think it'd be good if the pci_device_add() path made clear that this
is a workaround for a problem, e.g.,

  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
  {
...
if (pcie_link_failed(dev))
  pcie_fix_link_train(dev);

where pcie_fix_link_train() could live in quirks.c (with a stub when
CONFIG_PCI_QUIRKS isn't enabled).  It *might* even be worth adding it
and the stub first because that's a trivial patch and wouldn't clutter
the probe.c git history with all the grotty details about ASM2824 and
this topology.

> +int pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> + static const struct pci_device_id ids[] = {
> + { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> + {}
> + };
> + u16 lnksta, lnkctl2;
> +
> + if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> + !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> + return -1;
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> + PCI_EXP_LNKSTA_LBMS) {

You go to some trouble to make sure PCI_EXP_LNKSTA_LBMS is set, and I
can't remember what the reason is.  If you make a preparatory patch
like this, it would give a place for that background, e.g.,

  +bool pcie_link_failed(struct pci_dev *dev)
  +{
  +   u16 lnksta;
  +
  +   if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
  +   !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
  +   return false;
  +
  +   pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
  +   if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
  +   PCI_EXP_LNKSTA_LBMS)
  +   return true;
  +
  +   return false;
  +}

If this is a generic thing and checking PCI_EXP_LNKSTA_LBMS makes
sense for everybody, it could go in pci.c; otherwise it could go in
quirks.c as well.  I guess it's not *truly* generic anyway because it
only detects link training failures for devices that have LNKCTL2 and
link_active_reporting.

> + unsigned long timeout;
> + u16 lnkctl;
> +
> + pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> +
> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> + lnkctl |= PCI_EXP_LNKCTL_RL;
> + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> + lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> + /*
> +  * Due to an erratum in some devices the Retrain Link bit
> +  * needs to be cleared again manually to allow the link
> +  * training to succeed.
> +  */
> + lnkctl &= ~PCI_EXP_LNKCTL_RL;
> + if (dev->clear_retrain_link)
> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
> +lnkctl);
> +
> + timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
> + do {
> + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> + if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> + break;
> + usleep_range(1, 2);
> + } while (time_before(jiffies, timeout));
> +
> + if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
> + pci_info(dev, "retraining 

[PATCH v4 17/17] arm64: Enable perf events based hard lockup detector

2023-05-04 Thread Douglas Anderson
With the recent feature that lets perf events use pseudo-NMIs as
interrupts on platforms supporting GICv3 or later, it is now possible
to enable the hard lockup detector (or NMI watchdog) on arm64
platforms. So enable the corresponding support.

One thing to note here is that the lockup detector is normally
initialized just after the early initcalls, but the PMU on arm64 comes
up much later, as a device_initcall(). To cope with that, override
arch_perf_nmi_is_available() to let the watchdog framework know the
PMU is not ready, and have the framework re-initialize lockup
detection once the PMU has been initialized.

Co-developed-by: Sumit Garg 
Signed-off-by: Sumit Garg 
Co-developed-by: Pingfan Liu 
Signed-off-by: Pingfan Liu 
Signed-off-by: Lecopzer Chen 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

As part of making this match with my series, I needed to resolve
conflicts with the patch ("watchdog/hardlockup: Have the perf
hardlockup use __weak functions more cleanly"). This makes ${SUBJECT}
patch now depend on the patch ("watchdog/perf: Add a weak function for
an arch to detect if perf can use NMIs"). As talked about in that
patch, there may be better alternatives to accomplish the same thing.

As mentioned in the cover letter, I'm not really expecting this patch
to land together with the patches for the buddy detector. I included
it with my series simply for convenience of testing both series
together.

NOTE: the previous patch posted by Lecopzer pointed to Sumit's
patch [2] in the commit text but provided no context. I moved it to
this "after the cut" note.

[1] https://lore.kernel.org/r/20220903093415.15850-7-lecopzer.c...@mediatek.com/
[2] http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.g...@linaro.org

Changes in v4:
- Pulled ("Enable perf events based hard ...") into my series for v4.

 arch/arm64/Kconfig   |  2 ++
 arch/arm64/kernel/perf_event.c   | 12 ++--
 arch/arm64/kernel/watchdog_hld.c | 12 
 drivers/perf/arm_pmu.c   |  5 +
 include/linux/perf/arm_pmu.h |  2 ++
 5 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..4ee9f3a7bc7a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -197,12 +197,14 @@ config ARM64
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
+   select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IOREMAP_PROT
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KVM
select HAVE_NMI
select HAVE_PERF_EVENTS
+   select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_PREEMPT_DYNAMIC_KEY
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index dde06c0f97f3..732bc5d76cb1 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREF_LINEFILL0xC2
@@ -1396,10 +1397,17 @@ static struct platform_driver armv8_pmu_driver = {
 
 static int __init armv8_pmu_driver_init(void)
 {
+   int ret;
+
if (acpi_disabled)
-   return platform_driver_register(&armv8_pmu_driver);
+   ret = platform_driver_register(&armv8_pmu_driver);
else
-   return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+   ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+
+   if (!ret)
+   lockup_detector_retry_init();
+
+   return ret;
 }
 device_initcall(armv8_pmu_driver_init)
 
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
index 2401eb1b7e55..dcd25322127c 100644
--- a/arch/arm64/kernel/watchdog_hld.c
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
+#include 
 #include 
+#include 
 
 /*
  * Safe maximum CPU frequency in case a particular platform doesn't implement
@@ -22,3 +24,13 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 
return (u64)max_cpu_freq * watchdog_thresh;
 }
+
+bool __init arch_perf_nmi_is_available(void)
+{
+   /*
+* hardlockup_detector_perf_init() will succeed even if Pseudo-NMI is
+* turned off; however, the PMU interrupts will act like normal
+* interrupts instead of NMIs and the hardlockup detector would be broken.
+*/
+   return arm_pmu_irq_is_nmi();
+}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 15bd1e34a88e..7b9caa502d33 100644

[PATCH v4 16/17] arm64: add hw_nmi_get_sample_period for preparation of lockup detector

2023-05-04 Thread Douglas Anderson
From: Lecopzer Chen 

Set a safe maximum CPU frequency of 5 GHz in case a particular
platform doesn't implement a cpufreq driver. Although the architecture
doesn't put any restriction on maximum frequency, 5 GHz seems to be a
safe maximum given the available Arm CPUs in the market, which are
clocked much lower than 5 GHz.

On the other hand, we can't make it much higher, as that would lead to
a large hard-lockup detection timeout on parts which run slower
(e.g. 1 GHz on Developerbox) and don't possess a cpufreq driver.

Co-developed-by: Sumit Garg 
Signed-off-by: Sumit Garg 
Co-developed-by: Pingfan Liu 
Signed-off-by: Pingfan Liu 
Signed-off-by: Lecopzer Chen 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

The only change I made here was to remove an extra blank line that
checkpatch complained about.

As mentioned in the cover letter, I'm not really expecting this patch
to land together with the patches for the buddy detector. I included
it with my series simply for convenience of testing both series
together.

NOTE: the previous patch posted by Lecopzer pointed to Sumit's
patch [2] in the commit text but provided no context. I moved it to
this "after the cut" note.

[1] https://lore.kernel.org/r/20220903093415.15850-6-lecopzer.c...@mediatek.com/
[2] http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.g...@linaro.org

Changes in v4:
- Pulled ("add hw_nmi_get_sample_period for ...") into my series for v4.

 arch/arm64/kernel/Makefile   |  1 +
 arch/arm64/kernel/watchdog_hld.c | 24 
 2 files changed, 25 insertions(+)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index ceba6792f5b3..f8e2645406a4 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES) += module.o
 obj-$(CONFIG_ARM64_MODULE_PLTS)+= module-plts.o
 obj-$(CONFIG_PERF_EVENTS)  += perf_regs.o perf_callchain.o
 obj-$(CONFIG_HW_PERF_EVENTS)   += perf_event.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 obj-$(CONFIG_CPU_PM)   += sleep.o suspend.o
 obj-$(CONFIG_CPU_IDLE) += cpuidle.o
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
new file mode 100644
index ..2401eb1b7e55
--- /dev/null
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+
+/*
+ * Safe maximum CPU frequency in case a particular platform doesn't implement
+ * cpufreq driver. Although, architecture doesn't put any restrictions on
+ * maximum frequency but 5 GHz seems to be safe maximum given the available
+ * Arm CPUs in the market which are clocked much less than 5 GHz. On the other
+ * hand, we can't make it much higher as it would lead to a large hard-lockup
+ * detection timeout on parts which are running slower (eg. 1GHz on
+ * Developerbox) and doesn't possess a cpufreq driver.
+ */
+#define SAFE_MAX_CPU_FREQ  5000000000UL // 5 GHz
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+   unsigned int cpu = smp_processor_id();
+   unsigned long max_cpu_freq;
+
+   max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+   if (!max_cpu_freq)
+   max_cpu_freq = SAFE_MAX_CPU_FREQ;
+
+   return (u64)max_cpu_freq * watchdog_thresh;
+}
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 15/17] watchdog/perf: Adapt the watchdog_perf interface for async model

2023-05-04 Thread Douglas Anderson
From: Lecopzer Chen 

When lockup_detector_init()->watchdog_hardlockup_probe() runs, the PMU
may not be ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init), and it is deeply integrated
with the driver model and cpuhp. Hence it is hard to push this
initialization before smp_init().

But it is easy to take the opposite approach and try to initialize
the watchdog once again later. The delayed probe is called using
workqueues: it needs to allocate memory and must proceed in a normal
context. The delayed probe is used if watchdog_hardlockup_probe()
returns non-zero, which is the return code when the PMU is not ready
yet.

Provide an API, lockup_detector_retry_init(), for anyone who needs to
delay the lockup detector init after failing at lockup_detector_init().

The original assumption is that nobody should use the delayed probe
after lockup_detector_check(), which has the __init attribute. That is,
anyone using this API must call it between lockup_detector_init() and
lockup_detector_check(), and the caller must have the __init attribute.
Reviewed-by: Petr Mladek 
Co-developed-by: Pingfan Liu 
Signed-off-by: Pingfan Liu 
Signed-off-by: Lecopzer Chen 
Suggested-by: Petr Mladek 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

As part of making this match with my series, I resolved a few
conflicts and did a few renames. I also fixed the kernel test robot
yell [2] by providing a dummy implementation of the retry function if
CONFIG_LOCKUP_DETECTOR wasn't defined. That led me to move the
definition of the function and sanitize the name of the function to
match others around it.

This patch makes less sense without the patches to add support for the
arm64 perf hardlockup detector. Thus, I've put it at the end of the
series.

I removed the "kernel test robot" tag since that didn't really make
sense. Presumably the robot found a problem on a previous version of
the patch, but that's not normally a reason to add the Reported-by.

[1] https://lore.kernel.org/r/20220903093415.15850-5-lecopzer.c...@mediatek.com/
[2] https://lore.kernel.org/r/202209050639.jdawd49e-...@intel.com/

Changes in v4:
- Pulled ("Adapt the watchdog_hld interface ...") into my series for v4.

 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 67 -
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9caea5ba494d..3cacc5fc16b9 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -13,6 +13,7 @@
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 void lockup_detector_init(void);
+void lockup_detector_retry_init(void);
 void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
 
@@ -34,6 +35,7 @@ extern int sysctl_hardlockup_all_cpu_backtrace;
 
 #else /* CONFIG_LOCKUP_DETECTOR */
 static inline void lockup_detector_init(void) { }
+static inline void lockup_detector_retry_init(void) { }
 static inline void lockup_detector_soft_poweroff(void) { }
 static inline void lockup_detector_cleanup(void) { }
 #endif /* !CONFIG_LOCKUP_DETECTOR */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 678d55172ef4..55471634d932 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -211,7 +211,13 @@ void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
 
 void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
 
-/* Return 0, if a hardlockup watchdog is available. Error code otherwise */
+/*
+ * Watchdog-detector specific API.
+ *
+ * Return 0 when hardlockup watchdog is available, negative value otherwise.
+ * Note that the negative value means that a delayed probe might
+ * succeed later.
+ */
 int __weak __init watchdog_hardlockup_probe(void)
 {
/*
@@ -957,6 +963,62 @@ static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void __init lockup_detector_delay_init(struct work_struct *work);
+static bool allow_lockup_detector_init_retry __initdata;
+
+static struct work_struct detector_work __initdata =
+   __WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+   int ret;
+
+   ret = watchdog_hardlockup_probe();
+   if (ret) {
+   pr_info("Delayed init of the lockup detector failed: %d\n", ret);
+   pr_info("Hard watchdog permanently disabled\n");
+   return;
+   }
+
+   allow_lockup_detector_init_retry = false;
+
+   nmi_watchdog_available = true;
+   lockup_detector_setup();
+}
+
+/*
+ * lockup_detector_retry_init - retry init lockup detector if possible.
+ *
+ * Retry hardlockup detector init. It is useful when it requires some
+ * 

[PATCH v4 14/17] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs

2023-05-04 Thread Douglas Anderson
On arm64, NMI support needs to be detected at runtime. Add a weak
function to the perf hardlockup detector so that an architecture can
implement it to detect whether NMIs are available.

Signed-off-by: Douglas Anderson 
---
While I won't object to this patch landing, I consider it part of the
arm64 perf hardlockup effort. I would be OK with the earlier patches
in the series landing and then not landing ${SUBJECT} patch nor
anything else later.

I'll also note that, as an alternative to this, it would be nice if we
could figure out how to make perf_event_create_kernel_counter() fail
on arm64 if NMIs aren't available. Maybe we could add a "must_use_nmi"
element to "struct perf_event_attr"?

Changes in v4:
- ("Add a weak function for an arch to detect ...") new for v4.

 include/linux/nmi.h|  1 +
 kernel/watchdog_perf.c | 12 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 90aa33317b4c..9caea5ba494d 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -212,6 +212,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+bool arch_perf_nmi_is_available(void);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index a55a6eab1b3a..0d1c292a655d 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -234,12 +234,22 @@ void __init hardlockup_detector_perf_restart(void)
}
 }
 
+bool __weak __init arch_perf_nmi_is_available(void)
+{
+   return true;
+}
+
 /**
  * watchdog_hardlockup_probe - Probe whether NMI event is available at all
  */
 int __init watchdog_hardlockup_probe(void)
 {
-   int ret = hardlockup_detector_event_create();
+   int ret;
+
+   if (!arch_perf_nmi_is_available())
+   return -ENODEV;
+
+   ret = hardlockup_detector_event_create();
 
if (ret) {
pr_info("Perf NMI watchdog permanently disabled\n");
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 13/17] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs

2023-05-04 Thread Douglas Anderson
From: Colin Cross 

Implement a hardlockup detector that doesn't need any extra
arch-specific support code to detect lockups. Instead of using
something arch-specific we will use the buddy system, where each CPU
watches out for another one. Specifically, each CPU will use its
softlockup hrtimer to check that the next CPU is processing hrtimer
interrupts by verifying that a counter is increasing.

NOTE: unlike the other hard lockup detectors, the buddy one can't
easily show what's happening on the CPU that locked up just by doing a
simple backtrace. It relies on some other mechanism in the system to
get information about the locked up CPUs. This could be support for
NMI backtraces like [1], it could be a mechanism for printing the PC
of locked CPUs at panic time like [2] / [3], or it could be something
else. Even though that means we still rely on arch-specific code, this
arch-specific code seems to often be implemented even on architectures
that don't have a hardlockup detector.

This style of hardlockup detector originated in some downstream
Android trees and has been rebased on / carried in ChromeOS trees for
quite a long time for use on arm and arm64 boards. Historically on
these boards we've leveraged mechanism [2] / [3] to get information
about hung CPUs, but we could move to [1].

Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific
hardlockup detector. On x86, for instance, there is a 24-part patch
series [4] in progress switching the arch-specific hard lockup
detector from a scarce perf counter to a less-scarce hardware
resource. Potentially the buddy system could be a simpler alternative
to free up the perf counter but still get hard lockup detection.

Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
  arch-specific hardlockup detector, like arm32 and arm64 (though it's
  being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
  system could still detect the problem (though it would be unlikely
  to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
  detector on the running CPU as it uses to detect hardlockups on
  other CPUs. Compared to other hardlockup detectors, this means it
  generates fewer interrupts and thus is likely better able to let
  CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
  can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
  backtrace) to get info about the locked up CPU.

[1] https://lore.kernel.org/r/20230419225604.21204-1-diand...@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calde...@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.c...@mediatek.com/

Signed-off-by: Colin Cross 
Signed-off-by: Matthias Kaehlcke 
Signed-off-by: Guenter Roeck 
Signed-off-by: Tzung-Bi Shih 
Signed-off-by: Douglas Anderson 
---
This patch has been rebased in ChromeOS kernel trees many times, and
each time someone had to do work on it they added their
Signed-off-by. I've included those here. I've also left the author as
Colin Cross since the core code is still his, even if it's now been
reorganized a lot.

Changes in v4:
- Reworked atop a whole pile of cleanups, as suggested by Petr.

Changes in v3:
- Added a note in commit message about the effect on idle.
- Cleaned up commit message pros/cons to be complete sentences.
- More cpu => CPU (in Kconfig and comments).
- No code changes other than comments.

Changes in v2:
- No code changes.
- Reworked description and Kconfig based on v1 discussion.
- cpu => CPU (in commit message).

 include/linux/nmi.h |  9 +++-
 kernel/Makefile |  1 +
 kernel/watchdog.c   | 29 +
 kernel/watchdog_buddy.c | 93 +
 lib/Kconfig.debug   | 52 +--
 5 files changed, 173 insertions(+), 11 deletions(-)
 create mode 100644 kernel/watchdog_buddy.c

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 094a0e7ed97d..90aa33317b4c 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -87,8 +87,9 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
 void arch_touch_nmi_watchdog(void);
+void watchdog_hardlockup_touch_cpu(unsigned int cpu);
 void 

[PATCH v4 12/17] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly

2023-05-04 Thread Douglas Anderson
The fact that watchdog_hardlockup_enable(),
watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
declared __weak means that the configured hardlockup detector can
define non-weak versions of those functions if it needs to. Instead of
doing this, the perf hardlockup detector hooked itself into the
default __weak implementation, which was a bit awkward. Clean this up.

From the comments, it looks as if the original design was chosen
because the __weak functions were expected to be implemented by the
architecture and not by the configured hardlockup detector. This got
awkward when we tried to add the buddy lockup detector, which is not
arch-specific but wanted to hook into those same functions.

This is not expected to have any functional impact.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Have the perf hardlockup use __weak ...") new for v4.

 include/linux/nmi.h| 10 --
 kernel/watchdog.c  | 30 ++
 kernel/watchdog_perf.c | 20 ++--
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4ff48f189ab1..094a0e7ed97d 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,21 +103,11 @@ static inline void arch_touch_nmi_watchdog(void) { }
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
 extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
-# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
-# endif
 #endif
 
 void watchdog_hardlockup_stop(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8e11b2b69e2c..e21896a0a9d5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -190,27 +190,33 @@ static inline void watchdog_hardlockup_interrupt_count(void) { }
 #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
 
 /*
- * These functions can be overridden if an architecture implements its
- * own hardlockup detector.
+ * These functions can be overridden based on the configured hardlockup detector.
  *
  * watchdog_hardlockup_enable/disable can be implemented to start and stop when
- * softlockup watchdog start and stop. The arch must select the
+ * softlockup watchdog start and stop. The detector must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-void __weak watchdog_hardlockup_enable(unsigned int cpu)
-{
-   hardlockup_detector_perf_enable();
-}
+void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
 
-void __weak watchdog_hardlockup_disable(unsigned int cpu)
-{
-   hardlockup_detector_perf_disable();
-}
+void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
 
 /* Return 0, if a hardlockup watchdog is available. Error code otherwise */
 int __weak __init watchdog_hardlockup_probe(void)
 {
-   return hardlockup_detector_perf_init();
+   /*
+* If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
+* is assumed to have the hard watchdog available and we return 0.
+*/
+   if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
+   return 0;
+
+   /*
+* Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
+* are required to implement a non-weak version of this probe function
+* to tell whether they are available. If they don't override then
+* we'll return -ENODEV.
+*/
+   return -ENODEV;
 }
 
 /**
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index aadc52b79f5b..a55a6eab1b3a 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -132,10 +132,14 @@ static int hardlockup_detector_event_create(void)
 }
 
 /**
- * hardlockup_detector_perf_enable - Enable the local event
+ * watchdog_hardlockup_enable - Enable the local event
+ *
+ * @cpu: The CPU to enable hard lockup on.
  */
-void hardlockup_detector_perf_enable(void)
+void watchdog_hardlockup_enable(unsigned int cpu)
 {
+   WARN_ON_ONCE(cpu != smp_processor_id());
+
if (hardlockup_detector_event_create())
return;
 
@@ -147,12 +151,16 @@ void hardlockup_detector_perf_enable(void)
 }
 
 /**
- * hardlockup_detector_perf_disable - Disable the local event
+ * watchdog_hardlockup_disable - Disable the local event
+ *
+ * @cpu: The CPU to disable hard lockup on.
  */
-void hardlockup_detector_perf_disable(void)
+void 

[PATCH v4 11/17] watchdog/hardlockup: Rename some "NMI watchdog" constants/function

2023-05-04 Thread Douglas Anderson
Do a search and replace of:
- NMI_WATCHDOG_ENABLED => HARD_WATCHDOG_ENABLED
- watchdog_nmi_ => watchdog_hardlockup_

Then update a few comments near where names were changed.

This is specifically to make it less confusing when we want to
introduce the buddy hardlockup detector, which isn't using NMIs.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Rename some "NMI watchdog" constants/function ...") new for v4.

 arch/powerpc/include/asm/nmi.h|  4 +-
 arch/powerpc/kernel/watchdog.c| 12 ++---
 arch/powerpc/platforms/pseries/mobility.c |  4 +-
 arch/sparc/kernel/nmi.c   |  4 +-
 include/linux/nmi.h   | 14 +++---
 kernel/watchdog.c | 60 +++
 kernel/watchdog_perf.c|  2 +-
 7 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index c3c7adef74de..43bfd4de868f 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,10 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
-void watchdog_nmi_set_timeout_pct(u64 pct);
+void watchdog_hardlockup_set_timeout_pct(u64 pct);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
-static inline void watchdog_nmi_set_timeout_pct(u64 pct) {}
+static inline void watchdog_hardlockup_set_timeout_pct(u64 pct) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index dbcc4a793f0b..27d1f0dba5b3 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -438,7 +438,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
 
-   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   if (!(watchdog_enabled & HARD_WATCHDOG_ENABLED))
return HRTIMER_NORESTART;
 
if (!cpumask_test_cpu(cpu, _cpumask))
@@ -479,7 +479,7 @@ static void start_watchdog(void *arg)
return;
}
 
-   if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+   if (!(watchdog_enabled & HARD_WATCHDOG_ENABLED))
return;
 
if (!cpumask_test_cpu(cpu, _cpumask))
@@ -546,7 +546,7 @@ static void watchdog_calc_timeouts(void)
wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
 }
 
-void watchdog_nmi_stop(void)
+void watchdog_hardlockup_stop(void)
 {
int cpu;
 
@@ -554,7 +554,7 @@ void watchdog_nmi_stop(void)
stop_watchdog_on_cpu(cpu);
 }
 
-void watchdog_nmi_start(void)
+void watchdog_hardlockup_start(void)
 {
int cpu;
 
@@ -566,7 +566,7 @@ void watchdog_nmi_start(void)
 /*
  * Invoked from core watchdog init.
  */
-int __init watchdog_nmi_probe(void)
+int __init watchdog_hardlockup_probe(void)
 {
int err;
 
@@ -582,7 +582,7 @@ int __init watchdog_nmi_probe(void)
 }
 
 #ifdef CONFIG_PPC_PSERIES
-void watchdog_nmi_set_timeout_pct(u64 pct)
+void watchdog_hardlockup_set_timeout_pct(u64 pct)
 {
pr_info("Set the NMI watchdog timeout factor to %llu%%\n", pct);
WRITE_ONCE(wd_timeout_pct, pct);
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 643d309d1bd0..9475388235a3 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -758,7 +758,7 @@ static int pseries_migrate_partition(u64 handle)
goto out;
 
if (factor)
-   watchdog_nmi_set_timeout_pct(factor);
+   watchdog_hardlockup_set_timeout_pct(factor);
 
ret = pseries_suspend(handle);
if (ret == 0) {
@@ -774,7 +774,7 @@ static int pseries_migrate_partition(u64 handle)
pseries_cancel_migration(handle, ret);
 
if (factor)
-   watchdog_nmi_set_timeout_pct(0);
+   watchdog_hardlockup_set_timeout_pct(0);
 
 out:
vas_migration_handler(VAS_RESUME);
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 5dcf31f7e81f..9d9e29b75c43 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,7 +282,7 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
  * sparc specific NMI watchdog enable function.
  * Enables watchdog if it is not enabled already.
  */
-void watchdog_nmi_enable(unsigned int cpu)
+void watchdog_hardlockup_enable(unsigned int cpu)
 {
if (atomic_read(&nmi_active) == -1) {
pr_warn("NMI watchdog cannot be enabled or disabled\n");
@@ -303,7 +303,7 @@ void watchdog_nmi_enable(unsigned int cpu)
  * sparc specific NMI watchdog disable function.
  * Disables watchdog if it is not disabled already.
  */
-void watchdog_nmi_disable(unsigned int cpu)
+void watchdog_hardlockup_disable(unsigned int cpu)
 {
if (atomic_read(&nmi_active) == -1)
pr_warn_once("NMI watchdog cannot be enabled or disabled\n");

[PATCH v4 10/17] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c

2023-05-04 Thread Douglas Anderson
In preparation for the buddy hardlockup detector, which wants the same
petting logic as the current perf hardlockup detector, move the code
to watchdog.c. While doing this, rename the global variable to match
others nearby. The arch_touch_nmi_watchdog() function is not renamed
since that is exported with "EXPORT_SYMBOL" and is thus ABI.

Currently the code in watchdog.c is guarded by
CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem
silly. However, a future patch will change this.

NOTE: there is a slight side effect to this change, though from code
analysis I believe it to be a beneficial one. Specifically the perf
hardlockup detector will now check the "timestamp" before clearing
any watchdog pets. Previously the order was reversed. With the old
order if the watchdog perf event was firing super fast then it would
also be clearing existing watchdog pets super fast. The new behavior
of handling super-fast perf before clearing watchdog pets seems
better.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Move perf hardlockup watchdog petting to watchdog.c") new for v4.

 include/linux/nmi.h|  5 +++--
 kernel/watchdog.c  | 19 +++
 kernel/watchdog_perf.c | 19 ---
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 2c9ea1ba285c..94ff84e1c8ec 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,10 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+void arch_touch_nmi_watchdog(void);
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
+#elif !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) { }
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
@@ -98,7 +101,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
@@ -113,7 +115,6 @@ static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
-static inline void arch_touch_nmi_watchdog(void) {}
 # else
 static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 367bea0167a5..9c17090611f2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -90,8 +90,22 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
+static DEFINE_PER_CPU(bool, watchdog_hardlockup_touch);
 static unsigned long watchdog_hardlockup_dumped_stacks;
 
+notrace void arch_touch_nmi_watchdog(void)
+{
+   /*
+* Using __raw here because some code paths have
+* preemption enabled.  If preemption is enabled
+* then interrupts should be enabled too, in which
+* case we shouldn't have to worry about the watchdog
+* going off.
+*/
+   raw_cpu_write(watchdog_hardlockup_touch, true);
+}
+EXPORT_SYMBOL(arch_touch_nmi_watchdog);
+
 static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
 {
unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
@@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
+   if (__this_cpu_read(watchdog_hardlockup_touch)) {
+   __this_cpu_write(watchdog_hardlockup_touch, false);
+   return;
+   }
+
/*
 * Check for a hardlockup by making sure the CPU's timer
 * interrupt is incrementing. The timer interrupt should have
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 9be90b2a2ea7..547917ebd5d3 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -20,26 +20,12 @@
 #include 
 #include 
 
-static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
-notrace void arch_touch_nmi_watchdog(void)
-{
-   /*
-* Using __raw here because some code paths have
-* preemption enabled.  If preemption is enabled
-* then interrupts should be enabled too, in which
-* case we shouldn't have to worry about the watchdog
-* going off.
-*/
-   raw_cpu_write(watchdog_nmi_touch, true);
-}

[PATCH v4 09/17] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()

2023-05-04 Thread Douglas Anderson
In preparation for the buddy hardlockup detector where the CPU
checking for lockup might not be the currently running CPU, add a
"cpu" parameter to watchdog_hardlockup_check().

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.

 include/linux/nmi.h|  2 +-
 kernel/watchdog.c  | 47 --
 kernel/watchdog_perf.c |  2 +-
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c6cb9bc5dc80..2c9ea1ba285c 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,7 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-void watchdog_hardlockup_check(struct pt_regs *regs);
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index f46669c1671d..367bea0167a5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,14 +92,14 @@ static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
 static unsigned long watchdog_hardlockup_dumped_stacks;
 
-static bool watchdog_hardlockup_is_lockedup(void)
+static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
 {
-   unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+   unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
 
-   if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+   if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
return true;
 
-   __this_cpu_write(hrtimer_interrupts_saved, hrint);
+   per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 
return false;
 }
@@ -109,7 +109,7 @@ static void watchdog_hardlockup_interrupt_count(void)
__this_cpu_inc(hrtimer_interrupts);
 }
 
-void watchdog_hardlockup_check(struct pt_regs *regs)
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
/*
 * Check for a hardlockup by making sure the CPU's timer
@@ -117,35 +117,50 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
 * fired multiple times before we overflow'd. If it hasn't
 * then this is a good indication the cpu is stuck
 */
-   if (watchdog_hardlockup_is_lockedup()) {
+   if (watchdog_hardlockup_is_lockedup(cpu)) {
unsigned int this_cpu = smp_processor_id();
+   struct cpumask backtrace_mask = *cpu_online_mask;
 
/* Only handle hardlockups once. */
-   if (__this_cpu_read(watchdog_hardlockup_processed))
+   if (per_cpu(watchdog_hardlockup_processed, cpu))
return;
 
-   pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
+   pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
print_modules();
print_irqtrace_events(current);
-   if (regs)
+   if (regs) {
show_regs(regs);
-   else
-   dump_stack();
+   cpumask_clear_cpu(cpu, &backtrace_mask);
+   } else {
+   /*
+* If the locked up CPU is different than the CPU we're
+* running on then we'll try to backtrace the CPU that
+* locked up and then exclude it from later backtraces.
+* If that fails or if we're running on the locked up
+* CPU, just do a normal backtrace.
+*/
+   if (cpu != this_cpu && trigger_single_cpu_backtrace(cpu)) {
+   cpumask_clear_cpu(cpu, &backtrace_mask);
+   } else {
+   dump_stack();
+   cpumask_clear_cpu(this_cpu, &backtrace_mask);
+   }
+   }
 
/*
-* Perform all-CPU dump only once to avoid multiple hardlockups
-* generating interleaving traces
+* Perform multi-CPU dump only once to avoid multiple
+* hardlockups generating interleaving traces
 */
if (sysctl_hardlockup_all_cpu_backtrace &&
!test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks))
-   trigger_allbutself_cpu_backtrace();
+   trigger_cpumask_backtrace(_mask);
 
if (hardlockup_panic)
nmi_panic(regs, "Hard LOCKUP");
 
-   __this_cpu_write(watchdog_hardlockup_processed, true);
+   per_cpu(watchdog_hardlockup_processed, cpu) = true;
} else {
-   __this_cpu_write(watchdog_hardlockup_processed, false);
+   per_cpu(watchdog_hardlockup_processed, cpu) = false;

[PATCH v4 08/17] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / ..._is_lockedup()

2023-05-04 Thread Douglas Anderson
These are tiny style changes:
- Add a blank line before a "return".
- Rename two globals to use the "watchdog_hld" prefix.
- Store processor id in "unsigned int" rather than "int".
- Minor comment rewording.
- Use "else" rather than extra returns since it seemed more symmetric.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Style changes to watchdog_hardlockup_check ...") new for v4.

 kernel/watchdog.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2d319cdf64b9..f46669c1671d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
-static DEFINE_PER_CPU(bool, hard_watchdog_warn);
-static unsigned long hardlockup_allcpu_dumped;
+static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
+static unsigned long watchdog_hardlockup_dumped_stacks;
 
 static bool watchdog_hardlockup_is_lockedup(void)
 {
@@ -100,6 +100,7 @@ static bool watchdog_hardlockup_is_lockedup(void)
return true;
 
__this_cpu_write(hrtimer_interrupts_saved, hrint);
+
return false;
 }
 
@@ -110,21 +111,20 @@ static void watchdog_hardlockup_interrupt_count(void)
 
 void watchdog_hardlockup_check(struct pt_regs *regs)
 {
-   /* check for a hardlockup
-* This is done by making sure our timer interrupt
-* is incrementing.  The timer interrupt should have
-* fired multiple times before we overflow'd.  If it hasn't
+   /*
+* Check for a hardlockup by making sure the CPU's timer
+* interrupt is incrementing. The timer interrupt should have
+* fired multiple times before we overflow'd. If it hasn't
 * then this is a good indication the cpu is stuck
 */
if (watchdog_hardlockup_is_lockedup()) {
-   int this_cpu = smp_processor_id();
+   unsigned int this_cpu = smp_processor_id();
 
-   /* only print hardlockups once */
-   if (__this_cpu_read(hard_watchdog_warn) == true)
+   /* Only handle hardlockups once. */
+   if (__this_cpu_read(watchdog_hardlockup_processed))
return;
 
-   pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
-this_cpu);
+   pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
print_modules();
print_irqtrace_events(current);
if (regs)
@@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
 * generating interleaving traces
 */
if (sysctl_hardlockup_all_cpu_backtrace &&
-   !test_and_set_bit(0, &hardlockup_allcpu_dumped))
+   !test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks))
trigger_allbutself_cpu_backtrace();
 
if (hardlockup_panic)
nmi_panic(regs, "Hard LOCKUP");
 
-   __this_cpu_write(hard_watchdog_warn, true);
-   return;
+   __this_cpu_write(watchdog_hardlockup_processed, true);
+   } else {
+   __this_cpu_write(watchdog_hardlockup_processed, false);
}
-
-   __this_cpu_write(hard_watchdog_warn, false);
-   return;
 }
 
 #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 07/17] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c

2023-05-04 Thread Douglas Anderson
The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interrupt counts are
managed by the common watchdog code via its watchdog_timer_fn().

Currently the API between the perf detector and the common code is a
function: is_hardlockup(). When the hard lockup detector sees that
function return true then it handles printing out debug info and
inducing a panic if necessary.

Let's change the API a little bit in preparation for the buddy
hardlockup detector. The buddy hardlockup detector wants to print
nearly the same debug info and have nearly the same panic
behavior. That means we want to move all that code to the common
file. For now, the code in the common file will only be there if the
perf hardlockup detector is enabled, but eventually it will be
selected by a common config.

Right now, this _just_ moves the code from the perf detector file to
the common file and changes the names. It doesn't make the changes
that the buddy hardlockup detector will need and doesn't do any style
cleanups. A future patch will do cleanup to make it more obvious what
changed.

With the above, we no longer have any callers of is_hardlockup()
outside of the "watchdog.c" file, so we can remove it from the header,
make it static, move it to the same "#ifdef" block as our new
watchdog_hardlockup_check(), and rename it to make it obvious it's
just for hardlockup detectors. While doing this, it can be noted that
the existing code counted/checked "hrtimer_interrupts" even when no
hardlockup detector was configured. We didn't need to do that, so move
all the "hrtimer_interrupts" counting to only be there if the perf
hardlockup detector is configured as well.

This change is expected to be a no-op.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Move perf hardlockup checking/panic ...") new for v4.

 include/linux/nmi.h|  5 ++-
 kernel/watchdog.c  | 92 +-
 kernel/watchdog_perf.c | 42 +--
 3 files changed, 78 insertions(+), 61 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 35d09d70f394..c6cb9bc5dc80 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -15,7 +15,6 @@
 void lockup_detector_init(void);
 void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
-bool is_hardlockup(void);
 
 extern int watchdog_user_enabled;
 extern int nmi_watchdog_user_enabled;
@@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+void watchdog_hardlockup_check(struct pt_regs *regs);
+#endif
+
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 # define NMI_WATCHDOG_SYSCTL_PERM  0644
 #else
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c705a18b26bf..2d319cdf64b9 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+
+static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
+static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(bool, hard_watchdog_warn);
+static unsigned long hardlockup_allcpu_dumped;
+
+static bool watchdog_hardlockup_is_lockedup(void)
+{
+   unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+
+   if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+   return true;
+
+   __this_cpu_write(hrtimer_interrupts_saved, hrint);
+   return false;
+}
+
+static void watchdog_hardlockup_interrupt_count(void)
+{
+   __this_cpu_inc(hrtimer_interrupts);
+}
+
+void watchdog_hardlockup_check(struct pt_regs *regs)
+{
+   /* check for a hardlockup
+* This is done by making sure our timer interrupt
+* is incrementing.  The timer interrupt should have
+* fired multiple times before we overflow'd.  If it hasn't
+* then this is a good indication the cpu is stuck
+*/
+   if (watchdog_hardlockup_is_lockedup()) {
+   int this_cpu = smp_processor_id();
+
+   /* only print hardlockups once */
+   if (__this_cpu_read(hard_watchdog_warn) == true)
+   return;
+
+   pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
+this_cpu);
+   print_modules();
+   print_irqtrace_events(current);
+   if (regs)
+   show_regs(regs);
+   else
+   dump_stack();
+
+   /*
+* Perform all-CPU dump only once to avoid multiple hardlockups
+* generating interleaving traces
+*/
+   if (sysctl_hardlockup_all_cpu_backtrace &&
+ 

[PATCH v4 06/17] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c

2023-05-04 Thread Douglas Anderson
The code currently in "watchdog_hld.c" is for detecting hardlockups
using perf, as evidenced by the line in the Makefile that only
compiles this file if CONFIG_HARDLOCKUP_DETECTOR_PERF is
defined. Rename the file to prepare for the buddy hardlockup detector,
which doesn't use perf.

It could be argued that the new name makes it less obvious that this
is a hardlockup detector. While true, it's not hard to remember that
the "perf" detector is always a hardlockup detector and it's nice not
to have names that are too convoluted.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Rename watchdog_hld.c to watchdog_perf.c") new for v4.

 kernel/Makefile| 2 +-
 kernel/{watchdog_hld.c => watchdog_perf.c} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename kernel/{watchdog_hld.c => watchdog_perf.c} (99%)

diff --git a/kernel/Makefile b/kernel/Makefile
index 10ef068f598d..406c4dd3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,7 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_perf.c
similarity index 99%
rename from kernel/watchdog_hld.c
rename to kernel/watchdog_perf.c
index 96b717205952..c3d8ceb149da 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_perf.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Detect hard lockups on a system
+ * Detect hard lockups on a system using perf
  *
  * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
  *
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 05/17] watchdog/hardlockup: Rename touch_nmi_watchdog() to touch_hardlockup_watchdog()

2023-05-04 Thread Douglas Anderson
In preparation for the buddy hardlockup detector, rename
touch_nmi_watchdog() to touch_hardlockup_watchdog() to make it clear
that it will touch whatever hardlockup detector is configured. We'll
add a #define for the old name (touch_nmi_watchdog) so that we don't
have to touch every piece of code referring to the old name.

Ideally this change would also rename the arch_touch_nmi_watchdog(),
but that is harder since arch_touch_nmi_watchdog() is exported with
EXPORT_SYMBOL() and thus is ABI. Add a comment next to the call to
hopefully alleviate some of the confusion here.

Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Rename touch_nmi_watchdog() to ...") new for v4.

 include/linux/nmi.h | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 454fe99c4874..35d09d70f394 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -125,18 +125,35 @@ void watchdog_nmi_disable(unsigned int cpu);
 void lockup_detector_reconfigure(void);
 
 /**
- * touch_nmi_watchdog - restart NMI watchdog timeout.
+ * touch_hardlockup_watchdog - manually pet the hardlockup watchdog.
  *
- * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
- * may be used to reset the timeout - for code which intentionally
- * disables interrupts for a long time. This call is stateless.
+ * If we support detecting hardlockups, touch_hardlockup_watchdog() may be
+ * used to pet the watchdog (reset the timeout) - for code which
+ * intentionally disables interrupts for a long time. This call is stateless.
  */
-static inline void touch_nmi_watchdog(void)
+static inline void touch_hardlockup_watchdog(void)
 {
+   /*
+* Pass on to the hardlockup detector selected via CONFIG_. Note that
+* the hardlockup detector may not be arch-specific nor using NMIs,
+* but arch_touch_nmi_watchdog() is exported with EXPORT_SYMBOL() and
+* is thus ABI.
+*/
arch_touch_nmi_watchdog();
+
+   /*
+* Touching the hardlockup detector implicitly pets the
+* softlockup detector too
+*/
touch_softlockup_watchdog();
 }
 
+/*
+ * It's encouraged for code to refer to the new name, but allow the old
+ * name as well.
+ */
+#define touch_nmi_watchdog touch_hardlockup_watchdog
+
 /*
  * Create trigger_all_cpu_backtrace() out of the arch-provided
  * base function. Return whether such support was available,
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 04/17] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event

2023-05-04 Thread Douglas Anderson
From: Pingfan Liu 

hardlockup_detector_event_create() should create perf_event on the
current CPU. Preemption cannot be disabled because
perf_event_create_kernel_counter() allocates memory. Instead,
the CPU locality is achieved by processing the code in a per-CPU
bound kthread.

Add a check to prevent mistakes when calling the code in another
code path.

Signed-off-by: Pingfan Liu 
Co-developed-by: Lecopzer Chen 
Signed-off-by: Lecopzer Chen 
Reviewed-by: Petr Mladek 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-4-lecopzer.c...@mediatek.com/

Changes in v4:
- Pulled ("Ensure CPU-bound context when creating ...") into my series for v4.

 kernel/watchdog_hld.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..96b717205952 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 static int hardlockup_detector_event_create(void)
 {
-   unsigned int cpu = smp_processor_id();
+   unsigned int cpu;
struct perf_event_attr *wd_attr;
struct perf_event *evt;
 
+   /*
+* Preemption is not disabled because memory will be allocated.
+* Ensure CPU-locality by calling this in per-CPU kthread.
+*/
+   WARN_ON(!is_percpu_thread());
+   cpu = raw_smp_processor_id();
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 03/17] watchdog/hardlockup: change watchdog_nmi_enable() to void

2023-05-04 Thread Douglas Anderson
From: Lecopzer Chen 

Nobody cares about the return value of watchdog_nmi_enable(),
so change its prototype to void.

Signed-off-by: Pingfan Liu 
Signed-off-by: Lecopzer Chen 
Reviewed-by: Petr Mladek 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-3-lecopzer.c...@mediatek.com/

Changes in v4:
- Pulled ("change watchdog_nmi_enable() to void") into my series for v4.

 arch/sparc/kernel/nmi.c | 8 +++-
 include/linux/nmi.h | 2 +-
 kernel/watchdog.c   | 3 +--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..5dcf31f7e81f 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
  * sparc specific NMI watchdog enable function.
  * Enables watchdog if it is not enabled already.
  */
-int watchdog_nmi_enable(unsigned int cpu)
+void watchdog_nmi_enable(unsigned int cpu)
 {
if (atomic_read(&nmi_active) == -1) {
pr_warn("NMI watchdog cannot be enabled or disabled\n");
-   return -1;
+   return;
}
 
/*
@@ -295,11 +295,9 @@ int watchdog_nmi_enable(unsigned int cpu)
 * process first.
 */
if (!nmi_init_done)
-   return 0;
+   return;
 
smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
-
-   return 0;
 }
 /*
  * sparc specific NMI watchdog disable function.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 771d77b62bc1..454fe99c4874 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 582d572e1379..c705a18b26bf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -93,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
  * softlockup watchdog start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
 {
hardlockup_detector_perf_enable();
-   return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 02/17] watchdog: remove WATCHDOG_DEFAULT

2023-05-04 Thread Douglas Anderson
From: Lecopzer Chen 

No reference to WATCHDOG_DEFAULT, remove it.

Signed-off-by: Pingfan Liu 
Signed-off-by: Lecopzer Chen 
Reviewed-by: Petr Mladek 
Signed-off-by: Douglas Anderson 
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-2-lecopzer.c...@mediatek.com/

Changes in v4:
- Pulled ("remove WATCHDOG_DEFAULT") into my series for v4.

 kernel/watchdog.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8e61f21e7e33..582d572e1379 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT  (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT  1
 #else
-# define WATCHDOG_DEFAULT  (SOFT_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT  0
 #endif
 
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 01/17] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config

2023-05-04 Thread Douglas Anderson
The real watchdog_update_hrtimer_threshold() is defined in
watchdog_hardlockup_perf.c. That file is included if
CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that
file if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP.

The dummy version of the function in "nmi.h" didn't get that quite
right. While this doesn't appear to be a huge deal, it's nice to make
it consistent.

Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson 
---

Changes in v4:
- ("Define dummy watchdog_update_hrtimer_threshold() ...") new for v4.

 include/linux/nmi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 048c0b9aa623..771d77b62bc1 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -197,7 +197,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
-defined(CONFIG_HARDLOCKUP_DETECTOR)
+defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 void watchdog_update_hrtimer_threshold(u64 period);
 #else
 static inline void watchdog_update_hrtimer_threshold(u64 period) { }
-- 
2.40.1.521.gf1e218fcd8-goog



[PATCH v4 00/17] watchdog/hardlockup: Add the buddy hardlockup detector

2023-05-04 Thread Douglas Anderson
This patch series adds the "buddy" hardlockup detector. In brief, the
buddy hardlockup detector can detect hardlockups without arch-level
support by having CPUs check up on a "buddy" CPU periodically. All the
details are in the patch ("watchdog/hardlockup: detect hard lockups
using secondary (buddy) CPUs") and I encourage folks to reply to that
patch for general comments about this approach.

Note that in v3, this was a single patch [1]. It's now exploded into a
much larger series. The much larger series does a bunch of cleanup
that Petr requested in response to v3 [2] [3]. This new series ends up
sharing a lot more code with the perf lockup detector. It also tries
to bring a little sanity to some of the naming we had.

The fact that this series now touches across the whole lockup detector
codebase also caused conflicts with the series trying to add an arm64
perf-based hardlockup detector. That was a bit inconvenient for me
since I was testing on arm64 and wanted to make sure that the perf
and buddy hardlockup detectors both compiled and worked. Because of
this, I've pulled the latest arm64 perf-based lockup detector patches
into my series. Specifically:
- Patches #1 - #3 of the arm64 perf-based lockup detector patches were
  generic cleanups. I added them early in my series. IMO these should
  just land.
- Patches #4 - #6 of the arm64 perf-based lockup detector patches were
  less generic but still caused conflict with my series. I tacked them
  to the end of my series after adapting them to my changes. However,
  I don't really consider them part of this series and I'd be OK if
  the series landed without them. For use cases I'm aware of the buddy
  system is sufficient and I see no urgent need to land the arm64 perf
  hardlockup support, though I also don't have any strong objections
  to the patches.

I will note that this patch series currently has no conflicts with the
other patch series I posted recently adding support for pseudo-NMI
based backtraces [5] and the two patches series are meant to work
together.

Given the new design of this patch series, testing all combinations is
fairly difficult. I've attempted to make sure that all combinations of
CONFIG_ options are good, but it wouldn't surprise me if I missed
something. I apologize in advance and I'll do my best to fix any
problems that are found.

I'll also note that the CC list is pretty giant. Some of this is based
on get_maintainers and some of this is people I thought might be
interested. Now that this series is touching so many files, I've
stopped auto-CCing everyone that get_maintainers suggests. I'll reply
to v3 and point at this patch to make sure folks are aware of it, but
if I stopped CCing you and you want back on then please yell.

As far as I can tell, there's no true MAINTAINER listed for the
existing watchdog code. Assuming people don't hate this, maybe it
would go through Andrew Morton's tree? There is now some arch-specific
code for sparc and powerpc, but it's all still watchdog code so
hopefully that would still be fine to go through the same tree.

[1] https://lore.kernel.org/r/20230501082341.v3.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
[2] https://lore.kernel.org/r/ZFEqynvf5nqkzEvQ@alley
[3] https://lore.kernel.org/r/ZFErmshcrcikrSU1@alley
[4] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.c...@mediatek.com/
[5] https://lore.kernel.org/r/20230419225604.21204-1-diand...@chromium.org

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.
- ("Add a weak function for an arch to detect ...") new for v4.
- ("Define dummy watchdog_update_hrtimer_threshold() ...") new for v4.
- ("Have the perf hardlockup use __weak ...") new for v4.
- ("Move perf hardlockup checking/panic ...") new for v4.
- ("Move perf hardlockup watchdog petting to watchdog.c") new for v4.
- ("Rename some "NMI watchdog" constants/function ...") new for v4.
- ("Rename touch_nmi_watchdog() to ...") new for v4.
- ("Rename watchdog_hld.c to watchdog_perf.c") new for v4.
- ("Style changes to watchdog_hardlockup_check ...") new for v4.
- Pulled ("Adapt the watchdog_hld interface ...") into my series for v4.
- Pulled ("Enable perf events based hard ...") into my series for v4.
- Pulled ("Ensure CPU-bound context when creating ...") into my series for v4.
- Pulled ("add hw_nmi_get_sample_period for ...") into my series for v4.
- Pulled ("change watchdog_nmi_enable() to void") into my series for v4.
- Pulled ("remove WATCHDOG_DEFAULT") into my series for v4.
- Reworked atop a whole pile of cleanups, as suggested by Petr.

Changes in v3:
- Added a note in commit message about the effect on idle.
- Cleaned up commit message pros/cons to be complete sentences.
- More cpu => CPU (in Kconfig and comments).
- No code changes other than comments.

Changes in v2:
- No code changes.
- Reworked description and Kconfig based on v1 discussion.
- cpu => CPU (in commit message).

Colin Cross (1):
  watchdog/hardlockup: 

Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2023-05-04 Thread Gaurav Batra

Hello Michael,

I agree with your concerns regarding a device being able to access
memory that doesn't belong to it. We already have that exposure today
with 2MB TCEs: the DMA window will be big enough, for dedicated
adapters, that the whole of memory is going to be mapped "direct",
which essentially means that a "rogue" device/driver has the potential
to corrupt LPAR-wide memory.


I have sent you v3.

Thanks,

Gaurav

On 5/4/23 12:10 AM, Michael Ellerman wrote:

Gaurav Batra  writes:

Hello Michael,

I was looking into the Bug: 199106
(https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).

In the Bug, Mellanox driver was timing out when enabling SRIOV device.

I tested Alexey's patch and it fixes the issue with the Mellanox
driver. The downside to Alexey's fix is that even a small memory
request by the driver will be aligned up to 2MB. In my test, the
Mellanox driver is issuing multiple requests of 64K size. All of these
will get aligned up to 2MB, which is quite a waste of resources.

OK. I guess we should use your patch then.

It's not ideal as it means the device can potentially read/write to
memory it shouldn't, but 2MB is a lot to waste for a 64K alloc.


In any case, both the patches work. Let me know which approach you
prefer. In case we decide to go with my patch, I just realized that I
need to fix nio_pages in iommu_free_coherent() as well.

Can you send a v3 with that fixed please.

cheers


[PATCH v3] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2023-05-04 Thread Gaurav Batra
When the DMA window is backed by 2MB TCEs, the DMA address for a
mapped page should include the offset of the page within the 2MB TCE.
The code was incorrectly setting the DMA address to the beginning of
the TCE range.

The Mellanox driver reports a timeout trying to ENABLE_HCA for an
SR-IOV ethernet port when the DMA window is backed by 2MB TCEs.

Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size")

Signed-off-by: Gaurav Batra 

Reviewed-by: Greg Joyce 
Reviewed-by: Brian King 
---
 arch/powerpc/kernel/iommu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf1..b8b7a189cd3c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
/* Convert entry to a dma_addr_t */
entry += tbl->it_offset;
dma_addr = entry << tbl->it_page_shift;
-   dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
+   dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
 
DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
npages, entry, dma_addr);
@@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
unsigned int order;
unsigned int nio_pages, io_order;
struct page *page;
+   int tcesize = (1 << tbl->it_page_shift);
 
size = PAGE_ALIGN(size);
order = get_order(size);
@@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
memset(ret, 0, size);
 
/* Set up tces to cover the allocated range */
-   nio_pages = size >> tbl->it_page_shift;
+   nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
+
io_order = get_iommu_order(size, tbl);
mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
  mask >> tbl->it_page_shift, io_order, 0);
@@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
free_pages((unsigned long)ret, order);
return NULL;
}
-   *dma_handle = mapping;
+
+   *dma_handle = mapping | ((u64)ret & (tcesize - 1));
return ret;
 }
 
@@ -949,7 +952,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
unsigned int nio_pages;
 
size = PAGE_ALIGN(size);
-   nio_pages = size >> tbl->it_page_shift;
+   nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
iommu_free(tbl, dma_handle, nio_pages);
size = PAGE_ALIGN(size);
free_pages((unsigned long)vaddr, get_order(size));
-- 



Re: [PASEMI NEMO] Boot issue with the PowerPC updates 6.4-1

2023-05-04 Thread Christian Zigotzky


> On 3. May 2023, at 19:28, Christian Zigotzky  wrote:
> 
> 
> 
>>> On 3. May 2023, at 18:51, Rob Herring  wrote:
>>> 
>>> On Wed, May 3, 2023 at 11:27 AM Christophe Leroy
>>>  wrote:
>>> 
>>> +Rob as he's the commit's Author.
>>> 
 Le 03/05/2023 à 17:46, Christian Zigotzky a écrit :
 On 02 May 2023 at 11:28 am, Michael Ellerman wrote:
> Christian Zigotzky  writes:
>> Hello,
>> 
>> Our PASEMI Nemo board [1] doesn't boot with the PowerPC updates 6.4-1
>> [2].
>> 
>> The kernel hangs right after the "booting Linux via __start() @
>> 0x ..." message.
>> 
>> I was able to revert the PowerPC updates 6.4-1 [2] with the following
>> command: git revert 70cc1b5307e8ee3076fdf2ecbeb89eb973aa0ff7 -m 1
>> 
>> After a re-compiling, the kernel boots without any problems without the
>> PowerPC updates 6.4-1 [2].
>> 
>> Could you please explain to me what you have done in the boot area?
> There's a few possibilities, but nothing obvious.
> 
> To begin with can you please test the following commits?
> 
> 77e69ee7ce07
> e4ab08be5b49
> eeac8ede1755
> 
> cheers
 git revert e4ab08be5b4902e5b350b0e1e1a3c25eb21d76d4
 
 [master 0086e2cbbec0] Revert "powerpc/isa-bridge: Remove open coded
 "ranges" parsing"
 1 file changed, 129 insertions(+), 37 deletions(-)
 
 After a recompiling it boots without any problems.
 
 e4ab08be5b49 -- powerpc/isa-bridge: Remove open coded "ranges" parsing
 is the bad commit.
>> 
>> Could I get a DT file for this board?
>> 
>> In the meantime, just revert this commit. I don't think I'll be able
>> to fix it before I'm out on sabbatical.
>> 
>> Rob
> 
> FYI:
> 
> Darren Stevens wrote:
> 
> The dtb passed by the CFE firmware has a number of issues, which up till
> now have been fixed by use of patches applied to the mainline kernel.
> This occasionally causes problems with changes made to mainline.
> 
> Patching the firmware to correct the dtb is not an option for the 
> following reasons:
> 
> It was modified by a 3rd party, and we don't have a copy of the source.
> 
> All versions of CFE used on the X1000 export the same dtb.
> 
> At least one machine suffered damage during a firmware upgrade attempt,
> many people will be unwilling to reflash their system if an upgrade is
> produced.
> 
> 

Nemo’s DT:  
https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406