Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

2022-05-19 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 14, 2022 9:16 am:
> On Tue, May 10, 2022 at 08:38:22PM +1000, Nicholas Piggin wrote:
>> Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
>> > Certain implementations of the hardlockup detector require support for
>> > Inter-Processor Interrupt shorthands. On x86, support for these can only
>> > be determined after all the possible CPUs have booted once (in
>> > smp_init()). Other architectures may not need such check.
>> > 
>> > lockup_detector_init() only performs the initializations of data
>> > structures of the lockup detector. Hence, there are no dependencies on
>> > smp_init().
>> 
> 
> Thank you for your feedback Nicholas!
> 
>> I think this is the only real thing which affects other watchdog types?
> 
> Also patches 18 and 19 that decouple the NMI watchdog functionality from
> perf.
> 
>> 
>> Not sure if it's a big problem, the secondary CPUs coming up won't
>> have their watchdog active until quite late, and the primary could
>> implement its own timeout in __cpu_up for secondary coming up, and
>> IPI it to get traces if necessary which is probably more robust.
> 
> Indeed that could work. Another alternative I have been pondering is to boot
> the system with the perf-based NMI watchdog enabled. Once all CPUs are up
> and running, switch to the HPET-based NMI watchdog and free the PMU counters.

Just to cover smp_init()? Unless you could move the watchdog 
significantly earlier, I'd say it's probably not worth bothering
with.

Yes the boot CPU is doing *some* work that could lock up, but most 
complexity is in the secondaries coming up and they won't have their own 
watchdog coverage for a good chunk of that anyway.

If anything I would just add some timeout warning or IPI or something in
those wait loops in x86's __cpu_up code if you are worried about 
catching issues here. Actually the watchdog probably wouldn't catch any
of those anyway because they either run with interrupts enabled or
touch_nmi_watchdog()! So yeah that'd be pretty pointless.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET-based hardlockup detector relies on the TSC to determine if an
> observed NMI interrupt was originated by HPET timer. Hence, this detector
> can no longer be used with an unstable TSC.
> 
> In such case, permanently stop the HPET-based hardlockup detector and
> start the perf-based detector.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Suggested-by: Thomas Gleixner 
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Relocated the delcaration of hardlockup_detector_switch_to_perf() to
>x86/nmi.h It does not depend on HPET.
>  * Removed function stub. The shim hardlockup detector is always for x86.
> 
> Changes since v4:
>  * Added a stub version of hardlockup_detector_switch_to_perf() for
>!CONFIG_HPET_TIMER. (lkp)
>  * Reconfigure the whole lockup detector instead of unconditionally
>starting the perf-based hardlockup detector.
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/include/asm/nmi.h | 6 ++
>  arch/x86/kernel/tsc.c  | 2 ++
>  arch/x86/kernel/watchdog_hld.c | 6 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index 4a0d5b562c91..47752ff67d8b 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -63,4 +63,10 @@ void stop_nmi(void);
>  void restart_nmi(void);
>  void local_touch_nmi(void);
>  
> +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR
> +void hardlockup_detector_switch_to_perf(void);
> +#else
> +static inline void hardlockup_detector_switch_to_perf(void) { }
> +#endif
> +
>  #endif /* _ASM_X86_NMI_H */
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cc1843044d88..74772ffc79d1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1176,6 +1176,8 @@ void mark_tsc_unstable(char *reason)
>  
>   clocksource_mark_unstable(_tsc_early);
>   clocksource_mark_unstable(_tsc);
> +
> + hardlockup_detector_switch_to_perf();
>  }
>  
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
> index ef11f0af4ef5..7940977c6312 100644
> --- a/arch/x86/kernel/watchdog_hld.c
> +++ b/arch/x86/kernel/watchdog_hld.c
> @@ -83,3 +83,9 @@ void watchdog_nmi_start(void)
>   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
>   hardlockup_detector_hpet_start();
>  }
> +
> +void hardlockup_detector_switch_to_perf(void)
> +{
> + detector_type = X86_HARDLOCKUP_DETECTOR_PERF;

Another possible problem along the same lines here,
isn't your watchdog still running at this point? And
it uses detector_type in the switch.

> + lockup_detector_reconfigure();

Actually the detector_type switch is used in some
functions called by lockup_detector_reconfigure()
e.g., watchdog_nmi_stop, so this seems buggy even
without concurrent watchdog.

Is this switching a good idea in general? The admin
has asked for non-standard option because they want
more PMU counterss available and now it eats a
counter potentially causing a problem rather than
detecting one.

I would rather just disable with a warning if it were
up to me. If you *really* wanted to be fancy then
allow admin to re-enable via proc maybe.

Thanks,
Nick

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET hardlockup detector relies on tsc_khz to estimate the value of
> that the TSC will have when its HPET channel fires. A refined tsc_khz
> helps to estimate better the expected TSC value.
> 
> Using the early value of tsc_khz may lead to a large error in the expected
> TSC value. Restarting the NMI watchdog detector has the effect of kicking
> its HPET channel and make use of the refined tsc_khz.
> 
> When the HPET hardlockup is not in use, restarting the NMI watchdog is
> a noop.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4
>  * N/A
> 
> Changes since v3
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/kernel/tsc.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cafacb2e58cc..cc1843044d88 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + /*
> +  * If in use, the HPET hardlockup detector relies on tsc_khz.
> +  * Reconfigure it to make use of the refined tsc_khz.
> +  */
> + lockup_detector_reconfigure();

I don't know if the API is conceptually good.

You change something that the lockup detector is currently using, 
*while* the detector is running asynchronously, and then reconfigure
it. What happens in the window? If this code is only used for small
adjustments maybe it does not really matter but in principle it's
a bad API to export.

lockup_detector_reconfigure as an internal API is okay because it
reconfigures things while the watchdog is stopped [actually that
looks untrue for soft dog which uses watchdog_thresh in
is_softlockup(), but that should be fixed].

You're the arch so you're allowed to stop the watchdog and configure
it, e.g., hardlockup_detector_perf_stop() is called in arch/.

So you want to disable HPET watchdog if it was enabled, then update
wherever you're using tsc_khz, then re-enable.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> Prepare hardlockup_panic_setup() to handle a comma-separated list of
> options. Thus, it can continue parsing its own command-line options while
> ignoring parameters that are relevant only to specific implementations of
> the hardlockup detector. Such implementations may use an early_param to
> parse their own options.

It can't really handle comma separated list though, until the next
patch. nmi_watchdog=panic,0 does not make sense, so you lost error
handling of that.

And is it kosher to double handle options like this? I'm sure it
happens but it's ugly.

Would you consider just add a new option for x86 and avoid changing
this? Less code and patches.

Thanks,
Nick

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Corrected typo in commit message. (Tony)
> 
> Changes since v4:
>  * None
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * None
> ---
>  kernel/watchdog.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 9166220457bc..6443841a755f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -73,13 +73,13 @@ void __init hardlockup_detector_disable(void)
>  
>  static int __init hardlockup_panic_setup(char *str)
>  {
> - if (!strncmp(str, "panic", 5))
> + if (parse_option_str(str, "panic"))
>   hardlockup_panic = 1;
> - else if (!strncmp(str, "nopanic", 7))
> + else if (parse_option_str(str, "nopanic"))
>   hardlockup_panic = 0;
> - else if (!strncmp(str, "0", 1))
> + else if (parse_option_str(str, "0"))
>   nmi_watchdog_user_enabled = 0;
> - else if (!strncmp(str, "1", 1))
> + else if (parse_option_str(str, "1"))
>   nmi_watchdog_user_enabled = 1;
>   return 1;
>  }
> -- 
> 2.17.1
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> Certain implementations of the hardlockup detector require support for
> Inter-Processor Interrupt shorthands. On x86, support for these can only
> be determined after all the possible CPUs have booted once (in
> smp_init()). Other architectures may not need such check.
> 
> lockup_detector_init() only performs the initializations of data
> structures of the lockup detector. Hence, there are no dependencies on
> smp_init().

I think this is the only real thing which affects other watchdog types?

Not sure if it's a big problem, the secondary CPUs coming up won't
have their watchdog active until quite late, and the primary could
implement its own timeout in __cpu_up for secondary coming up, and
IPI it to get traces if necessary which is probably more robust.

Acked-by: Nicholas Piggin 

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Andrew Morton 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4:
>  * N/A
> 
> Changes since v3:
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  init/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/init/main.c b/init/main.c
> index 98182c3c2c4b..62c52c9e4c2b 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1600,9 +1600,11 @@ static noinline void __init kernel_init_freeable(void)
>  
>   rcu_init_tasks_generic();
>   do_pre_smp_initcalls();
> - lockup_detector_init();
>  
>   smp_init();
> +
> + lockup_detector_init();
> +
>   sched_init_smp();
>  
>   padata_init();
> -- 
> 2.17.1
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()

2021-08-01 Thread Nicholas Piggin
Excerpts from Will Deacon's message of July 30, 2021 9:42 pm:
> Commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> introduced a set_memory_encrypted() call to swiotlb_exit() so that the
> buffer pages are returned to an encrypted state prior to being freed.
> 
> Sachin reports that this leads to the following crash on a Power server:
> 
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> 
> Nick spotted that this is because set_memory_encrypted() is issuing an
> ultracall which doesn't exist for the processor, and should therefore
> be gated by mem_encrypt_active() to mirror the x86 implementation.
> 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Claire Chang 
> Cc: Christoph Hellwig 
> Cc: Robin Murphy 
> Fixes: ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> Suggested-by: Nicholas Piggin 
> Reported-by: Sachin Sant 
> Tested-by: Sachin Sant 
> Tested-by: Nathan Chancellor 
> Link: 
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
> Signed-off-by: Will Deacon 
> ---

Thanks for writing it.

Acked-by: Nicholas Piggin 

>  arch/powerpc/platforms/pseries/svm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/svm.c 
> b/arch/powerpc/platforms/pseries/svm.c
> index 1d829e257996..87f001b4c4e4 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
>  
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
> + if (!mem_encrypt_active())
> + return 0;
> +
>   if (!PAGE_ALIGNED(addr))
>   return -EINVAL;
>  
> @@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
>  
>  int set_memory_decrypted(unsigned long addr, int numpages)
>  {
> + if (!mem_encrypt_active())
> + return 0;
> +
>   if (!PAGE_ALIGNED(addr))
>   return -EINVAL;
>  
> -- 
> 2.32.0.554.ge1b32706d8-goog
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!

2021-07-28 Thread Nicholas Piggin
Excerpts from Nathan Chancellor's message of July 29, 2021 3:35 am:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>> 
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [0.010820] Modules linked in:
>> [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
>> 5.14.0-rc3-next-20210727 #1
>> [0.010830] NIP:  c0032cfc LR: c000c764 CTR: 
>> c000c670
>> [0.010834] REGS: c3603b10 TRAP: 0700   Not tainted  
>> (5.14.0-rc3-next-20210727)
>> [0.010838] MSR:  80029033   CR: 28000222  
>> XER: 0002
>> [0.010848] CFAR: c000c760 IRQMASK: 3 
>> [0.010848] GPR00: c000c764 c3603db0 c29bd000 
>> 0001 
>> [0.010848] GPR04: 0a68 0400 c3603868 
>>  
>> [0.010848] GPR08:    
>> 0003 
>> [0.010848] GPR12:  c0001ec9ee80 c0012a28 
>>  
>> [0.010848] GPR16:    
>>  
>> [0.010848] GPR20:    
>>  
>> [0.010848] GPR24: f134   
>> c3603868 
>> [0.010848] GPR28: 0400 0a68 c202e9c0 
>> c3603e80 
>> [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
>> [0.010901] LR [c000c764] system_call_common+0xf4/0x258
>> [0.010907] Call Trace:
>> [0.010909] [c3603db0] [c016a6dc] 
>> calculate_sigpending+0x4c/0xe0 (unreliable)
>> [0.010915] [c3603e10] [c000c764] 
>> system_call_common+0xf4/0x258
>> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
>> [0.010926] NIP:  c0092dec LR: c0114fc8 CTR: 
>> 
>> [0.010930] REGS: c3603e80 TRAP: 0c00   Not tainted  
>> (5.14.0-rc3-next-20210727)
>> [0.010934] MSR:  80009033   CR: 28000222  
>> XER: 
>> [0.010943] IRQMASK: 0 
>> [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 
>> f134 
>> [0.010943] GPR04: 0a68 0400 c3603868 
>>  
>> [0.010943] GPR08:    
>>  
>> [0.010943] GPR12:  c0001ec9ee80 c0012a28 
>>  
>> [0.010943] GPR16:    
>>  
>> [0.010943] GPR20:    
>>  
>> [0.010943] GPR24: c20033c4 c110afc0 c2081950 
>> c3277d40 
>> [0.010943] GPR28:  ca68 0400 
>> 000d 
>> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
>> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
>> [0.010999] --- interrupt: c00
>> [0.011001] [c3603b00] [c000c764] 
>> system_call_common+0xf4/0x258 (unreliable)
>> [0.011008] Instruction dump:
>> [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 
>> e87f0108 68690002 
>> [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 
>> 792907e0 0b09 
>> [0.011029] ---[ end trace a20ad55589efcb10 ]---
>> [0.012297] 
>> [1.012304] Kernel panic - not syncing: Fatal exception
>> 
>> next-20210723 was good. The boot failure seems to have been introduced with 
>> next-20210726.
>> 
>> I have attached the boot log.
> 
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

This is because powerpc's set_memory_encrypted makes an ultracall but it 
does not exist on that processor.

x86's set_memory_encrypted/decrypted have

   /* Nothing to do if memory encryption is not active */
if (!mem_encrypt_active())
return 0;

Probably powerpc should just do that too.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/5] kernel/dma: remove unnecessary unmap_kernel_range

2021-01-27 Thread Nicholas Piggin
Excerpts from Christoph Hellwig's message of January 27, 2021 5:10 pm:
> On Tue, Jan 26, 2021 at 05:08:46PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jan 26, 2021 at 02:54:01PM +1000, Nicholas Piggin wrote:
>> > vunmap will remove ptes.
>> 
>> Should there be some ASSERT after the vunmap to make sure that is the
>> case? 
> 
> Not really.  removing the PTEs is the whole point of vunmap.  Everything
> else is just house keeping.

Agree. I did double check this and wrote a quick test to check ptes were 
there before the vunmap and cleared after, just to make sure I didn't 
make a silly mistake with the patch. But in general drivers should be 
able to trust code behind the API call will do the right thing. Such 
assertions should go in the vunmap() implementation as appropriate.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/5] kernel/dma: remove unnecessary unmap_kernel_range

2021-01-25 Thread Nicholas Piggin
vunmap will remove ptes.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Nicholas Piggin 
---
 kernel/dma/remap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index 905c3fa005f1..b4526668072e 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -66,6 +66,5 @@ void dma_common_free_remap(void *cpu_addr, size_t size)
return;
}
 
-   unmap_kernel_range((unsigned long)cpu_addr, PAGE_ALIGN(size));
vunmap(cpu_addr);
 }
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-13 Thread Nicholas Piggin
On Wed, 13 Jun 2018 18:31:17 -0700
Ricardo Neri  wrote:

> On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> > On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> > Thomas Gleixner  wrote:
> >   
> > > On Wed, 13 Jun 2018, Peter Zijlstra wrote:  
> > > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:
> > > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > > Ricardo Neri  wrote:
> > > > > 
> > > > > > Instead of exposing individual functions for the operations of the 
> > > > > > NMI
> > > > > > watchdog, define a common interface that can be used across multiple
> > > > > > implementations.
> > > > > > 
> > > > > > The struct nmi_watchdog_ops is defined for such operations. These 
> > > > > > initial
> > > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > > operations.
> > > > > > 
> > > > > > Only a single NMI watchdog can be used in the system. The 
> > > > > > operations of
> > > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > > variable is set to point the operations of the first NMI watchdog 
> > > > > > that
> > > > > > initializes successfully. Even though at this moment, the only 
> > > > > > available
> > > > > > NMI watchdog is the perf-based hardlockup detector. More 
> > > > > > implementations
> > > > > > can be added in the future.
> > > > > 
> > > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > > least have their own NMI watchdogs, it would be good to have those
> > > > > converted as well.
> > > > 
> > > > Yeah, agreed, this looks like half a patch.
> > > 
> > > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > > low level architecture details so having yet another 'ops' data structure
> > > with a gazillion of callbacks, checks and indirections does not provide
> > > value over the currently available weak stubs.  
> > 
> > The other way to go of course is librify the perf watchdog and make an
> > x86 watchdog that selects between perf and hpet... I also probably
> > prefer that for code such as this, but I wouldn't strongly object to
> > ops struct if I'm not writing the code. It's not that bad is it?  
> 
> My motivation to add the ops was that the hpet and perf watchdog share
> significant portions of code.

Right, a good motivation.

> I could look into creating the library for
> common code and relocate the hpet watchdog into arch/x86 for the hpet-
> specific parts.

If you can investigate that approach, that would be appreciated. I hope
I did not misunderstand you there, Thomas.

Basically you would have perf infrastructure and hpet infrastructure,
and then the x86 watchdog driver will use one or the other of those. The
generic watchdog driver will be just a simple shim that uses the perf
infrastructure. Then hopefully the powerpc driver would require almost
no change.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf

2018-06-13 Thread Nicholas Piggin
On Wed, 13 Jun 2018 18:19:01 -0700
Ricardo Neri  wrote:

> On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:  
> > > The current default implementation of the hardlockup detector assumes that
> > > it is implemented using perf events.  
> > 
> > The sparc and powerpc things are very much not using perf.  
> 
> Isn't it true that the current hardlockup detector
> (under kernel/watchdog_hld.c) is based on perf?

arch/powerpc/kernel/watchdog.c is a powerpc implementation that uses
the kernel/watchdog_hld.c framework.

> As far as I understand,
> this hardlockup detector is constructed using perf events for architectures
> that don't provide an NMI watchdog. Perhaps I can be more specific and say
> that this synthetized detector is based on perf.

The perf detector is like that, but we want NMI watchdogs to share
the watchdog_hld code as much as possible even for arch specific NMI
watchdogs, so that kernel and user interfaces and behaviour are
consistent.

Other arch watchdogs like sparc are a little older so they are not
using HLD. You don't have to change those for your series, but it
would be good to bring them into the fold if possible at some time.
IIRC sparc was slightly non-trivial because it has some differences
in sysctl or cmdline APIs that we don't want to break.

But powerpc at least needs to be updated if you change hld apis.

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-13 Thread Nicholas Piggin
On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
Thomas Gleixner  wrote:

> On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:  
> > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > Ricardo Neri  wrote:
> > >   
> > > > Instead of exposing individual functions for the operations of the NMI
> > > > watchdog, define a common interface that can be used across multiple
> > > > implementations.
> > > > 
> > > > The struct nmi_watchdog_ops is defined for such operations. These 
> > > > initial
> > > > definitions include the enable, disable, start, stop, and cleanup
> > > > operations.
> > > > 
> > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > variable is set to point the operations of the first NMI watchdog that
> > > > initializes successfully. Even though at this moment, the only available
> > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > can be added in the future.  
> > > 
> > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > least have their own NMI watchdogs, it would be good to have those
> > > converted as well.  
> > 
> > Yeah, agreed, this looks like half a patch.  
> 
> Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> low level architecture details so having yet another 'ops' data structure
> with a gazillion of callbacks, checks and indirections does not provide
> value over the currently available weak stubs.

The other way to go of course is librify the perf watchdog and make an
x86 watchdog that selects between perf and hpet... I also probably
prefer that for code such as this, but I wouldn't strongly object to
ops struct if I'm not writing the code. It's not that bad is it?

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations

2018-06-13 Thread Nicholas Piggin
On Tue, 12 Jun 2018 17:57:32 -0700
Ricardo Neri  wrote:

> Instead of exposing individual functions for the operations of the NMI
> watchdog, define a common interface that can be used across multiple
> implementations.
> 
> The struct nmi_watchdog_ops is defined for such operations. These initial
> definitions include the enable, disable, start, stop, and cleanup
> operations.
> 
> Only a single NMI watchdog can be used in the system. The operations of
> this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> variable is set to point the operations of the first NMI watchdog that
> initializes successfully. Even though at this moment, the only available
> NMI watchdog is the perf-based hardlockup detector. More implementations
> can be added in the future.

Cool, this looks pretty nice at a quick glance. sparc and powerpc at
least have their own NMI watchdogs, it would be good to have those
converted as well.

Is hpet a cross platform thing, or just x86? We should avoid
proliferation of files under kernel/ I think, so with these watchdog
driver structs then maybe implementations could go in drivers/ or
arch/

Thanks,
Nick
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu