Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote: > On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote: > > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > >> + /* > >> + * If in use, the HPET hardlockup detector relies on tsc_khz. > >> + * Reconfigure it to make use of the refined tsc_khz. > >> + */ > >> + lockup_detector_reconfigure(); > > > > I don't know if the API is conceptually good. > > > > You change something that the lockup detector is currently using, > > *while* the detector is running asynchronously, and then reconfigure > > it. What happens in the window? If this code is only used for small > > adjustments maybe it does not really matter but in principle it's > > a bad API to export. > > > > lockup_detector_reconfigure as an internal API is okay because it > > reconfigures things while the watchdog is stopped [actually that > > looks untrue for soft dog which uses watchdog_thresh in > > is_softlockup(), but that should be fixed]. > > > > You're the arch so you're allowed to stop the watchdog and configure > > it, e.g., hardlockup_detector_perf_stop() is called in arch/. > > > > So you want to disable HPET watchdog if it was enabled, then update > > wherever you're using tsc_khz, then re-enable. > > The real question is whether making this refined tsc_khz value > immediately effective matters at all. IMO, it does not because up to > that point the watchdog was happily using the coarse calibrated value > and the whole use TSC to assess whether the HPET fired mechanism is just > a guestimate anyway. So what's the point of trying to guess 'more > correct'. In some of my test systems I observed that, the TSC value does not fall within the expected error window the first time the HPET channel expires. I inferred that the error computed using the coarser tsc_khz was wrong. Recalculating the error window with refined tsc_khz would correct it. However, restarting the timer has the side-effect of kicking the timer and, therefore pushing the first HPET NMI further in the future. Perhaps kicking HPET channel, not recomputing the error window, corrected (masked?) the problem. I will investigate further and rework or drop this patch as needed. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote: > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > > The HPET hardlockup detector relies on tsc_khz to estimate the value of > > that the TSC will have when its HPET channel fires. A refined tsc_khz > > helps to estimate better the expected TSC value. > > > > Using the early value of tsc_khz may lead to a large error in the expected > > TSC value. Restarting the NMI watchdog detector has the effect of kicking > > its HPET channel and make use of the refined tsc_khz. > > > > When the HPET hardlockup is not in use, restarting the NMI watchdog is > > a noop. > > > > Cc: Andi Kleen > > Cc: Stephane Eranian > > Cc: "Ravi V. Shankar" > > Cc: iommu@lists.linux-foundation.org > > Cc: linuxppc-...@lists.ozlabs.org > > Cc: x...@kernel.org > > Signed-off-by: Ricardo Neri > > --- > > Changes since v5: > > * Introduced this patch > > > > Changes since v4 > > * N/A > > > > Changes since v3 > > * N/A > > > > Changes since v2: > > * N/A > > > > Changes since v1: > > * N/A > > --- > > arch/x86/kernel/tsc.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > > index cafacb2e58cc..cc1843044d88 100644 > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct > > work_struct *work) > > /* Inform the TSC deadline clockevent devices about the recalibration */ > > lapic_update_tsc_freq(); > > > > + /* > > +* If in use, the HPET hardlockup detector relies on tsc_khz. > > +* Reconfigure it to make use of the refined tsc_khz. > > +*/ > > + lockup_detector_reconfigure(); > > I don't know if the API is conceptually good. > > You change something that the lockup detector is currently using, > *while* the detector is running asynchronously, and then reconfigure > it. Yes, this is what happens. > What happens in the window? If this code is only used for small > adjustments maybe it does not really matter Please see my comment > but in principle it's a bad API to export. > > lockup_detector_reconfigure as an internal API is okay because it > reconfigures things while the watchdog is stopped I see. > [actually that looks untrue for soft dog which uses watchdog_thresh in > is_softlockup(), but that should be fixed]. Perhaps there should be a watchdog_thresh_user. When the user updates it, the detector is stopped, watchdog_thresh is updated, and then the detector is resumed. > > You're the arch so you're allowed to stop the watchdog and configure > it, e.g., hardlockup_detector_perf_stop() is called in arch/. I had it like this but it did not look right to me. You are right, however, I can stop/restart the watchdog when needed. Thanks and BR, Ricardo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote: > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: >> +/* >> + * If in use, the HPET hardlockup detector relies on tsc_khz. >> + * Reconfigure it to make use of the refined tsc_khz. >> + */ >> +lockup_detector_reconfigure(); > > I don't know if the API is conceptually good. > > You change something that the lockup detector is currently using, > *while* the detector is running asynchronously, and then reconfigure > it. What happens in the window? If this code is only used for small > adjustments maybe it does not really matter but in principle it's > a bad API to export. > > lockup_detector_reconfigure as an internal API is okay because it > reconfigures things while the watchdog is stopped [actually that > looks untrue for soft dog which uses watchdog_thresh in > is_softlockup(), but that should be fixed]. > > You're the arch so you're allowed to stop the watchdog and configure > it, e.g., hardlockup_detector_perf_stop() is called in arch/. > > So you want to disable HPET watchdog if it was enabled, then update > wherever you're using tsc_khz, then re-enable. The real question is whether making this refined tsc_khz value immediately effective matters at all. IMO, it does not because up to that point the watchdog was happily using the coarse calibrated value and the whole use TSC to assess whether the HPET fired mechanism is just a guestimate anyway. So what's the point of trying to guess 'more correct'. Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > The HPET hardlockup detector relies on tsc_khz to estimate the value of > that the TSC will have when its HPET channel fires. A refined tsc_khz > helps to estimate better the expected TSC value. > > Using the early value of tsc_khz may lead to a large error in the expected > TSC value. Restarting the NMI watchdog detector has the effect of kicking > its HPET channel and make use of the refined tsc_khz. > > When the HPET hardlockup is not in use, restarting the NMI watchdog is > a noop. > > Cc: Andi Kleen > Cc: Stephane Eranian > Cc: "Ravi V. Shankar" > Cc: iommu@lists.linux-foundation.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: x...@kernel.org > Signed-off-by: Ricardo Neri > --- > Changes since v5: > * Introduced this patch > > Changes since v4 > * N/A > > Changes since v3 > * N/A > > Changes since v2: > * N/A > > Changes since v1: > * N/A > --- > arch/x86/kernel/tsc.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > index cafacb2e58cc..cc1843044d88 100644 > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct > work_struct *work) > /* Inform the TSC deadline clockevent devices about the recalibration */ > lapic_update_tsc_freq(); > > + /* > + * If in use, the HPET hardlockup detector relies on tsc_khz. > + * Reconfigure it to make use of the refined tsc_khz. > + */ > + lockup_detector_reconfigure(); I don't know if the API is conceptually good. You change something that the lockup detector is currently using, *while* the detector is running asynchronously, and then reconfigure it. What happens in the window? If this code is only used for small adjustments maybe it does not really matter but in principle it's a bad API to export. lockup_detector_reconfigure as an internal API is okay because it reconfigures things while the watchdog is stopped [actually that looks untrue for soft dog which uses watchdog_thresh in is_softlockup(), but that should be fixed]. You're the arch so you're allowed to stop the watchdog and configure it, e.g., hardlockup_detector_perf_stop() is called in arch/. So you want to disable HPET watchdog if it was enabled, then update wherever you're using tsc_khz, then re-enable. Thanks, Nick ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
The HPET hardlockup detector relies on tsc_khz to estimate the value of that the TSC will have when its HPET channel fires. A refined tsc_khz helps to estimate better the expected TSC value. Using the early value of tsc_khz may lead to a large error in the expected TSC value. Restarting the NMI watchdog detector has the effect of kicking its HPET channel and make use of the refined tsc_khz. When the HPET hardlockup is not in use, restarting the NMI watchdog is a noop. Cc: Andi Kleen Cc: Stephane Eranian Cc: "Ravi V. Shankar" Cc: iommu@lists.linux-foundation.org Cc: linuxppc-...@lists.ozlabs.org Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- Changes since v5: * Introduced this patch Changes since v4 * N/A Changes since v3 * N/A Changes since v2: * N/A Changes since v1: * N/A --- arch/x86/kernel/tsc.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index cafacb2e58cc..cc1843044d88 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work) /* Inform the TSC deadline clockevent devices about the recalibration */ lapic_update_tsc_freq(); + /* +* If in use, the HPET hardlockup detector relies on tsc_khz. +* Reconfigure it to make use of the refined tsc_khz. +*/ + lockup_detector_reconfigure(); + /* Update the sched_clock() rate to match the clocksource one */ for_each_possible_cpu(cpu) set_cyc2ns_scale(tsc_khz, cpu, tsc_stop); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu